Case Study — Finger Lakes Daily News

When Pharma Spammers Try to Poison a Local News Site

How we caught and killed an SEO attack that was weaponizing WordPress's own RSS feeds—in a single afternoon.

Peak Crawl Spike: 4,080 (vs ~200/day baseline)
Attack Cadence: 14 days (Jan 27 → Apr 7, ±0 days)
Response Now Served: 410 Gone (was 200 OK before)
Legitimate Feeds Broken: 0 (verified URL-by-URL)

The Signal

Something Was Telling Google to Come Back for More

Google Search Console's feed-crawler report showed a strange pattern: every two weeks, crawl requests against Finger Lakes Daily News jumped from ~200/day to roughly 2,000–4,000/day (peaking at 4,080), then returned to baseline. Like clockwork: Jan 27, Feb 10, Feb 24, Mar 10, Mar 24, Apr 7. Exactly 14 days apart.

Normal crawl behavior doesn't spike on a calendar. Something was pulling Googlebot back on a schedule.
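
A minimal sketch of how a fixed cadence like this can be confirmed from daily counts. The function names, the 5x-median threshold, and the synthetic series (200/day with 3,000 on the six dates above) are illustrative assumptions, not our production pipeline or the real GSC numbers.

```python
from datetime import date, timedelta
from statistics import median


def find_spike_days(daily_counts, factor=5.0):
    """Return the dates whose count exceeds `factor` times the median day."""
    baseline = median(daily_counts.values())
    return sorted(d for d, n in daily_counts.items() if n > factor * baseline)


def spike_intervals(spike_days):
    """Return the number of days between consecutive spikes."""
    return [(b - a).days for a, b in zip(spike_days, spike_days[1:])]


# Synthetic data for illustration only: a flat 200/day baseline
# with 3,000 requests on the six spike dates named in the text.
spike_dates = {
    date(2026, 1, 27), date(2026, 2, 10), date(2026, 2, 24),
    date(2026, 3, 10), date(2026, 3, 24), date(2026, 4, 7),
}
start, end = date(2026, 1, 20), date(2026, 4, 18)
daily_counts = {}
day = start
while day <= end:
    daily_counts[day] = 3000 if day in spike_dates else 200
    day += timedelta(days=1)

spikes = find_spike_days(daily_counts)
intervals = spike_intervals(spikes)
print(intervals)  # [14, 14, 14, 14, 14]
```

A median baseline is used rather than a mean so that the spikes themselves do not inflate the threshold. Identical intervals are the tell: organic crawl demand rarely repeats to the day.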

Feed-Crawler Requests — Google Search Console (Jan 20 – Apr 18, 2026)
Chart: six biweekly spikes of 2,000 to 4,000 requests from late January to early April, against a ~200/day baseline.
Why it matters: Crawl-budget anomalies are usually the first visible symptom of an SEO attack—long before polluted results appear in SERPs. Most sites never look. We only saw it because our overnight review pipeline reads the GSC feed-crawler report into BigQuery every morning.

The Investigation

Three Problems in the URL List — One of Them an Active Attack

We pulled the list of URLs Google's feed crawler was hitting. The pattern was immediately obvious—and it wasn't just one thing going wrong.

1. Pharma Spam Injection
Dozens of crafted URLs hitting /search/<pharma-query>/feed/rss2/. WordPress turns any search query into an indexable RSS feed. Attackers link to those URLs from compromised sites, Google crawls them, and pharmacy spam ends up indexed under our domain.
Active, ongoing — P0
2. Low-Value Feed Surface
Thousands of crawls against /tag/*/feed, /author/…/page/850/feed, /category/local/page/559/feed, and /ad_block/…/feed/—archive pages with no unique content, burning crawl budget that should be going to actual journalism.
High — crawl waste
3. Slugification Bug
Thousands of tag slugs starting with 8216, the decimal character reference for a left single quotation mark (&#8216;). The RSS-import pipeline was slugifying article titles without decoding HTML entities first, so the entity's digits survived into the slug and generated a trail of broken tag pages that Googlebot kept revisiting.
Medium — data hygiene
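
The slugification bug in item 3 can be reproduced in a few lines. This is a sketch under assumptions: the slugify rule here is a simplified stand-in (not WordPress's own sanitizer and not our import code), and the title is made up for illustration.

```python
import html
import re


def slugify(text):
    """Lowercase, collapse runs of non-alphanumerics into '-', trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def slugify_decoded(text):
    """Decode HTML entities first, so '&#8216;' becomes a quote character
    (and is dropped) instead of leaving the digits 8216 in the slug."""
    return slugify(html.unescape(text))


# Hypothetical title as it might arrive from an RSS item with encoded quotes.
title = "&#8216;Geneva&#8217; council vote"

print(slugify(title))          # 8216-geneva-8217-council-vote
print(slugify_decoded(title))  # geneva-council-vote
```

Decoding before slugifying is the whole fix: once the entity is a real quotation mark, the normal punctuation-stripping step removes it.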

The Fix

410 Gone, at the Origin, Before Anything Else Runs

Two commits to the theme's SEO module—roughly 70 lines of PHP. The fix had to neutralize the active attack without breaking the feeds that actual readers and syndication partners depend on.

1. Hard-fail the spam vector

A template_redirect hook at priority -1 intercepts /search/*/feed/, /tag/*/feed/, /author/*/feed/, and deep-pagination archive feeds before WordPress tries to render them. Returns 410 Gone with X-Robots-Tag: noindex, nofollow. 410 is the strongest signal we can send Google: this URL existed, it's gone, stop asking.

2. Seal it off at robots.txt

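Five new Disallow: rules covering tag feeds, author feeds, category deep-pagination feeds, the /ad_block/ taxonomy leak, and generic paginated feeds. Belt and suspenders: the 410 answers any request that still reaches the origin; robots.txt asks compliant crawlers not to request those paths at all. One caveat: a crawler that obeys a Disallow rule never fetches the URL, so it never sees the 410 for paths covered by both.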

Careful carve-outs. The legitimate feeds real readers use—the main site feed, single-post comment feeds, top-level category feeds, the Google News sitemap—are all untouched. Verified URL-by-URL on production after the deploy.
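// Return 410 Gone for spam-vector and low-value feeds before WordPress loads a template.
add_action( 'template_redirect', 'fldn_seo_gone_low_value_feeds', -1 );

function fldn_seo_gone_low_value_feeds() {
    // Path only, without the query string. strtok() returns false on an empty URI.
    $path = strtok( $_SERVER['REQUEST_URI'] ?? '', '?' );
    if ( false === $path ) {
        return;
    }

    // Search-result feeds such as /search/<query>/feed/rss2/ (the spam vector).
    // Requiring /feed as its own segment keeps a plain search for "feedback" working.
    $is_search_feed = (bool) preg_match( '#^/search/.+/feed(/|$)#', $path );

    // Tag and author feeds, including their paginated variants.
    $is_archive_feed = (bool) preg_match( '#^/(tag|author)/[^/]+/(feed|page/\d+/feed)/?$#', $path );

    if ( $is_search_feed || $is_archive_feed ) {
        status_header( 410 );
        header( 'X-Robots-Tag: noindex, nofollow' );
        exit;
    }
}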
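
One way to sanity-check the carve-outs offline, before touching production, is to mirror the matching rules in a small script and run the known URLs through it. This is a sketch, not part of the deployed module: `should_return_410` is a hypothetical helper, the tag/author pattern is copied from the snippet above, and the search pattern assumes `/feed` must appear as its own path segment.

```python
import re

# Search-result feeds, e.g. /search/<query>/feed/rss2/
SEARCH_FEED = re.compile(r"^/search/.+/feed(/|$)")
# Tag and author feeds, including paginated variants.
ARCHIVE_FEED = re.compile(r"^/(tag|author)/[^/]+/(feed|page/\d+/feed)/?$")


def should_return_410(url_path):
    """Return True if the path (query string ignored) matches a blocked feed."""
    path = url_path.split("?", 1)[0]
    return bool(SEARCH_FEED.search(path) or ARCHIVE_FEED.search(path))


# Paths taken from this case study: the first three should be gone,
# the rest are legitimate feeds that must keep working.
blocked = ["/search/cialis/feed/rss2/", "/author/lucas/feed/", "/tag/geneva/feed/"]
kept = ["/feed/", "/local/feed/", "/news-sitemap.xml", "/search/feedback/"]

for p in blocked:
    assert should_return_410(p), p
for p in kept:
    assert not should_return_410(p), p
print("all carve-outs hold")
```

A check like this only covers the pattern logic; the real confirmation is still the per-URL header check against production.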

Verification

What Changed, What Didn't

Measured on production after CF cache purge. Every URL confirmed with curl -sI.

URL | Before | After
/search/cialis/feed/rss2/ | 200 OK (served spam) | 410 Gone
/author/lucas/feed/ | 200 OK | 410 Gone
/tag/geneva/feed/ | 301 → homepage | 410 Gone
/category/local/page/559/feed | 200 (deep pagination) | 301 → top-level feed
/feed/ (home) | 200 | 200 ✓
/<slug>/feed/ (single-post comments) | 200 | 200 ✓
/news-sitemap.xml | 200 | 200 ✓
/local/feed/ (top-level category) | 200 | 200 ✓
Next check: April 22. Based on the 14-day attack pattern (last spike Apr 7–8), that's the next expected hit. The fix deployed hours before the window opened—best possible test, because attackers haven't had time to change tactics. If it works, Apr 22 looks like every other day. Follow-up: GSC URL Removal sweep for already-polluted /search/* entries to accelerate the cleanup Google will eventually do on its own.

Why It Matters

This Is Running Against Thousands of Sites Right Now

This kind of attack isn't rare. It's quietly running against thousands of WordPress sites this week. Cheap for attackers—just links from compromised sites, nothing installed, nothing exploited. Expensive for the targets—index pollution, crawl-budget waste, potential manual spam penalties. And invisible unless someone is actually reading their crawl-stats reports.

Local newsrooms don't have a dedicated SEO engineer watching GSC every week. What caught this was the monitoring infrastructure we built out earlier this year: daily crawl-stats ingested into BigQuery, anomaly detection running on the droplet, and an AI assistant reading the report so humans don't have to.

The attackers have 14 days to notice they're being rejected. We have 14 days of clean data coming.
