Case Study — Finger Lakes Daily News

When Pharma Spammers Try to Poison a Local News Site

How we caught and killed an SEO attack that was weaponizing WordPress's own RSS feeds—in a single afternoon.

Peak Crawl Spike: 4,080 (vs ~200/day baseline)

Attack Cadence: 14 days (Jan 27 → Apr 7, ±0 days)

Response Now Served: 410 Gone (was 200 OK before)

Legitimate Feeds Broken: 0 (verified URL-by-URL)

The Signal

Something Was Telling Google to Come Back for More

Google Search Console's feed-crawler report showed a strange pattern: every two weeks, crawl requests against Finger Lakes Daily News jumped from ~200/day to 2,000–4,000/day, then returned to baseline. The spikes landed like clockwork on Jan 27, Feb 10, Feb 24, Mar 10, Mar 24, and Apr 7, exactly 14 days apart.

Normal crawl behavior doesn't spike on a calendar. Something was pulling Googlebot back on a schedule.
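The cadence is easy to confirm by computing the gaps between spike dates. The sketch below is illustrative rather than our production pipeline: the function names, the 5x threshold, and the sample counts are assumptions. Only the six spike dates, the ~200/day baseline, and the 4,080 peak come from the report.

```php
<?php
declare(strict_types=1);

/**
 * Return the dates whose request count is at least $factor times the baseline.
 *
 * @param array<string,int> $dailyCounts Map of 'Y-m-d' => request count.
 * @return string[] Spike dates in ascending order.
 */
function fldn_find_spike_dates(array $dailyCounts, int $baseline, float $factor = 5.0): array
{
    $spikes = [];
    foreach ($dailyCounts as $date => $count) {
        if ($count >= $baseline * $factor) {
            $spikes[] = $date;
        }
    }
    sort($spikes); // 'Y-m-d' strings sort chronologically.
    return $spikes;
}

/**
 * Return the gap in whole days between each pair of consecutive dates.
 *
 * @param string[] $dates Dates as 'Y-m-d', ascending.
 * @return int[]
 */
function fldn_gaps_in_days(array $dates): array
{
    $utc  = new DateTimeZone('UTC'); // Fixed zone so DST never shifts a gap.
    $gaps = [];
    for ($i = 1; $i < count($dates); $i++) {
        $prev   = new DateTimeImmutable($dates[$i - 1], $utc);
        $curr   = new DateTimeImmutable($dates[$i], $utc);
        $gaps[] = (int) $prev->diff($curr)->days;
    }
    return $gaps;
}

// Placeholder counts (assumed): one baseline day and one spike day.
$sample = ['2026-01-26' => 200, '2026-01-27' => 4080];
echo implode(', ', fldn_find_spike_dates($sample, 200)), "\n";

// The six spike dates listed in the report.
$spikeDates = ['2026-01-27', '2026-02-10', '2026-02-24', '2026-03-10', '2026-03-24', '2026-04-07'];
$gaps = fldn_gaps_in_days($spikeDates);
echo implode(', ', $gaps), "\n"; // 14, 14, 14, 14, 14

// A single distinct gap value means the spikes follow a fixed schedule.
echo count(array_unique($gaps)) === 1 ? "fixed cadence\n" : "irregular\n";
```

A fixed gap is the tell: organic crawl demand drifts, a scheduled campaign does not.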

Feed-Crawler Requests — Google Search Console (Jan 20 – Apr 18, 2026)

Chart: six biweekly spikes of 2,000 to 4,000 requests from late January to early April, against a ~200/day baseline.

Why it matters: Crawl-budget anomalies are often the first visible symptom of an SEO attack, showing up long before polluted results appear in SERPs. Most sites never look. We only saw it because our overnight review pipeline loads the GSC feed-crawler report into BigQuery every morning.

The Investigation

Three Problems in the URL List — One of Them an Active Attack

We pulled the list of URLs Google's feed crawler was hitting. The pattern was immediately obvious—and it wasn't just one thing going wrong.

⚠

1. Pharma Spam Injection

Dozens of crafted URLs of the form /search/<pharma-query>/feed/rss2/. WordPress serves any search query as an RSS feed at a crawlable URL. Attackers link to those URLs from compromised sites, Google crawls them, and pharmacy spam ends up indexed under our domain.

Active, ongoing — P0

📋

2. Low-Value Feed Surface

Thousands of crawls against /tag/*/feed, /author/…/page/850/feed, /category/local/page/559/feed, and /ad_block/…/feed/—archive pages with no unique content, burning crawl budget that should be going to actual journalism.

High — crawl waste

🛠

3. Slugification Bug

Thousands of tag slugs starting with 8216, the decimal character reference (&#8216;) for a left single quote (‘). The RSS-import pipeline was slugifying article titles without decoding HTML entities first, so the digits of the entity leaked into the slug and generated a trail of broken tag pages that Googlebot kept revisiting.

Medium — data hygiene
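The bug is easy to reproduce outside WordPress. The sketch below is a plain-PHP stand-in, not the import pipeline's actual code: both function names and the sample title are hypothetical, and inside WordPress the slug step would more likely go through sanitize_title(). It shows why the order of operations matters: slugify first and the entity's digits survive; decode first and they do not.

```php
<?php
declare(strict_types=1);

/** Lowercase, collapse runs of non-alphanumerics into a hyphen, trim hyphens. */
function fldn_naive_slug(string $title): string
{
    $slug = strtolower($title);
    $slug = (string) preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

/** Decode HTML entities first, so "&#8216;" becomes a quote character rather than digits. */
function fldn_decoded_slug(string $title): string
{
    $decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return fldn_naive_slug($decoded);
}

// Hypothetical feed title with curly quotes encoded as numeric character references.
$title = '&#8216;Lake effect&#8217; snow warning';

echo fldn_naive_slug($title), "\n";   // 8216-lake-effect-8217-snow-warning
echo fldn_decoded_slug($title), "\n"; // lake-effect-snow-warning
```

Any title that opens with a quoted phrase produces a slug beginning with 8216, which matches the pattern in the crawl list.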

The Fix

410 Gone, at the Origin, Before Anything Else Runs

Two commits to the theme's SEO module—roughly 70 lines of PHP. The fix had to neutralize the active attack without breaking the feeds that actual readers and syndication partners depend on.

1. Hard-fail the spam vector

A template_redirect hook at priority -1 intercepts /search/*/feed/, /tag/*/feed/, /author/*/feed/, and deep-pagination archive feeds before WordPress tries to render them. Returns 410 Gone with X-Robots-Tag: noindex, nofollow. 410 is the strongest signal we can send Google: this URL existed, it's gone, stop asking.

2. Seal it off at robots.txt

Five new Disallow: rules covering tag feeds, author feeds, category deep-pagination feeds, the /ad_block/ taxonomy leak, and generic paginated feeds. Belt-and-suspenders: 410 handles already-indexed URLs; robots.txt asks well-behaved crawlers not to request them in the first place.

Careful carve-outs. The legitimate feeds real readers use—the main site feed, single-post comment feeds, top-level category feeds, the Google News sitemap—are all untouched. Verified URL-by-URL on production after the deploy.
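For a sense of what the robots.txt side can look like, here is a minimal sketch. The five patterns are hypothetical stand-ins, one per category named above, not the rules we deployed, and the function name is invented for illustration. It also assumes robots.txt is generated by WordPress (via the core robots_txt filter) rather than served as a static file.

```php
<?php
declare(strict_types=1);

/** Append Disallow rules for low-value feeds to an existing robots.txt body. */
function fldn_append_feed_disallows(string $robotsTxt): string
{
    // Hypothetical patterns: tag feeds, author feeds, category deep pagination,
    // the /ad_block/ taxonomy, and generic paginated feeds.
    $patterns = [
        '/tag/*/feed',
        '/author/*/feed',
        '/category/*/page/*/feed',
        '/ad_block/',
        '/*/page/*/feed',
    ];
    $lines = array_map(static fn (string $p): string => 'Disallow: ' . $p, $patterns);
    return rtrim($robotsTxt) . "\n" . implode("\n", $lines) . "\n";
}

// Inside WordPress, attach the function to the 'robots_txt' filter.
// The guard keeps this file runnable as a standalone script too.
if (function_exists('add_filter')) {
    add_filter('robots_txt', 'fldn_append_feed_disallows');
}

echo fldn_append_feed_disallows("User-agent: *\nDisallow: /wp-admin/\n");
```

None of these patterns touch /feed/, single-post comment feeds, or top-level category feeds, which is the point of the carve-outs. One thing to keep in mind: a crawler that obeys a Disallow rule never requests the URL, so it only sees the 410 on paths the rules leave open, such as the /search/ feeds, which none of the patterns above cover.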

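// Answer spam-vector and low-value feeds with 410 Gone before WordPress loads a template.
add_action( 'template_redirect', 'fldn_seo_gone_low_value_feeds', -1 );

function fldn_seo_gone_low_value_feeds() {
    // Match on the path only; drop any query string.
    $path = (string) strtok( $_SERVER['REQUEST_URI'] ?? '', '?' );

    // Search-result feeds (/search/<query>/feed/...): the spam vector.
    $is_search_feed = 0 === strpos( $path, '/search/' ) && false !== strpos( $path, '/feed' );

    // Tag and author feeds, including their paginated variants.
    $is_archive_feed = 1 === preg_match( '#^/(tag|author)/[^/]+/(feed|page/\d+/feed)/?$#', $path );

    if ( $is_search_feed || $is_archive_feed ) {
        status_header( 410 );
        // Send the header on both branches, as described above.
        header( 'X-Robots-Tag: noindex, nofollow' );
        exit;
    }
}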

Verification

What Changed, What Didn't

Measured on production after a CF cache purge. Every URL confirmed with curl -sI.

URL	Before	After
`/search/cialis/feed/rss2/`	200 OK (served spam)	410 Gone
`/author/lucas/feed/`	200 OK	410 Gone
`/tag/geneva/feed/`	301 → homepage	410 Gone
`/category/local/page/559/feed`	200 (deep pagination)	301 → top-level feed
`/feed/` (home)	200	200 ✓
`/<slug>/feed/` (single-post comments)	200	200 ✓
`/news-sitemap.xml`	200	200 ✓
`/local/feed/` (top-level category)	200	200 ✓
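The same expectations can be checked offline before a deploy. The sketch below pulls the two path patterns from the hook into a pure function and runs the table's URLs through it. The function name and the /example-post/ slug are hypothetical, and this only tests pattern matching; it is no substitute for the curl checks against production.

```php
<?php
declare(strict_types=1);

/** True when the path matches one of the feed patterns answered with 410 Gone. */
function fldn_is_gone_feed(string $path): bool
{
    $isSearchFeed  = strpos($path, '/search/') === 0 && strpos($path, '/feed') !== false;
    $isArchiveFeed = preg_match('#^/(tag|author)/[^/]+/(feed|page/\d+/feed)/?$#', $path) === 1;
    return $isSearchFeed || $isArchiveFeed;
}

// Path => whether a 410 is expected, following the table above.
$expected = [
    '/search/cialis/feed/rss2/'     => true,
    '/author/lucas/feed/'           => true,
    '/tag/geneva/feed/'             => true,
    '/category/local/page/559/feed' => false, // answered with a 301, not a 410
    '/feed/'                        => false,
    '/example-post/feed/'           => false, // hypothetical single-post slug
    '/news-sitemap.xml'             => false,
    '/local/feed/'                  => false,
];

foreach ($expected as $path => $gone) {
    $result = fldn_is_gone_feed($path) === $gone ? 'ok' : 'MISMATCH';
    printf("%-32s %s\n", $path, $result);
}
```

A check like this is cheap to keep next to the hook, so a later regex change that starts swallowing /feed/ or a comment feed fails loudly instead of silently breaking syndication.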

Why It Matters

This Is Running Against Thousands of Sites Right Now

This kind of attack isn't rare. It's quietly running against thousands of WordPress sites this week. Cheap for attackers—just links from compromised sites, nothing installed, nothing exploited. Expensive for the targets—index pollution, crawl-budget waste, potential manual spam penalties. And invisible unless someone is actually reading their crawl-stats reports.

Local newsrooms don't have a dedicated SEO engineer watching GSC every week. What caught this was the monitoring infrastructure we built out earlier this year: daily crawl-stats ingested into BigQuery, anomaly detection running on the droplet, and an AI assistant reading the report so humans don't have to.

The attackers have 14 days to notice they're being rejected. We have 14 days of clean data coming.

Want this kind of monitoring on your site?

Let's build a plan that actually works for your business.