A glitch in the SEO matrix

(or: A comprehensive ecosystem of open-source software for big data management)

07-14-23

Izzy Miller

Last week, I found a glitch in the matrix of SEO. For some reason, every month 2,400 people search for the exact string “a comprehensive ecosystem of open-source software for big data management”.


Data from SEMRush

And weirdly, there are ~1,000 results for the exact query “A comprehensive ecosystem of open-source software for big data management”. This is at once a weirdly small and a weirdly large number: small because most Google searches have tens of millions of results, but large because most Google searches for exact string matches of that length actually turn up few or no results. So there's something to this phrase.

Looking at the results, it's immediately obvious that most of these are very thin, AI-generated blogs published in just the last couple of months.


Sell gold near me!

Wtf? Of course, I stopped doing my SEO research (b o r i n g) and started trying to figure out how to rank for this keyword... er, why the heck people were dogpiling into this weird query.

A strange game...

And I figured it out! But to fully understand this phenomenon, we need a bit of background on SEO.

SEO is a strange game, almost as strange as global thermonuclear war. Here's how it goes:

  1. You run a business, and your business has a website.
  2. You want people to find your website when they search things on Google, so they buy your stuff.
  3. There's an almost infinite number of keywords people are searching…
  4. But only some of those keywords are reasonable in topic for your business.
  5. And for those valuable keywords, there's only a couple places worth anything in the SERP (Search Engine Results Page). Even if your site is result #12 out of “About 82,800,000 results” for a hot keyword, you probably get nothing for your effort.
  6. So you keep trying new content until you get something in the top few spots.

SEO is also a zero-sum game. The only way to get into those coveted top spots is to unseat an existing entry— someone just like you, with a business just like yours, who set off on the same mission as you, but simply didn't do a good enough job to keep your new incredible content from bumping them off. Their loss, your gain.

But there is one way to outplay this competition a little bit: find a new keyword that nobody else is ranking for, but that people are searching for. Of course, you have to strike fast, and after a bit of time this will also turn into the zero-sum leapfrogging of any popular keyword. But you have a tiny bit of green field.

A comprehensive ecosystem of open-source software for big data management

That's exactly what's happening here! There is a Cisco Introduction to IoT course that must have gotten pretty popular recently. One of the potential questions on a sample final exam is about Hadoop, which, as we all know, is a comprehensive ecosystem of open source software for big data management!

This means that thousands of students started searching for “a comprehensive ecosystem of open source software for big data management” every month as they studied for their final IoT exam. And the SEO analytics dashboards noticed.

The really interesting thing about this case, though, is that the original source content driving this search interest is not publicly available or indexed. This query is copied and pasted verbatim from an exam, and exams are famously not something you want to be found on Google.

Most of the time when something new emerges that's worth searching for, there's some party that's already front-run the SEO game. New Apple Vision Pro? Been talked about for months, plus Apple's got a page for it. OpenAI Code Interpreter? There's already 8 million results.

But here, there was a rare SEO vacuum: a very specific high volume query, with almost zero content to match. The SEO dashboard tools see that, think "JACKPOT!", and that's why we have a thousand low-quality posts desperately trying to rank for this strange and confusingly specific keyword. Mystery solved.

The problem with SEO

But there's a problem here that exposes the strange, mindless behavior of finding a keyword in an SEO dashboard and pouncing on it without thinking twice.

That 2-second Quizlet GIF above actually contains 100% of the information the user searching needs. They just want to know the answer to their Jeopardy-style question, and nothing more. One word, “Hadoop”, does the trick.

Google knows this, so it adds a “Question and Answers” block before the first SERP result, probably diverting a tremendous amount of traffic away from that page. Honors for that top page go to Ben Farrell at Data Driven Daily— Ben if you're reading I'd love to know the backstory of how you discovered this keyword and how much traffic it actually drives!

But since that article was written purely to dominate the SEO rankings and please The Algorithm, it's metastasized from the one-word answer “Hadoop” into 1,066 words of jargon. The sentence about Hadoop doesn't even appear until word 491, almost halfway through the article. It wasn't written to answer someone's question or satisfy their search intent. It was just written to rank.
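If you want to put a number on "how long until the article actually answers the question", here is a minimal sketch. The function name `first_mention_position` and the stand-in `article` text are my own inventions for illustration: the stand-in is just filler words built to match the counts quoted above (1,066 words, answer at word 491), not the real article.

```python
import re


def first_mention_position(text: str, answer: str):
    """Return (1-based index of the first word containing `answer`,
    total word count, index / total), or None if it never appears."""
    words = re.findall(r"\S+", text)  # split on whitespace
    target = answer.lower()
    for i, word in enumerate(words, start=1):
        if target in word.lower():
            return i, len(words), i / len(words)
    return None


# Hypothetical stand-in text: 490 filler words, the answer, 575 more filler words.
article = "jargon " * 490 + "Hadoop " + "jargon " * 575

position, total, fraction = first_mention_position(article, "Hadoop")
print(f"First mention at word {position} of {total} ({fraction:.0%} of the way in)")
```

With those counts the answer lands about 46% of the way through, which is where "almost halfway" comes from.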

And this is BY FAR the best article in the entire SERP about the subject. All the rest appear to be either copy/pastes of Ben's article, AI generated garbage, or just outright scam links. Of the three results below, the top one is a site dedicated to copying & pasting answers to Computer Based Tests (CBTs) in a freeform stream-of-consciousness blog that covers test subjects from apartheid history to printer network management. The other two escape me.

My favorite result is from “HighAdviser.com”, which has one of the most powerful homepages I have ever seen. My favorite little detail here is that the date is actually wrong, somehow? It's July 14th as I write this, not 10th.

Also, I mean, “Innovation Management Generate Fresh Ideas”? “Boys And A Dog Homemaking Homeschool Tips - For Busy Folks”? No notes. I desperately want to quit my job and write for High Adviser full time. I think I could really uplevel their innovation management, and generate fresh ideas - for Busy Folks!

There's even a 4chan /b/ post in the SERP, obviously with an NSFW image attached to it. The irony is that this is genuinely the most helpful item on the entire search page. This should be the #1 result!

So what?

I sort of doubt that this is driving meaningful traffic to these sites, but it's fascinating to me to watch an SEO Meme like this develop, and see how fast the programmatic SEO hounds hurl themselves at a fresh target.

This is only going to get weirder with LLMs, at least in the interim before everyone stops using Google. There's now a ton of tools that automatically find low competition keywords and generate hundreds of AI blog posts for you in just a few minutes. Here's a guy claiming to be generating 150,000 AI SEO blogs:

Some of these might be helpful! But lots of them won't, and there'll be thousands of "glitch" keywords just like this one that get caught in a circular AI arms race of unhelpful SEO content. The real problem is just the lack of alignment these articles have with search intent: if you want people to land on your site and remember you favorably, you should just answer their question. Everything else is extraneous. I don't have high hopes for AI accurately determining the intent of all the strange keyword combinations out there, and so I expect we'll see more and more of these glitches.

Perhaps only 4chan can save us, with its reputation for kindness and straight-to-the-point questions and answers...

Liked this post? Follow me on Twitter @isidoremiller.