By Mike Grehan
Fifteen years ago, I wrote a book about search engine optimization that was largely regarded as the most comprehensive book of its kind. I had written an earlier edition in 1999, but between that edition and the updated version I had become absorbed in the science of information retrieval and was able to add so much more detail about the type of science and technology on which Google based its search results. And I still continue to study and observe advances in the field. That in itself has frequently helped me to stay ahead of the game when it comes to what we know as search engine optimization (SEO).
Back in the Nineties, when I first started manipulating search engine results (yes, that’s exactly what I was doing) it wasn’t called SEO. In fact, there wasn’t really a term for it. There weren’t even that many people doing it. A kind of small cottage industry was beginning to emerge. We stuffed pages with keywords and did some hypertext-markup-tinkering to create what we called “doorway pages” and the end game was simple: Get indexed and rank somewhere in the first two pages of search engine results.
In the past, text was the strongest signal for ranking purposes. If the text in a query matched the text on a page then it was a candidate for ranking. However, the “vocabulary mismatch” is a common phenomenon in the usage of natural languages. It occurs when different people name the same thing or concept differently. Early research work has shown that, on average, 80% of the time different people (experts in the same field) will name the same thing differently. There are usually many possible names that can be attributed to the same thing. This research motivated the work on latent semantic indexing.
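To make the mismatch concrete, here is a minimal sketch in Python. It assumes a two-document toy corpus and a hand-written synonym table, both invented purely for illustration. It shows how literal term matching misses a relevant page, and how one common remedy, expanding the query with related terms, recovers it. This is not how any real search engine is implemented.

```python
# Toy corpus: both pages describe the same thing with different words.
DOCS = {
    "doc_a": "cheap sofa for small living rooms",
    "doc_b": "affordable couch that fits a compact lounge",
}

# Hypothetical, hand-written synonym table (an assumption, not real data).
SYNONYMS = {
    "sofa": {"couch", "settee"},
    "cheap": {"affordable", "inexpensive"},
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def match(query_terms, docs):
    """Return the ids of documents that contain at least one query term."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if query_terms & tokenize(text)
    )


def expand(query_terms, synonyms):
    """Add every known synonym of each query term to the query."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= synonyms.get(term, set())
    return expanded


query = tokenize("cheap sofa")
exact_hits = match(query, DOCS)                        # literal matching only
expanded_hits = match(expand(query, SYNONYMS), DOCS)   # with synonyms added

print(exact_hits)     # ['doc_a']
print(expanded_hits)  # ['doc_a', 'doc_b']
```

The second page is every bit as relevant as the first, yet literal matching never sees it because its author said "couch" where the searcher said "sofa".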
No, I won’t explain latent semantic indexing here. But to put it in context, a 2012 quantitative study of the vocabulary mismatch problem in an information retrieval setting determined that an average query term fails to appear in 30-40% of the documents that are relevant to the user query. Yes, a typical query term is missing from up to 40% of the documents that are relevant to that query. This is why Google has become so adept at using “query expansion” techniques.
Looking back, even though a search engine crawler seemed like such an advanced piece of technology, it was actually almost primitive at the time. And crawlers were so easily foiled by just about any change in web development technology. That’s what really spawned the industry. An army of what effectively could have been seen as hypertext remodeling workers. Masters of the angle-bracket, tag-technicians, keyword-explorers and tweakers-of-text. And that was before Google arrived with its all-new, fancy, hyperlink-induced algorithms. When that happened, the emerging industry, which had conquered the “on page” optimization process, now focused squarely on links, links and more links.
I still get asked about links so often. People still wonder why one web page linking to another is so hugely important. In the simplest of terms, Google based its algorithm fundamentally on citation analysis. In the academic world, if a number of recognized experts in a given field all cite your paper, then basically you’re recognized as an authority on that particular subject. That’s where the term “authority site” came from. So when one web page links to another, two basic assumptions can be made. First, one page is giving a vote, as such, to the other. And second, they’re perhaps both focused on the same subject. You can throw a little network theory in here too with the observation all those years ago of “cyber communities” forming (or birds of a feather and all that).
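The "links as votes" idea can be sketched in a few lines of Python. What follows is a simplified, textbook-style link-scoring loop in the spirit of citation analysis. It is not Google's actual implementation, and the four-page link graph, the function name `link_scores` and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by repeatedly passing each page's score along its links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}  # start everyone equal

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical link graph: pages a, b and d all "cite" page c; c links back to a.
LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

scores = link_scores(LINKS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:2])  # ['c', 'a']
```

Page c wins because three pages vote for it. Page a comes second even though it has only one inbound link, because that link comes from the most authoritative page in the graph. A vote from an authority counts for more, just as a citation from a recognized expert does.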
So, the art of SEO as it had become was developing, and it was a mixture of keyword analysis, tinkering with web pages, keeping your web server in check and, for ranking purposes, links and rich link anchor text. However, although ranking was supposed to be mainly influenced by the “democratic nature of the web” as Google put it, meaning she with the best-quality links wins, there was actually something totally “undemocratic” going on. Where did the millions and millions of end users who had no web site, and therefore no web pages to link from or to, fit in all of this?
You know, I always thought that was a little like saying that the people who make televisions get to decide what you should see on them. Quality is a subjective thing. But who’s better to decide on that? The web page authors creating content and linking to and from it, or the end users consuming it?
It was inevitable that end-user data had to be folded into the mix at some point. In exactly the same way that TV networks look at audience data to judge and rank the popularity of specific programming. For Google, “relevance feedback” has always been a signal to determine what content satisfied the information need of the end user. This is implicit data gathered on a massive scale.
Over the years, many strategic approaches to information retrieval have been developed within the science. Language models are used, as well as probabilistic retrieval, Boolean indexing, latent semantic indexing, inference networks, neural networks, fuzzy set retrieval and genetic algorithms. These approaches are based on a multitude of different mathematical constructs. For a human to develop the perfect blend of all of these things to provide the most relevant results would be a huge task. But perhaps not so in the realm of what’s known as machine learning.
Google took the classic information retrieval models and, under the guidance of search-master Amit Singhal, scaled these approaches to match the modern web era. However, underlying this is Google’s major investment in machine learning and steps towards artificial intelligence. A cultural split has occurred at Google between the “retrievers” (those with an information retrieval background) and the “learners” (those with a machine learning background). The “retrievers” hard-coded the search ranking technology (based on hundreds of signals) as far as it could go. But in 2014, the “learners” moved into the ranking team. And the first thing they did was focus on end-user behavior, using an artificial neural network to create a new ranking score. In April 2015, a whole new machine learning component called “RankBrain” was added to the ranking mechanism.
It’s hard to discuss the future of search without understanding how we live in a world of algorithms now. Algorithms run your cell phone, they’re in your computer, in your house, in your appliances, in your car, and your banking data and medical records are a huge tangle of algorithms.
I say this because it’s hard to be in the industry we’re in if you don’t fully understand the power of the algorithm. No, you don’t need to be a scientist or a programmer. But rather like driving a car, it’s kind of useful to know a little bit about how the engine works, not just how to drive it.
Over time, computer scientists build on each other’s work. And this has certainly happened at Google. Algorithms combine with other algorithms to use the results of other algorithms, which in turn produce results for more algorithms. Each algorithm has an input and an output. You put something in, the computer does what it does and out comes a result. But with machine learning, something entirely different happens.
With machine learning you enter the data and the desired result. And then out comes an algorithm that turns one into the other. These are learning algorithms – or learners for short – and they’re algorithms that make other algorithms. With machine learning, computers write algorithms so humans don’t have to.
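Here is about the smallest possible illustration of that idea, sketched in Python. The function `learn_linear_rule` is a hypothetical name, and it can only learn straight-line relationships (ordinary least squares), which is an assumption made to keep the example short. Real learners are vastly more capable. Still, the shape is the same: data and desired results go in, and a new function comes out.

```python
def learn_linear_rule(inputs, desired_outputs):
    """Take example inputs and the desired outputs; return a function
    (the learned 'algorithm') that turns one into the other."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(desired_outputs) / n

    # Ordinary least squares for a straight line: y = slope * x + intercept.
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(inputs, desired_outputs)
    )
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x

    def rule(x):
        # Nobody typed this conversion in by hand; it was fitted from the data.
        return slope * x + intercept

    return rule


# Example data: temperatures in Celsius and the Fahrenheit readings we want.
celsius = [0.0, 10.0, 20.0, 30.0]
fahrenheit = [32.0, 50.0, 68.0, 86.0]

to_fahrenheit = learn_linear_rule(celsius, fahrenheit)
print(round(to_fahrenheit(100.0), 1))  # 212.0
```

Nobody told the program the conversion formula. It was given four examples and worked out the rule for itself, then applied it to a value it had never seen.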
Ranking signals do not remain static. They’re fluid and change with time and context and geography and behavioral feedback. And as the learner builds around all of this, it begins to look more actively beyond basic relevance to the query, maximizing the usefulness of the search results for the user who input the query.
Not all queries are intended to end in a transactional result in the sense of a financial transaction. As digital marketers, we occupy our minds way too much with this and focus way too much on trying solely to connect with an end user at the checkout.
And yet, what if we simply want to stimulate ourselves or change our mood? Maybe we want to find a funny video clip to laugh at, maybe we simply want to look at some nice pictures or, who knows, maybe we want a tutorial on how to build a house. In order for Google to help us in so many ways, understanding intent is the most important factor.
We really have entered a brand new era of search. And that means a new era of SEO. If, in fact, that’s what it should still be called. The job has changed so much from those early web-page-tinkering days. If it’s time for Google to move forward with new, faster learning technology, identifying so much more about the end user’s information need than simply words on a page and how many links that page has pointing to it, then SEO must change with it.
Each year Google publishes the Founders’ Letter. For 2016 they gave the task to new Google CEO Sundar Pichai. In it, he said, “When Larry and Sergey founded Google in 1998, there were about 300 million people online. By and large, they were sitting in a chair, logging on to a desktop machine, typing searches on a big keyboard connected to a big, bulky monitor. Today, that number is around 3 billion people, many of them searching for information on tiny devices they carry with them wherever they go.”
It’s all about context and content. Who you are, where you are, what time of day it is and, of course, previous search behavior. Our job is less about helping Google index the web 1999 style. It’s less about worrying about the penalty of buying links to beat the competition in the SERPs. It’s so much more about creating useful content experiences on the user journey. It’s about being there in the moment.
Maybe we should be thinking about ourselves as content experience analysts (CEA) concerned more with human interaction and engagement. Perhaps now really is the time to focus on optimizing for humans and not for machines.
—
Mike Grehan is CMO and Managing Director of Acronym and Chairman of SEMPO, the worldwide search marketing organization.