A guide to entity recognition.



Ben Wheatley, Business Director

Read Time: 4 Minutes

28 Aug 2020

Last Updated: 10 May 2024

You know those sites that seem to just do well in search even though they’ve missed the fundamentals? The fact that Google knows that when you search ‘2008 Britney’ you want to see her smashing car windows, not the other 364 days of the year? The way that if you’re searching for information about an event and then look for a hotel, the two seem to be right next to each other?

How does any of that actually work?

It could be that Google has really honed its mind-reading skills, or there’s something bigger going on. Well, Derren Brown can rest easy, as Google is not after his show. In fact, all of these insights and connections that Google can make in milliseconds are driven by its understanding of entities.

Before we dive into this and explore how Google’s library of entities and their connections fuels so many of the rich answers we see in search, there are three things we need to define:

  1. What is an “entity”?
  2. What is entity recognition?
  3. What is entity salience?

What is an entity?

According to Google’s patent (Question answering using entity references in unstructured data), an entity is a thing or concept that is singular, unique, well-defined and distinguishable.

That “singular” thing or concept could be a person, a place or an item, among other things, and can be identified in relation to wider “categories” like locations, movies or animals. What separates an entity from a mere ‘thing’ is that it is universally understood and accepted as being synonymous with its sub-entities.

Confusing, right?

Let’s talk through an example. To circle back to one of my initial examples, we’re going to spend a minute talking about Britney.

Notice that you instantly knew I was referring to Britney Spears here. That shows it is widely accepted that the brand ‘Britney’ is synonymous with Britney Spears, as are her biggest hits, her own website, the show in Vegas, her social media and even her Wikipedia page.

All of these smaller elements that make up the Britney brand are the sub-entities that contribute towards creating this much larger, recognised entity.


What is entity recognition?

Entity recognition is how Google determines what an entity truly is and accepts it into its list of ‘approved entities’. Each time Google undertakes its information retrieval process, imagine it cross-referencing that list against the entities which appear on a site. This helps Google to understand two things:

  1. Is the content talking about relevant, reliable sources (entities) in relation to the topic?
  2. Is an entity being spoken about frequently in a context which is unexpected or not relevant to its current definition?

Through Natural Language Processing (NLP), search engines are able to determine the core concepts and related themes within an article, match them to entities and therefore determine how relevant and useful the content is to the user. So if you’re thinking about growing your Expertise, Authoritativeness and Trustworthiness (E-A-T), or even just looking to improve your overall organic positioning, then entity recognition is a good place to start.
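
To make that a little more concrete, one publicly available example of this kind of analysis is the entity analysis method of Google’s Cloud Natural Language API. It returns a list of entities found in a piece of text, each with a type and a salience score between 0 and 1 indicating how central that entity is to the text. The Python sketch below builds the request body for that method and ranks the entities in a response by salience. The function names are our own, and the sample response is hand-written with invented scores purely to show the shape of the data, so treat it as an illustration rather than real output or a picture of how Search itself works.

```python
import json

# REST endpoint of the Cloud Natural Language API's entity analysis method (v1).
ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_request_body(text):
    """Build the JSON body for an entity analysis call on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def rank_entities(response, min_salience=0.0):
    """Return (name, type, salience) tuples, most salient first."""
    rows = [
        (e["name"], e.get("type", "UNKNOWN"), e.get("salience", 0.0))
        for e in response.get("entities", [])
        if e.get("salience", 0.0) >= min_salience
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hand-written sample in the shape of a response.
# The scores are invented for illustration and are NOT real API output.
SAMPLE_RESPONSE = {
    "entities": [
        {"name": "Las Vegas", "type": "LOCATION", "salience": 0.12},
        {"name": "Britney Spears", "type": "PERSON", "salience": 0.71},
        {"name": "show", "type": "EVENT", "salience": 0.17},
    ],
    "language": "en",
}

if __name__ == "__main__":
    body = build_request_body("Britney Spears announced a show in Las Vegas.")
    print(json.dumps(body, indent=2))
    for name, entity_type, salience in rank_entities(SAMPLE_RESPONSE):
        print(f"{salience:.2f}  {entity_type:<10} {name}")
```

Sending the request body to the endpoint requires a Google Cloud project and credentials, which are left out here. The ranking step is the useful part for benchmarking: if the entity you want a page to be about is not near the top of the list, the content may not be signalling its main topic clearly.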

Why do entities matter?

As Google’s algorithms have developed to focus on the needs of the user, and continue to do so, understanding the relationship between query and document has grown in importance. Google not only needs to know what to serve in answer to a single question, but also what other information could be needed at the same time to give the user the most complete answer to their query. Therefore, if Google doesn’t understand the relationship between entities, and the subsequent relationship between those entities and your website, the result isn’t going to be a very good experience for the user.

How do I improve my entity recognition?

To improve the chances that Google correctly understands the content on your website, and its relevance to related queries, there are a few things you can do:

  1. Check out Google’s NLP API (the Cloud Natural Language API) – this is a great place to start to get a benchmark of how Google is interpreting your content and how it is finding, grouping, and scoring different entities within it
  2. Create relationships – between your business and your people, your people and your content, your content and your purpose, your purpose and your service, your service and your locations. Really work to connect all the dots
  3. Use structured data – entity recognition begins with understanding unstructured data, so if you can give pointers as to what your content is about through structured data, you can push more of that understanding through to Google
  4. Structure your content properly – make sure it is ordered in a way that flows, makes sense, and always ties back to the main point
  5. Look for clues – People Also Ask, Related Questions and Top Stories are all features which can give you insight into the information Google understands to be related to a query, and thus the areas you might want to explore to form deeper connections
  6. Make content to match the intent – if your content doesn’t match the intent of the query, Google has no reason to show it. If you know what content is coming up for your focus areas, drill into what is being served and why
  7. Constantly revisit – Google’s understanding is growing every day, so what seems relevant today might not be in 3, 6 or 12 months’ time. Review it, benchmark it, change it, measure it, review it, repeat.
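
As a rough illustration of points 2 and 3, here is a minimal Python sketch that generates schema.org structured data (JSON-LD) linking an organisation, one of its people and an article they wrote. The property names (`worksFor`, `author`, `publisher`, `about`, `sameAs`) are standard schema.org vocabulary, but every name and URL in the example is a placeholder, and the helper functions are hypothetical rather than part of any library.

```python
import json

# All names and URLs below are placeholders, not real data.
ORG_ID = "https://www.example.com/#organization"
AUTHOR_ID = "https://www.example.com/team/jane-doe#person"

organization = {
    "@type": "Organization",
    "@id": ORG_ID,
    "name": "Example Agency",
    "url": "https://www.example.com/",
    "sameAs": ["https://www.linkedin.com/company/example-agency"],
}

# The person points back at the organisation by its @id.
author = {
    "@type": "Person",
    "@id": AUTHOR_ID,
    "name": "Jane Doe",
    "jobTitle": "Business Director",
    "worksFor": {"@id": ORG_ID},
}

# The article points at both the person and the organisation,
# and states what it is about.
article = {
    "@type": "Article",
    "headline": "A guide to entity recognition",
    "author": {"@id": AUTHOR_ID},
    "publisher": {"@id": ORG_ID},
    "about": {"@type": "Thing", "name": "Entity recognition"},
}


def build_json_ld(*nodes):
    """Wrap the given nodes in a single schema.org @graph document."""
    return json.dumps({"@context": "https://schema.org", "@graph": list(nodes)}, indent=2)


def as_script_tag(json_ld):
    """Wrap a JSON-LD string in the script tag used to embed it in a page."""
    return f'<script type="application/ld+json">\n{json_ld}\n</script>'


if __name__ == "__main__":
    print(as_script_tag(build_json_ld(organization, author, article)))
```

Reusing the same `@id` values across nodes is what connects the dots: the article, its author and the business all reference one another explicitly instead of leaving those relationships to be inferred from the page copy.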

We’ve followed this practice for a number of our clients, so we know how to make the most of the available tools, both for understanding users in general and for boosting your SEO performance.

Want some advice on how you could make this work for you? Get in touch.