Data StorageInverted Index

An inverted index is a data structure that stores a mapping from keys to values. The keys are typically words or phrases, and the values are the documents in which those keys appear. An inverted index can be thought of as a reverse dictionary where the keys are the words and the values are the definitions.

Inverted indexes are often used in search engines to improve the speed of searches. The best way to explain an inverted index is to show you an example.

How does an Inverted Index Work? 

To create an inverted index, we first need a collection of documents. For each document in the collection, we create a list of all the unique words that appear in that document. We then store each word along with a list of the documents in which it appears. 

For example, consider the following two documents: 

Document 1: "The quick brown fox jumps over the lazy dog."

Document 2: "The quick red fox jumps over the lazy cat."

We would create the following inverted index: 

  • quick -> [Document 1, Document 2]
  • brown -> [Document 1]
  • fox -> [Document 1, Document 2]
  • jumps -> [Document 1, Document 2]
  • over -> [Document 1, Document 2]
  • the -> [Document 1, Document 2]
  • lazy -> [Document 1, Document 2]
  • dog -> [Document 1]  
  • cat -> [Document 2]

Notice that we have omitted stop words such as "the," "a," and "to" from our index. This is a common practice when using an inverted index to support a search engine since they do not add any value for search purposes. We have also stemmed each word into their base words so that variations such as "jumping" and "jumped" are mapped to the same key ("jump"). 

Adding to our inverted index

Now let's say we have a new document: 

Document 3: "The quick brown rabbit jumps over the lazy dog."

 We can easily update our index by adding the new document to the list of values for each word that appears in it. For instance, since "rabbit" appears in Document 3, we would add Document 3 to the list of values for the key "rabbit". Our updated index would look like this: 

  • quick -> [Document 1, Document 2]
  • brown -> [Document 1, Document 3]
  • fox -> [Document 1, Document 2]
  • jumps -> [Document 1, Document 2]
  • over -> [Document 1, Document 2]
  • lazy -> [Document 1, Document 2]
  • cat ->[Document2]
  • dog->[Document1, Document3]
  • rabbit->[Document3]

 When someone performs a search using a keyword (e.g., "lazy"), we can quickly look up all documents containing that keyword by consulting our index. This is much faster than having to search through all of our documents one by one. 

Inverted indexes are powerful data structures that can be used for a variety of purposes including spell-checking applications, and recommendation systems. This data structure is common in system design interviews.

Related Problems