2. Search Engine

Medium

Given a search query, design a search engine that returns 10 URLs that contain the keyword.

Functional Requirements
  • 10 relevant results should be returned for each query if they exist
  • When a user clicks on a result they should be sent to the page
  • A "good match" is when a URL contains a specific keyword
  • Searches can only consist of a single keyword
  • All grammatical tenses should map back to the same base word
  • All "stop words" (a, an, you, or, etc.) should not be included in the index
  • URLs should be checked daily for updates
  • URLs do not need to be ranked by relevance
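
One way to read the indexing requirements above is as an inverted index that maps each stemmed keyword to the URLs whose text contains it. The following is a minimal sketch, not a reference solution. The stop-word list, the `stem` helper (a naive suffix stripper standing in for a real stemmer), and the function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list; a real one would be much larger.
STOP_WORDS = {"a", "an", "you", "or", "the", "and"}


def stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, drop stop words, and reduce each word to its base form."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}


def build_index(pages):
    """Build a mapping of stemmed keyword -> list of URLs from {url: page_text}."""
    index = defaultdict(list)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].append(url)
    return dict(index)


def search(index, query, limit=10):
    """Return up to `limit` URLs for a single-keyword query, in no particular order."""
    term = query.strip().lower()
    if term in STOP_WORDS:
        return []
    return index.get(stem(term), [])[:limit]
```

For example, with pages `{"https://example.com/a": "You searched the indexes"}`, both `search(index, "searching")` and `search(index, "searches")` look up the same base term. Because no ranking is required, the sketch simply returns the first matches it stored.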
Nonfunctional Requirements
  • 99.99% Availability
  • Search latency less than 200ms
Assumptions
  • Pagination of results isn't required
  • You are given a dataset containing all of the web pages that could be considered in the search
  • URLs take up 20 bytes on average
  • Each webpage contains 20 megabytes of text content
  • Searches are uniformly distributed across all available search terms
  • Each letter of the alphabet has an equal number of keywords associated with it
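
The last two assumptions suggest that partitioning the index by a keyword's first letter would spread both data and query load evenly. A minimal sketch of that idea follows. The `shard_for` function and the shard counts are hypothetical, and other partitioning schemes (for example, hashing the keyword) would work as well.

```python
import string


def shard_for(term, num_shards):
    """Map a keyword to a shard number based on its first letter (a-z)."""
    first = term[:1].lower()
    if not first or first not in string.ascii_lowercase:
        raise ValueError("keyword must start with a letter a-z")
    # Split the 26 letters into num_shards contiguous, near-equal ranges.
    return (ord(first) - ord("a")) * num_shards // 26
```

With 26 shards each letter gets its own shard; with 13 shards, letters are paired ("a" and "b" on shard 0, "c" and "d" on shard 1, and so on).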
Estimated Usage
  • 500,000 queries per day
  • The list of websites to store is 0.5 megabytes in size
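
The figures above lend themselves to a quick back-of-envelope calculation. The sketch below assumes that the last estimate describes a 0.5 MB list of URLs, that 1 MB is 10^6 bytes, and that queries arrive evenly over the day. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60

queries_per_day = 500_000
url_list_bytes = 500_000        # 0.5 MB, assuming 1 MB = 10**6 bytes
avg_url_bytes = 20
page_text_bytes = 20 * 10**6    # 20 MB of text per page

# Average query rate, assuming traffic is spread evenly over the day.
avg_qps = queries_per_day / SECONDS_PER_DAY

# Number of URLs implied by the list size and average URL length.
num_urls = url_list_bytes // avg_url_bytes

# Raw page text that has to be read to build the index.
raw_text_bytes = num_urls * page_text_bytes

# Downtime allowed per year at 99.99% availability.
downtime_minutes_per_year = (1 - 0.9999) * MINUTES_PER_YEAR

print(f"average QPS: {avg_qps:.1f}")
print(f"URLs to index: {num_urls:,}")
print(f"raw text: {raw_text_bytes / 10**9:.0f} GB")
print(f"allowed downtime: {downtime_minutes_per_year:.1f} min/year")
```

Under these assumptions the load is roughly 6 queries per second over about 25,000 pages and 500 GB of raw text, with a yearly downtime budget of about 53 minutes.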
