# 9.5. Cache > A web server for a simplified search engine has 100 machines to respond to search queries, which may then call to processSearch(string query) to another cluster of machines. The machine responding to a query is chosen at random. The method processSearch is very expensive. Design a caching mechanism for the most recent queries. Explain how to update the cache when data changes I would implement a stack (chapter 3, stacks and queues) which uses a LIFO system. last in first out. We can keep the last 100 queries, as long as it's not empty we keep filling it, when it's full, a new query comes in, the oldest query is removed, and the new query is added to the stack. ## Solution ### Assumptions * Calling between machines is fast * We are caching a lot of queries * The most popular queries are very popular, so they are always in the cache ### System requirements * Efficient lookups given a key * Expiration of old data so it can be replaced with new data * Quick updating and handling of cache ### 1. Design a cache for a single system A linked list allows easy purging of old data, by moving fresh items to the front. A hash table allows efficient lookup f data, but doesn't ordinarily allow easy data purging. We have a linked list where a node is moved is moved to the front each time it's accessed, and the end of the linked list contains the stalest information. Also, we have a hash table mapping from a query to the corresponding node in the linked list. ### 2. Expand to many machines * Each machine has its own cache * it's quick, no machine calls, but not effective. many repeat queries are treated as fresh * Each machine has a copy of the cache * the entire data structure of hash table and linked list is duplicated. Updating the cache means sending data to N machines. because the cache would take up more space, we could store less data * Each machine stores a part of the cache * when machine i needs to look up the results for a query, it finds which machine has it, and then asks machine j for it. How to know this? The cache can be divided based on some formula hash(query) % N. Machine i finds out machine j has the data, asks machine j for it, and machine j either checks cache or calls the function. then updates its cache. ### 3. Updating results when content changes Some pages are so popular they are always cached, we need to refresh the cache either periodically or on demand, when content changes. * The content at a url changes, or the page is removed * The ordering of results change in response to the rank of a page changing * New pages appear related to a particular query See book page 384 for the solutions for this.