25 lines
2.2 KiB
Markdown
25 lines
2.2 KiB
Markdown
# 9.2. Social network
|
|
|
|
> Design the data structures for a very large social network. Design the algorithm to show the shortest path between two people (me -> Bob -> Susan -> Jason -> You)
|
|
|
|
This is a graph problem. There are many nodes and many edges. There is probably a python library to use. Or make a hash table, with each node as key and its connections as values. Then we can make a breadth first search to find the destination person. Once we reach it, we stop. Also, once a person is visited, a flag should be activated to discard recurrent loops. And those people should be discarded.
|
|
|
|
**Bidirectional breadth first search**: you can also start looking from the destination to the origin, but then my hash table data structure doesn't work. It needs another list in the values, a list of preceding nodes. Which nodes lead to this node. And then start search from both ends, whoever reaches the other end first, the route wins and we stop. Or when the two searches collide, we know we have found a path.
|
|
|
|
BBFS needs access to both origin and destionation points, which is not always the case. What about scalability?
|
|
|
|
If a connection between two friends is frequented a lot, it can be saved in a sort of cache if it is queried often.
|
|
|
|
With millions of users, the whole database cannot be kept in one machine. Instead of keeping people's friends as an object, we can keep the ID of where they are stored in the db, which machine and which section.
|
|
|
|
Jumping from machine to machine is expensive. We can batch this: if 5 of someone's friends are in a certain machine, we can visit this machine all at once. Also, instead of storing people randomly in different machines, we can store them based on country of residence.
|
|
|
|
Since in a scalable way many queries are being done in each case, we cannot mark nodes as visited. We could create different flags for each query. Or we can create an additional hash table showing whether the node has been visited in this particular query.
|
|
|
|
Possible problems:
|
|
|
|
* What when a server fails?
|
|
* How can you take advantage of caching?
|
|
* What if no friend is found, do you continue forever or when do you give up?
|
|
* Some people have more friends and therefore more chances of making a path to new people. How to use this info to choose where to start traversing?
|