Facebook database design

Question

I have always wondered how Facebook designed the friend <-> user relation.

I figure the user table is something like this:

    user_email PK
    user_id PK
    password

I figure the table with the user's data (sex, age, etc.) is connected via user_email, I would assume.

How does it connect all the friends to this user? Something like this?

    user_id
    friend_id_1
    friend_id_2
    friend_id_3
    friend_id_N

Probably not, because the number of users is unknown and will expand.

User · Answer

You're looking for foreign keys. Basically, you can't have an array in a database unless it has its own table.

Example schema:

    Users Table
        userID PK
        other data
    Friends Table
        userID   -- FK to Users table, representing the user that has a friend
        friendID -- FK to Users table, representing the user id of the friend
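A minimal sketch of that two-table layout, using SQLite in memory so it runs anywhere. The `name` column is a hypothetical stand-in for "other data", and the sample users are made up. It shows that each friend is one row in Friends (so there is no fixed number of friend columns) and that the foreign key rejects a friendID that does not exist in Users.

```python
import sqlite3

# In-memory SQLite database; table/column names follow the answer's schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE Users (
    userID INTEGER PRIMARY KEY,
    name   TEXT NOT NULL                                 -- hypothetical "other data"
);
CREATE TABLE Friends (
    userID   INTEGER NOT NULL REFERENCES Users(userID),  -- user that has a friend
    friendID INTEGER NOT NULL REFERENCES Users(userID),  -- the friend
    PRIMARY KEY (userID, friendID)
);
""")

conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO Friends VALUES (?, ?)", [(1, 2), (1, 3)])

# One row per friend, so a user can have any number of friends.
alice_friends = [name for (name,) in conn.execute("""
    SELECT u.name
    FROM Friends AS f JOIN Users AS u ON u.userID = f.friendID
    WHERE f.userID = 1
    ORDER BY u.name
""")]
print(alice_friends)

# A friendID that is not in Users violates the foreign key and is rejected.
try:
    conn.execute("INSERT INTO Friends VALUES (1, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```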

User · Answer

It's not possible to retrieve user friends data from an RDBMS in constant time once the data exceeds half a billion users, so Facebook implemented this using a hash database (NoSQL), and they open-sourced the database, called Cassandra. So every user has its own key, and the friends' details are kept in a queue. To learn how Cassandra works, look at this: http://prasath.posterous.com/cassandra-55
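To make the "every user has its own key" idea concrete, here is a toy in-memory key-value model: the key is a user id and the value is that user's friend list. This is only an illustration of the data layout under that assumption; it is not Cassandra's API, and `friend_store`, `add_friend` and `friends_of` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical key-value layout: key = user id, value = that user's friend list.
# A toy in-memory model of the idea, not Cassandra's API.
friend_store: dict[str, list[str]] = defaultdict(list)

def add_friend(user: str, friend: str) -> None:
    """Record the friendship under both users' keys."""
    friend_store[user].append(friend)
    friend_store[friend].append(user)

def friends_of(user: str) -> list[str]:
    """One lookup by key returns the whole list; no join is needed."""
    return friend_store.get(user, [])  # .get avoids creating empty entries

add_friend("u1", "u2")
add_friend("u1", "u3")
print(friends_of("u1"))
print(friends_of("u2"))
```

With a hash-based store the average lookup cost does not depend on how many users exist in total, which is the property the answer is pointing at; the trade-off is that the friendship is written twice.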

User · Answer

This recent (June 2013) post goes into some detail explaining the transition from relational databases to objects with associations for some data types: https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920

There's a longer paper available at https://www.usenix.org/conference/atc13/tao-facebook’s-distributed-data-store-social-graph
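A rough sketch of the "objects with associations" idea: typed objects identified by an id, and typed, time-stamped associations from one object to another, kept newest first so a range query can page through them. It is loosely modeled on the object/association model the TAO paper describes; the class and method names here are hypothetical and this is not Facebook's code.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class Graph:
    """Toy in-memory objects-and-associations store (illustrative only)."""
    objects: dict[int, tuple[str, dict]] = field(default_factory=dict)
    # (id1, association type) -> list of (time, id2), kept newest first
    assocs: dict[tuple[int, str], list[tuple[int, int]]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def obj_add(self, otype: str, data: dict) -> int:
        oid = next(self._ids)
        self.objects[oid] = (otype, data)
        return oid

    def assoc_add(self, id1: int, atype: str, id2: int, time: int) -> None:
        edges = self.assocs.setdefault((id1, atype), [])
        edges.append((time, id2))
        edges.sort(reverse=True)  # newest first; fine for a toy, O(n log n) per add

    def assoc_range(self, id1: int, atype: str, pos: int, limit: int) -> list[int]:
        edges = self.assocs.get((id1, atype), [])
        return [id2 for _, id2 in edges[pos:pos + limit]]

    def assoc_count(self, id1: int, atype: str) -> int:
        return len(self.assocs.get((id1, atype), []))

g = Graph()
alice = g.obj_add("user", {"name": "Alice"})
bob = g.obj_add("user", {"name": "Bob"})
carol = g.obj_add("user", {"name": "Carol"})

# Friendship is symmetric here, so one association is stored in each direction.
for a, b, t in [(alice, bob, 100), (alice, carol, 200)]:
    g.assoc_add(a, "friend", b, t)
    g.assoc_add(b, "friend", a, t)

print(g.assoc_count(alice, "friend"))
print(g.assoc_range(alice, "friend", 0, 10))  # Carol (newer) before Bob
```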

User · Answer

Regarding the performance of a many-to-many table: if you have two 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300GB. Obviously, you would need some partitioning and indexing, and you're not going to keep that in memory for all users.
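The storage estimate works out as follows. It counts only the raw 8 bytes per row (two 32-bit IDs) and ignores indexes and per-row overhead. The "just under 300GB" figure corresponds to binary gigabytes (GiB); in decimal gigabytes the same number of bytes is 320 GB.

```python
users = 200_000_000
avg_friends = 200
bytes_per_row = 2 * 4  # two 32-bit (4-byte) user IDs, no row or index overhead

rows = users * avg_friends        # 40 billion (user, friend) rows
raw_bytes = rows * bytes_per_row  # 320 billion bytes

print(raw_bytes / 10**9)  # decimal gigabytes
print(raw_bytes / 2**30)  # binary gigabytes (GiB), roughly 298
```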

User · Answer

It's a type of graph database: http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html. It's not related to relational databases. Google for graph databases.
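The kind of query a graph database is built for is a traversal: start at one user and follow friendship edges outward a fixed number of hops. The sketch below does that with a breadth-first search over a plain Python adjacency map; it illustrates the access pattern only and does not use Neo4j's API. The sample graph and the `within_hops` name are hypothetical.

```python
from collections import deque

def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first traversal: map each user reachable within max_hops to its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Undirected friendship graph as an adjacency map (hypothetical sample data).
graph = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"ben", "eve"},
    "eve": {"dan"},
}
reach = within_hops(graph, "ann", 2)
friends_of_friends = sorted(u for u, d in reach.items() if d == 2)
print(friends_of_friends)
```

Each step only touches the neighbours of the current node, so the work depends on the size of the local neighbourhood rather than on the total number of users.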

User · Answer

TL;DR: They use a stack architecture with cached graphs for everything above the MySQL bottom of their stack.

Long Answer:

I did some research on this myself because I was curious how they handle their huge amount of data and search it quickly. I've seen people complaining about custom-made social network scripts becoming slow when the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections (not even trying to bother about group permissions, likes and wall posts), it quickly turned out that this approach is flawed. So I've spent some time searching the web on how to do it better and came across this official Facebook article:

    TAO: Facebook’s Distributed Data Store for the Social Graph
    TAO: The power of the graph

I really recommend you watch the presentation at the first link above before you continue reading. It's probably the best explanation of how FB works behind the scenes that you can find.

The video and article tell you a few things:

- They're using MySQL at the very bottom of their stack.
- Above the SQL DB there is the TAO layer, which contains at least two levels of caching and uses graphs to describe the connections.
- I could not find anything on what software / DB they actually use for their cached graphs.

Let's take a look at this (friend connections are top left):

[image not included]

Well, this is a graph. It doesn't tell you how to build it in SQL; there are several ways to do it, but this site has a good number of different approaches. Attention: consider that a relational DB is what it is. It's meant to store normalised data, not a graph structure, so it won't perform as well as a specialised graph database.

Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.

I can't tell you how to build it so that it will perform well, but it clearly requires some trial and error and benchmarking.

Here is my disappointing test for just finding friends of friends:

DB Schema:

    CREATE TABLE IF NOT EXISTS friends (
      id int(11) NOT NULL AUTO_INCREMENT,
      user_id int(11) NOT NULL,
      friend_id int(11) NOT NULL,
      PRIMARY KEY (id)
    ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

Friends of Friends Query:

    (
        select friend_id
        from friends
        where user_id = 1
    ) union (
        select distinct ff.friend_id
        from
            friends f
            join friends ff on ff.user_id = f.friend_id
        where f.user_id = 1
    )

I really recommend you create some sample data with at least 10k user records, each of them having at least 250 friend connections, and then run this query. On my machine (i7 4770k, SSD, 16GB RAM) the result was 0.18 seconds for that query. Maybe it can be optimized; I'm not a DB genius (suggestions are welcome). However, if this scales linearly, you're already at 1.8 seconds for just 100k users and 18 seconds for 1 million users.

This might still sound OK-ish for 100k users, but consider that you just fetched friends of friends and didn't do any more complex query like "display me only posts from friends of friends", plus "do the permission check if I'm allowed or NOT allowed to see some of them", plus "do a sub query to check if I liked any of them". You want to let the DB do the check on whether you liked a post already or not, or you'll have to do it in code. Also consider that this is not the only query you run and that you have more than one active user at the same time on a more or less popular site.

I think my answer answers the question of how Facebook designed their friends relationship very well, but I'm sorry that I can't tell you how to implement it in a way that will work fast. Implementing a social network is easy, but making sure it performs well is clearly not, IMHO.

I've started experimenting with OrientDB to do the graph queries and mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.
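If you want to experiment with the friends-of-friends query without a MySQL server, here is a scaled-down sketch in SQLite (not MySQL, so timings are not comparable to the numbers above). Two things are assumptions not present in the answer: the random sample data (2,000 users with 50 random connections each) and the composite index on (user_id, friend_id), added so both the outer lookup and the self-join can use an index instead of a full scan. The result is cross-checked against a plain-Python computation.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (
    id        INTEGER PRIMARY KEY,  -- auto-assigned by SQLite
    user_id   INTEGER NOT NULL,
    friend_id INTEGER NOT NULL
);
-- Assumption (not in the original schema): composite index for lookups and the join.
CREATE INDEX idx_friends_user_friend ON friends (user_id, friend_id);
""")

# Hypothetical random sample data, scaled down from 10k users x 250 friends.
random.seed(42)
N_USERS, N_FRIENDS = 2_000, 50
rows = [(u, random.randint(1, N_USERS))
        for u in range(1, N_USERS + 1) for _ in range(N_FRIENDS)]
conn.executemany("INSERT INTO friends (user_id, friend_id) VALUES (?, ?)", rows)

# Same shape as the query above; UNION already removes duplicates.
FOF_QUERY = """
SELECT friend_id FROM friends WHERE user_id = :uid
UNION
SELECT ff.friend_id
FROM friends AS f
JOIN friends AS ff ON ff.user_id = f.friend_id
WHERE f.user_id = :uid
"""

start = time.perf_counter()
result = {fid for (fid,) in conn.execute(FOF_QUERY, {"uid": 1})}
elapsed = time.perf_counter() - start
print(f"{len(result)} ids in {elapsed:.4f}s")  # timing varies by machine

# Cross-check the SQL result against a plain-Python adjacency map.
adj: dict[int, set[int]] = {}
for u, f in rows:
    adj.setdefault(u, set()).add(f)
expected = set(adj.get(1, set()))
for f in adj.get(1, set()):
    expected |= adj.get(f, set())
```

Prefixing the query with `EXPLAIN QUERY PLAN` shows whether SQLite actually uses the index; doing the equivalent with `EXPLAIN` in MySQL is a reasonable first step before deciding the relational approach is too slow.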

User · Answer

Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.

Somewhat useful example:

    Table Name: User
    Columns:
        UserID PK
        EmailAddress
        Password
        Gender
        DOB
        Location

    Table Name: Friends
    Columns:
        UserID PK FK
        FriendID PK FK
        (This table features a composite primary key made up of the two foreign
        keys, both pointing back to the user table. One ID will point to the
        logged-in user, the other ID will point to the individual friend
        of that user.)

Example Usage:

    Table User
    --------------
    UserID EmailAddress Password Gender DOB      Location
    ------------------------------------------------------
    1      bob@bob.com  bobbie   M      1/1/2009 New York City
    2      jon@jon.com  jonathan M      2/2/2008 Los Angeles
    3      joe@joe.com  joseph   M      1/2/2007 Pittsburgh

    Table Friends
    ---------------
    UserID FriendID
    ----------------
    1      2
    1      3
    2      3

This will show that Bob is friends with both Jon and Joe and that Jon is also friends with Joe. In this example we will assume that friendship is always two-way, so you would not need a row in the table such as (2,1) or (3,2), because they are already represented in the other direction. For examples where friendship or other relations aren't explicitly two-way, you would need to also have those rows to indicate the two-way relationship.
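Because each two-way friendship is stored only once in this design, listing a user's friends has to look in both columns. A minimal SQLite sketch using the Bob/Jon/Joe data from the example (only UserID and EmailAddress are kept for brevity). The `CHECK (UserID < FriendID)` constraint is an added assumption, one common way to guarantee a pair is never stored in both orders; it is not part of the answer's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    UserID       INTEGER PRIMARY KEY,
    EmailAddress TEXT NOT NULL
);
CREATE TABLE Friends (
    UserID   INTEGER NOT NULL REFERENCES User(UserID),
    FriendID INTEGER NOT NULL REFERENCES User(UserID),
    PRIMARY KEY (UserID, FriendID),
    CHECK (UserID < FriendID)  -- assumption: store each pair once, lower ID first
);
INSERT INTO User VALUES (1, 'bob@bob.com'), (2, 'jon@jon.com'), (3, 'joe@joe.com');
INSERT INTO Friends VALUES (1, 2), (1, 3), (2, 3);
""")

# Each friendship appears once, so check both columns.
FRIENDS_OF = """
SELECT FriendID FROM Friends WHERE UserID = :uid
UNION
SELECT UserID FROM Friends WHERE FriendID = :uid
"""

def friends_of(uid: int) -> list[int]:
    return sorted(fid for (fid,) in conn.execute(FRIENDS_OF, {"uid": uid}))

print(friends_of(2))  # Jon's friends: Bob and Joe

# The reverse copy of an existing pair is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO Friends VALUES (2, 1)")
    reverse_row_rejected = False
except sqlite3.IntegrityError:
    reverse_row_rejected = True
```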

User · Answer

Probably there is a table which stores the friend <-> user relation, say "frnd_list", having fields "user_id" and "frnd_id".

Whenever a user adds another user as a friend, two new rows are created.

For instance, suppose my id is "deep9c" and I add a user having id "akash3b" as my friend; then two new rows are created in table "frnd_list" with values ("deep9c", "akash3b") and ("akash3b", "deep9c").

Now when showing the friends list to a particular user, a simple SQL query would do that: "select frnd_id from frnd_list where user_id = ?", where the parameter is the id of the logged-in user (stored as a session attribute).
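A small SQLite sketch of this two-rows-per-friendship approach, using the answer's table and column names. Wrapping both INSERTs in one transaction (an addition, not something the answer specifies) keeps the table from ever holding only one direction of a friendship. The primary key on (user_id, frnd_id) is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE frnd_list (
    user_id TEXT NOT NULL,
    frnd_id TEXT NOT NULL,
    PRIMARY KEY (user_id, frnd_id)  -- assumption: no duplicate friendships
)""")

def add_friend(a: str, b: str) -> None:
    """Insert both directions atomically: commit if both succeed, else roll back."""
    with conn:
        conn.execute("INSERT INTO frnd_list VALUES (?, ?)", (a, b))
        conn.execute("INSERT INTO frnd_list VALUES (?, ?)", (b, a))

def friend_list(user_id: str) -> list[str]:
    # Parameterized form of: select frnd_id from frnd_list where user_id = ?
    rows = conn.execute("SELECT frnd_id FROM frnd_list WHERE user_id = ?", (user_id,))
    return [frnd_id for (frnd_id,) in rows]

add_friend("deep9c", "akash3b")
print(friend_list("deep9c"))
print(friend_list("akash3b"))
```

Storing both directions doubles the row count but keeps the read a single indexed lookup on user_id, with no UNION over two columns.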

User · Answer

Have a look at the following database schema, reverse engineered by Anatoly Lubarsky:

User · Answer

My best bet is that they created a graph structure. The nodes are users and "friendships" are edges. Keep one table of users and another table of edges. Then you can keep data about the edges, like "day they became friends", "approved status", etc.
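A sketch of the users-table-plus-edges-table idea in SQLite, where the edge rows carry their own data. The column names `created_on` and `approved`, the 0/1 status encoding and the sample dates are hypothetical choices for the "day they became friends" and "approved status" attributes mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE edges (
    from_user  INTEGER NOT NULL REFERENCES users(id),
    to_user    INTEGER NOT NULL REFERENCES users(id),
    created_on TEXT    NOT NULL,            -- "day they became friends", ISO-8601 date
    approved   INTEGER NOT NULL DEFAULT 0,  -- "approved status": 0 pending, 1 approved
    PRIMARY KEY (from_user, to_user)
);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
INSERT INTO edges VALUES (1, 2, '2020-01-15', 1), (1, 3, '2020-03-02', 0);
""")

APPROVED_FRIENDS = """
SELECT u.name, e.created_on
FROM edges AS e JOIN users AS u ON u.id = e.to_user
WHERE e.from_user = ? AND e.approved = 1
ORDER BY e.created_on
"""
before = conn.execute(APPROVED_FRIENDS, (1,)).fetchall()

# Approving a pending request updates the edge, not either user row.
with conn:
    conn.execute("UPDATE edges SET approved = 1 WHERE from_user = 1 AND to_user = 3")
after = conn.execute(APPROVED_FRIENDS, (1,)).fetchall()
print(before)
print(after)
```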

User · Answer

Take a look at these articles describing how LinkedIn and Digg are built:

    http://hurvitz.org/blog/2008/06/linkedin-architecture
    http://highscalability.com/scaling-digg-and-other-web-applications

There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:

    http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html

Also, there's this article that talks about non-relational databases and how they're used by some companies:

    http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher-level concepts that most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

There are a lot of links in the first two articles that should give you some more insight.

UPDATE 10/20/2014

Murat Demirbas wrote a summary on:

    TAO: Facebook's distributed data store for the social graph (ATC'13)
    F4: Facebook's warm BLOB storage system (OSDI'14)

    http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html

HTH

User · Answer

It's most likely a many-to-many relationship:

FriendList (table)

    user_id -> users.user_id
    friend_id -> users.user_id
    friendVisibilityLevel

EDIT

The user table probably doesn't have user_email as a PK, though possibly as a unique key.

users (table)

    user_id PK
    user_email
    password
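A SQLite sketch of these two tables. The idea is that user_email is UNIQUE while user_id is the primary key: because FriendList references user_id, a user can change their email address without touching any friendship rows. The integer type of friendVisibilityLevel, its default, and the sample emails are assumptions; the password column should hold a hash, never plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,   -- stable surrogate key
    user_email TEXT NOT NULL UNIQUE,  -- unique, but allowed to change
    password   TEXT NOT NULL          -- store a password hash, never plain text
);
CREATE TABLE FriendList (
    user_id               INTEGER NOT NULL REFERENCES users(user_id),
    friend_id             INTEGER NOT NULL REFERENCES users(user_id),
    friendVisibilityLevel INTEGER NOT NULL DEFAULT 0,  -- assumed encoding
    PRIMARY KEY (user_id, friend_id)
);
""")

conn.executemany("INSERT INTO users (user_email, password) VALUES (?, ?)",
                 [("a@example.com", "hash-a"), ("b@example.com", "hash-b")])
conn.execute("INSERT INTO FriendList (user_id, friend_id) VALUES (1, 2)")

# The UNIQUE constraint still prevents two accounts with the same email.
try:
    conn.execute("INSERT INTO users (user_email, password) VALUES ('a@example.com', 'x')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

# Changing an email updates one row; FriendList refers to user_id and is unaffected.
conn.execute("UPDATE users SET user_email = 'a2@example.com' WHERE user_id = 1")
friend_emails = [e for (e,) in conn.execute("""
    SELECT u.user_email
    FROM FriendList AS f JOIN users AS u ON u.user_id = f.user_id
    WHERE f.friend_id = 2
""")]
print(duplicate_allowed, friend_emails)
```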

User · Answer

Keep in mind that database tables are designed to grow vertically (more rows), not horizontally (more columns).
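To see the difference, take the "one column per friend" layout from the question and turn it into one row per friend. The wide form needs a schema change whenever someone exceeds the available friend_id_N columns and wastes empty slots otherwise; the tall form just gains rows. The sample row and the `to_rows` helper are hypothetical.

```python
# Hypothetical "wide" row shaped like the question's idea: one column per friend slot.
wide_row = {"user_id": 1, "friend_id_1": 7, "friend_id_2": 9, "friend_id_3": None}

def to_rows(row: dict) -> list[tuple[int, int]]:
    """Unpivot friend_id_* columns into (user_id, friend_id) rows, skipping empty slots."""
    return [(row["user_id"], v) for k, v in row.items()
            if k.startswith("friend_id_") and v is not None]

print(to_rows(wide_row))
```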
