[database] What database does Google use?

Is it Oracle or MySQL or something they have built themselves?

The answer is


And it's maybe also handy to know that BigTable is not a relational database (like MySQL) but a huge (distributed) hash table which has very different characteristics. You can play around with (a limited version) of BigTable yourself on the Google AppEngine platform.

Next to Hadoop mentioned above there are many other implementations that try to solve the same problems as BigTable (scalability, availability). I saw a nice blog post yesterday listing most of them here.


Spanner is Google's globally distributed relational database management system (RDBMS), the successor to BigTable. Google claims it is not a pure relational system because each table must have a primary key.

Here is the link of the paper.

Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

Another database invented by Google is Megastore. Here is the abstract:

Megastore is a storage system developed to meet the requirements of today's interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore's semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.


It's something they've built themselves - it's called Bigtable.

http://en.wikipedia.org/wiki/BigTable

There is a paper by Google on the database:

http://research.google.com/archive/bigtable.html


As others have mentioned, Google uses a homegrown solution called BigTable and they've released a few papers describing it out into the real world.

The Apache folks have an implementation of the ideas presented in these papers called HBase. HBase is part of the larger Hadoop project which according to their site "is a software platform that lets one easily write and run applications that process vast amounts of data." Some of the benchmarks are quite impressive. Their site is at http://hadoop.apache.org.


It's something they've built themselves - it's called Bigtable.

http://en.wikipedia.org/wiki/BigTable

There is a paper by Google on the database:

http://research.google.com/archive/bigtable.html


And it's maybe also handy to know that BigTable is not a relational database (like MySQL) but a huge (distributed) hash table which has very different characteristics. You can play around with (a limited version) of BigTable yourself on the Google AppEngine platform.

Next to Hadoop mentioned above there are many other implementations that try to solve the same problems as BigTable (scalability, availability). I saw a nice blog post yesterday listing most of them here.


Although Google uses BigTable for all their main applications, they also use MySQL for other (perhaps minor) apps.


Spanner is Google's globally distributed relational database management system (RDBMS), the successor to BigTable. Google claims it is not a pure relational system because each table must have a primary key.

Here is the link of the paper.

Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

Another database invented by Google is Megastore. Here is the abstract:

Megastore is a storage system developed to meet the requirements of today's interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore's semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.


Google services have a polyglot persistence architecture. BigTable is leveraged by most of its services like YouTube, Google Search, Google Analytics etc. The search service initially used MapReduce for its indexing infrastructure but later transitioned to BigTable during the Caffeine release.

Google Cloud datastore has over 100 applications in production at Google both facing internal and external users. Applications like Gmail, Picasa, Google Calendar, Android Market & AppEngine use Cloud Datastore & Megastore.

Google Trends use MillWheel for stream processing. Google Ads initially used MySQL later migrated to F1 DB - a custom written distributed relational database. Youtube uses MySQL with Vitess. Google stores exabytes of data across the commodity servers with the help of the Google File System.

Source: Google Databases: How Do Google Services Store Petabyte-Exabyte Scale Data?

YouTube Database – How Does It Store So Many Videos Without Running Out Of Storage Space?

enter image description here


As others have mentioned, Google uses a homegrown solution called BigTable and they've released a few papers describing it out into the real world.

The Apache folks have an implementation of the ideas presented in these papers called HBase. HBase is part of the larger Hadoop project which according to their site "is a software platform that lets one easily write and run applications that process vast amounts of data." Some of the benchmarks are quite impressive. Their site is at http://hadoop.apache.org.


Google primarily uses Bigtable.

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size.

For more information, download the document from here.

Google also uses Oracle and MySQL databases for some of their applications.

Any more information you can add is highly appreciated.


Google services have a polyglot persistence architecture. BigTable is leveraged by most of its services like YouTube, Google Search, Google Analytics etc. The search service initially used MapReduce for its indexing infrastructure but later transitioned to BigTable during the Caffeine release.

Google Cloud datastore has over 100 applications in production at Google both facing internal and external users. Applications like Gmail, Picasa, Google Calendar, Android Market & AppEngine use Cloud Datastore & Megastore.

Google Trends use MillWheel for stream processing. Google Ads initially used MySQL later migrated to F1 DB - a custom written distributed relational database. Youtube uses MySQL with Vitess. Google stores exabytes of data across the commodity servers with the help of the Google File System.

Source: Google Databases: How Do Google Services Store Petabyte-Exabyte Scale Data?

YouTube Database – How Does It Store So Many Videos Without Running Out Of Storage Space?

enter image description here


Google primarily uses Bigtable.

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size.

For more information, download the document from here.

Google also uses Oracle and MySQL databases for some of their applications.

Any more information you can add is highly appreciated.


Although Google uses BigTable for all their main applications, they also use MySQL for other (perhaps minor) apps.


And it's maybe also handy to know that BigTable is not a relational database (like MySQL) but a huge (distributed) hash table which has very different characteristics. You can play around with (a limited version) of BigTable yourself on the Google AppEngine platform.

Next to Hadoop mentioned above there are many other implementations that try to solve the same problems as BigTable (scalability, availability). I saw a nice blog post yesterday listing most of them here.


It's something they've built themselves - it's called Bigtable.

http://en.wikipedia.org/wiki/BigTable

There is a paper by Google on the database:

http://research.google.com/archive/bigtable.html


Although Google uses BigTable for all their main applications, they also use MySQL for other (perhaps minor) apps.


As others have mentioned, Google uses a homegrown solution called BigTable and they've released a few papers describing it out into the real world.

The Apache folks have an implementation of the ideas presented in these papers called HBase. HBase is part of the larger Hadoop project which according to their site "is a software platform that lets one easily write and run applications that process vast amounts of data." Some of the benchmarks are quite impressive. Their site is at http://hadoop.apache.org.


It's something they've built themselves - it's called Bigtable.

http://en.wikipedia.org/wiki/BigTable

There is a paper by Google on the database:

http://research.google.com/archive/bigtable.html


As others have mentioned, Google uses a homegrown solution called BigTable and they've released a few papers describing it out into the real world.

The Apache folks have an implementation of the ideas presented in these papers called HBase. HBase is part of the larger Hadoop project which according to their site "is a software platform that lets one easily write and run applications that process vast amounts of data." Some of the benchmarks are quite impressive. Their site is at http://hadoop.apache.org.


Although Google uses BigTable for all their main applications, they also use MySQL for other (perhaps minor) apps.


And it's maybe also handy to know that BigTable is not a relational database (like MySQL) but a huge (distributed) hash table which has very different characteristics. You can play around with (a limited version) of BigTable yourself on the Google AppEngine platform.

Next to Hadoop mentioned above there are many other implementations that try to solve the same problems as BigTable (scalability, availability). I saw a nice blog post yesterday listing most of them here.