Colossus, Google’s File System
You trust Google Cloud with your critical data, but did you
know that Google relies on the same underlying
storage infrastructure for its other businesses as well?
That’s right, the same storage system that powers Google
Cloud also underpins Google’s most popular products,
supporting globally available services like YouTube, Drive,
and Gmail.
Colossus in a nutshell
But how does it all work? And how can one file system
underpin such a wide range of workloads? Below is a
diagram of the key components of the Colossus control
plane:
Client library
The client library is how an application or service interacts
with Colossus. The client is probably the most complex
part of the entire file system. There’s a lot of functionality,
such as software RAID, that goes into the client based on
an application’s requirements. Applications built on top of
Colossus use a variety of encodings to fine-tune
performance and cost trade-offs for different workloads.
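To make the idea of client-side software RAID concrete, here is a minimal sketch of single-parity striping in Python. It is an illustration only: this post does not describe the encodings Colossus actually uses, and the function names (`encode`, `reconstruct`) and the simple XOR scheme are assumptions chosen for brevity.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n_data: int) -> list[bytes]:
    """Split data into n_data equal stripes (zero-padded) plus one parity stripe."""
    stripe_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(stripe_len * n_data, b"\0")
    stripes = [padded[i * stripe_len:(i + 1) * stripe_len] for i in range(n_data)]
    parity = reduce(xor_bytes, stripes)  # XOR of all data stripes
    return stripes + [parity]


def reconstruct(stripes: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing stripe (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(stripes) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost stripe")
    repaired = list(stripes)
    if missing:
        survivors = [s for s in stripes if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

Calling `encode(b"hello colossus", 4)` yields four data stripes and one parity stripe; setting any single stripe to `None` and passing the list to `reconstruct` restores it. The trade-off is the kind the paragraph above describes: one extra stripe of storage buys tolerance of one lost stripe, and a different encoding would shift that balance between cost and durability.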
Metadata database
Curators store file system metadata in Bigtable, Google’s
high-performance NoSQL database. The original
motivation for building Colossus was to solve scaling limits
we experienced with Google File System (GFS) when
trying to accommodate metadata related to Search.
Storing file metadata in Bigtable allowed Colossus to
scale more than 100x beyond the largest GFS clusters.
D File Servers
Colossus also minimizes the number of hops for data on
the network. Data flows directly between clients and “D”
file servers (our network-attached disks).
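The direct data path can be pictured with a small Python sketch. Everything here is hypothetical: the class names (`DServer`, `Curator`, `Client`), their methods, and the in-memory dictionaries stand in for real services and are not the Colossus API. The point is only the shape of the interaction: the client asks the control plane where the data lives, then fetches the bytes straight from the D servers.

```python
from dataclasses import dataclass, field


@dataclass
class DServer:
    """Hypothetical stand-in for a "D" file server: stores raw chunks by ID."""
    chunks: dict[str, bytes] = field(default_factory=dict)
    bytes_served: int = 0

    def read(self, chunk_id: str) -> bytes:
        data = self.chunks[chunk_id]
        self.bytes_served += len(data)
        return data


@dataclass
class Curator:
    """Hypothetical control-plane service: knows where chunks live, holds no data."""
    locations: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    lookups: int = 0

    def locate(self, path: str) -> list[tuple[str, str]]:
        self.lookups += 1
        return self.locations[path]


class Client:
    def __init__(self, curator: Curator, servers: dict[str, DServer]):
        self.curator = curator
        self.servers = servers

    def read_file(self, path: str) -> bytes:
        # One metadata lookup to learn the (server, chunk) placement...
        placement = self.curator.locate(path)
        # ...then the bytes come straight from the D servers, one hop each.
        return b"".join(self.servers[sid].read(cid) for sid, cid in placement)
```

In this toy model the curator only ever returns locations, so file contents never pass through it. That separation is what keeps the number of network hops for data low.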
Custodians
Colossus also includes background storage managers
called Custodians. They play a key role in maintaining the
durability and availability of data as well as overall
efficiency, handling tasks like disk space balancing and
RAID reconstruction.
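As a rough illustration of disk space balancing, the sketch below plans chunk moves from the fullest disk to the emptiest one until the disks are close to even. It is a simple greedy heuristic written for this post's readers, not a description of how Custodians actually work; the function name `plan_rebalance` and its parameters are assumptions.

```python
def plan_rebalance(usage: dict[str, int], chunk_size: int, tolerance: int) -> list[tuple[str, str]]:
    """Return (source, destination) moves that narrow the gap between disks.

    usage maps a disk name to bytes used; every move shifts one chunk of
    chunk_size bytes from the fullest disk to the emptiest one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not usage:
        return []
    usage = dict(usage)  # work on a copy so the caller's view is untouched
    moves = []
    while True:
        fullest = max(usage, key=usage.get)
        emptiest = min(usage, key=usage.get)
        gap = usage[fullest] - usage[emptiest]
        # Stop once disks are close enough, or when a move would not shrink the gap.
        if gap <= tolerance or gap <= chunk_size:
            break
        usage[fullest] -= chunk_size
        usage[emptiest] += chunk_size
        moves.append((fullest, emptiest))
    return moves
```

For example, with disks at 100, 20, and 60 units, a chunk size of 10, and a tolerance of 10, the plan moves four chunks from the fullest disk to the emptiest, leaving all three at 60. A production balancer would also weigh factors such as replica placement and disk health, which this sketch ignores.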
Battle-tested to deliver massive scale
So, there you have it—Colossus is the secret scaling
superpower behind Google’s storage infrastructure.
Colossus not only handles the storage needs of Google
Cloud services, but also meets Google’s internal storage
needs, helping to deliver
content to the billions of people using Search, Maps,
YouTube, and more every single day. When you build your
business on Google Cloud you get access to the same
super-charged infrastructure that keeps Google running.
We’ll keep making our infrastructure better, so you don’t
have to.