Introduction
What is Chubby?
It is like a distributed file system with advisory locks, with the design focussing on availability and reliability instead of performance
Chubby is intended for use within a loosely-coupled distributed system consisting of a moderately large number of machines connected by a high-speed network
The purpose of the lock service is to allow its clients to synchronise their activities and agree on basic information about their environment
The primary goals for Chubby were:
Availability
Reliability
Easy to understand semantics
Throughput and storage space were considered as secondary goals
The intent behind building Chubby was to help developers deal with
coarse-grained synchronisation
and to help elect a leader from a set of equivalent servers
🔑 What does coarse-grained synchronisation mean ?
Imagine you have a big toy box that you share with your sister. Coarse-grained synchronization is like making a rule: "Only one person can play with the toy box at a time."
When you want to play:
You put a sign on the toy box that says "Being used by [your name]"
You can then take out any toys you want
When you're done, you remove the sign
Your sister has to wait until you're completely finished and remove your sign before she can use the toy box. She can't grab toys while you're using it, even if she wants different toys than you're playing with.
This is simple but sometimes slow - like when your sister just wants one small toy while you're playing with completely different ones, she still has to wait for you to finish with the whole box.
GFS uses a Chubby lock to elect its master server; Bigtable uses Chubby to elect its master, to let the master discover the servers it controls, and to let clients find the master
Both GFS and Bigtable use Chubby to store small amounts of metadata (i.e., they use Chubby as the root of their distributed data structures)
Before Chubby :
Ad hoc methods were used for primary election in most distributed systems at Google
In those cases, Chubby allowed only a small saving in computing effort
Operator intervention was also required at times, when correctness was essential
With Chubby, a significant improvement in availability was obtained in systems that no longer required human intervention on failure
One of the problems that Chubby solves, primary election, is an instance of the distributed consensus problem, which requires a solution that works over asynchronous communication. One way to solve asynchronous consensus is the Paxos protocol
🔑 TLDR on Paxos Protocol
Imagine a group of friends trying to order pizza over text messages, but:
Phone networks are spotty (messages might not arrive)
Some friends might fall asleep (stop responding)
You need most friends to agree on the order
Here's how Paxos would work:
PROPOSE: Alice texts "Let's get pepperoni? Anyone against this? Reply 1234"
The number 1234 is important - it's higher than any previous pizza proposals
PROMISE: Friends check if they've already promised to support another pizza choice with a higher number
If not, they reply "OK, I'll support pepperoni (1234)"
If they already promised to support something else with a higher number, they say "Can't, already promised 1235 for mushroom"
ACCEPT: If Alice gets "OK" from more than half the friends, she texts "It's decided - pepperoni (1234)"
Why this works:
Even if Bob fell asleep, enough friends agreed
Even if Carol didn't get the initial text, she'll eventually learn the decision
If David simultaneously proposed mushroom pizza with 1235, only one choice would get enough promises
In computers, this ensures all servers agree on data even when some fail or can't communicate briefly.
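To make the phases above concrete, here is a toy single-decree Paxos round in Python. This is only a sketch of the classic protocol (in-memory, no real networking, no retries); the class and function names are mine, not from the paper.

    # A toy, in-memory sketch of single-decree Paxos (names are illustrative).

    class Acceptor:
        def __init__(self):
            self.promised_n = -1        # highest proposal number promised
            self.accepted_n = -1        # proposal number of the accepted value, if any
            self.accepted_value = None

        def promise(self, n):
            # Phase 1b: promise to ignore proposals numbered below n.
            if n > self.promised_n:
                self.promised_n = n
                return True, self.accepted_n, self.accepted_value
            return False, None, None

        def accept(self, n, value):
            # Phase 2b: accept unless we promised a higher-numbered proposal.
            if n >= self.promised_n:
                self.promised_n = n
                self.accepted_n = n
                self.accepted_value = value
                return True
            return False

    def propose(acceptors, n, value):
        majority = len(acceptors) // 2 + 1

        # Phase 1: collect promises from a majority.
        promises = [a.promise(n) for a in acceptors]
        granted = [(an, av) for ok, an, av in promises if ok]
        if len(granted) < majority:
            return None  # try again later with a higher proposal number

        # If some acceptor already accepted a value, we must propose that value instead.
        previously_accepted = [(an, av) for an, av in granted if av is not None]
        if previously_accepted:
            value = max(previously_accepted)[1]

        # Phase 2: ask the acceptors to accept the value.
        accepted = sum(1 for a in acceptors if a.accept(n, value))
        return value if accepted >= majority else None

    acceptors = [Acceptor() for _ in range(5)]
    print(propose(acceptors, 1234, "pepperoni"))   # -> "pepperoni"
    print(propose(acceptors, 1235, "mushroom"))    # -> "pepperoni": the earlier choice sticks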
Design
Rationale
Imagine two ways of making computers work together properly
One is the “ask the master” approach (centralised, like Chubby): the master decides everything and tells each computer what to do
The other is the “democracy” approach (something similar to a client library that implements Paxos)
Both approaches have their own pros and cons; put simply, it's like the difference between:
Having to ask a teacher for permission to do anything
Giving students guidelines to make decisions as a group
A lock service (Chubby) has some advantages over a client library
Developers do not always plan for high availability; as a system grows, availability becomes more and more important
A lock service makes it easier to maintain existing program structure and communication patterns than a client consensus library does
🔑 To elect a master which then writes to an existing file server requires adding just two statements and one RPC parameter to an existing system
One would acquire a lock to become master, pass an additional integer (the lock acquisition count) with the write RPC, and add an if-statement to the file server to reject the write if the acquisition count is lower than the current value (to guard against delayed packets)
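A minimal sketch of that idea, assuming the lock service hands out a monotonically increasing acquisition count when the lock is grabbed; the names here (FileServer, chubby.open, lock.acquisition_count) are hypothetical, not Chubby's real API.

    # Sketch: a file server that rejects writes carrying a stale lock acquisition count.

    class FileServer:
        def __init__(self):
            self.highest_count_seen = 0
            self.data = {}

        def write(self, path, contents, acquisition_count):
            # The one added if-statement: guard against delayed packets from an old master.
            if acquisition_count < self.highest_count_seen:
                raise PermissionError("stale master: acquisition count %d < %d"
                                      % (acquisition_count, self.highest_count_seen))
            self.highest_count_seen = acquisition_count
            self.data[path] = contents

    # Would-be master: acquire the lock, then pass the count with every write RPC.
    # lock = chubby.open("/ls/cell/service-master").acquire()      # hypothetical call
    # file_server.write("/config", b"...", lock.acquisition_count)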
Many of the services at Google that elect a primary or partition data between their components need a mechanism for advertising the results
For this we need to allow clients to store and fetch small quantities of data
This could be done with a name service, but Google’s experience has been that the lock service itself is well-suited for this task, because this
reduces the number of servers on which a client depends,
and because the consistency features of the protocol are shared
A lock-based interface is more familiar to programmers
Think of three different ways to make group decisions:
The Committee Way (Distributed-consensus):
Like having 5 people on a committee
Need at least 3 to agree to make any decision
Good because decisions are well-discussed
Bad because you need lots of people available
The Hall Monitor Way (Lock Service/Chubby):
One person gets the "hall pass" to make decisions
Only need one person to be active
The hall monitor is backed by a reliable team (like Chubby's 5 servers)
Simpler because one person can move things forward
The Voting Service Way (Consensus Service):
Like hiring a professional voting company to manage decisions
Also lets one person move forward with decisions
But if used for anything besides taking turns (locks), it doesn't solve other problems
Real-world example:
Committee Way: Like needing 3 out of 5 family members to agree on dinner
Hall Monitor Way: One person gets to pick, backed by parents who make sure it's fair
Voting Service: Asking a neighbor to manage your family decisions
The Hall Monitor Way (lock service) is often the best choice because:
It's simple - one person can make progress
It's still safe - backed by a reliable system
It solves more problems than just decision-making
The reasons discussed above led to two key design decisions
Chubby was built as a lock service rather than a library or a consensus service
It serves small files so that an elected primary (master) can advertise itself and its parameters, rather than requiring a second service to be built and maintained
Some other key design decisions taken were
A service advertising its primary via a Chubby file may have thousands of clients. Therefore, we must allow thousands of clients to observe this file, preferably without needing many servers.
Clients and replicas of a replicated service may wish to know when the service’s primary changes.
This suggests that an event notification mechanism would be useful to avoid polling
Even if clients need not poll files periodically, many will (this is a consequence of supporting many developers).
Thus, caching of files is desirable. Google devs prefer consistent caching.
To provide security mechanisms, including access control.
A rather surprising design choice is that lock use is not expected to be fine-grained; on the contrary, coarse-grained use is expected
🔑Fine-grained locks are a way to control access to small, specific pieces of data or resources in a system. Instead of restricting access to a large chunk of resources all at once, fine-grained locks allow for more precise control, letting multiple users or processes access different parts of the system simultaneously. This approach increases efficiency by minimizing unnecessary waiting, though it requires more complex management.
Differences between fine-grained and coarse-grained locks
Coarse-grained locks (big, less frequent locks):
Don't stress the lock server much
If the lock server is briefly unavailable, it's not a big deal
Need to be reliable and not disappear if the server hiccups
Work well with fewer servers
Fine-grained locks (small, frequent locks):
Put more stress on the lock server
If the server is unavailable even briefly, many users get stuck
Don't need to survive server problems
Need more servers to handle all the requests
It's okay if these locks disappear sometimes since they're short-lived anyway
The key difference: Coarse-grained locks prioritize reliability, while fine-grained locks prioritize speed and handling many requests.
Chubby also lets clients implement their own fine-grained locks; in short, the picture is as follows (a sketch appears after this list):
Chubby's Main Job
Only handles big, less frequent locks
Like controlling access to entire systems
Lets you add your own DIY Smaller Locks
Users can create their own system for smaller, frequent locks
They use Chubby's big locks to manage their small-lock system
Easy to Set Up
Small-lock servers are simple
They just need a counter that goes up
Don't need to save much information
Benefits
Users manage their own servers for small locks
Don't have to figure out the hard stuff about making computers agree
Can customize locks for their specific needs
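A rough sketch of this DIY pattern in Python, under the assumption that each group of fine-grained locks is guarded by one Chubby coarse lock plus a counter that goes up whenever that coarse lock changes hands; all names are illustrative, not from the paper.

    # Sketch: a tiny application-level lock server for one group of fine-grained locks.

    class GroupLockServer:
        def __init__(self):
            # The counter that goes up: bumps whenever the group's coarse Chubby lock
            # changes hands, so old fine-grained locks can be recognised as stale.
            self.group_epoch = 0
            self.held = {}   # fine-grained lock name -> epoch at which it was granted

        def on_coarse_lock_acquired(self):
            # Hypothetical hook: called when this process wins the group's Chubby lock.
            self.group_epoch += 1
            self.held.clear()          # all previously granted fine-grained locks are now stale

        def acquire(self, name):
            if name in self.held:
                return None            # someone already holds it in this epoch
            self.held[name] = self.group_epoch
            return (name, self.group_epoch)   # clients present this pair with their requests

        def is_valid(self, token):
            name, epoch = token
            return self.held.get(name) == epoch and epoch == self.group_epoch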
System structure
Chubby has 2 main components that communicate via RPC. The components are
A server
A library that client applications link against
One Chubby cell has five servers called replicas, placed so as to reduce the likelihood of correlated failures
The replicas use a distributed consensus protocol to elect a master
The master must obtain votes from a majority of the replicas, plus promises that those replicas will not elect a different master for an interval of a few seconds, known as the master lease
The master lease is periodically renewed as long as the same replica continues to win a majority of the vote
The replicas maintain copies of a simple database, but only the master initiates reads and writes of this database; all other replicas simply copy updates from the master
Clients find the master by sending master location requests to the replicas listed in the DNS.
Non-master replicas respond to such requests by returning the identity of the master.
The client then directs all requests to the master either until it ceases to respond, or until it indicates that it is no longer the master
Write requests are propagated via the consensus protocol to all replicas; such requests are acknowledged when the write has reached a majority of the replicas in the cell
Read requests are satisfied by the master alone
this is safe provided the master lease has not expired, as no other master can possibly exist.
if a master fails, the other replicas run the election protocol when their master leases expire;
a new master will typically be elected in a few seconds
If a replica fails and does not recover for a few hours, a replacement system selects a fresh machine from a free pool, starts the lock server binary on it, and updates the DNS tables, replacing the failed replica's IP address with that of the new machine
The current master server periodically checks the DNS and eventually notices changes, such as new replicas.
The master updates the list of members in the cell’s database, which is kept consistent across all members using the replication protocol.
The new replica gets a recent copy of the database from backups stored on file servers and updates from active replicas.
Once the new replica processes a request that the current master is waiting to commit, it becomes eligible to vote in the elections for a new master.
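Here is a small sketch of how a client might locate the master, assuming hypothetical dns_lookup and ask_replica helpers (the real Chubby client library hides all of this):

    # Sketch of master discovery as described above; dns_lookup and ask_replica are
    # hypothetical helpers, not part of Chubby's real client library.

    import random

    def find_master(cell_name, dns_lookup, ask_replica):
        replicas = dns_lookup(cell_name)        # the DNS entry lists the cell's replicas
        random.shuffle(replicas)                # spread the location requests around
        for replica in replicas:
            master = ask_replica(replica, "master-location")
            if master is not None:              # non-masters answer with the master's identity
                return master
        raise RuntimeError("no replica answered; retry after a backoff")

    # The client then directs all requests to that master, and calls find_master()
    # again if it stops responding or replies that it is no longer the master.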
Files, directories and handles
Chubby exports a file system interface similar to, but simpler than, that of UNIX
A typical Chubby file name is /ls/foo/bar/abc
The ls prefix is common to all Chubby names and stands for lock service
The second component, foo, is the name of the Chubby cell; it is resolved to one or more Chubby servers via a DNS lookup
A special cell name, local, means that the client's local Chubby cell should be used
The remainder, /bar/abc, is interpreted within the named Chubby cell
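A tiny sketch of how such a name breaks down (illustrative code, not Chubby's actual parser):

    # Sketch: splitting a Chubby name into its parts, following the description above.

    def parse_chubby_name(name):
        parts = name.split("/")
        assert parts[0] == "" and parts[1] == "ls", "all Chubby names start with /ls"
        cell = parts[2]                      # e.g. "foo", or "local" for the client's local cell
        rest = "/" + "/".join(parts[3:])     # e.g. "/bar/abc", interpreted within that cell
        return cell, rest

    print(parse_chubby_name("/ls/foo/bar/abc"))   # -> ('foo', '/bar/abc')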
Because the naming structure resembles a file system, it was easy to make Chubby available to applications through a simple, familiar API
The design differs from UNIX in ways that ease distribution
To allow the files in different directories to be served from different Chubby masters, Chubby does not expose operations that can move files from one directory to another,
Chubby does not maintain directory modified times,
Chubby avoids path-dependent permission semantics
To make it easier to cache file meta-data, the system does not reveal last-access times.
The name space contains files and directories, collectively called nodes
Every such node has only one name within its cell
There are no symbolic or hard links
🔑A symbolic link is like a shortcut or pointer to another file or directory. A hard link is a direct reference to the data of a file.
Nodes may be permanent or ephemeral (short-lived)
Any node may be deleted explicitly, but ephemeral nodes are also deleted if no client has them open,
Ephemeral files are used as temporary files, and as indicators to others that a client is alive
Any node can act as an advisory reader/writer lock
Each node has some metadata, the most important being three ACLs (access control lists) used to control reading, writing, and changing the ACL names for the node
By default, on creation, a node takes the ACL names of its parent
ACLs are themselves files located in an ACL directory, which is a well-known part of the cell's local name space
Because Chubby’s ACLs are simply files, they are automatically available to other services that wish to use similar access control mechanisms.
Each node's metadata includes four monotonically increasing 64-bit numbers
These numbers allow clients to detect changes easily
The four numbers are:
an instance number → greater than the instance number of any previous node with the same name.
a content generation number (files only) → this increases when the file’s contents are written.
a lock generation number → this increases when the node’s lock transitions from free to held.
an ACL generation number → this increases when the node’s ACL names are written
Chubby also exposes a 64-bit file-content checksum so clients may tell whether files differ
🔑A checksum is a small, fixed-size value (usually a number) generated from a larger set of data, like a file or message. It acts as a fingerprint of the data, allowing you to detect changes or errors.
When a client wants to check if a file has changed, instead of reading the entire file, it can simply compare the stored checksum with the current checksum. If the checksums differ, the file content has changed; if they are the same, the file content is identical
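A small sketch of this per-node metadata as a data class, so the change-detection idea is concrete; the field names are mine, the paper only describes what each number tracks.

    # Sketch: the per-node metadata numbers described above.

    from dataclasses import dataclass

    @dataclass
    class NodeMetadata:
        instance_number: int           # grows each time a node with this name is re-created
        content_generation: int        # files only: grows on every write of the contents
        lock_generation: int           # grows when the node's lock goes from free to held
        acl_generation: int            # grows when the node's ACL names are written
        content_checksum: int          # 64-bit checksum of the file contents

    def file_changed(cached: NodeMetadata, current: NodeMetadata) -> bool:
        # A client can detect changes without re-reading the whole file.
        return (cached.instance_number != current.instance_number
                or cached.content_generation != current.content_generation
                or cached.content_checksum != current.content_checksum)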
When clients open a file or directory (called a node), they get something called a "handle," which works like a file descriptor in UNIX.
A handle lets the client interact with the file, similar to how file descriptors are used in UNIX to read, write, or manage files.
Each handle has "check digits," which act like a security feature.
These digits make it difficult for clients to create fake handles or guess valid ones.
This means full access checks only need to happen when the handle is created, not every time the client reads or writes data (unlike UNIX, where permissions are checked when the file is opened).
The handle also includes a sequence number.
This helps the master server know if the handle was created by it or a previous master (in case the master has been restarted).
When the handle is created, it stores mode information (like read-only or write permissions).
If the master restarts and an old handle is presented, the mode information helps the master remember how the handle should behave.
Locks and sequencers
Each Chubby file and directory can act as a reader-writer lock
One client handle may hold the lock in exclusive (writer) mode
Multiple client handles may hold the lock in shared (reader) mode
Locks are advisory, they conflict only with other attempts to acquire the same lock
Holding a lock is NOT necessary to access the file and it does NOT prevent other clients from doing so
Google chose to reject mandatory locks (client needs to have lock to access file) because :
Chubby locks often protect resources managed by other services, not just the file linked to the lock. Enforcing strict (mandatory) locking would require big changes to those services.
The team did not want to require users to shut down applications when they needed to access locked files for debugging or administrative purposes. In complex systems, it is not practical to shut down or reboot to bypass locks, as is commonly done on personal computers.
The developers perform error checking using standard methods like "assertions" (e.g., "lock X is held"). They see little benefit from mandatory lock checks, as faulty or malicious processes can still corrupt data even without holding locks.
Since data can be corrupted in other ways, the extra protection provided by mandatory locking is considered to add little value
In Chubby, acquiring a lock in either read or write mode requires write permission, so that an unprivileged reader cannot prevent a writer from making progress
Chubby provides a way by which sequence numbers can be introduced into only those interactions that make use of locks.
At any time, a lock holder may request a sequencer, an opaque byte-string that describes the state of the lock immediately after acquisition.
It contains the name of the lock, the mode in which it was acquired (exclusive or shared), and the lock generation number.
A client passes this sequencer to a server if it expects the operation to be protected by the lock
The recipient server tests whether the sequencer is still valid and has the correct mode; if not, the request is rejected
The validity of the sequencer can be checked with the help of the Chubby server’s cache or with the help of the most recent sequencer that the server has observed
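A sketch of what the server-side check might look like, assuming a sequencer carries the lock name, mode, and lock generation number as described; the ProtectedServer class and its caching strategy are illustrative, not prescribed by the paper.

    # Sketch: a server validating sequencers against the most recent one it has seen.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Sequencer:
        lock_name: str
        mode: str                  # "exclusive" or "shared"
        lock_generation: int

    class ProtectedServer:
        def __init__(self):
            # Most recent sequencer observed for each lock: one way to validate
            # without calling back to Chubby on every request.
            self.latest_seen = {}

        def handle_write(self, request, seq: Sequencer):
            if seq.mode != "exclusive":
                raise PermissionError("writes require the lock in exclusive mode")
            latest = self.latest_seen.get(seq.lock_name)
            if latest is not None and seq.lock_generation < latest.lock_generation:
                raise PermissionError("stale sequencer: the lock has changed hands")
            self.latest_seen[seq.lock_name] = seq
            # ... perform the write ...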
While sequencers are effective and simple to use, important protocols evolve slowly, so many existing servers cannot be modified to check sequencers any time soon.
Since not all systems or servers support sequencers, Chubby provides an alternative mechanism that, while not perfect, is easier to use and helps reduce the chances of:
Delayed requests (when requests arrive too late).
Re-ordered requests (when requests are processed out of order).
If a client releases a lock in the normal way, the lock becomes available to other clients immediately; but if a lock becomes free because its holder failed, the lock server withholds it for a period called the
lock-delay
This limit prevents a faulty client from making a lock unavailable for a long time
This protects unmodified servers and clients from everyday problems caused by message delays and restarts
🔑In short , the lock-delay prevents new clients from immediately grabbing a lock when the previous client fails, reducing the risk of inconsistency or conflicts from delayed messages or sudden restarts of the original client. This protects servers that aren’t designed to handle such timing issues.
Events
Clients may subscribe to a range of events when creating a handle.
Events are delivered to the client asynchronously via an up-call from the Chubby library
Events include :
modified content inside files
child node changed (added, removed or modified)
master failed over
handle failed or has become invalid
lock acquired
conflicting lock request from another client
Events are delivered AFTER the action has occurred
API
Handles are created with
Open()
and destroyed with
Close()
Open()
opens a named file or directory to produce a handle. Only this call takes a node name; all other operations use handles
The client indicates various options :
how the handle will be used (reading; writing and locking; changing the ACL). The handle is created only if the client has the appropriate permissions.
events that should be delivered .
the lock-delay.
whether a new file or directory should (or must) be created. If a file is created, the caller may supply initial contents and initial ACL names. The return value indicates whether the file was in fact created.
Close()
closes an open handle; further use of the handle is not allowed
This call NEVER fails
A similar call
Poison()
causes outstanding and subsequent operations on the handle to fail without closing the handle
This allows a client to cancel Chubby calls made by other threads without fear of deallocating the memory being accessed by them
Some important APIs are covered below :
Reading Files
GetContentsAndStat(): The Swiss Army knife of reads
Grabs both file content and metadata
Always reads the ENTIRE file
No partial reads allowed (Chubby's way of saying "keep your files small!")
GetStat(): The lightweight alternative
Just fetches metadata
Perfect when you don't need the actual content
ReadDir(): The directory explorer
Lists all children in a directory
Includes metadata for each child
Writing Files
SetContents(): The all-or-nothing writer
Writes complete files atomically
Optional bonus: Compare-and-swap with generation numbers
If generation = current:
Update file
Else:
Sorry, try again!
File Management
SetACL(): The bouncer
Updates Access Control Lists
Decides who gets in and who doesn't
Delete(): The cautious cleaner
Deletes nodes
But only if they have no children
Chubby's version of "Are you sure you want to delete this?"
Lock Operations:
Acquire(): The classic lock grabber
TryAcquire(): The "no-wait" alternative
Release(): The lock liberator
Sequencer Operations:
GetSequencer(): Gets your "proof of lock" token
SetSequencer(): Binds a sequencer to your handle
CheckSequencer(): Validates if a sequencer is still legit
Handle Behavior:
Handles are tied to file instances, not names
If a file is deleted and recreated, old handles won't work
Every call can face access control checks
Open() calls ALWAYS face the security check
All API calls include an operation parameter for:
Async callbacks
Waiting for completion
Getting detailed error info
File handles become invalid if:
The file is deleted
Even if a new file is created with the same name
Here's how servers use Chubby to elect a leader:
All potential primaries:
Open the lock file
Try to acquire the lock
The winner:
Becomes the primary
Writes its identity to the lock file
Gets a sequencer as proof
Other servers:
Read the file to find the primary
Can verify the primary's sequencer
Safety feature:
Lock-delay mechanism for services that can't check sequencers
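A sketch of this election flow, using a made-up chubby client object as a stand-in for the real library; Open, TryAcquire, SetContents and GetSequencer are the calls described earlier.

    # Sketch: primary election via a Chubby lock file (wrapper names are hypothetical).

    def try_to_become_primary(chubby, my_identity):
        handle = chubby.open("/ls/local/myservice/primary", mode="write-and-lock")
        if not handle.try_acquire():           # non-blocking attempt to grab the lock
            primary, _ = handle.get_contents_and_stat()
            return ("replica", primary)        # someone else won; read who it is
        handle.set_contents(my_identity)       # advertise ourselves in the lock file
        sequencer = handle.get_sequencer()     # proof of mastership to pass to other servers
        return ("primary", sequencer)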
Sessions and KeepAlives
Think of a Chubby session as a "relationship status" between a client and a Chubby cell. It's like:
A virtual handshake 🤝
Has an expiration date
Needs regular check-ins to stay alive
Working of a session :
Starting Up
Client says "Hey, let's connect!"
Master says "Ok macha, here's your session!"
Staying Alive
Client sends "KeepAlive" messages
Like saying "Alive?" every now and then
Master responds with "Got it, enjoy for another 12 seconds!"
If the master is busy, it might say "You're good for longer" to reduce check-ins
Failure scenarios
Jeopardy Mode
Client thinks "Uh oh, haven't heard from the master in a while..."
Empties its cache (better safe than sorry!)
Starts a 45-second countdown (called the "grace period")
What Happens Next?
If they reconnect: Everything goes back to normal
If time runs out: Session is considered dead ⚰️
🔑Think of it like a video call:
If the connection gets bad, you might say "Can you hear me?" (jeopardy mode)
You wait a bit to see if it improves (grace period)
If it doesn't get better, you hang up and redial (session expires)
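A sketch of the client-side loop implied above, with the paper's 12-second lease and 45-second grace period hard-coded for illustration; send_keepalive and the callbacks are hypothetical stand-ins for the real library internals.

    # Sketch: keep the session lease fresh, drop into jeopardy when it expires,
    # and give up only after the grace period runs out.

    import time

    def session_loop(send_keepalive, on_jeopardy, on_session_expired):
        lease_end = time.time() + 12.0
        while True:
            reply = send_keepalive(timeout=lease_end - time.time())
            if reply is not None:
                lease_end = time.time() + reply["lease_seconds"]   # master extends the lease
                continue
            # Local lease timed out: enter jeopardy, flush caches, start the grace period.
            on_jeopardy()
            grace_end = time.time() + 45.0
            while time.time() < grace_end:
                reply = send_keepalive(timeout=grace_end - time.time())
                if reply is not None:                              # reconnected, maybe to a new master
                    lease_end = time.time() + reply["lease_seconds"]
                    break
            else:
                on_session_expired()
                return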
Fail-overs
When a master fails or otherwise loses mastership, it loses all of its in-memory state
Until a new master is elected, the session lease timer is stopped
this is legal because it is equivalent to extending the client’s lease.
If new master election happens before local lease expiry, clients can contact the new master and things continue
Otherwise, clients flush their caches and wait through the grace period while trying to find the new master
the grace period allows sessions to be maintained across fail-overs that exceed the normal lease timeout.
Example Scenario
OLD MASTER has lease M1
OLD MASTER exchanges KeepAlives with CLIENT
OLD MASTER extends to lease M2
OLD MASTER sends lease M2 info in KeepAlive reply #2
CLIENT updates its view to lease C2 (shorter than master's lease)
OLD MASTER suddenly dies
OLD MASTER had already committed to lease M2
CLIENT sends a KeepAlive
No response received by CLIENT
Time passes with no master available
CLIENT's lease C2 expires
CLIENT enters "jeopardy" mode:
Empties its cache
Initiates grace period timer
Continuously attempts to locate new master
NEW MASTER is elected
NEW MASTER initializes with lease M3
Takes cautious approach about previous leases
CLIENT attempts KeepAlive #4
This attempt is rejected due to incorrect epoch number
CLIENT makes another attempt with KeepAlive #6
This second attempt succeeds
NEW MASTER sends KeepAlive reply #7
CLIENT extends to new lease C3
If grace period bridges gap between C2 and C3:
CLIENT only experiences a delay
No session failure is detected
CLIENT exits "jeopardy" mode
Normal operations resume
Conclusions
Grace period is crucial for maintaining sessions during fail-over
System handles master changes without losing client sessions
Multiple attempts may be needed to establish connection with new master
Epoch numbers prevent confusion between old and new masters
Immediately after election, the newly elected master proceeds as follows
It assigns a fresh "epoch number" to clients. If a client uses an old epoch number, its requests are rejected. This prevents the new master from responding to outdated requests sent to a previous master.
The new master can now respond to requests asking for its location but doesn't yet handle session-related operations.
The master rebuilds in-memory data for sessions and locks using the information stored in the database. It extends session leases to match what the previous master may have used.
Clients can now send "KeepAlives" (signals that they’re still active), but they can't perform other session-related actions yet.
The master sends a fail-over event to each session, prompting clients to clear their caches (since they may have missed updates) and warn applications that some events might have been lost.
The master waits for clients to acknowledge the fail-over event or lets their sessions expire if they don't respond.
Once acknowledgements are received, the master allows all operations to continue as normal.
If a client uses a handle created before the fail-over, the master rebuilds the handle’s information and processes the request. Once a handle is closed, it cannot be recreated in the current epoch to avoid accidental duplication from delayed or repeated messages.
After a short period, the master deletes temporary (ephemeral) files that no longer have active handles. However, if a client loses its session during the fail-over, the file may not disappear right away.
Database implementation
The first version of Chubby used Berkeley DB as its database, which stores data using a system called B-trees.
A special function was used to sort data based on the structure of file paths, keeping related files close together.
Chubby didn’t need path-based permissions, so only one lookup was needed to access a file in the database.
Berkeley DB uses a distributed consensus protocol to share its data logs across multiple servers, which fit well with Chubby’s system once master leases were added.
While the B-tree part of Berkeley DB was reliable, the newer replication system wasn’t as well-tested and had fewer users.
Even though the Berkeley DB team fixed issues, Chubby’s developers thought using the replication feature was risky.
To reduce this risk, the team created a simpler database using write-ahead logging and snapshotting, similar to another design.
They continued to share the database logs across servers using a distributed consensus protocol.
Chubby didn’t use many of Berkeley DB’s features, so creating a simpler system made everything easier to manage.
They needed atomic operations (simple all-or-nothing tasks), but didn’t require more complex transactions.
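The paper only says the replacement database used write-ahead logging and snapshotting; here is a toy key-value store showing that general pattern (definitely not Chubby's actual code):

    # Toy sketch of write-ahead logging plus snapshotting.

    import json, os

    class TinyKV:
        def __init__(self, log_path="wal.log", snap_path="snapshot.json"):
            self.log_path, self.snap_path = log_path, snap_path
            self.data = {}
            if os.path.exists(snap_path):                  # start from the latest snapshot...
                with open(snap_path) as f:
                    self.data = json.load(f)
            if os.path.exists(log_path):                   # ...then replay the log on top of it
                with open(log_path) as f:
                    for line in f:
                        key, value = json.loads(line)
                        self.data[key] = value

        def put(self, key, value):
            # Log first (and flush), then apply: the write survives a crash after this point.
            with open(self.log_path, "a") as f:
                f.write(json.dumps([key, value]) + "\n")
                f.flush()
                os.fsync(f.fileno())
            self.data[key] = value

        def snapshot(self):
            # Persist the full state, then truncate the log so recovery stays fast.
            with open(self.snap_path, "w") as f:
                json.dump(self.data, f)
            open(self.log_path, "w").close()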
Mirroring
Chubby allows files to be mirrored (copied) from one cell to another.
Mirroring is fast because the files are small, and Chubby’s event system instantly informs when a file is added, deleted, or changed.
If the network is working properly, changes are mirrored globally in less than a second.
If a mirror can't be reached due to network issues, it stays unchanged until the connection is restored. The updated files are identified by comparing their checksums.
Mirroring is mainly used to copy configuration files to different computing clusters worldwide.
A special cell called global contains a folder that is mirrored to the corresponding folder in every other Chubby cell.
The global cell is special because its replicas are spread across the world, ensuring it's almost always accessible.
Files mirrored from the global cell include:
Chubby's access control lists (ACLs)
Files for monitoring services that track the presence of Chubby cells and other systems
Links to help clients find large datasets like Bigtable cells
Many configuration files for other systems.
That’s it for most of the technical concepts behind Chubby; let us now try to understand how Google scaled it
Mechanism for scaling
Chubby clients are individual processes, so the system often handles more clients than expected,
sometimes up to 90,000 clients 🤯 communicating with a single Chubby master.
Since each Chubby cell only has one master and the master machine is similar to client machines, the master can get overwhelmed if too many clients communicate with it at once.
The best way to scale Chubby is by reducing communication with the master, as small improvements in request processing won’t make much of a difference.
Several strategies are used to manage this load:
Chubby can have multiple cells, and clients usually connect to a nearby one (found through DNS) to avoid relying on remote machines. A typical deployment uses one cell for a data center with several thousand machines.
The master can increase lease times (from the default 12 seconds to up to 60 seconds) under heavy load to reduce the number of KeepAlive requests it needs to process.
Chubby clients cache file data, metadata, file absence, and open handles to reduce the number of requests they make to the server.
Protocol-conversion servers are used to convert the Chubby protocol into simpler ones like DNS, reducing complexity.
Two additional methods, proxies and partitioning, are designed to further improve scaling, although they are not yet in production.
Chubby doesn't need to scale beyond a factor of five for now because:
There's a limit to how many machines one would put in a data center or rely on a single service.
Hardware improvements will increase the capacity of both clients and servers, so they will grow together.
Proxies
Chubby’s protocol can be proxied by trusted processes that forward requests from clients to a Chubby cell.
Proxies can reduce server load by handling KeepAlive and read requests, but they can't reduce write traffic since writes bypass the proxy’s cache.
Even though write traffic isn't reduced, it makes up less than 1% of Chubby's workload, so proxies can still greatly increase the number of clients supported.
A proxy that manages N clients reduces KeepAlive traffic by a factor of N, which can be 10,000 or more.
A proxy cache can reduce read traffic by up to a factor of 10, but since reads are less than 10% of the load, the savings from reduced KeepAlive traffic are more significant.
Proxies add extra RPCs to write operations and first-time reads.
While proxies may cause the system to become unavailable more often (since each proxied client depends on both the proxy and the Chubby master), the overall benefits in scaling outweigh the risks.
Partitioning
The system can divide a Chubby cell into N partitions, where each partition has its own set of replicas and a master.
The partition for a directory or node is determined by a hash function.
Every node D/C in directory D would be stored on the partition P(D/C) = hash(D) mod N
Note that the metadata for D may be stored on a different partition
Some operations, like ACL checks and ensuring directories are empty before deletion, may require cross-partition communication.
However, these cases are limited, and the performance impact is expected to be modest.
Partitioning reduces read and write traffic on each partition by a factor of N, but it doesn't necessarily lower KeepAlive traffic.
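A tiny sketch of that partitioning rule, with zlib.crc32 standing in for whatever hash function Chubby would actually use:

    # Sketch: P(D/C) = hash(D) mod N, where D is the node's parent directory,
    # so all children of a directory land on the same partition.

    import zlib

    def partition_for(node_name, num_partitions):
        directory = node_name.rsplit("/", 1)[0] or "/"
        return zlib.crc32(directory.encode()) % num_partitions

    print(partition_for("/ls/foo/bar/abc", 4))
    print(partition_for("/ls/foo/bar/xyz", 4))   # same partition as its sibling above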
Proxies and partitioning together are considered the solution for scaling Chubby to handle more clients.
Resources
While I didn’t get the chance to spend a lot of time with this paper, here are some resources that helped me understand some of the concepts discussed in this paper
A lecture on the paper :
The Whitepaper :
https://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
Paxos algorithm :