Leader-Based Data Replication

WHAT?

In distributed databases, each machine storing a copy of the database is referred to as a replica. Ensuring that all replicas contain consistent data is a central challenge. Leader-based replication, also known as active/passive or master-slave replication, offers a common solution to this challenge.

Here's how leader-based replication works:

1. Leader Selection

Among the replicas, one is designated as the leader (or master or primary). All write requests from clients are directed to this leader.

2. Write Handling

When a client wants to write data to the database, it sends the request to the leader, which first stores the new data locally.

3. Replication to Followers

Simultaneously, the leader forwards the data changes to all other replicas, known as followers. This is typically done through a replication log or change stream. Each follower applies the changes in the same order they were processed on the leader, ensuring that all replicas stay synchronized.

4. Read Access

Clients can read data from either the leader or any of the followers. However, write operations are only accepted by the leader; followers are read-only from the client's perspective.

Leader-based replication is widely employed in various database systems, including PostgreSQL, MySQL, Oracle Data Guard, and SQL Server's AlwaysOn Availability Groups. It is also used in non-relational databases like MongoDB, RethinkDB, and Espresso, as well as in distributed message brokers such as Kafka and RabbitMQ, along with network filesystems and replicated block devices like DRBD.

Synchronous vs. Asynchronous Replication

A critical aspect of replication in distributed systems is whether the replication process occurs synchronously or asynchronously.

Synchronous Replication

In synchronous replication, the leader waits for acknowledgment from the followers before confirming a successful write to the client. This guarantees that followers have an up-to-date copy of the data consistent with the leader. However, if a synchronous follower doesn't respond due to issues like crashes or network faults, the leader must block all writes until the follower becomes available again.

Asynchronous Replication

Asynchronous replication involves the leader sending data changes to followers without waiting for acknowledgments. While this speeds up write processing and enables the leader to continue processing writes even if followers fall behind, it introduces a trade-off. If the leader fails and is unrecoverable, any writes not yet replicated to followers may be lost, potentially affecting data durability. Fully asynchronous replication is common when there are many followers or they are geographically distributed. This approach allows the leader to continue processing writes even if all followers lag behind, though it may weaken data durability.

Semi-synchronous Replication

In practice, databases often use a mix of synchronous and asynchronous replication, ensuring that at least one synchronous follower maintains an up-to-date copy of the data, while others operate asynchronously. This configuration, sometimes called semi-synchronous replication, strikes a balance between data consistency and system availability.

Setting Up New Followers

Periodically, you may need to set up new followers in your distributed database system. This could be to increase the number of replicas or to replace failed nodes. However, ensuring that the new follower has an accurate copy of the leader's data is not as simple as copying data files from one node to another. The dynamic nature of databases, with constant client write operations, presents challenges.

Copying data files alone would result in different parts of the database being at different states in time, rendering the replica inconsistent. Locking the database to create a consistent snapshot is an option, but this conflicts with the goal of high availability. Instead, the process typically involves:

Taking a Consistent Snapshot

The leader's database is snapped at a specific point in time, ideally without locking the entire database. Most databases offer this feature, which is also used for backups.

Copying the Snapshot

The snapshot is then copied to the new follower node.

Applying Data Changes

The follower connects to the leader and requests all data changes that occurred after the snapshot was taken. This requires the snapshot to be associated with a precise position in the leader's replication log.

Catch-Up and Continuation

Once the follower has processed the backlog of data changes from the snapshot, it is considered caught up and can continue receiving real-time data changes from the leader.

The specific steps for setting up a follower can vary significantly depending on the database system, with some systems automating the process while others require manual administrator intervention.

Handling Node Outages

Node failures in a distributed database system are inevitable, whether due to faults or planned maintenance. The key objective is to maintain high availability and minimize the impact of node outages on the system.

Follower Failure: Catch-Up Recovery

Each follower maintains a log of data changes received from the leader.
If a follower crashes or experiences network interruptions, it can recover by connecting to the leader and requesting the data changes it missed during the downtime.
Once the follower applies these changes, it's considered caught up and can resume receiving real-time data changes.

Leader Failure: Failover

Handling a leader's failure is more complex.
One of the followers must be promoted as the new leader, and clients need to be reconfigured to send writes to the new leader.
Automatic failover processes often involve detecting leader failures (typically through timeouts), choosing a new leader (through elections or controller nodes), and reconfiguring the system.
Failover is not without challenges, including potential data loss, coordination issues, and the possibility of split-brain scenarios.

These challenges are fundamental in distributed systems and require careful consideration in database design and operations. While some systems support automatic failover, manual failover is preferred by some operations teams due to its predictability and control.