Friday, April 27, 2012

SchoonerSQL Automates Failure Handling & Failover


Failures in a database environment range from the loss of the network, to the loss of an instance, all the way to the loss of a node (the server hardware). A robust database is one that can detect such failures and automate the failover and recovery process without any user intervention.
SchoonerSQL does exactly that: it detects failures and provides an immediate and automated failover process.
The failover scenarios below show how SchoonerSQL handles each type of failure.

Instance Failure
Consider three instances in a synchronous replication group where Node 1 has the master instance, and Node 2 and Node 3 have slave instances.
The master has one write virtual IP (10.1.1.2) and one read virtual IP (10.1.1.3); slave instances have read virtual IPs (10.1.1.4, 10.1.1.5) as shown in the diagram below.

When a master instance fails due to a crash or a shutdown, the failure handler will immediately detect it and migrate all the virtual IPs from the failed instance to Node 2 & Node 3. The entire process is generally quick with an average failover time of 3 to 5 seconds.
Clients automatically connect to the surviving node and continue running the queries. The entire operation is transparent to the end-user.
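The virtual IP migration in this scenario can be sketched in a few lines of Python. This is only an illustrative model, not SchoonerSQL's actual failure handler: the `Instance` class, the `fail_over` function, the rule that the first surviving slave is promoted to master, and the rule that read virtual IPs go to the least-loaded survivor are all assumptions made for the example. Only the node names and IP addresses come from the scenario above.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    node: str
    role: str  # "master" or "slave"
    vips: dict = field(default_factory=dict)  # virtual IP -> "write" or "read"
    alive: bool = True


def fail_over(instances, failed_node):
    """Move every virtual IP off the failed instance onto the survivors."""
    failed = next(i for i in instances if i.node == failed_node)
    survivors = [i for i in instances if i.alive and i is not failed]
    if not survivors:
        raise RuntimeError("no surviving instance can take over the virtual IPs")

    failed.alive = False
    orphaned, failed.vips = failed.vips, {}

    if failed.role == "master":
        # Hypothetical promotion rule: the first surviving slave becomes master.
        failed.role = "slave"
        survivors[0].role = "master"
    new_master = next(i for i in survivors if i.role == "master")

    for vip, kind in orphaned.items():
        if kind == "write":
            # The write virtual IP must follow the master role.
            target = new_master
        else:
            # Hypothetical balancing rule: least-loaded survivor takes the read IP.
            target = min(survivors, key=lambda i: len(i.vips))
        target.vips[vip] = kind
    return survivors


# The three-instance group from the scenario above.
group = [
    Instance("Node 1", "master", {"10.1.1.2": "write", "10.1.1.3": "read"}),
    Instance("Node 2", "slave", {"10.1.1.4": "read"}),
    Instance("Node 3", "slave", {"10.1.1.5": "read"}),
]

fail_over(group, "Node 1")
for inst in group:
    print(inst.node, inst.role, sorted(inst.vips))
```

Because the addresses themselves never change, a client that was connected to 10.1.1.2 simply reconnects to the same IP, which now points at a surviving node.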
SchoonerSQL also possesses a self-healing capability: when an instance crashes, it is restarted automatically. The instance rejoins the cluster once it has caught up and is in sync, and virtual IPs are redistributed to it automatically when it joins the group.

Node Failure
The failure handler detects when the entire node crashes or shuts down. Virtual IPs will then be migrated from the failed node to the instances on the remaining live nodes in the cluster.

Replication Network Failure
Consider two instances, one acting as master with write and read virtual IPs and the other acting as slave with a read virtual IP.
Now suppose the replication network fails, causing a network partition. The failure handler immediately detects this and shuts down the slave. The slave's virtual IP is then migrated to the master, as shown in the diagram.
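The handler's reaction to a partition can be modeled roughly as follows. This is a sketch under stated assumptions, not SchoonerSQL code: the function name `handle_replication_partition`, the dictionary layout, and the IP addresses are hypothetical and chosen only for illustration. One plausible reading of the design is that shutting down the slave keeps clients from reading data that can no longer be kept in sync with the master.

```python
def handle_replication_partition(master, slave, link_up):
    """Apply the two-instance partition rule described above.

    master and slave are plain dicts of the form
    {"running": bool, "vips": [list of virtual IPs]}.
    """
    if link_up:
        # Replication network is healthy: leave the layout untouched.
        return master, slave

    # Partition detected: stop the slave, which can no longer replicate.
    slave["running"] = False

    # Move the slave's read virtual IP(s) to the master so that
    # clients using those addresses are still served.
    master["vips"].extend(slave["vips"])
    slave["vips"] = []
    return master, slave


# Illustrative addresses only; the scenario does not specify them.
master = {"running": True, "vips": ["10.1.1.2", "10.1.1.3"]}
slave = {"running": True, "vips": ["10.1.1.4"]}

handle_replication_partition(master, slave, link_up=False)
print("master:", master)
print("slave:", slave)
```

After the call, the master holds all three virtual IPs and the slave is stopped with none.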
To conclude, SchoonerSQL not only detects failures but also automates the failover process, giving users a seamless and transparent experience.