Today’s performance demands on web applications are a far cry from what they were a decade ago. The more the web grows, the more challenging it becomes to consistently deliver the level of service that users expect and demand. This constant race often leaves us chasing those magical 9’s in our application uptime percentage.
High Availability (HA) is one of those fields of technology that are easily taken for granted. Like backups, it’s the kind of subject we all know we should take a closer look at; but… somehow, we never quite seem to find the time for it. Sadly, we soon learn that outages are inevitable. Disks and power supplies die messy deaths, kernels panic, and software freezes. The question is: “How can we cope with these situations gracefully?”
How We Used to Do Things
While High Availability solutions are not new to MySQL, the ones used in the past have typically not been well suited to the job. Though they are solid technologies in their own right, each had major drawbacks when used for HA.
For example, let’s start by looking at one of MySQL’s features that is both loved and hated by many: replication. Giving credit where credit is due, its basic premise and ease of setup turned MySQL into the trusted building block for scale-needy web applications it is today. The basic idea? Expand your read capacity to infinity by adding as many slaves to your master database as you need! On top of that, every machine holds a complete copy of your entire dataset so, in case of a failure, any of them should be able to pick up where the master left off.
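Applications typically take advantage of this by sending writes to the master and spreading reads across the slaves. The sketch below is a minimal, hypothetical router illustrating that split; the class name `ReplicatedPool` and the host strings are assumptions for illustration, not part of MySQL or any driver, and it deliberately uses a naive rule (plain `SELECT` goes to a slave, everything else goes to the master).

```python
import itertools


class ReplicatedPool:
    """Hypothetical read/write splitter: writes go to the master,
    plain reads rotate round-robin across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        # Round-robin iterator over slaves; None if there are no slaves.
        self._slave_cycle = itertools.cycle(self.slaves) if self.slaves else None

    def route(self, sql):
        words = sql.split()
        normalized = " ".join(words).upper()
        # Locking reads (SELECT ... FOR UPDATE) must see the master's data.
        is_plain_read = (
            bool(words)
            and words[0].upper() == "SELECT"
            and "FOR UPDATE" not in normalized
        )
        if is_plain_read and self._slave_cycle is not None:
            return next(self._slave_cycle)
        # INSERT/UPDATE/DELETE/DDL and anything unrecognized go to the master.
        return self.master


pool = ReplicatedPool("db-master", ["db-slave1", "db-slave2"])
print(pool.route("SELECT * FROM users"))          # a slave
print(pool.route("INSERT INTO users VALUES (1)")) # the master
```

Adding read capacity then amounts to appending another slave to the list. Note that any read routed this way may return stale data, which is exactly the weakness discussed next.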
Sounds perfect, right? Except when it isn’t: though the basics seem deceptively simple, in practice the system has flaws that make it less than ideal for a solid HA setup. To preserve the exact order of statements, writes that may have run concurrently on the master are executed on the slaves serially. This mismatch introduces a problem: until all changes finish applying on the slave, the systems are in an inconsistent state. Because the master does not wait for its slaves to catch up before committing, standard MySQL replication is referred to as asynchronous. These inconsistencies can sometimes grow unexpectedly large as the slave “lags behind” in applying the necessary changes. If a failover is necessary at that point, data could be lost. On top of that, there are a variety of ways a slave can silently fall out of sync with its master, at which point replication can simply “break” and refuse to resume until the differences are manually resolved. Though many great tools have been created to keep a handle on this, asynchronous replication is hard to rely on if consistency is what we’re really seeking.
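To make the failover risk concrete, here is a toy in-memory model of asynchronous replication, assuming a master that commits and appends to an ordered log without waiting, and a slave that replays that log one event at a time. All class and function names are hypothetical. As a simplification, the model treats “received by the slave” and “applied by the slave” as the same step; real MySQL first copies events into a relay log before applying them, so only events that never reached the slave are truly unrecoverable.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    binlog: list = field(default_factory=list)  # ordered log of committed writes
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Commit locally and return immediately: no slave acknowledgement.
        self.data[key] = value
        self.binlog.append((key, value))


@dataclass
class Slave:
    applied: int = 0  # how many binlog events have been replayed so far
    data: dict = field(default_factory=dict)

    def apply_next(self, binlog, n=1):
        # Replay up to n events serially, strictly in binlog order.
        for key, value in binlog[self.applied:self.applied + n]:
            self.data[key] = value
            self.applied += 1

    def lag(self, binlog):
        # Number of committed events the slave has not yet applied.
        return len(binlog) - self.applied


def failover_loss(master, slave):
    """Writes the master acknowledged that a promoted slave would be missing."""
    return master.binlog[slave.applied:]


m, s = Master(), Slave()
for i in range(5):
    m.write(f"k{i}", i)
s.apply_next(m.binlog, n=3)  # the slave falls behind
print(s.lag(m.binlog))       # events still outstanding
print(failover_loss(m, s))   # what a failover right now would lose
```

Once the slave catches up, the lag drops to zero and a failover would lose nothing; the danger lies entirely in the window between commit and replay.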
THIS IS A PREVIEW. DOWNLOAD ISSUE 9 TO READ THE FULL ARTICLE