Achieving High Availability using Polyhedra
How Polyhedra’s High-Availability mechanisms add resilience to your system and allow upgrades with zero downtime.

The Polyhedra IMDB and Polyhedra Flash DBMS database systems both come with an inbuilt mechanism for setting up a hot standby configuration, with control over fail-over. This article describes how this mechanism operates, and discusses how it can be used both to provide high availability and as a means of performing field upgrades with zero downtime. For ease of printing and offline viewing, a copy of this article is appended below in PDF format. (Note: the HA mechanism is not provided in all editions of Polyhedra: see the product comparison chart to see which features are available in the various editions.)

High Availability?
High Availability and Fault Tolerance are phrases that are often used rather loosely, especially when it comes to software. In practice, it is not software that is fault tolerant or highly available, it is complete systems - but there are certain features or functions that the software components need to supply to make it possible to architect an HA solution. Embedded systems requiring High Availability - say, ‘five 9s’ (99.999%) or better for continuous running - are typically configured with redundant cards, power supplies, etc: rather than build systems to be fault-free, it is better to design them to survive the failure of individual subcomponents. This can also give flexibility, as it can allow components to be swapped out and upgraded without upsetting the running system. Live upgradeability is not needed in some HA systems (for example, systems on board aircraft, which will be turned off when the flight is over or the aircraft is serviced), but it is often crucial in more complex systems that need to operate continuously for years on end.
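To put the ‘five 9s’ figure in perspective, the following is a minimal sketch of the arithmetic behind an availability target and behind the benefit of redundancy. The function names are illustrative only, and the redundancy calculation assumes two units that fail independently with an instantaneous, perfect fail-over, which real systems only approximate.

```python
# Illustrative arithmetic only; names and assumptions are not from Polyhedra.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Downtime budget for a given availability target over a period."""
    return period_seconds * (1.0 - availability_percent / 100.0)


def redundant_pair_availability(single_unit_availability):
    """Availability (as a fraction) of two units where either one suffices.

    Assumes independent failures and perfect fail-over: the pair is only
    unavailable when both units are down at the same time.
    """
    unavailability = 1.0 - single_unit_availability
    return 1.0 - unavailability ** 2


if __name__ == "__main__":
    budget = allowed_downtime_seconds(99.999)
    print(f"five 9s allows about {budget / 60:.2f} minutes of downtime per year")
    print(f"two 99% units in parallel: {redundant_pair_availability(0.99):.4%}")
```

Under these assumptions, a ‘five 9s’ target leaves a budget of a little over five minutes of downtime per year, which is why planned upgrades cannot simply be treated as scheduled outages.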
In telecoms infrastructure equipment, for example, a field upgrade should not disrupt calls in progress; in industrial process control systems, stopping a production line for an upgrade might cost tens or hundreds of thousands of dollars in lost production and might raise safety issues.

Looking at the data handling needs of HA embedded systems, they often have the following characteristics:
For transactional data stores, reference is often made to the ‘ACID’ conditions - a mnemonic for Atomicity, Consistency, Isolation and Durability. Atomicity means each transaction either fails (leaving the data in the pre-transactional state) or fully commits - no halfway houses. Consistency says that each transaction leaves the data store in a clean state, with all integrity conditions preserved (and if a transaction were to break this, it would be aborted, leaving the data in its pre-transactional state). Isolation says that transactions are independent, giving the appearance of complete serialisation - where each transaction is completed before the next one is started (though judicious use of locking by the system can avoid the need for the implementation to be quite so restrictive). Durability says that if the data store says the transaction is complete, then the data is already in a ‘safe’ state and will survive a subsequent system failure (within the capabilities of the underlying hardware and environment). In practice, the Durability requirement can significantly slow down a system (flushing files to disk can take some time, for example), so most data stores allow the level of durability to be tuned, balancing it against overall system responsiveness when operating normally. All true relational database systems are transactional, and offer a degree of ACID compliance; most are SQL-based and support on-the-fly schema changes, but they may be too slow for use in embedded systems or may lack suitable HA characteristics.

Polyhedra and HA
The Polyhedra family of relational database systems is designed for use in embedded systems, where state and configuration information has to be kept readily accessible, rapidly alterable… and safe. To ensure fast access, the Polyhedra32 IMDB and Polyhedra64 IMDB products keep the data in RAM, backed up by a variety of configurable mechanisms such as snapshots, transaction journals, and even a hot standby where appropriate.
Polyhedra Flash DBMS is somewhat different, as it is designed for systems where ultimate performance is less important than reducing the RAM footprint: it shares much of the same code-base as the other products, but uses a file to store the data (supplemented by a configurable RAM cache), with a technique known as shadow paging rather than journal logging to ensure resilience. Like the Polyhedra IMDB products, though, Polyhedra Flash DBMS also allows snapshots to be generated for offline backup, and supports the use of a hot standby server for high availability. Looking in particular at the fault tolerance mechanisms in both Polyhedra IMDB and Polyhedra Flash DBMS, they include the following features:
Field upgrades
In continuously-running systems, there is often the need to change the software or data structures on the fly, with no downtime. A simple case in point would be the addition of a new type of line card to a telecoms rack in a basestation, say; the new card may be very similar to an existing card, but might need additional configuration information or want to report additional status information. The simplest way of handling this would be for the software on the new line card to check the central database on startup, and if the columns it wants are not present in the tables it uses, it can just create them, using the ‘add’ form of the ‘alter table’ command, for example:

    alter table linecard add ( config2 integer default 49, status2 integer)

(Alternatively, it could just add the columns one at a time, without checking first whether they exist; no harm will be done if they already exist, and the type can be checked when performing queries.) Once these changes have been made, the new line card can then start its application as normal. Provided other clients have not used the ‘select *’ form of query when inspecting the tables, their active queries and prepared statements will not be invalidated by this change, and they will see no interruption in the database service; all that would need to be done would be to update the software on management computers, to allow them to set the new configuration columns and monitor the new status columns. Note that the heterogeneity built into Polyhedra, plus a very high level of inter-version interoperability, means that the new line card does not have to be running on the same operating system or even processor type as the cards running the database server, and does not have to be using the same release version of the Polyhedra software.

More complex changes come in two categories: changes to the application software on the control cards or line cards, or changes to the underlying software used by the application software.
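Before turning to those, the start-up check for the simple line-card case can be sketched as follows. This is an illustration of the pattern only: it uses Python's built-in SQLite module as a stand-in, not a Polyhedra client library, so the catalogue lookup (`PRAGMA table_info`) and the one-column-per-statement `ADD COLUMN` syntax are SQLite's, and the helper names `existing_columns` and `ensure_columns` are hypothetical. A Polyhedra client would issue the ‘alter table ... add’ statement shown earlier through its own API.

```python
import sqlite3

# Hypothetical columns wanted by the new line card: name -> SQL definition.
NEW_COLUMNS = {
    "config2": "INTEGER DEFAULT 49",
    "status2": "INTEGER",
}


def existing_columns(conn, table):
    """Return the set of column names currently defined on `table`."""
    # PRAGMA table_info is SQLite-specific; field 1 of each row is the name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def ensure_columns(conn, table, columns):
    """Add any columns that are missing; return the names actually added."""
    present = existing_columns(conn, table)
    added = []
    for name, definition in columns.items():
        if name not in present:
            # SQLite adds one column per statement, unlike the grouped form above.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {definition}")
            added.append(name)
    conn.commit()
    return added


def demo():
    """Simulate two line cards of the new type starting up in turn."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE linecard (id INTEGER PRIMARY KEY, status INTEGER)")
    conn.execute("INSERT INTO linecard (id, status) VALUES (1, 0)")
    first = ensure_columns(conn, "linecard", NEW_COLUMNS)   # adds both columns
    second = ensure_columns(conn, "linecard", NEW_COLUMNS)  # finds them present
    row = conn.execute(
        "SELECT config2, status2 FROM linecard WHERE id = 1"
    ).fetchone()
    conn.close()
    return first, second, row


if __name__ == "__main__":
    print(demo())
```

The second call adds nothing, so any number of cards of the new type can run the same start-up code safely, and the row that existed before the change picks up the default value for `config2`.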
Let us consider the two categories of more complex change separately, starting with the easier case.

Upgrading the Polyhedra database software
From time to time, it may be necessary to upgrade the Polyhedra code on a system to a later version. The two usual reasons for doing this are that a new version of the application software wants to take advantage of a Polyhedra feature that was not present in the currently-used version, or that the new version incorporates a fix for a bug that was adversely affecting the application. In both cases, such upgrades are simplified by Polyhedra’s adoption of a set of compatibility principles regarding version interoperability. These are covered in more detail in a separate document, but in summary:
If it is necessary to upgrade the client software on the line cards, this can be done either before or after upgrading the server code, depending on which is more convenient. In many cases, though, the client software will only need upgrading in the rare event of the application being affected by a bug in the Polyhedra client libraries; there is no need for all the components of the system to be using the same version of Polyhedra. In fact, the client-server protocol is common to all members of the Polyhedra DBMS family, so they share the library code: thus, there would be no need to upgrade the code on the line cards if you were upgrading the database on the control cards from Polyhedra32 to Polyhedra64, say.

Upgrading the application code
If the new application code needs a new version of Polyhedra, it may be simplest to upgrade Polyhedra first, as a separate stage, using the procedure outlined above. Once the system is up and running the right version of Polyhedra on both the master and the standby, the database schema can be updated; the changes will automatically be applied on both the master and the standby. Polyhedra allows schema changes to be grouped together into a single transaction, by use of the ‘alter schema’ command; the changes are checked for correctness before execution, and if there are any problems (such as an incompatibility of column names, say, or lack of room for temporary structures used when transforming the database) the database will revert to its earlier state.

Summary
Both the Polyhedra IMDB and the Polyhedra Flash DBMS servers provide a read-only replication mechanism that can be deployed in a variety of configurations, ranging from a very simple set-up to a complex fault-tolerant fan-out configuration that can survive single-point failures if used on a resilient network.
This flexibility does of course mean that some thought might need to be given to find the optimal configuration, but in general where there is a heavy query load (or where some client applications will be launching particularly complex queries) it is straightforward to deploy one or more replicas to improve the overall responsiveness of a Polyhedra-based system.