Polyhedra on multicore systems

"How applications using Enea’s Polyhedra Database products can benefit from the extra processing power available on multicore systems"

This technical white paper gives some background information about the need for multicore in embedded systems. It then describes the client-server and multithreaded nature of Polyhedra, and discusses the advantages this can give when used on a multicore system.

Some Background Information

The drive to multicore

Moore’s law and its limitations

Throughout the history of computing, things have got simultaneously bigger and smaller. Bigger, in the sense of more powerful, larger capacity and faster; smaller, in terms of physical dimensions. There have also been a number of step-changes in the technology – for example, in the move from valves to transistors and then to integrated circuits. Even as far back as 1965 a pattern was noticed for the growth in power, with Moore’s Law predicting an annual twofold growth in the complexity of integrated circuits. (Once he had a few more points on the graph, Moore revised his figure, measured by the number of transistors on a single chip, to a doubling in ‘only’ two years.) This growth in complexity has meant increasingly powerful computer processors could be built, running at higher rates as the distance between logic gates reduces, and has enabled the production of really powerful embedded systems. Thus, digital mobile phones that in the early 90s only handled voice and SMS are now capable of video conferencing, web browsing, downloading and playing videos and music, always-on email and messaging, as well as less computationally-heavy tasks such as managing calendars and contacts.

Moore’s law still seems to be holding true, but does not apply to all fields of computing. Looking at hard disks, for example, capacity continues to grow – but aspects such as access time show only slow improvement, as there are practical limits on how much you can continue to increase the rate of rotation. And while processor speed has increased tremendously over the years, this has also led to an increase in power consumption by the CPU. This in turn gives rise to problems of heat dissipation, and the need for active cooling means the system uses yet more power (and often generates more noise). Intel’s Pentium 4, for example, used extreme pipelining to achieve clock speeds as high as 3.8GHz – but needed 115 watts when busy!

Internal redesign of the CPU – or, more drastically, completely turning from complex processor architectures to more efficient architectures such as RISC – can stave things off for a while, but does not solve the underlying problem. Also, there is already a severe mismatch between the processor speed and the speed at which memory can be accessed, only partly obviated by the use of large caches on the same die – or at least in the same chip package – as the processor; this mismatch often means that the processor is simply waiting for the RAM to catch up with it. Clearly, as embedded devices get more complex, they need ever more computing power – but equally clearly, ever-faster processor speeds are not the way to go.

Increase CPU cycles/sec without increasing CPU speed?

Fortunately, one approach is fairly obvious: divide and conquer. Instead of trying to shoehorn all the work onto a single processor, get more processors involved in the process. The main reason why this works is quite simple: power consumption dramatically increases with clock speed. For example, the 3.4GHz Pentium 4 HT650 (part of the same ‘Prescott 2M’ processor family as the 3.8GHz HT670) had a TDP of 84 watts, so the HT670 gained a 12% improvement in clock speed at a cost of a 37% increase in power consumption. Thus, replacing a single high-speed processor by two or more processors running at slower speed can offer at least the same number of CPU cycles, and use less power. If the processors are actually multiple CPU cores on the same chip package, then not only is the external support circuitry (and thus the job of board designers) simplified, but it is also possible to share the cache – or, if each core has its own cache, cache coherency becomes easier to address.

Chip manufacturers are therefore abandoning the drive for faster processors, and instead are addressing the need for more horsepower by providing multicore solutions. However, while hardware engineers are happy with this, things are more difficult for software developers as they have to work out how to make use of the extra cores! At the operating system level, a variety of architectures (including SMP, BMP and AMP) are available for distributing the load amongst the cores, but all require the overall application to be divided into a number of intercommunicating tasks that provide and consume services, and then for the larger of these tasks to be subdivided into cooperating subtasks or threads that have the potential for being run on separate cores.

Multicore for embedded systems

The drive to multicore is particularly strong in embedded systems, as the pressure for continuous innovation (requiring ever more sophisticated software) comes up against the overriding need to keep down both the bill of materials and the power consumption. The typical architecture for multicore embedded systems has used AMP at the hardware level (for example, a DSP core could handle low-level communications, a GPU the screen, and one or more general-purpose cores everything else) or just at the software level (so all the cores are the same, but each core has a fixed job, controls its own resources, and could be running a different operating system or operating ‘bare metal’). As the number of cores increases, though, it becomes easier to manage if the same operating system is running throughout, or on a ‘pool’ of cores, in a bounded or fully-symmetric software architecture.

The Polyhedra family of DBMS products

Polyhedra is a family of relational database management systems designed from the start for use in embedded applications. It has two main flavours: Polyhedra FlashLite and Polyhedra IMDB, with the latter available both for 32-bit platforms (Polyhedra32, the ‘original’ version) and 64-bit platforms (Polyhedra64). All the products support fault-tolerant configurations to ensure data availability. The main difference between the products is that Polyhedra IMDB is an in-memory database system built for speed, with a journalling mechanism to ensure data persistence, whereas Polyhedra FlashLite trades performance against RAM footprint by using a tuneable in-memory cache in front of a file-based database (assumed to be on a flash-based file system). All Polyhedra products support an extended SQL-92 subset, and the ODBC and JDBC client libraries conform to the international standards. Polyhedra also has object-oriented features, such as table inheritance (which simplifies and speeds up many queries and updates) and behaviour (to perform knock-on actions and additional integrity checks).

Polyhedra was designed from the start for client-server use. When all the software is running on the same machine, the client-server architecture has two main benefits:
  • The software naturally allows for multiple clients, which could be running the same application software (on behalf of different users, say) or performing different parts of the overall application.
  • Where supported by the operating system and hardware platform, the client application(s) and the server will run in separate address spaces, protected from each other - so a failure of an application will not damage the integrity of the database, nor stop other parts of the application running.
The client-server architecture naturally extends to allowing client and server to be on separate machines, so that the application can be distributed. On an SMP multicore or multiprocessor machine, the client-server architecture allows clients to run on separate cores from each other (and from the server), and thus allows the overall application to make more use of the available CPU cycles.
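
As a concrete illustration of this client-server usage, the minimal sketch below connects to a Polyhedra server through the standard ODBC interface (which, as noted above, the Polyhedra client libraries conform to) and runs a simple query. The data source name ‘8001’ and the table and column names are placeholders invented for the example, not taken from this paper.

    /* A minimal sketch of a client using the standard ODBC API; the data
       source name "8001" and the table 'currency' are placeholders. */
    #include <stdio.h>
    #include <sql.h>
    #include <sqlext.h>

    int main(void)
    {
        SQLHENV env;
        SQLHDBC dbc;
        SQLHSTMT stmt;
        SQLCHAR country[64];
        SQLLEN len;

        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
        SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

        /* connect; the 'service name' is passed as the data source name */
        if (SQL_SUCCEEDED(SQLConnect(dbc, (SQLCHAR *)"8001", SQL_NTS,
                                     NULL, 0, NULL, 0)))
        {
            SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
            SQLExecDirect(stmt, (SQLCHAR *)"select country from currency", SQL_NTS);
            while (SQL_SUCCEEDED(SQLFetch(stmt)))
            {
                SQLGetData(stmt, 1, SQL_C_CHAR, country, sizeof country, &len);
                printf("%s\n", (char *)country);
            }
            SQLFreeHandle(SQL_HANDLE_STMT, stmt);
            SQLDisconnect(dbc);
        }
        SQLFreeHandle(SQL_HANDLE_DBC, dbc);
        SQLFreeHandle(SQL_HANDLE_ENV, env);
        return 0;
    }

On a multicore host several such client processes can run concurrently, each on its own core and each with its own connection into the server.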

Polyhedra servers can operate stand-alone, with each instance handling a separate database, or can operate in master-standby mode, where one server is acting as a hot standby of another server and has a read-only copy of the database. Polyhedra is fully transactional, and satisfies the Atomic, Consistent and Isolated properties needed for ACID compliance (see sidebar). Polyhedra FlashLite transactions are Durable, but in Polyhedra IMDB durability can be balanced against performance: critical changes to the data are preserved by streaming journal records to a log file, and client applications can choose whether the success of a transaction is to be reported immediately or when the log file has been flushed.

Polyhedra has a special feature called ‘active queries’ that allows clients to keep up to date without polling. Basically, for the lifetime of the query the server remembers enough about the query (and the previously-transmitted result set) to know when it has become out of date – so when a transaction that affects the result set completes, the server can send a ‘delta’ to the client that launched the query to bring it up to date. There is a ‘delta merging’ mechanism to avoid problems if the client is slow.

Polyhedra runs on a number of platforms, and when a client is running on a different OS or a different architecture to the server the Polyhedra libraries automatically handle issues such as differing endianisms. On most platforms the server is supplied as a single executable and uses Posix threads to avoid bottlenecks and provide a fairer service to the various clients; this paper assumes the Polyhedra server(s) and each client application are being run on such a platform.

Parallelism in Polyhedra

This section concentrates on providing a factual description of how Polyhedra offers a degree of parallelism; it is left to later sections to summarise the benefits that result.

Polyhedra Transactions

Before discussing the use of threads and processes in Polyhedra to improve performance and responsiveness, it is first necessary to take a brief look at the transactional model supported by Polyhedra. In the interests of speed of operation and code simplicity, the model is very simple: the types of transaction are limited to:
  • an SQL query (a ‘select’ statement);
  • an SQL DDL statement (create table, etc);
  • one or more SQL DML statements, separated by semicolons; or,
  • one or more SQL DML statements, separated by semicolons, and followed by a semicolon and a single SQL SELECT statement.

The first form just reads the database, whereas the others are all ‘write’ transactions, as they might alter the structure or contents of the database. The fourth form is essentially just a way of doing a series of updates (as in the third form) and then immediately querying the database before anything else has a chance to alter it; there is also a performance advantage compared to doing it as two transactions, as there is only one ‘round trip’ of messages between the client and server.
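
For illustration, the sketch below sends a transaction of the fourth form as a single request, using an already-connected ODBC statement handle as in the earlier example; the table and column names are again invented, and it is assumed that the rows produced by the trailing select are fetched in the normal ODBC way.

    #include <stdio.h>
    #include <sql.h>
    #include <sqlext.h>

    /* Sketch: 'one or more DML statements, then a single SELECT', sent to the
       server as one transaction; 'stmt' is an allocated ODBC statement handle. */
    static void revalue_and_report(SQLHSTMT stmt)
    {
        SQLCHAR code[8];
        double  rate;
        SQLLEN  len1, len2;

        SQLExecDirect(stmt,
            (SQLCHAR *)"update currency set usdollar = usdollar * 1.01"
                       " where code = 'EUR';"
                       " select code, usdollar from currency",
            SQL_NTS);

        /* the result set of the trailing select reflects the state of the
           database immediately after the update, before any other client's
           transaction has had a chance to run */
        while (SQL_SUCCEEDED(SQLFetch(stmt)))
        {
            SQLGetData(stmt, 1, SQL_C_CHAR,   code,  sizeof code, &len1);
            SQLGetData(stmt, 2, SQL_C_DOUBLE, &rate, 0,           &len2);
            printf("%s %f\n", (char *)code, rate);
        }
        SQLCloseCursor(stmt);
    }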

(There is also a fifth form of transaction, ‘updates via active queries’, that will be described later, but for the moment it can be ignored; it merely adds a convenient way of requesting certain types of updates and a form of optimistic concurrency control, without affecting the basics of the transactional model supported by Polyhedra. In addition, Polyhedra 8.7 introduced both optimistic and pessimistic locks, but the implementation of this new feature carefully avoided breaking the fundamental transactional model of Polyhedra.)

Each of the above transactions consists of a request from a client to the server, which puts it on a queue. When the request reaches the top of the queue it is processed, and the response is generated and queued for delivery to the client – after which, the server can move on to the next item in the queue. Thus, as far as clients are concerned, everything is fully serialised, fully atomic and fully isolated: only one transaction is running at a time, and in effect it has a complete lock on the whole database.

Threading for client connections

As mentioned earlier, Polyhedra is fully client-server. This means that, even when all the software is running on the same machine, as long as the operating system imposes the appropriate protection between separate processes, an application that wants to use the database can only use the official interfaces, and cannot (deliberately or accidentally) access or modify the data by other routes. As a result, the data is less vulnerable, and the overall integrity and resilience of the system is enhanced.

Clients connect to the server using a proprietary protocol over TCP/IP or a platform-specific transport mechanism; the service name supplied as a parameter to the connection request specifies the transport to use. The Polyhedra client libraries handle all details of the protocol. By default, each incoming connection has its own thread in the server, so that assembly/disassembly of messages for one client can be done at the same time as another ‘client thread’ in the server is processing a transaction (on behalf of another client).

Threading for transaction handling

The transaction model has been described as fully serialised, with each transaction effectively having a lock on the whole database. However, this model can be enforced more fairly and (depending on the hardware) more efficiently if internally the database server uses multiple threads so that operations on behalf of separate clients are done in parallel.

The mechanism chosen to achieve this in Polyhedra was to allow multiple ‘read’ transactions (queries) to run at once, but at most one ‘write’ transaction at a time; read transactions will set up shared locks on the tables they need to access, whereas the write transaction will accumulate exclusive locks on the tables it touches. As there is only one thread claiming exclusive locks, there is no risk of deadlocks; also, while the write transaction may have to wait for a number of read transactions to complete if they are accessing a table it needs, this is no worse than it would be if everything were fully serialised. On a multicore machine with an SMP-capable operating system, each thread could be running in a separate core – so there are improvements in overall response time if separate clients are querying the same table, or a client is updating one table while others are querying separate tables.
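
The pattern just described (many concurrent readers, at most one writer, with table-level granularity) can be pictured using ordinary POSIX read-write locks, as in the sketch below. This is an analogy to aid understanding, not Polyhedra’s internal implementation.

    #include <pthread.h>

    /* Analogy only: a single-writer/multiple-reader scheme expressed with a
       POSIX read-write lock guarding one notional table. */
    static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;

    static void run_read_transaction(void)
    {
        pthread_rwlock_rdlock(&table_lock);   /* shared lock: many readers at once */
        /* ... evaluate a query against the table ... */
        pthread_rwlock_unlock(&table_lock);
    }

    static void run_write_transaction(void)
    {
        pthread_rwlock_wrlock(&table_lock);   /* exclusive lock: waits for readers */
        /* ... apply changes to the table ... */
        pthread_rwlock_unlock(&table_lock);
    }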

Polyhedra can be configured to use its own heap management algorithm, which uses a memory pooling scheme and a pre-allocation strategy designed to reduce heap fragmentation; as a side-benefit, it reduces the number of times system functions have to be called to claim more memory, which tends to speed things up. Polyhedra 7.0 introduced a mechanism whereby different threads in the database server can have their own separate memory pools; while this does mean that more space might be used overall, there is a performance benefit as there is no need to use locks to control access by separate threads to a central pool.

(New versions of Polyhedra IMDB can be run in a special mode whereby it is possible for two or more write transactions to be performed in parallel, provided they are altering different tables. There are some other, pretty stringent, conditions which make this mode inappropriate for many users; for example, it is not allowed in fault-tolerant configurations of Polyhedra. However, where suitable, on a multicore machine it can give performance improvements comparable to running multiple independent servers on the same machine.)

Threading for durability and fault tolerance

Polyhedra Flash DBMS stores the master copy of the database in a file (assumed to be on flash-based media, hence the name of the product) with an in-memory cache to speed up access; by the time a write transaction is complete, all changes will have been written back to the file so that they are ‘safe’. Polyhedra IMDB, on the other hand, keeps the master copy of the data in memory, and the primary back-up is the ‘snapshot’; this is read in when the server starts, and written out when requested (via use of our SQL ‘SAVE’ statement) and when the service is shut down.
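
A snapshot can thus be requested programmatically; the fragment below issues the SQL ‘SAVE’ statement mentioned above through a standard ODBC statement handle (the exact statement syntax, and any optional arguments such as a target file name, should be checked against the product documentation).

    #include <sql.h>
    #include <sqlext.h>

    /* Sketch: ask the server to write a fresh snapshot by executing the
       SQL 'save' statement; 'stmt' is an allocated ODBC statement handle. */
    static void request_snapshot(SQLHSTMT stmt)
    {
        SQLExecDirect(stmt, (SQLCHAR *)"save", SQL_NTS);
    }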

The Polyhedra IMDB snapshot mechanism is supplemented by a transaction log, which is appended to the snapshot file and allows dynamic changes to the database to be preserved without the need for a full snapshot. On start-up, once the snapshot at the front of the saved database file has been loaded, the transaction log is read, and each entry (up to, but not including, the first incomplete or corrupt log entry) is applied to the database in turn, to recapture as much as possible of the state prior to the crash/shutdown of the system. Once the server is up and running, after each successful transaction the log entry is written asynchronously to the end of the database file, and the file flushed. The client that initiates a transaction can opt to be told about its success when it has been fully applied to the in-memory database (and thus visible to other clients), or when it has been fully journalled (and thus ‘durable’); it does this by using a ‘safe commit’ option.

Writing the journal record is the responsibility of a dedicated ‘journal thread’ within the server. If a client has used the safe commit option when it set off a write transaction, the client thread handling the transaction will not send any acknowledgement at the end of a successful transaction – instead, once the transaction log has been written to disk the ‘journal thread’ will use a client thread from the pool to tell the client process (and thus, the application code in the client) that its transaction is complete.

Both Polyhedra IMDB and Polyhedra FlashLite support fault-tolerant configurations, using the master-hot standby model. In this, two servers operate together to provide continuity of service in the event of a single point of failure; one holds the live copy of the data, which clients can update, whereas the standby holds a backup copy. Whenever a transaction alters data on the master, a copy of the changes is passed across to the standby, so it can take over at a moment’s notice. The two servers operate under the control of a user-written ‘arbitration service’ whose role is to ensure that the standby is promoted to master as soon as the old master is known to be dead, and to ensure both servers never simultaneously think they are master.

Polyhedra uses a single thread in each server for handling signals to and from the arbitrator as well as messages to the partner server. In Polyhedra IMDB, this also acts as the journal thread, writing transaction logs to the end of the database file. As a result, if a transaction request has the ‘safe commit’ option set and the standby is running, then the successful transaction will not be acknowledged until it has been journalled on both servers and applied on the standby.

The client-side view of fault tolerance

The Polyhedra client libraries can be told (by the application program) that a connection should be fault-tolerant. The library code will then monitor the connection using heartbeats, so that in the event of a failure of the master server the client library will automatically switch to using the newly-promoted master. Once the connection to the database is re-established, the client libraries will re-launch any active queries that had been opened by the user; if the initial result set differs from what had been received from the other server, the client application is informed of the difference. The APIs allow the client application to monitor when switchovers occur, and even to determine which server is currently the target of the connection, but the application code does not have to do so. As far as the server at the other end of the connection is concerned, it does not care – and cannot tell – whether the application had requested a fault-tolerant connection.

It is worth noting that while clients that want to update the database have to be connected to the master database, it is also possible to connect to the standby and to query its copy of the database; this allows a degree of load sharing. The client libraries allow applications to set up multiple simultaneous connections to different servers, so it is possible to have one connection to the master server (for use in updating the database) and a separate connection to the standby (when available). It is even possible to set up the second connection so that it will be to the master server if no standby is running, but will automatically switch to the standby when it becomes available.
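
A hedged sketch of this dual-connection arrangement, using only standard ODBC calls, is shown below; the data source names ‘master_service’ and ‘standby_service’ are placeholders, and the way a service name is mapped onto the master or the standby is configuration-specific.

    #include <sql.h>
    #include <sqlext.h>

    /* Sketch: one ODBC environment, two simultaneous connections; the data
       source names are placeholders for whatever selects master and standby. */
    static void open_connections(SQLHENV env, SQLHDBC *update_dbc, SQLHDBC *query_dbc)
    {
        SQLAllocHandle(SQL_HANDLE_DBC, env, update_dbc);
        SQLAllocHandle(SQL_HANDLE_DBC, env, query_dbc);

        /* connection used for write transactions: must reach the master */
        SQLConnect(*update_dbc, (SQLCHAR *)"master_service", SQL_NTS,
                   NULL, 0, NULL, 0);

        /* connection used only for queries: can be pointed at the standby */
        SQLConnect(*query_dbc, (SQLCHAR *)"standby_service", SQL_NTS,
                   NULL, 0, NULL, 0);
    }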

It is also possible to set up a ‘fault tolerant’ connection to a single server, even if that server is running stand-alone rather than as part of a fault-tolerant pair; in such a set-up the Polyhedra client libraries will not close the connection (as far as the application is concerned) if the server is killed, but instead will ‘poll’ the server until it becomes available.

Handling active queries, server side

The initial processing of an active query by Polyhedra is the same as a normal query: when the request is taken off the queue it waits until it can get a non-exclusive lock on the table(s) involved, the result set is calculated and then sent to the client. However, before releasing the locks on the tables, the SQL engine puts some triggers on the database so that the active query can detect when a transaction does something that could affect the result set. When such a change is made, the trigger code (running in the thread doing the transaction) notes what has changed. On successful completion of the transaction, the thread for the client that launched the active query will get told to calculate the ‘delta’ and send it off.

Updating through active queries

As mentioned earlier, a client can set off a transaction that consists of a series of updates through active queries. Essentially, the client application updates a local copy of the result set for a query, and asks the server to alter the database to match the new state of the result set. If the client is operating in auto-commit mode, the change is sent through to the server, and queued for execution. If the client is operating in batch mode, however, the client libraries and the server operate together to offer a form of optimistic concurrency control (a small code sketch follows the list):
  • when the client application makes the first alteration to the local result set, the client library informs the server, which blocks the active query from sending through any deltas; at this stage, though, information about the actual alteration is cached client-side (along with any further requests for alterations).
  • when the client application commits the transaction, the client library sends through a list of changes to the server;
  • the server analyses the changes, and either queues them for execution, or rejects the whole transaction if any of the changes clashes with a change made since the active queries were blocked. If the transaction is rejected (or fails), the client is sent a special delta to roll back the requested changes and reflect any changes made by other clients.
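
The batch-mode behaviour above is specific to updates made through active queries; the sketch below only shows the generic ODBC mechanism for switching a connection out of auto-commit, grouping changes and then committing them as one transaction (the table and column names are invented, and how this maps onto Polyhedra’s active-query updates should be checked against the client documentation).

    #include <sql.h>
    #include <sqlext.h>

    /* Sketch: standard ODBC control of auto-commit versus manual ('batch')
       commit; not the active-query update mechanism itself. */
    static void batch_update(SQLHDBC dbc, SQLHSTMT stmt)
    {
        /* switch the connection into manual-commit mode */
        SQLSetConnectAttr(dbc, SQL_ATTR_AUTOCOMMIT,
                          (SQLPOINTER)SQL_AUTOCOMMIT_OFF, 0);

        SQLExecDirect(stmt, (SQLCHAR *)"update currency set usdollar = 1.10"
                                       " where code = 'EUR'", SQL_NTS);
        SQLExecDirect(stmt, (SQLCHAR *)"update currency set usdollar = 0.0068"
                                       " where code = 'JPY'", SQL_NTS);

        /* commit both changes as one transaction; SQL_ROLLBACK would discard them */
        SQLEndTran(SQL_HANDLE_DBC, dbc, SQL_COMMIT);
    }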

The benefits of multicore

Having shown how the Polyhedra server uses multiple threads, let us look at the practical benefits that this can bring on a multicore system with an SMP operating system. On such a set-up, all threads can be run on any of the available cores, so if there were enough cores any or all of the following Polyhedra-related activities could be happening simultaneously:
  • one or more client applications could each be assembling a request for transmission to the server, or decoding and handling the response;
  • some cores could be running server code, each analysing a request from a client, and queuing it for execution;
  • the server’s journal thread could be writing a transaction log to file;
  • a write transaction could be in progress, using exclusive locks on the tables to stop them being queried;
  • a number of threads in the server could be querying the database (using shared table-level locks to ensure write transactions cannot be updating them at the same time), and preparing the messages to send the results to the relevant clients;
… and of course, other cores could be operating parts of the overall application that are not (currently) using Polyhedra.

In practice, the fact that the Polyhedra server is running on multiple cores will have a variable effect on the overall performance, depending on – among other things – the way the client applications are using Polyhedra. For example, suppose the system designers decide that individual applications should not make use of the Polyhedra APIs directly, and that instead there should be an application-specific ‘data API’. There are two obvious ways this might be implemented. Firstly, the library code for the data API could make use of the Polyhedra APIs to connect to and query the Polyhedra database servers; alternatively, the library could send messages to an application-specific data server, which (invisibly to the rest of the system) uses a Polyhedra server as its durable, resilient data store. In the latter case, there are likely to be fewer client connections into Polyhedra, and little use made of the multi-threaded nature of Polyhedra other than offloading the journal thread.
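
As a rough sketch of the first option, the fragment below hides the standard ODBC calls behind a tiny application-specific ‘data API’; every name in it (data_api, data_api_open, data_api_get_rate, the service name and the table) is hypothetical, invented purely for illustration.

    #include <stdio.h>
    #include <sql.h>
    #include <sqlext.h>

    /* Hypothetical application-specific 'data API' layered over ODBC. */
    typedef struct { SQLHENV env; SQLHDBC dbc; SQLHSTMT stmt; } data_api;

    static int data_api_open(data_api *d, const char *service)
    {
        SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &d->env);
        SQLSetEnvAttr(d->env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
        SQLAllocHandle(SQL_HANDLE_DBC, d->env, &d->dbc);
        if (!SQL_SUCCEEDED(SQLConnect(d->dbc, (SQLCHAR *)service, SQL_NTS,
                                      NULL, 0, NULL, 0)))
            return -1;
        SQLAllocHandle(SQL_HANDLE_STMT, d->dbc, &d->stmt);
        return 0;
    }

    /* Look up one value; a real implementation would use a parameterised
       statement rather than building the SQL text by hand. */
    static int data_api_get_rate(data_api *d, const char *code, double *rate)
    {
        char sql[128];
        SQLLEN len;
        snprintf(sql, sizeof sql,
                 "select usdollar from currency where code = '%s'", code);
        if (!SQL_SUCCEEDED(SQLExecDirect(d->stmt, (SQLCHAR *)sql, SQL_NTS)))
            return -1;
        if (!SQL_SUCCEEDED(SQLFetch(d->stmt)))
            return -1;
        SQLGetData(d->stmt, 1, SQL_C_DOUBLE, rate, 0, &len);
        SQLCloseCursor(d->stmt);
        return 0;
    }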

Of course, in the embedded world not all platforms are SMP. Other options include Asymmetric (AMP), Bounded (BMP) and ‘bare metal’ architectures. It is also possible to have a mixed approach: certain cores running as an SMP pool, say, with others operating independently and running specialist applications. Polyhedra’s highly portable nature and platform interoperability give flexibility here – for example, the server could be running on Linux (on a single core or on a number of cores set up as an SMP pool), with client applications running on OSE-based cores. In such an arrangement the client-server nature of Polyhedra allows the system designer great flexibility in the way the overall system is set up to make efficient use of the available processor power.