Active Queries

How Polyhedra’s active query mechanism improves the performance of real-time applications

In a big real-time system (big enough that the software is split into a number of applications, running semi-independently) it can be quite a problem to ensure that one application knows when another application has altered the shared data. For example, in a telecoms system a line card has to know when somebody has changed its configuration – but it is far too expensive to keep polling a database just to see if something has changed. Polling is bad for three reasons:
  • Either you have to put up with high latency or impose a heavy load on the database. If the polling interval is long, the client doing the polling won't discover changes very fast; if the interval is short, both the database and the client(s) doing the polling spend a lot of time on this activity.
  • The client has to work out whether anything has changed. It has to compare the new set of results with the old set; this takes room (as the previous result set has to be cached) and time.
  • The system does not scale well. Add another client and the database has to do more work – until you reach the point where it is too busy to do anything else.
There are other solutions, such as an explicit notification mechanism where the component making a change sends a message to all interested parties – however, this increases the amount of work that has to be done by the component making the change, and puts a lot of responsibility on its author. The Polyhedra solution is to allow "active queries", where the database server itself takes responsibility for notifying interested parties about relevant changes. This simplifies the work both of the component updating the database and of the component interested in the change, and improves the scalability of the system.

The following article goes into more detail about the background and the capabilities of our active queries. (A copy of this article in PDF format is appended below, for ease of printing and offline viewing; it includes a few illustrations, and also has sample C code illustrating how a client application can perform active queries. There is also a 3-part webcast that covers the same topic.)

 

Background – the need for a new approach

Why use a DBMS in a real-time application

All but the simplest real-time applications are split into a number of software components to make things manageable and to allow different people or teams to work on the application at the same time. Each component will be responsible for a separate job, communicating with other components via defined interfaces. The data-handling needs of each component will differ, but in general some data will have to be shared between components, and some data will need to be carefully preserved to guard against partial or complete system failure; there may or may not be an overlap between these categories. (Thus, some shared data might be transitory by nature, whereas configuration information will need to be shared and preserved.)

Traditionally, one component might act as the ‘data manager’ for the application, mediating access to the data and ensuring the critical stuff is preserved; where speed was important, this data manager component would be developed specifically for the application – but this can be complex, and it can be simpler to use a database management system designed for real-time work. In fact, the main reason for adopting commercial database technology is the same as that for adopting any COTS (commercial off-the-shelf) technology: cost. This shows up in many aspects, including reduced risk, reduced development time, reduced maintenance costs, and fewer specialist skills needed in the development team. As with all COTS technology, care must be taken that the chosen product is appropriate to one’s needs, otherwise costs will actually increase rather than decrease. In order to keep the footprint down and performance up, a real-time COTS DBMS will be more lightweight than a full, general-purpose DBMS, but will still offer benefits such as transactions, data isolation, data persistence and industry-standard APIs (such as SQL, ODBC and JDBC), as well as hot-standby capabilities to guard against partial system failure, and the ability to adapt the database structure on the fly to cope with changing needs.

The need for live information

Consider first a SCADA[1] system, controlling (say) a factory or an electrical distribution network. Sensors are feeding back readings to the central system, and operators can see what is happening via ‘mimic diagrams’. The question arises: how are the mimic diagrams kept up to date? Does the responsibility lie with the components that are obtaining the readings from the sensors, or with the data management component?
 
Next, consider a base station in a telecoms system. The line cards will need to obtain their configuration from somewhere; rather than each card having storage capabilities, it is easier to keep the information on cards in the control plane – preferably in a fault-tolerant fashion. External workstations would be able to change the information, but have no network connection to the line cards and thus no direct way of informing them about updates. In a similar fashion, the line cards need to make sure administrative workstations can determine their status.

A final example: in a financial system, the database could hold information about the most recent share prices of stocks, shares or other ‘financial instruments’, allowing traders to monitor their favourite shares in real time.

In all the above scenarios, there is a general need for a low-cost mechanism that allows components to determine when the data in which they are interested is changed ‘under their feet’ by other components of the system.

Three traditional approaches to database monitoring

Polling

Notionally the simplest way for an application to detect changes is for it to periodically query the data store, and then compare the results with the previously-retrieved values. This requires no special mechanisms in the data store, or in the application(s) feeding changes into the data store. However, it can impose a severe load on the data store module, since if an application needs to know about a change within a short interval – say, 1/10th of a second if merely updating a screen, or 1/1000th of a second or less if it needs to take mitigating action – then it has to launch queries at that frequency. It also adds to the workload of the client application, as it has to analyse the response to see what – if anything – has changed. In addition, if the data store and the application needing the data are on separate machines, then the network traffic can be significant. In summary, if low latency is required the overheads of polling are high – and there can be catastrophic degradation in performance at times of peak activity or when the system is expanded sufficiently to cause an overload condition.
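
To make the cost concrete, here is a minimal sketch of such a polling client in C, written against plain ODBC; the data source name "scada" and the table "sensor" are invented for illustration. Every cycle pays for a full query, a full fetch and a comparison against cached values, even when nothing has changed – and halving the latency doubles the cost:

/* poll.c – a minimal sketch of the polling approach, using plain ODBC;
   the data source "scada" and table "sensor" are invented examples */
#include <stdio.h>
#include <unistd.h>
#include <sql.h>
#include <sqlext.h>

#define MAX_POINTS 100

int main(void)
{
    SQLHENV env;
    SQLHDBC dbc;
    double cache[MAX_POINTS] = {0};  /* previous readings, kept so changes can be spotted */

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    SQLConnect(dbc, (SQLCHAR *)"scada", SQL_NTS,
               (SQLCHAR *)"", SQL_NTS, (SQLCHAR *)"", SQL_NTS);

    for (;;)   /* poll forever */
    {
        SQLHSTMT stmt;
        SQLINTEGER id;
        SQLDOUBLE value;

        SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
        SQLExecDirect(stmt, (SQLCHAR *)"select id, value from sensor", SQL_NTS);
        SQLBindCol(stmt, 1, SQL_C_SLONG,  &id,    0, NULL);
        SQLBindCol(stmt, 2, SQL_C_DOUBLE, &value, 0, NULL);

        /* every row must be fetched and compared, even if nothing changed */
        while (SQL_SUCCEEDED(SQLFetch(stmt)))
            if (id >= 0 && id < MAX_POINTS && cache[id] != value)
            {
                printf("point %d is now %g\n", (int)id, value);
                cache[id] = value;
            }

        SQLFreeHandle(SQL_HANDLE_STMT, stmt);
        usleep(100000);   /* 1/10th of a second: the best latency this client can see */
    }
}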

External notification mechanisms

Polling can be thought of as a ‘passive pull’ technology. An alternative is a ‘push’ notification scheme, where the component that first detects the change is responsible for notifying any other component that needs to be told. The component publishing the information can either know who needs to be informed at design time, or there can be a more dynamic (and probably more generic) ‘publish and subscribe’ system, with API calls for registering interest in an event and for sending information to the parties that have registered to receive it.

The problem with tailored push mechanisms – where the originator/detector of an event tells concerned parties about changes that are relevant to the recipient (without telling them about irrelevant changes) – is that it can take significant effort to adapt them to changing requirements during the development phase and beyond. A generic publish & subscribe mechanism avoids this problem, but may not offer sufficiently fine granularity or may not allow precisely the right information to be supplied. Thus, the listener might just get a notification that a value associated with a datum point has changed, and may have to find out for itself whether it is a point of interest and, if so, what the current value is. Even if the publish and subscribe mechanism avoids these problems, it still complicates the job of the component publishing the information, as it is responsible both for putting the information into the database (for long-term storage, say, or to be available to components that are not running or that do not need immediate notification) and for initiating the notification process to subscribers.
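
The shape of such a generic scheme can be sketched in a few lines of C; the names below are invented for illustration rather than taken from any particular middleware. Note that the notification carries only an event tag, so each listener must go back to the data store to discover what actually changed, and the updating component has two jobs rather than one:

/* pubsub.c – an illustrative (invented) publish-and-subscribe API */
#include <stdio.h>

#define MAX_SUBS 16

typedef void (*listener_fn)(const char *topic);  /* no payload: just an event tag */

static listener_fn subs[MAX_SUBS];
static int n_subs;

void subscribe(listener_fn fn)
{
    if (n_subs < MAX_SUBS)
        subs[n_subs++] = fn;
}

void publish(const char *topic)
{
    for (int i = 0; i < n_subs; i++)
        subs[i](topic);
}

/* the component making a change has two jobs... */
void update_point(int id, double value)
{
    (void)id; (void)value;
    /* job 1: write the new value to the data store (not shown) */
    /* job 2: separately announce the event to whoever may care */
    publish("point-changed");
}

/* ...and the listener is only told THAT something changed: it must now
   query the data store to find out what, and whether it matters */
static void on_point_changed(const char *topic)
{
    printf("received '%s'; re-querying the data store...\n", topic);
}

int main(void)
{
    subscribe(on_point_changed);
    update_point(7, 3.2);
    return 0;
}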

Database triggers

Some database systems allow code to be attached to the database and triggered by events. Normally, the trigger code will be written by the database administrator using PL/SQL or a DBMS-specific trigger language such as Polyhedra’s CL language, and the role of the triggers is either to perform some integrity checks (aborting the transaction if the conditions are not met) or to perform some ‘knock-on’ actions. The integrity checks can be as simple as ensuring particular attributes of the changed records are within a range directly or indirectly defined by other fields in the record, or can be much more complex. Thus, in a telecoms application, an attempt to create a record for a new connection might be rejected if either of the endpoints is flagged as being in a call – if, however, the new connection is permissible, the trigger on the connection table can set the flags indicating that the two endpoints are in a call. (A separate trigger would clear these flags when the connection record is deleted.)
 
The uses described above require the trigger code to be run within the transaction that activated it (either as the triggering event is performed, or with other triggered code at the end of the transaction), and to obey the ACID properties. In particular, the triggered code should have no side-effects that cannot be rolled back, should the transaction fail. For this reason, one should treat with great caution data management systems that allow code written in C or C++ to be called within transactions, as atomicity cannot be enforced by the system. However, it is safe to run such code post-transactionally, as the server can cancel the triggers prior to execution if the transaction is rolled back.
 
Where a data management system allows post-transactional triggers, such triggers can be used for knock-on actions that take too long to be done in the original transaction – but they can also be used for event reporting: either the database administrator can add C or C++ code to hook in to a generic publish and subscribe mechanism (as described in the previous subsection), or the data management system can have its own inbuilt notification mechanism.
 
Reporting events in this way has the advantage that the client applications generating the updates only have to tell the database; their job is simplified, and, as a central component is taking responsibility for the job of notification, overall system correctness is improved. However, as with the use of a generic notification system described in the previous subsection, it is difficult to ensure the granularity of the notifications is sufficiently fine; in addition, those told about the events will probably need to query the database and work out for themselves what has changed. There is also a maintenance problem: those designing client applications will have to ensure (probably by liaison with the database design team, if they are the only ones who can add triggers) that suitable events are generated for their needs – and that these events are updated or deleted if and when those needs change.
 
In some systems, there will also be a problem with security. Most database systems provide some sort of user-based controls that can be used to determine (down to the level of individual columns of a table) who can view and update the contents of the database. In embedded systems, such controls are often used in a role-based manner, restricting what can be done by individual components and thus limiting the damage should a component go wrong, but they may also be used to stop people discovering operational information they are not entitled to know about. If user-written code can register to be notified about particular types of events, then people may be able to make inferences about data that they could not directly query the database about.

An alternative approach: Active Queries

In Polyhedra, SQL queries from a client can be static or ‘active’. Active Queries provide automatic client notification of changes in the database in real time. Active queries are launched in a similar manner to normal (‘static’) queries, except that the database is told the queries are ongoing. The client receives the initial set of results, but instead of forgetting about the query, the server remembers it (until it is cancelled by the client, or the client connection is closed or lost). Whenever the database changes in a way that affects the query, the server notices, and sends a ‘delta’ message to the client.

A delta gives the client enough information to bring it up to date. Thus, it will report added rows, deleted rows and changes to individual rows; for changed rows, it does not send the whole row contents, just the cell values that changed. Consequently, efficient use is made of the network bandwidth when client and server are on separate machines. In addition, the job of the client application is greatly simplified:
  • It is informed when updates occur, with low latency: no need to poll
  • If it was just told a change had occurred, it would have to reissue the query to find the new values – and then, depending on the application, it might have to analyse the results to find out what had changed. By contrast, Polyhedra’s active query mechanisms avoid the need to reissue the query, and the client libraries let the application zoom in on the changes.
Once launched, active queries continue until closed by the client, or the client connection is closed, or the query becomes invalid – for example, if the table is dropped, though adding a column to a table only invalidates a query if the client had used ‘select *’ rather than specifying the columns of interest.
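
To show the division of labour this implies on the client side, here is a sketch in C. Be warned that every poly_* name below is a hypothetical stand-in invented for this article – the real client API is illustrated by the C sample in the appended PDF – but the shape is the point: the application supplies a single callback, which the library invokes for the initial result set and then once per delta, already broken down into inserted rows, deleted rows and changed cells:

/* active.c – a sketch of the client side of an active query.  All the
   poly_* names and types are HYPOTHETICAL stand-ins invented for this
   article (see the C sample in the appended PDF for the real client
   API); stub implementations are included so the shape can be run. */
#include <stdio.h>

typedef enum { DELTA_INSERT, DELTA_UPDATE, DELTA_DELETE } delta_kind;

typedef struct
{
    delta_kind kind;
    long       row_id;   /* which row the change applies to */
    int        column;   /* for updates: which cell changed */
    double     value;    /* ...and its new value            */
} poly_delta;            /* hypothetical */

typedef void (*delta_fn)(const poly_delta *d, void *user);

/* ---- stubs standing in for the client library ---------------------- */
static delta_fn handler;
static void    *handler_arg;

static void poly_launch_active_query(const char *sql, delta_fn fn, void *user)
{
    (void)sql;   /* a real library would send the query to the server */
    handler = fn;
    handler_arg = user;
}

static void poly_event_loop(void)
{
    /* simulate the server: the initial result set arrives first, then
       each relevant transaction produces a delta holding only the
       cells that actually changed */
    poly_delta initial = { DELTA_INSERT, 7, 0, 0.0 };
    poly_delta change  = { DELTA_UPDATE, 7, 1, 3.2 };
    handler(&initial, handler_arg);
    handler(&change,  handler_arg);
}
/* --------------------------------------------------------------------- */

static void on_delta(const poly_delta *d, void *user)
{
    (void)user;
    switch (d->kind)
    {
    case DELTA_INSERT: printf("row %ld added\n",   d->row_id);  break;
    case DELTA_DELETE: printf("row %ld deleted\n", d->row_id);  break;
    case DELTA_UPDATE: printf("row %ld: column %d is now %g\n",
                              d->row_id, d->column, d->value);  break;
    }
}

int main(void)
{
    poly_launch_active_query("select id, value from sensor", on_delta, NULL);
    poly_event_loop();   /* delivers initial rows, then deltas, until closed */
    return 0;
}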

Efficient server-side handling of Active Queries

Detecting changes cheaply

Conceptually, at the end of every transaction the server checks to see which active queries are affected by the changes that have been made, so that it can work out which ones need to be sent deltas. Of course, to do it like that would be grossly inefficient, so instead the active query code running within the server makes use of a fine-grain event triggering mechanism provided by the core database engine. This allows the active query mechanism to ignore irrelevant changes:
  • the presence of an active query does not affect the performance of other queries, nor the performance of transactions that do not alter the table(s) monitored by the active query;
  • active queries on single objects (identified by their primary key) have no effect on the performance of transactions that do not modify the monitored objects; and,
  • the effect on the performance of a transaction that does alter data being monitored by an active query is very low, as the work of updating the client is done post-transactionally in a separate thread.

Coping when things get busy

Suppose values are changing rapidly, and the clients cannot keep up with the changes. One example is a process control application, where sensor values are being read (through Analog-to-Digital Converters, or ‘ADCs’) as frequently as possible and the results fed into the database: a characteristic of high-sensitivity ADCs is that two consecutive readings are rarely the same, even if the input value seems stable. You may then have graphic devices or control room ‘mimic diagrams’ being fed in turn from the database, and the network may be busy or the client machine may not be able to handle the workload.

To cope with such situations, if the client is running slowly and cannot keep up with the stream of deltas, the server will start to ‘merge deltas’. Thus, clients will not be told every change, but will be kept as up to date as possible without slowing down the system. This means that the system is responsive even in peak load conditions, and copes well with crashed client applications.
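
What ‘merging’ amounts to can be shown with a toy model (the structures below are invented for illustration): a pending delta maps each cell to its most recent value, so folding in a newer change simply overwrites the older one, and a slow client eventually receives one combined delta rather than a backlog of individual ones:

/* merge.c – a toy model of delta merging; invented structures, for
   illustration only */
#include <stdio.h>

#define ROWS 4
#define COLS 2

typedef struct
{
    int    dirty[ROWS][COLS];   /* has this cell changed since the last send? */
    double value[ROWS][COLS];   /* ...and if so, its most recent value        */
} pending_delta;

/* fold one new change into the pending delta: an earlier value for the
   same cell is simply overwritten, i.e. never sent */
void merge_change(pending_delta *p, int row, int col, double v)
{
    p->dirty[row][col] = 1;
    p->value[row][col] = v;
}

int main(void)
{
    pending_delta p = {0};

    /* three rapid ADC readings of the same point while the client is busy */
    merge_change(&p, 2, 1, 4.01);
    merge_change(&p, 2, 1, 4.03);
    merge_change(&p, 2, 1, 3.99);

    /* when the client is ready again, only the latest value goes out */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            if (p.dirty[r][c])
                printf("send row %d, column %d = %g\n", r, c, p.value[r][c]);
    return 0;
}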

In fact, clients can take advantage of the delta merging mechanism to further reduce the load on the system. At the time the query is launched, the client can specify a ‘minimum delta interval’, in thousandths of a second. Whenever the server sends a delta on an active query with an interval defined, it will ensure the next delta is not sent until the interval has expired, with all intervening changes merged. A graphic client can choose, say, an interval of a fifth of a second, which is long enough to allow people to read the values without worrying about flicker – yet short enough to appear real-time to those watching the displays. By choosing an appropriate minimum delta interval, and ensuring only the data required is retrieved (e.g., avoiding ‘select *’ and only asking for the columns and rows of relevance to the client), efficient, scalable and responsive systems can be produced.
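
In terms of the hypothetical client sketch shown earlier, the interval would simply be supplied when the query is launched; the variant function below is again invented for illustration, but the value of 200 (thousandths of a second) corresponds to the fifth-of-a-second refresh discussed above:

/* hypothetical variant of the earlier launch call: ask the server to
   send at most one delta every 200 ms, with intervening changes merged */
poly_launch_active_query_interval("select id, value from sensor",
                                  on_delta, NULL,
                                  200 /* minimum delta interval */);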

Optimistic Concurrency through Active Queries

As well as keeping clients up to date with changes in a server, Polyhedra’s active query mechanism also provides an effective way for the client to change the database: it simply has to change the result set of the active query. When it does so, the client library will automatically ask the server to make the corresponding changes. Of course, for this to work, the query has to be ‘updateable’, in that it must be unambiguous which records have to be inserted, modified or deleted to make the database match the modified result set: in practice, this means the client cannot modify the database through active queries involving joins or SQL functions. Updates done through the active query mechanism avoid the use of the SQL engine, which improves efficiency.

Whenever a change request is received by the server, the first thing it does is check whether it satisfies the optimistic concurrency constraints: basically, the transaction is only accepted for processing if no other client has made a conflicting change since the last delta was sent. Clearly, the new transaction cannot change a record if something else has just deleted it, but the transaction is also rejected if something else has changed a record attribute that the client wants to alter. Optimistic concurrency is efficient when transactions are expected to succeed, which is usually the case in embedded applications.
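
The check itself can be illustrated with a toy version-stamp model (invented here for illustration; it is not Polyhedra’s internal representation): each value carries a version, a change request quotes the version the client last saw, and the server accepts the write only if that cell has not moved on in the meantime:

/* occ.c – a toy illustration of the optimistic concurrency check;
   the version-stamp representation is invented for illustration */
#include <stdio.h>

typedef struct
{
    double value;
    long   version;   /* bumped on every accepted change */
} cell;

/* apply the write only if nothing has changed the cell since the
   client's last delta; otherwise reject, and the client must retry */
int try_update(cell *c, long seen_version, double new_value)
{
    if (c->version != seen_version)
        return 0;                    /* conflicting change: reject */
    c->value = new_value;
    c->version++;                    /* later deltas carry the new version */
    return 1;
}

int main(void)
{
    cell price = { 10.0, 1 };

    /* client A saw version 1, so its change is accepted... */
    printf("A: %s\n", try_update(&price, 1, 10.5) ? "accepted" : "rejected");

    /* ...and client B, still working from version 1, is rejected: it
       must absorb the delta reporting A's change before retrying */
    printf("B: %s\n", try_update(&price, 1, 10.2) ? "accepted" : "rejected");
    return 0;
}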

Updates through active queries can be grouped into larger transactions. When the client starts such a transaction, the server is automatically told (by the client library code) to block sending deltas on the active queries involved in the transaction. When the client later commits or cancels the transaction, it will – if necessary – receive a merged delta to bring it up to date, but in the case of a successful transaction it does not have to be told about the changes it made!

Monitoring the monitors

Recent editions of Polyhedra have been enhanced to allow the database designer to detect when active queries are set up on particular tables. They can do this by adding records to a configuration table to say which tables are of interest; once set up, records are created in a special table when active queries are established on any of the designated tables, and removed when the queries are cancelled. Each of these special records allows one to find out details of an individual query, and of the connection used to launch the query. Trigger code (written in Polyhedra’s trigger language, CL) can be attached to the special table to respond to the creation and deletion of records, and can take appropriate action. For example, it would be straightforward to kill any connection that tried to launch active queries on a particular table unless the connection was on the same machine:

script DataQuery
  on create
    -- (nb: 2130706433 corresponds to net address 127.0.0.1)
    if sourcetable = "currency" and \
       machine of session <> 2130706433 then delete session
  end create
end script


While this particular example may be of limited use(!), the mechanism allows more sophisticated and meaningful actions. For example, suppose that some of the data values are being fed from external devices according to some polling schedule, and the polling schedule for a particular point is specified in its database record. The query monitoring mechanism is sufficiently powerful that it is possible to detect when an active query is looking at a particular point, and trigger code can automatically increase the poll frequency for that point.

Conclusion

As discussed in this document, Active Queries have many benefits over traditional solutions:
  • No polling, so low latency is achieved with less load on the server
    • Less network traffic
    • Improved system scalability
  • No server load when clients aren't listening
  • Better handling of overload conditions, with more gradual degradation
  • No need for separate notification mechanisms
  • Clients doing updates are simpler: they don't need to tell others about changes
  • Clients doing queries are simpler: they are told what has changed
  • No need to set up and manage database trigger code to propagate notification messages
In addition, the bidirectional nature of Active Queries allows an efficient way for clients to update the database on an ongoing basis without involving SQL, and the new mechanism for monitoring the placement of active queries allows, for example, polling frequencies for external data sources to be tuned dynamically.

In summary, Active Queries bring an additional and powerful capability to database developers, allowing for faster program development and greater system responsiveness. By avoiding polling they make the system more stable and scalable, and their bidirectional nature provides an easy way for a client to update the database. The uses for Active Queries are varied and many!

[1] SCADA stands for supervisory control and data acquisition. It generally refers to industrial control systems: computer systems that monitor and control industrial, infrastructure, or facility-based processes. See the Wikipedia article for more details: http://en.wikipedia.org/wiki/SCADA
