How certain features of the Polyhedra IMDB DBMS can be used to provide a selective audit trail mechanism for a Polyhedra database
The Polyhedra DBMS does not automatically produce an audit trail that would allow one to trace responsibility for each individual change to a database. However, the software does provide sufficient mechanisms to allow a database designer to produce such an audit trail for selected activities. This document contains a worked example that illustrates one way of configuring Polyhedra to do such auditing, and describes the facilities it uses to achieve this effect.
In application areas such as SCADA (Supervisory Control And Data Acquisition: industrial control systems for monitoring complex systems such as power plants, water/oil/gas/electricity distribution networks, train networks and refineries, as well as simpler factory-floor applications), operators monitor the system state and can interact with the system to control its behaviour. In a water distribution network, for example, operators will receive alarm messages to warn them when critical conditions are being approached, such as high water levels, and can cause sluice gates to be raised or lowered to alleviate the position. They can also configure the system - for example, to control when alarms are raised.
In such systems, it is often necessary to know which operator was responsible for actions such as acknowledging an alarm, adjusting an alarm point, or manipulating the position of a sluice gate or the control parameters for a pump. Indeed, in many industries this may be a regulatory requirement. The responsibility for producing the audit trail lies with the overall application, not with the individual components - though of course the components must not offer security loopholes.
Where there is a database at the heart of the SCADA system, the operators will not have direct access to the DBMS; instead they will be clicking on-screen buttons, for example, which send preconfigured SQL commands to the database system. Consequently, the DBMS does not itself have to produce audit trails - but it simplifies the job of the application developers if they can take advantage of any security features the DBMS offers. In the case of Polyhedra, audit trails are not automatically produced - but there are three features that assist the database designer to perform selective generation of audit trails:
· User-based security and connection tracking
· Trigger code
· Time-series data capture
The remainder of this document illustrates how a database designer can use these features, both to implement a role-based security policy and also to ensure a long-term record is made of the underlying users responsible for particular changes to the database.
Polyhedra provides a security mechanism based on the concept of tables being owned by users, and users being assigned access rights to tables and views - and also to individual columns in tables and views. Thus, if you have a table called, say, currency, defined by the following schema:
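As an illustrative sketch (only the usdollar column is significant below; the other columns, and the exact DDL details, may differ from release to release):

```sql
create table currency
(
    persistent,                          -- table contents survive a restart
    code     large varchar primary key,  -- e.g. 'GBP', 'EUR'
    country  large varchar,              -- illustrative extra column
    usdollar real                        -- exchange rate against the US dollar
);
```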
… then you could allow everyone to read the table, but restrict updates to the usdollar column to a particular user, by the following commands:
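A sketch of the commands involved, assuming a user called trader (the user name is illustrative, and the grant syntax is shown schematically):

```sql
grant select on currency to public;             -- everyone may read the table
grant update (usdollar) on currency to trader;  -- only trader may change this column
```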
By default, user-based security is disabled. To enable it, create a table called users, with a definition equivalent to the following:
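A minimal sketch of such a definition (we shall add a role column later):

```sql
create table users
(
    persistent,
    name     large varchar primary key,  -- the login name
    password large varchar               -- null until a password is set
);
```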
The next time the database server is started, the DBMS will automatically create extra tables to record the table and column privileges, and will create two special records in the users table for PUBLIC and SYSTEM, but with null passwords. All other users are automatically able to do anything that the PUBLIC user can do; the SYSTEM account has full access to the database, so your first action should probably be to set a password on that account!
You can now create records for various users, and grant them appropriate privileges. Note that client applications have no privileges until they have logged on - and by default, users have no privileges apart from those they inherit from the PUBLIC account.
As well as the users table, you can also add a table called dataconnection:
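A sketch of such a definition, based on the columns described below (the key column is illustrative):

```sql
create table dataconnection
(
    transient,                        -- connection records need not survive a restart
    id          integer primary key,  -- illustrative key column
    username    large varchar,        -- set when a user authenticates, cleared at log-off
    client_type large varchar,        -- supplied by the client application
    env         large varchar,        -- likewise supplied by the client
    machine     integer               -- 32-bit IP number of the client, under TCP/IP
);
```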
Once this table has been added, a record is automatically created in it every time a connection is established; when a user authenticates him/herself, the username field is set appropriately, and cleared again when the user logs off. The client_type and env fields are filled in with values supplied by the client application when it connects; if TCP/IP is used for the connection, then the machine field will hold the 32-bit IP number of the client. In fact, for flexibility we can derive a table from this, and tell the database to create records in the derived table when a user connects; this provides a clean way of adding extra columns to the table, and also enables us to add ‘trigger code’ as described in the next section. (You cannot attach triggers to special tables such as dataconnection.)
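Shown schematically (the exact table-derivation syntax depends on the Polyhedra release in use):

```sql
create table auditedDataConnection
(
    derived from dataconnection   -- inherits all columns; none added yet
);
```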
(At this stage, we have not added any columns in the derived table, so it just has those it inherits from the base table; we shall add a column later, when we extend the example.)
The CL module in the DBMS is a powerful scripting engine that allows you to associate code with tables. You can specify the actions to be done when a record is inserted and when one is deleted, and also associate different ‘methods’ with individual columns. The CL programming language is rich in capabilities, with conditional flow of control and various forms of iteration statement. CL can read and write attributes of records, locate records, and create and delete records, and operates inline within the transaction which triggered it; it can even abort a transaction, and specify the error code and error message that should be sent to the client.
To illustrate what can be done with CL, we shall use it to replace user-based security by role-based security. First we shall add the following line to the definition of the users table, to hold information about the role of a user:
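The line in question would be along the following lines, taking advantage of the fact that a foreign key reference can point back at the users table itself (shown schematically):

```sql
    role large varchar references users  -- null for a role; for a normal user,
                                         -- names the role they are to be given
```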
With this definition in place, we can then add some roles, users and privileges:
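A sketch of the SQL involved, matching the description below (names and the password are illustrative):

```sql
-- two roles: entries with a null role field
insert into users (name) values ('operator');
insert into users (name) values ('manager');

-- a normal user, associated with the operator role
insert into users (name, password, role) values ('tim', 'secret', 'operator');

-- privileges are granted to the roles, not to individual users
grant select on currency to public;
grant update (usdollar) on currency to public;
grant insert, delete on currency to operator;
grant insert, delete on currency to manager;
```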
Basically, roles are represented by entries in the users table with a null value in the role field; normal users will have an associated role, but no privileges. In this case, anyone can query currency and update the usdollar column; only the operator and manager roles are allowed to insert or delete records in this table (plus the system user, of course).
Now, we shall add a trigger to the ‘username’ column of the auditedDataConnection table, so that when a normal user - Tim, say - logs on he is automatically assigned the right role - in this case, the operator role. In essence, omitting various checks, all that is needed is the following:
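In CL, shown schematically (the exact statement syntax may differ slightly between releases):

```
script auditedDataConnection
    on set username
        if exists username then              -- someone is logging on...
            if exists role of username then  -- ...and they have a role defined
                set username to role of username
            end if
        end if
    end set username
end script
```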
(‘Omitting checks’ is not very sensible when we are dealing with security! We will add some sensible checks in a moment - but for the moment we shall concentrate on the bare minimum needed.)
Let us look at this code snippet in detail, for the benefit of those who have not previously seen our trigger language. The script auditedDataConnection … end script grouping associates a number of functions and procedures (“methods”) with the named table; the on set username … end set username section defines the method to be applied whenever the designated attribute is changed. When a user logs on or logs off, the security system will set the username field of the appropriate dataconnection record, and this will trigger the invocation of the corresponding method defined above. The method will run in the context of the modified object, so when it refers to a column name, the right cell of this record is read or written. In this example, the code first checks to see if someone is logging on (by seeing if the username is not null) and if so it inspects the appropriate record in the ‘users’ table to see if a role is defined (by seeing if ‘role of username’ is not null: the CL compiler recognises when a field is a foreign key reference, and treats it as a pointer). If username has been set to null, or no role is defined for the user who has logged on, nothing needs to be done; however, the remaining case (where username points at a ‘users’ record with a defined role) needs a bit more attention.
Assume Tim has logged on. The CL code sets the username attribute of the auditedDataConnection record to point at the ‘operator’ record in the users table - and as a result our method is kicked off again, recursively! This time through, the “exists role of username” test will return false, so the method will just drop through, leaving the database unchanged. Control will now return to the outer invocation of the method, immediately after the ‘set’ statement. Once the method is complete, the transaction will terminate, and all subsequent operations by the client will be done with ‘operator’ privileges.
Of course, the above code is a bit too simple. To see why, consider what happens if a mistake has been made in populating the users table: suppose a record had been inserted as follows:
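For example (the name is, of course, illustrative):

```sql
insert into users (name, role) values ('orouberos', 'orouberos');  -- its own role!
```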
Will we get into an infinite loop if orouberos logs on? Well in this case, no, as the CL statement ‘set username to role of username’ will set the username attribute to the same value as it had before… and as there is no change, the ‘on set’ method will not be triggered for a second time! However, now consider the following:
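Something along these lines (populating the second record in two steps so that each reference points at an existing record):

```sql
insert into users (name) values ('omega');
insert into users (name, role) values ('alpha', 'omega');
update users set role = 'alpha' where name = 'omega';  -- alpha and omega now
                                                       -- refer to each other
```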
Logging in as either alpha or omega will trigger infinite recursion, terminated only by memory exhaustion or hitting the CL stack limit. As it happens, this is trivially prevented by the addition of the following code snippet before the ‘set’ statement in the previous CL code:
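A sketch of such a guard (the exact form of the abort statement varies; the essential point is that the transaction is rejected when the target role itself has a role):

```
if exists role of role of username then
    abort transaction   -- role definitions form a chain or a loop; reject the login
end if
```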
An alternative would be simply to say that one does not swap to a role which itself has a role: this would leave the user logged on, but with no privileges.
A further enhancement to this example would be to record in the database the name of the user who actually logged on. First, we would add an extra field to the auditedDataConnection table…
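For example, by adding a line such as the following to the table definition:

```sql
    actual_user large varchar  -- records who really logged on, once username
                               -- has been switched to hold their role
```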
Now, we can adapt our ‘on set’ method to set and clear this attribute at the appropriate times; at the same time we can ensure that no-one can log in as a password-protected role, even if they knew the password:
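A schematic version of the extended method is given below; the numbered comments correspond to the points discussed in the next paragraph, and the locate and abort statements are shown in simplified form:

```
script auditedDataConnection
    on set username
        local r
        if exists username and exists role of username and not exists role of role of username then
            set actual_user to name of username  -- (1) remember the real user...
            set username to role of username     --     ...then adopt their role
        else if exists username and actual_user <> "" then
            -- (2) while a real user is logged on, the only permitted change to
            --     username is the automatic swap above; verify that the new
            --     value really is the role belonging to the real user
            locate users where name = actual_user into r
            if not exists role of r or name of username <> name of role of r then
                abort transaction
            end if
        else if exists username and exists password of username then
            set username to null    -- (3) direct login as a password-protected
                                    --     role: allowed, but with no privileges
        else if not exists username then
            set actual_user to ""   -- (4) logging off: forget the real user
        end if
    end set username
end script
```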
Looking at the labelled lines in the above, the line labelled (1) sets actual_user to record the real user prior to the role change (which is suppressed if the role has a role). The ‘else if’ clause labelled (2) stops any role change other than to the role appropriate for the real user. The next ‘else if’ clause (3) detects if someone is logging on as a password-protected role whilst no real user is logged on: setting username to null allows the login to succeed, but the session will have no access rights. Finally, the code labelled (4) cuts in when username has been set to null, and it ensures that we forget who originally logged on.
As well as attaching methods to tables, CL also allows the database designer to define global functions and global constants. Global constants are not just evaluated at compile time: they are also re-evaluated at each subsequent database start-up, and therefore can depend on (and affect) the contents of the database at that time. Consider, for example, a database containing a table that had been set up as follows:
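A sketch of such a table, with the columns used later in this document:

```sql
create table latest_event
(
    persistent,
    name       large varchar primary key,  -- e.g. 'currency', 'user'
    event_type large varchar,              -- e.g. 'insert', 'delete'
    event_text large varchar,              -- human-readable description
    username   large varchar,              -- who was responsible
    timestamp  datetime                    -- when; the historian watches this column
);
```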
Suppose next that the CL attached to the database included the following:
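Shown schematically (the precise CL syntax for locate, record creation and function definition varies between releases):

```
function Find_latest_event (tab)
    local le
    locate latest_event where name = tab into le
    if not exists le then
        create latest_event (name = tab) into le  -- first run: make the record
    end if
    return le
end Find_latest_event

constant currency_event = Find_latest_event ("currency")
constant user_event     = Find_latest_event ("user")
```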
At start up of the database service, the CL system will evaluate Find_latest_event ("currency") and Find_latest_event ("user") in order to assign a value to the global constants currency_event and user_event. The function uses the CL locate statement to find a particular record in the latest_event table; if it does not exist, the function creates the record. Other CL code can make use of currency_event and user_event, which will point at specific records in the database for the duration of the session.
In an embedded application where, say, the current values of sensors or shares are held in a database, it is occasionally useful or important to know how the values are changing over time. In the UK, for example, water utility companies are required to keep certain historic information online for at least 90 days, and offline records for 10 years. In a system monitoring tens of thousands of analogue sensors whose values are changing by the second, this can rapidly add up to a huge volume of data that can swamp a normal database, let alone an in-memory one!
Fortunately, there are characteristics to this kind of data - and in particular, to the way that it is used - that mean we have been able to develop a special module in Polyhedra to handle this requirement. The historian submodule is part of the database server, and can be configured (by creating and updating records in certain tables) to set up a number of ‘logging streams’, each monitoring a given table with a chosen sample interval (measured in seconds), or ‘on change’. For each logging stream, you can then specify, among other things:
· which columns are to be logged;
· the amount of disk space to be used as a circular buffer containing raw sample data, and likewise for time-based compressed data; and,
· the names of ‘pseudo-tables’ that can be queried to retrieve sample data.
It is thus possible, just by using standard SQL, to set up a series of configuration records to say:
· “Monitor the table called analog, logging away every 5 seconds the values of the (floating point) attribute called ‘value’ and the (integer) attribute called ‘status’, but only for those records where the ‘log_me’ attribute is true.
· Use a 500 megabyte file (organised as 100 blocks of 5 megabytes each) as a circular FIFO buffer of raw samples.
· Use another 200 megabytes for time-compressed records at 1-minute granularity, and another 200 megabytes at one hour granularity; when producing time-compressed samples, record the min, max and average for the value column, and the bitwise AND and bitwise OR of the status field.
· Raw samples can be retrieved through a table called fsample, time-compressed samples via a table called fsample_c.”
· “Monitor the table called digital, logging away the (Boolean) value and (integer) status attributes on change, using a 200 megabyte file as a circular buffer of raw samples; allow samples to be retrieved through the dsample table.”
With such a set-up, one would retrieve information about earlier values by querying one of the special tables that had been set up:
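For example - the point names, the granularity column and the functions used for date-time arithmetic are all illustrative here:

```sql
select timestamp, value, status from fsample
 where name = 'FT017' and timestamp > now() - hours(1);

select * from fsample_c
 where name in ('FT017', 'FT018')
   and granularity = hours(1) and timestamp > now() - days(14);
```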
The former retrieves the last hour’s worth of samples for a given analog point; the latter retrieves two weeks’ worth of 1-hour time-compressed samples for two analog points. The historian is optimised for queries like these, and maintains its own indexing on the files to reduce the number of file accesses required.
The historian is not limited to logging numeric values; it can log fields of any type supported by Polyhedra. Thus, to record events, one notes that they are normally associated with objects already being represented in the database; one simply adds an extra string attribute to store the most recent event associated with that object (and perhaps a datetime attribute to record a timestamp), and configure the historian accordingly:
· “Monitor the latestEvent and timeOfLatestEvent columns of the component table, using a 100 Mbyte file to hold the samples that are recorded on change, and allowing the samples to be retrieved through queries on a table called event.”
While the historian undoubtedly can extend the amount of data stored by a Polyhedra database, there are still limits on the amount of data storage - though this time it is because there are limits on the size of a file. On most 32-bit systems, there is a 2- or 4-gigabyte limit on the size of a file that can be opened for random access; where data is changing rapidly and fine-grained samples are needed, 4GB is clearly not enough to hold many years of samples. To cope with this, the historian introduces the idea of archive and extract files.
At any time, you can instruct the historian to produce an archive file containing the oldest <n> blocks of un-archived data from the buffer file. A file of the indicated name is produced, and the blocks in the buffer file are marked as archived. An archive file can be brought back online by creating a configuration record in the database. If a query is made of a special historian table to retrieve samples, such as the example queries given earlier, then the historian does not limit itself to using the information in the main buffer file, but it also uses any applicable archive files that contain data for the relevant period. Archive files are in a machine-independent format, and are not tied to any particular database instance; consequently, archive files can be brought back online into any database with a similar historian configuration. This gives great flexibility in the way the historian is used: it is possible, for example, to use the historian on an embedded system as a data logger in a vehicle; when back at base, the archive files for recent journeys could be transferred (via a network, perhaps, or removable media such as flash cards or CD-R) to a workstation and then brought online for further inspection. For even further flexibility, the Polyhedra toolset provides a utility to convert archive files into XML, enabling easy export of the time-series data into a disk-based data system for detailed analysis.
We have now introduced all the tools we need to implement an audit trail within Polyhedra. Suppose, for example, there is a currency table in the database (as defined earlier), and we want to be able to determine the person who creates and deletes records in this table. Assume also that we have set up role-based security, and have defined a table called latest_event as described in the section about CL global constants and global functions. We could now set up the historian to generate a sample whenever the ‘timestamp’ attribute of a latest_event record is altered:
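Schematically, the configuration records might look like the following - the control-table and column names here are placeholders, as the actual names are given in the Polyhedra historian documentation:

```sql
-- define a logging stream watching latest_event, sampling on change of timestamp
insert into logcontrol (name, tablename, trigger_column, sample_table)
       values ('events', 'latest_event', 'timestamp', 'event');

-- nominate the columns whose values are captured in each sample
insert into logcolumn (stream, columnname) values ('events', 'event_type');
insert into logcolumn (stream, columnname) values ('events', 'event_text');
insert into logcolumn (stream, columnname) values ('events', 'username');
```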
… and attach a script to latest_event to define a procedure:
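Shown schematically:

```
script latest_event
    on log_event (new_type, new_text, who)
        set event_type to new_type
        set event_text to new_text
        set username   to who
        set timestamp  to now  -- changing this column prompts the historian
                               -- to generate a sample
    end log_event
end script
```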
With the above configuration, the historian will generate a sample whenever the timestamp attribute of a latest_event record is changed, and the log_event procedure will set the event_type, event_text, username and timestamp attributes of a latest_event record whenever it is called. The historian configuration also allows us to query the historical data by inspecting the special table called ‘event’ - the contents of this table are not held in the in-memory store, but are automatically generated from all the data that is available to the historian: this includes its in-memory buffers, the file it uses to hold samples in a round-robin fashion, and any archive files that have been brought online.
Now let us look at the following CL code:
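A sketch of such code is given below; current_user () is a placeholder for whatever mechanism the application uses to retrieve the actual_user value recorded for the current connection, and the message-sending syntax is schematic:

```
script currency
    on create
        send log_event ("insert", "created currency " & code, current_user ()) to currency_event
    end create
    on delete
        send log_event ("delete", "deleted currency " & code, current_user ()) to currency_event
    end delete
end script
```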
If this CL code is attached to the database, then whenever a record is created in the currency table, the log_event procedure is invoked on the latest_event record whose primary key is “currency”. This will set various fields on that latest_event record, including the timestamp - and altering the timestamp will in turn trigger the historian to generate a sample. The record can be retrieved by querying the event table. We can similarly attach code to the auditedDataConnection table to log whenever connections are established and when people log on, and to the users table to record whenever users are created or deleted.
While Polyhedra does not by default enforce any special security policy or automatically generate audit trails of all activities, it provides features that can be used in conjunction to provide audit trails tracking selected activity:
· User-based security and connection tracking
· Trigger code
· Time-series data capture
The fact that CL can be used to tie together disparate features of Polyhedra to enforce a security policy such as traceability indicates the power and flexibility of the overall Polyhedra solution.
This appendix brings together and extends the code and data snippets given above into a complete example, with additional checks and with more types of events being recorded. In particular, the audit trail is extended to record when clients connect and disconnect, and when users log on and off. Clearly, in a production version, further actions would need to be checked and tracked (such as alterations to the users table) but it should be clear how the code could be adapted to cover such requirements.
The example is also available in machine-readable form to registered users of Polyhedra via the Polyhedra customer support desk, along with instructions on how to run it.
The CL compiler allows the following sections to be defined in separate files, or all together in a single file. In a small example such as this, there is little point in using more than one file. For large projects, where the CL code can run to thousands of lines, it is often sensible to break the file up, with a separate file for the global constants and global functions that are used in many places, and then having a separate file for each group of related tables (holding the scripts for those tables plus the global functions and constants that are only used by those scripts).
When the main Polyhedra executables (such as rtrdb, the main executable that runs the server for a database, or sqlc, which provides a client interface allowing the user to enter SQL commands and see the results) are set off, they need to be configured to tell them how to operate. The server, for example, needs to know what ports to claim so that clients can connect to it, and it needs to be told where a saved database file can be found. Such information is provided by ‘resource settings’ that can be specified via the command line when setting off the executable, but it is much easier to gather them together in a configuration file – thus when the database server is fired off by a command line such as ‘rtrdb db’ it will look for a file called poly.cfg, and will then read the resource settings defined for the ‘entry point’ labelled db. The configuration file parser allows an inheritance mechanism to make it easy for different entry points to share resources, and as with CL there is a comment convention that allows the author to add explanations to the configuration file.
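By way of illustration only - the resource names and layout below are examples, and the authoritative list of resources is given in the Polyhedra reference documentation - an entry point in poly.cfg might look something like:

```
db : data_service = 8001
     load_file    = test.dat
     save_file    = test.dat
     cl_library   = audit.cl
```

Here the server would listen for clients on port 8001, load (and save) its database image in test.dat, and attach the CL scripts held in audit.cl.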
 In the schema definition, persistent is a table attribute saying that the table contents should still be available after the database is stopped and restarted; the alternative is transient, used for information that does not need to be durable. These keywords can also be used as column attributes - so if for example the usdollar attribute is being updated automatically by a feed from an exchange, it would probably be sensible to mark it as transient even though the rest of the table is persistent. As a further enhancement, it is also possible to mark transient attributes and tables as local, to stop the contents being replicated to a standby server or read-only replica.
 Or, more accurately, if this table is present when the database server is started up again.