
Combining subscription (partial replication) and the Polyhedra historian

When Polyhedra 9.0 was released, we put together a demo illustrating the database subscription feature introduced in this release, and produced a video of it. The demo is also intended to show off the suitability of Polyhedra for use in SCADA and IoT applications, and makes use of the historian module that records time-stamped samples. This technical note describes the background to this demo, and shows how the databases are configured to achieve the effects shown in the video.

Setting the scene

Polyhedra 9.0 introduced a new feature to Polyhedra IMDB: the ability for one database to subscribe to one or more tables in other databases in order to establish a live cache. Basically, by creating records in control tables, a database can be instructed to set up a connection to other database servers, and then for each one set up a series of subscriptions that specify the tables that are to be replicated, and the columns within those tables.

Thus, suppose you have a number of separate SCADA systems, each with a Polyhedra database at its core, where each system monitors and controls part of an industrial plant: a factory, for example, or a water distribution network. The systems might be geographically distributed, with fast connections back to head office - but these connections might of course be broken, and the latency might be high. The new subscription mechanism makes it easy to have a single central database that holds the latest-available readings from all the sensors in each of the remote systems. Assuming the remote databases were set up to allow partial replication (that is, they were running Polyhedra 9.0 or later, and had defined a journal port to which the central database could connect), and that you want to retrieve every column of the table called sensor in each remote database, it is just a case of creating suitable records in the journalsession and journalsubscription tables in the central database:

-- set up the connections ('sessions'):
insert into journalsession (id, service, connect_count)
                    values (102, '10.1.1.102:8051', 0);
insert into journalsession (id, service, connect_count)
                    values (104, '10.1.1.104:8051', 0);
-- repeat as needed; if any of the remote databases is fault-tolerant, give a list
-- of access points in the service attribute:
insert into journalsession (id, service, connect_count)
                    values (108, '10.1.1.108:8051,10.1.1.208:8051', 0);
-- now set up the subscriptions:
insert into journalsubscription (session, source_table_name, id_column_name)
                         values (102, 'sensor', 'remote');

insert into journalsubscription (session, source_table_name, id_column_name)
                         values (104, 'sensor', 'remote');

-- ...
insert into journalsubscription (session, source_table_name, id_column_name)
                         values (108, 'sensor', 'remote');

commit;
-- finally, enable the sessions and associated subscriptions:
update journalsession set enable=true; commit;

(An aside: for backwards compatibility, the journalsession and journalsubscription tables are not present in a newly-created database; you have to define them yourself. See the utility classes reference manual for more details on these tables, and some sample SQL that can be used to create them.)

The only tricky thing to note in the above SQL is that the sensor table in the central database has to have a slightly different definition from that in the remote databases. To guarantee that there are no primary-key clashes between records retrieved from different databases, the primary key of the table in the central database has to have an extra column - in this case called remote - whose value is set automatically to the value of the session field of the journalsubscription record, thereby indicating the source of each retrieved record.
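
For illustration, the central copy of the sensor table might be declared along the following lines; this is only a sketch, assuming the remote sensor table has just name, timestamp and value columns:

-- the remote column is filled in automatically with the session id of the
-- subscription that supplied each record
create table sensor
   ( name      large varchar
   , remote    integer
   , primary key (name, remote)
   , timestamp datetime
   , value     real
   );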

Suppose now we want to do something a bit more complicated, and make available to the central database some information about the historical values of the remote sensors, for use in 'trend diagrams'. A trend diagram or trend chart is a graph showing how the value of one or more data points has changed over time, to help operators visualise the behaviour. Here is an example taken from the web site of one of our customers:

a trend diagram

The obvious solution - configuring the Polyhedra historian in the central database to track the values in the local sensor table - has limitations, in that the records are only updated while the connection to the remote database is up. An alternative is to have the historian in each remote database track the values; when you want to bring up a trend diagram for a sensor, you work out which remote database is responsible for that sensor and then query that database to get the information to populate the trend diagram. The limitation with querying the remote site for the historical data is that the connection might be down at the time you need the information! Obviously you cannot expect to get at the live data or the recent history whilst the connection is down, but not being able to get at earlier data could be an issue. What would be nice is if the central database could somehow have a copy of the historical information, including data that had been gathered remotely during previous periods when communications were down.

Requirements for the demo

The challenge we set ourselves was to put together a demonstration of Polyhedra for SCADA and IoT applications that also illustrated the use (and benefits) of the recently-introduced partial replication mechanism. As a SCADA demo, it needs to show the historian and fault tolerance (both critical features for SCADA applications), and also how the new partial replication mechanism can contribute to more scalable, distributed solutions. For IoT, it needs to show remote systems gathering sensor readings, making them available centrally, and coping with loss of connectivity.

In more detail,
  • it should run on a portable, stand-alone multi-board kit that had been put together for demonstrating fault-tolerance.

    Polyhedra demo kit based on 4 raspberry pi computers with graphics cards

  • two boards should be used for the 'central' database, run in a fault-tolerant configuration, and a third 'remote' board should run its own database, gathering sensor readings and showing a trend diagram. The fourth board should also display a trend diagram, but this time using information from the central database.

  • the central database should be capable of handling multiple remote databases, each with their own sensors (including external sensors).

  • the demo should use the historian running on the remote system - but should use the partial replication features to disseminate data to the central system rather than rely on, say, the historian's live data export facility. The role of the historian on the remote database is solely to populate the local trend diagram, and thereby show the information that is not yet available in the central database when the connection is broken.

The implementation

In overview, in both the central and remote databases we have a table called sensor, with the historian configured to log the value column of this table. To allow the central database to discover what happened when the communications were down, the remote database has an extra table called reading, that will contain, say, 1000 recent time-stamped readings for each sensor. Every second, code attached to the remote database will trigger its historian to log the current values in the sensor table, and will also create a new entry in the reading table for each sensor (deleting the oldest such reading if necessary).

The central database subscribes to the reading tables in the remote database(s), and when entries come in their values (and timestamps) are propagated into the relevant entries in the sensor table (creating them if needed). The historian in the central database is set up to create a new log entry whenever a record is created in the sensor table, or whenever the timestamp field is changed; this way, we can be sure that the log records created for a sensor by the central historian are precisely in step with those created in the remote database that is responsible for that sensor.

Configuring the remote database(s)

To allow the central database to discover what happened while communications were down, the remote database has to keep a set of recent readings that the central database can catch up on. The following SQL is used to set up the tables and configure the historian:

create table reading
   ( name      large varchar
   , timestamp datetime
   , primary key (name, timestamp)
   , value     real
   );
create table sensor
   ( name      large varchar primary key
   , timestamp datetime transient
   , value     real     transient
  
   , readings array of reading
   );
create table readingtimer
   ( derived from timer
   , stream  integer references logcontrol
   , sensors array of sensor
   );
-- configure the historian. first, it is to sample the sensor table only when told:
insert into logcontrol (id,  source,   namecolumn, enable, rate, usersampling)
                values (10,  'sensor', 'name',     false,  null, true        );
-- say which columns to monitor:
insert into logcolumn (type, control, name,    sourcecolumn)
               values (0,    10,      'value', 'value'     );
-- tell the historian the buffer size to use,
-- and how many buffers the log file should hold.

insert into logdata (rate, fedfromrate, control, buffercount, buffersize)
             values ('0s', NULL,        10,      10,          4096      );
commit;
-- set up a pseudo-table called 'sample' that can be used to query the log data
update logcontrol set raw='sample', enable=true where id=10;
commit;
-- create the thing that will tell the historian to sample the sensor table
insert into readingtimer (id, interval, active, continuous, stream)
                  values (10,  1000000,  true,   true,       10   );
commit;
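
Once logging has been enabled, recent samples can be inspected through the sample pseudo-table named in the raw attribute of the logcontrol record. As a quick check that samples are being captured, a query along the following lines can be issued against the remote database (the sensor name here is purely illustrative; the trend-diagram client described in the appendix uses essentially the same query):

select timestamp, value
  from sample
 where timestamp >= now()-seconds(60) and name = 'cpu-temp'
 order by timestamp;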

The readingtimer table is used to trigger logging, and the generation (and deletion) of reading records. Some CL code is attached to the sensor and readingtimer tables, to perform the 'business logic' described earlier:

constant integer ReadingCount = GetIntDefault ("ReadingCount", 1000)
script readingtimer

    on  set counter
        -- trigger a sample, and also create new readings (so that the
        -- central database can be kept up to date)
        local reference sensor s
       
        sample () of stream -- 'sample' is a special method on logcontrol
        repeat foreach s in sensors
            create_reading () of s
        end repeat
    end set counter
   
end script
script sensor

    on  create
        local reference readingtimer rt
        create reading (name=name, timestamp=now(), value=value) into readings
        locate readingtimer (id=10) into rt
        if exists rt then insert me into sensors of rt
    end create
   
    on  create_reading
        local integer n = getarraysize (readings)
        local integer i
       
        -- make room
        repeat with i=ReadingCount to n
            delete object 1 of readings
        end repeat
       
        -- create a reading with the current
        -- (actually, the most recently reported) value

        create reading (name=name, timestamp=now(), value=value) into readings
    end create_reading

end script

The client application that is monitoring the sensors uses the Polyhedra SQL command INSERT OR UPDATE sensor values (<name>, now(), <value>) to record readings, so that it does not have to worry about whether the relevant sensor record already exists.
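
For example, the statement issued by the client might look like the following (the sensor name and value here are purely illustrative):

insert or update sensor values ('cpu-temp', now(), 47.2);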

Configuring the (fault-tolerant) central database

As mentioned earlier, we have both a sensor and reading table in the central database, but these have a slightly different definition to that used in the remote database(s):

create table reading
   ( name      large varchar
   , timestamp datetime
   , id        integer
   , primary key (name, timestamp, id)
   , value     real
   );
create table sensor
   ( name      large varchar primary key
   , timestamp datetime
   , value     real  transient
   );

The main differences are that the sensor table does not have an array of pointers to records in the reading table, and, more importantly, the primary key of the reading table has an extra attribute (called id), that will be used in the subscription definition:

insert into journalsession (id, service, connect_count)
                    values (102, '10.1.1.102:8051', 0);
insert into journalsubscription (session, source_table_name, id_column_name)
                    values (102, 'reading', 'id');

insert into journalsession (id, service, connect_count)
                    values (104, '10.1.1.104:8051', 0);
insert into journalsubscription (session, source_table_name, id_column_name)
                    values (104, 'reading', 'id');
commit;
-- enable the sessions and subscriptions
update journalsession set enable=true;
commit;

As in our previous example, we have more than one subscription populating a table, so we have instructed the subscription mechanism to use one of the primary key attributes (in this case, id) to specify the source of each record it creates. All that remains is to set up the historian:

-- sample the sensor table on change of the timestamp attribute,
-- using supplied timestamp
insert into logcontrol (id,  source,   namecolumn, enable,
                        timestampcolumn, triggercolumn, rate)
                values (10,  'sensor', 'name',     true,
                        'timestamp',     'timestamp',    NULL);
-- say which columns to monitor:

insert into logcolumn (type, control, name,    sourcecolumn)
               values (0,    10,      'value', 'value' );
-- tell the historian the buffer size to use,
-- and how many buffers the log file should hold.
insert into logdata (rate,fedfromrate,control,buffercount,buffersize)
             values ('0s', NULL,      10,     10,         4096      );
commit;
-- set up a pseudo-table called 'sample' that can be used to query the log data
update logcontrol set raw='sample', enable=true where id=10;
commit;

Only the first insert statement differs from the historian configuration in the remote databases! The remote set-up had:
insert into logcontrol (id,  source,   namecolumn, enable, rate, usersampling)
                values (10,  'sensor', 'name',     false,  null, true        );

... which meant that samples were only generated when the sample() method was invoked on the logcontrol record. By contrast, the historian in the central database will generate a sample whenever the timestamp attribute of a sensor record is set - which, thanks to the trigger code attached to the central database's copy of the reading table (propagating each incoming reading into the corresponding sensor record), is whenever a reading record is created. In addition, the time-stamp from the reading record is used, as supplied by the remote database: the historian allows this, as long as the supplied value is not less than that in the last sample it logged for the sensor record in question. Consequently, in normal running the samples captured by the two historians are in step - and if the connection is broken and then re-established, the central historian will back-populate the logged data when its copy of the reading table is brought back into sync.

Possible extensions, and conclusions

There are a number of ways in which this demo could be extended, for example by detecting when no subscriber is connected and arranging to store more readings (or to merge older readings). Also, it is possible to instruct the historian in the central database to perform time-compression on the logged readings, storing (say) the minimum, average and maximum values within 5-minute and 1-hour intervals, and these would be queried via SQL as before; a sketch of such a configuration is given below. (It is also possible to instruct the historian in the remote databases to do the same.) Some extensions will need business logic to be embedded in the central and remote databases via CL, depending on the complexity of what is to be achieved.
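
As an indication of how such time-compression might be configured, additional logdata records could be layered on top of the raw ('0s') stream set up earlier. This is only a sketch: the rate strings, and the assumption that fedfromrate names the stream from which a compressed stream is derived, should be checked against the historian reference manual, and further logcolumn records would be needed to say which aggregates (minimum, average, maximum) to store.

-- 5-minute and 1-hour streams fed, in cascade, from the raw stream
insert into logdata (rate,    fedfromrate, control, buffercount, buffersize)
             values ('300s',  '0s',        10,      10,          4096);
insert into logdata (rate,    fedfromrate, control, buffercount, buffersize)
             values ('3600s', '300s',      10,      10,          4096);
commit;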

In conclusion, the subscription facility significantly extends the capabilities of the Polyhedra in-memory database, allowing independent databases to hold a live cache of data from other Polyhedra databases, with a catch-up mechanism in place to harmonise the cache when a broken connection is mended. If the central database needs to capture information about intermediate values while a connection is down (to ensure the data logged by the historian in the central database is as accurate as possible, say), then the remote database can keep a table of recent readings to which the central database subscribes, and trigger code inside the central database can transfer these values (complete with time-stamps) to the table the historian is monitoring. Thus, while straightforward use of the subscription facility is very easy to set up, more complex cases can also be handled by means of careful design and a small amount of business logic.

Appendix: viewing the database

The trend diagrams are produced using Python. The application connects to the Polyhedra database via pyodbc, so on Linux you have to make sure that the Polyhedra ODBC driver has been installed for use with unixODBC, and that appropriate DSNs have been set up. On the Raspberry Pi boards, ~/.odbc.ini has been set up to define the DSNs 8001 (for connection to the local database on port 8001) and remote (for talking to the central, fault-tolerant database). The main limitation with this approach is that the application cannot take advantage of Polyhedra's active queries, and so has to reissue the queries each time it wants to refresh the trend diagram. However, the cost of this is low: the main time-consuming element is the recalculation of the graph from the retrieved data. (If, however, the application were rewritten in C or C++ to use a more efficient graphics library, it would be sensible to complete the optimisation by adjusting the application to make use of active queries - which are permitted on historic data.)

(It may not be completely obvious, but the above graph contains two trend lines, one coloured blue and the other green. The on-board temperature sensors on the Raspberry Pi are somewhat granular, and noisy.)

The following code is used to open the connection:

import argparse
import pyodbc
import pylab   # provided by matplotlib; used below to draw the trend diagram

parser = argparse.ArgumentParser(description='Display sensor readings')
parser.add_argument('DSN', nargs='?', default="8001", help='DSN of database to query')
args = parser.parse_args()
cnxn = pyodbc.connect('DSN=' + args.DSN)

The graph is drawn using the matplotlib library. First the database is queried to find out the names of the sensors in the database to which we are attached, and then, after initialising the graph, the query to retrieve the recent historic data for each sensor is set up, and set off to get the initial data:
 
cursor  = cnxn.cursor()
names   = []
for row in cursor.execute("select name from sensor order by name"):
    names.append (row.name)

fig = pylab.figure(figsize=(4,3),dpi=80)
pylab.grid(True)
ax = fig.add_subplot(111)
ax.grid(True)
ax.set_xticklabels([])

lines   = []
queries = []

for i, name in enumerate(names):
    x = []
    y = []
    query = ("select " +
             "60*to_minutes(timestamp-today())+the_second(timestamp) timestamp," +

             "value" +
             "from sample
" +
             "where timestamp>=now()-seconds(60) and name='
" +
name + "' " +
             "order by timestamp
"

            )
    queries.append(query)
    cursor = cnxn.cursor()
    for row in cursor.execute(queries[i]):
        x.append(row.timestamp)
        y.append(row.value)
    line, = pylab.plot(x, y, label=name, drawstyle='steps')
    lines.append(line)
   
pylab.ion()
pylab.show()

The main loop simply repeats each query, and tells matplotlib to replace the data for the corresponding trend line with the newly-retrieved information:

    while True:
        pylab.pause(0.2)
       
        for i, name in enumerate(names):
            line = lines[i]
            x   = []
            y   = []
            cursor = cnxn.cursor()
            for row in cursor.execute(queries[i]):
                x.append (row.timestamp)
                y.append (row.value)
            line.set_data(x,y)

(The full client application extended the initial code to determine the maximum and minimum y values, and to scale the graph with a bit of leeway. There is similar code in the main loop that rescales the graph if the leeway is exceeded. Likewise, there is code to ensure the x axis covers the last minute, so that a gap appears on the right-hand side if the database being queried is no longer receiving updates.)