TMR, Reliability, and System Debugging in ICL

ICL bases process control applications on a single application language system implemented in what could be a single computer or a larger distributed system. A significant question is how this can be supported by such classic reliability strategies as Triple Modular Redundancy (TMR). The question is answered in a number of background papers, but not in any of the published papers or elsewhere on this site. The distinction between the Large and Small System versions of ICL does support the classic process control attitude toward Single Loop Integrity. This note brings the other intended solutions onto the site. The issue came to mind while reading the February 2008 issue of the ISA "InTech" magazine, particularly the cover article, "Petrochem Sees Triple."

TMR Implemented for large Functional Elements (Big Box TMR)

TMR

Traditional TMR addresses networks of voted elements, each element generating a simple computed value. In the network, each computing element must be replicated and voted against its peers to arrive at each local voted value. As in the figure, every result except the last involves replicated voters as well. More elaborate strategies address failures in the final voter. But modern digital control systems, ICL in particular, compute many complex values in a single computer.

Some corresponding strategy is required to summarize the internal results of each control computer's computation in a form that allows the computers' computations to be compared as a whole. While each output control value could be compared machine for machine, this would be insufficient given the complexity of the sequenced actions going on in a modern control program. Perfectly correct computations could occur in which a slight difference in timing among the computers incorrectly suggested that one of the voted computers had failed.

To begin with, my proposals assumed that the individual I/O points might be voted (for logical values) on the assumption that collective values would catch up in any case. For analog or real-valued control values, voting would consist of median selection among the computed results. In some cases these votes or median selections could take place digitally in a digital processor that is part of the I/O equipment. But the key issue is the summarization and voting of the more complex ICL (or even I/O) computers.
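The point-level voting described above can be illustrated with a minimal sketch. It assumes three redundant channels per I/O point; the function names vote_logical and select_analog are hypothetical and are not taken from any ICL specification.

```python
from statistics import median


def vote_logical(a: bool, b: bool, c: bool) -> bool:
    """Two-out-of-three majority vote for one logical I/O point."""
    return (a and b) or (a and c) or (b and c)


def select_analog(a: float, b: float, c: float) -> float:
    """Median selection: a single wild value can never be the one chosen."""
    return median([a, b, c])


# One channel disagrees on the logical point; one analog channel has failed high.
print(vote_logical(True, False, True))    # True
print(select_analog(50.1, 49.9, 250.0))   # 50.1
```

Median selection is used for analog values rather than averaging because an average would be pulled toward a failed channel, while the median always returns a value produced by a channel that agrees with at least one other.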

Big Box TMR

The corresponding duplicated machines would have to receive identical input data and be synchronized so that their summarizing computations covered the same relative interval in time. Thus there would be a shared computational sample time, at the end of which each computer would compute its summary. The summaries could then be compared, and any computer whose summary differed could be taken out of the shared voted result.

Since the summary needs no particular operational connection to the control function, it could consist of any combined result from the control computations. For example, a checksum of all of the computed control results would summarize the computation in a way that allows the summaries of all computers to be compared. These summaries would require only a fraction of the control computation time and could be as detailed as the needed redundancy requires. This general strategy can be implemented in many ways, involving many quite different control computer and I/O designs and serving many possible tradeoffs and desired flexibilities.
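As a rough sketch of the checksum idea, the following assumes that each computer's control results for one sample interval are available as a list of floating-point values and that CRC-32 is an acceptable checksum. Both are assumptions made for illustration, as are the names summarize and dissenting; the note itself leaves the form of the summary open.

```python
import struct
import zlib
from collections import Counter


def summarize(results: list[float]) -> int:
    """CRC-32 over the packed control results of one sample interval."""
    packed = struct.pack(f"<{len(results)}d", *results)
    return zlib.crc32(packed)


def dissenting(summaries: dict[str, int]) -> list[str]:
    """Names of computers whose summary differs from the majority summary."""
    majority, count = Counter(summaries.values()).most_common(1)[0]
    if count * 2 <= len(summaries):
        raise RuntimeError("no majority among the summaries")
    return [name for name, s in summaries.items() if s != majority]


good = [12.5, 0.0, 73.25]
summaries = {
    "A": summarize(good),
    "B": summarize(good),
    "C": summarize([12.5, 0.0, 73.5]),  # computer C computed one value differently
}
print(dissenting(summaries))  # ['C']
```

A bit-exact comparison like this only works under the conditions stated above: identical input data and a shared sample time. Without them, correct computers could produce different summaries.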

ICL Debugging Aids and Operating States

ICL is intended to exist in an integrated operating environment and human interface. The paper "Deriving the Human Interface from the Automatic Controls" describes some of the online ICL operator interface design issues. But the operating environment supporting application debugging is a separate engineering function, with dimensions relating to both pre-operational design and debugging and to operational diagnosis and operation.

In the Specification Documents, an ICL program is defined as having a set of independent system operating states. A particular Operation (application program or an independent part of such a program):

In each of these live program execution states the Operation can be in subStates:

In addition, an Operation may be ACTIVE (normally running) or INACTIVE (not running, usually as a result of a programmed command). It can also be BOOKED (under the control only of some external Operation) or UNBOOKED. Finally, it can be in an INITIALIZE (commanded to initialize) or _ (not initializing) state.

The above States allow operators and programming engineers to control the operation of the process application. In addition, ICL is intended to include features that allow the engineer to better trace cause and effect in the running application. In particular, every Operation, Task, Block, or variable includes a back pointer that allows the engineer to locate the prior causing Statement that set that element to its current State or value.
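The back pointer idea can be pictured with a small sketch. Everything here is hypothetical: the classes Statement and Variable, the set_by and cause fields, and the trace helper are illustrative names, not ICL constructs. Following the chain through several statements is also an assumption beyond what the text states, which only promises the immediately prior causing Statement.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Statement:
    label: str                          # hypothetical identifier of a program statement
    cause: Optional["Variable"] = None  # element whose value triggered this statement


@dataclass
class Variable:
    name: str
    value: Any = None
    set_by: Optional[Statement] = None  # the back pointer

    def assign(self, value: Any, by: Statement) -> None:
        """Set the value and remember which statement set it."""
        self.value = value
        self.set_by = by


def trace(var: Variable) -> list[str]:
    """Follow back pointers from a variable toward the original cause."""
    chain: list[str] = []
    seen: set[int] = set()              # guards against cyclic cause chains
    stmt = var.set_by
    while stmt is not None and id(stmt) not in seen:
        seen.add(id(stmt))
        chain.append(stmt.label)
        stmt = stmt.cause.set_by if stmt.cause is not None else None
    return chain


level = Variable("TankLevel")
valve = Variable("DrainValve")
level.assign(95.0, by=Statement("FillTask: line 12"))
valve.assign("OPEN", by=Statement("HighLevelBlock: line 4", cause=level))
print(trace(valve))  # ['HighLevelBlock: line 4', 'FillTask: line 12']
```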