Difference between revisions of "OCEOS/oceos fdir"

Revision as of 10:34, 28 March 2022

OCEOS Fault Detection, Isolation and Recovery (FDIR)

FDIR Introduction

Hardware and software faults can occur that affect the behaviour of the system.

To detect such faults and anomalies, OCEOS automatically checks as scheduling proceeds that its key data areas have not been corrupted and that resource use stays within its prescribed bounds. Parameters passed to directives are checked for consistency with the values originally declared in the application configuration.

Once a fault or anomaly has been detected, OCEOS provides five levels of response:
Level 1: The status code returned by a directive indicates that there was a problem.

The status code returned by a directive should always be checked. This code identifies whether the directive succeeded and, if not, indicates the reason for the failure. This is the only response made when the anomaly is an invalid parameter to a directive. It is combined with responses from other levels if a directive fails for some other reason.
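The checking pattern described for Level 1 can be sketched as follows. This is a minimal illustration only: `demo_status_t`, `demo_queue_write` and `demo_send` are hypothetical stand-ins, not OCEOS directives or status codes, and the queue limits are invented for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical status codes, for illustration only (not the OCEOS enum). */
typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_ERR_INVALID_PARAM,   /* parameter inconsistent with configuration */
    DEMO_ERR_QUEUE_FULL       /* resource bound reached */
} demo_status_t;

#define DEMO_QUEUE_COUNT    4u  /* queues declared in the assumed configuration */
#define DEMO_QUEUE_CAPACITY 2u  /* assumed entries per queue */

static unsigned demo_queue_fill[DEMO_QUEUE_COUNT];

/* Stand-in for a directive: checks the parameter against the configured
 * values first, then checks the resource bound. */
demo_status_t demo_queue_write(unsigned queue_id, uint32_t value)
{
    (void)value;  /* payload is not stored in this sketch */
    if (queue_id >= DEMO_QUEUE_COUNT) {
        return DEMO_ERR_INVALID_PARAM;
    }
    if (demo_queue_fill[queue_id] >= DEMO_QUEUE_CAPACITY) {
        return DEMO_ERR_QUEUE_FULL;
    }
    demo_queue_fill[queue_id]++;
    return DEMO_SUCCESS;
}

/* Caller-side pattern: the returned status is always inspected. */
int demo_send(unsigned queue_id, uint32_t value)
{
    demo_status_t status = demo_queue_write(queue_id, value);
    if (status != DEMO_SUCCESS) {
        fprintf(stderr, "queue write failed, status=%d\n", (int)status);
        return -1;
    }
    return 0;
}
```

In this sketch an invalid queue identifier produces only the status code, mirroring the statement that an invalid parameter triggers no higher-level response.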

Level 2: An appropriate log entry is added to the system log.

The log entry contains a code identifying the problem and a 32-bit timestamp. This response occurs in addition to the Level 1 response if a directive fails due to an internal factor, such as a task already having its maximum number of jobs created or a data queue being full. It also occurs when a task misses its deadline, or when unexpected behaviour is detected, such as the system stack reaching its specified lower bound. This response is often accompanied by a Level 3 response.

Level 3: The system state variable is updated and a user-defined problem handling function is called.

The system state variable consists of 32 flags, each indicating a particular problem type. It is reset by oceos_init. A copy of this variable is also maintained. Both variables may be reset by the ASW. This response also occurs in most circumstances where the Level 2 response occurs.
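A 32-flag state word with a shadow copy could look roughly like the sketch below. The flag names, bit positions and the `demo_` functions are assumptions made for illustration; the actual OCEOS flag assignments and the way the copy is kept in step are not given here.

```c
#include <stdint.h>

/* Hypothetical flag positions; the real assignments are not stated here. */
#define DEMO_FLAG_DEADLINE_MISSED (UINT32_C(1) << 0)
#define DEMO_FLAG_QUEUE_FULL      (UINT32_C(1) << 1)
#define DEMO_FLAG_STACK_LOW       (UINT32_C(1) << 2)

typedef struct {
    uint32_t state;       /* working variable, one bit per problem type */
    uint32_t state_copy;  /* copy, assumed here to be set alongside it */
} demo_system_state_t;

/* Both variables start cleared, as they would after initialisation. */
void demo_state_init(demo_system_state_t *s)
{
    s->state = 0u;
    s->state_copy = 0u;
}

/* Record a problem: set the flag in both the variable and its copy. */
void demo_state_raise(demo_system_state_t *s, uint32_t flag)
{
    s->state |= flag;
    s->state_copy |= flag;
}

/* Reset only the working variable; the copy keeps its flags. */
void demo_state_clear(demo_system_state_t *s)
{
    s->state = 0u;
}

/* Test whether a given flag is set in a state word. */
int demo_state_is_set(uint32_t state_word, uint32_t flag)
{
    return (state_word & flag) != 0u;
}
```

Because each problem type has its own bit, several problems can be recorded at once and tested independently.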

Level 4: An ASW problem handling function is called by OCEOS.

A problem handling function can be identified in the application configuration structure passed to oceos_init and, if present, may be called by OCEOS. This function can read the system log and system state variable, read the task timing and other information, enable and disable tasks, reset counting semaphores and data queues, and if necessary exit OCEOS. It can also reset the system state variable. It is recommended that the copy of the system state variable not be reset, so as to provide a longer-term record of problems. This response also occurs in most circumstances where the Level 3 response occurs.
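The optional-handler arrangement might be pictured as below. Everything here is an assumption for illustration: the configuration structure, the handler signature and `demo_report_problem` are hypothetical and do not reproduce the OCEOS configuration layout or its internal dispatch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler signature: working state is writable, copy is not. */
typedef void (*demo_problem_handler_t)(uint32_t *state,
                                       const uint32_t *state_copy);

/* Hypothetical slice of an application configuration structure. */
typedef struct {
    demo_problem_handler_t problem_handler;  /* NULL when none is configured */
} demo_app_config_t;

static unsigned demo_handler_calls;

/* Example ASW handler: counts the call and resets the working state,
 * leaving the copy untouched as a longer-term record. */
void demo_handler(uint32_t *state, const uint32_t *state_copy)
{
    (void)state_copy;
    demo_handler_calls++;
    *state = 0u;
}

/* Kernel-side stand-in: record the problem, then call the handler only
 * if one was configured. Returns 1 if a handler ran, 0 otherwise. */
int demo_report_problem(const demo_app_config_t *cfg,
                        uint32_t *state, uint32_t *state_copy,
                        uint32_t flag)
{
    *state |= flag;
    *state_copy |= flag;
    if (cfg->problem_handler == NULL) {
        return 0;
    }
    cfg->problem_handler(state, state_copy);
    return 1;
}
```

Passing the copy as `const` is one way to make the recommendation against resetting it visible in the interface; whether OCEOS enforces anything similar is not stated in the text.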

Level 5: OCEOS exits and returns to the ASW with an appropriate status code.

FDIR Configuration

API Functions

| Directive | Description | main | task | IRQ handler |
|---|---|---|---|---|
| oceos_log_add_entry() | Add a log entry | * | * | * |
| oceos_log_remove_entry() | Read and remove the oldest unread log entry | * | * | * |
| oceos_log_get_indexed_entry() | Read the log entry at the given index | * | * | * |
| oceos_log_reset() | Clear all log entries and reset to empty | * | * | * |
| oceos_log_get_size() | Get the number of log entries | * | * | * |
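To make the five log operations concrete, the sketch below models a small circular log whose entries hold a problem code and a 32-bit timestamp. It is not the OCEOS implementation and the `demo_log_` names and signatures are hypothetical. The capacity and the choice to overwrite the oldest entry when the log is full are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_LOG_CAPACITY 8u  /* assumed size, for illustration */

typedef struct {
    uint32_t code;       /* identifies the problem */
    uint32_t timestamp;  /* 32-bit timestamp */
} demo_log_entry_t;

typedef struct {
    demo_log_entry_t entries[DEMO_LOG_CAPACITY];
    uint32_t head;   /* index of the oldest unread entry */
    uint32_t count;  /* number of entries currently held */
} demo_log_t;

/* Clear all entries and reset to empty. */
void demo_log_reset(demo_log_t *log)
{
    log->head = 0u;
    log->count = 0u;
}

/* Number of entries currently held. */
uint32_t demo_log_size(const demo_log_t *log)
{
    return log->count;
}

/* Add an entry; when full, the oldest entry is overwritten (assumption). */
void demo_log_add(demo_log_t *log, uint32_t code, uint32_t now)
{
    uint32_t tail = (log->head + log->count) % DEMO_LOG_CAPACITY;
    log->entries[tail].code = code;
    log->entries[tail].timestamp = now;
    if (log->count < DEMO_LOG_CAPACITY) {
        log->count++;
    } else {
        log->head = (log->head + 1u) % DEMO_LOG_CAPACITY;
    }
}

/* Read and remove the oldest unread entry; false if the log is empty. */
bool demo_log_remove(demo_log_t *log, demo_log_entry_t *out)
{
    if (log->count == 0u) {
        return false;
    }
    *out = log->entries[log->head];
    log->head = (log->head + 1u) % DEMO_LOG_CAPACITY;
    log->count--;
    return true;
}

/* Read the entry at the given index (0 = oldest) without removing it. */
bool demo_log_get_indexed(const demo_log_t *log, uint32_t index,
                          demo_log_entry_t *out)
{
    if (index >= log->count) {
        return false;
    }
    *out = log->entries[(log->head + index) % DEMO_LOG_CAPACITY];
    return true;
}
```

A problem handler would typically drain such a log by calling the remove operation until it reports that no entries are left.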