Using fault-tolerant hardware without a corresponding fault-tolerant operating system is like flying a plane
without wings. In other words, neither the hardware nor the operating system is sufficient on its own to make a
computing system fault tolerant.
Historically, hardware and operating systems have always been developed in close interaction. Sometimes the
architecture of an operating system influenced the design of the hardware, but usually the operating system was
designed to fit hardware that already existed.
Modern operating systems are built in layers, or levels. The lowest level contains hardware-dependent modules such as
device drivers. The uppermost layers rely on an abstract model of the computer and pay no attention to the hardware.
Fault-tolerant operating systems are also built on a hierarchical principle. An example of a freely distributed UNIX
operating system adapted for fault tolerance is FreeBSD. The structure of the version of this operating system that
we developed is shown below.
Continuous operation of the system is achieved through a mechanism of automatic recovery after failure. If a process
is configured as fault tolerant, the system always holds two copies of it, placed on physically different processors.
One copy is the active (primary) process and the other is the backup. If the primary process is interrupted by an
intermittent hardware or software failure, the backup process takes over the role of the primary and continues the
computation from the point of interruption.
Because the memory contents of the two processes are identical and the second processor has the same access to all
system resources as the primary one, no additional measures are needed to restore the interrupted process. The
operating system immediately creates a new backup process on another CPU, so the duplication of processes is
preserved.
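
The primary/backup mechanism described above can be illustrated with a small simulation. This is only a sketch under
stated assumptions: the names `ProcessPair`, `step` and `fail_primary` are hypothetical, processors are plain integers,
and the backup is kept identical by copying the state after every step, which stands in for whatever mirroring
mechanism the real system uses.

```python
import copy


class ProcessPair:
    """Toy model of a fault-tolerant process: one primary copy, one backup copy."""

    def __init__(self, cpus):
        if len(cpus) < 3:
            raise ValueError("need at least three CPUs to survive one failure")
        self.free_cpus = list(cpus)
        self.primary_cpu = self.free_cpus.pop(0)
        self.backup_cpu = self.free_cpus.pop(0)
        # Both copies start with identical memory contents.
        self.primary_state = {"counter": 0}
        self.backup_state = copy.deepcopy(self.primary_state)

    def step(self):
        """Advance the computation on the primary and mirror it to the backup."""
        self.primary_state["counter"] += 1
        self.backup_state = copy.deepcopy(self.primary_state)

    def fail_primary(self):
        """Simulate a failure of the CPU that runs the primary copy."""
        # The backup takes over from the point of interruption.
        self.primary_cpu = self.backup_cpu
        self.primary_state = self.backup_state
        # A new backup is created at once on another CPU, restoring duplication.
        self.backup_cpu = self.free_cpus.pop(0)
        self.backup_state = copy.deepcopy(self.primary_state)


pair = ProcessPair(cpus=[0, 1, 2, 3])
for _ in range(3):
    pair.step()
pair.fail_primary()  # CPU 0 fails; the copy on CPU 1 becomes the primary
pair.step()          # the computation continues, it does not restart
print(pair.primary_cpu, pair.backup_cpu, pair.primary_state["counter"])
```

In this toy model the failed CPU is never returned to the pool, so the counter keeps its value across the failure and
a fresh backup is placed on the next free processor.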
Functions of the layers closest to the hardware:
- Checking the working state of each processor (whether it is running, voting);
- Switching off failed processors;
- Redistributing the functions of failed processors and maintaining the integrity of cache and main memory;
- Supporting fault-tolerant peripheral devices (for example, RAID disk systems).
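
The processor check by voting mentioned in the list above can be sketched as a simple majority vote. The sketch
assumes, hypothetically, that several processors compute the same result and that a processor which does not answer
or disagrees with the majority is treated as failed and switched off. The function name `vote` and the data layout
are illustrative and are not taken from the system described here.

```python
from collections import Counter


def vote(results):
    """Return (majority_value, failed_cpus) for a dict {cpu_id: result}.

    A CPU is reported as failed if it produced no result (None) or a
    result that differs from the strict majority.
    """
    answers = [r for r in results.values() if r is not None]
    if not answers:
        raise RuntimeError("no processor answered")
    value, count = Counter(answers).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among processors")
    failed = sorted(cpu for cpu, r in results.items() if r != value)
    return value, failed


# CPU 2 returns a wrong value and CPU 3 does not respond at all.
value, failed = vote({0: 42, 1: 42, 2: 41, 3: None, 4: 42})
# The failed processors are excluded from the active set.
active = [cpu for cpu in range(5) if cpu not in failed]
print(value, failed, active)
```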
Higher layers perform tasks such as restarting processes, reconfiguring network interfaces at the protocol level
(reassigning the TCP/IP address to the interface of a secondary network, which becomes active after the primary
network fails), starting the newly configured network, and so on.
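
The protocol-level reconfiguration described above amounts to moving an address from the failed primary interface to
the secondary one and then bringing the secondary up. The following sketch only models that bookkeeping in memory. It
does not call any real network configuration API, and the interface names and the address are made-up examples.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    name: str
    up: bool = False
    addresses: list = field(default_factory=list)


def fail_over(primary, secondary):
    """Move all addresses from a failed primary interface to the secondary."""
    primary.up = False
    # Reassign the addresses so that peers keep using the same IP.
    secondary.addresses.extend(primary.addresses)
    primary.addresses.clear()
    # Start the newly configured network.
    secondary.up = True


# Hypothetical interface names; 192.0.2.10 is from a documentation-only range.
net0 = Interface("net0", up=True, addresses=["192.0.2.10"])
net1 = Interface("net1")
fail_over(net0, net1)
print(net1)
```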
To applications the system presents the standard UNIX interface. The operating system can be functionally extended
through a well-defined application programming interface (API), which in a UNIX environment is usually implemented
through system calls or library functions. An application can use all of these facilities or run without them in a
mode that is not protected against failures.