SMP server architecture
SMP stands for symmetric multiprocessing. The main feature of SMP systems is a common physical memory shared by
all processors.
Schematic of the SMP architecture.
Memory serves as the medium for passing data between processors. All processors have equal rights when accessing it and
use the same addressing for all memory cells, which is why the architecture is called symmetric. An SMP system is built
around a high-speed system bus (SGI PowerPath, Sun Gigaplane, DEC TurboLaser). Functional blocks of three types plug into
the bus slots: central processors, main memory, and the input/output subsystem. I/O modules are attached through slower
buses (PCI, VME64). The best-known SMP systems are servers and workstations based on Intel processors (from IBM, HP,
Compaq, Dell, ALR, Unisys, DG, Fujitsu). The whole system runs under a single OS (for example, UNIX or FreeBSD).
Advantages of the SMP architecture:
- Simplicity and universality of programming. The SMP architecture imposes no restrictions on the programming model
used to build an application. The model of parallel branches is usually used, in which all processors work completely
independently of one another; however, models that rely on interprocessor exchange can also be implemented. Shared
memory speeds up such exchange, and the user has access to the entire memory at once. Fairly effective automatic
parallelization tools exist for SMP systems.
- Ease of operation. As a rule, SMP systems use cooling based on air conditioning, which simplifies their
maintenance.
- Relatively low price.
Disadvantages:
Shared-memory systems built on a system bus scale poorly. This major drawback keeps SMP systems from being considered
truly promising. The system bus has limited (though high) bandwidth and a limited number of slots. Both limits clearly
hinder performance growth as the number of processors and connected users increases. Real systems can use no more than
32 processors. To build scalable systems on top of SMP, cluster or NUMA architectures are used. SMP systems are
programmed using the so-called shared memory paradigm.
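The shared memory paradigm mentioned above can be sketched in a few lines of Python. This is only an illustration under
stated assumptions: the function name shared_memory_sum and the worker count are hypothetical, and Python threads stand
in for the processors of an SMP machine (in CPython they do not execute bytecode truly in parallel, so the sketch shows
the programming model, not a speedup).

```python
import threading

def shared_memory_sum(data, num_workers=4):
    """Sum `data` with workers that all address the same list and result array."""
    partial = [0] * num_workers                       # shared: one slot per worker
    chunk = (len(data) + num_workers - 1) // num_workers

    def worker(rank):
        lo = rank * chunk
        hi = min(lo + chunk, len(data))
        total = 0
        # Each worker indexes the shared list directly; nothing is sent between workers.
        for i in range(lo, hi):
            total += data[i]
        partial[rank] = total                         # distinct slot per worker, so no lock is needed

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

Every worker sees the same addresses for data and partial, which is the property the text calls symmetric. The only
coordination needed is waiting for the workers to finish.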
MPP server architecture
MPP stands for massively parallel processing. The main feature of this architecture is that memory is physically
separate (distributed). The system is built from separate modules, each containing a processor, a local bank of main
memory (RAM), two communication processors (routers) or a network adapter, and sometimes hard disks and/or other I/O
devices. One router is used to transfer commands, the other to transfer data. In essence, such modules are fully
functional computers. Only the central processors of a module have access to that module's RAM bank. Modules are
connected by special communication channels. The user can determine the logical number of the processor to which they
are connected and organize message exchange with other processors. Two variants of operating system deployment are used
on MPP machines. In the first, a full-fledged operating system runs only on the controlling machine (the front end),
while each separate module runs a heavily stripped-down OS that supports only the branch of the parallel application
located on it. In the second, a full-fledged UNIX-like OS (Linux, FreeBSD) is installed separately on each module.
Schematic of the architecture with separate (distributed) memory.
Main advantage:
The main advantage of systems with separate memory is good scalability: unlike in SMP systems, each processor in a
machine with separate memory accesses only its own local memory, so there is no need for clock-by-clock synchronization
of processors. Practically all performance records to date have been set by machines of this architecture consisting of
several thousand processors (ASCI Red, ASCI Blue Pacific).
Disadvantages:
- The absence of shared memory noticeably reduces the speed of interprocessor exchange, since there is no common medium
for storing data intended for exchange between processors. Special programming techniques are required to implement
message exchange between processors.
- Each processor can use only the limited capacity of its local memory bank.
- Because of these architectural drawbacks, considerable effort is required to make the fullest use of system resources.
This is what accounts for the high price of software for massively parallel systems with separate memory.
Systems with separate memory include the supercomputers MVS-1000, IBM RS/6000 SP, SGI/CRAY T3E, and the ASCI systems.
The latest CRAY T3E series machines from SGI, based on DEC Alpha 21164 processors with a peak performance of
1200 Mflops (CRAY T3E-1200), can scale up to 2048 processors.
MPP systems are programmed using the so-called message passing paradigm (MPI, PVM, BSPlib).
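The message passing style can be illustrated with a small simulation. The sketch below does not use MPI, PVM, or BSPlib;
it assumes Python's standard queue module as a stand-in for the communication channels, with threads playing the role
of modules. The names node and distributed_sum are hypothetical. Each node receives a private copy of its chunk and
returns a partial result as a message, so no node ever reads another node's data.

```python
import queue
import threading

def node(rank, inbox, root_inbox):
    """One simulated module: it works only on data delivered to its own inbox."""
    local_data = inbox.get()                      # receive this node's chunk
    root_inbox.put((rank, sum(local_data)))       # send the partial result back

def distributed_sum(data, num_nodes=4):
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    root_inbox = queue.Queue()
    threads = [
        threading.Thread(target=node, args=(r, inboxes[r], root_inbox))
        for r in range(num_nodes)
    ]
    for t in threads:
        t.start()

    # Scatter: each node gets its own copy of one chunk (its "local memory").
    chunk = (len(data) + num_nodes - 1) // num_nodes
    for r in range(num_nodes):
        inboxes[r].put(list(data[r * chunk:(r + 1) * chunk]))

    # Gather: collect one message per node, in whatever order they arrive.
    total = 0
    for _ in range(num_nodes):
        _rank, partial = root_inbox.get()
        total += partial

    for t in threads:
        t.join()
    return total
```

Compared with a shared-memory version, the explicit scatter and gather steps are the extra programming effort that the
disadvantages above refer to.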
NUMA - hybrid server architecture
NUMA stands for non-uniform memory access. The main feature of this architecture is non-uniform access to memory.
The hybrid architecture combines the convenience of shared-memory systems with the relatively low cost of systems with
separate memory. The essence of the architecture lies in its special memory organization: memory is physically
distributed across different parts of the system but logically shared, so the user sees a single address space. The
system consists of homogeneous base modules (boards), each containing a small number of processors and a memory block.
The modules are connected by a high-speed switch. A single address space is maintained, and access to remote memory,
i.e. the memory of other modules, is supported in hardware. Access to local memory is several times faster than access
to remote memory. In essence, the NUMA architecture is an MPP (massively parallel) architecture in which SMP nodes are
used as the individual computing elements.
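A toy model may help make the memory organization concrete. The sketch below assumes a hypothetical NumaMemory class in
which a global address is split into a node number and an offset, and in which a remote access costs three times as
much as a local one. The cost values are illustrative assumptions, not measurements of any real machine.

```python
class NumaMemory:
    """Toy model: memory is physically split into per-node banks but has one address space."""

    def __init__(self, num_nodes, words_per_node, local_cost=1, remote_cost=3):
        self.words_per_node = words_per_node
        self.local_cost = local_cost              # assumed cost units, not real timings
        self.remote_cost = remote_cost
        self.banks = [[0] * words_per_node for _ in range(num_nodes)]

    def locate(self, address):
        """Map a global address to (owning node, offset inside that node's bank)."""
        return divmod(address, self.words_per_node)

    def _cost(self, cpu_node, owner):
        return self.local_cost if cpu_node == owner else self.remote_cost

    def write(self, cpu_node, address, value):
        owner, offset = self.locate(address)
        self.banks[owner][offset] = value
        return self._cost(cpu_node, owner)

    def read(self, cpu_node, address):
        owner, offset = self.locate(address)
        return self.banks[owner][offset], self._cost(cpu_node, owner)
```

A processor on any node can read or write any address, but the returned cost depends on whether the address falls in
its own bank. That difference is the non-uniformity the name refers to.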
Block diagram of a computer with a hybrid network; the nodes are connected by a Butterfly-type network:
The idea of a hybrid architecture was first proposed by Steve Wallach, who implemented it in the Exemplar series of
systems. Wallach's variant was a system consisting of 8 SMP nodes. HP bought the idea and implemented it in the SPP
series of supercomputers. Seymour R. Cray picked up the idea and added a new element, a coherent cache, creating the
so-called cc-NUMA architecture (Cache Coherent Non-Uniform Memory Access), which expands as "non-uniform memory access
with cache coherence". He implemented it in systems of the Origin type.
The organization of multilevel hierarchical memory.
The notion of cache coherence means that all central processors obtain identical values for the same variables at any
moment. Indeed, since cache memory belongs to an individual processor rather than to the multiprocessor system as a
whole, data that lands in the cache of one processor may be inaccessible to another. To avoid this, the information
stored in the processors' caches has to be synchronized.
There are several ways to maintain cache coherence:
- Use a bus snooping mechanism (snoopy bus protocol), in which caches monitor the variables transferred to any of the
central processors and, if necessary, update their own copies of those variables.
- Dedicate a special part of memory that tracks the validity of all copies of variables in use.
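The first option, bus snooping, can be sketched as a small simulation. The classes Bus and SnoopingCache below are
hypothetical, and the sketch assumes a write-through, write-update policy, where a snooping cache refreshes its own copy
rather than invalidating it. Real protocols are considerably more involved.

```python
class Bus:
    """Shared bus: every write is broadcast so that the other caches can snoop it."""

    def __init__(self):
        self.memory = {}        # main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_write(self, source, address, value):
        self.memory[address] = value              # write-through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop(address, value)


class SnoopingCache:
    """Private cache of one processor that watches bus traffic."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}         # cached copies: address -> value
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:             # miss: fetch from main memory
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.broadcast_write(self, address, value)

    def snoop(self, address, value):
        if address in self.lines:                 # update only copies this cache holds
            self.lines[address] = value
```

Without the snoop step, a processor that had already cached an address would keep returning the stale value after
another processor wrote to it. That is exactly the incoherence described above.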
The best-known cc-NUMA systems are the HP 9000 V-class in SCA configurations, SGI Origin3000, Sun HPC 15000, and
IBM/Sequent NUMA-Q 2000. At present the maximum number of processors in cc-NUMA systems can exceed 1000 (Origin3000
series). Usually the whole system runs under a single operating system, as in SMP (UNIX). Dynamic partitioning is also
possible, in which separate partitions of the system run under different operating systems. NUMA systems, like SMP
systems, are programmed using the so-called shared memory paradigm.
PVP server architecture
PVP stands for parallel vector processing. The main feature of PVP systems is the presence of special vector-pipeline
processors, which provide instructions for applying the same operation to vectors of independent data; these
instructions execute efficiently on pipelined functional units. As a rule, several such processors (1-16) work
simultaneously on shared memory (as in SMP) within multiprocessor configurations. Several such nodes can be joined by a
switch (as in MPP). Since data transfer in vector format is much faster than in scalar format (the maximum rate can
reach 64 GB/s, two orders of magnitude faster than in scalar machines), the problem of interaction between data streams
during parallelization becomes insignificant. What parallelizes poorly on scalar machines parallelizes well on vector
machines. Thus, systems of the PVP architecture can serve as general-purpose machines. However, because vector
processors are quite expensive, these machines will not become widespread.
The best-known machines of the PVP architecture are:
1. CRAY SV-2, SMP architecture. Peak system performance in a standard configuration can reach tens of teraflops.
2. NEC SX-6, NUMA architecture. Peak system performance can reach 8 Tflops; a single processor delivers 8 Gflops.
The system scales up to 128 nodes.
3. Fujitsu VPP5000 (vector parallel processing), MPP architecture. A single processor delivers 9.6 Gflops, peak system
performance can reach 1249 Gflops, and the maximum memory capacity is 8 TB. The system scales up to 512 nodes.
The programming paradigm on PVP systems involves vectorizing loops (to achieve reasonable performance on a single
processor) and parallelizing them (to load several processors simultaneously with one application).
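The two steps named above can be sketched as follows. The example assumes a hypothetical vector length VLEN and uses the
operation a*x + y as the loop body. The loop is first cut into strips of VLEN elements, each strip standing in for one
vector instruction, and the strips are then spread over several workers. Python has no vector hardware, so this only
illustrates the loop transformation, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

VLEN = 64   # assumed vector register length, chosen only for illustration

def axpy_strip(a, x, y, start):
    """Stand-in for one vector instruction: handle up to VLEN elements at once."""
    stop = min(start + VLEN, len(x))
    return [a * xi + yi for xi, yi in zip(x[start:stop], y[start:stop])]

def axpy_vector_parallel(a, x, y, num_processors=4):
    """Compute a*x + y with a strip-mined loop whose strips run on several workers."""
    starts = range(0, len(x), VLEN)               # vectorization: one iteration per strip
    result = []
    with ThreadPoolExecutor(max_workers=num_processors) as pool:
        # Parallelization: strips are independent, so workers can take them in any order;
        # map() still returns the results in the original strip order.
        for strip in pool.map(lambda s: axpy_strip(a, x, y, s), starts):
            result.extend(strip)
    return result
```

The transformation is valid only because the loop iterations are independent, which matches the requirement in the text
that vector instructions operate on independent data.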
Thanks to the large physical memory (a fraction of a terabyte), even poorly vectorizable problems are solved faster on
PVP systems than on systems with scalar processors.