- Number of slides: 23
Control in ATLAS TDAQ
Dietrich Liko, on behalf of the ATLAS TDAQ Group

Overview
- The ATLAS TDAQ System
  - Dataflow & HLT
- Control Subsystem of the Online Software
  - Architecture
  - TDAQ Wide Run Control Group
- Technology Choice
  - CLIPS
- Design & Implementation
  - Expert System Framework
  - Run Control, Supervision & Verification
- Testing & Verification
  - Test beam
  - Scalability Tests
CHEP 04 - Interlaken, Control of the ATLAS TDAQ system

The ATLAS TDAQ System
- Dataflow: ROS
- HLT: LVL 2, Event Filter
- Online System: Operation
- DCS: Detector control
- [Diagram labels: ROD, LVL 1]
- Test beam: see [331]
- Event Building Performance: see [217]

Control Aspects
- Dataflow
  - Fixed configuration
  - Synchronization, classical Run Control
  - Error handling
- High Level Triggers
  - Flexible configuration
  - Synchronization
  - Error handling

ATLAS Online Software
- Component Architecture
  - Object Oriented, C++ and Java
  - Distributed system (CORBA)
  - XML for Configuration
- Specialized services for a TDAQ system
  - Information Sharing, Message Reporting, Configuration
- Iterative Development Model
  - Prototype already in use
    - Laboratories, Test beam, Scalability tests
  - Evolution into the systems for the initial ATLAS system

Online Software Architecture
- In the context of the iterative development cycle and the Technical Design Review
  - Reevaluation of requirements and architecture
- Several high level packages & corresponding subsystems
  - Control
    - Supervision, Verification
  - Databases: see [130]
    - Configuration, Conditions
  - Information Sharing: see [166]
    - Information Service, Message Service, Monitoring

Control Subsystem
In the following, only the Supervision subsystem is discussed.

Supervision
- Initialization and Shutdown is responsible for:
  - initialization of TDAQ hardware and software components;
  - re-initialization of a part of the TDAQ partition when necessary;
  - shutting the TDAQ partition down gracefully;
  - TDAQ process supervision.
- Run Control is responsible for:
  - controlling the Run by accepting commands from the user and sending commands to TDAQ sub-systems;
  - analyzing the status of controlled sub-systems and presenting the status of the whole TDAQ to the Operator.
- Error Handling is concerned with:
  - analyzing run-time error messages coming from TDAQ sub-systems;
  - diagnosing problems, proposing recovery actions to the operator, or performing automatic recovery if requested.

TDAQ Wide Run Control group
- Examines the requirements from the subsystem side
  - Dataflow, HLT
- Hierarchical concept
  - Follows the overall organization of the TDAQ system
- Controller as the central element
  - All control functionality in a combined controller
  - State machine concept for synchronization
  - Flexibility in error handling
  - User customization

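The hierarchical, state-machine-based controller concept can be illustrated with a minimal Python sketch. It is not the ATLAS implementation: the state names (idle, configured, running) are the ones used later in the talk, while the command names, the `Controller` class and the transition table are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed transition table: command -> (required state, resulting state).
# The state names come from the slides; the command names are hypothetical.
TRANSITIONS = {
    "configure": ("idle", "configured"),
    "start": ("configured", "running"),
    "stop": ("running", "configured"),
    "unconfigure": ("configured", "idle"),
}


@dataclass
class Controller:
    """One node of the control tree (hypothetical sketch)."""
    name: str
    children: List["Controller"] = field(default_factory=list)
    state: str = "idle"

    def command(self, cmd: str) -> None:
        required, target = TRANSITIONS[cmd]
        if self.state != required:
            raise RuntimeError(f"{self.name}: cannot {cmd} from state {self.state}")
        # Forward the command down the tree first ...
        for child in self.children:
            child.command(cmd)
        # ... and change state only once every child has reached the target.
        if all(child.state == target for child in self.children):
            self.state = target


# A three-level tree: root -> one segment controller -> three leaf controllers.
leaves = [Controller(f"leaf-{i}") for i in range(3)]
root = Controller("root", [Controller("segment", leaves)])
root.command("configure")
root.command("start")
print(root.state)  # running
```

Because a parent changes state only after all of its children have done so, the state of the root summarizes the state of the whole tree, which is the synchronization role the slide assigns to the state machine.
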
Initial Design & Technology Choice
- A Run Control implementation is based on a State Machine model and uses the State Machine compiler, CHSM, as underlying technology.
  - P. J. Lucas, An Object-Oriented Language System for Implementing Concurrent, Hierarchical, Finite State Machines, MS Thesis, University of Illinois (1993)
- A Supervisor is mainly concerned with process management. It has been built using the open source expert system CLIPS.
  - CLIPS, A Tool for Building Expert Systems, http://www.ghg.net/clips/CLIPS.html
- A Verification system (DVS) performs tests and provides diagnosis. It is also based on CLIPS.

Experiences
- PLUS
  - Scalability test in 2002 demonstrated that a system of the size of the ATLAS TDAQ system can be controlled
- MINUS
  - Lack of flexibility (CHSM)

Technologies
- CLIPS
  - Production system, standard open source expert system
  - The so-called Rete algorithm drives the evaluation of rules on a set of facts
  - In-house experience
  - General purpose scripting language, OO features
  - C language bindings
- Alternatives
  - Jess: Java based, very similar to CLIPS
  - Eclipse: Commercial evolution of CLIPS
  - SMI++
    - State Machine
    - No general purpose scripting language
    - Difficult to integrate in our environment
  - Python
    - Excellent scripting language
    - No expert system

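To make the idea of a production system concrete, the following Python sketch evaluates rules on a set of facts until nothing new can be derived. It is a naive forward-chaining loop, not the Rete algorithm (Rete avoids re-matching every rule against every fact on each pass), and it is not CLIPS code. The fact layout and both rules are hypothetical.

```python
def run_rules(facts, rules):
    """Fire rules until no rule adds a new fact (naive forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = rule(facts) - facts
            if new_facts:
                facts |= new_facts
                changed = True
    return facts


def all_children_running(facts):
    """Hypothetical rule: the root is running once every child is running."""
    children = {f[1] for f in facts if f[0] == "child"}
    if children and all(("state", c, "running") in facts for c in children):
        return {("state", "root", "running")}
    return set()


def error_triggers_restart(facts):
    """Hypothetical rule: every reported error yields a restart action."""
    return {("action", "restart", f[1]) for f in facts if f[0] == "error"}


initial = {
    ("child", "a"), ("child", "b"),
    ("state", "a", "running"), ("state", "b", "running"),
    ("error", "b"),
}
result = run_rules(initial, [all_children_running, error_triggers_restart])
print(sorted(result - initial))
```

The rules only state what should follow from the facts. The order in which they fire is left to the engine, which is the flexibility a hard-wired state machine lacks.
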
Design & Implementation
- General Framework embedding CLIPS in a CORBA server
  - Periodic evaluation of knowledge base
  - Extension mechanism
- Online Software Components embedded as plug-ins
- Control functionality fully described by CLIPS rules

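The combination of a periodically evaluated knowledge base and plug-ins can be sketched in Python as follows. This is only an illustration of the pattern: the real framework embeds CLIPS in a CORBA server, whereas the `Framework` class, its method names, and the `start_process` plug-in below are all hypothetical.

```python
import time


class Framework:
    """Hypothetical stand-in for the framework: a knowledge base plus plug-ins."""

    def __init__(self):
        self.facts = set()   # the knowledge base
        self.plugins = {}    # plug-in name -> callable
        self.rules = []      # callables: facts -> list of (plugin name, args)

    def register_plugin(self, name, func):
        self.plugins[name] = func

    def add_rule(self, rule):
        self.rules.append(rule)

    def evaluate(self):
        """One evaluation cycle: collect the actions from all rules, then run them."""
        actions = []
        for rule in self.rules:
            actions.extend(rule(frozenset(self.facts)))
        for plugin_name, args in actions:
            self.plugins[plugin_name](self, *args)
        return len(actions)

    def run(self, cycles, period=0.0):
        """Evaluate the knowledge base periodically."""
        for _ in range(cycles):
            self.evaluate()
            time.sleep(period)


started = []


def start_process(framework, name):
    """Hypothetical plug-in: pretend to start a process and record the new fact."""
    started.append(name)
    framework.facts.add(("running", name))


def start_missing(facts):
    """Hypothetical rule: request a start for every wanted but not running process."""
    return [("start_process", (f[1],)) for f in sorted(facts)
            if f[0] == "wanted" and ("running", f[1]) not in facts]


fw = Framework()
fw.register_plugin("start_process", start_process)
fw.add_rule(start_missing)
fw.facts |= {("wanted", "ros-1"), ("wanted", "ros-2")}
fw.run(cycles=3)
print(started)  # ['ros-1', 'ros-2']
```

The rule never touches a process directly. It only names a plug-in, so the control logic stays in the rules and the plug-ins provide the functionality.
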
Proxy Objects
- Represent external entities
  - Other controllers, processes, etc.
- Member attributes exposed to the expert system as facts
- Member functions implement functionality in terms of Online Software components
- Example
  - Proxy objects represent child controllers
  - State of the object corresponds to the state of the child (idle, configured, running)
  - Commands are forwarded to child controllers

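A proxy for a child controller might look like the Python sketch below. The class names, the fact layout and the `start_when_configured` rule are assumptions for illustration; only the idea (attributes exposed as facts, commands forwarded to the child) and the three state names are taken from the slide.

```python
class Child:
    """Stand-in for a remote child controller (hypothetical)."""

    def __init__(self):
        self.state = "idle"
        self.received = []

    def handle(self, command):
        self.received.append(command)
        next_state = {"configure": "configured", "start": "running"}
        self.state = next_state.get(command, self.state)


class ChildProxy:
    """Local object representing one child controller."""

    def __init__(self, name, child):
        self.name = name
        self._child = child
        self.state = child.state

    def refresh(self):
        """Update the mirrored attribute from the external entity."""
        self.state = self._child.state

    def facts(self):
        """Expose the member attributes as facts for the rule engine."""
        return {("controller", self.name, "state", self.state)}

    def forward(self, command):
        """Member function: pass a command on to the child."""
        self._child.handle(command)
        self.refresh()


def start_when_configured(proxies):
    """Hypothetical rule: once every child is configured, start them all."""
    facts = set().union(*(p.facts() for p in proxies))
    if all(("controller", p.name, "state", "configured") in facts for p in proxies):
        for p in proxies:
            p.forward("start")
        return True
    return False


proxies = [ChildProxy(f"child-{i}", Child()) for i in range(2)]
fired_early = start_when_configured(proxies)   # children still idle: no action
for p in proxies:
    p.forward("configure")
fired_late = start_when_configured(proxies)    # now the rule fires
print(fired_early, fired_late, [p.state for p in proxies])
```

The rule sees only facts and proxy methods, so the same rule would work whether the external entity is a controller or a plain process.
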
Controller
- Rules drive interactions between objects
- [Diagram labels: Proxy Objects, Other Controllers, External processes]

Status
- Supervisor
  - Uses Framework
- Run Control
  - Uses Framework
- Verification system
  - CLIPS based
- Choice of a common technology drives the path to a unified control system based on Controllers

Scalability Test 2004
- Test bed
  - Up to 330 PCs of the CERN IT LXSHARE
  - 600 to 800 MHz, up to 2.4 GHz; dual Pentium III
  - 256 to 512 MB
  - Linux Red Hat 7.3
- Only control aspect verified
  - No Dataflow network
- Various configurations
  - Servers on standard machines
  - Servers on dedicated high end machines

Supervisor – Process Management
- One Supervisor
- PMG Agents
- Startup limited by initialization of processes
- Enhanced recovery procedures
- [Diagram labels: Supervisor, P]

Startup with 1000 Controllers & 3000 processes in 40 to 100 seconds
- Several configurations: mon_standard has two additional processes per controller

Run Control
- Usual RC tree
  - Actually 10 controllers on the lowest level
  - Variation of the number of intermediate nodes
- Some central infrastructure
  - Name Service (IPC)
  - Information Sharing

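The tree sizes involved can be estimated with a few lines of Python. The slides do not give the exact tree shapes, so this sketch assumes a three-level tree (one root, a variable number of intermediate controllers, and 10 leaf controllers under each intermediate one). The function name and that reading of "10 controllers on the lowest level" are assumptions.

```python
def count_controllers(intermediate_nodes: int, leaves_per_node: int = 10) -> int:
    """Root + intermediate controllers + leaf controllers in a three-level tree."""
    if intermediate_nodes < 1 or leaves_per_node < 1:
        raise ValueError("the tree needs at least one node per level")
    return 1 + intermediate_nodes + intermediate_nodes * leaves_per_node


for n in (10, 50, 100):
    print(n, count_controllers(n))
# 10 111
# 50 551
# 100 1101
```

Under this assumption, roughly 90 to 100 intermediate controllers are enough to reach the order of 1000 controllers quoted for the tests.
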
Transitions
- 7 internal phases
- With 1000 Controllers: 2 to 6 seconds
- No “real life” actions
Again: More flexible error handling

Combined Testbeam 2004
Stable operation from the start – Advantage of the component model

Conclusions
- New assessment of requirements
  - Overall Architecture
  - Controller studied in detail
- CLIPS confirmed as technology choice
  - Design and implementation of a new framework
- First test of new systems
  - Test beam
  - Scalability test
- We can control a system of the size of the ATLAS TDAQ system
- Much more flexible system
  - Common technology in various control components
  - Unified controllers in the future