RTES – CD Status Report (most of the material from the BTeV Temple 2003 Review)
Jim Kowalkowski

Deliverables
• A toolkit containing
  – Very Lightweight Agents (VLAs)
  – ARMORs
  – Modeling tools and a domain-specific environment under which they operate
• Some BTeV trigger- and DAQ-specific "plug-ins" using the above toolkit, applied to both hardware and software

Participation in SC 2003
• All the university groups and Fermilab worked together to create a system (hardware and software) demonstrating their technology in a BTeV Level 1 trigger-like setting
• This was a concrete project with a deadline
• Created a system that is being reviewed
• Helped the RTES groups develop an understanding of the processing that goes on in the trigger and how events are generated
• They developed code and rules that handle some of the problems we expect to encounter
• Contains initial prototypes of the deliverables: GME models, ARMORs, and VLAs

SC 2003 – External view of demo

SC 2003 – Internal view of demo
[Diagram; labels: Linux PC; Windows PC; Display data; Actions, Commands; ARMOR Control System; GME; ARMOR; Monitoring, State; Gateway; Commands; Event Generator; Start/Stop; Fault injection; Parameter settings; Operator; Switch; Buffer Manager; VLA; Local Manager; Fake Physics App (1, 2, 3); Farmlet-1, 2, 3]

Milestones with respect to BTeV
• See draft BTeV document 2079
• Year 3 – Define APIs, make distributed application decisions, evaluate modeling tools, create a more complete prototype (demo), generate a simulator
• Year 4 – Synchronize with or conform to the BTeV trigger development environment and TSM system for configuration, control, and monitoring
• Year 5 – Used to address integration issues with the BTeV trigger

Achieving milestones
• (+) Many of the BTeV needs (issues addressed in the milestones) are also necessary for RTES collaborators to carry out their research
  – Simulation, to validate or verify their ideas
  – Scalability issues for the modeling tools and ARMORs
  – APIs and ways to move data from place to place
• (+) Concrete projects have been successful
• (-) They need to feel that the BTeV goals match their own research goals
• (-) The students involved must have adequate skills

Schedule
• RTES completes early relative to the milestones and the completion of the trigger
• We have the last year (2006) to address integration issues
• The TSM (controls/monitoring for the trigger) will be aided by RTES, but will still function (most likely in a reduced capacity) without it

Acceptance testing
• We are generating use cases (a work in progress, BTeV document 2189) to capture behavioral requirements (when you poke the system like this, it reacts this way)
  – Use cases translate nicely into test cases and acceptance criteria (e.g., how many of the use cases does the RTES software satisfy?)
  – We are following a semi-formal methodology (Cockburn)
• A detailed simulation will be used to verify RTES solutions
• We are working toward automated component and integration test procedures that fit into the BTeV development environment
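The idea that a use case ("when you poke the system like this, it reacts this way") maps directly onto a test case and an acceptance count can be sketched as follows. This is only an illustration: the `UseCase` record, the stimulus and reaction names, and `toy_system` are hypothetical and are not taken from BTeV document 2189 or from the RTES software.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class UseCase:
    """One behavioral requirement: a stimulus and the reaction expected from it."""
    name: str
    stimulus: str            # hypothetical stimulus identifier
    expected_reaction: str   # hypothetical reaction identifier


def count_satisfied(system: Callable[[str], str], use_cases: List[UseCase]) -> int:
    """Return how many use cases the system under test satisfies."""
    return sum(1 for uc in use_cases if system(uc.stimulus) == uc.expected_reaction)


def toy_system(stimulus: str) -> str:
    """Stand-in for the software under test (assumption, for illustration only)."""
    reactions = {
        "dsp_link_broken": "reroute_events",
        "trigger_filter_hung": "restart_filter",
    }
    return reactions.get(stimulus, "no_action")


USE_CASES = [
    UseCase("broken link", "dsp_link_broken", "reroute_events"),
    UseCase("hung filter", "trigger_filter_hung", "restart_filter"),
    UseCase("manager death", "manager_process_died", "restart_manager"),
]

satisfied = count_satisfied(toy_system, USE_CASES)
print(f"{satisfied} of {len(USE_CASES)} use cases satisfied")
```

Each `UseCase` doubles as a test case, and the satisfied count is one possible acceptance criterion of the kind the slide mentions.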

Technology choices
• We know the importance of keeping close contact with the BTeV development and engineering staff
  – The configuration, distributed computing models, coding standards, and APIs will be highly influenced by BTeV developers
  – All this will help make consistent RTES/TSM systems that require less maintenance effort and less manpower to create
  – BTeV developers can make use of product evaluations and experiences of the RTES group
• We are designing a message format and evaluating exchange protocols
• We are investigating the use of an RTOS

Recent Activities
• Use cases
  – Working to understand how to do this correctly with Margaret V. and Luciano P.
  – Can we bring an expert in for a few days?
• Evaluation of OSE real-time kernel
  – Fermilab will be using PowerPC 8540 as the platform/architecture
  – Target at this time is only the embedded systems in the trigger
  – Jim K. wants to be involved in this
• Review of the prototype/demo system
  – Marc P. and Mark F. are the reviewers
  – Extremely valuable results already
  – Emphasized the need for use cases
• Strong desire to evaluate parallel-C compilers that generate VHDL code (Jim K.)

Issues
• Tools for research use versus tools for production use in an experiment
• Where the tools actually fit in, and their relationship to the traditional controls/monitoring systems
• Time for research and contemplating solutions
• Diverse interests within the group

RTES hardware for SC 2003

Backup Slide - errors for SC 2003
• Increased/decreased data rate
• Broken communication link to a DSP
• Trigger filter application hung
• Death of manager process on the host PCs
• Increased/decreased processing time per event
• Input queue high water mark reached
• Unable to keep up after DSP failure
• Impede processing on one DSP
• Timeout during event processing
• Bad and lost events or lost events

Slides from CHEP 2003 talk…

Goals Summary
• Implement a large, aggressive trigger that
  – Applies computation to every interaction
  – Has high sustained computational performance
  – Maintains functional integrity for long periods of time
  – Is highly available
  – Is dynamically reconfigurable, maintainable, and evolvable
• Create fault handling infrastructure capable of
  – Accurately identifying problems (where, what, and why)
  – Compensating for problems (shift the load, changing thresholds)
  – Automated recovery procedures (restart / reconfiguration)
  – Accurate accounting
  – Being extended (capturing new detection/recovery procedures)
  – Policy driven monitoring and control
• Simplify operations

What is RTES?
• A collaboration of five institutions, funded by NSF ITR grant ACI-0121658
  – University of Illinois (M. Haney, R. K. Iyer, Z. Kalbarczyk, Q. Liu, A. Mahajan, M. Selen, Z. Yang)
  – University of Pittsburgh (D. Mosse, O. Shigiltchoff)
  – University of Syracuse (R. Chopade, J. Oh, L. Hovey, S. Stone, D. Messie)
  – Vanderbilt University (T. Bapty, S. Neema, S. Norsdstrom, P. Sheldon, S. Shetty, E. Vaandering, D. Vashishtha)
  – Fermilab (J. Appel, J. Butler, E. Gottschalk, J. Kowalkowski, L. Piccoli, M. Votava)
• Physicists and Computer Scientists/Electrical Engineers at BTeV institutions with expertise in
  – High performance, real-time, embedded system software and hardware
  – Reliability and fault tolerance
  – System specification, generation, and modeling tools
• A group working on fault management in large computing clusters

Very Lightweight Agents (VLA)
• Message scheduling and priority assignments
• Fast, simple reactive decisions
• Read, summarize, and report sensor data
• Are "pluggable" components
• Live alongside the application
• Some predictive capabilities
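The read/summarize/report cycle and the "fast, simple reactive decisions" listed above can be pictured with a small sketch. It is not the RTES VLA implementation: the class name `QueueDepthAgent`, the `throttle_input` action, and the choice of input-queue depth as the sensor are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class QueueDepthAgent:
    """Hypothetical lightweight agent watching one sensor: input queue depth."""
    high_water_mark: int
    samples: List[int] = field(default_factory=list)

    def read(self, depth: int) -> Optional[str]:
        """Record one sensor reading and make an immediate reactive decision."""
        self.samples.append(depth)
        if depth >= self.high_water_mark:
            return "throttle_input"   # hypothetical local action
        return None

    def summarize(self) -> Dict[str, float]:
        """Condense the readings since the last report, then start a new window."""
        if not self.samples:
            return {"count": 0}
        report = {
            "count": len(self.samples),
            "max": max(self.samples),
            "mean": mean(self.samples),
        }
        self.samples.clear()
        return report


agent = QueueDepthAgent(high_water_mark=100)
first = agent.read(10)      # below the mark: no action
second = agent.read(120)    # at or above the mark: react locally
summary = agent.summarize() # compact report for a higher-level manager
```

Keeping the reactive check inside `read` and sending only a summary upward is one way to keep such an agent cheap enough to run alongside the application.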

ARMOR View
[Diagram; labels: Node 1; Node 2; Node 3; ARMOR Microkernel; Detection Policy; Recovery Policy; Process Mgmt.; Named Pipe Mgmt.; TCP Connection Mgmt.; Daemon; Remote daemons; Network; Execution Controller; Local Manager ARMOR; Trigger Application]

Modeling Environment
• Fault handling
• Process dataflow
• Hardware configuration

Why is all of this interesting?
• It is an integrated approach, from hardware to physics algorithms
  – Standardization of resource monitoring, management, error reporting, and integration of recovery procedures can make operating the system more efficient and make it possible to comprehend and extend
• There are real-time constraints
  – Scheduling and deadlines
  – Numerous detection and recovery actions
• The product of this research will
  – Automatically handle simple problems that occur frequently
  – Be as smart as the detection/recovery modules plugged into it
• The product can lead to better or increased
  – Trigger uptime, by compensating for problems or predicting them instead of pausing or stopping a run
  – Resource utilization: the trigger will use the resources that it needs
  – Understanding of the operating characteristics of the software
  – Ability to debug and diagnose difficult problems
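The statement that the product will "be as smart as the detection/recovery modules plugged into it" suggests a plug-in structure along the following lines. This is a minimal sketch under stated assumptions, not the ARMOR or VLA design: `FaultHandler`, the status keys, the 20 ms limit, and the action names are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Status = Dict[str, float]
Detector = Callable[[Status], Optional[str]]  # returns a fault name or None
Recovery = Callable[[], str]                  # returns the action taken


class FaultHandler:
    """Hypothetical registry pairing pluggable detectors with recovery actions."""

    def __init__(self) -> None:
        self._detectors: List[Detector] = []
        self._recoveries: Dict[str, Recovery] = {}

    def add_detector(self, detector: Detector) -> None:
        self._detectors.append(detector)

    def add_recovery(self, fault: str, recovery: Recovery) -> None:
        self._recoveries[fault] = recovery

    def handle(self, status: Status) -> List[str]:
        """Run every detector; recover known faults, escalate unknown ones."""
        actions: List[str] = []
        for detect in self._detectors:
            fault = detect(status)
            if fault is None:
                continue
            recovery = self._recoveries.get(fault)
            actions.append(recovery() if recovery else f"escalate:{fault}")
        return actions


def detect_event_timeout(status: Status) -> Optional[str]:
    # Assumed limit of 20 ms per event, chosen only for the example.
    return "event_timeout" if status.get("event_time_ms", 0.0) > 20.0 else None


def detect_lost_events(status: Status) -> Optional[str]:
    return "lost_events" if status.get("lost_events", 0.0) > 0 else None


handler = FaultHandler()
handler.add_detector(detect_event_timeout)
handler.add_detector(detect_lost_events)
handler.add_recovery("event_timeout", lambda: "restart_filter")

actions = handler.handle({"event_time_ms": 50.0, "lost_events": 3.0})
```

A fault with a registered recovery is handled automatically, while one without is escalated, so the handler's capability grows only as modules are plugged in.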