PHOBOS Computing in the pre-Grid era
Burt Holzman, Head of Computing, PHOBOS Experiment

Outline
• “Collision Physics”
• Computing with PHOBOS: now
• Computing with PHOBOS: soon

Acknowledgments
PHOBOS Computing represents years of hard work by many people, including but not limited to Marten Ballintijn, Mark Baker, Patrick Decowski, Nigel George, George Heintzelman, Judith Katzy, Andrzej Olszewski, Gunther Roland, Peter Steinberg, Krzysztof Wozniak, Jinlong Zhang and the gang at the RCF.

Collision Physics
We (HEP/Nuclear/RHIC) collide billions (and billions...) of
• electrons vs. positrons
• antiprotons vs. protons
• nuclei vs. nuclei
... and a whole lot more

Collision Physics
Collision physics is ideal for parallel computing: each collision (“event”) is independent!
Animation courtesy of the UrQMD group
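Because each event is independent, an event sample can be split across workers with no communication at all. The sketch below illustrates the idea with plain C++ threads and a hypothetical processEvent() function; it is the generic pattern, not PHOBOS code.

```cpp
#include <thread>
#include <vector>
#include <cstdio>

// Hypothetical per-event work; in a real job this would be reconstruction or analysis.
void processEvent(long /*ievent*/) { /* ... */ }

int main() {
    const long nEvents  = 1000000;   // events in the input sample (toy number)
    const int  nWorkers = 4;         // independent workers

    std::vector<std::thread> workers;
    for (int w = 0; w < nWorkers; ++w) {
        workers.emplace_back([=] {
            // Each worker takes a disjoint slice of events; no communication is
            // needed because collisions ("events") are statistically independent.
            for (long i = w; i < nEvents; i += nWorkers)
                processEvent(i);
        });
    }
    for (auto& t : workers) t.join();
    std::printf("processed %ld events with %d workers\n", nEvents, nWorkers);
    return 0;
}
```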

Collision Physics
... and we collect millions of events per day

RHIC Needs Power
Event from Delphi (e+ e-)
Event from STAR (Au Au)

PHOBOS Detector
• Cerenkov Trigger Counters
• Time of Flight Counters
• Octagon Multiplicity Detector + Vertex Detector (Silicon)
• Spectrometer Detectors (Silicon)
• Ring Multiplicity Detectors (Silicon)
• Beryllium Beam Pipe
• Magnet (top part removed)
• Paddle Trigger Counters

PHOBOS Computing Needs
• Operations (Counting House)
• Simulations (“Monte Carlo”)
  – Must understand detector response and backgrounds
• Production
  – Reconstruct event vertex
  – Reconstruct tracks
• Analysis

Operations/Counting House
• Data Acquisition
  – PR01-PR04: quad Sparc + SCSI RAID
  – PR05: Big River TK200 data recorder
• Slow Controls
  – Windows NT running LabVIEW
• Online Monitoring
• Datamover

Online Monitoring
Raw event distribution from the DAQ: a TPhOnDistributor sends events to TPhOnDistClient clients (over TPhSocketDataFile01); the clients do distributed processing of events (HitProcessing, ...).
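As a rough illustration of how a monitoring client can pull events over a socket, here is a minimal sketch using ROOT's generic TSocket/TMessage classes. The host, port, and event handling are placeholders; the real TPhOnDistributor/TPhOnDistClient interfaces are PHOBOS-internal and not shown here.

```cpp
// Minimal ROOT-based event-pulling client (illustrative only).
#include "TSocket.h"
#include "TMessage.h"

void monitorClient(const char* host = "daqhost", int port = 9090) {  // placeholder host/port
    TSocket sock(host, port);            // connect to the event distributor
    if (!sock.IsValid()) return;

    TMessage* msg = 0;
    while (sock.Recv(msg) > 0) {         // block until the distributor sends something
        if (msg->What() == kMESS_OBJECT) {
            TObject* event = (TObject*) msg->ReadObject(msg->GetClass());
            // ... run hit processing / fill monitoring histograms on 'event' ...
            delete event;
        }
        delete msg;
        msg = 0;
    }
    sock.Close();
}
```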

“Datamover” design
• Run reconstruction ASAP on data
• This implies:
  – Run calibrations (“pedestals”) in the Counting House
  – Run reconstruction from HPSS disk cache, not HPSS tape

PR00 datamover
• Users ran calibrations by hand
• Users sank data by hand
• OK for a really slow DAQ

PR01-PR04 datamover
Components: online, DAQ, pedestals (ped_daemon), deletor_daemon, DAQ DISK, vmesparc (daq_daemon), HPSS; connected via SCSI, NFS and FTP.
• online tells pedestals when to run
• pedestals tells vmesparc when to sink

PR05 datamover
Components: online, DAQ, pedestals (ped_daemon), daq_daemon, DAQ DISK, ramdisk, HPSS; connected via NFS and FTP.
• online tells pedestals when to run
• pedestals tells vmesparc when to sink
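Conceptually, the datamover daemons are polling loops: notice a closed DAQ file, run the pedestal calibration, sink the file to HPSS, then free the DAQ disk. The sketch below captures that control flow in modern C++; the directory name, helper functions, and polling interval are invented for illustration and do not correspond to the actual ped_daemon/daq_daemon/deletor_daemon code.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <thread>
#include <chrono>

namespace fs = std::filesystem;

// Hypothetical stand-ins for the ped_daemon / daq_daemon / deletor_daemon actions.
bool runPedestals(const fs::path&) { /* run the pedestal calibration here */ return true; }
bool sinkToHPSS(const fs::path&)   { /* FTP the raw file into HPSS here   */ return true; }

int main() {
    const fs::path daqDisk = "/daq/disk";          // assumed location of closed DAQ files
    std::set<std::string> done;                    // files already calibrated and sunk

    for (;;) {
        for (const auto& entry : fs::directory_iterator(daqDisk)) {
            const std::string name = entry.path().filename().string();
            if (done.count(name)) continue;        // skip runs handled earlier

            if (runPedestals(entry.path()) && sinkToHPSS(entry.path())) {
                fs::remove(entry.path());          // "deletor" step: free the DAQ disk
                done.insert(name);
            }
        }
        std::this_thread::sleep_for(std::chrono::seconds(30));   // polling interval (assumed)
    }
}
```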

Vertex Reconstruction
Figure: counts vs. z (cm) of vertex candidates from the two Si layers.
Vertex resolution: σx ~ 450 μm, σy ~ σz ~ 200 μm
For this event: vertex @ z = -0.054 cm
Combinatorics can be very expensive
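To see where the combinatorics come from, here is a simplified vertexing sketch: every pairing of hits between two silicon layers is extrapolated to the beam axis and the z of the crossing point is histogrammed, so the cost scales with the product of the hit multiplicities. The geometry, hit values, and bin width are toy numbers; this is not the actual PHOBOS vertex algorithm.

```cpp
#include <vector>
#include <map>
#include <cmath>
#include <cstdio>

struct Hit { double r; double z; };   // hit radius and z position in a silicon layer

// For each pair of hits (inner layer, outer layer), extrapolate the straight line
// through them back to r = 0 and histogram the resulting z; the peak is the vertex.
double findVertexZ(const std::vector<Hit>& inner, const std::vector<Hit>& outer,
                   double binWidth = 0.05 /* cm, assumed */) {
    std::map<long, int> zHist;
    for (const Hit& a : inner)
        for (const Hit& b : outer) {                 // O(N_inner * N_outer) combinations
            double slope = (b.z - a.z) / (b.r - a.r);
            double z0    = a.z - slope * a.r;        // z at r = 0 (beam axis)
            zHist[std::lround(z0 / binWidth)]++;
        }
    long bestBin = 0; int bestCount = -1;
    for (const auto& [bin, count] : zHist)
        if (count > bestCount) { bestCount = count; bestBin = bin; }
    return bestBin * binWidth;
}

int main() {
    // Toy event: two layers, a few hits each, true vertex near z = -0.05 cm.
    std::vector<Hit> inner = {{5.0, 0.20}, {5.0, -0.60}, {5.0, 1.10}};
    std::vector<Hit> outer = {{10.0, 0.45}, {10.0, -1.15}, {10.0, 2.25}};
    std::printf("vertex z ~ %.2f cm\n", findVertexZ(inner, outer));
    return 0;
}
```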

Beam Particle Tracking
1. Road-following algorithm finds straight tracks in the field-free region
2. Curved tracks in the B-field found by clusters in (1/p, ) space
3. Match pieces by , consistency in dE/dx and fit in the yz-plane
4. Covariance-matrix track fit for momentum reconstruction and ghost rejection
Figure: track pieces (1) and (2) in the x-z view with the By field region (10 cm scale).
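Step 1 (road following) can be sketched as: seed a straight line from hits in the first two planes, extrapolate it to the later planes, and keep hits that fall inside a fixed road. The plane layout, road width, and acceptance cut below are illustrative assumptions, not the PHOBOS implementation.

```cpp
#include <vector>
#include <cmath>
#include <cstddef>
#include <cstdio>

struct Hit   { double z; double x; };                            // hit position in a plane
struct Track { std::vector<Hit> hits; double slope; double x0; };

// Find straight-track candidates in the field-free region by road following.
// 'planes[i]' holds the hits of plane i, ordered by increasing z.
std::vector<Track> findStraightTracks(const std::vector<std::vector<Hit>>& planes,
                                      double roadWidth = 0.1 /* cm, assumed */) {
    std::vector<Track> tracks;
    if (planes.size() < 3) return tracks;

    for (const Hit& h0 : planes[0])
        for (const Hit& h1 : planes[1]) {
            double slope = (h1.x - h0.x) / (h1.z - h0.z);        // seed from first two planes
            double x0    = h0.x - slope * h0.z;
            Track cand{{h0, h1}, slope, x0};

            for (std::size_t p = 2; p < planes.size(); ++p) {    // follow the road outward
                for (const Hit& h : planes[p]) {
                    if (std::fabs(h.x - (x0 + slope * h.z)) < roadWidth) {
                        cand.hits.push_back(h);
                        break;                                   // take first hit inside the road
                    }
                }
            }
            if (cand.hits.size() >= planes.size() - 1)           // require most planes to respond
                tracks.push_back(cand);
        }
    return tracks;
}

int main() {
    // Toy event: 4 planes, one clean track at x = 1 + 0.2*z plus a noise hit.
    std::vector<std::vector<Hit>> planes = {
        {{10, 3.0}}, {{20, 5.0}}, {{30, 7.0}, {30, 2.0}}, {{40, 9.0}}
    };
    std::printf("found %zu track candidate(s)\n", findStraightTracks(planes).size());
    return 0;
}
```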

Size of the PHOBOS Farm
pharm:
• 6 dual 733 MHz
• 6 dual 933 MHz
• 11 dual 1733 MHz
RHIC Computing Facility:
• 30 dual 450 MHz
• 44 dual 800 MHz
• 63 dual 1000 MHz
• 26 dual 1400 MHz
• 98 dual 2400 MHz
• 80 dual 3060 MHz
All nodes have 2-4 big disks attached

LSF Batch (for analysis & simulations)
bqueues listing of the PHOBOS queues (columns QUEUE_NAME, PRIO, STATUS, MAX, JL/U, JL/P, JL/H, NJOBS, PEND, RUN, SUSP): phslow_hi, phcasfast_hi, phcrs_hi, phslow_med, phcasfast_med, phcrs_med, phslow_lo, phcasfast_lo, phcrs_lo, phslow_mc, phcasfast_mc, phcrs_mc; priorities range from 40 down to 5, and all queues are Open:Active.

CRS Batch (for Production)

Monitoring: Ganglia
http://ganglia.sourceforge.net

ROOT
• PHOBOS Software (“PhAT”) built on the ROOT (http://root.cern.ch) framework
  – Automated I/O of objects
  – Very efficient data formats (trees)
  – C++ interpreter (!)
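For readers unfamiliar with ROOT trees, a minimal tree-writing example is shown below; the branch names and fake data are invented, and this is plain ROOT usage rather than PhAT code.

```cpp
// Minimal ROOT tree-writing sketch (generic ROOT, not PhAT).
#include "TFile.h"
#include "TTree.h"

void writeEvents() {
    TFile file("events.root", "RECREATE");      // output file name assumed
    TTree tree("T", "toy event tree");

    Int_t   nHits   = 0;
    Float_t vertexZ = 0;
    tree.Branch("nHits",   &nHits,   "nHits/I");     // simple scalar branches for the sketch;
    tree.Branch("vertexZ", &vertexZ, "vertexZ/F");   // ROOT can just as well store whole objects

    for (Int_t i = 0; i < 1000; ++i) {
        nHits   = 100 + i % 50;                  // fake data for illustration
        vertexZ = -0.05f + 0.001f * (i % 10);
        tree.Fill();
    }
    tree.Write();                                // ROOT handles the on-disk format
    file.Close();
}
```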

rootd
• ROOT files accessible via the rootd server-side daemon over a 100 Mb/s network
• Can be inefficient: data is remote to the analysis job (but OK if CPU is the bottleneck)
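Opening a rootd-served file looks just like opening a local one, except for the root:// URL, which makes ROOT use TNetFile under the hood. In the sketch below the hostname, path, and tree name are placeholders.

```cpp
// Open a remote ROOT file served by rootd (host, path, and tree name are placeholders).
#include "TFile.h"
#include "TTree.h"

void readRemote() {
    // TFile::Open picks TNetFile for root:// URLs; data travels over the network
    // to the analysis job, which is why this only hides well when CPU is the bottleneck.
    TFile* f = TFile::Open("root://rcrs4001.rcf.bnl.gov//data/phobos/run1234.root");
    if (!f || f->IsZombie()) return;

    TTree* tree = (TTree*) f->Get("T");   // tree name assumed
    if (tree) tree->Print();              // e.g. inspect branches remotely
    f->Close();
}
```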

Distributed Disk
rootd serves *.root files from the local disks of the farm nodes (rcrs4001, rcrs4002, rcrs4003, rcrs4004), with file locations tracked in an ORACLE DB.

Distributed Disk: CatWeb
• Web-based front-end
• Oracle database back-end
• Allows users to stage files to distributed disk via a dedicated LSF queue

Distributed Disk: CatWeb
Figure: 88 TB on distributed disk.

PHOBOS: Soon
• Condor to replace LSF batch
• Pros:
  – Free (LSF costs big bucks)
  – Grid-ready
  – PROOF-ready
• Cons:
  – No “queue” concept (hackish solutions may exist)

PHOBOS: Soon
• PROOF (Parallel ROOT Facility)
  – Brings the process to the data!
  – In use now on pharm, soon on RCF
  – Hooks in with Condor (specifically: Condor On Demand, preempting all other activity)

PROOF in action
#proof.conf
slave node1
slave node2
slave node3
slave node4
Local PC: a root session runs ana.C and collects stdout/objects; Remote PROOF Cluster: one proof master server plus proof slave servers on node1, node2, node3, node4, each reading its local *.root files via TFile (or TNetFile for remote files).
$ root
root [0] tree.Process("ana.C")
root [1] gROOT->Proof("remote")
root [2] chain.Process("ana.C")
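The ana.C in the diagram is a ROOT selector: PROOF ships it to the slaves, each slave runs Process() over its share of entries, and registered output objects are merged back on the master. A bare-bones selector skeleton is sketched below; the class name, histogram, and branch handling are made up for illustration.

```cpp
// Bare-bones selector skeleton of the kind PROOF distributes to its slaves (illustrative).
#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"

class AnaSelector : public TSelector {
public:
    TTree *fChain = nullptr;   // tree/chain being processed on this slave
    TH1F  *fHist  = nullptr;   // example output object

    Int_t  Version() const override { return 2; }        // use Process(Long64_t)
    void   Init(TTree *tree) override { fChain = tree; }
    void   SlaveBegin(TTree *) override {
        fHist = new TH1F("hVertexZ", "vertex z;z (cm);events", 100, -20, 20);
        fOutput->Add(fHist);          // objects in fOutput are merged on the master
    }
    Bool_t Process(Long64_t entry) override {
        fChain->GetEntry(entry);      // each slave only sees its own subset of entries
        // ... read branches and fill fHist here ...
        return kTRUE;
    }
    void   Terminate() override {
        // runs back on the client: draw or save the merged histograms
    }

    ClassDefOverride(AnaSelector, 0);
};
```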