d6a444d98a5b6c8cf7c09b10b86bda72.ppt
- Number of slides: 21
CMS Monitoring Tools
Farida Fassi
November 28th, 2008
Goal
- Review some CMS monitoring tools using the ARDA Dashboard
- Useful Dashboard features for remote monitoring:
  - Service status for your site
  - SAM tests for basic diagnostics
  - Job activity status for your site
- PhEDEx monitoring tool for transfer activities
  - http://cmsweb.cern.ch/phedex/
Starting point
- http://arda-dashboard.cern.ch/cms/
- Two main entry points from there: Jobs and SAM
SAM visualization
- 4 clickable buttons:
  - Latest Results
  - Historical View
  - Feedback (Savannah)
  - Help (Twiki)
- Every page you reach has its own URL
Latest results: CE view
- The one that "comes easy"
- Click to reset the menus
- Click to see the test log
- Click to see the 48 h history
- Service information taken from GOCDB
Last 48 h
- This view is not clickable!
  - But it shows when the tests ran
Select the service types menu
- Great instructions from the Facility Operations team:
  - https://twiki.cern.ch/twiki/bin/view/CMS/SAMChecklist
- Your favorite site will look like this: SRMv2 and CE tests
SAM availability browsing
- You can browse and click down to a single test, and then get its log
- Every time a cell of the color matrix has a blue border, it is clickable
SAM visualization (1)
SAM visualization (2)
- Click to see the log of this test
Job processing on the Grid
- To follow job processing and analysis on the Grid, use the main CMS Dashboard page: http://dashboard.cern.ch/cms
- Click on "Interactive view"
Job Dashboard
- Direct link: http://lxarda09.cern.ch/dashboard/request.py/jobsummary
- You have a choice:
  1. Show all jobs submitted in the selected time window (the default; the default window is the last 24 hours)
  2. Show all jobs that terminated in the last 24 hours or are pending or running at the current moment: select the "all jobs regardless submission time" option
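The two options above amount to two different filters over the same set of jobs. The sketch below illustrates the difference on a few made-up job records; the `Job` record, its field names and its status strings are hypothetical and do not reflect how the Dashboard actually stores or queries jobs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical job record: field names and status values are illustrative only.
@dataclass
class Job:
    job_id: str
    status: str                          # "pending", "running", "done" or "failed"
    submitted: datetime
    finished: Optional[datetime] = None  # None while the job is pending or running

ACTIVE = {"pending", "running"}
DAY = timedelta(hours=24)

def submitted_in_window(jobs: List[Job], now: datetime, window: timedelta = DAY) -> List[Job]:
    """Option 1 (default): jobs submitted during the last `window`."""
    return [j for j in jobs if now - window <= j.submitted <= now]

def active_or_recently_finished(jobs: List[Job], now: datetime, window: timedelta = DAY) -> List[Job]:
    """Option 2: jobs that finished during the last `window`, or are pending/running now,
    regardless of when they were submitted."""
    return [
        j for j in jobs
        if j.status in ACTIVE
        or (j.finished is not None and now - window <= j.finished <= now)
    ]

now = datetime(2008, 11, 28, 12, 0)
jobs = [
    Job("a", "done", now - timedelta(hours=30), now - timedelta(hours=2)),   # old, finished recently
    Job("b", "running", now - timedelta(days=3)),                            # long-running job
    Job("c", "failed", now - timedelta(hours=5), now - timedelta(hours=4)),  # recent
]
print([j.job_id for j in submitted_in_window(jobs, now)])          # ['c']
print([j.job_id for j in active_or_recently_finished(jobs, now)])  # ['a', 'b', 'c']
```

Jobs "a" and "b" are invisible with the default option because they were submitted more than 24 hours ago, even though one of them finished recently and the other is still running.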
Running time (wall clock, from job wrapper)
- One random day: http://tinyurl.com/2l6s4s
Waiting time (from submission to start of job)
- One random day: http://tinyurl.com/22vknn
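Both plots are built from per-job timestamps: waiting time runs from submission to job start, running time from job start to job end. A minimal sketch of the two quantities, assuming the three timestamps are available as `datetime` values; the function names and example times are hypothetical.

```python
from datetime import datetime, timedelta

def waiting_time(submitted: datetime, started: datetime) -> timedelta:
    """Time from submission to the start of the job."""
    return started - submitted

def running_time(started: datetime, finished: datetime) -> timedelta:
    """Wall-clock time between job start and job end (e.g. as reported by a job wrapper)."""
    return finished - started

# Hypothetical timestamps for one job.
submitted = datetime(2008, 11, 27, 9, 0)
started = datetime(2008, 11, 27, 10, 30)
finished = datetime(2008, 11, 27, 14, 45)
print(waiting_time(submitted, started))  # 1:30:00
print(running_time(started, finished))   # 4:15:00
```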
Interactive view: what info can it provide me?
- All my jobs at a given site failed: does the site have a problem?
- Suppose you are having problems at FZK. Let's check whether you are the only one affected.
- Sort by site. Sites showing a lot of light green or red are the ones that might be in trouble.
- FZK looks suspicious in this respect; let's investigate further.
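The question "am I the only one?" can be made concrete by comparing a site's failure fraction with and without your own jobs. The sketch below assumes hypothetical (user, site, succeeded) tuples rather than real Dashboard data, and simply ranks sites by the fraction of failed jobs.

```python
from collections import defaultdict

# Hypothetical (user, site, succeeded) records; not the Dashboard's data format.
jobs = [
    ("me", "FZK", False), ("me", "FZK", False),
    ("alice", "FZK", False), ("bob", "FZK", True),
    ("alice", "CNAF", True), ("bob", "CNAF", True), ("bob", "CNAF", False),
]

def failure_fraction_by_site(records, exclude_user=None):
    """Return {site: fraction of failed jobs}, optionally ignoring one user's jobs."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for user, site, ok in records:
        if user == exclude_user:
            continue
        totals[site] += 1
        if not ok:
            failures[site] += 1
    return {site: failures[site] / totals[site] for site in totals}

def sorted_by_failure(fractions):
    """Highest failure fraction first: these sites are the first to inspect."""
    return sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)

print(sorted_by_failure(failure_fraction_by_site(jobs)))
# [('FZK', 0.75), ('CNAF', 0.3333333333333333)]
print(sorted_by_failure(failure_fraction_by_site(jobs, exclude_user="me")))
# [('FZK', 0.5), ('CNAF', 0.3333333333333333)]
```

If a site still shows a high failure fraction after your own jobs are excluded, as FZK does in this toy data, the problem is probably not specific to your jobs.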
Expand using bars
- Left-click on the color bars to get a menu for expanding by another attribute; keep doing it to drill down
- Note: there are more items than in the left menu, in particular by task, by submission type (CRAB server/direct), etc.
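Expanding a bar is essentially grouping the selected jobs by one more attribute. A rough analogue in code, assuming job records stored as dictionaries with hypothetical keys `site`, `submission` and `status`; this illustrates the idea only and is not how the Dashboard is implemented.

```python
from collections import Counter

# Hypothetical job records; the keys stand in for attributes you can "expand by".
jobs = [
    {"site": "FZK", "submission": "crab server", "status": "failed"},
    {"site": "FZK", "submission": "direct", "status": "failed"},
    {"site": "FZK", "submission": "direct", "status": "done"},
    {"site": "CNAF", "submission": "direct", "status": "done"},
]

def restrict(records, **selected):
    """Keep only records matching the selected values (like clicking one bar)."""
    return [r for r in records if all(r[k] == v for k, v in selected.items())]

def expand(records, *attributes):
    """Count jobs per combination of attribute values (like expanding a bar)."""
    return Counter(tuple(r[a] for a in attributes) for r in records)

fzk = restrict(jobs, site="FZK")
print(expand(fzk, "submission", "status"))
```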
Interactive view: what info can it provide me? (1)
- Each column can be used for sorting
- Each blue number is clickable: get the list of jobs, Grid/CRAB IDs, times, exit codes, worker node name (or IP)
Interactive view: what info can it provide me? (2)
- You can get the full list of job failure codes by clicking on ExitCode; the codes indicating a site problem are all marked there
- Here, jobs are failing with exit code 50115: cmsRun did not produce a valid/readable job report at runtime
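Once you have the list of jobs for a site, tallying their exit codes shows which failure dominates. This sketch assumes exit code 0 means success and fills in the meaning of 50115 only, since that is the only code described on this slide; the function and table names are hypothetical.

```python
from collections import Counter

# Only 50115 is taken from the slide; extend this table from the Dashboard's
# full exit-code list rather than guessing other meanings.
EXIT_CODE_MEANING = {
    50115: "cmsRun did not produce a valid/readable job report at runtime",
}

def summarize_exit_codes(exit_codes):
    """Return (code, count, meaning) for non-zero exit codes, most frequent first.
    Assumes 0 means success."""
    counts = Counter(code for code in exit_codes if code != 0)
    return [
        (code, n, EXIT_CODE_MEANING.get(code, "unknown: look it up in the exit-code list"))
        for code, n in counts.most_common()
    ]

# Hypothetical exit codes collected from the job list of one site.
for code, n, meaning in summarize_exit_codes([0, 50115, 50115, 0, 50115]):
    print(code, n, meaning)
# 50115 3 cmsRun did not produce a valid/readable job report at runtime
```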
Feedback!
- Link to Savannah
Useful links
- Commissioning Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/ComputingCommissioning
  - Dashboard: http://arda-dashboard.cern.ch/cms
  - SAM: http://lxarda16.cern.ch/dashboard/request.py/samvisualization
- Squid monitoring
  - http://belforte.home.cern.ch/belforte/misc/Squid-Hit-Summary.html
  - Details for your site: http://frontier.cern.ch/squidstats/indexcms.html
- PhEDEx: http://cmsweb.cern.ch/phedex/