
- Number of slides: 17
Cluster Classification Studies with MVA Techniques
Kai Voss, University of Victoria, 05/05/2006, kai.voss@cern.ch
Motivation:
- The current EMFracClassification tool uses 75 TProfile2D plots as "lookup tables".
- Why not apply simple cuts?
- Or maybe more sophisticated MVA discrimination techniques (Likelihood, ANN, ...)?
- Why use the two cluster moments <ϱ> and λ_clus?
- Goal: improve the efficiency and purity of the classification.
- Used as a test analysis for developing TMVA, a toolkit for multivariate analyses (see http://tmva.sf.net).
- The TMVA integration into ROOT is about to be finished this week.
Data set
- The cluster classification studies are based on the postrome single pions with calibration hits: http://menke.home.cern.ch/menke/cgi-bin/hec/postrome.sh
- The same data sets were created for electrons/positrons with the same software and scripts (they would be on CASTOR already, but my grid certificate expired).
- [Plots: number of events per generated single-particle energy; energy distribution for all clusters]
Which clusters are from the electron or pion?
- In an empty calorimeter one expects up to 12 clusters from noise, in addition to the clusters from the generated single particle.
- Take only clusters which contain energy from calibration hits (true G4).
- [Plots: clusters in pion sample; clusters in electron sample]
Definition of the classification samples
- Strategy: "Try to find the EM clusters first, and apply weights to the rest."
- EM clusters are the "signal".
- Definition of "EM clusters": EM_frac (from calibration hits) > 0.9 (not tuned yet).
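The signal definition above can be written as a simple predicate. This is only a minimal sketch: the `Cluster` struct and its field names are hypothetical and not taken from the analysis code, and the 0.9 threshold is the untuned value quoted on the slide.

```cpp
#include <cassert>

// Hypothetical per-cluster truth record filled from calibration hits.
struct Cluster {
    double calibEnergyEM;     // EM energy from calibration hits (assumed field)
    double calibEnergyTotal;  // total energy from calibration hits (assumed field)
};

// EM fraction from calibration hits; 0 for clusters without any true energy.
double emFraction(const Cluster& c) {
    if (c.calibEnergyTotal <= 0.0) return 0.0;
    return c.calibEnergyEM / c.calibEnergyTotal;
}

// "Signal" = EM cluster: EM_frac above the threshold (0.9 on the slide).
bool isEMCluster(const Cluster& c, double threshold = 0.9) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    return emFraction(c) > threshold;
}
```

Clusters failing the predicate would form the "background" sample, to which weights are applied later.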
Cluster moments (2 < eta < 2.2; 4 < E < 16 GeV)
Cluster moments 2/3
Cluster moments 3/3
- There are many cluster moments already calculated by default.
- Some look pretty promising!
- Try to find the "best variable" or "best variable set" using an automatic cut optimisation technique.
Method of Cut Optimisation
- "Optimal cuts" maximise the signal efficiency at a given background efficiency.
- The result is (in this case) a set of 100 cuts corresponding to signal efficiencies from 0 to 1. Each cut set has a corresponding background rejection.
- For the subsequent application one has to choose one working point.
- Technically, the optimisation is achieved in TMVA by Monte Carlo generation, using uniform priors for the lower cut value and the cut width, thrown within the variable ranges.
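The Monte Carlo procedure described above can be illustrated with a small standalone sketch. It is not TMVA's implementation: all names (`CutSet`, `efficiency`, `optimiseCuts`) are hypothetical, and keeping the lowest-background cut set per signal-efficiency bin is an assumption about how the 100 cut sets are selected. The lower cut is thrown uniformly within the variable range and the width uniformly within the remaining range.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One rectangular cut per variable: lower <= x < lower + width.
struct CutSet {
    std::vector<double> lower;
    std::vector<double> width;
    double sigEff = 0.0;
    double bkgEff = 1.0;
    bool filled = false;
};

using Sample = std::vector<std::vector<double>>;  // events x variables

// Fraction of events passing all cuts.
double efficiency(const Sample& events, const CutSet& cuts) {
    if (events.empty()) return 0.0;
    std::size_t pass = 0;
    for (const auto& ev : events) {
        bool ok = true;
        for (std::size_t i = 0; i < ev.size() && ok; ++i) {
            ok = ev[i] >= cuts.lower[i] && ev[i] < cuts.lower[i] + cuts.width[i];
        }
        if (ok) ++pass;
    }
    return static_cast<double>(pass) / events.size();
}

// Throw nTrials random cut sets; in each of nBins signal-efficiency bins
// keep the cut set with the lowest background efficiency.
std::vector<CutSet> optimiseCuts(const Sample& sig, const Sample& bkg,
                                 const std::vector<double>& varMin,
                                 const std::vector<double>& varMax,
                                 int nTrials, int nBins = 100, unsigned seed = 42) {
    assert(varMin.size() == varMax.size() && nBins > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<CutSet> best(nBins);
    for (int t = 0; t < nTrials; ++t) {
        CutSet c;
        for (std::size_t i = 0; i < varMin.size(); ++i) {
            // Uniform prior for the lower cut, then for the width.
            const double lo = varMin[i] + u(rng) * (varMax[i] - varMin[i]);
            c.lower.push_back(lo);
            c.width.push_back(u(rng) * (varMax[i] - lo));
        }
        c.sigEff = efficiency(sig, c);
        c.bkgEff = efficiency(bkg, c);
        c.filled = true;
        int bin = static_cast<int>(c.sigEff * nBins);
        if (bin >= nBins) bin = nBins - 1;  // sigEff == 1 goes into the last bin
        if (!best[bin].filled || c.bkgEff < best[bin].bkgEff) best[bin] = c;
    }
    return best;
}
```

Choosing a working point then amounts to picking one entry of the returned vector, for example the highest signal-efficiency bin whose background efficiency is still acceptable.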
Example for Cut Optimisation
- Take the two variables from EMFracTool: <ϱ> and λ_clus.
- Run the cut optimisation.
- EMFracTool would be just one point in this plot.
Finding the best set of variables
- Strategy: run the cut optimisation for all combinations of 2 (3, 4, 5) moments out of the 16.
- Compare the resulting signal efficiencies at a background rejection of 99% (high purity).
- This is done for more than 1000 combinations in two bins:
  - 0.2 < |eta| < 0.4, 4 < E_clus < 16 GeV
  - 2.0 < |eta| < 2.2, 4 < E_clus < 16 GeV
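The scan over variable combinations can be sketched as follows. With 16 moments there are 120 pairs, 560 triplets, 1820 quadruplets and 4368 quintuplets, so the number of cut optimisations grows quickly. The helper names below are hypothetical; the label format mimics the "Cuts_278" naming used in the result tables, with one digit or letter per variable index.

```cpp
#include <cassert>
#include <string>
#include <vector>

// All k-element subsets of {0, ..., n-1}, in lexicographic order.
std::vector<std::vector<int>> combinations(int n, int k) {
    assert(k >= 0 && k <= n);
    std::vector<std::vector<int>> out;
    std::vector<int> idx(k);
    for (int i = 0; i < k; ++i) idx[i] = i;
    while (true) {
        out.push_back(idx);
        // Find the rightmost index that can still be incremented.
        int pos = k - 1;
        while (pos >= 0 && idx[pos] == n - k + pos) --pos;
        if (pos < 0) break;
        ++idx[pos];
        for (int j = pos + 1; j < k; ++j) idx[j] = idx[j - 1] + 1;
    }
    return out;
}

// Label in the style of the slides: one symbol per variable index.
std::string cutLabel(const std::vector<int>& combo) {
    static const std::string symbols = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string label = "Cuts_";
    for (int i : combo) label += symbols.at(i);
    return label;
}
```

Each combination returned by `combinations` would be passed to one cut optimisation, and the resulting signal efficiencies compared at fixed background rejection.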
"Optimal" Set of Variables (i)
Bin: 0.2 < |eta| < 0.4, 4 < E_clus < 16 GeV
The name "Cuts_xyz" is shorthand for cutting on the three variables (x, y, z).

Signal efficiency:
MVA method   @B=0.01  @B=0.10  @B=0.30
Cuts_278     0.681    0.940    0.986
Cuts_279     0.671    0.939    0.987
Cuts_27      0.671    0.939    0.986
Cuts_27c     0.671    0.938    0.986
Cuts_8c      0.668    0.915    0.985
Cuts_289     0.667    0.936    0.987
Cuts_28a     0.666    0.936    0.986
Cuts_27a     0.663    0.938    0.986
Cuts_27b     0.661    0.939    0.986
Cuts_270     0.654    0.941    0.987
Cuts_8a      0.651    0.936    0.985
Cuts_28c     0.644    0.935    0.986
Cuts_280     0.644    0.929    0.986

Variable key:
0 = "cl_m2_r_topo"
1 = "cl_m2_lambda_topo"
2 = "cl_center_lambda_topo"
3 = "cl_lateral_topo"
4 = "cl_center_x_topo"
5 = "cl_longitudinal_topo"
6 = "cl_lateral_topo"
7 = "cl_m1_dens_topo"
8 = "cl_m2_dens_topo"
9 = "cl_center_Y_topo"
a = "cl_delta_theta_topo"
b = "cl_center_z_topo"
c = "cl_eng_frac_max_topo"
"Optimal" Set of Variables (ii)
Bin: 0.2 < |eta| < 0.4, 4 < E_clus < 16 GeV
The name "Cuts_xyz" is shorthand for cutting on the three variables (x, y, z).

Signal efficiency:
MVA method   @B=0.01  @B=0.10  @B=0.30
Cuts_25c     0.568    0.891    0.980
Cuts_258     0.566    0.892    0.979
Cuts_25b     0.556    0.890    0.980
Cuts_256     0.554    0.891    0.980
Cuts_5bc     0.553    0.893    0.980
Cuts_25      0.539    0.896    0.979
Cuts_257     0.539    0.894    0.980
Cuts_25a     0.539    0.894    0.980
Cuts_25c     0.538    0.892    0.980
Cuts_5b      0.534    0.896    0.980
Cuts_7b      0.533    0.930    0.985
Cuts_278     0.533    0.930    0.985
Cuts_27      0.533    0.929    0.985
Cuts_279     0.533    0.928    0.984

Variable key:
0 = "cl_m2_r_topo"
1 = "cl_m2_lambda_topo"
2 = "cl_center_lambda_topo"
3 = "cl_lateral_topo"
4 = "cl_center_x_topo"
5 = "cl_longitudinal_topo"
6 = "cl_lateral_topo"
7 = "cl_m1_dens_topo"
8 = "cl_m2_dens_topo"
9 = "cl_center_Y_topo"
a = "cl_delta_theta_topo"
b = "cl_center_z_topo"
c = "cl_eng_frac_max_topo"
"Optimal" Set of Variables (iii)
- The optimal set of variables seems to be eta (energy?) dependent. Needs further investigation.
- The most prominent variables are: center_lambda, m2_dens, longitudinal, frac_em.
- Let TMVA use these four variables and try some other discrimination techniques:

TMVA_Factory: Evaluation results ranked by best 'signal eff @B=0.01'
              Signal efficiency:
MVA method    @B=0.01  @B=0.10  @B=0.30  Significance  Separation  mu-Transform
TMlpANN       0.604    0.934    0.988    2.331         0.770       0.841
Cuts          0.554    0.924    0.983    0.000
Likelihood    0.472    0.893    0.990    1.670         0.693       0.938
BDTGini       0.393    0.914    0.981    2.115         0.719       0.898
PDERS         0.345    0.858    0.976    1.998         0.685       0.780
Fisher        0.194    0.790    0.981    1.355         0.538       0.798

TMVA_MethodFisher: ranked output (top variable is best ranked)
Variable                Coefficient  Discr. power
cl_m1_dens_topo         +2.877       0.4517
cl_center_lambda_topo   -2.796       0.3710
cl_eng_frac_em_topo     -0.039       0.3436
cl_longitudinal_topo    -1.206       0.2722
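The ranking quantity "signal eff @B=0.01" can be computed from the classifier outputs of a signal and a background test sample. The function below is a minimal sketch under the assumption that larger outputs are more signal-like; the function name is hypothetical and TMVA's own evaluation may differ in detail (for instance in how it bins or interpolates).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Signal efficiency at a fixed background efficiency bkgEff.
// The cut is placed so that at most bkgEff * N_bkg background events pass.
double signalEffAtBkgEff(const std::vector<double>& sigScores,
                         std::vector<double> bkgScores, double bkgEff) {
    assert(!sigScores.empty() && !bkgScores.empty());
    assert(bkgEff >= 0.0 && bkgEff <= 1.0);
    // Sort background outputs from most to least signal-like.
    std::sort(bkgScores.begin(), bkgScores.end(), std::greater<double>());
    // Number of background events allowed to pass the cut.
    const std::size_t nPass = static_cast<std::size_t>(bkgEff * bkgScores.size());
    if (nPass >= bkgScores.size()) return 1.0;  // no cut needed
    // Events must score strictly above the first rejected background event.
    const double threshold = bkgScores[nPass];
    const auto nSig = std::count_if(sigScores.begin(), sigScores.end(),
                                    [threshold](double s) { return s > threshold; });
    return static_cast<double>(nSig) / sigScores.size();
}
```

Evaluating this at bkgEff = 0.01, 0.10 and 0.30 for each method would give the three efficiency columns of a table like the one above.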
Summary & Outlook
- The rectangular cut method is a really competitive method for cluster classification.
- Optimal cuts are calculated for each efficiency/background -> need to choose a working point.
- The other methods are not yet fully tuned...
- Since at least one variable is perfectly discriminating, one has to remove this variable and do a training on the remaining variables on top of it.
- Optimal sets of cuts for all bins of E and eta are currently being calculated -> then decide which variables to use finally.
- Use TMVA_Reader (ROOT class).
Code Example: Do the training in 72 bins!

    // load data sets
    TString datFileS = "data/e.dat";
    TString datFileB = "data/pi.dat";
    tmva_factory->SetInputTrees( datFileS, datFileB );

    // which variables are used for discrimination
    inputVars->push_back("cl_m2_r_topo");
    inputVars->push_back("cl_m2_lambda_topo");
    inputVars->push_back("cl_delta_phi_topo");
    tmva_factory->SetInputVariables( inputVars );
    // split data set and do training for EACH bin!
    Double_t etaBins[25] = { 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8,
                             2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8,
                             4.0, 4.2, 4.4, 4.6, 4.8 };
    Double_t energyBins[6] = { 0.0, 4000, 16000, 64000, 200000, 40000000 };
    tmva_factory->BookMultipleMVAs("cl_e_topo", 5, &energyBins[0] );
    tmva_factory->BookMultipleMVAs("cl_m1_eta_topo", 24, &etaBins[0] );

    // choose method
    tmva_factory->BookMethod( "MethodCuts", "V:MC:500000:AllFSmart" );
    tmva_factory->TrainAllMethods();
    tmva_factory->TestAllMethods();
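To make the binned training concrete, the sketch below maps a cluster's energy and eta to one flattened bin index using edge arrays like the ones in the snippet. The arrays as shown give 24 eta bins times 5 energy bins, i.e. 120 index values. The function names are hypothetical, and taking |eta| is an assumption based on the |eta| ranges quoted on the earlier slides; this is not how BookMultipleMVAs is implemented internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Index of the bin containing x, for ascending edges e[0] < e[1] < ... < e[n].
// Returns -1 if x lies outside [e[0], e[n]).
int findBin(const std::vector<double>& edges, double x) {
    assert(edges.size() >= 2);
    if (x < edges.front() || x >= edges.back()) return -1;
    const auto it = std::upper_bound(edges.begin(), edges.end(), x);
    return static_cast<int>(it - edges.begin()) - 1;
}

// Flattened (energy, |eta|) bin index, or -1 if either value is out of range.
int trainingBin(const std::vector<double>& energyEdges,
                const std::vector<double>& etaEdges,
                double energy, double eta) {
    const int iE = findBin(energyEdges, energy);
    const int iEta = findBin(etaEdges, std::fabs(eta));
    if (iE < 0 || iEta < 0) return -1;
    const int nEta = static_cast<int>(etaEdges.size()) - 1;
    return iE * nEta + iEta;
}
```

A classifier trained per bin would then be looked up with this index when a cluster is evaluated.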
Code Example: Apply Classification in Athena

    // create TMVA_Reader object
    TMVA_Reader *tmva = new TMVA_Reader( inputVars );
    tmva->BookMultipleMVAs("cl_e_topo", 5, &energyBins[0] );
    tmva->BookMultipleMVAs("cl_m1_eta_topo", 24, &etaBins[0] );
    tmva->BookMVA( TMVA_Reader::LikeLiHood, "myweightfile" );
    double mvaLKD = tmva->EvaluateMVA( varValues, multicutValues, TMVA_Reader::LikelihoodD );