bd66ed5cbea5b914138c5711436e00a1.ppt
- Number of slides: 30
Advanced CPAS Adam Rauch LabKey Software adam@labkey.com
Agenda • Demo of recent & advanced features • Pipeline architecture & configuration • Production installations
What Is CPAS? A proteomics analysis system that handles all data processing & management for high-throughput labs and core facilities
Demo
“Mini-Pipeline” • Included & configured in standard install • CPAS invokes executables (tandem, tpp) directly on web server • Simple approach works fine for low-throughput evaluation installs
FHCRC Installation [Diagram labels] • CPAS Pipeline • Web Server: 2 Proc, 2 GB, Tomcat, Pipeline Mgr • Database Server: 4 Proc, 4 GB, MS SQL Server • NetApp • Mass Spec PC • File Server (Sun Hierarchical Storage) • mzXML Conversion Server • Cluster • 20+ TB Tape Robot
Production Pipeline • Multi-server, clustered, high-throughput pipeline demands a more sophisticated approach • CPAS interface for configuring and submitting jobs is identical, but pipeline control & communication are handled differently • Each project is typically configured with a separate “pipeline root” • User initiates search by selecting raw file and specifying search parameters (protocol) • CPAS writes settings file to raw-file directory • Background process (cron job) running on pipeline server sees new job and kicks off pipeline processing
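The hand-off described on this slide (CPAS writes a settings file, a background process notices it) can be sketched as a polling pass over a pipeline root. This is a minimal illustration, not the FHCRC scripts: the file names `search_settings.xml` and `search_settings.claimed` are hypothetical, since the slides do not name the settings file or say how a job is marked as picked up.

```python
from pathlib import Path

# Hypothetical names: the slides do not say what the settings file is
# called or how the pipeline records that a job has been picked up.
SETTINGS_NAME = "search_settings.xml"
CLAIMED_NAME = "search_settings.claimed"


def find_new_jobs(pipeline_root):
    """Return directories under pipeline_root that hold an unclaimed settings file."""
    root = Path(pipeline_root)
    jobs = []
    for settings in sorted(root.rglob(SETTINGS_NAME)):
        if not (settings.parent / CLAIMED_NAME).exists():
            jobs.append(settings.parent)
    return jobs


def claim_job(job_dir):
    """Drop a marker file so the next polling pass skips this job."""
    (Path(job_dir) / CLAIMED_NAME).touch()
```

A scheduled task would call `find_new_jobs` once per pass, call `claim_job` on each result, and then start processing. Using the file system as the queue fits the slide's design, where CPAS and the pipeline server share only a directory tree.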
CPAS Pipeline Automated pipeline moves MS2 data from instrument, through MS/MS search and post-processing, and into CPAS [Diagram labels] • Sample Input: LTQ FT, MALDI, LCQ • Raw File • Raw File Convert Server • mzXML File • MS/MS Search Cluster: X! Tandem, SEQUEST, MASCOT; XPRESS, Peptide/ProteinProphet • PC #40 • mzXML, pepXML, protXML Files • CPAS
Production Pipeline Workflow • Cron job state machine manages workflow – Initiates RAW-to-mzXML conversion • Conversion server (ConversionQueue) • Vendor-specific DLLs require Windows server – Submits MS/MS search to cluster scheduler – Submits post-processing jobs to cluster scheduler – Handles fractionation scenarios (individual, multi) – When processing is complete, instructs CPAS to load run • Job status is reported via log files, which CPAS reads to update web UI
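The workflow above can be pictured as a small linear state machine that is advanced once per scheduler pass. The sketch below is an assumption-laden illustration: the stage names paraphrase the slide, the `COMPLETE:` / `ERROR:` log lines are a hypothetical format, and fractionation handling is left out.

```python
from enum import Enum


class Stage(Enum):
    CONVERT = "convert RAW to mzXML"
    SEARCH = "MS/MS search"
    POSTPROCESS = "post-processing"
    LOAD = "load run into CPAS"
    DONE = "complete"


# Stages in the order the slide lists them.
NEXT_STAGE = {
    Stage.CONVERT: Stage.SEARCH,
    Stage.SEARCH: Stage.POSTPROCESS,
    Stage.POSTPROCESS: Stage.LOAD,
    Stage.LOAD: Stage.DONE,
}


def advance(stage, succeeded, log):
    """Move a job one step forward and append a status line to its log.

    The log list stands in for the log file that CPAS reads; the line
    format is hypothetical.
    """
    if stage is Stage.DONE:
        return stage
    if not succeeded:
        log.append(f"ERROR: {stage.value}")
        return stage  # stay put so the failure is visible and can be retried
    log.append(f"COMPLETE: {stage.value}")
    return NEXT_STAGE[stage]
```

Keeping the current stage and the log as the only state means each scheduled pass can be a short-lived process, which matches the slide's description of a periodically run job rather than a long-running daemon.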
Search Engine Configuration • SEQUEST cluster uses “SequestQueue” – Custom Tomcat/Java web application – Installed on head node of cluster – Pipeline communicates with SequestQueue over HTTP • Pipeline drives Mascot cluster directly via HTTP • Pipeline drives X! Tandem via cluster scheduler
Configuring A Production Pipeline • Install, customize Perl scripts that manage the workflow – Scripts used at Fred Hutchinson are available as an example • Configure conversion server – Converters & vendor-specific DLLs • Install TPP, MS/MS search engine(s) on cluster • Enable your search engine(s) within CPAS • Install CPAS FTP server (optional) – Useful to allow external collaborators to submit jobs to pipeline • Configure pipeline email notifications (optional) – Email notifications for completion and/or failures
Demo
Production Installation
Web & Database Servers • Server operating system(s) – CPAS runs on all popular operating system platforms – Solaris, Linux, Windows, OS X installations – Windows has somewhat easier install & upgrade process • Graphical installer • Pre-compiled binaries – Select OS that you & your IT staff are most comfortable with • Database server – PostgreSQL: runs on all popular hardware/OS platforms, free – Microsoft SQL Server: Windows only, commercial, well tested • Server hardware – Invest in database server: powerful server, ample storage, reliability – Web server much less demanding
IT Infrastructure • Shared file system (NFS) – CPAS and pipeline need access to a common NFS – Archive RAW, mzXML, pepXML, etc. files • Need a plan for backing up NFS and database
Select Administrators • Database administrator • Server administrators • CPAS site administrators • CPAS project administrators
Production Installation Customization & Settings • Many settings for customizing CPAS to your needs – Fully documented on www.labkey.org – Review all settings carefully on a regular basis • CPAS settings are handled in several places – Most configuration is done via the “Admin Console” – <tomcat>/conf/server.xml – <tomcat>/conf/Catalina/localhost/labkey.xml
Database • JDBC parameters specified in labkey.xml – Driver class (PostgreSQL vs. SQL Server) – URL includes server name, port, database name – User name & password • Protect your data – CPAS database user needs read/write/delete/update perms – Use a strong password! – Provide no access to database server outside firewall • PGTest and jtdstest tools can help test config
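To make the JDBC parameters concrete, the sketch below assembles a driver class and URL from server name, port, and database name. It assumes the SQL Server connection uses the jTDS driver, as the jtdstest tool name suggests; the slides do not say so explicitly. The helper name `jdbc_settings` and the host and database names in the usage note are hypothetical, and the default ports are the databases' standard ones rather than anything stated on the slide.

```python
# (driver class, URL template, standard default port) per database type.
# jTDS for SQL Server is an assumption based on the "jtdstest" tool name.
DRIVERS = {
    "postgresql": (
        "org.postgresql.Driver",
        "jdbc:postgresql://{host}:{port}/{db}",
        5432,
    ),
    "sqlserver": (
        "net.sourceforge.jtds.jdbc.Driver",
        "jdbc:jtds:sqlserver://{host}:{port}/{db}",
        1433,
    ),
}


def jdbc_settings(kind, host, db, port=None):
    """Return the driver class and JDBC URL for the given database type."""
    driver, template, default_port = DRIVERS[kind]
    chosen_port = default_port if port is None else port
    return {
        "driver_class": driver,
        "url": template.format(host=host, port=chosen_port, db=db),
    }
```

For example, `jdbc_settings("postgresql", "dbhost", "labkey")` yields the URL `jdbc:postgresql://dbhost:5432/labkey`. The user name and password are deliberately left out of the helper so that credentials are not mixed into strings that may end up in logs.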
Networking • Basic Networking – Specify port in server.xml – Open firewall port(s) – Procure server name and update DNS • SMTP settings – Server, port, credentials specified in labkey.xml – System email address specified in site settings
Security • Designed to keep sensitive, unpublished scientific data secure • Authentication: dual scheme approach – Can delegate to institution’s LDAP system – External users: invitation only • Users choose their own passwords • Hash of password is stored in database and used for authentication • Authorization: Users must be granted explicit permissions – All data stored in folder hierarchy managed by the database – Users are added to groups – Groups are granted permission to folder or hierarchy – Authorized only if user belongs to group with required permissions • Folders can be made “public” (no authentication required)
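The authorization rules on this slide can be modeled in a few lines. The sketch below is an illustration of the idea, not CPAS code: the `Folder` class, the permission names, the rule that a folder without its own grants inherits from its parent, and the reading of "public" as read-only access for everyone are all assumptions made for the example.

```python
class Folder:
    def __init__(self, name, parent=None, grants=None, public=False):
        self.name = name
        self.parent = parent
        # Maps group name -> set of permissions. None means "inherit
        # from the parent folder" (an assumption for this sketch).
        self.grants = grants
        self.public = public


def effective_grants(folder):
    """Walk up the hierarchy to the nearest folder with its own grants."""
    node = folder
    while node is not None:
        if node.grants is not None:
            return node.grants
        node = node.parent
    return {}


def is_authorized(user_groups, folder, permission):
    """True only if one of the user's groups holds the permission."""
    if folder.public and permission == "read":
        return True  # public folders need no authentication to read
    grants = effective_grants(folder)
    return any(permission in grants.get(g, set()) for g in user_groups)
```

With a project folder that grants `{"lab": {"read"}}` and a subfolder that inherits, a member of `lab` can read the subfolder but not update it, and a user in no group is denied. Granting to groups rather than to individual users keeps the number of explicit grants small as the folder tree grows.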
Security Settings • SSL – We strongly recommend requiring SSL connections – Enable SSL port in server.xml – Use “Require SSL connections” option & port setting • LDAP & SASL – Configure CPAS to authenticate users to your organization’s LDAP server(s) – Specify server name, domain, principal template, SASL • Email templates – Customize new user registration, password change, etc. emails
Other Settings • Network drive – Allows CPAS running as Windows service to attach NFS as a drive • Site-wide option to enable caBIG™ • Mascot & SEQUEST connection settings • Site description, color theme, font size, logo
Future Directions • Web services-based pipeline • Faster, easier loading of protein annotations • Multi-engine comparisons • Improved generalized query support • Phase 2 of caBIG support
LabKey Software, Inc. • Private consulting company created by FHCRC and team of software professionals – Formed to support, document, and extend the CPAS project to other functions and labs – Independent company to directly address other institutions’ needs and secure outside funding • Partnership: – Clients provide scientific leadership – LabKey focuses on software development • LabKey is available to customize, install, and support your pipeline, CPAS, and other LabKey applications – Business model ensures you get help & support when you need it
Next Steps • Visit our booth • Join our informal receptions here – 6:30 – 9:30 PM Tonight & Tomorrow • Talk to LabKey about your plans
Resources • http://www.labkey.org – CPAS Distribution & Support Site – Ask questions, contribute feedback – Peruse all the CPAS documentation & tutorials – Download the latest version (LabKey 2.1) • Graphical installer for Windows installation • Well documented “manual” installation for Linux/Mac • http://www.labkey.com – LabKey Software Inc. company web site • CPAS Paper – Rauch A, Bellew M, Eng J, et al. Computational Proteomics Analysis System (CPAS): An Extensible, Open-source Analytic System for Evaluating and Publishing Proteomic Data and High throughput Biological Experiments. J Proteome Res 2006; 5(1): 112-121.
Acknowledgements • Fred Hutchinson Cancer Research Center • National Cancer Institute • Canary Foundation • Gates Foundation • Institute for Systems Biology • Ron Beavis & The GPM • Numerous developer contributors
Questions?
Advanced Analysis Features • Filter groups of runs and compare peptides, proteins, ProteinProphet, quantitation, etc. • Analyze groups of runs based on sample properties • Search all experiments for a specific protein or gene name • Link results to protein annotations – Load protein knowledgebases: TrEMBL, Swiss-Prot – Gene Ontology: produce GO charts analyzing molecular function, cellular location, metabolic process – Custom protein annotation lists • Flexible, custom query capability – Join results to protein, experiment, sample tables – Display exactly the data you care about