- Number of slides: 33
Representation of Markush structures: from molecules towards patents
Szabolcs Csepregi
August 2010, ACS National meeting, Boston
Solutions for Cheminformatics
Contents
• ChemAxon
• What are Markush structures?
• How to get them?
• What can be done with them?
  – Enumeration
  – Storage, search
• Challenges in chemical representation
• Under development
ChemAxon
• Cheminformatics toolkits and applications
• HQ: Budapest, Hungary
• Founded: 1998
• Main customers: pharma, biotech, publishing
• Used in 3rd-party applications and web sites (e.g. Integrity, Reaxys, PDB ligand search, ELNs, registration systems, etc.)
ChemAxon
Main products:
  – Structure drawing & visualization (Marvin family)
  – Chemical DB tools (JChem family)
  – Property predictions (Calculator plugins)
  – Drug discovery tools (Reactor, JKlustor, etc.)
Development strategy: customer-driven
What are Markush structures and how to get them?
Markush structures
Generic notation for describing many molecules (= Markush library) in a compact form.
Main usage:
  – Combinatorial chemistry
  – Chemistry-related patents
Markush structures
• Features currently handled:
  – R-groups
  – Atom lists, bond lists
  – Position variation bonds
  – Link nodes
  – Repeating units
  – Homology groups (aryl, alkyl, etc.)
ChemAxon Markush project
Goals:
  – Extend structural search capabilities to combinatorial Markush structures
  – Markush enumeration
Complications:
  – Practical examples may be very complex, so methods that rely on explicit enumeration may be impossible
  – Current molecular formats must be extended (generic features)
Timeline:
  – Pilot study started in 2005 Q4; first prototype shown at the UGM, June 2006
  – Released in JChem 5.0, 2008
  – Markush DARC format support: 5.3.0, 2010
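The complication noted above, that explicit enumeration may be impossible, comes from simple arithmetic. Alternatives within one R-group add up, and independent positions multiply. The sketch below illustrates this. It assumes a hypothetical toy model that is not ChemAxon's internal representation: each R-group definition is a list of members, and each member lists the R-group labels nested inside it.

```python
from math import prod

# Hypothetical toy model (not ChemAxon's data structures): each R-group
# definition is a list of members, and each member is the list of R-group
# labels embedded in it. A member without embedded R-groups counts as one.

def group_size(label, definitions, _path=frozenset()):
    """Number of fragments an R-group expands to, including nested R-groups."""
    if label in _path:
        raise ValueError(f"cyclic R-group reference via {label}")
    path = _path | {label}
    # Sum over alternative members; multiply over R-groups inside one member.
    return sum(
        prod(group_size(inner, definitions, path) for inner in member)
        for member in definitions[label]
    )

def library_size(scaffold_labels, definitions):
    """Upper bound on library size: duplicates and symmetry are not removed."""
    return prod(group_size(label, definitions) for label in scaffold_labels)

defs = {"R1": [[], [], ["R3"]], "R2": [[], []], "R3": [[], [], [], []]}
print(group_size("R1", defs))            # prints 6 (1 + 1 + 4)
print(library_size(["R1", "R2"], defs))  # prints 12 (6 * 2)
```

With only 20 independent positions of 100 members each, the count is already 100**20 = 10**40. That is why search has to work on the compact Markush form instead of the enumerated library.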
How to get Markush structures?
• Drawing
  – MarvinSketch
How to get Markush structures?
• Patent literature
  – Markush DARC format (*.vmn)
• Compatible with the Thomson Reuters MMS patent Markush database (test set available)
How to get Markush structures?
Combinatorial chemistry: reagent clipping
1. Replace the reacting group with an attachment point (Reactor tool)
2. Turn the fragments into R-group definitions (Molconvert tool)
3. Add a scaffold (Molconvert tool)
How to get Markush structures?
Combinatorial chemistry: R-group decomposition
1. Filter and identify ligands in the chemical library
2. Create a Markush structure from the R-table (R-group decomposition tool)
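Step 2 above, turning an R-table into a Markush structure, can be sketched as collecting the distinct substituents of each R-group column. This is a minimal sketch, not the R-group decomposition tool itself. The input format is an assumption: one dict per decomposed library member, with fragments written as SMILES-like strings that use "*" for the attachment point.

```python
from math import prod

# Hypothetical input: an R-table as produced by some R-group decomposition,
# one dict per decomposed library member, fragments written as SMILES-like
# strings with "*" as the attachment point (illustrative only).
rtable = [
    {"R1": "*C",  "R2": "*Cl"},
    {"R1": "*CC", "R2": "*Cl"},
    {"R1": "*C",  "R2": "*F"},
]

def rtable_to_markush(rows):
    """Collect the distinct fragments of each R-group column, first-seen order."""
    definitions = {}
    for row in rows:
        for label, fragment in row.items():
            members = definitions.setdefault(label, [])
            if fragment not in members:
                members.append(fragment)
    return definitions

markush = rtable_to_markush(rtable)
print(markush)  # {'R1': ['*C', '*CC'], 'R2': ['*Cl', '*F']}
print(prod(len(m) for m in markush.values()), "combinations from", len(rtable), "rows")
```

Note the design consequence. The resulting Markush covers 4 combinations although only 3 rows went in, because the pair *CC/*F was never in the library. A Markush built this way describes the combinatorial closure of the input, not the input itself.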
What to do with them?
Markush Enumeration
• Markush enumeration plugin
  – Full enumeration
  – Selected parts only
  – Random enumeration
  – Calculate library size
  – Scaffold alignment and coloring
  – Markush code
  – Optional example homology group enumeration
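Full and random enumeration of a flat Markush structure (R-groups on one scaffold, no nesting) can be sketched as a Cartesian product and as independent random choices per R-group. This is an illustration only, not the plugin's implementation. The member names are hypothetical labels, not real structures.

```python
import itertools
import random

# Hypothetical flat Markush: R-group label -> member fragments (labels only).
definitions = {"R1": ["H", "Me"], "R2": ["Cl", "F", "Br"]}

def enumerate_full(defs):
    """Yield every combination, one member per R-group (Cartesian product)."""
    labels = sorted(defs)
    for combo in itertools.product(*(defs[label] for label in labels)):
        yield dict(zip(labels, combo))

def enumerate_random(defs, n, seed=None):
    """Yield n random combinations; sampling is with replacement, so repeats can occur."""
    rng = random.Random(seed)
    labels = sorted(defs)
    for _ in range(n):
        yield {label: rng.choice(defs[label]) for label in labels}

# The generator is lazy, so taking the first few never builds the whole library.
first_two = list(itertools.islice(enumerate_full(definitions), 2))
print(first_two)  # [{'R1': 'H', 'R2': 'Cl'}, {'R1': 'H', 'R2': 'F'}]
print(list(enumerate_random(definitions, 3, seed=42)))
```

Random enumeration here samples uniformly per R-group. With nested R-groups this would no longer be uniform over the enumerated molecules, so a real sampler would need to weight members by their sub-library sizes.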
Markush storage & search
• JChem Base and Instant JChem
• No enumeration involved
• Can handle complex Markush structures (library sizes of 10^40 or more)
• Substructure and full structure search
• Broad translation of homology groups is supported (homology group in the DB, specific structure in the query)
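A simplified sketch of full structure search without enumeration follows, including the "homology in DB, specific in query" direction. It rests on stated assumptions: the Markush and the query share one scaffold, R-group positions are independent, and fragments are SMILES-like strings. The "alkyl" test is a hypothetical string check standing in for real graph-based homology matching. None of this is JChem's search algorithm.

```python
# Hypothetical sketch: both the Markush and the query share one scaffold, so
# only the R-group positions are compared. Fragments are SMILES-like strings
# with "*" as attachment point; the homology test is a toy string check,
# whereas a real implementation would work on the molecular graph.

def is_alkyl(fragment):
    """Toy 'alkyl' test: only aliphatic carbons and branch parentheses."""
    body = fragment.replace("*", "")
    return bool(body) and set(body) <= set("C()")

HOMOLOGY = {"CHK": is_alkyl}  # homology group code -> membership test

def member_matches(member, fragment):
    # A homology group on the Markush (DB) side accepts a specific query fragment.
    if member in HOMOLOGY:
        return HOMOLOGY[member](fragment)
    return member == fragment

def covers(definitions, query):
    """True if the specific query lies in the Markush library.

    Each position is checked independently, so the cost grows with the number
    of members, not with the (possibly astronomical) library size.
    """
    if set(query) != set(definitions):
        return False
    return all(
        any(member_matches(m, query[label]) for m in members)
        for label, members in definitions.items()
    )

markush = {"R1": ["CHK", "*OC"], "R2": ["*Cl", "*F"]}
print(covers(markush, {"R1": "*CC(C)C", "R2": "*F"}))   # True
print(covers(markush, {"R1": "*c1ccccc1", "R2": "*F"}))  # False
```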
Markush storage & search
Substructure hit visualization
[Figure: query; result in the original Markush]
Markush storage & search
Substructure hit visualization: "Markush structure reduction"
[Figure: query; result in the original Markush; reduced result]
Main use cases
• Refining and visualizing patent search hits
• White space analysis
• Patent busting
• Markush structure curation
• In-house storage of small Markush DBs
• etc.
MMS evaluation: Instant JChem project
Challenges in chemical representation (solved)
Representation: what we already had
Generic notation in queries:
• Atom lists, bond lists (e.g. "single or double")
• R-group queries (Problem: RGfile R-logic and patent R-logic are different! Solution: just ignore R-logic.)
• Link nodes
• Some generic atoms (X), represented as pseudo atoms
Challenge 1: Attachment point
• Multiple attachment points: ligand order and attachment order
  – Heavily used in Markush DARC (up to 8 attachments!)
• Represented as an atom property
[Figure: R-group definitions with their attachment points; parent group (root) with the order of ligands for G15 (R15)]
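The ligand order / attachment order pairing can be sketched as follows. The i-th neighbor of the R-atom in the parent is bonded to attachment point i of the chosen R-group member. The `Member` class and the atom indices are hypothetical illustrations, not Marvin's object model.

```python
from dataclasses import dataclass

# Hypothetical model: an R-group member with several numbered attachment points.
@dataclass
class Member:
    atoms: list        # element symbols of the fragment
    attachment: dict   # attachment point number -> fragment atom index

def attach(ligand_order, member, offset):
    """Bond the i-th ligand of the R-atom to attachment point i of the member.

    ligand_order: parent atom indices bonded to the R-atom, in ligand order.
    offset: index of the member's first atom after merging into the parent.
    Returns the new bonds as (parent atom, merged member atom) pairs.
    """
    if len(ligand_order) != len(member.attachment):
        raise ValueError("ligand count differs from attachment point count")
    return [
        (parent_atom, offset + member.attachment[number])
        for number, parent_atom in enumerate(ligand_order, start=1)
    ]

# A bivalent member N-C-O with attachment point 1 on N and 2 on O.
bridge = Member(atoms=["N", "C", "O"], attachment={1: 0, 2: 2})
print(attach([4, 2], bridge, offset=6))  # [(4, 6), (2, 8)]
print(attach([2, 4], bridge, offset=6))  # [(2, 6), (4, 8)]
```

Swapping the ligand order flips which parent atom ends up on N and which on O. That is why both the ligand order and the attachment order have to be stored explicitly for unsymmetrical bivalent groups.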
Challenge 1: Attachment point
• Embedded R-groups: grandparent relations may be needed between attachment points
  – e.g. G3's attachment point "1" is mapped to G4's attachment point "1"
Challenge 1: Attachment point
• Temporary representation: attached data
  – Ligand order
  – Attachment point in the R-group definition: still an atom property
  – Ligand order sometimes stored in the parent group (grandparent relation)
[Figure: order of ligands for R2; attachment points for definitions]
Challenge 1: Attachment point
• Real attachment object with bond (under development)
  – Eliminates the need for the grandparent relations table
[Figure: attachment points for definitions; order of ligands for R2; attachment point for R3; order of ligands for R4]
Challenge 2: Abbreviations
• Superatom S-groups were already in Marvin (~700 built-in shortcuts)
  – Expand / contract
  – Search code already handled them in specific structures
• Markush DARC had 21 shortcuts + 31 peptides
• Attachment points next to abbreviations
  – Needed to be visible "outside" and handled correctly "inside"
  – The new attachment point representation solves this as well
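Expand / contract of abbreviations can be pictured as a two-way lookup between a short label and its expanded fragment. This minimal sketch uses a tiny hypothetical table written as SMILES-like strings with "*" as the attachment point. It is not Marvin's shortcut list or its S-group handling.

```python
# Hypothetical abbreviation table: label -> SMILES-like expansion with "*"
# marking the attachment point. Marvin's built-in list is far larger (~700).
ABBREVIATIONS = {
    "Me": "*C",
    "Et": "*CC",
    "OMe": "*OC",
    "CF3": "*C(F)(F)F",
    "Ph": "*c1ccccc1",
}
CONTRACTIONS = {smiles: label for label, smiles in ABBREVIATIONS.items()}

def expand(members):
    """Replace known abbreviations by their expanded fragments."""
    return [ABBREVIATIONS.get(m, m) for m in members]

def contract(members):
    """Inverse operation: show known fragments under their short label."""
    return [CONTRACTIONS.get(m, m) for m in members]

r1 = ["Me", "Ph", "*Br"]
print(expand(r1))            # ['*C', '*c1ccccc1', '*Br']
print(contract(expand(r1)))  # ['Me', 'Ph', '*Br']
```

Contraction here works only on exact strings. A real implementation would match the fragment graph, because the same group can be written in many ways.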
Challenge 3: Homology groups (generics)
• Pseudoatom representation
• Naming:
  Markush DARC name   "Long name"
  CHK                 alkyl
  CYC                 carboAlicyclyl
  ARY                 carboAryl
  HEA                 heteroMonoAryl
  (Still looking for the most descriptive "long" names.)
• Extra conditions: general atom property framework (under development)
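A homology group can be read as a membership test on a substituent, and an extra condition such as a carbon-count range (cf. "C1-3") can be read as a further filter. The sketch takes its codes and long names from the table above. The membership rules and the `FragmentInfo` summary are simplified, hypothetical assumptions, not the Markush DARC definitions.

```python
from dataclasses import dataclass

# Hypothetical fragment summary; a real implementation would derive these
# properties from the molecular graph of the candidate substituent.
@dataclass(frozen=True)
class FragmentInfo:
    carbons: int
    heteroatoms: int
    rings: int
    aromatic: bool
    saturated: bool

# Codes and long names from the slide; the tests are simplified assumptions.
HOMOLOGY_GROUPS = {
    "CHK": ("alkyl", lambda f: f.rings == 0 and f.heteroatoms == 0
            and f.saturated and f.carbons > 0),
    "CYC": ("carboAlicyclyl", lambda f: f.rings > 0 and not f.aromatic
            and f.heteroatoms == 0),
    "ARY": ("carboAryl", lambda f: f.rings > 0 and f.aromatic
            and f.heteroatoms == 0),
    "HEA": ("heteroMonoAryl", lambda f: f.rings == 1 and f.aromatic
            and f.heteroatoms > 0),
}

def matches(code, fragment, carbon_range=None):
    """Check a fragment against a homology group plus an optional Cmin-max condition."""
    _long_name, test = HOMOLOGY_GROUPS[code]
    if not test(fragment):
        return False
    if carbon_range is None:
        return True
    low, high = carbon_range
    return low <= fragment.carbons <= high

propyl = FragmentInfo(carbons=3, heteroatoms=0, rings=0, aromatic=False, saturated=True)
hexyl = FragmentInfo(carbons=6, heteroatoms=0, rings=0, aromatic=False, saturated=True)
pyridyl = FragmentInfo(carbons=5, heteroatoms=1, rings=1, aromatic=True, saturated=False)
print(matches("CHK", propyl, (1, 3)), matches("CHK", hexyl, (1, 3)))  # True False
print(matches("HEA", pyridyl), matches("ARY", pyridyl))              # True False
```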
Challenge 4: Frequency variation
• Link nodes
• Repeating units: modified SRU
• Multipliers:
  – Special SRU, 1 outer bond
  – (Currently visualization only.)
• Moieties:
  – Special SRU, 0 outer bonds
  – To describe (variable) stoichiometry
  – (Currently visualization only.)
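Frequency variation, as used by link nodes and repeating units, expands a unit n times for every n in its allowed range. Independent ranges multiply the library size. In this minimal sketch, the string form is a hypothetical SMILES-like illustration, not the SRU representation.

```python
from math import prod

def expand_repeat(prefix, unit, suffix, low, high):
    """Expand a repeating unit whose frequency ranges over low..high (inclusive).

    Strings are SMILES-like and purely illustrative: "O" + "C"*n + "N"
    stands for an O-(CH2)n-N chain.
    """
    if not 0 <= low <= high:
        raise ValueError("need 0 <= low <= high")
    return [prefix + unit * n + suffix for n in range(low, high + 1)]

print(expand_repeat("O", "C", "N", 1, 3))  # ['OCN', 'OCCN', 'OCCCN']

# Independent frequency ranges multiply the library size.
ranges = [(1, 3), (0, 2)]  # hypothetical: two link nodes
print(prod(high - low + 1 for low, high in ranges))  # 9
```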
Challenge 5: Position variation bond
• New special S-group type
• A relocatable multicenter atom represents the group of atoms the bond may attach to
• Also useful to represent multicenter charges and coordination compounds
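For enumeration, each position variation bond contributes one choice from its candidate atom set. In this sketch, the rule that two substituents cannot share an atom is an assumption added for illustration. Symmetry-equivalent positions are deliberately not merged.

```python
import itertools

def position_variants(pv_bonds, distinct=True):
    """Enumerate atom choices for several position variation bonds.

    pv_bonds: one list of candidate ring atoms per position variation bond.
    distinct: assume (hypothetically) that two substituents cannot share an atom.
    Symmetry-equivalent choices are NOT merged, so this over-counts molecules.
    """
    for choice in itertools.product(*pv_bonds):
        if not distinct or len(set(choice)) == len(choice):
            yield choice

ring_positions = [2, 3, 4, 5, 6]  # free positions of a monosubstituted benzene
two_substituents = [ring_positions, ring_positions]
print(len(list(position_variants(two_substituents))))         # 20
print(len(list(position_variants(two_substituents, False))))  # 25
```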
What (else) keeps us busy
Under development
• Further improvements in Markush DARC support:
  – Ring segment groups (XX form a ring)
  – New, more robust representation for attachment points
  – Homology properties (lower alkyl, fused aryl, C1-3, N2-5, etc.)
• Ranking of results
• New ways to navigate/zoom Markush structures
• Maximum common substructure search
• Biased enumeration and covering Markush, based on examples in the patent
• Improved search speed to handle larger Markush sets
• Other Markush formats
  – Markush InChI standard committee
• Overlap analysis of Markush structures
• Conditions for Markush variables
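In the simplest case, overlap analysis reduces to set arithmetic. Take two Markush structures on the same scaffold whose positions are independent and contain only specific members. Their common sub-library is the per-position intersection, and its size is the product of the intersection sizes. This sketch covers only that restricted, hypothetical case. Homology groups or differing scaffolds would need real structure matching.

```python
from math import prod

def overlap(a, b):
    """Common sub-library of two flat Markush structures on the same scaffold.

    Assumes identical scaffolds, the same R-group labels, only specific members
    (no homology groups) and independent positions; then the overlap is the
    per-position intersection and its size is the product of those sizes.
    """
    if set(a) != set(b):
        raise ValueError("R-group labels differ")
    common = {label: [m for m in a[label] if m in b[label]] for label in a}
    size = prod(len(members) for members in common.values())
    return common, size

patent_a = {"R1": ["Me", "Et", "Pr"], "R2": ["Cl", "F"]}
patent_b = {"R1": ["Et", "Pr", "Bu"], "R2": ["F", "Br"]}
print(overlap(patent_a, patent_b))  # ({'R1': ['Et', 'Pr'], 'R2': ['F']}, 2)
```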
Summary
• Markush structure storage, search and enumeration at ChemAxon now extend to patent coverage
• Compatible patent data is available from Thomson Reuters
• Well-thought-out chemical representation
• Continuous development, improvements in the pipeline
Acknowledgements
• Development team: Nóra Máté, Róbert Wágner, Szilárd Dóránt, Tamás Csizmazia, Tim Dudgeon, Erika Bíró, Ali Baharev, Ferenc Csizmadia, et al.
• Tim Miller, Steve Hajkowski, Gez Cross and Linda Clark at Thomson Reuters, for useful discussions, help and example Markush DARC files
• Many early adopters and colleagues within the field, for suggestions and feedback