- Number of slides: 16
Evaluating Health-Care Disparity Employing Linked Data and Data-driven Discovery
Amrapali Zaveri
AKSW, Institut für Informatik
Outline
• Motivation
• Methodology
  o Datasets
  o CSV to RDF Conversion
  o Interlinking using SILK
  o Validation by Linked Data Querying
• Conclusions
• Limitations
• Future Work
Motivation
• According to the World Health Organization (WHO), more than one billion people (i.e., one sixth of the world’s population) suffer from one or more neglected tropical diseases.
• This points to a significant imbalance between the research effort invested in certain diseases and their prevalence.
• Reason
  o Current absence of accurate, interlinked data and information
Methodology
Datasets
DATASET                               | LINKED DATA VERSION | NUMBER OF TRIPLES
ClinicalTrials.gov                    | LinkedCT            | 9.8 million
PubMed                                | Bio2RDF’s PubMed    | 797 million
WHO’s Global Health Observatory (GHO) | Not yet available   | -
CSV to RDF Conversion
• WHO’s GHO dataset
  o Published as Excel sheets
• Advantage
  o Readable by humans
• Disadvantages
  o Cannot be queried efficiently
  o Difficult to integrate with other data (in different formats)
• Our approach
  o Converting data into a single data model: RDF
  o Using SCOVO (Statistical Core Vocabulary)*, designed particularly to represent multidimensional statistical data using RDF
* Michael Hausenblas et al. SCOVO: Using Statistics on the Web of Data. In ESWC, 2009.
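The conversion idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' converter: the base URI `http://example.org/gho/`, the column names, and the sample rows are all hypothetical placeholders (not real GHO figures). It assumes the SCOVO namespace `http://purl.org/NET/scovo#` with one `scovo:Item` per CSV row, linked to its dataset and to one `scovo:Dimension` per remaining column, and it writes values as plain untyped literals.

```python
import csv
import io
from urllib.parse import quote

SCOVO = "http://purl.org/NET/scovo#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/gho/"  # hypothetical base URI, not from the slides

# Placeholder rows for illustration only; these are not real GHO figures.
SAMPLE_CSV = """country,disease,year,deaths
CountryA,DiseaseX,2004,120
CountryB,DiseaseX,2004,45
"""


def uri(term):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return "<%s>" % term


def literal(text):
    """Render a plain (untyped) N-Triples literal with basic escaping."""
    return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_scovo(csv_text, value_column):
    """Turn each CSV row into one scovo:Item; other columns become dimensions."""
    triples, seen = [], set()

    def add(s, p, o):
        # Dimensions repeat across rows, so skip triples already emitted.
        if (s, p, o) not in seen:
            seen.add((s, p, o))
            triples.append((s, p, o))

    dataset = EX + "dataset/" + quote(value_column)
    add(uri(dataset), uri(RDF + "type"), uri(SCOVO + "Dataset"))
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        item = EX + "item/%d" % index
        add(uri(item), uri(RDF + "type"), uri(SCOVO + "Item"))
        add(uri(item), uri(SCOVO + "dataset"), uri(dataset))
        add(uri(item), uri(RDF + "value"), literal(row[value_column]))
        for column, cell in row.items():
            if column == value_column:
                continue
            dim = EX + "%s/%s" % (quote(column), quote(cell))
            add(uri(dim), uri(RDF + "type"), uri(SCOVO + "Dimension"))
            add(uri(dim), uri(RDFS + "label"), literal(cell))
            add(uri(item), uri(SCOVO + "dimension"), uri(dim))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join("%s %s %s ." % t for t in triples)


triples = rows_to_scovo(SAMPLE_CSV, "deaths")
print(to_ntriples(triples))
```

Once the rows are in this shape, every observation can be queried by its dimensions, which is the property the Excel sheets lack.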
What is SCOVO?
Semi-automated approach
• Transforming CSV to RDF in a fully automated way is not feasible.
  o Dimensions are often encoded in the heading or label of a sheet
• Our semi-automatic approach:
  o Implemented as a plug-in for OntoWiki#, a semantic collaboration platform developed by the AKSW research group
  o A CSV file is converted into RDF using SCOVO
# Sören Auer et al.: OntoWiki: A Tool for Social Semantic Collaboration. In: Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) at the 16th International World Wide Web Conference (WWW 2007), Banff, Canada, 2007.
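To illustrate why a human step is needed, here is a small Python sketch of the manual part: a user-supplied layout description tells the converter where the dimensions sit in a sheet. This is not the OntoWiki plug-in's code or interface; the `LAYOUT` keys, the `unpivot` function, and the sample sheet are hypothetical and exist only to show the idea.

```python
import csv
import io

# Placeholder sheet for illustration only: the title line carries one
# dimension (the indicator) and the header row carries another (the year).
SHEET = """Deaths due to DiseaseX
Country,2002,2004
CountryA,100,120
CountryB,50,45
"""

# Hypothetical layout description that a user would supply by hand.
LAYOUT = {
    "title_row": 0,      # row whose text names the indicator
    "header_row": 1,     # row holding the column dimension values
    "label_column": 0,   # column holding the row dimension values
    "row_dimension": "country",
    "column_dimension": "year",
}


def unpivot(sheet_text, layout):
    """Flatten a wide sheet into one record per data cell."""
    rows = list(csv.reader(io.StringIO(sheet_text)))
    indicator = rows[layout["title_row"]][0]
    header = rows[layout["header_row"]]
    label_col = layout["label_column"]
    records = []
    for row in rows[layout["header_row"] + 1:]:
        if not row:
            continue  # skip blank lines in the sheet
        for col_index, cell in enumerate(row):
            if col_index == label_col or cell == "":
                continue  # skip the label cell and empty data cells
            records.append({
                "indicator": indicator,
                layout["row_dimension"]: row[label_col],
                layout["column_dimension"]: header[col_index],
                "value": cell,
            })
    return records


records = unpivot(SHEET, LAYOUT)
for record in records:
    print(record)
```

Each resulting record carries all of its dimensions explicitly, so it can be mapped to a single SCOVO item without further guessing about the sheet layout.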
SCOVOfied GHO Data prefix ex:
Interlinking Datasets using SILK
Interlinking Results
Interlinks for:
• Publications: already present
• Disease: used SILK$
• Country: used SILK$
Number of interlinks obtained between datasets
$ Julius Volz, Christian Bizer, Martin Gaedke, Georgi Kobilarov: Discovering and Maintaining Links on the Web of Data. International Semantic Web Conference (ISWC 2009), Westfields, USA, October 2009.
http://www4.wiwiss.fu-berlin.de/bizer/silk/spec/
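The general idea behind label-based link discovery can be sketched as follows. This is not Silk and does not use its link specification language; it is a stand-in that compares normalized labels with Python's `difflib.SequenceMatcher` and emits an `owl:sameAs` link when the best match clears a threshold. The URIs, labels, and the 0.9 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical resources: URI -> label. Neither set comes from the real datasets.
GHO_DISEASES = {
    "http://example.org/gho/disease/tuberculosis": "Tuberculosis",
    "http://example.org/gho/disease/malaria": "Malaria",
}
TRIAL_CONDITIONS = {
    "http://example.org/trials/condition/1": "tuberculosis ",
    "http://example.org/trials/condition/2": "Diabetes mellitus",
}


def normalize(label):
    """Lower-case the label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source, target, threshold=0.9):
    """Link each source resource to its best-matching target, if good enough."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


links = discover_links(GHO_DISEASES, TRIAL_CONDITIONS)
for s, p, o in links:
    print("<%s> <%s> <%s> ." % (s, p, o))
```

A high threshold favors precision over recall; lowering it finds more links but lets more wrong matches through, which ties in with the interlinking-quality limitation listed later in the deck.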
Validation by Linked Data Querying PREFIX who:
Conclusions
• Which disease has the highest percentage of health-care disparity with respect to the burden of disease and the clinical trials conducted in a particular country?
• As a research policy maker, to which research area would it be most beneficial to allocate funds?
• Who are the key people doing most research for a particular disease?
• What has been the trend, over time, for the health-care disparity for a particular region?
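The slides do not define how disparity is measured, so the following Python sketch rests entirely on an assumed definition: for one country, disparity is a disease's share of the disease burden minus its share of clinical trials, in percentage points. The counts are placeholders, and the function names `shares` and `disparity` are hypothetical.

```python
# Placeholder counts for one hypothetical country; not real data.
DEATHS = {"DiseaseX": 300, "DiseaseY": 100}
TRIALS = {"DiseaseX": 10, "DiseaseY": 30}


def shares(counts):
    """Convert raw counts to fractions of the total (total must be non-zero)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def disparity(burden, trials):
    """Assumed metric: burden share minus trial share, in percentage points."""
    burden_share = shares(burden)
    trial_share = shares(trials)
    return {
        disease: 100.0 * (burden_share[disease] - trial_share.get(disease, 0.0))
        for disease in burden_share
    }


result = disparity(DEATHS, TRIALS)
most_neglected = max(result, key=result.get)
print(result, most_neglected)
```

Under this assumed metric a positive value marks a disease that accounts for more of the burden than of the research activity. Other definitions, such as a ratio of the two shares, would rank diseases differently.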
Limitations
• Information Quality
• Coverage
• Interlinking Quality
• Propagation of Errors
Future Work
• Improve Interlinking
• Interlinking with other relevant datasets
• Updating knowledge-base as new data is published
• Creating a user interface
Acknowledgements
• Research group Agile Knowledge Engineering & Semantic Web (AKSW): http://aksw.org
• Research on Research Group: http://researchonresearch.duhs.duke.edu/site