Dutch Ships and Sailors Linked Data Cloud. Victor de Boer, Matthias van Rossum, Jur Leinenga, Rik Hoekstra. With input from Andrea Bravo Balado and Robin Ponstein. Netherlands Institute for Sound and Vision / VU University Amsterdam. v.de.boer@vu.nl. ISWC 2014
The Problem: ((Maritime) historical) data is not integrated. 25+ maritime datasets; heterogeneous
The solution: Well, Linked Data, obviously!
But why Linked Data? • Heterogeneous models, one data format – Link what can be linked – Keep specificity of original data – Allow integration at project level (and beyond) • Links to other sources: re-use knowledge • Extensible • Allow multiple levels of semantic enrichment/normalization – through Named Graphs – Provenance
Dutch Ships and Sailors KB Delpher “VOC Opvarenden” Mustering and payroll information (DANS Easy) Dutch-Asiatic Shipping (DAS) – Voyages (Huygens ING)
Modeling in collaboration with historians (1) Jur Leinenga (Huygens ING): Muster-rolls Northern Provinces 1803-1937. [Diagram: RDF graph of one muster-roll entry. Nodes: mdb:persoonscontract-del_gem-1879-101-16858-Pieter_Hoekstra (dss:Record, mdb:PersoonsContract); mdb:persoon-del_gem-1879-101-16858 (foaf:Person, dss:Person); mdb:aanmonstering-del_gem-1879-101 (dss:Record, mdb:Aanmonstering); mdb:schip-del_gem-1879-101-Isadora (dss:Schip), with an owl:sameAs link to mdb:schip-del_gem-1879-137-Isadora; mdb:matroos (dss:Rank, mdb:Rang). Properties: mdb:has_person, dss:has_aanmonstering, foaf:firstname / mdb:voornaam, foaf:lastname / mdb:achternaam, mdb:maandgage, dcterms:identifier / mdb:inventarisnummer, dss:ship / mdb:ship, dss:rank / mdb:rank, mdb:has_KB_article. Literals: “Pieter”, “Hoekstra”, “32”, “1870-1894”.]
Modeling in collaboration with historians (2) Matthias van Rossum (VU-hist): Payroll information for European vs Asiatic sailors (17th/18th C). [Diagram: RDF graph of one count (telling). Nodes: gzmvoc:telling-1046 De_Berkel (dss:Record, gzmvoc:Telling); a blank node for gzmvoc:aziatischeBemanning; gzmvoc:schip-1046 De_Berkel (dss:Ship, gzmvoc:Schip); das:voyage 1918_61 (dss:Record, das:Voyage); a ship type node labelled Ship (dss:ShipType, gzmvoc:Scheepstype). Properties: gzmvoc:telling, gzmvoc:heeft DAS heenreis, dss:has_ship / gzmvoc:schip, dss:azRegistratieKop, gzmvoc:azAantalMatrozen, rdfs:label, dss:scheepsnaam / gzmvoc:scheepsnaam, gzmvoc:scheepstype, dss:has_shiptype / gzmvoc:has_shiptype. Literals: “1046”, “Moorse mattroosen”, “21”, “De Berkel”, “Schip”.]
Modelling principles • Model each dataset as directly as possible – Only “syntactical” transformation to RDF – No normalization • Reusability • Transparency, trust • Normalize and link in second stage – store in separate RDF Named Graphs
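The two-stage principle above can be sketched in a few lines of plain Python, with quads written as (subject, predicate, object, graph) tuples. This is only an illustration under stated assumptions: the example.org namespaces, the graph names, the field names and the helper functions are hypothetical and are not taken from the project's actual conversion code.

```python
# Hypothetical namespaces and graph names, for illustration only.
MDB = "http://example.org/mdb/"
DSS = "http://example.org/dss/"
RAW_GRAPH = "http://example.org/graph/mdb-raw"
LINK_GRAPH = "http://example.org/graph/mdb-links"


def record_to_quads(record_id, record, graph=RAW_GRAPH):
    """Stage 1: purely syntactic conversion, one quad per source field."""
    subject = MDB + record_id
    return [(subject, MDB + field, value, graph) for field, value in record.items()]


def normalize_rank(quads, rank_lookup, graph=LINK_GRAPH):
    """Stage 2: add normalized values in a separate graph; stage 1 stays untouched."""
    extra = []
    for subject, predicate, value, _ in quads:
        if predicate == MDB + "rank" and value.lower() in rank_lookup:
            extra.append((subject, DSS + "rank", rank_lookup[value.lower()], graph))
    return extra


# A made-up source record with the original (Dutch) field names kept as-is.
raw = record_to_quads(
    "persoon-1",
    {"voornaam": "Pieter", "achternaam": "Hoekstra", "rank": "Matroos"},
)
links = normalize_rank(raw, {"matroos": MDB + "matroos"})
store = raw + links  # both graphs live side by side in one store
```

Because the normalized statements sit in their own graph, they can be dropped or regenerated without touching the directly converted data.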
Link properties and classes to interoperability layer. [Diagram: mdb:scheepsType rdfs:subPropertyOf dss:has_shipType; das:typeOfShip rdfs:subPropertyOf dss:has_shipType. Example data: mdb:Schip1 mdb:scheepsType mdb:Kof; das:ShipX das:typeOfShip das:Kofship.]
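To make the effect of the rdfs:subPropertyOf links concrete, here is a minimal sketch in plain Python of how a query on the shared dss property can return statements made with either dataset-specific property. The prefixed names follow the slide; the dictionary representation (one parent per property) and the function names are simplifying assumptions, and a real triple store would do this through RDFS reasoning instead.

```python
# Each dataset-specific property points at its super-property in the dss layer.
SUB_PROPERTY_OF = {
    "mdb:scheepsType": "dss:has_shipType",
    "das:typeOfShip": "dss:has_shipType",
}

triples = [
    ("mdb:Schip1", "mdb:scheepsType", "mdb:Kof"),
    ("das:ShipX", "das:typeOfShip", "das:Kofship"),
]


def super_properties(prop, hierarchy):
    """Yield the ancestors of prop by following subPropertyOf links upward."""
    seen = set()
    while prop in hierarchy and prop not in seen:
        seen.add(prop)
        prop = hierarchy[prop]
        yield prop


def query(triples, predicate, hierarchy):
    """Return (subject, object) pairs stated with predicate or any sub-property of it."""
    return [
        (s, o)
        for s, p, o in triples
        if p == predicate or predicate in super_properties(p, hierarchy)
    ]


ship_types = query(triples, "dss:has_shipType", SUB_PROPERTY_OF)
```

A query on the original property, for example `query(triples, "mdb:scheepsType", SUB_PROPERTY_OF)`, still returns only the MDB statement, so the dataset-specific view is preserved.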
Vocabulary Links http://semanticweb.cs.vu.nl/amalgame/ • Links to DBpedia (ship types, places, ranks) • Links to Getty AAT (ship types, ranks) • Links to GeoNames (places). [Diagram: mdb:Schip1 mdb:scheepsType mdb:Kof; das:ShipX das:typeOfShip das:Kofship; skos:exactMatch links connect mdb:Kof, das:Kofship and Aat:Kof; Aat:Platbodems also shown.]
Identifying ships • Identify ships within a dataset using Machine Learning techniques – Based on: name, size, type, destinations etc. – Background knowledge • 33,435 owl:sameAs links – Robin Ponstein
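The slide does not say which learning method or exact features were used, so the following is only a sketch of the general idea: turn a pair of ship records into comparison features (name, type, size) and decide whether they describe the same ship. The hand-set weights and threshold stand in for a trained classifier, and the records and their size values are invented for illustration.

```python
from difflib import SequenceMatcher


def pair_features(a, b):
    """Compare two ship records and return [name similarity, same type, size ratio]."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_type = 1.0 if a["type"] == b["type"] else 0.0
    size_ratio = min(a["size"], b["size"]) / max(a["size"], b["size"])
    return [name_sim, same_type, size_ratio]


def is_same_ship(a, b, weights=(0.6, 0.2, 0.2), threshold=0.8):
    """Weighted score as a stand-in for a learned model (weights are assumptions)."""
    score = sum(w * f for w, f in zip(weights, pair_features(a, b)))
    return score >= threshold


# Invented records; a spelling variant of the same name should still match.
isadora_1 = {"name": "Isadora", "type": "kof", "size": 80}
isadora_2 = {"name": "Isidora", "type": "kof", "size": 82}
transit = {"name": "Transit", "type": "schoener", "size": 120}

match = is_same_ship(isadora_1, isadora_2)
no_match = is_same_ship(isadora_1, transit)
```

Each accepted pair would then be recorded as an owl:sameAs link in its own named graph, in line with the modelling principles.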
Linking to Historical newspapers • Use ML to detect links between ships and historical newspaper articles (delpher.nl) – Features: ship name, time intervals, captain’s names, ship type, named entities, keywords, background knowledge • 179,120 links – Andrea Bravo Balado
Example [HARLINGEN, 24 October. ]. «et gestrande Zweedsche schip , waarvan wij ons vorig no. melding maakten , is door de 'eepboot van hier afgebragt en hier binnengede u Bi. J die gelegenheid werd ons medegeeeid, dat nog vier vaartuigen op Terschelling aren gestrand. Tevens is het berigt ontvan°e > dat het hier behoorende schoonerschip Transit, kapitein Schaap, in de Noordzee is gezonken, nadat het achterschip was weggeslagen ; een ligtmatroos verloor daarbij het leven. Mede zijn hier drie vreemde schepen met meer en minder zware averij binnengeloopen. Spoiler alert! It sank in the North Sea.
Provenance (PROV-O) • Individual named graphs have provenance information – Who made it (people/software?) – Based on what source – Content confidence • Matches historical science requirements
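As a rough illustration of per-graph provenance, the sketch below attaches three statements to a named graph's URI. prov:wasAttributedTo and prov:wasDerivedFrom are standard PROV-O properties; the confidence property, the example.org URIs and the function name are assumptions made for this example, since the slide does not say how confidence is encoded.

```python
PROV = "http://www.w3.org/ns/prov#"
DSS = "http://example.org/dss/"  # hypothetical namespace


def describe_graph(graph_uri, agent, source, confidence):
    """Return provenance triples about a named graph: who, from what, how sure."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return [
        (graph_uri, PROV + "wasAttributedTo", agent),   # who made it
        (graph_uri, PROV + "wasDerivedFrom", source),   # based on what source
        (graph_uri, DSS + "confidence", confidence),    # hypothetical property
    ]


provenance = describe_graph(
    "http://example.org/graph/mdb-links",
    agent="http://example.org/agent/linking-script",
    source="http://example.org/graph/mdb-raw",
    confidence=0.9,
)
```

Because the subject of each statement is the graph itself, a user can decide per graph whether to trust and include it.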
ClioPatria Triplestore • Data live at Huygens Institute for Dutch History – http://dutchshipsandsailors.nl/data – ~30 million triples • Dev server – http://semanticweb.cs.vu.nl/dss • purl.org URIs redirect to live server w/ content negotiation • SPARQL endpoint • Web interface
[Diagram: the DSS data cloud. Datasets: DAS, MDB, GZMVOC, VOCOPV (Begunstigden, Soldijboeken, Opvarenden). External vocabularies: foaf, PROV, AAT. Link types: rdfs:subClassOf, rdfs:subPropertyOf, skos:exactMatch, owl:sameAs, dss:hasKBLink, dss:DAS link.]
Data analysis and visualisation
Current work: linking original scans
Take home • Linked Data principles are a great fit to digital history requirements – Heterogeneous models/datasets, light-weight reusable integration – Multiple levels of normalisation, through separate named graphs – SW Provenance matches Historical Provenance • Watch out when you sail your Schooner into the North Sea
DataLab http://dutchshipsandsailors.nl/data v.de.boer@vu.nl