
  • Number of slides: 51

Data Mining Query Languages
Donato Malerba
Dipartimento di Informatica, Università degli studi di Bari
http://www.di.uniba.it/~malerba/

A database perspective on KDD
  • Most current KDD systems offer isolated discovery features using tree inducers, neural nets, and rule discovery algorithms.
  • They cannot be embedded into a large application and typically offer just one knowledge discovery feature.
  • The same is true of OLAP tools.
This is the first generation of KDD tools.
DMQL – Prof. D. Malerba

Short term research program
  • Efficient DM algorithms on top of large databases, utilizing the existing DBMS support.
Examples:
  1. Realizing C4.5 on top of a large database requires tighter coupling with the DBMS and intelligent use of indexing techniques.
  2. Exploitation of caching techniques for association rule mining.
  3. Exploitation of special indexing techniques for clustering.
See IBM's Intelligent Miner.

Long term research program
  • KDD should follow one of the key DBMS paradigms: building interpreters for query languages and compilers for ad hoc queries, and embedding queries in application programming interfaces (APIs).
  • Focus: increasing programmer productivity for KDD application development.
Knowledge and Data Discovery Management Systems (KDDMS) are the second-generation KDD systems.

Imielinski & Mannila's view
  • KDD object
      - Rule: probabilistic formula or multidimensional correlation
          X.Diagnosis="heart disease" and X.Age<50 ⇒ X.BMI>29 [300, 0.80]
      - Classifier: decision tree, neural network, multidimensional regression
      - Clustering: collection of objects
  • KDD query: a predicate that returns a set of objects, which can be either KDD objects or database objects (records or tuples).
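The two numbers attached to a rule such as [300, 0.80] can be illustrated with a small sketch. It assumes that the first number counts the records to which the rule body applies and the second is the fraction of those records that also satisfy the consequent. The function name, the sample records and the attribute values are hypothetical and only serve the illustration.

```python
def support_and_confidence(records, body, consequent):
    """Return (support, confidence) of the rule body => consequent.

    Assumed reading: support is the number of records satisfying the body,
    confidence is the fraction of those that also satisfy the consequent.
    """
    covered = [r for r in records if body(r)]
    if not covered:
        return 0, 0.0
    hits = sum(1 for r in covered if consequent(r))
    return len(covered), hits / len(covered)


# Hypothetical patient tuples, invented for this sketch.
patients = [
    {"Diagnosis": "heart disease", "Age": 45, "BMI": 31},
    {"Diagnosis": "heart disease", "Age": 48, "BMI": 27},
    {"Diagnosis": "heart disease", "Age": 62, "BMI": 33},
    {"Diagnosis": "asthma", "Age": 30, "BMI": 35},
]

# Rule: Diagnosis = "heart disease" and Age < 50  =>  BMI > 29
body = lambda r: r["Diagnosis"] == "heart disease" and r["Age"] < 50
consequent = lambda r: r["BMI"] > 29

support, confidence = support_and_confidence(patients, body, consequent)
print(support, confidence)
```

On this toy data the body covers two patients and one of them has a BMI above 29.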

Imielinski & Mannila's view
  • KDD objects typically do not exist a priori, so querying them requires generating them at run time.
  • KDD objects may also be pre-generated and stored in an “inductive” database, such as metadata. In such cases querying reduces to retrieval.
  • A KDDMS should be able to persistently store and manage KDD objects, as well as provide the ability to query them.
  • Querying involves
      - the generation of new KDD objects
      - the retrieval of those generated before

Imielinski & Mannila's view
  • Closure principle: the result of a query is a relation that can be queried further.
  • The result of a KDD query may be an argument of another KDD query of a compatible type.
  • In principle a KDD query can be nested within a regular relational query.
  • KDD queries can be embedded in a host programming environment just as SQL queries can be embedded in host languages.

Imielinski & Mannila's view
  • Generate a decision tree on a user-defined training set (specified through a database query) with user-defined attributes and user-specified classification categories. Then find all records in the database wrongly classified by that classifier and use them as training data for another classifier.
  • Generate all rules with consequent values computed by an SQL query (KDD queries may not be completely known at compile time!).
  • Find the tuples that belong to the largest cluster in a clustering constructed according to a user-specified distance metric.

Imielinski & Mannila's view
Research program:
  1. A KDD query language has to be formally defined.
  2. Query optimization tools have to be developed to compile queries into reasonably efficient execution plans.
Very challenging! KDD queries are much more powerful than SQL queries.

Imielinski & Mannila's view
Example:
  Patient(Age, Sex, City, Diagnosis, Height, Weight, ClaimAmount, …)
  City(State, Population, …)
  X.Diagnosis="heart disease" and Sex="male" ⇒ X.Age>50 [1200, 0.70]
The user wants to see all the rules about patients with heart disease such that the consequent of the rule says something about the age of the patient, there are at least 1,000 cases to which the rule body applies, and the confidence of the rule is at least 65%.

Imielinski & Mannila's view
In M-SQL (Imielinski et al., Proc. KDD'96):
  SELECT
  FROM MINE(T): R
  WHERE R.Body = {(Diagnosis="heart disease")} AND
        R.Consequent = {(Age=*)}
        R.Support > 1000
        R.Confidence > 0.65
R renames MINE(T); MINE(T) is an operator that takes a class T and generates all propositional rules about T.
Rule discovery: another type of querying!
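The effect of such a query can be pictured as a filter over a collection of already mined rules. The sketch below is not M-SQL and says nothing about how MINE(T) works internally; the `Rule` class, the `select_rules` function and the sample rules are hypothetical names and data introduced only for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    body: frozenset      # set of (attribute, value) pairs
    consequent: tuple    # (attribute, value)
    support: int
    confidence: float


def select_rules(rules, body, consequent_attr, min_support, min_confidence):
    """Keep rules with exactly the given body, a consequent on
    consequent_attr (any value, like Age=*), and both thresholds exceeded."""
    wanted_body = frozenset(body)
    return [
        r for r in rules
        if r.body == wanted_body
        and r.consequent[0] == consequent_attr
        and r.support > min_support
        and r.confidence > min_confidence
    ]


# Hypothetical output of a rule miner, invented for this sketch.
heart = frozenset({("Diagnosis", "heart disease")})
mined = [
    Rule(heart, ("Age", ">50"), 1200, 0.70),
    Rule(heart, ("Age", "<30"), 400, 0.90),    # support too low
    Rule(heart, ("Sex", "male"), 5000, 0.80),  # consequent is not about Age
    Rule(frozenset({("Diagnosis", "asthma")}), ("Age", "<30"), 3000, 0.75),
]

selected = select_rules(mined, {("Diagnosis", "heart disease")}, "Age", 1000, 0.65)
for rule in selected:  # a host program can iterate over the result
    print(rule.consequent, rule.support, rule.confidence)
```

Because the result is an ordinary collection, the host program can iterate over it or pass it to a further query.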

Imielinski & Mannila's view
Rules are not necessarily the final product of KDD applications. A proper API is necessary, one that embeds a rule query language in a more expressive, general-purpose host programming environment.
      - Iterate over a collection of rules

KDD query languages
  • Imielinski, Virmani, Abdulghani. Discovery board application programming interface and query language for database mining. Proc. KDD'96.
  • Imielinski and Virmani. MSQL: A query language for database mining. Journal of Data Mining and Knowledge Discovery, 3(4), 1999.
  • Meo, Psaila, and Ceri. A new SQL-like operator for mining association rules. Proc. VLDB, 1996.
  • Han, Fu, Koperski, Wang, and Zaiane. DMQL: A Data Mining Query Language for Relational Databases. Proc. SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), 1996.
  • Shen, Ong, Mitbander, and Zaniolo. Metaqueries for Data Mining. In: Fayyad et al., Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996.

KDD query languages
  • Giannotti, Manco. Querying Inductive Databases via Logic-Based User-Defined Aggregates. PKDD 1999.
  • De Raedt. An Inductive Logic Programming Query Language for Database Mining. AISC 1998.
  • De Raedt. A Logical Database Mining Query Language. ILP 2000.
  • De Raedt. Query execution and optimization for inductive databases. Proc. EDBT Workshop on Database Technologies for Data Mining, 2002.
  • Boulicaut, Klemettinen, Mannila. Querying inductive databases: a case study on the MINE RULE operator. In: Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD'98), LNAI 1510, 1998.
  • Elfeky, Saad, Fouad. ODMQL: Object Data Mining Query Language. In: Dittrich et al. (eds), Objects and Databases 2000, LNCS 1944, 2001.
  • Johnson, Lakshmanan, Ng. The 3W model and algebra for unified data mining. Proc. VLDB, 1998.

KDD query languages
  • Han, Koperski, Stefanovic. GeoMiner: A System Prototype for Spatial Data Mining. SIGMOD Conference 1997.
  • Malerba, Appice, Ceci, Vacca. SDMOQL: An OQL-based Data Mining Query Language for Map Interpretation. Proc. EDBT Workshop on Database Technologies for Data Mining, 2002.

DMQL: just some syntactic sugar on top of DM algorithms?
  • A user can formulate a DM task without paying attention to
      - logical and physical representation problems
      - the correct procedural order in which some DM steps should be performed
  • The development of decision support applications becomes easier, just as SQL makes the implementation of operational information systems easy.
  • A casual user can find patterns by means of a DMQL in the same way as he or she can find data by means of an SQL query: no ad hoc application has to be developed.
  • A DMQL provides a foundation on which a GUI can be built.

Spatial Data Mining
  • Spatial data mining: the extraction of spatial patterns from both spatial and aspatial data, possibly stored in a spatial database.
  • Spatial pattern: a pattern showing the interaction of two or more spatial objects or space-dependent attributes according to a particular spacing or set of arrangements.
      IF a large town intersects the motorway A14 THEN it is also close to the Adriatic sea (13%, 90%)

Spatial Data Mining & GIS
  • Geographical Information Systems (GIS) offer an important application area where spatial data mining techniques can be effectively used.
  • Example: topographic map interpretation.

Interpreting Topographic Maps
  • Topographic map: a large-scale (1:10000 to 1:100000) composite map showing relief, vegetation and man-made features of a portion of the land surface.
  • Interpreting the colored lines, areas, and other symbols is the first step in using topographic maps.
  • Easy: symbols correspond univocally to concepts explicitly modelled by the map creator.
  • Difficult: locating in a map geographical objects that are not explicitly modelled (e.g., an industrial area).

Interpreting Topographic Maps
  • Solution: embedding intelligent capabilities in geo-based tools.
  • Knowledge-based GIS use
      - spatial reasoning capabilities
      - available domain knowledge
    to support map interpretation.
  • But operational definitions of some complex concepts
      - are difficult to elicit
      - are not portable across different data models
      - depend on the scale of the map

Data Mining to Support Map Interpretation Tasks
  • Data mining tools and techniques are used to find spatial patterns of interest.
  • INGENS (INductive GEographic iNformation System) = GIS + Data Mining Server + …
  • Training functionality: the user can train the system by providing instances of the geographical objects to be recognized in a map.

INGENS Architecture
[Architecture diagram. Labels: Interface Layer; GUI (Web Browser); Application Enabler; Resource Manager; Map Converter; Map Editor; Map Descriptor; Data Mining Server; Query Interpreter; Map Storage Subsystem; ObjectStore DBMS; Deductive DBMS; Map Repository; Knowledge Repository.]
Callouts recoverable from the diagram:
  • The interface layer implements a GUI, which is a Java applet.
  • A suite of tools for the import/export of maps.
  • Allows any integration and/or modification of information acquired by means of the Map Converter.
  • Responsible for the automated generation of first-order logic descriptions of some geographical objects.
  • A suite of data mining systems that can be run concurrently by multiple users to discover patterns.
  • Permits the user to formulate queries in the SDMOQL language.
  • Is the only access path to the data contained in the Map Repository.
  • Involved in storing, updating and retrieving items.

The data model for the map repository
  • Hybrid tessellation-topological model.
  • Tessellation model: a map is decomposed according to a regular grid of cells.
  • The topological model has two structural hierarchies:
      - physical (describes the geographical objects by means of the most appropriate geometric entity);
      - logical (expresses the semantics of geographical objects).

The object-oriented data model in UML
[UML diagram]

Different technologies: what support for the user?
  • Problem: the user should not suffer from problems related to the integration of different technologies, such as
      - data mining
      - OODBMS
      - deductive databases
      - GIS
  • Solution: a data mining query language (DMQL) interfaces users with the whole system and hides the different technologies.

SDMOQL
  • DMQL is the data mining query language defined by Han et al. (1996) for relational databases.
  • GMQL (Geo Mining Query Language) is a language for spatial data mining, based on DMQL (Koperski 1999).
  • Both are inspired by SQL and the relational model, which are not appropriate for an OO information system like INGENS.
  • SDMOQL (Spatial Data Mining Object Query Language) is a spatial mining query language for INGENS users, based on OQL.

Data Mining primitives
  • A DMQL must incorporate a set of DM primitives designed to facilitate efficient, fruitful knowledge discovery.
  • Primitives include:
      - the specification of the portions of the database in which the user is interested;
      - the kinds of knowledge to be mined;
      - background knowledge useful in guiding the discovery process;
      - interestingness measures for pattern evaluation;
      - how the discovered knowledge should be visualized.

Task-relevant data specification
In traditional DM applications, it is sufficient to specify
      - database attributes, or
      - data warehouse dimensions
since:
  1. No complex transformation of stored data is required.
     Not in spatial data mining, where working at the level of stored data, that is, geometric representations (points, lines and regions) of geographic objects, is undesirable. The user is interested in working at higher conceptual levels, where human-interpretable properties and relations between geographical objects are expressed.
  2. No interaction between objects is assumed, so that each object can be effectively described by a tuple in a data relation.
     Not in spatial data mining, where attributes of the neighbors of some spatial object of interest may influence the object itself. The data set to mine cannot be straightforwardly represented by means of a relational table, where distinct tuples refer to distinct, independent objects.

Example
  • Two roads can cross each other, or run parallel, or be confluent, independently of the fact that they are represented by one or more tuples of a relational table of “lines” or “regions”.

A solution
  • The SDMOQL interpreter allows the user to select the geographical objects that are relevant to the data mining task, and then invokes the Map Descriptor to produce their high-level conceptual descriptions.
  • Conceptual descriptions are based on a first-order logic language, in which both properties and relations of the selected geographical objects can be easily represented.

Example
  SELECT x
  FROM x IN Cell
  WHERE x->num_cell = 11

  contain(x1, x2)=true, …, contain(x1, x70)=true,
  type_of(x1)=cell, …, type_of(x4)=vegetation, …,
  subtype_of(x2)=cultivation, …, subtype_of(x7)=cart_track_road, …,
  color(x2)=black, …, color(x70)=black,
  extension(x7)=111.018, …, extension(x33)=1104.74,
  geographic_direction(x7)=north, …, geographic_direction(x68)=north,
  line_shape(x7)=straight, …, line_shape(x33)=cuspidal, …,
  altitude(x19)=106.00, …, altitude(x43)=102.00,
  area(x2)=187525.00, …, area(x62)=30250.00,
  density(x2)=high, …, density(x62)=low,
  line_to_line(x7, x68)=almost_parallel, …,
  region_to_region(x2, x21)=meet, …,
  distance(x7, x68)=5.00,
  line_to_region(x8, x27)=adjacent, …,
  point_to_region(x4, x18)=outside, …

Describing topographic maps
  • 33 geographical objects: contour_slope, river, canal, primary_road, farm_road, interfarm_road, main_road, …
  • 16 descriptors: contain(x, y), type_of(y), subtype_of(y), color(y), area(y), density(y), extension(y), geographic_direction(y), line_shape(y), altitude(y), line_to_line(y, z), distance(y, z), region_to_region(y, z), line_to_region(y, z), point_to_region(y, z)
  • Defined together with town planners, the set of descriptors is quite general and can capture geometric, topological and directional features of geographical objects in a topographic map.

Task-relevant data specification
  • In SDMOQL the selection of geographical objects is performed by means of simplified OQL queries with a SELECT-FROM-WHERE structure.
  • Example 1: cell-level query
    The user selects cell 26 from the topographic map of Canosa (Apulia, Italy).
      SELECT x
      FROM x IN Cell
      WHERE x->num_cell = 26 AND x->part_map->map_name = "Canosa"
    The Map Descriptor generates the description of all the objects in this cell.

Task-relevant data specification
  • Example 2: layer-level query
    The user selects the layer Horography from the topographic map of Canosa and the layer Construction from any map.
      SELECT x, y
      FROM x IN Horography, y IN Construction
      WHERE x->part_map->map_name = "Canosa"
    The Map Descriptor generates the description of the objects in these layers.

Task-relevant data specification
  • Example 3: object-level query
    The user selects the objects of the logic class River and the objects of type motorway (instances of the class Road) from cell 26 of the topographic map of Canosa.
      SELECT x, y
      FROM x IN River, y IN Road
      WHERE x->part_map->map_name = "Canosa" AND y->part_map->map_name = "Canosa"
        AND x->log_incell->num_cell = 26 AND y->type_road = "motorway"
    The Map Descriptor generates the description of these objects.

Task-relevant data specification
  • Example 4: semantically ambiguous query
      SELECT x, y
      FROM x IN Cell, y IN River
      WHERE x->num_cell = 26 AND y->log_incell->num_cell = 26
    This query selects the object cell 26 and all rivers in it. However, it is unclear whether the Map Descriptor should describe
      1. the entire cell 26 (formulate a cell-level query), or
      2. only the rivers in it (formulate an object-level query), or
      3. both (an unusual case; in any event the problem can be solved by applying the UNION operator to the cell-level query and the object-level query).

Task-relevant data specification
The following constraint is imposed on SDMOQL: the selected data must belong to the same level (cell, layer or logic object). More formally, the FROM clause can contain either a group of Cells, or a set of Layers, or a set of Logic Objects, but never a mixture of them.
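The same-level constraint can be sketched as a check over the class names appearing in a FROM clause. This is only an illustration of the rule as stated, not the INGENS interpreter: the `LEVEL_OF` table and the function `from_clause_level` are hypothetical, and the table lists only the classes used in the example queries.

```python
# Hypothetical mapping from class names used in the example queries to
# their level; a real system would derive this from the map data model.
LEVEL_OF = {
    "Cell": "cell",
    "Horography": "layer",
    "Construction": "layer",
    "River": "logic object",
    "Road": "logic object",
}


def from_clause_level(class_names):
    """Return the single level shared by all classes in a FROM clause.

    Raises ValueError when the clause is empty or mixes levels, and
    KeyError when a class name is not in the table.
    """
    levels = {LEVEL_OF[name] for name in class_names}
    if len(levels) != 1:
        raise ValueError(f"FROM clause must use exactly one level, got {sorted(levels)}")
    return levels.pop()


print(from_clause_level(["River", "Road"]))   # object-level query is accepted
try:
    from_clause_level(["Cell", "River"])      # the ambiguous query mixes levels
except ValueError as error:
    print("rejected:", error)
```

Rejecting the mixed clause up front forces the user to state the intent as a cell-level query, an object-level query, or a UNION of the two.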

The kind of knowledge to be mined
  <Spatial_Data_Mining_Statement> ::= <Limited_OQL_Query> mine <Kind_of_Pattern>
  <Kind_of_Pattern> ::= <…> | <…>
  <…> ::= classification as <…> for <…> {, <…>} [analyze <…> {, <…>}]
The analyze clause indicates that the descriptions of the selected data are based on the spatial/aspatial descriptors in the list.

Example
  SELECT x
  FROM x in Cell
  WHERE x->num_cell >= 5 AND x->num_cell <= 12
  mine classification as MorphologicalElements
  for class(_)=system_of_farms, class(_)=fluvial_landscape
  analyze contain/2, type_of/1, subtype_of/1, area/1, density/1, extension/1, line_shape/1, geographic_direction/1, line_to_line/2, distance/2, line_to_region/2, region_to_region/2, point_to_region/2

Defining background knowledge
  • In SDMOQL the background knowledge (BK) is defined as a set of definite clauses.
  • Example:
      define knowledge
        close_to(X, Y)=true :- region_to_region(X, Y)=meet.
        close_to(X, Y)=true :- close_to(Y, X)=true.
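What the two close_to clauses entail can be sketched with a small forward-chaining loop: start from the pairs of regions that meet, then add the symmetric pairs until nothing new appears. This is an illustrative reading of the clauses, not the deductive engine used by INGENS; the function name and the sample facts are hypothetical.

```python
def derive_close_to(region_to_region):
    """Derive the close_to pairs entailed by the two clauses.

    region_to_region maps an (x, y) pair of object names to a relation name.
    """
    # Clause 1: close_to(X, Y) :- region_to_region(X, Y) = meet.
    close = {pair for pair, relation in region_to_region.items() if relation == "meet"}
    # Clause 2: close_to(X, Y) :- close_to(Y, X). Apply until a fixpoint is reached.
    changed = True
    while changed:
        new_pairs = {(y, x) for (x, y) in close} - close
        changed = bool(new_pairs)
        close |= new_pairs
    return close


# Hypothetical facts in the style of a Map Descriptor output.
facts = {("x2", "x21"): "meet", ("x2", "x30"): "disjoint"}
close_to = derive_close_to(facts)
print(sorted(close_to))
```

The loop terminates because each pass can only add the mirror image of an existing pair.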

Defining schema hierarchies
  • Define a total or partial order among attributes in the database schema.
  • Example:
      Activity
        business_activity
          low_business_activity
          high_business_activity
        other_activity

      define hierarchy Activity as
        level 1: {business_activity, other_activity} < level 0: Activity;
        level 2: {low_business_activity, high_business_activity} < level 1: business_activity;

Defining set-grouping hierarchies
  • Organize values for given attributes or dimensions into groups of constants or ranges of values.
  • Example:
      Distance
        near: 0 m … 1,999 m
        far: 2 km … +∞

      define hierarchy Distance for distance/2 as
        level 1: {far, near} < level 0: Distance;
        level 2: {0, 1999} < level 1: near;
        level 2: {2000, +inf} < level 1: far;
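A set-grouping hierarchy of this kind amounts to a lookup from a numeric value to the label of the range that contains it. The sketch below assumes the ranges are in metres and closed at both ends, as the declaration suggests; the table `DISTANCE_HIERARCHY` and the function `generalize_distance` are hypothetical names.

```python
import math

# Level-2 ranges and the level-1 labels they generalize to, taken from the
# hierarchy declaration; metres and closed intervals are assumptions.
DISTANCE_HIERARCHY = [
    ("near", 0, 1999),
    ("far", 2000, math.inf),
]


def generalize_distance(metres):
    """Map a numeric distance to its level-1 label (near or far)."""
    for label, low, high in DISTANCE_HIERARCHY:
        if low <= metres <= high:
            return label
    # Negative values and values strictly between 1999 and 2000 fall in no range.
    raise ValueError(f"distance {metres} is not covered by the hierarchy")


print(generalize_distance(5.0))
print(generalize_distance(2500))
```

With this mapping a value such as distance(x7, x68)=5.00 would be reported as near at the higher conceptual level.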

Interestingness measure specification
  • Threshold values: the user can set thresholds, such as confidence and support, as follows:
      ThresholdParameter threshold Value
  • Search biases in the hypothesis space: the user can specify a number of preference criteria, such as maximizing the number of covered examples or minimizing the number of variables in the body of a learned clause, according to the following syntax:
      preference criteria (minimize | maximize) Criterion with tolerance Value
  • Generic input parameters of a data mining algorithm:
      ParameterName = Value

An example
  • Problem: localize a “sistema poderale” (system of farms) in Apulian maps.
  • The user browses the maps with INGENS and finds some examples of systems of farms …

An example: the data
… and some counterexamples.

An example: the DM query
  • Formulate a data mining task through SDMOQL:
      SELECT x
      FROM x in Cell
      WHERE (x->num_cell >= 1 AND x->num_cell <= 6) OR x->num_cell = 11 OR x->num_cell = 34
         OR (x->num_cell >= 15 AND x->num_cell <= 17)
      mine classification as MorphologicalElements
      for class(X)=system_of_farms
      analyze contain/2, type_of/1, subtype_of/1, color/1, altitude/1, area/1, density/1, extension/1, line_shape/1, geographic_direction/1, line_to_line/2, distance/2, line_to_region/2, region_to_region/2, point_to_region/2
      with preference criteria
        minimize negative_example_covered with tolerance 0.6,
        maximize positive_example_covered with tolerance 0.4,
        minimize cost with tolerance 0.4
      number_of_rules threshold 15, consistent threshold 500

An example: the process
[Process diagram. Labels: query; visualization; spatial data mining algorithms; map descriptor; object-oriented DBMS; discovered knowledge; symbolic descriptions; deductive database; object-oriented database.]

An example: results
  class(S1)=system_of_farms ←
      contain(S1, S2)=true, region_to_region(S2, S3)=meet,
      area(S2) ∈ [68437.5..187525], region_to_region(S2, S4)=disjoint,
      region_to_region(S4, S3)=meet, type_of(S1)=cell, type_of(S2)=parcel,
      type_of(S4)=parcel, type_of(S3)=parcel
There are two pairs of adjacent parcels, (S2, S3) and (S4, S3), one of which is relatively large (its area is between 68437.5 and 187525 m²).
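How a learned rule of this shape could be tested against the symbolic description of a cell can be sketched with a brute-force matcher. The sketch assumes that distinct variables must bind to distinct objects and that a bracketed pair denotes a closed numeric interval. It is not the proof procedure of the deductive database; the functions `holds` and `satisfies` and the sample description are hypothetical.

```python
from itertools import permutations


def holds(description, literal, binding):
    """Check one body literal under a variable binding."""
    name, variables, expected = literal
    value = description.get((name, tuple(binding[v] for v in variables)))
    if value is None:
        return False
    if isinstance(expected, tuple):          # numeric interval [low, high]
        low, high = expected
        return low <= value <= high
    return value == expected


def satisfies(description, body, head_var, cell):
    """True if some binding of the body variables to distinct objects
    (an assumption of this sketch) makes every literal hold."""
    objects = sorted({obj for (_name, args) in description for obj in args} - {cell})
    variables = sorted({v for (_name, args, _exp) in body for v in args} - {head_var})
    for combo in permutations(objects, len(variables)):
        binding = dict(zip(variables, combo))
        binding[head_var] = cell
        if all(holds(description, literal, binding) for literal in body):
            return True
    return False


# Body of the first learned rule for class(S1)=system_of_farms.
body = [
    ("contain", ("S1", "S2"), True),
    ("region_to_region", ("S2", "S3"), "meet"),
    ("area", ("S2",), (68437.5, 187525)),
    ("region_to_region", ("S2", "S4"), "disjoint"),
    ("region_to_region", ("S4", "S3"), "meet"),
    ("type_of", ("S1",), "cell"),
    ("type_of", ("S2",), "parcel"),
    ("type_of", ("S4",), "parcel"),
    ("type_of", ("S3",), "parcel"),
]

# Hypothetical description of one cell, invented for this sketch.
description = {
    ("contain", ("c1", "p1")): True,
    ("type_of", ("c1",)): "cell",
    ("type_of", ("p1",)): "parcel",
    ("type_of", ("p2",)): "parcel",
    ("type_of", ("p3",)): "parcel",
    ("area", ("p1",)): 100000.0,
    ("region_to_region", ("p1", "p2")): "meet",
    ("region_to_region", ("p1", "p3")): "disjoint",
    ("region_to_region", ("p3", "p2")): "meet",
}

is_system_of_farms = satisfies(description, body, "S1", "c1")
print(is_system_of_farms)
```

Enumerating all bindings is exponential in the number of variables, so the approach is only meant for toy descriptions like this one.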

An example: results
  class(S1)=system_of_farms ←
      contain(S1, S2)=true, region_to_region(S2, S3)=disjoint,
      density(S3)=high, region_to_region(S2, S4)=meet,
      region_to_region(S4, S5)=meet, region_to_region(S2, S5)=meet,
      type_of(S1)=cell, area(S2) ∈ [12381.2..25981.2], type_of(S2)=parcel
There are three adjacent regions (S2, S4, S5), one of which is certainly a medium-sized parcel (its area is between 12381.2 and 25981.2 m²), and there is a fourth region (S3) with a high density (presumably vegetation), disjoint from the parcel S2.

An example: use of results
  • The user asks INGENS to find all cells in the Canosa map that are classified as a system of farms and contain a main road.
      SELECT C
      FROM M in Map, C in Cell, R in Road
      WHERE M->name = "Canosa" AND C->map = M AND R->log_incell = C
        AND R->type_road = "main_road" AND class(C) = system_of_farms
  • To check the condition defined by the predicate class(C)=system_of_farms, the Query Interpreter generates the symbolic description of each cell in the map and asks the Query Engine of the Deductive Database to prove the goal class(C)=system_of_farms given the logic program previously learned.

Conclusions and future work
  • A query language for spatial data mining based on OQL.
  • A solution to the problem of integrating different technologies (OODBMS, deductive database, DM, …).
  • Differences with respect to traditional DMQL.
  • Implementation of the interpreter in INGENS.
Future work
  • Extension of the set of descriptors automatically extracted from a vectorized map.
  • Extension to other spatial data mining tasks supporting quantitative interpretation of maps.