  • Number of slides: 37

Exploiting Syntactico-Semantic Structures for Relation Extraction
Yee Seng Chan and Dan Roth
University of Illinois at Urbana-Champaign
ACL-2011

Relation Types
Located-at, Employ, Part-of
Example mention pairs: county jail; … ABC anchorman …; … California 's Governor …; … president at AOL …; Texas ranch; Chicago 's O'Hare airport; city 's stock exchange building; workers in central Seoul; hotels in the city; Aaron Brown , ABC news , New York; … Chicago , Illinois …; Peggy Wehmeyer , Louisiana

Syntactico-Semantic Structures
  • Premodifier: county jail; ABC anchorman; Texas ranch
  • Possessive: Chicago 's O'Hare airport; California 's Governor; city 's stock exchange building
  • Preposition: workers in central Seoul; president at AOL; hotels in the city
  • Formulaic: Aaron Brown , ABC news , New York; Chicago , Illinois; Peggy Wehmeyer , Louisiana
  • Verbal

Premodifier Structure
[[county] jail]; [[ABC] anchorman]; [[Texas] ranch]

Possessive Structure
[[Chicago] 's O'Hare airport]; [[California] 's Governor]; [[city] 's stock exchange building]

Preposition Structure
[workers] in central [Seoul]; [president] at [AOL]; [hotels] in the [city]

Formulaic Structure
[Aaron Brown] , ABC news , [New York]; [Chicago] , [Illinois]; [Peggy Wehmeyer] , [Louisiana]

Relations and Structures
Prior work usually proceeds along the relation type dimension:
  • Train a single multi-class classifier to disambiguate between the different relation types when given two mentions in context
We highlight that there exists another “Structure” dimension:
  • Useful for improving relation extraction performance

Overview
  • Relation types vs syntactico-semantic structures
      • Premodifier, Possessive, Preposition, Formulaic
      • The structures cover ~80% of relation instances
      • Structures can be robustly detected using regular expressions, which allows us to detect and discard unrelated mention pairs
  • Framework: how to exploit structures during training and testing
  • Mention and baseline relation extraction systems
  • Experiments:
      • Settings
      • Improvement in relation extraction F1-score
  • Conclusion

Structures
  • Which structures should we focus on?
  • Why do we care about structures?
  • Why do they help relation extraction?

Coverage of Syntactico-Semantic Structures
Occurrence distribution in relations of ACE-2004:
  • Premodifier: 31%
  • Possessive: 19%
  • Preposition: 24%
  • Formulaic: 7%
  • Verbal: 19%

Together, the Premodifier, Possessive, Preposition, and Formulaic structures account for 81% of relation instances in ACE-2004; the remaining 19% are Verbal.

Utility of Syntactico-Semantic Structures
Why would structures help RE?
  • Relations are defined over mention pairs
  • Most mention pairs are not related
      • In the ACE corpus, only 7% admit interesting relations (sparse positive examples)
      • It would be nice if we could reliably detect and discard the 93% null relations

  • We implement simple regular expressions to detect the four structures; a detected structure between mentions A and B suggests the existence of some interesting relation between them

Detecting Structures
  • Premodifier: [[…] …], with nouns and adjectives allowed inside the outer mention, e.g. [[county] jail], [[ABC] anchorman], [[Texas] ranch]
  • Possessive: [[…] 's …], where the possessive marker := 's, or := PRP$ or WP$, e.g. [[Chicago] 's O'Hare airport], [[California] 's Governor], [[city] 's stock exchange building]
  • Preposition: [ … ] IN [ … ]; [ … ] IN/TO [ … ], e.g. [workers] in central [Seoul], [president] at [AOL], [hotels] in the [city]
  • Formulaic (entity-type patterns): [ GPE ] , [ GPE ] [ PER ] [ ORG ] [ PER ] , [ ORG ] [ PER ] [ GPE ] [ PER ] / [ ORG ], e.g. [Chicago] , [Illinois]; [Aaron Brown] , ABC news , [New York]; [Peggy Wehmeyer] , [Louisiana]
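The patterns on this slide can be approximated with ordinary regular expressions over part-of-speech tags. The sketch below is illustrative only and is not the authors' implementation: the function name `detect_structure`, the span representation, the exact regular expressions, and the set `FORMULAIC_TYPES` are all assumptions made for the example, and it handles only the simplest form of each pattern.

```python
import re

# Hypothetical patterns over Penn Treebank-style POS tags (assumptions, not from the slides).
NOUN_ADJ = r"(?:NN\w*|JJ\w*)"
PREMOD_TAIL = re.compile(rf"{NOUN_ADJ}(?: {NOUN_ADJ})*")        # nouns, adjectives
PREP_GAP = re.compile(rf"(?:IN|TO)(?: (?:DT|{NOUN_ADJ}))*")     # "in", "in the", "in central"
FORMULAIC_TYPES = {("GPE", "GPE"), ("PER", "ORG"), ("PER", "GPE")}


def detect_structure(pos, m1, m2, types=(None, None)):
    """Return the name of the detected structure, or None.

    pos    : list of POS tags for the sentence
    m1, m2 : (start, end) token spans, end exclusive, with m1 starting first
    types  : tuple with the entity types of m1 and m2
    """
    (s1, e1), (s2, e2) = m1, m2
    # Nested case: m1 is the left part of the larger mention m2.
    if s1 == s2 and e1 < e2:
        if pos[e1] == "POS" or (e1 - s1 == 1 and pos[s1] in ("PRP$", "WP$")):
            return "Possessive"
        if PREMOD_TAIL.fullmatch(" ".join(pos[e1:e2])):
            return "Premodifier"
        return None
    # Adjacent case: m1 ends before m2 starts; inspect the tags in between.
    if e1 <= s2:
        gap = " ".join(pos[e1:s2])
        if PREP_GAP.fullmatch(gap):
            return "Preposition"
        if gap == "," and types in FORMULAIC_TYPES:
            return "Formulaic"
    return None


# "county jail", "Chicago 's O'Hare airport", "workers in central Seoul", "Chicago , Illinois"
print(detect_structure(["NN", "NN"], (0, 1), (0, 2)))
print(detect_structure(["NNP", "POS", "NNP", "NN"], (0, 1), (0, 4)))
print(detect_structure(["NNS", "IN", "JJ", "NNP"], (0, 1), (3, 4)))
print(detect_structure(["NNP", ",", "NNP"], (0, 1), (2, 3), ("GPE", "GPE")))
```

A pair for which the function returns None would be treated as having no detected structure; longer formulaic spans such as [Aaron Brown] , ABC news , [New York] are deliberately out of scope for this sketch.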

Robustness of Patterns at Detecting Structures
[Table: Rec(%) of the detection patterns for the Premodifier, Possessive, Preposition, and Formulaic structures; values as extracted from the slide: 87, 80, 88, 90, 95, 20, 62, 86]

Contributions of our Work
  • Syntactico-semantic structures:
      • Annotated in ACE, but not used by prior work. We suggest how to use them and highlight that they can improve RE performance

  • When one does not have a large number of training examples, exploiting background knowledge such as these (simple) structures is critical to RE performance

  • We show that in both settings (gold vs predicted mentions), exploiting structures helps RE
      • We reduce pipeline error by also considering mention pairs where one of the mentions is predicted as null, so long as we think the mention pair exhibits a valid structure

Training: the usual approach
  • Relation training examples: pairs of gold mentions (A, B)
  • Only about 7% of these exhibit relations
  • Standard training on these pairs produces RE

Training: with Structures
  • Relation training examples: pairs of gold mentions (A, B); only about 7% of these exhibit relations
  • Standard training on all pairs produces RE
  • Apply patterns: Premod? Poss? Prep? Formulaic?
      • YES: train RE (withStruct)
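The training flow on this slide can be sketched as a simple filter over the gold mention pairs. This is a minimal sketch under assumptions: `split_training_pairs` and the `has_structure` callable are hypothetical names, and the actual classifiers and features are not shown.

```python
def split_training_pairs(pairs, has_structure):
    """Split gold mention pairs into the two training sets described on the slide.

    pairs         : iterable of (mention_a, mention_b, relation_label)
    has_structure : hypothetical callable (mention_a, mention_b) -> bool that
                    applies the Premod / Poss / Prep / Formulaic patterns
    """
    all_examples = list(pairs)  # standard training: every pair trains RE
    # Only pairs on which some pattern fires train RE (withStruct).
    struct_examples = [p for p in all_examples if has_structure(p[0], p[1])]
    return all_examples, struct_examples


# Toy usage with a stand-in pattern detector.
toy_pairs = [("county", "jail", "Located-at"), ("jail", "Seoul", "null")]
fires = lambda a, b: (a, b) == ("county", "jail")
all_ex, struct_ex = split_training_pairs(toy_pairs, fires)
print(len(all_ex), len(struct_ex))
```

Because the patterns discard many unrelated pairs, the second training set would be expected to have a less skewed ratio of positive to null examples than the first.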

Testing: with Structures
  • Mention pairs: formed with predicted mentions (A, B)
  • Apply patterns: Premod? Poss? Prep? Formulaic?
      • NO: RE
      • YES: RE (withStruct)
  • If the patterns think that a mention pair is involved in some structure, it is likely that both mentions are valid mentions
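One possible reading of the test-time flow, combined with the earlier remark about pairs in which one mention is predicted as null, is sketched below. The function `route_pair`, its arguments, and the handling of null mentions in the NO branch are assumptions for illustration, not a description of the authors' system.

```python
NULL = "null"


def route_pair(pair, type_a, type_b, has_structure, re_base, re_with_struct):
    """Choose which classifier labels a predicted mention pair.

    type_a, type_b : predicted mention types; NULL means "not a mention"
    has_structure  : hypothetical callable pair -> bool (pattern detector)
    re_base        : stand-in for RE
    re_with_struct : stand-in for RE (withStruct)
    """
    if has_structure(pair):
        # A detected structure suggests both mentions are valid,
        # even if the mention typer predicted one of them as null.
        return re_with_struct(pair)
    if NULL in (type_a, type_b):
        # Assumption: without a structure, a null mention yields a null relation.
        return NULL
    return re_base(pair)


label = route_pair("p", "PER", NULL, lambda p: True,
                   lambda p: "base", lambda p: "struct")
print(label)
```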

Extraction Systems
  • Mention typing system: given a text span in context, predict the mention type
      • ACE-2004 defines 43 fine-grained mention types
      • Can predict null to indicate not a mention
  • Baseline relation typing system:
      • Most of our features are based on (Zhou et al., ACL-05; Chan and Roth, COLING-10)
      • Obtains state-of-the-art performance

Experimental Settings
  • ACE-2004
      • 6 (coarse) and 22 (fine) grained relations
      • Asymmetric relations (the direction between A and B matters)
  • Built three classifiers (given a mention pair, predict):
      • binary (2 labels)
      • coarse-grained (13 relation labels)
      • fine-grained (45 relation labels)
  • Used the nwire and bnews corpora:
      • 55K relation instances (mention pairs) with 4K (non-null)
      • 5-fold cross validation
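The label counts above are consistent with each asymmetric relation type contributing two directed labels, plus one null label. The slide does not spell this out, so the formula in the sketch below is an inference, and `num_labels` is a hypothetical helper.

```python
def num_labels(num_relation_types, asymmetric=True):
    """Number of classifier labels: directed relation labels plus one null label."""
    directions = 2 if asymmetric else 1
    return directions * num_relation_types + 1


print(num_labels(6))   # coarse-grained
print(num_labels(22))  # fine-grained
```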

Evaluation Settings
  • Adopted the experimental settings in (Chan and Roth, COLING-10)
  • Prior work: train-test data splits at mention level; evaluation at mention level
  • Our work: train-test data splits at document level; evaluation at entity level (more realistic)
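A document-level split keeps every mention pair from one document in the same fold, so no document contributes to both training and testing. The sketch below shows one simple way to build such folds; the function name `document_level_folds` and the round-robin assignment are assumptions, not the authors' procedure.

```python
from collections import defaultdict


def document_level_folds(instances, k=5):
    """Group instances by document, then assign whole documents to k folds.

    instances : list of (doc_id, example) tuples
    Returns a list of k lists of examples.
    """
    by_doc = defaultdict(list)
    for doc_id, example in instances:
        by_doc[doc_id].append(example)
    folds = [[] for _ in range(k)]
    # Round-robin over sorted document ids keeps the split deterministic.
    for i, doc_id in enumerate(sorted(by_doc)):
        folds[i % k].extend(by_doc[doc_id])
    return folds


toy = [("doc_a", 1), ("doc_b", 2), ("doc_a", 3), ("doc_c", 4)]
print(document_level_folds(toy, k=2))
```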

Evaluation Settings
  • Evaluate our performance at the entity level
      • Prior work calculated RE performance at the level of mentions
      • ACE annotators rarely duplicate a relation link for coreferent mentions
        [Diagram: mentions … mi … mj … mk …, with one link labeled r and another labeled "null?"]
      • Both the usual and our scoring method give very similar RE results
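One way to score at the entity level is to map each mention to its entity before comparing predicted and gold relations, so that a relation predicted on a coreferent mention still counts. The sketch below assumes a hypothetical `entity_of` mapping and a toy case in which mj and mk are coreferent; the slide does not say which mentions corefer, and this is not the authors' scorer.

```python
def entity_level_f1(gold, predicted, entity_of):
    """F1 over relations after lifting mention pairs to entity pairs.

    gold, predicted : sets of (mention_a, mention_b, label)
    entity_of       : dict mapping each mention to an entity id
    """
    def lift(relations):
        return {(entity_of[a], entity_of[b], label) for a, b, label in relations}

    gold_e, pred_e = lift(gold), lift(predicted)
    true_pos = len(gold_e & pred_e)
    precision = true_pos / len(pred_e) if pred_e else 0.0
    recall = true_pos / len(gold_e) if gold_e else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy case: mj and mk refer to the same entity E2.
entity_of = {"mi": "E1", "mj": "E2", "mk": "E2"}
gold = {("mi", "mj", "r")}
predicted = {("mi", "mk", "r")}
print(entity_level_f1(gold, predicted, entity_of))
```

At the mention level the prediction in this toy case would not match the gold link, whereas after lifting to entities the two coincide.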

[Figure: Improvement in RE F1(%) with patterns using gold mentions. Series: Binary, Coarse, Fine. x-axis: proportion (%) of data used for training (80, 70, 60, 50, 40, 30, 20, 10, 5); y-axis: F1(%) improvement (0 to 8).]

Observations (gold mentions):
  • Using structures is especially beneficial when there is little training data
  • Binary performance benefits the most (which aligns with intuition), and this propagates to coarse and fine grained predictions

[Figure: Improvement in RE F1(%) with patterns using predicted mentions. Series: Binary, Coarse, Fine. x-axis: proportion (%) of data used for training (80, 70, 60, 50, 40, 30, 20, 10, 5); y-axis: F1(%) improvement (0 to 8).]

Conclusion
  • Proposed a novel algorithmic approach to RE by exploiting syntactico-semantic structures
  • These structures can be robustly identified using regular expressions
  • This simple approach improves RE performance, which is especially critical when little training data is available
  • Possible directions for future work:
      • Probably many near misses when applying structure patterns on predicted mentions