Patstat beyond Europe
By Gianluca Tarasconi
Madrid, 9/12/2010
An insight into Patstat data from patent authorities other than EPO

What is PATSTAT
PATSTAT stands for EPO Worldwide Patent Statistical Database. It contains a snapshot of the EPO master documentation database (DOCDB), which holds data from about 90 national and international patent offices with differing degrees of coverage. The data include bibliographic data, citations and family links. The database is designed for statistical research and requires the data to be loaded into the customer's own database.
http://www.epo.org/patents/patent-information/raw-data/test/product-1424.html
http://forums.epo.org/epo-patstat-faqs/

Non EPO data vs APE-INV Name Game
Data from other patent authorities may help to:
o Validate algorithms against other spellings/conventions;
o Fill in missing data or correct existing data (e.g. address/city) using data from equivalents;
o Use patent family (1) data to improve algorithms, using the other family members' data to give a similarity score.
(1) For a list of patent family definitions see: C. Martinez, Insight into Different Types of Patent Families, STI Working Paper 2010/2

Example (I): inpadoc family # 75, Mr Roberts
PUBLN_AUTH | PUBLN_NR | INVT_SEQ_NR | CTRY_CODE | LAST_NAME, FIRST_NAME, ADDRESS, CITY (as recorded)
BG | 98254 | 2 | GB | ROBERTS, TONY G.
DK | 0517145 | 2 | GB | ROBERTS, TONY GORDON
EP | 0517145 | 1 | GB | Roberts, Tony Gordon, Glaxo Group Research Limited Park Road, Ware Hertfordshire, SG12 0DG
IE | 921780 | 2 | | TONY GORDON ROBERTS
RU | 2102393 | 5 | | TONI GORDON ROBERTS
US | 5905082 | 2 | GB | Roberts / Tony Gordon / Ware
WO | 9221676 | 2 | GB | ROBERTS, TONY, GORDON / GLAXO GROUP RESEARCH LIMITED; PARK ROAD; WARE HERTFORDSHIRE SG12 0DG
6 different spellings for the name, 3 different addresses.
In this case name and city are better parsed in the US equivalent patent data.
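One way to compare such spellings is to reduce each name to a set of lower-case tokens and score the overlap. The sketch below is only an illustration under that assumption: the helper names `name_tokens` and `name_similarity` are hypothetical and are not part of Patstat or of the APE-INV algorithms, and the scoring uses Python's standard `difflib`.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def name_tokens(raw: str) -> frozenset:
    """Strip accents and punctuation, lower-case, and return the set of name tokens."""
    ascii_text = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    return frozenset(re.findall(r"[a-z]+", ascii_text.lower()))


def name_similarity(a: str, b: str) -> float:
    """Order-insensitive similarity in [0, 1] between two raw name strings."""
    left = " ".join(sorted(name_tokens(a)))
    right = " ".join(sorted(name_tokens(b)))
    return SequenceMatcher(None, left, right).ratio()


# Spellings taken from the family # 75 example.
variants = [
    "ROBERTS, TONY G.",
    "ROBERTS, TONY GORDON",
    "TONY GORDON ROBERTS",
    "TONI GORDON ROBERTS",
    "ROBERTS, TONY, GORDON",
]

reference = "Roberts, Tony Gordon"
for v in variants:
    print(f"{v!r}: {name_similarity(reference, v):.2f}")
```

Sorting the tokens makes "TONY GORDON ROBERTS" and "ROBERTS, TONY GORDON" identical, while the transliteration "TONI" still scores high but below 1.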

Example (II): inpadoc family # 88, Mr Newman
PUBLN_AUTH | PUBLN_NR | INVT_SEQ_NR | CTRY_CODE | LAST_NAME | ADDRESS | CITY
EP | 0605442 | 1 | US | NEWMAN, Roland, A. | 43111 Robbins Street San Diego, CA 92122 |
EP | 0854885 | 2 | US | NEWMAN, Roland, A. | 4311 Robbins Street San Diego, CA 92122 |
WO | 9302108 | 1 | US | NEWMAN, ROLAND, A. | 43111 ROBBINS STREET; SAN DIEGO, CA 92122 |
US | 6136310 | 2 | US | Newman, Roland Anthony | | San Diego
The WO patent data confirm that the correct address is 43111 Robbins Street.
The US patent tells us that A. stands for Anthony.
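The address check in this example amounts to a vote among family members. A minimal sketch, assuming addresses are compared after a simple normalisation; the function names `street_key` and `consensus_address` are hypothetical and the normalisation rule is an assumption, not a Patstat convention.

```python
import re
from collections import Counter


def street_key(address: str) -> str:
    """Normalise an address to upper-case alphanumeric tokens joined by single spaces."""
    return " ".join(re.findall(r"[A-Z0-9]+", address.upper()))


def consensus_address(addresses):
    """Return (normalised address, votes) for the most frequent non-empty address."""
    counts = Counter(street_key(a) for a in addresses if a)
    return counts.most_common(1)[0]


# Address strings from the family # 88 example (two EP records, one WO record).
family_88 = [
    "43111 Robbins Street San Diego, CA 92122",
    "4311 Robbins Street San Diego, CA 92122",
    "43111 ROBBINS STREET; SAN DIEGO, CA 92122",
]

print(consensus_address(family_88))
```

With these three records the "43111" spelling wins two votes to one. The function expects at least one non-empty address.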

What countries (I)
o Patstat contains 92 application authorities;
o 45 are inside Europe; 47 are outside Europe;
o It contains regional/international authorities (WIPO, ARIPO…);
o It also contains 'terminated' authorities (GDR, USSR).

What countries (II)
1 Albania (AL); 2 ARIPO (AP); 3 Argentina (AR); 4 Austria (AT)
5 Australia (AU); 6 Bosnia and Herzegovina (BA); 7 Belgium (BE); 8 Bulgaria (BG)
9 Brazil (BR); 10 Canada (CA); 11 Switzerland (CH); 12 Chile (CL)
13 China (CN); 14 Costa Rica (CR); 15 Czechoslovakia (CS); 16 Cuba (CU)
17 Cyprus (CY); 18 Czech Republic (CZ); 19 German Democratic Republic (DD); 20 Germany (DE)
21 Denmark (DK); 22 Algeria (DZ); 23 Eurasia (EA); 24 Ecuador (EC)
25 Estonia (EE); 26 Egypt (EG); 27 European Patent Office (EP); 28 Spain (ES)
29 Finland (FI); 30 France (FR); 31 Great Britain (GB); 32 Gulf Cooperation Council (GC)
33 Georgia (GE); 34 Greece (GR); 35 Hong Kong S.A.R. (HK); 36 Croatia (HR)
37 Hungary (HU); 38 Indonesia (ID); 39 Ireland (IE); 40 Israel (IL)
41 India (IN); 42 Iceland (IS); 43 Italy (IT); 44 Japan (JP)
45 Kenya (KE); 46 Korea (South) (KR); 47 Liechtenstein (LI); 48 Lithuania (LT)
49 Luxembourg (LU); 50 Latvia (LV); 51 Morocco (MA); 52 Monaco (MC)
53 Moldova (MD); 54 Republic of Montenegro (ME); 55 Former Yugoslav Republic of Macedonia (MK); 56 Mongolia (MN)
57 Malta (MT); 58 Malawi (MW); 59 Mexico (MX); 60 Malaysia (MY)
61 Nicaragua (NI); 62 Netherlands (NL); 63 Norway (NO); 64 New Zealand (NZ)
65 OAPI (OA); 66 Panama (PA); 67 Peru (PE); 68 The Philippines (PH)
69 Poland (PL); 70 Portugal (PT); 71 Romania (RO); 72 Republic of Serbia (RS)
73 Russia (RU); 74 Sweden (SE); 75 Singapore (SG); 76 Slovenia (SI)
77 Slovakia (SK); 78 San Marino (SM); 79 Soviet Union (SU); 80 El Salvador (SV)
81 Tajikistan (TJ); 82 Turkey (TR); 83 Taiwan (TW); 84 Ukraine (UA)
85 United States of America (US); 86 Uruguay (UY); 87 Viet Nam (VN); 88 World Intellectual Property Organization (WO)
89 Former Serbia and Montenegro (YU); 90 South Africa (ZA); 91 Zambia (ZM); 92 Zimbabwe (ZW)
(last updated 19.4.2010)

What dimensions are relevant
A) Data coverage (% of coverage by year): are data from patent authority X 100% included in Patstat from year W to year Z?
B) Data transmission delays: how long does it take a non-EPO patent to reach PATSTAT?
C) Completeness of geographic data: what are the quality and coverage of address / city / country code?

Data coverage (I)
EPO gives partial information:
http://www.epo.org/patents/patent-information/data-quality.html
http://www.epo.org/patents/patent-information/raw-data/useful-tables.html
The total number of applications is given, but not the % of the total (EPO gives what it gets).

Data coverage (II): example on India
CC: IN
Authority: India
DATE: 02/08/1975 to 11/05/2007
NUMBERS: 137485 to 203704
Kind of data: Patent
DOCDB Kind Group: P
KIND CODE: A1, E
Last input week: 2005/52
In Patstat, 66219 Indian applications are reported from EPO.
The Indian Patent Office reports 28,882 applications filed for 2006 alone.
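The figure of 66219 matches the difference between the two NUMBERS values, and setting it against the 2006 filings gives a feel for the size of the gap. The arithmetic can be sketched as follows; the function name is hypothetical, and treating the number range as a record count is an assumption drawn from that match.

```python
def implied_record_count(first_number: int, last_number: int) -> int:
    """Size of the number range reported for an authority (assumed to be a record count)."""
    return last_number - first_number


IN_FIRST, IN_LAST = 137485, 203704   # NUMBERS range for India
IN_FILED_2006 = 28882                # filings reported by the Indian Patent Office for 2006

reported = implied_record_count(IN_FIRST, IN_LAST)
ratio = reported / IN_FILED_2006

print(reported, round(ratio, 2))  # 66219 2.29
```

In other words, the whole 1975 to 2007 range holds roughly as many records as 2.3 years of filings at the 2006 rate.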

Data Transmission delays (I)
We study time series 2003 - 2008 for BR, CN, JP, DE, KR and IN compared to EP.
Differences in the graph suggest that publication lags and data transmission lags differ from country to country.
Time series may also highlight 'holes' or changes of population (e.g. USPTO from 2000 onward).
Year | BR | CN | DE | EP | IN | JP | KR
2003 | 20878 | 205557 | 134623 | 137230 | 1047 | 432789 | 108922
2004 | 22811 | 235189 | 111554 | 145312 | 1115 | 443034 | 129515
2005 | 23922 | 287662 | 105002 | 154398 | 1687 | 447845 | 160590
2006 | 13414 | 341493 | 95404 | 160288 | 1966 | 428966 | 183037
2007 | 9197 | 382948 | 83663 | 160275 | 2195 | 405234 | 187712
2008 | 7340 | 404476 | 73819 | 139610 | 2493 | 356748 | 175785
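A simple way to look for such 'holes' is to compute year-over-year changes and flag sharp drops. The sketch below uses the BR, EP, IN, JP and KR series from the slide; the function names and the -25% threshold are illustrative assumptions, not part of the original analysis.

```python
YEARS = [2003, 2004, 2005, 2006, 2007, 2008]

# Series copied from the slide (a subset of the authorities shown).
COUNTS = {
    "BR": [20878, 22811, 23922, 13414, 9197, 7340],
    "EP": [137230, 145312, 154398, 160288, 160275, 139610],
    "IN": [1047, 1115, 1687, 1966, 2195, 2493],
    "JP": [432789, 443034, 447845, 428966, 405234, 356748],
    "KR": [108922, 129515, 160590, 183037, 187712, 175785],
}


def yoy_changes(series):
    """Relative change between consecutive years, e.g. -0.44 for a 44% drop."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def flag_drops(years, series, threshold=-0.25):
    """Years whose count fell by at least |threshold| relative to the previous year."""
    return [y for y, c in zip(years[1:], yoy_changes(series)) if c <= threshold]


for auth, series in COUNTS.items():
    print(auth, flag_drops(YEARS, series))
```

With this threshold only BR is flagged (2006 and 2007), while the late decline in EP and JP stays below 25% per year. A flagged year points to a possible lag or gap; it does not by itself distinguish a transmission delay from a real fall in filings.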

Data Transmission delays (II)

Completeness of geographic data
Table for the top 20 authorities by inventor count (data from Patstat 09/2009).
APPLN_AUTH | inventors
US | 5960856
EP | 3705123
DE | 2750079
JP | 1798271
CN | 1537587
CA | 1120490
AU | 1087573
SU | 968915
AT | 653048
KR | 637296
FR | 565254
GB | 531087
RU | 394691
CH | 338739
BR | 292047
SE | 256248
FI | 212722
IT | 192460
ES | 133471
DD | 129845
Percentage columns (no state, no zip, no country, no address, no city), values in the order they appear on the slide: 86% 100% 100% 100% 100% 100% 98% 100% 100% 100% 100% 100% 21% 0% 33% 98% 2% 45% 98% 41% 29% 14% 98% 70% 29% 11% 89% 85% 11% 74% 17% 7% 97% 1% 100% 99% 100% 100% 99% 65% 100% 98% 43% 100% 97% 25% 1% 100% 100% 100% 100% 100%
o 13 authorities have more than 80% of records with no country code;
o 12 authorities have 0% of address/city;
o In many cases, however, address data are inside the first name field (e.g. DE).

Conclusions
o Non-EPO data have coverage, quality and 'spelling' that may vary a lot from patent authority to patent authority;
o The data can be used as an additional source of information but not as the main source (BONUS not MALUS);
o EPO could probably improve the quality of these data, especially by adding more addresses (e.g. WO address data will be released in April 2011); it is up to users to demand more on this topic.