  • Number of slides: 34

Automated Digital Libraries
William Y. Arms
Department of Computer Science
Cornell University

Two Questions

Before Digital Libraries
Access to scientific, medical, legal information
In the United States:
-- excellent if you belonged to a rich organization (e.g., a major university)
-- very poor otherwise
In many countries of the world:
-- very poor for everybody

Question 1
Must access to scientific and professional information be expensive?

Research Libraries are Expensive
-- library materials
-- buildings & facilities
-- staff

The Potential of Digital Libraries
open access ? materials computers & networks staff

Question 2
How effectively can computers be used for the skilled tasks of professional librarianship?
-- Time horizon: 5 to 20 years
-- All materials in digital form

Automated Library Services

Skilled Librarianship
People are skilled at judgment, understanding, discrimination, etc.:
-- selection
-- cataloguing, indexing
-- seeking information
-- evaluating information
-- reference service
Can computers provide equivalent services?

Equivalent Services
Example: Cataloguing rules
-- Application of cataloguing rules to monographs is skilled
-- It is hard to imagine a computer system with these skills
but...
-- Catalogs and cataloguing rules are the means, not the end

Equivalent Services
Information discovery
Why are web search services the most widely used information discovery tools in universities today?

Conventional Criteria
Web search services have many weaknesses
-- selection is arbitrary
-- index records are crude
-- no authority control
-- duplicate detection is weak
-- search precision is deplorable
yet they clearly satisfy important requirements...

Effectiveness of Web Search
Inspec v. Google
Google is usually superior for general computing and computer science questions
> Broader coverage
> Adequate indexing records
> Better ranking

Simple Algorithms + Immense Computing Power

History: Licklider
J. C. R. Licklider, Libraries of the Future, 1965
-- envisaged digital libraries for scientists at their place of work
-- listed desiderata for a digital library
-- studied construction of fully automated digital libraries
-- put emphasis on artificial intelligence and natural language processing

History: Licklider
Licklider's predictions for digital libraries were remarkably good, but...
-- overoptimistic about progress in artificial intelligence
-- underestimated what can be done by brute force computing

Brute Force Computing
Few people can appreciate the power of Moore's Law
-- Computing power doubles every 18 months
-- Increases 100 times in 10 years
-- Increases 10,000 times in 20 years
Simple algorithms + immense computing power may outperform human intelligence
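
The Moore's Law figures above can be checked with a short calculation. This is a minimal sketch that assumes a clean doubling every 18 months with no interruptions; the function name growth_factor is illustrative and does not come from the slides.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return the multiplicative increase after `years` of steady doubling."""
    doublings = years * 12.0 / doubling_months  # number of doubling periods elapsed
    return 2.0 ** doublings


# Compare against the rounded figures quoted on the slide.
for years in (10, 20):
    print(f"{years} years: about {growth_factor(years):,.0f} times")
```

Under this assumption the results come out slightly above 100 and slightly above 10,000, so the slide's figures are sensible roundings.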

Brute Force Computing
Example
Creators of the world champion chess program (Deep Thought, later Deep Blue)
-- moderate chess players
-- simple tree-search algorithm
-- very, very fast computer hardware
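
To illustrate what a simple tree-search algorithm looks like, here is a minimal sketch of exhaustive game-tree search (negamax) on a toy game. It is not the Deep Blue algorithm: the game (players alternately take 1 to 3 stones, and whoever takes the last stone wins) and the names best_outcome and best_move are assumptions chosen only to keep the example small.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # a player may remove 1, 2 or 3 stones per turn


@lru_cache(maxsize=None)
def best_outcome(stones: int) -> int:
    """Return +1 if the player to move can force a win, otherwise -1."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1
    # Try every legal move; the opponent's best result is our worst (negamax).
    return max(-best_outcome(stones - take) for take in MOVES if take <= stones)


def best_move(stones: int) -> int:
    """Return the number of stones to take that gives the best forced result."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -best_outcome(stones - take))


print(best_outcome(4))  # player to move with 4 stones
print(best_move(7))     # suggested move with 7 stones
```

The search contains no knowledge of good play; it simply examines every line of play, which is why faster hardware translates directly into stronger results.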

An Anecdote
The question (Marvin Minsky)
-- How would you design a computer system that can answer questions such as, "Why was the space station a bad idea?"
The answer (Danny Hillis)
-- Design much more powerful computers!

Examples of Automated Digital Library Services

Web Search
Brute force indexing and retrieval
-- retrieve every page on the web
-- index every word
-- repeat every month
Getting better all the time
-- improved algorithms
-- faster computers and networks
-- analysis of users
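
The "index every word" step can be sketched as an inverted index, a mapping from each word to the pages that contain it. This is a minimal illustration, not a description of any real search engine: the example.org URLs, the page texts and the function names build_index and search are all hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs in which it appears."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # keep only pages that match all words
    return result


# Hypothetical pages standing in for a crawl of the web.
pages = {
    "http://example.org/a": "Automated digital libraries",
    "http://example.org/b": "Digital video indexing",
    "http://example.org/c": "Research libraries are expensive",
}
index = build_index(pages)
print(search(index, "digital libraries"))
```

Rebuilding the index from scratch on every crawl is wasteful but simple, which matches the brute-force approach the slide describes.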

Web Search
Ranking algorithms
Closeness of match
-- vector space and statistical methods (Salton, et al., c. 1970)
Importance of digital object
-- Google ranks web pages by how many other pages link to them, and gives greater weight to links from higher-ranking pages.
(NSF/DARPA/NASA Digital Libraries Initiative)
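
The idea of ranking by inbound links, with more weight for links from higher-ranking pages, can be sketched as a simplified PageRank-style iteration. This is an illustration under stated assumptions, not Google's production method: the damping value of 0.85, the fixed iteration count, the treatment of pages without outbound links and the three-page link graph are all choices made for this example.

```python
def rank_pages(links: dict[str, list[str]], damping: float = 0.85,
               iterations: int = 50) -> dict[str, float]:
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its current score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a and b both link to c, and c links back to a.
ranks = rank_pages({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this graph c ranks highest because two pages link to it, a comes second because its single inbound link is from the high-ranking c, and b, with no inbound links, comes last.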

Archiving and Preservation
Internet Archive
-- Monthly, a web crawler gathers every open access web page with associated images
-- Web pages are preserved for future generations
-- Files are available for scholarly research
not perfect...
-- HTML pages, images; no Java applets, style sheets
-- materials are dumped with no organization or indexing
-- access for scholars is rudimentary

Reference Linking
Web of Science (ISI)
-- input: combination of automatic means, skilled people
-- limited number of journals
-- very expensive
ResearchIndex (a.k.a. CiteSeer, a.k.a. ScienceIndex) (NEC)
-- fully automatic
-- all open access material in computer science
-- a free service

Beyond Text
Informedia (Carnegie Mellon)
Automatic processing of segments of video, e.g., television news. Algorithms for:
-- dividing raw video into discrete items
-- generating short summaries
-- indexing the sound track using speech recognition
-- recognizing faces
-- searching using natural language processing
(NSF/DARPA/NASA Digital Libraries Initiative)

Costs and Benefits

Costs of Catalogs and Indexes
Catalog, index and abstracting records are very expensive when created by skilled professionals
-- only available for certain categories of material (e.g., monographs, scientific journals)
-- contain limited fields of information (e.g., no contents page)
-- restricted to static information
High costs reduce effectiveness and access

Costs of Automated Digital Libraries
The Google company
-- 5.5 million searches daily
-- 85 people (half technical, 14 with Ph.D. in computing)
-- 2,500 PCs running Linux, with 80 terabytes of disk
The Internet Archive
-- 7 people with support from Alexa
(March 2000)

Overall
If you are rich...
-- Research libraries, using commercial information services, provide excellent service at very high cost to a favored few
-- Automated digital libraries are a long way from providing the personal reference service available to a faculty member at a well-endowed university
but...

The Model T Library
The Model T Ford, with mass production, brought car travel to the masses...
-- Automated digital libraries, with open access materials, can already provide good service at low cost
-- In the future, automated digital libraries can bring scientific, scholarly, medical and legal information to everybody at negligible cost

A Footnote

Library Expertise
The future of scientific and professional information is tied to computing, but...
-- automated digital libraries need small teams of highly skilled people
-- development of automated digital libraries is bypassing libraries (Google, ResearchIndex, Informedia, Internet Archive)
The level of computing expertise in U.S. research libraries is depressingly low

Further reading
William Y. Arms, "Automated digital libraries." To be submitted to D-Lib Magazine, July/August 2000.
William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm

Automated Digital Libraries
William Y. Arms
Department of Computer Science
Cornell University