cfb4192f132c98e6ed5530d6dc883032.ppt
- Number of slides: 19
Small-world connectors across academic web spaces [slide 1]
Lennart Björneborn
Royal School of Library and Information Science, Copenhagen
lb@db.dk
AoIR-ASIST Workshop on Web Science Research Methods
Association of Internet Researchers Conference, Brighton, UK
19 September 2004
[Image: M. C. Escher, House of Stairs, 1951]
Wood et al. (1995) WWW = document network = collaborative weaving [slide 2]
web characteristics [slide 3]
- www = new type of document system = no central control / coordination = bottom-up construction
- www = distributed knowledge organisation = ‘3D’ = distributed + diversified + dynamic
- www = individual input in collective medium = collaborative weaving
- www = self-organized macro-level aggregations (clusters) of micro-level interactions
- www = local actions → global consequences (e.g. small-world phenomena)
small-world networks [slide 4]
Watts & Strogatz 1998
- small-world = highly clustered + short paths
  – short distances through shortcuts between nodes in network
  – small-world = short local + short global distances
  – efficient diffusion of signals, contacts, ideas, viruses, etc. in networks
- social network analysis in 1960s: ‘six degrees of separation’
  – today: ‘small worlds’ in biological, chemical, technical, social networks
  – brains, ecological food webs, scientific collaboration networks, etc.
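The two ingredients named on this slide, high clustering and short paths, can be measured on a toy graph. The following is a minimal sketch, not the method used in the study: the ring lattice, its size (20 nodes) and the two shortcut links are illustrative assumptions, and the function names are hypothetical.

```python
from collections import deque

def ring_lattice(n, k):
    """Undirected ring where each node links to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient (nodes with fewer than 2 neighbours count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        ordered = sorted(nbrs)
        # Count links that exist among the neighbours of this node.
        linked = sum(1 for i in range(k) for j in range(i + 1, k)
                     if ordered[j] in adj[ordered[i]])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean BFS distance over all ordered pairs (s, t) where t is reachable from s."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Illustrative data (assumption): a regular lattice, then the same lattice with two shortcuts.
lattice = ring_lattice(20, 2)
with_shortcuts = ring_lattice(20, 2)
for a, b in [(0, 10), (5, 15)]:
    with_shortcuts[a].add(b)
    with_shortcuts[b].add(a)

print(average_clustering(lattice), average_path_length(lattice))
print(average_clustering(with_shortcuts), average_path_length(with_shortcuts))
```

In this toy case the shortcuts lower the average path length while the clustering stays close to that of the lattice, which is the small-world pattern described above.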
scale-free link distribution [slide 5]
- power law: number of in-neighbors per subsite
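The quantity behind this distribution is the number of distinct in-neighbors each subsite has. A minimal sketch of how it could be tabulated from a list of (source, target) subsite links follows; the link list and subsite names are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def in_neighbor_counts(links):
    """Number of distinct in-neighbours per target subsite (selflinks ignored)."""
    sources = {}
    for src, dst in links:
        if src != dst:
            sources.setdefault(dst, set()).add(src)
    return {site: len(found) for site, found in sources.items()}

def degree_distribution(counts):
    """Map each in-neighbour count to the number of subsites that have it."""
    return dict(sorted(Counter(counts.values()).items()))

# Hypothetical subsite links (assumption, not the UK data set).
links = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"),
         ("a", "b"), ("c", "b"), ("hub", "e"), ("a", "a"), ("a", "hub")]

counts = in_neighbor_counts(links)
print(degree_distribution(counts))
```

On real data, a power law would show up as a roughly straight line when this distribution is plotted on log-log axes.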
research motivation [slide 6]
- distributed knowledge organization → small-world structures → exploratory capabilities (accessibility + navigability)
  – core issues in LIS (library and information science)
  – short link paths → human web surfers + digital web crawlers can reach and retrieve web pages
- what micro-level web activities contribute to small-world link structures?
  – how do academic link creators actually connect documents, topics, genres, and sites across the Web?
main research question [slide 7]
- what types of web links, web pages and web sites function as cross-topic connectors in small-world link structures across an academic web space?
informetrics, bibliometrics, scientometrics, cybermetrics, webometrics [slide 8]
- webometrics: the study of quantitative aspects of the construction and use of information resources, structures and technologies on the Web, drawing on bibliometric and informetric approaches
© Björneborn 2004
basic link terminology [slide 9]
- B has an inlink from A: ~ citation
- B has an outlink to C: ~ reference
- B has a selflink: ~ self-citation
- E and F are reciprocally linked
- H is reachable from A by a link path
- A has a transversal link to G: shortcut
- C and D have co-inlinks from B: ~ co-citation
- B and E have co-outlinks to D: ~ bibliographic coupling
[Figure: link diagram with nodes A to H illustrating the terms above, including co-links]
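The terms above map onto simple set operations over a list of directed links. The sketch below uses an edge list chosen to be consistent with the statements on this slide; the original diagram is not recoverable, so the exact edges (in particular D to H, added only to make H reachable from A) are assumptions, as are the function names.

```python
# Hypothetical edge list consistent with the slide's statements (assumption).
links = {("A", "B"), ("B", "C"), ("B", "B"), ("B", "D"), ("E", "D"),
         ("E", "F"), ("F", "E"), ("A", "G"), ("D", "H")}

def inlinks(node):
    """Sources linking to node, selflinks excluded (~ citations)."""
    return {s for s, t in links if t == node and s != node}

def outlinks(node):
    """Targets node links to, selflinks excluded (~ references)."""
    return {t for s, t in links if s == node and t != node}

def has_selflink(node):
    return (node, node) in links

def reciprocal(x, y):
    return (x, y) in links and (y, x) in links

def co_inlinks(x, y):
    """Sources that link to both x and y (~ co-citation)."""
    return inlinks(x) & inlinks(y)

def co_outlinks(x, y):
    """Targets that both x and y link to (~ bibliographic coupling)."""
    return outlinks(x) & outlinks(y)

def reachable(src, dst):
    """True if a directed link path leads from src to dst."""
    seen, stack = {src}, [src]
    while stack:
        for nxt in outlinks(stack.pop()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

print(inlinks("B"), outlinks("B"), co_inlinks("C", "D"), co_outlinks("B", "E"))
```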
UK link data 2001 [slide 10]
- 109 UK universities
- 7669 subsites
  – www.hum.port.ac.uk
  – www.atm.ox.ac.uk
  – ...
- 3.4 million web pages
- 39.3 million page outlinks
  – 34.4 million site selflinks
  – 4.9 million site outlinks
- delimited data set
  – 105 817 web pages
  – 207 865 links between 7669 subsites
5-step methodology [slide 11]
A. Graph model of 7669 UK academic subsites;
B. 189 random subsites in SCC (Strongest Connected Component);
C. 10 path nets with all shortest paths between five pairs of topically dissimilar SCC subsites;
D. Source and target pages along shortest link paths in 10 path nets;
E. Links, pages and subsites providing transversal (cross-topic) connections in 10 path nets.
corona model / bow-tie model (Broder et al. 2000) [slide 12]
- SCC (Strongest Connected Component): 1893
- IN (traversable to SCC): 626
- OUT (reachable from SCC): 2660
- IN-Tendrils (connected from IN): 96
- OUT-Tendrils (connected to OUT): 55
- Tube (connecting IN to OUT): 7
- Disconnected: 2332
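The core of the bow-tie split (SCC, IN, OUT) can be sketched with two reachability searches from a node known to sit in the central component. This is an illustrative sketch under assumptions, not the procedure used for the figures above: the seed node, the toy link list and the function names are hypothetical, and tendrils, tubes and disconnected nodes are lumped together as "OTHER".

```python
from collections import deque

def reach(adj, start):
    """All nodes reachable from start (start included) by following adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(links, seed):
    """Split nodes into SCC / IN / OUT / OTHER relative to the component containing seed."""
    forward, backward, nodes = {}, {}, set()
    for s, t in links:
        forward.setdefault(s, set()).add(t)
        backward.setdefault(t, set()).add(s)
        nodes.update((s, t))
    downstream = reach(forward, seed)   # nodes the seed can reach
    upstream = reach(backward, seed)    # nodes that can reach the seed
    scc = downstream & upstream
    return {"SCC": scc,
            "IN": upstream - scc,
            "OUT": downstream - scc,
            "OTHER": nodes - downstream - upstream}

# Hypothetical toy links (assumption).
links = [("i1", "s1"), ("s1", "s2"), ("s2", "s1"), ("s2", "o1"),
         ("i1", "tend"), ("x", "y")]
parts = bow_tie(links, "s1")
print(parts)
```

Here the node "tend" hangs off IN without leading to the SCC, so it plays the role of an IN-tendril and ends up in "OTHER".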
shortest link path [slide 13]
[Figure: example shortest link path; subsite labels: cajun.cs.nott.ac.uk, ashmol.ox.ac.uk, collections.ucl.ac.uk, ukoln.bath.ac.uk, ercoftac.mech.surrey.ac.uk, vlmp.museophile.sbu.ac.uk, cfd.me.umist.ac.uk, cs.man.ac.uk]
path net = ‘mini’ small world [slide 14]
path net = all shortest link paths between two given nodes (subsites)
[Figure: path net with a transversal link marked]
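A path net as defined here, all shortest link paths between two given nodes, can be extracted with two breadth-first searches: one forward from the source and one backward from the target. A link (s, t) lies on some shortest path exactly when dist(source, s) + 1 + dist(t, target) equals the shortest distance. The sketch below assumes a plain list of directed subsite links; the toy data and function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, start):
    """Shortest hop counts from start to every node reachable via adj."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_net(links, source, target):
    """Nodes and links lying on at least one shortest path from source to target."""
    forward, backward = {}, {}
    for s, t in links:
        forward.setdefault(s, set()).add(t)
        backward.setdefault(t, set()).add(s)
    from_source = bfs_distances(forward, source)
    to_target = bfs_distances(backward, target)
    if target not in from_source:
        return set(), set()          # no link path at all
    shortest = from_source[target]
    edges = {(s, t) for s, t in links
             if s in from_source and t in to_target
             and from_source[s] + 1 + to_target[t] == shortest}
    nodes = {n for edge in edges for n in edge}
    return nodes, edges

# Hypothetical toy links (assumption): two shortest routes a->d plus a longer detour.
links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("a", "e"), ("e", "f"), ("f", "d"), ("d", "a")]
nodes_ad, edges_ad = path_net(links, "a", "d")
nodes_da, edges_da = path_net(links, "d", "a")
print(edges_ad, edges_da)
```

Because links are directed, the path net from a to d differs from the one from d to a, which is why each pair of subsites yields two path nets.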
10 path nets [slide 15]
- Faculty of Humanities and Social Sciences, Portsmouth: hum.port.ac.uk
- Atmospheric, Oceanic and Planetary Physics, Oxford: atm.ox.ac.uk
- Economics Dept, Southampton: economics.soton.ac.uk
- Chemistry Dept, Glasgow: chem.gla.ac.uk
- Psychology Dept, Manchester: psy.man.ac.uk
- Mathematics Dept, Glasgow Caledonian: maths.gcal.ac.uk
- Palaeontology Research Group, Earth Sciences Dept, Bristol: palaeo.gly.bris.ac.uk
- Ophthalmology Dept [eye research], Oxford: eye.ox.ac.uk
- Speech Research Group, Linguistics Dept, Essex: speech.essex.ac.uk
- Geography Dept, Plymouth: geog.plym.ac.uk
5 pairs of topically dissimilar subsites + both directions = 10 path nets with all shortest paths
indicative findings [slide 16]
- no generalizable findings: indicative only
  – national + sectoral + institutional delimitation = UK academic subsites
  – temporal delimitation = 2001 snapshot: does not cover dynamic changes
  – small stratified sample of 10 path nets
- may however be fruitful for future large-scale investigations
  – computer-science sites may be important transversal (cross-topic) connectors across academic web spaces
  – personal link creators may be important connectors across sites and topics in academic web spaces, especially personal link lists
  – over 80% of transversal links may be academic (research, teaching)
  – close relation: hubs / authorities and betweenness centrality
web of genres & genre drift [slide 17]
possible small-world implications/applications [slide 18]
- library and information science
  – also focus on distributed knowledge organization (www)
  – also focus on exploratory capabilities in distributed info. systems
    - convergent (goal-directed) and divergent (serendipitous) info. behavior
- web sociology / cyberscience
  – small-world links > cross-social / cross-domain weak ties
  – counteract balkanization into disconnected / unreachable insularities
  – small-world ‘gate-keepers’ with betweenness centrality in networks
  – tracking interdisciplinary boundary crossings
  – web mining of fertile areas for cross-disciplinary exploration and cross-pollination
- search engines
  – better coverage in web traversal + harvesting
  – zoomable maps of web clusters + small-world shortcuts
Five ‘laws’ of web connectivity [slide 19]
– Links are for use – the very essence of hypertext;
– Every surfer his or her link – the rich diversity of links across topics and genres;
– Every link its surfer – ditto;
– Save the time of the surfer – by visualizing web clusters and small-world shortcuts;
– The Web is a growing organism.
Inspired by Ranganathan (1931). The five laws of library science: “Books are for use. Every reader his or her book. Every book its reader. Save the time of the reader. The Library is a growing organism.”