2261fe941a1e0189eb091b5944a80421.ppt
- Number of slides: 40
Data Cloud Yury Lifshits Yahoo! Research http://yury.name
My Beliefs • The key challenge in web search is structured search (Part 1: What is structured search?) • The key challenge in structured search is collecting data (Part 2: Data distribution & idea of Data Cloud; Part 3: Demo: numeric data distribution) • The key challenge in collecting data is incentive design (Part 4: Economics of data distribution)
Structured Search
Data = data of entities + data of content • Entity unit (structured data): – Identifier – Metadata • Content unit (semi-structured data): – Body: text, video, audio, or image – Metadata • Metadata: – Explicit key-value pairs – Relational properties – Evaluation
Structured Search • Factoid search: "what's the value of property X of object Y" – Entity hubs – Domain hubs • Structured object search: "all concerts this weekend in SF under $20 sorted by popularity" – Time focus – Ranking focus – Relations focus • Structured content search: "all videos with Tom Brady", "all comments and blog posts about Bing"
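The structured object search query above can be read as a filter plus a sort over entity metadata. The following is only a minimal sketch of that reading: the `Entity` class, the sample records, and the field names (`type`, `city`, `price`, `popularity`) are hypothetical, and the "this weekend" time filter is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity unit: an identifier plus explicit key-value metadata."""
    identifier: str
    metadata: dict = field(default_factory=dict)


# Hypothetical sample data, invented for illustration only.
EVENTS = [
    Entity("concert-1", {"type": "concert", "city": "SF", "price": 15, "popularity": 80}),
    Entity("concert-2", {"type": "concert", "city": "SF", "price": 35, "popularity": 95}),
    Entity("concert-3", {"type": "concert", "city": "SF", "price": 10, "popularity": 90}),
    Entity("concert-4", {"type": "concert", "city": "NYC", "price": 12, "popularity": 99}),
]


def search(entities, kind, city, max_price, sort_key):
    """Keep entities matching the filters, then rank by sort_key, highest first."""
    hits = [
        e for e in entities
        if e.metadata.get("type") == kind
        and e.metadata.get("city") == city
        and e.metadata.get("price", float("inf")) < max_price
    ]
    return sorted(hits, key=lambda e: e.metadata[sort_key], reverse=True)


# "all concerts in SF under $20 sorted by popularity"
results = [e.identifier for e in search(EVENTS, "concert", "SF", 20, "popularity")]
print(results)
```

The point of the sketch is that none of the three steps (type filter, numeric filter, ranking) is possible without explicit metadata on each entity.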
Yury’s Wishlist Business-generated data • Products, services, news, wishlists, contact data Reality stream, sensors • What happened where Expert knowledge • Glossary, issues, typical solutions, object databases, related objects graph Events • Sport, concerts, education, corporate, community, private Market graph & signals • Like, interested, use, following, want to buy; votes and ratings
Search as a Platform [Diagram with labels: Query analysis; Classic search; Web index; Post analysis; Structured Data; App 1, App 2, App 3, App 4]
Data Cloud How to collect all structured data in one place?
Data Producers • People: forums, wiki, mail groups, blogs, social networks • Enterprises: product profiles, corporate news, professional content • Sensors: GPS modules, web cameras, traffic sensors, RFID • Transactional data
Data Distributors • A data distributor is any technical solution that accumulates, organizes, and provides access to structured and semi-structured data • Data publisher: the original distributor of some data • Data retailer: a consumer-facing distributor of some data
Data Consumers • Humans – Email – Aggregators: news, friend feeds, RSS readers – Search – Browsing / random walks • Intelligence projects – Recommendation systems – Trend mining
Data Cloud is a centralized, fully functional data distribution service. Success metric for a data cloud strategy = the total “value” of data on the cloud
To-Cloud Solutions • Extraction – DBpedia.org, “web tables” • Semantic markup, data APIs – Yahoo! SearchMonkey • Feeds – Yahoo! Shopping – Disqus.com, js-kit.com, Facebook Connect • Direct publishing
On-Cloud Solutions • Ontology maintenance – Freebase • Normalization, de-duplication, antispam • Named entity recognition, metadata inference, ranking • Data recycling (cross-references) – Amazon Public Data Sets – Viral license • Hosted search – Yahoo! BOSS
From-Cloud Solutions • Search, audience – Y! SearchMonkey, Google Base • Data API, dump access, update stream • Custom notifications – Gnip.com • Data cloud as a primary backend • Access control – Ad distribution (AT&T and Yahoo! Local deal)
Demo: webNumbr.com Joint work with Paul Tarjan
webNumbr.com: Import • Crawl numbers from the web: URL + XPath + regex • Create “numbr pages” • Update their values every hour • Keep the history Anyone can create a numbr: http://webnumbr.com/create
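The import recipe (URL + XPath + regex) can be sketched as follows. This is not the webNumbr implementation: the page snippet, the XPath expression, the regex, and the `extract_number` helper are all hypothetical, and an inline string stands in for the fetched URL so the example runs offline. It relies on the limited XPath subset of Python's standard `xml.etree.ElementTree`, which requires well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet standing in for the content of a URL.
PAGE = """<html><body>
<div id="stats"><span class="count">Followers: 1,234</span></div>
</body></html>"""


def extract_number(markup, xpath, pattern):
    """Locate a node by XPath, then pull the first regex group out of its text."""
    root = ET.fromstring(markup)
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    match = re.search(pattern, node.text)
    if match is None:
        return None
    # Drop thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


value = extract_number(PAGE, ".//span[@class='count']", r"([\d,]+(?:\.\d+)?)")

# "Keep the history": append one (hour index, value) sample per hourly update.
history = []
history.append((0, value))
print(history)
```

An hourly scheduler would call `extract_number` again on a fresh copy of the page and append the next sample to `history`.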
webNumbr.com: Export • Embed code • Graphs • Search & browse • RSS
Economics of Data Distribution Joint work with Ravi Kumar and Andrew Tomkins
Network Effect in Two-Sided Markets • Two-sided market = every product serves consumers of two types, A and B • Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers, and vice versa • Examples: operating systems, credit cards, e-commerce marketplaces • Reference: "Two-sided network effects: A theory of information product design", G. Parker, M. W. Van Alstyne, N. Bulkley, M. Van Alstyne
Basic model • Distributors D_1, …, D_k • Each producer/consumer joins only one distributor • Initial shares (p_1, c_1), …, (p_k, c_k) • A new consumer selects distributor i with probability proportional to p_i • A new producer selects distributor i with probability proportional to c_i
Basic model [Figure with labels a_1, a_2, a_3, a_4]
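The basic model can be simulated in a few lines. This is a sketch under assumptions that the slides do not state: producers and consumers are tracked as counts rather than shares, exactly one consumer and one producer arrive per step, and the starting counts are arbitrary. The function name `simulate` is illustrative.

```python
import random


def simulate(producers, consumers, steps, seed=0):
    """Run the linear preference rule for a number of arrival steps.

    producers[i] and consumers[i] are positive counts for distributor i.
    """
    rng = random.Random(seed)
    p, c = list(producers), list(consumers)
    k = len(p)
    for _ in range(steps):
        # A new consumer picks distributor i with probability proportional to p_i.
        i = rng.choices(range(k), weights=p)[0]
        c[i] += 1
        # A new producer picks distributor j with probability proportional to c_j.
        j = rng.choices(range(k), weights=c)[0]
        p[j] += 1
    return p, c


p, c = simulate([5, 4, 1], [5, 4, 1], steps=1000)
print([round(x / sum(c), 3) for x in c])  # consumer market shares after 1000 steps
```

Rerunning with different seeds shows how much the final shares depend on the early arrivals.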
Market Shares Dynamics • Theorem 1: Market shares will stabilize • Theorem 2: With a super-linear preference rule, the market will tip to one of the distributors • Theorem 3: With a sub-linear preference rule, market shares will flatten
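The contrast between Theorems 2 and 3 can be explored numerically. The slides do not define the preference rule family, so this sketch assumes a power rule: a newcomer picks distributor i with probability proportional to (opposite-side count)^alpha, where alpha > 1 is super-linear and alpha < 1 is sub-linear. The function name, the equal starting counts, and the step count are all illustrative choices.

```python
import random


def max_share(alpha, k=3, steps=5000, seed=1):
    """Return the largest consumer share after simulating a power preference rule."""
    rng = random.Random(seed)
    p = [1.0] * k  # producer counts, equal at the start
    c = [1.0] * k  # consumer counts, equal at the start
    for _ in range(steps):
        # New consumer: probability proportional to p_i ** alpha.
        i = rng.choices(range(k), weights=[x ** alpha for x in p])[0]
        c[i] += 1
        # New producer: probability proportional to c_j ** alpha.
        j = rng.choices(range(k), weights=[x ** alpha for x in c])[0]
        p[j] += 1
    return max(c) / sum(c)


tipped = max_share(alpha=2.0)  # super-linear: one distributor should dominate
flat = max_share(alpha=0.5)    # sub-linear: shares should stay close to 1/k
print(round(tipped, 3), round(flat, 3))
```

A simulation like this only illustrates the stated behavior for one seed and one rule family; it is not a substitute for the theorems.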
External Factor • Preference rule with external factor: e_i + c_i / (c_1 + … + c_k) • Theorem 4: Market shares will stabilize at e_1 : e_2 : … : e_k
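One way to build intuition for Theorem 4 is to iterate the expected (mean-field) share update rather than sampling random arrivals. The sketch below makes several assumptions not given on the slide: a single share vector s is used for both sides, each arrival picks distributor i with probability (e_i + s_i) / (sum(e) + 1), and the audience starts at one unit. The function name and the numbers are illustrative.

```python
def expected_shares(e, shares, steps=100000):
    """Iterate the expected share update under the external-factor rule."""
    s = list(shares)
    total_e = sum(e)
    n = 1.0  # assumed initial audience size
    for _ in range(steps):
        # Probability that the next arrival joins each distributor.
        probs = [(ei + si) / (total_e + 1.0) for ei, si in zip(e, s)]
        n += 1.0
        # Running-average update: each arrival shifts the shares by 1/n.
        s = [si + (pi - si) / n for si, pi in zip(s, probs)]
    return s


e = [0.5, 0.3, 0.2]
limit = expected_shares(e, shares=[0.1, 0.1, 0.8])
print([round(x, 3) for x in limit])
```

Under these assumptions the fixed point of the update satisfies s_i = e_i / (e_1 + … + e_k), which matches the ratio e_1 : e_2 : … : e_k regardless of the starting shares.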
Coalition Data Cloud
Coalitions • Theorem 5: If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors • Corollary: Coalitions are not monotone • Example: 5 : 4 : 1
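The threshold in Theorem 5 is easy to evaluate for a given market. The helper below only checks the stated condition (every share below 1/sqrt(k)); it does not model profitability itself, and it assumes that k is the number of distributors and that shares are normalized sizes. The function name is hypothetical.

```python
from math import sqrt


def all_shares_below_threshold(sizes):
    """Check whether every normalized market share is below 1/sqrt(k)."""
    total = sum(sizes)
    k = len(sizes)
    threshold = 1.0 / sqrt(k)
    return all(size / total < threshold for size in sizes)


# Sizes 5 : 4 : 1 give shares 0.5, 0.4, 0.1 against a threshold of about 0.577.
print(all_shares_below_threshold([5, 4, 1]))
# Sizes 9 : 1 give a top share of 0.9 against a threshold of about 0.707.
print(all_shares_below_threshold([9, 1]))
```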
Model Variations • Same-side network effect • Different p-to-c and c-to-p rules • Multi-homing (overlapping audiences) • n^2 vs. n log n revenue models • Mature market: newcomer rate = departing rate • Diverse market (many types of producers and consumers) • New and departing distributors • Directed coalitions
Challenges
Marketing • Data demand? • Data offerings? • Requirements for distribution technology?
Incentive design • Incentives for data sharing? • Centralized or distributed? – For profit or non-profit? • Data licensing and ownership? • Monetizing data cloud?
More Challenges Prototyping: • Data marketplace: open data & data demand • Search plugins: related objects, glossaries, object timelines • Publishing tools for structured data • Data client: structured news, bookmarking, notifications Tech design: • Access management • Namespace design User interface: • Structured search UI • Discovery UI
Thanks! Follow my research: http://twitter.com/yurylifshits http://yury.name/blog


