
  • Number of slides: 40

Data Cloud
Yury Lifshits, Yahoo! Research
http://yury.name

My Beliefs
The key challenge in web search is structured search
  Part 1: What is structured search?
The key challenge in structured search is collecting data
  Part 2: Data distribution & idea of Data Cloud
  Part 3: Demo: numeric data distribution
The key challenge in collecting data is incentive design
  Part 4: Economics of data distribution

Structured Search

Data = data of entities + data of content
Structured data. Entity unit:
• Identifier
• Metadata
Semi-structured data. Content unit:
• Body: text, video, audio, or image
• Metadata
Metadata:
  – Explicit key-value pairs
  – Relational properties
  – Evaluation

Structured Search
Factoid search: "what's the value of property X of object Y"
Entity hubs
  – Domain hubs
Structured object search: "all concerts this weekend in SF under 20$ sorted by popularity"
  – Time focus
  – Ranking focus
  – Relations focus
Structured content search: "all videos with Tom Brady", "all comments and blog posts about Bing"
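
A structured object search such as the concert query can be sketched as a filter over explicit key-value metadata followed by a ranking step. This is only an illustration: the `Event` fields, the sample records, the dates, and the `popularity` signal are all made up here and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    identifier: str    # entity identifier
    kind: str          # e.g. "concert" (hypothetical field)
    city: str
    day: date
    price_usd: float
    popularity: int    # hypothetical ranking signal


def structured_search(events, kind, city, days, max_price):
    """Filter on key-value metadata, then sort by popularity (descending)."""
    hits = [
        e for e in events
        if e.kind == kind
        and e.city == city
        and e.day in days
        and e.price_usd < max_price
    ]
    return sorted(hits, key=lambda e: e.popularity, reverse=True)


# Made-up sample data for the sketch.
EVENTS = [
    Event("ev1", "concert", "SF", date(2009, 6, 6), 15.0, 120),
    Event("ev2", "concert", "SF", date(2009, 6, 7), 35.0, 900),
    Event("ev3", "concert", "SF", date(2009, 6, 7), 18.0, 450),
    Event("ev4", "lecture", "SF", date(2009, 6, 6), 0.0, 60),
    Event("ev5", "concert", "LA", date(2009, 6, 6), 10.0, 300),
]
WEEKEND = {date(2009, 6, 6), date(2009, 6, 7)}

results = structured_search(EVENTS, "concert", "SF", WEEKEND, 20.0)
print([e.identifier for e in results])  # → ['ev3', 'ev1']
```

The time, ranking, and relation constraints all operate on entity metadata, which is why such queries need structured data rather than a plain text index.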

Yury’s Wishlist
Business-generated data
• Products, services, news, wishlists, contact data
Reality stream, sensors
• What happened where
Expert knowledge
• Glossary, issues, typical solutions, object databases, related objects graph
Events
• Sport, concerts, education, corporate, community, private
Market graph & signals
• Like, interested, use, following, want to buy; votes and ratings

Search as a Platform
[Diagram components: Query analysis, Classic search, Web index, Post analysis, Structured Data, App 1, App 2, App 3, App 4]

Data Cloud
How to collect all structured data in one place?

Data Producers
• People: forums, wiki, mail groups, blogs, social networks
• Enterprises: product profiles, corporate news, professional content
• Sensors: GPS modules, web cameras, traffic sensors, RFID
• Transactional data

Data Distributors
A data distributor is any technical solution to accumulate, organize, and provide access to structured and semi-structured data
Data publisher: the original distributor of some data
Data retailer: a consumer-facing distributor of some data

Data Consumers
• Humans
  – Email
  – Aggregators: news, friend feeds, RSS readers
  – Search
  – Browsing / random walks
• Intelligence projects
  – Recommendation systems
  – Trend mining

Data Cloud is a centralized fully-functional data distribution service
Success metric for data cloud strategy = the total “value” of data on the cloud

To-Cloud Solutions
• Extraction
  – DBpedia.org, “web tables”
• Semantic markup, data APIs
  – Yahoo! SearchMonkey
• Feeds
  – Yahoo! Shopping
  – Disqus.com, js-kit.com, Facebook Connect
• Direct publishing

On-Cloud Solutions
• Ontology maintenance
  – Freebase
• Normalization, de-duplication, anti-spam
• Named entity recognition, metadata inference, ranking
• Data recycling (cross-references)
  – Amazon Public Data Sets
  – Viral license
• Hosted search
  – Yahoo! BOSS

From-Cloud Solutions
• Search, audience
  – Y! SearchMonkey, Google Base
• Data API, dump access, update stream
• Custom notifications
  – Gnip.com
• Data cloud as a primary backend
• Access control
  – Ad distribution (AT&T and Yahoo! Local deal)

Demo: webNumbr.com
Joint work with Paul Tarjan

webNumbr.com: Import
• Crawl numbers from the web: URL + XPath + regex
• Create “numbr pages”
• Update their values every hour
• Keep the history
Anyone can create a numbr: http://webnumbr.com/create
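
The "URL + XPath + regex" recipe can be sketched as follows. This is not webNumbr's actual code: the page markup, the `extract_number` and `record` helpers, and the timestamp are invented for illustration, and the fetch step is replaced by an inline well-formed document so the example runs offline. Real web pages are rarely well-formed XML and would need a tolerant HTML parser instead of `xml.etree.ElementTree`, which also supports only a subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for a fetched page (hypothetical markup).
PAGE = """<html><body>
  <div id="stats"><span class="count">Followers: 1,234</span></div>
</body></html>"""


def extract_number(markup, xpath, pattern):
    """Select a node with XPath, then pull a number out of its text with a regex."""
    root = ET.fromstring(markup)
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    match = re.search(pattern, node.text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def record(history, timestamp, value):
    """Keep the history: append one (timestamp, value) observation."""
    history.append((timestamp, value))
    return history


value = extract_number(PAGE, ".//div[@id='stats']/span", r"([\d,.]+)")
history = record([], "2009-06-06T12:00", value)
print(value, history)
```

Running the extraction on a schedule (hourly, per the slide) and appending each result gives the time series behind the graphs and feeds.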

webNumbr.com: Export
• Embed code
• Graphs
• Search & browse
• RSS

Economics of Data Distribution
Joint work with Ravi Kumar and Andrew Tomkins

Network Effect in Two-Sided Markets
Two-sided market = every product serves consumers of two types, A and B
Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers, and vice versa
Examples: operating systems, credit cards, e-commerce marketplaces
Reference: “Two-sided network effects: A theory of information product design”, G. Parker, M. W. Van Alstyne, N. Bulkley, M. Van Alstyne

Basic model
• Distributors D_1, …, D_k
• Producer/consumer joins only one distributor
• Initial shares (p_1, c_1) … (p_k, c_k)
• New consumer selects a distributor with probability proportional to p_i
• New producer selects a distributor with probability proportional to c_i
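
A minimal simulation sketch of this model, under assumptions the slide does not state: shares are tracked as integer counts, exactly one consumer and one producer arrive per step, and both pick based on the counts at the start of the step. The function name `simulate`, the starting counts, and the seed are illustrative.

```python
import random


def simulate(producers, consumers, steps, seed=0):
    """Simulate arrivals: consumers follow producer counts, producers follow consumer counts."""
    rng = random.Random(seed)
    p, c = list(producers), list(consumers)
    k = len(p)
    for _ in range(steps):
        # New consumer picks distributor i with probability proportional to p_i.
        i = rng.choices(range(k), weights=p)[0]
        # New producer picks distributor j with probability proportional to c_i.
        j = rng.choices(range(k), weights=c)[0]
        c[i] += 1
        p[j] += 1
    return p, c


def shares(counts):
    """Convert counts to market shares."""
    total = sum(counts)
    return [x / total for x in counts]


p, c = simulate([5, 3, 2], [5, 3, 2], steps=1000)
print("producer shares:", shares(p))
print("consumer shares:", shares(c))
```

Different seeds give different final shares, since the process is random; only the totals are fixed by the number of steps.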

Basic model
[Figure labels: a_1, a_2, a_3, a_4]

Market Shares Dynamics
Theorem 1: Market shares will stabilize
Theorem 2: With a super-linear preference rule, one of the distributors will tip
Theorem 3: With a sub-linear preference rule, market shares will flatten
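
The three regimes can be illustrated with a deterministic expected-value approximation. The slides do not define the super- or sub-linear rule, so this sketch assumes a newcomer weights distributor i by x_i^alpha (alpha = 1 is the basic model, alpha > 1 super-linear, alpha < 1 sub-linear) and replaces each random arrival by its expectation. It illustrates the theorems' direction; it is not a proof and not the random process itself.

```python
def expected_shares(p0, c0, alpha, steps):
    """Mean-field dynamics: each step adds one unit of consumers and one of producers,
    split across distributors in proportion to (opposite side count) ** alpha."""
    p, c = list(p0), list(c0)
    for _ in range(steps):
        wp = [x ** alpha for x in p]   # consumers look at producer counts
        wc = [x ** alpha for x in c]   # producers look at consumer counts
        sp, sc = sum(wp), sum(wc)
        new_c = [ci + w / sp for ci, w in zip(c, wp)]
        new_p = [pi + w / sc for pi, w in zip(p, wc)]
        p, c = new_p, new_c
    total = sum(c)
    return [ci / total for ci in c]


start = [6.0, 4.0]  # illustrative 60/40 split on both sides
linear = expected_shares(start, start, alpha=1.0, steps=10000)
superlinear = expected_shares(start, start, alpha=2.0, steps=10000)
sublinear = expected_shares(start, start, alpha=0.5, steps=10000)
print(linear, superlinear, sublinear)
```

In this approximation the linear rule keeps the 60/40 split, the super-linear rule pushes the leader toward the whole market, and the sub-linear rule pulls both shares toward one half.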

External Factor
Preference rule with external factor: e_i + c_i/(c_1 + … + c_k)
Theorem 4: Market shares will stabilize on e_1 : e_2 : … : e_k
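
A quick numerical check of the stated limit, under simplifying assumptions: a single share vector stands in for both sides of the market (as if producers and consumers followed the same rule), and random arrivals are replaced by their expected value. The values of `e` and the starting counts are made up. In this approximation the fixed point satisfies s_i = (e_i + s_i)/(E + 1) with E = e_1 + … + e_k, which gives s_i = e_i/E, matching the ratio e_1 : e_2 : … : e_k.

```python
def external_factor_shares(e, s0, steps):
    """Mean-field dynamics where a newcomer weights distributor i by e_i + share_i."""
    s = list(s0)
    for _ in range(steps):
        total = sum(s)
        weights = [ei + si / total for ei, si in zip(e, s)]
        wsum = sum(weights)
        # One unit of new mass per step, split in proportion to the weights.
        s = [si + w / wsum for si, w in zip(s, weights)]
    total = sum(s)
    return [si / total for si in s]


e = [3.0, 2.0, 1.0]          # illustrative external factors
start = [1.0, 1.0, 8.0]      # the weakest external factor starts with 80% of the market
final = external_factor_shares(e, start, steps=10000)
target = [ei / sum(e) for ei in e]
print(final, target)
```

Even though the third distributor starts far ahead, the shares drift toward 3 : 2 : 1, so the initial lead does not persist in this sketch.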

Coalition Data Cloud

Coalitions
Theorem 5: If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors
Corollary: Coalitions are not monotone
Example: 5 : 4 : 1

Model Variations
• Same-side network effect
• Different p-to-c and c-to-p rules
• Multi-homing (overlapping audiences)
• n^2 vs. n log n revenue models
• Mature market: newcomer rate = departing rate
• Diverse market (many types of producers and consumers)
• Arriving and departing distributors
• Directed coalitions

Challenges

Marketing
• Data demand?
• Data offerings?
• Requirements for distribution technology?

Incentive design
• Incentives for data sharing?
• Centralized or distributed?
  – For profit or non-profit?
• Data licensing and ownership?
• Monetizing data cloud?

More Challenges
Prototyping:
• Data marketplace: open data & data demand
• Search plugins: related objects, glossaries, object timelines
• Publishing tools for structured data
• Data client: structured news, bookmarking, notifications
Tech design:
• Access management
• Namespace design
User interface:
• Structured search UI
• Discovery UI

Thanks!
Follow my research:
http://twitter.com/yurylifshits
http://yury.name/blog