

  • Number of slides: 8

2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010)
April 26, 2010, Raleigh, NC, USA
In association with the 19th Annual World Wide Web Conference (WWW 2010)

Making Sense of Mountains of Data
Continuous arrival of high-volume information (evolving, highly variant; structured, semi-structured and unstructured), growing from petabytes to exabytes:
• Billions of mobile devices
• Feeds:
– Census Bureau data
– Market data, weather data
– Sensor data
– Clickstream, CRM
– Claim data (text, picture, video)
– Call data records
– Location tracking (GPS)
– iPhone, vehicle use data
– $ transaction tracking (across borders & IP providers)
– Web data (for search)
– Web buzz data (for reputation analysis)
[Diagram: Search, Online Transaction Processing System, Feedback/Action, Dashboards, Scorecards, Embedded Analytics, Mashups, Financial Planning, Auto/Cross Correlation Analytics, Predictive Analytics]
Deep & wide analytics: fine-grained, individual product and customer at a time and place

Massive Data Analytic Platforms
• Google: original MapReduce implementation
• Microsoft: Dryad
• Yahoo!, Facebook, and many others: Hadoop
• Ecosystems: Hive, Pig, Jaql, Zookeeper, …
• Alternatives to Map/Reduce, e.g. Pregel
• 1000s of processors, petabytes of data … and growing
[Diagram: map (M) and combine (C) tasks, partition and sort, reduce (R) tasks]
• “Easy” parallelism
• Scalability
• Fault tolerance
• Elasticity
• Flexibility
• Cost/performance
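The map (M), combine (C), partition, sort and reduce (R) stages named on this slide can be illustrated with a single-process word-count sketch. This is not the Hadoop, Dryad or Google MapReduce API. The function names (`map_phase`, `combine`, `partition`, `reduce_phase`, `run_job`) and the CRC32-based partitioner are hypothetical choices made for illustration. Real frameworks order and overlap these steps differently; Hadoop, for example, may run the combiner while it sorts and spills map output.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(split):
    # M: emit (word, 1) for every token in one input split
    return [(word, 1) for word in split.lower().split()]


def combine(pairs):
    # C: mapper-side pre-aggregation, so fewer pairs cross the network
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())


def partition(pairs, num_reducers):
    # Partition: route each key to exactly one reducer.
    # CRC32 hashing is an assumption; it keeps routing deterministic across runs.
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[zlib.crc32(key.encode("utf-8")) % num_reducers].append((key, value))
    return buckets


def reduce_phase(bucket):
    # Sort by key so equal keys are adjacent, then R: sum each key's values
    bucket.sort(key=itemgetter(0))
    return {key: sum(v for _, v in group)
            for key, group in groupby(bucket, key=itemgetter(0))}


def run_job(splits, num_reducers=2):
    # "Shuffle": gather each mapper's partitioned output per reducer
    reducer_inputs = [[] for _ in range(num_reducers)]
    for split in splits:
        for i, bucket in enumerate(partition(combine(map_phase(split)), num_reducers)):
            reducer_inputs[i].extend(bucket)
    # Each key lives in exactly one partition, so merging reducer outputs is safe
    result = {}
    for bucket in reducer_inputs:
        result.update(reduce_phase(bucket))
    return result


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(splits))
```

The combiner only pre-sums counts, and every key is routed to exactly one reducer. Changing `num_reducers` therefore changes which reducer handles a given word but not the final counts. This independence is what lets such platforms scale out by adding workers.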

Chairpeople's Perspective
• Other parallel systems technology and customers
– Parallel Database – enterprise data warehousing
– Parallel ETL (extraction, transformation, load)
– Search and text analytics
• Hadoop and related technologies
– Finance, Telco, Healthcare, Retail, Government, …

Questions Posed in Call For Papers
• What kinds of problems are people trying to solve?
• How are existing massive scale-out platforms used, and what extensions would be helpful?
• What other kinds of platforms suit different problems?
• How can they be integrated with existing environments such as data warehouses?
• What are the challenges in managing massive datasets?
• What legal/moral challenges are associated with mining these datasets?

Agenda (morning)
9:00 - 10:30: Session 1
• Introduction and Welcome
• Invited Talk: "Hadoop: An Industry Perspective"
Dr. Amr Awadallah, CTO, VP-Engineering, Cloudera
10:30 - 11:00: Coffee Break*
11:00 - 12:30: Session 2
• Distributed Indexing of Web Scale Datasets for the Cloud
Ioannis Konstantinou, Evangelos Angelou, Dimitrios Tsoumakos, Nectarios Koziris; National Technical University of Athens
• Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce
Joos-Hendrik Böse¹, Artur Andrzejak², Mikael Högqvist²; ¹Intl. Comp. Sci. Institute, ²Zuse Institute Berlin (ZIB)
• Efficient Updates for a Shared Nothing Analytics Platform
Katerina Doka³, Dimitrios Tsoumakos⁴, Nectarios Koziris³; ³National Technical University of Athens, Greece, ⁴University of Cyprus
12:30 - 1:30: Lunch*

Agenda (afternoon)
1:30 - 3:30: Session 3
• Invited Talk: "Large Scale Applications on Hadoop in Yahoo"
Dr. Vijay Narayanan, Yahoo! Labs Silicon Valley
• Extracting User Profiles from Large Scale Data
Michal Shmueli-Scheuer, Haggai Roitman, David Carmel, Yosi Mass, David Konopnicki; IBM Research, Haifa
• A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids
Sudha Sadasivam, G. Baktavatchalam; PSG College of Technology
3:30 - 4:00: Coffee Break*
4:00 - 5:30: Session 4
• Towards Scalable RDF Graph Analytics on MapReduce
Padmashree Ravindra, Vikas Deshpande, Kemafor Anyanwu; North Carolina State University
• SPARQL Basic Graph Pattern Processing with Iterative MapReduce
Jaeseok Myung, Jongheum Yeon, Sang-goo Lee; Seoul National University
• Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng; National Chiao Tung University, Hsinchu, Taiwan

Acknowledgements
Workshop Chairs
• Ullas Nambiar, IBM India Research Lab, New Delhi, India
• John McPherson, IBM Almaden Research Center, USA
• David Konopnicki, IBM Haifa Research Lab, Israel
Steering Committee
• Rakesh Agrawal, Microsoft Search Labs, Mountain View, CA, USA
• Alon Halevy, Google Inc., Mountain View, CA, USA
Invited Speakers
• Amr Awadallah, CTO, VP-Engineering, Cloudera, "Hadoop: An Industry Perspective"
• Vijay Narayanan, Yahoo! Labs Silicon Valley, "Large Scale User Modeling on Hadoop"
Program Committee
• Amr Awadallah, Cloudera, USA
• Andrew McCallum, University of Massachusetts Amherst, USA
• Assaf Schuster, Technion - Israel Institute of Technology
• Gautam Das, University of Texas, Arlington, USA
• Jimeng Sun, IBM Watson Research Center, USA
• John Shafer, Microsoft Search Labs, USA
• Kevin Chang, University of Illinois at Urbana-Champaign, USA
• Kun Liu, Yahoo! Labs, USA
• Louiqa Raschid, University of Maryland, College Park, USA
• Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel
• Michael Sheng, University of Adelaide, Australia
• Mong Li Lee, National University of Singapore, Singapore
• Rajeev Gupta, IBM India Research Lab, India
• Vanja Josifovski, Yahoo Research, USA
• Yannis Sismanis, IBM Almaden Research Center, USA
• Yi Chen, Arizona State University, USA
• Wen-syan Li, SAP, China