  • Number of slides: 17

DINAMIC
Data Analytics for Big Data
Vandana P. Janeja
Information Systems Department, University of Maryland, Baltimore County, MD, USA

Big Data
• What is Big Data?
• Recently much good science, whether physical, biological, or social, has been forced to confront, and has often benefited from, the Big Data phenomenon.
• Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115)
Diebold, F. X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L. P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.

Big data spans four dimensions: Volume, Velocity, Variety, and Veracity

• Volume: Enterprises are awash with ever-growing data of all types, amounting to terabytes, petabytes, even exabytes of information.
– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis.
– Convert 350 billion annual meter readings to better predict power consumption.
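The daily and annual figures above are easier to compare as sustained rates. The sketch below converts them, assuming decimal units (1 TB = 10^12 bytes) and a 365-day year; both conventions are assumptions, not stated on the slide. Under them, 12 TB of Tweets per day is roughly 139 MB/s, and 350 billion meter readings per year average about 11,100 readings per second.

```python
# Back-of-envelope rates for the Volume figures quoted on the slide.
# Assumptions (not from the slide): decimal units (1 TB = 10**12 bytes)
# and a 365-day year.

SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000 seconds


def bytes_per_second(terabytes_per_day: float) -> float:
    """Convert a daily volume in decimal terabytes to bytes per second."""
    return terabytes_per_day * 10**12 / SECONDS_PER_DAY


def events_per_second(events_per_year: float) -> float:
    """Convert an annual event count to an average per-second rate."""
    return events_per_year / SECONDS_PER_YEAR


tweet_rate = bytes_per_second(12)      # 12 TB of Tweets per day
meter_rate = events_per_second(350e9)  # 350 billion meter readings per year

print(f"Tweets: {tweet_rate / 10**6:.0f} MB/s sustained")
print(f"Meter readings: {meter_rate:,.0f} readings/s on average")
```

These are averages; real traffic is bursty, so peak rates would be higher.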

• Velocity: Sometimes 2 minutes is too late.
– For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
– Scrutinize 5 million trade events created each day to identify potential fraud.
– Analyze 500 million daily call detail records in real time to predict customer churn faster.
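Using data "as it streams in" means scoring each event against a bounded amount of recent history instead of a stored archive. A minimal sketch, assuming a simple rolling z-score rule chosen for illustration (not the method behind the slide's examples): each trade amount is compared with the mean and standard deviation of the previous `window` amounts and flagged when it lies more than `threshold` standard deviations away. The function name `flag_anomalies` and its defaults are hypothetical.

```python
import statistics
from collections import deque
from typing import Iterable, Iterator, Tuple


def flag_anomalies(amounts: Iterable[float], window: int = 50,
                   threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, amount) for events far from the recent rolling mean.

    Only the last `window` values are kept in memory, so the stream can be
    unbounded. Scoring starts once the window is full.
    """
    recent: deque = deque(maxlen=window)  # oldest value drops out automatically
    for i, x in enumerate(amounts):
        if len(recent) == window:
            mean = statistics.fmean(recent)
            std = statistics.pstdev(recent, mu=mean)
            # Skip scoring when the window is constant (std == 0).
            if std > 0 and abs(x - mean) > threshold * std:
                yield i, x
        recent.append(x)


if __name__ == "__main__":
    trades = [10.0, 12.0] * 30 + [100.0]   # hypothetical trade amounts
    print(list(flag_anomalies(trades)))    # [(60, 100.0)]
```

Because `flag_anomalies` is a generator, it can consume an unbounded feed one event at a time and emit alerts as soon as they occur.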

• Variety: Big data is any type of data, structured and unstructured: text, sensor data, audio, video, click streams, log files and more. New insights are found when these data types are analyzed together.
– Monitor hundreds of live video feeds from surveillance cameras to target points of interest.
– Exploit the 80% data growth in images, video and documents to improve customer satisfaction.

• Veracity: 1 in 3 business leaders don't trust the information they use to make decisions.
– How can you act upon information if you don't trust it?
– Establishing trust in big data presents a huge challenge as the variety and number of sources grow.

Analytics

Is it all about algorithms?

Will it make a difference if some of this data is from France and some from Maryland?
Will it make a difference if some of this data is from LA and some from Baltimore?
Will it make a difference if some of this data is from Maryland and some from D.C.?
Will it make a difference if some of this data is from Howard County, MD and some from Montgomery County, MD?

US HIGHWAYS
• 42,000 Americans Are Killed on Highways Each Year
– Nearly one-third of all fatal crashes each year are caused by substandard road conditions and roadside hazards.
– Motor vehicle crashes cost the United States $231 billion annually, including $21 billion from Federal and State tax revenue.
• Americans Waste $67 Billion Each Year Due to Congestion
– According to 2001 statistics, NJ ranks 12th in intersection fatalities (32.1% of all state highway fatalities) and 12th in pedestrian fatalities (17.7% of all state highway fatalities) (USDOT).
Ref: http://www.house.gov/transportation/press2005/release9.html

LA Times, 4/27/09, 12 pm

CDC Officials Confirm Swine Flu Cases Up to 40; Outbreak May Worsen
ABC News, 4/27/09, 1 pm
Dr. William Schaffner, chairman of Preventive Medicine at Vanderbilt University Medical Center in Nashville, Tenn., said doctors like him have been advised by the CDC and state health department to set up a system that would test patients with flu-like symptoms and help define how widespread this outbreak is. He said the severity of the virus is hard to gauge because of the wide discrepancy in how it has affected Mexicans and Americans, and because it is occurring in places that are warm, which is very unusual. "The genetic make up of this virus has influenza experts scratching their heads," he said. "One of the things that has us worried is that could this be a virus that could continue to make mischief during the warmest parts of the year. That would be a big thing. For a respiratory virus to be active during the summer months" would be very unique.

Knowledge Discovery (KDD) Process
– Data mining: the core of the knowledge discovery process
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Source: Data Mining: Concepts and Techniques
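The KDD steps can be traced end to end in a few lines. This is a toy sketch, not a definitive implementation: the two source lists, the region filter and the minimum-support rule are hypothetical, and counting frequently co-occurring item pairs is simply one example of a data mining task, chosen for illustration. Each function corresponds to one stage of the process.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source databases: each record is (customer_id, region, basket).
store_db = [(1, "MD", ["milk", "bread"]), (2, "MD", ["milk", "bread", "eggs"])]
web_db = [(3, "MD", ["bread", "eggs"]), (2, "MD", ["milk", "bread", "eggs"]),
          (4, "DC", None)]


def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [rec for src in sources for rec in src]


def clean(records):
    """Data cleaning: drop records with missing baskets and exact duplicates."""
    seen, out = set(), []
    for cid, region, basket in records:
        if basket is None:
            continue
        key = (cid, region, tuple(basket))
        if key not in seen:
            seen.add(key)
            out.append((cid, region, basket))
    return out


def select(warehouse, region):
    """Selection: keep only the task-relevant data (one region's baskets)."""
    return [basket for _, r, basket in warehouse if r == region]


def mine_pairs(baskets):
    """Data mining: count how many baskets contain each item pair."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}


warehouse = clean(integrate(store_db, web_db))  # stands in for the data warehouse
task_data = select(warehouse, "MD")
patterns = evaluate(mine_pairs(task_data), len(task_data), min_support=0.6)
print(patterns)
```

Here the cleaned, integrated list plays the role of the warehouse; in practice each stage would run against real stores and the evaluation step would use measures suited to the task.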

Big Data Framework
• Automatic Parallelization
• Run-time
– Data partitioning
– Task scheduling
– Handling machine failures
– Managing inter-machine communication
• Completely transparent to the programmer/analyst/user
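The run-time duties listed above closely resemble those handled by MapReduce-style frameworks. The sketch below imitates them on a single machine, assuming a word-count job: threads stand in for machines, the names `partition`, `map_count`, `run_with_retry` and `word_count` are hypothetical, and a failed task is simply re-run rather than rescheduled on another node. Because of Python's global interpreter lock, the threads illustrate the structure, not a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def partition(lines: List[str], n_parts: int) -> List[List[str]]:
    """Data partitioning: deal input lines round-robin into n_parts chunks."""
    return [lines[i::n_parts] for i in range(n_parts)]


def map_count(chunk: List[str]) -> Counter:
    """Map phase: count words in one chunk, independently of other chunks."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def run_with_retry(task: Callable[[List[str]], Counter], chunk: List[str],
                   retries: int = 2) -> Counter:
    """Failure handling: re-run a failed task up to `retries` extra times."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return task(chunk)
        except Exception as exc:  # a real framework would reschedule elsewhere
            last_error = exc
    raise RuntimeError("task failed after retries") from last_error


def word_count(lines: List[str], workers: int = 4) -> Counter:
    """Scheduling and reduce: run map tasks in a pool, then merge results."""
    chunks = partition(lines, workers)
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: run_with_retry(map_count, c), chunks)
        for part in partials:  # reduce phase: merge the partial counts
            total.update(part)
    return total


if __name__ == "__main__":
    docs = ["Big data big value", "big data velocity"]
    print(word_count(docs).most_common(2))  # [('big', 3), ('data', 2)]
```

The caller only hands lines of text to `word_count`; partitioning, scheduling, retries and merging stay hidden, which is the kind of transparency the slide describes.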

Relevant IS Courses
• IS 410 Introduction to Database Design
• IS 420 Database Application Development
• IS 427 Introduction to Artificial Intelligence: Concepts and Applications
• IS 428 Data Mining Techniques and Applications
• IS 498 Special Topics
• Independent studies