
  • Number of slides: 95

Big Data and Analytics Challenges and Issues
Stephen H. Kaisler, D.Sc.; Frank J. Armour, Ph.D.; J. Alberto Espinosa, Ph.D.; William H. Money, Ph.D.
Presented at HICSS-49, January 5, 2016, Grand Hyatt, Poipu, Kauai, Hawaii

Who We Are
• Stephen H. Kaisler, D.Sc., Senior Associate, PCI Strategic Management, Columbia, MD
• Frank J. Armour, Ph.D., Kogod School of Business, American University, Washington, DC
• J. Alberto Espinosa, Ph.D., Professor and Chair, Kogod School of Business, American University, Washington, DC
• William H. Money, Ph.D., Associate Professor, School of Business Administration, The Citadel, Charleston, SC
Copyright (except where referenced) 2014-2016 Stephen H. Kaisler, Frank Armour, Alberto Espinosa and William H. Money

Outline (Topic, Schedule)
• Analytics: An Introduction, 1300-1415
• Break, 1415-1445
• Analytics: 1445-1600

Tutorial Purpose
• Big Data and Analytics is thought to be about:
– Business Intelligence and Analytics
– Computational Science
• But, it is much more than that!
– Demographic Analysis
– Geointelligence: Spatial Analysis
– The Grand Challenges in Science
– Medicine: Processing 3-D hyperspectral high-resolution images for diagnostics, genomic research, proteomics, etc.
– Media Analysis: Processing text, audio, video, imagery
– And much, much more …
• So, we want to introduce you to the issues and challenges at the frontiers of Advanced Analytics!

The Business Analytics Landscape
• Strategies: Social, Email, Blogs, Video, Mobile Marketing, Sales - Product Listing, Promotions
• Applications: ERP, CRM, Databases, Internal Applications, Customer/Consumer-facing applications
• Context: Web, Customers, Products, Business Systems, Processes and Services
• Support Systems: CRM, Recommendation Systems, Data warehouses, Business Intelligence
Ref: S. Radhakrishnan, Advanced Analytics: The Next Wave of Business Intelligence, Business Intelligence Conference, 2011

The Emerging Analytics Landscape
• Extending diagnostic analytics to different domains
• Developing new predictive and prescriptive analytics based on advanced analytic techniques
– Prediction based on scenario development rather than just probabilities
– Prescription based on advanced simulation and visualization capabilities
• Development of Analytic Scientist curricula and degree programs
• Expansion beyond the traditional business intelligence and scientific applications based on descriptive analytics.

What is Advanced Analytics?
• Advanced analytics:
– the application of multiple analytic methods that address the diversity of big data, structured or unstructured,
– to provide descriptive results, and
– to yield actionable predictive and prescriptive results that facilitate decision-making.
• Goes beyond data mining and statistical processing methods to encompass logic-based methods, qualitative analytics, and non-statistical quantitative methods.
• A diverse set of techniques that require new software architectures and application frameworks to solve complex problems.
• New metrics that focus on the value contributed by the analysis as a holistic result are required to assess and evaluate the outcomes of advanced analytics.

Setting the Stage: A Few Words About Big Data

Big Data Definition
• Big Data is the amount of data just beyond technology’s capability to store, manage and process efficiently.
“Ah, but a man’s reach should exceed his grasp, Or what’s a heaven for?” – Robert Browning

Big Data - Quick Recap
• Organizations have access to a wealth of information:
– They can’t get value out of it because most of it is sitting in its rawest form or in a semi-structured or unstructured form
– They don't even know whether it's worth keeping (or whether they are even able to keep it, for that matter).
• Attributes:
– Gartner: Volume, Velocity, Variety
– Kaisler, Armour, Espinosa, Money: Value, Veracity
– Others have also been defined
• Value relies on Big Data processing being fast and agile

Hmmm!

The Data Scientist
Hal Varian, McKinsey Quarterly, January 2009: “The sexy job in the next ten years will be statisticians… The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill.”
Ref: http://www.mckinseyquarterly.com/Hal_Varian_on_how_the_Web_challenges_managers_2286
“The critical job in the next 20 years will be the analytic scientist … the individual with the ability to understand a problem domain, to know what data to collect about it, to identify analytics to process that data/information, to discover its meaning, and to extract knowledge from it – that’s going to be a very critical skill.” - Kaisler, Armour, Espinosa, Money (2014)
Amended for both roles: Analytic scientists require advanced training in specific domains, data science tools, multiple analytics, and visualization to perform predictive and prescriptive analytics. They may hold Ph.D.s, but pragmatic experience in a domain will be equally important.

Some Big Data Issues Affecting Analytics
• Volume:
– How much data is really relevant to the problem solution? Cost of processing?
– So, can you really afford to store and process all that data?
• Velocity:
– Much data coming in at high speed
– Need for streaming versus block approach to data analysis
– So, how to analyze data in-flight and combine with data at-rest
• Variety:
– A small fraction is in structured formats: relational, XML, etc.
– A fair amount is semi-structured, such as web logs, etc.
– The rest of the data is unstructured: text, photographs, etc.
– So, no single data model can currently handle the diversity
• Veracity: cover term for …
– Accuracy, Precision, Reliability, Integrity
– So, what is it that you don’t know about the data?
• Value:
– How much value is created for each unit of data (whatever it is)?
– So, what is the contribution of subsets of the data to the problem solution?

Types of Analytics
• Descriptive: A set of techniques for reviewing and examining the data set(s) to understand the data and analyze business performance.
• Diagnostic: A set of techniques for determining what has happened and why
• Predictive: A set of techniques that analyze current and historical data to determine what is most likely to (not) happen
• Prescriptive: A set of techniques for computationally developing and analyzing alternatives that can become courses of action – either tactical or strategic – that may discover the unexpected
• Decisive: A set of techniques for visualizing information and recommending courses of action to facilitate human decision-making when presented with a set of alternatives.
Classification (Passive / Active):
– Deductive: Descriptive (passive), Diagnostic (active)
– Inductive: Predictive (passive), Prescriptive (active)

Descriptive Analytics
• Process:
– Identify the attributes, then assess/evaluate the attributes
– Estimate the magnitude to correlate the relative contribution of each attribute to the final solution
– Accumulate more instances of data from the data sources
– If possible, perform the steps of evaluation, classification and categorization quickly
– Yield a measure of adaptability within the OODA loop
• At some threshold, cross over into diagnostic and predictive analytics
Image ref: http://v1shal.com/content/25-cartoons-give-current-big-data-hype-perspective/

Diagnostic Analytics
• Process:
– Begin with descriptive analytics
– Extract patterns from large data quantities via data mining
– Correlate data types for explanation of near-term behavior – past and present
– Estimate linear/non-linear behavior not easily identifiable through other approaches.
• Example: by classifying past insurance claims, estimate the number of future claims with a high probability of being fraudulent that should be flagged for investigation.

Predictive Analytics
• Process:
– Begin with descriptive AND diagnostic analytics
– Choose the right data based on domain knowledge and relationships among variables
– Choose the right techniques to yield insight into possible outcomes
– Determine the likelihood of possible outcomes given initial boundary conditions
– Remember! Data-driven analytics is non-linear; do NOT treat it like an engineering project

Prescriptive Analytics
• Process:
– Begin w/ predictive analytics
– Determine what should occur and how to make it so
– Determine the mitigating factors that lead to desirable/undesirable outcomes
– “What-if” analysis w/ local or global optimization
– Ex: Find the best set of prices and advertising frequency to maximize revenue
– Ex: And, the right set of business moves to make to achieve that goal
“Make it so”
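The “what-if” step on this slide can be illustrated with a brute-force search over a small grid of alternatives. This is only a sketch under stated assumptions: the demand model, its coefficients, and the names `net_revenue` and `best_plan` are hypothetical and do not come from the tutorial; a real prescriptive model would be fitted from the predictive stage.

```python
from itertools import product


def net_revenue(price, ads_per_week):
    """Hypothetical model (an assumption, not from the slides):
    demand falls linearly with price and rises with diminishing
    returns in advertising frequency; each ad has a fixed cost."""
    demand = max(0.0, 1000 - 40 * price) * (1 + 0.1 * ads_per_week ** 0.5)
    ad_cost = 150 * ads_per_week
    return price * demand - ad_cost


def best_plan(prices, ad_levels):
    """Global optimization by exhaustive what-if evaluation of every
    (price, advertising frequency) pair on the grid."""
    return max(product(prices, ad_levels), key=lambda plan: net_revenue(*plan))


prices = [5, 10, 12.5, 15, 20]
ad_levels = [0, 1, 4, 9, 16]
plan = best_plan(prices, ad_levels)
print(plan, round(net_revenue(*plan), 2))
```

Exhaustive search is a global method that only works for small grids; larger decision spaces would need a local or heuristic optimizer, which is the trade-off the slide alludes to.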

Decisive Analytics
• Process:
– Given a set of decision alternatives, choose the one course of action to take from possibly many
– But, it may not be the optimal one.
– Visualize alternatives – whole or partial subset
– Perform exploratory analysis – what-if and why
• How do I get to there from here?
• How did I get here from there?

Advanced Analytics: Application of Analytics To Critical Problems

The Role of Analytics
• “Tools and techniques that gear the analyst’s mind to apply higher levels of critical thinking can substantially improve analysis… structuring information, challenging assumptions, and exploring alternative interpretations.” Richards Heuer, Jr., “The Psychology of Intelligence Analysis”
• Beware Frege’s Caution:
– Converse Problems:
• If you magnify the details, you lose the overview
• If you focus on the overview, you don’t see the details
– Problem with Data Mining:
• Applying statistics to understand the trends causes a loss of grounding in the data

The Analytics Continuum
• Analytics problems span a continuum:
– Short-term analysis leads to quick fixes and quick results, which may be unsustainable
– What are the disruptive innovations in the middle-term that provide near-term domain leadership?
– Long-term leads to strategic changes and innovations that provide sustainable domain dominance.
• The continuum, from push to pull:
– Finding a Needle in a Haystack: Top-Down Analysis, Deductive
– A little bit of this; a little bit of that: Middle-Out Analysis, Abductive?
– Spinning Straw into Gold: Bottom-Up Analysis, Inductive

Analytics Classes
• Indications & Warning: Generally, indications of warfare and potential conflict and other crises, based on quantitative information found in open source datasets.
• Dynamical Systems: Differential or difference equations of low dimensionality representing competing actors (incl. system dynamics).
• (Hidden) Markov Models: Time-phased data aggregated at fixed intervals with scaled values. Separate from underlying events input to set of discrete states w/ associated probabilities.
• Event Data Analysis: Analysis of abstracted and coded streams of short-term interactions among competing or cooperating actors.
• Econometric Models: Large-scale aggregate models of social actors, states or organizations in economic and social systems – regional, national, international.
• Probabilistic Models: Regression and statistical models estimating the probability of how variables will affect a specified outcome.
• Principal Components Analysis: Techniques for the reduction of high-dimensionality models to a few critical dimensions to facilitate prediction and visualization.
• Game Theory Models: Application of 2-person and N-person game theory to competitive and collaborative situations involving strategic interdependence.
• Logic Systems: Use of logical formulae and systems to represent and solve qualitative problems, including deductive, abductive, and inductive techniques.
Ref: Kaisler and Cioffi-Revilla 2007; Kaisler, Armour, Espinosa, and Money 2014
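As a small illustration of the “discrete states w/ associated probabilities” idea in the Markov entry, the sketch below estimates first-order transition probabilities from an observed state sequence. It is a plain (not hidden) Markov chain, chosen for brevity; the state names and the function name `transition_probabilities` are hypothetical and not taken from the tutorial.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Estimate P(next state | current state) by counting adjacent
    pairs in a sequence of states observed at fixed intervals."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical coded observations, one per fixed time interval.
observed = ["calm", "calm", "tense", "crisis", "tense", "calm", "calm", "tense"]
model = transition_probabilities(observed)
print(model["crisis"])
```

A hidden Markov model would add an emission layer between the unobserved states and the observed data; that step is omitted here.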

Analytics Classes

Analytics Is About Discovery
• Novelty Discovery
– Finding new, rare, one-in-a-[million / billion / trillion / etc.] objects and events
• Class Discovery
– Finding new classes of objects and behaviors
– Learning the rules that constrain class boundaries
• Association Discovery
– Finding unusual (improbable) co-occurring associations
• Correlation Discovery
– Finding patterns and dependencies, which reveal new natural laws or new scientific principles
Ref: Kirk Borne, Dynamic Events in Massive Data Streams, GMU

The Goal of Analytics
From sensors (data collection, measurement, observation, …) to Monitoring and Alerting to Sensemaking (Data and Analytics Science) to Cents-Making (Getting to ROI!!)
Image ref: http://timoelliott.com/blog/wp-content/uploads/2013/04/Whats-the-ROI-of-knowing-what-you-dont-know.jpg
Adapted from: Kirk Borne, Dynamic Events in Massive Data Streams, GMU

Sensemaking
• In the end, what analytics is really about is sensemaking:
– What does that event really mean to me/him/her/my friends/etc.?
– What is a plausible explanation?
• Sensemaking:
– Fits data into a frame or mental model
– Can be physical or social
– Requires situational awareness that helps us to adapt and respond to known and unexpected or unknown situations
– Interpreting – something is there that is waiting to be discovered or approximated
– Comparison to previous experience, retrospectively
– Requires a higher level of intellectual engagement, not a passive translation
• Klein (2006) theorizes that sensemaking processes are initiated when individuals or organizations recognize a lack of understanding of events

Sensemaking Challenges
• Lillian Wu (IBM) has noted that everything is becoming:
– Instrumented: We now have the ability to measure, sense and see the exact condition of practically everything.
– Interconnected: People, systems and objects can communicate and interact with each other in entirely new ways
– Intelligent: People, systems and objects can respond to changes quickly and accurately, and get better results by predicting and optimizing for future events.
• How to deal with ambiguity?
• How to deal with too much data?
• Have we let algorithms and large centralized data centres not only control the remembering but also the meaning and interpretation of the data? (Giulia Forsythe, http://gforsythe.ca/big-data-sensemaking/)
• We know how to do massive data collection and have the ability to index, curate, search and share.
– But, what seems to be missing is the ability to review, reflect, recall and ponder
Ref: Technology, Data. Analytics, PSM workshop, October 14, 2011

Sensemaking: Properties (Weick)
• Grounded in identity construction
– Know thyself & Know your enemies/friends, …
• Retrospective:
– Look back to look forward; Look forward to look back
• Social
– Who socialized you, how, and who will see the results
– Beliefs: I believe what you believe what I believe, etc.
• Ongoing and dynamic:
– Who or what changes over time and space
• Cues: what are the initiators?
• Plausibility rather than accuracy:
– Understanding only needs to be sufficient, not comprehensive

Sensemaking: An Application
• Mobile Crowdsensing:
– Individuals with sensing and computing devices collectively share information to measure and map phenomena of common interest within communities of people
• Participatory sensing: individuals are actively involved in contributing sensor data
• Opportunistic sensing: sensing is autonomous and user involvement is minimal
• Localized analytics at the device and near field
• Aggregate analytics at the central repository
• Privacy Issues: Potentially collecting sensitive sensor data pertaining to individuals
Ref: Ganti, R. K., F. Le, and H. Lei. ??. Mobile Crowdsensing: Current State and Future Challenges, IBM T. J. Watson Research Center, Hawthorne, NY

Knowledge-Centric Systems
These are the types of systems using advanced analytics:
• User-centric Systems: Systems That Know
– Knowledge-driven solutions that connect open information, share open decision-making rules, deliver open composite services, access and navigate information in context of use, and provide virtual assistants that manage cases and complete tasks.
• Adaptive Systems: Systems That Learn
– Knowledge-driven solutions that feature modeling, collaboration, and advanced analytics to detect patterns, make sense, simulate, predict, learn, take action, and improve performance with use and scale.
• Smart Operations: Systems That Reason
– Knowledge-driven solutions that reason like experts, advise as avatars, adapt, are autonomic, perform autonomously.

The New Analytic Paradigm
#1: You will be expected to do something with information
#2: There really is more to know
#3: You will have to know more about knowing
#4: Brain science and decision science are converging
#5: The environment is changing our brain
#6: Information management is the essence of leadership
#7: A more connected world means much more data is available (and accessible)
#8: Math matters (but so do logic and rules)
#9: There are significant downsides to not knowing
#10: Knowing can change the world
Source: Thornton May, The New Know: Innovation Powered by Analytics, 2009

Analytics Challenges

Finding A Needle in a Haystack
• With (all) the data available, find the/a key pattern that indicates a situational change
– A single event
– Perhaps, a sequence of events
– (Not the signal in the noise problem!!)
• Have we seen this pattern before?
– Determine its characteristics, not just that it exists
• Predict what event occurs next because this/these event(s) occurred in the pattern
• How to identify relevant fragments of data easily from a multitude of data sources?
• Difficult to determine what the right answer is in advance
Problem: The needle hasn’t grown as fast as the haystack!!
Problem: We need new analytics methods to deal with larger, more complex data and problems!!

Finding A Needle in a Haystack
• What if the “needle” happens to be a complex data structure?
– Brute force search and computation are unlikely to succeed due to inefficiency
– Complexity increases with streaming data as opposed to a static data set
• Absence of evidence (so far) is not evidence of absence! (Borne 2013)
• What preprocessing do we need to do before searching?
– Quality vs. Quantity: What data are required to satisfy the given value proposition?
– At what precision, accuracy, and reliability?
• What if the needle must be derived rather than found?
– How do we track the provenance of the derived data/information?
– Is the process repeatable as we change algorithms and data structures?
Challenge: Consider finding the few packets in the millions (er, tens of billions) flowing through a network that carry a virus or malware.

Needle(s) in a Haystack
Blogs
Uhmm! What are we supposed to glean from this picture?
Ref: Crawford, L. Access and Analytics to the UK Archive, British Library, 2010

Network Forensics
• Networks have become exponentially faster. They carry more traffic and more types of data than ever before. Yet as they get faster, they become more difficult to monitor and analyze.
– 40G Networks
– Richer Data: VoIP as the telephony standard
– Malicious security threats are more subtle
• Problems:
– Finding proof of a security attack
– Troubleshooting intermittent performance issues
– Identifying the source of data leaks
– Troubleshooting VoIP and Video over VoIP
• Network forensics must be:
– Precise: capture high-speed packets without droppage
– Scalable: extend to new network technologies and speeds
– Flexible: adapt to heterogeneous network segments
– VoIP-Smart: reconstruct & replay VoIP calls; present Call Detail Records (CDR) for each call
– Continuously available: run 24/7 with adequate storage; support real-time analysis

Finding the Knees
• The knee of an algorithm or analytic is the scale value at which the performance begins to degrade as larger data volumes are processed.
– Every analytic method and algorithm can have one (or more?)
– Where positive slope increases begin to flatten out
– Where positive or flat slopes transition to negative slopes
• Factors affecting the knee:
– data structure, volume, and variety
– algorithm complexity and implementation, and
– infrastructure implementation.
• What is/are the corollaries for non-algorithmic analytics?
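One simple way to locate a knee empirically is to measure performance at increasing scale and look for the first point where the slope flattens or turns negative. The sketch below assumes throughput-style measurements (higher is better) with a positive initial slope; the `flatten_ratio` threshold, the function name `find_knee`, and the sample numbers are illustrative assumptions, not values from the tutorial.

```python
def find_knee(scales, throughput, flatten_ratio=0.5):
    """Return the first scale value after which the local slope drops
    below flatten_ratio times the initial slope (flattening, or a
    transition to a negative slope). Returns None if no knee is seen."""
    slopes = [
        (throughput[i + 1] - throughput[i]) / (scales[i + 1] - scales[i])
        for i in range(len(scales) - 1)
    ]
    baseline = slopes[0]  # assumed positive: performance scales at first
    for i, slope in enumerate(slopes[1:], start=1):
        if slope < flatten_ratio * baseline:
            return scales[i]
    return None


# Hypothetical measurements: data volume (GB) vs. records processed per second.
scales = [1, 2, 4, 8, 16, 32]
throughput = [100, 200, 400, 780, 900, 850]
print(find_knee(scales, throughput))
```

Because the knee depends on data, algorithm, and infrastructure together, the measurements would need to be repeated whenever any of the three changes.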

Finding the Tipping Point
• A tipping point is one in which change in a system becomes potentially irreversible and maybe even unstoppable.
– May be associated with negative or positive effects
– In social systems, a buildup to a critical mass at which point a seminal change occurs.
– Ex: MySpace was a formidable competitor of Facebook, but once the Facebook membership reached its “tipping point” people started abandoning MySpace and signing up for Facebook.
• Small events can create ripple effects – may be linear or non-linear, chaotic or perturbative
• Concept of emerging trends in the commercial marketplace
• The explosion of a viral infection into an epidemic
Ref: Choucri, N., et al. (2006) Understanding and Modeling State Stability: Exploiting System Dynamics. MIT Sloan Research Papers, No. 4574-06, Jan. 2006.

Spinning Straw Into Gold
• With (all) the data available, describe a situation in a generalized form such that predictions for future events and prescriptions for courses of action can be made.
• Objective: Identify one or more patterns that characterize the behavior of the system.
• Remember: All data has value to someone, but not all data has value to everyone.
• Patterns may be unknown or ambiguously defined.
• Patterns may be morphing over time.
• The problem is sensemaking: the dual process of trying to fit data to a frame or model and of fitting a frame around the data. Neither data nor frame comes first! They must evolve concurrently!

Dealing with Ambiguity
• Ambiguity arises from entity resolution (ER):
– Not enough data to explicitly resolve two or more entities or objects
– 1st Degree ER: “who is who”? (within domain)
– 2nd Degree ER: “who knows whom” (a graph/network analysis problem)
– 3rd Degree ER: cross-domain linkage (ontological resolution)
• Example: Text Processing and Understanding:
– Resolving ambiguity in human languages (much data is unstructured text)
• Ex: The word “strike” has over 30 meanings in English
– Entity resolution is a multi-level process (Talburt 2009-2011)
• Ex: There are more than 45,000 people named “John Smith” in the U.S.
• Computational complexity increases with knowledge level.
– Tradeoff is end-to-end processing time versus number of entities to be resolved
• Scaling may be problematic.

Talburt’s Hierarchy for Entity Resolution (Method: Description)
• Deterministic Matching: Link/merge two entity mentions based on the degree of similarity between the values of corresponding entity attributes. Ex: Direct match of names, addresses, other attributes.
• Probabilistic Matching: Link/merge two entities based on corresponding attributes, even if some have different values (but within the expected range). Ex: “John, Doe, 1989-08-13” and “Jon, Doe, 1989-08-13”.
• Transitive Matching: Link/merge two entities based on corresponding attributes: A matches B and B matches C, so A matches C, even if not all attributes match. But, may lead to false positives.
• Associative Matching: Link/merge two entities based on semantics and domain knowledge. If (Mary Smith, 123 Oak St), (Mary Smith, 456 Elm St), (John Smith, 123 Oak St), and (John Smith, 456 Elm St) are entity mentions, none of the six possible pairings of the four records agrees on both name and address. But, we may infer that these are the same John and Mary Smith at both addresses.
• Assertive Matching: Link/merge two entities based on prior and derived knowledge to reason about possible relationships between them. Different models may be used to deduce/infer relationships. Ex: “The Mary Smith who lives with her brother and resided at 123 Oak St. now resides at 456 Elm St.”
Ref: Talburt, J. (2009-2011) Reference Linking Methods, Identity Resolution Daily, Retrieved November 3, 2013 from http://identityresolutiondaily.com/
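The first three levels of the hierarchy can be sketched in a few lines. This is an illustrative approximation, not Talburt's method: exact equality stands in for deterministic matching, averaged string similarity from Python's standard `difflib.SequenceMatcher` stands in for probabilistic matching, and a union-find pass stands in for transitive matching. The 0.85 threshold and all function names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def deterministic_match(a, b):
    """Direct match: every corresponding attribute is identical."""
    return a == b


def probabilistic_match(a, b, threshold=0.85):
    """Tolerant match: average per-attribute string similarity must
    reach the (assumed) threshold, so small spelling variations pass."""
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(scores) / len(scores) >= threshold


def transitive_clusters(records, match):
    """Merge records into clusters via union-find: if A matches B and
    B matches C, all three land in one cluster even when A and C do
    not match directly (hence the risk of false positives)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match(records[i], records[j]):
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


records = [
    ("John", "Doe", "1989-08-13"),
    ("Jon", "Doe", "1989-08-13"),
    ("Jane", "Roe", "1975-01-02"),
]
print(transitive_clusters(records, deterministic_match))
print(transitive_clusters(records, probabilistic_match))
```

The pairwise comparison loop is quadratic in the number of records, which is one concrete reason scaling entity resolution is problematic; production systems typically add blocking or indexing before comparing.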

Advanced Analytics

Crowdsourcing: A New Analytic?
• Tasks normally performed by employees are outsourced via an open call to a large, self-selected community
• Some examples:
  – Netflix Prize
  – InnoCentive: solve R&D challenges
  – DARPA Network Challenge
• Follows an AI blackboard model
• Distributed co-creation has become a mainstream technology
• Issue: What metrics do we use to assess the results?
• Issue: How robust are the results?
• And more issues to be addressed …

Crowdsourcing: Wikipedia: An Example
The Death of the Encyclopedia Business Model
Ref: Mary Meeker, KPCB, presentation at All Things D.

Moving Analytics To the Edge
• Traditional analysis: data is stored and then analyzed
  – Usually at some central location or a few distributed locations
  – The cost and time to move large amounts of data may render it obsolete or of little worth
• Moving analytics to the frontier of the domain:
  – (Near) real-time analysis and decisions are required
  – Streaming massive amounts of data is expensive and fraught with error: microsecond latency, millions of events
  – We just can’t store it all
  – Perishability is a key factor
  – May only really need the synthesized, aggregate information, not the raw data
  – True data-driven analysis
• Example: Pushing analytics into cameras for images, full motion video analysis, motion correction, 3D perception, …

Edge Device Analytics
• Filtering and compression processes are often tied to the downstream analytical requirements
  – Means the filtering and compression algorithms must be dynamic
• How close to the edge can we push the filtering and compression algorithms?
  – What (additional) data do these algorithms need to be effective?
  – How do we measure the efficiency of these algorithms?
  – Does the in situ hardware have the computational capacity to support such algorithms?
  – How much data correction can we do at the edges?
• Challenges:
  – How can we summarize streaming data?
  – How fast can we determine changes in the incoming data?
  – How fast can we adapt to changes in the data stream?
  – How fast can we affect the environment based on what we see?
NOTE: Not the same as IBM Edge Analytics!!
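One way to approach the first two challenges on a device with little memory is to keep a running summary instead of the raw readings. The sketch below is an assumption-laden illustration: it uses Welford's online algorithm for mean and standard deviation, and a simple z-score rule (threshold 3.0, warm-up of 5 readings, both arbitrary) to flag changes. The class and function names are hypothetical.

```python
import math

class RunningSummary:
    """Constant-memory summary of a stream (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation of the readings seen so far."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

def flag_changes(stream, z_threshold=3.0, warmup=5):
    """Yield (index, value) for readings far from the running mean."""
    summary = RunningSummary()
    for i, x in enumerate(stream):
        if summary.count >= warmup and summary.std > 0:
            if abs(x - summary.mean) / summary.std > z_threshold:
                yield i, x
                continue  # keep the flagged reading out of the summary
        summary.update(x)

readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]
print(list(flag_changes(readings)))
```

Only three numbers are stored per stream, so only the aggregate (and any flagged readings) needs to leave the device. Excluding flagged readings from the summary is a design choice; it keeps one spike from inflating the baseline, but it also means a genuine, lasting shift would keep being flagged until the summary is reset.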

Streaming Analytics
• Streaming data: analytics must (often) occur in real time, as the data passes through the sensing/collecting device:
  – Allows you to identify and examine patterns of interest as the data is being created.
  – Can yield instant insight and immediate (re)action.
• “Real-time is for robots” (Joe Hellerstein, Professor of Computer Science at UC Berkeley)
  – “If you have people in the loop, it’s not real time. Most people take a second or two to react, and that’s plenty of time for a traditional transactional system to handle input and output.”

Location Analytics
• What is it?
  – Augmenting mission-critical, enterprise business systems with complementary content, mapping, and geographic capabilities
  – Mapping & Visualization: use maps as the medium to visualize data
  – Spatial analytics: merging GIS with other types of analytics
  – Find spatio-temporal patterns indicative of physical activities or social behavior
  – Data/information enrichment: add maps, imagery, demographics, consumer and lifestyle data, environment and weather, social media, etc.
• Ubiquity of GPS on cellphones, cars, wristwatches, laptops, tablets, etc.
Ref: Kerr & Nelson, ESRI International User Conference, July 2012
Ref: http://www.esri.com/software/location-analytics

Location Analytics: After a Regime Change …
• Immediate transition from “normal operations in an urban environment” to “government in place”
  – Need Civil Affairs support for managing city/region relief efforts
• No existing analysis, planning, and management tool to assist civil government and relief officials
• A lack of a coherent view will lead to failure and, possibly, continuing deterioration of government, law and order, civil systems, etc.
• Why?
  – (Nearly) complete destruction of the governing regime
  – Disintegration of social/governmental institutions: water, education, health, law enforcement, financial, food
  – Society reorganized overnight to adapt to the new power structure, which may or may not include former government personnel
    • Ex: Sadr handed out food; created an immediate constituency and had people talking about him
• Cannot win “hearts and minds” if you cannot “feed and house them”

Web Analytics
• What is it?
  – Now: The study of the behavior of web users
  – Future: The study of one mechanism for how society makes decisions
  – Example: Behavior of Web Users
    • How many people clicked on Ebola (or related terms) in the past 2 months?
    • Their location, their dwell time, the number of sites they examined, the difficulty or complexity of the material on the web site
    • What can this tell us about popular concern about Ebola?
    • Can it help decision makers to better present information and decisions?
  – Commercially, it is the collection and analysis of data from a web site to determine which aspects of the website achieve the business objectives

Web Analytics
• Challenge: When people start getting more of their information from the Web than from radio/TV/papers:
  – How do we design web pages to influence the opinions of people?
    • The web is not a neutral medium
  – How do we measure the influence of a web page’s contents on user opinions?
    • Does # visitors translate into a viable influence metric?
    • Does dwell time translate into influence?
    • Are opinions more or less influenced as the visitor clicks through a sequence of pages?
    • How many different web sites (each of one or more pages) does a user searching a particular topic click through?
  – How do we measure the difficulty and/or complexity of the material presented on a web page(s)?
    • What is the corollary to the Flesch-Kincaid Score for web pages?
  – How do we design “good” web pages to increase their Google PageRank score?
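For reference, the plain-text baseline the slide alludes to is the Flesch-Kincaid grade level: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below computes it for a string of page text. The syllable counter is a rough vowel-group heuristic and is an assumption of this example; it does not answer the slide's question, since it ignores layout, links, images, and everything else that makes a web page harder or easier than its text alone.

```python
import re

def count_syllables(word):
    """Rough heuristic: count runs of vowels, with at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of a plain-text string."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

simple = "The cat sat. The dog ran."
dense = ("Multidimensional visualization necessitates "
         "sophisticated dimensionality reduction.")
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(dense))
```

Short sentences of one-syllable words score below grade zero, while long, polysyllabic sentences score far higher, which is the behavior any web-page analogue would presumably need to preserve.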

Visual Analytics
• Visual analytics: the science of analytical reasoning facilitated by interactive visual interfaces
[Figure: Where are the walkers going?]
Remember: The eye can be easily fooled and, thereby, fool the brain.

Visual Analytics
• Visual analytics: an evolving discipline which is driving new ways of presenting data and information to the user.
• Visual analytics: “the science of analytical reasoning facilitated by interactive visual interfaces” (Thomas and Cook 2005)
• Visual analytics:
  – the formation of visual metaphors in combination with a human information discourse (interaction)
  – that enables detection of the expected and discovery of the unexpected within massive, dynamically changing information spaces (Wong and Thomas 2004)
• Visual analytics provides the “last 12 inches” between the masses of information and the human mind that enables us to make decisions.

Visual Analytics
• Why is it hard?
  – You can only see 2D because your screen is 2D
• To visualize k-dimensional data:
  – Divide the screen into multiple 2D regions and show pair-wise correlations across selected dimensions
  – Project k-dimensional data into 2D
• Projection Methods (Dimension Reduction):
  – PCA, MDS, LDA, LLE, and many others
  – Usually, try to preserve distances in 2D as they exist in k-D
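As an illustration of one projection method named on the slide, here is a small PCA sketch using only the standard library. It is a teaching example under stated assumptions, not a production implementation: the eigenvectors are found by power iteration with deflation (a choice made here for brevity), the function names are hypothetical, and real work would normally use a numerical library.

```python
import math
import random

def center(rows):
    """Subtract the column means from every row."""
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
    return [[r[j] - means[j] for j in range(k)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, k = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(k)]
            for a in range(k)]

def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: repeatedly multiply and renormalize a vector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project k-dimensional rows onto their two leading principal axes."""
    x = center(rows)
    cov = covariance(x)
    k = len(cov)
    axes = []
    for _ in range(2):
        v = top_eigenvector(cov)
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k))
                  for a in range(k))
        axes.append(v)
        # Deflation: remove this component so the next one can emerge.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(k)]
               for a in range(k)]
    return [[sum(r[j] * axis[j] for j in range(k)) for axis in axes]
            for r in x]

# Hypothetical 3-D points that happen to lie in a plane.
points = [[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0], [2, 2, 0]]
for p in pca_2d(points):
    print([round(c, 3) for c in p])
```

Because the sample points lie in a plane, the 2D projection preserves their pairwise distances; for genuinely higher-dimensional data some distortion is unavoidable, which is the tradeoff the slide points to.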

Visualization Analytics: A Periodic Table

Visual Analytics: Challenges
• Projection methods:
  – How to choose which is best for a given problem
  – Scaling
  – Harder to understand what is being conveyed
• How to visualize non-numeric data, e.g., text, icons, or images?
  – Interactive multiple displays
  – Change in one display begets a change in another to allow exploration
• As k grows really large and the data types in the problem space are mixed, what do we do?
  – Remember, Miller 1956: 7 +/- 2

Visual Analytics: Challenges
• What if the data cannot fit on your computer?
  – Truncate (sample, filter)
    • Easy to implement; efficient; scalable
    • Sampling is often data- or task-dependent
  – Resolution reduction (“blurring”, image zooming)
    • Fine details can be lost (get the big picture)
    • Can zoom in on specific features (but lose forest for trees)
  – Streaming:
    • Inspect data in blocks (with or without overlapping windows)
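Two of the options above, sampling and block-wise streaming, can be sketched briefly. The example assumes the data arrives as a Python iterable of unknown length; reservoir sampling is one common way to draw a uniform sample in a single pass, and the window generator shows blocks with or without overlap. Function names and parameters are illustrative.

```python
import random
from collections import deque

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample

def windows(stream, size, step):
    """Yield blocks of `size` items; step < size gives overlapping windows."""
    if not 0 < step <= size:
        raise ValueError("step must satisfy 0 < step <= size")
    buffer = deque(maxlen=size)
    since_last = 0
    for item in stream:
        buffer.append(item)
        since_last += 1
        if len(buffer) == size and since_last >= step:
            yield list(buffer)
            since_last = 0

print(reservoir_sample(range(1_000_000), 5, seed=42))
print(list(windows(range(6), size=4, step=2)))
```

Both functions hold at most `k` or `size` items in memory, so the full dataset never has to fit on the machine doing the visualization. A trailing partial block is dropped by `windows`; whether that is acceptable depends on the task.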

Visual Analytics: Affecting Factors
• Spatial Ability
• Cognitive Workload/Mental Demand
• Personality
• Experience (novice vs. expert)
• Emotional State
• Perceptual Speed
• … and more
Conclusions:
  - Computer must be more aware of the user
  - Computer must develop a model of the user’s behavior
  - Develop a symbiotic environment for data exploration

10 Exascale VA Challenges - I
• In-Situ Analysis: Beyond PBytes, storing data and then retrieving it later for visualization may not be feasible. Develop new algorithms for in-situ VA to greatly reduce I/O. Perform VA concurrently with data analysis.
• User-driven Data Reduction: While data volumes grow rapidly, human cognitive capabilities remain unchanged. Provide flexible, interactive user-control mechanisms for dynamically filtering data for VA.
• Multilevel Hierarchy: Depth and complexity grow with data volume. New algorithms are required for transformation and traversal of multilevel hierarchies.
• Representing Evidence and Uncertainty: Evidence synthesis and uncertainty quantification are usually united through visualization. How best to present evidence and uncertainty without introducing significant bias?
• Heterogeneous Data Fusion: Extreme scale problems are often heterogeneous with complex structures. New algorithms are required for fusion of heterogeneous data objects.
Ref: Wong, Shen, Johnson, Chen and Ross (2012)

10 Exascale VA Challenges - II
• Data Summarization and Triage for Interactive Query: Analyzing entire exascale data sets is likely impractical. New tools for interactive filtering of data for analysis and display of selected, relevant data are required.
• Temporally Evolved Features: Develop tractable algorithms for VA of temporal streams.
• Mitigate the Human Bottleneck: Find ways to compensate for human cognitive limitations. Need to understand how the brain processes visual images.
• New VA Frameworks for HPC: Design and develop new frameworks with open APIs for interaction and UI that do not constrain HPC systems.
• Replace Conventional Wisdom: New ideas are required for VA methodologies.
Ref: Wong, Shen, Johnson, Chen and Ross (2012)

Context-Aware Computing
• Arose from ubiquitous computing in the early 90s: computing everywhere and “invisible”
• Ex: Active Badges
  – Problem: locating researchers
  – Solution: badge tied to identity, tracked as researcher moves in building
  – Xerox PARC pioneered this idea in the 90s under Mark Weiser
  – Now used in museums and other venues to “customize” the user experience
  – Can be based on RFIDs embedded in badges

What Is Context?
• By example
  – Location, time, identities of nearby users …
• By synonym
  – Situation, environment, circumstance
• By dictionary [WordNet]
  – The set of facts or circumstances that surround a situation or event
• “Context is any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves.” [Dey and Abowd, 2000]

Personal AI: Context-Aware Computing
• No consensus on what it is!
• “A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user’s task.”
• For our purposes, it is symbiotic computing:
  – A partnership between human and machine where each is aware of the other’s capabilities and limitations
  – Each is a reasoning, autonomous, adaptive entity
  – Each may initiate an activity and propose/dispose of ideas
• Contradicts Weiser’s first view of Ubiquitous Computing
  – Computing recedes into the background
• Need to define what Context is:
  – Location, time, situation, data/information

Context-Aware Requirements
• John McCarthy defined the key ideas (for a robot), but we adapt them to a general situation:
  – The system observes its physical environment, recognizes the status of its effectors, notices the relation of itself to the environment, and notices the values of important internal variables, e.g., the state of its power supply and of its communication channels.
  – Observes that it does/does not know the value of a certain term
    • E.g., observing whether it knows the telephone number of a certain person.
  – Observing that it does know the number or that it can get it by some procedure is likely to be straightforward.
  – Keeping a journal of physical and intellectual events so it can refer to its past beliefs, observations and actions.
  – Observing its goal structure and forming sentences about it.
    • Notice that merely having a stack of subgoals doesn't achieve this unless the stack is observable and not merely obeyable.

Context-Aware Requirements
• John McCarthy continued:
  – The entity may intend to perform a certain action.
    • It may later infer that certain possibilities are irrelevant in view of its intentions.
    • This requires the ability to reflect on its intentions.
  – Observing how it arrived at its current beliefs.
    • Most of the important beliefs of the system will have been obtained by nonmonotonic reasoning, and therefore are usually uncertain.
    • It will need to maintain a critical view of these beliefs, i.e., believe metasentences about them that will aid in revising them when new information warrants doing so.
    • It will presumably be useful to maintain a pedigree for each belief of the system so that it can be revised if its logical ancestors are revised.
    • Reason maintenance systems maintain the pedigrees but not in the form of sentences that can be used in reasoning.
    • Neither do they have introspective subroutines that can observe the pedigrees and generate sentences about them.

Context-Aware Requirements
• John McCarthy continued:
  – A system should be able to answer the questions: “Why do I believe ?” or alternatively “Why don't I believe ?”.
  – Contexts need to be modeled as objects that represent mental states of events/things/people in the world around it.
  – The ability to transcend one's present context and think about it as an object is an important form of introspection.
  – Knowing what goals it can currently achieve and what its choices are for action.
    • He claims that the ability to understand one's own choices constitutes free will.
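The idea of a belief pedigree, together with an introspective routine that can generate sentences about it, can be made concrete with a toy sketch. This is not McCarthy's formalism or any existing reason maintenance system; the `BeliefStore` class, its methods, and the sample beliefs are all hypothetical and only illustrate the bookkeeping involved.

```python
class BeliefStore:
    """Beliefs with pedigrees: each belief records what it was derived from."""

    def __init__(self):
        self.support = {}    # belief -> set of parent beliefs (empty = observed)
        self.suspect = set() # beliefs that need review after a revision

    def add(self, belief, derived_from=()):
        self.support[belief] = set(derived_from)

    def revise(self, belief):
        """Mark a belief and everything derived from it as needing review."""
        pending = [belief]
        while pending:
            current = pending.pop()
            if current in self.suspect:
                continue
            self.suspect.add(current)
            pending.extend(b for b, parents in self.support.items()
                           if current in parents)

    def why(self, belief):
        """Generate a sentence about the pedigree of a belief."""
        parents = self.support.get(belief)
        if parents is None:
            return f"I do not believe {belief!r}."
        if not parents:
            return f"I believe {belief!r} because I observed it."
        return (f"I believe {belief!r} because of: "
                + ", ".join(sorted(parents)) + ".")

store = BeliefStore()
store.add("door is open")
store.add("room is cold", ["door is open"])
store.add("heater should run", ["room is cold"])
store.add("light is on")
store.revise("door is open")
print(sorted(store.suspect))
print(store.why("room is cold"))
```

Revising one observation marks its logical descendants as suspect while leaving unrelated beliefs alone, and `why` turns the stored pedigree into a sentence the system could, in principle, reason over.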

Context-Aware Requirements
• On the computer side of the partnership:
  – System must be dynamically extensible:
    • One can incorporate a new module with new functionality, incrementally, without the need for recompilation
    • Indeed, the system environment should be (nearly) wholly self-contained.
  – Self-Modifying:
    • Ability to generate new or revised modules that change the functionality of the system
    • Modules may affect any representation or control mechanisms of the system
  – Self-Adapting:
    • System must be able to integrate new modules into its computational repertoire without ceasing operations, except for snapshotting and restarting critical processes.
    • Safety and health of the system must be ensured.
• Note: Humans already do this, but not very well at times.

Context-Aware Computing
• Challenges:
  – New situations don’t fit examples
  – How to use in practice?
    • Presentation to user
    • Types of analytics
  – How to model context in a computational environment?
    • Spatial
    • Temporal
    • Dynamically changing: velocity, volume, …
    • Is it a Big Data problem?

Advanced Analytics: Critical Challenges

What Are Grand Challenges?
• Definition: A specific scientific or technological innovation that would remove a critical barrier to solving an important domain problem with a high likelihood of global impact and feasibility.
  – Provide scope for engineering ambition to build something that has never been seen before.
  – Generally comprehensible, and capture the imagination of the general public, as well as the esteem of scientists in other disciplines.
  – Go beyond what is initially possible, and require development of understanding, techniques and tools unknown at the start of the project.
  – Since these first appeared in the 80s, they abound in every science and discipline!
• Not just a restatement of the many “big problems” facing the world today
  – Are they equivalent to Wicked Problems? Or are Grand Challenges themselves Wicked Problems?
• A tool for focusing investigators working towards overcoming one or more bottlenecks in a foreseeable path toward a solution to significant domain problems.

Some Grand Challenges
• Modeling Our Planet’s Systems:
  – Assessing global warming and determining mitigating actions
• Confronting Existential Risk:
  – What is the impact of a dangerous genetically modified pathogen?
• Exploring Transhumanism:
  – What is the impact of embedded nanotechnology, genetic therapy, and “smart” prosthetics?
• The Singularity?
  – What happens when systems approach the level of human intelligence? Emotional intelligence?
• Dealing Effectively with Globalism:
  – Modeling the interconnectedness of human societies/organizations
Ref: Martin, J. “The Meaning of the 21st Century: A Vital Blueprint for Ensuring Our Future”, Jan 2007

Wicked Problems
• In 1973, Horst Rittel and Melvin Webber formally described the concept of wicked problems.
• Conventional problem solving methods, rooted in 18th century physics, economics and engineering, focused on efficiency.
• Societal problems are fundamentally different from the types of problems that scientists and engineers deal with.
• Societal problems are wicked problems.

Tenets of Wicked Problems
• No general agreement on what the problem is.
• You don’t understand the problem until you develop a solution.
• Wicked problems have no stopping rule.
• Solutions to wicked problems are not right or wrong.
• Every wicked problem is essentially novel and unique.
• Every solution to a wicked problem is a “one shot operation”.
• Wicked problems have no given alternative solutions.
• Causes and effects are elusive.
• Sensitive to initial and boundary conditions (history).
Figure 3. Wicked Problems (Conklin, 2005)

Some Wicked Problems
• Infrastructure Resilience
• Climate Change
• “Peaking” Oil or Coal: When does it run out?
• The “Long War”: Is there an end to terrorism?
• Sustainable Cities and Ecosystems
• Sustainable Development in the Third World
• Affordable Health Maintenance for an Aging Society
• Transitioning to Democracy and Beyond
  – Predicting the next “Arab Spring”
• Biological and Genetic Threats and Opportunities
• Reducing the U.S. Debt
• Discover drugs that minimize disease-resistant micro-organisms

Analytics Challenges
• The Google Property questions:
  – Can analyses improve with more data to process?
  – Can analyses improve with more detailed analytics?
• Kaisler, Armour, Espinosa, Money:
  – Can analyses improve with better system and environment models?
  – How do we measure the value of an analytic?
  – What is the limit for value as we add more data?
  – Can good algorithms, models, and heuristics overcome data quality problems?
  – With more data to analyze, can Big Data improve decision-making? And by how much (i.e., how do we measure it)?

Challenge: Population Imbalance
• Events of interest occur relatively infrequently in very large datasets.
• However, non-interesting events occur more frequently:
  – May require {additional, extensive} computational effort.
• Reasons for Imbalance:
  – Underrepresented data/severe class distribution skew:
    • Large regions of the problem space may be covered sporadically or not at all by the observer or observation instruments.
    • Impacts the precision and performance of data mining/machine learning algorithms (He and Garcia 2009).
  – Data collection may be (usually is) imperfect.
  – Data are often beset with noise.
  – Data may be missing in longitudinal or temporal sequences.
• Enough relevant data of good quality may not be available to permit robust analysis.
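A small worked example shows why class skew distorts performance measures. The data here is hypothetical (10 events of interest among 1,000 observations), and the "classifier" simply predicts the majority class every time; the helper functions are illustrative.

```python
def confusion(actual, predicted, positive=1):
    """Return (true pos, false pos, false neg, true neg) counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn

def metrics(actual, predicted, positive=1):
    tp, fp, fn, tn = confusion(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical data: 10 events of interest among 1,000 observations.
actual = [1] * 10 + [0] * 990
always_negative = [0] * 1000  # a "model" that never flags anything

print(metrics(actual, always_negative))
```

The do-nothing model scores 99% accuracy while finding none of the events of interest, so accuracy alone says little on imbalanced data; precision and recall on the rare class are more informative.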

Challenge: Data Analysis
• Feature Selection: Information is distributed in a complex way across many features.
• Mitigating False Alarms: Target patterns are ambiguous/unknown; “squelch” settings are brittle; cannot prevent false positives.
• Domain Drift: Target patterns change/morph over time and across operational modes (processing methods become “stale”).

Challenge: Ethical Problems
• Should we use data without the permission of individual owners, such as copying publicly available data?
  – What is tacit permission and approval, anyway?
• Should we be required to inform individuals when we use their data?
  – Do they really own it? Or, how much of it?
• Should we (be required to) check the accuracy of what is posted on a publicly available web site before using it?
• What rules and regulations should exist about combining data about individuals into a central repository?
  – Does aggregation exceed permissible need to know about an individual?

Challenges: Data Annotation
• The semantic web is largely unrealized:
  – Much metadata is lousy
  – Standardization of models and labels is a major issue
  – Integrating ontologies and vocabularies is a critical problem
• No standards for social tagging, science tagging, etc.
• Tagging only works if many are tagging
• What is the quality of the result if the quality of the data/metadata is poor?
• Mark Greaves: “If we don’t have semantic convergence, then semantics isn’t a differentiator”
• The “information provenance” problem

Challenges: End-to-End Systems
• Wicked Problems will require an end-to-end system approach:
  – Multiple analytics: cascade or mesh or other topology for the analytic architecture
    • Why? Different types of data (may) require different approaches
  – Need a robust, reliable computing/infrastructure environment
• Pipelines are a relatively simple model; may not be adequate for the complexity of the problem
• Must consider end-to-end behavior:
  – Bottlenecks
  – Data conversion, transfer, & loading overheads
  – Storage costs & other parts of the data life-cycle
  – Resource management challenges
  – Total Cost of Ownership (TCO)

Challenges: Data Sources
• Tolerant Analysis: you are typically doing open-world reasoning
  – Things go away
  – Contradiction is present
  – Data is incomplete and may be erroneous/noisy
  – Surveying the KB may not be possible (it is too large!!)
  – Need substantial domain knowledge to reason effectively
    • Both deep and shallow knowledge
• Multiple linked ontologies & data sources:
  – Single ontologies are feasible only at the organizational level
  – How to reconcile multiple authors and overlapping data sources
  – Contain both private and public knowledge with security, privacy & separation issues
    • Is this equivalent to the multilevel security problem?
• Heterogeneity and Incompleteness:
  – Humans are very tolerant of heterogeneous data; computers are not
  – Need to perform dynamic cleansing, curation, and transformation
  – How to build programs that accept heterogeneous data

Challenge: Metrics

Where is the ROI?
• ROI (Return on Investment) is not always immediately obvious
• Results of analytics may be available only after years of following the prescription
• Requires long-term effort(s) to develop a sustainable capability
• Examples:
  – Health: moving from predictive to preventative health care
  – Health: enabling personalized medicine for shortened time to value
  – Health: recognizing and predicting the spread of infectious diseases (Ebola)
  – Crime: aggressively recognizing and combating syndicated, multiparty fraud online
  – Crime: predicting potential crime locales and times to preventatively deploy police
  – Environment: predicting weather, floods, earthquakes, volcanic eruptions earlier
  – Computer Security: surveying systems to predict potential for attacks

Advanced Analytics Key Challenges Today for Tomorrow
• Analytic Scientists, or the lack thereof
  – How are we going to train them?
  – How many do we need?
• Potential to end up like “data mining” (shudder)
  – The Big Data mantra: “80% of the effort is in extracting, moving, cleaning, and preparing the data, not actually analyzing it.”
• Don’t disregard Traditional Analytics:
  – Big Data Analytics and Advanced Analytics will be side by side for years
  – We need an analytics capability beyond what is offered by traditional business analytics

Analytics: Transformative Science
• Making Analytics a Transformative Science:
  – Computationally tractable reasoning algorithms
  – Explicit models for prediction, prescription and decision versus implicit or embedded models in technology
  – Enabling technologies and infrastructures for modeling and analysis, including interoperability interfaces and standards: multiple suites of analytic tools
  – Model validation and verification
  – Ensuring availability of appropriate, quality data
  – Uncertainty quantification and predictability of outcomes
  – Community-developed versus custom-made software
  – Peer review for analytics and modeling research; open repositories
  – Analytic Method Capture and Reuse

Is this …? Oooops! We mean the Exabyte Age!

Questions

Who We Are

Stephen H. Kaisler, D.Sc.
Senior Associate/PCI Strategic Management & Principal/SHK & Associates
Columbia, MD/Laurel, MD
skaisler [email protected] net
Dr. Stephen Kaisler is currently a Senior Associate at PCI Strategic Management and a Principal in SHK & Associates. He has previously worked for DARPA, the U.S. Senate and a number of small businesses. Dr. Kaisler has worked with big data, MapReduce technology, and advanced analytics in support of the ODNI CATALYST program. He has been an Adjunct Professor of Engineering since 2002 in the Department of Computer Science at George Washington University. Recently, he has also taught enterprise architecture and information security in the GWU Business School. He earned a D.Sc. (Computer Science) from George Washington University, and an M.S. (Computer Science) and B.S. (Physics) from the University of Maryland at College Park. He has written or co-authored seven books and published over 38 technical papers.

William H. Money, Ph.D.
School of Business Administration, The Citadel
[email protected] edu
William Money joined The Citadel as Associate Professor of Business Administration in 2014. Previously, he was with the George Washington University School of Business faculty, which he joined in September 1992 after acquiring over 12 years of management experience in the design, development, installation, and support of management information systems (1980-92). His publications and recent research interests focus on information system development tools and agile software engineering methodologies, collaborative solutions to complex business problems, program management, business process engineering, and individual learning. He developed teaching and facilitation techniques that prepare students to use collaboration tools in complex organizations and dynamic work environments experiencing significant change. Dr. Money has a Ph.D., Organizational Behavior, 1977, Northwestern University, Graduate School of Management; an M.B.A., Management, 1969, Indiana University; and a B.A., Political Science, 1968, University of Richmond.

Who We Are

Frank Armour, Ph.D.
Kogod School of Business, American University
farmour@american.edu

Dr. Armour is an independent senior IT consultant and Research Fellow at the Center for Information Technology in the Global Environment (CITGE), Kogod School of Business, American University. Dr. Armour has extensive experience applying advanced information technology. His work and research include business and requirements analysis, enterprise architectures, the System Development Life Cycle (SDLC), and object-oriented development. Dr. Armour has consulted for both government and private organizations on the effective application of enterprise architecture, IT governance, and system requirements approaches. In a previous position at a major IT consulting firm, he had a joint appointment as the lead Object Methodologist and as the Assistant Director of the Object Technology Lab. In this position he provided guidance and in-depth mentoring to object projects on object concepts, architecture, project management, methods, and tools.

J. Alberto Espinosa, Ph.D.
Kogod School of Business, American University
[email protected] edu

Dr. Espinosa is currently Professor and Chair of Information Technology at the Kogod School of Business, American University. He holds Ph.D. and Master of Science degrees in Information Systems from Carnegie Mellon University, Graduate School of Industrial Administration; a Master's degree in Business Administration from Texas Tech University; and a Mechanical Engineering degree from Universidad Catolica, Peru. His research focuses on coordination and performance in global technical projects across global boundaries, particularly distance and time separation (e.g., time zones). His work has been published in leading scholarly journals, including: Management Science; Organization Science; Information Systems Research; the Journal of Management Information Systems; Communications of the ACM; Information, Technology and People; and Software Process: Improvement and Practice. He is also a frequent presenter at leading academic conferences.

BDA-90

Thank You!! BDA-91

References

• Argenta, C., J. Benson, N. Bos et al. 2014. “Sensemaking in Big Data Environments”, 1st Workshop on Human-Centered Big Data Research, Raleigh, NC
• Borne, K. 2013. Statistical Truisms in the Age of Big Data, retrieved December 2013 from http://www.statisticsviews.com/details/feature/4911381/Statistical-Truisms-in-the-Age-of-BigData.html
• Clarkson, A. 1981. Towards Effective Strategic Analysis, Westview Press, Boulder, CO
• Davenport, T. H. and J. G. Harris. 2007. Competing on Analytics: The New Science of Winning, Harvard Business School Press
• Felten, E. 2010. Needle in a Haystack Problems, retrieved November 1, 2013 from https://freedom-to-tinker.com/blog/felten/needle-haystack-problems/
• Gladwell, M. 2000. The Tipping Point: How Little Things Can Make a Big Difference, Boston: Little Brown
• He, H. and E. A. Garcia. 2009. “Learning from Imbalanced Data”, IEEE Transactions on Knowledge and Data Engineering, 21(9): 1263-1284
• Heuer, R. J., Jr. 1999. Psychology of Intelligence Analysis, Center for the Study of Intelligence, Central Intelligence Agency, Washington, D.C.
• Kaisler, S. 1990. Strategic Automated Discovery System (STRADS), with C. Oresky, A. Clarkson, and D. B. Lenat, published in Knowledge Based Simulation: Methodology and Application, ed. by P. Fishwick and D. Modjeski, Springer-Verlag, December 1990

BDA-92

References

• Kaisler, S. 2005. Software Paradigms, New York, NY: John Wiley & Sons
• Kaisler, S. and C. Cioffi-Revilla. 2007. Quantitative and Computational Social Sciences Tutorial, 40th Hawaii International Conference on System Sciences, Waikoloa, HI
• Kaisler, S. 2012. Advanced Analytics, Technical Report prepared under contract AFRL #AFRL FA8750-11-C-0045
• Kaisler, S., F. Armour, A. Espinosa, and W. Money. 2013. “Big Data: Issues and Challenges Moving Forward”, 46th Hawaii International Conference on System Sciences, Grand Wailea, Maui, HI
• Kaisler, S., F. Armour, A. Espinosa, and W. Money. 2014. “Advanced Analytics: Issues and Challenges”, 47th Hawaii International Conference on System Sciences, Hilton Waikoloa, Big Island, HI
• Kaisler, S., F. Armour, W. Money, and A. Espinosa. 2014. “Big Data: Issues and Challenges”, Encyclopedia of Science and Technology, 3rd Edition, IGI Global
• Kaisler, S., F. Armour, A. Espinosa, and W. Money. 2014. “Advanced Analytics: Issues and Challenges”, Encyclopedia of Science and Technology, 3rd Edition, IGI Global

BDA-93

References

• Ritchey, T. 2005. Wicked Problems: Structuring Social Messes with Morphological Analysis, Swedish Morphological Society. Retrieved October 30, 2013 from http://www.swemorph.com/wp.html
• Rittel, H. and M. Webber. 1973. “Dilemmas in a General Theory of Planning”, Policy Sciences, Vol. 4 (pp. 155-169). Amsterdam, the Netherlands: Elsevier Scientific
• Schwartz, P. M. 2010. Data Protection Law and the Ethical Use of Analytics, The Centre for Information Policy Leadership, Hunton & Williams, LLP
• Singh, L., E. J. Bienenstock and J. Mann. 2010. “What Are We Missing? Perspectives on Social Network Analysis for Observational Scientific Data”, Handbook of Social Networks: Technologies and Applications, ed. B. Furht, Springer
• Stanton, J. 2013. Version 3: An Introduction to Data Science, http://jsresearch.net/
• Suchman, L. 1987. Plans and Situated Actions: The Problem of Human-Machine Communication, Cambridge University Press, Cambridge, England

BDA-94

References

• Talburt, J. 2009-2011. Reference Linking Methods, Identity Resolution Daily, retrieved November 3, 2013 from http://identityresolutiondaily.com/
• Tufte, E. R. 1997. Visual & Statistical Thinking: Displays of Evidence for Decision Making, Graphics Press
• Thomas, J. J. and K. A. Cook, Eds. 2005. Illuminating the Path: The Research and Development Agenda for Visual Analytics, IEEE Computer Society
• Varian, H. 2009. McKinsey Quarterly, retrieved October 30, 2013 from http://www.mckinseyquarterly.com/Hal_Varian_on_how_the_Web_challenges_managers_2286
• Weick, K. E. 1995. Sensemaking in Organizations, Thousand Oaks, CA: Sage Publications
• Weiser, M. 1991. “The Computer for the Twenty-First Century”, Scientific American, 265(3): 94-104
• Wong, P. C. and J. Thomas. 2004. “Visual Analytics”, IEEE Computer Graphics and Applications, 24(5): 20-21
• Wong, P. C., H-W. Shen, C. R. Johnson, C. Chen, and R. B. Ross. 2012. “The Top 10 Challenges in Extreme-Scale Visual Analytics”, IEEE Computer Graphics and Applications, 32(4): 63-67

BDA-95