Optimization-Based Data Mining Approaches in Neuroscience Research Panos

Optimization-Based Data Mining Approaches in Neuroscience Research Panos M. Pardalos University of Florida

Introduction l Data Mining: “the practice of searching through large amounts of computerized data to find useful patterns or trends. ” l Optimization: “An act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible; specifically : the mathematical procedures (as finding the maximum of a function) involved in this. ” Merriam Webster Dictionary

Introduction l l l The combination of data mining and optimization: Find the “best” way to extract meaningful “patterns” from data. Not always an easy task.

How difficult Optimization can be? l l l Given integers N 1, N 2, …, Nk and M find a subset of N 1, N 2, …, Nk such that their sum is equal to M. Can you find a better algorithm than of O(2 k). Exponential complexity ?

Hard drive Cost l Approximate ly 1/10 cheaper every 5 years

Hard Drive Capacity l Approximat ely 10 times more every 5 years

Processing power l Number of transistors of a computer processor double every two years

References l Handbook of Massive Data Sets, co-editors: J. Abello, P. M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002).

Main problems in data mining l Data preprocessing l Dimensionality reduction l Feature selection l Regression l Clustering (Unsupervised learning) l Classification (Supervised Learning) l Semi-Supervised learning (between unsupervised and unsupervised) l Biclustering l Result Validation l Data Visualization/Representation l Biomedical Informatics is a challenging area with lots of these problems.

Agenda l l l l Research Background l Epilepsy l Seizure Prediction Sources of Data l Electroencephalogram (EEG) Time Series Dimensionality Reduction l Chaos Theory Feature Selection for Brain Monitoring Time Series Classification of Neuro-Physiological States Brain Clustering Brain Network Models Concluding Remarks

Facts About Epilepsy l At least 2 million Americans and other 40 -50 million people worldwide (about 1% of population) suffer from Epilepsy. l Epilepsy is the second most common brain disorder (after stroke) that causes recurrent seizures. l Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begin to discharge in a highly organized rhythmic pattern.

Epileptic Seizures l Seizures usually occur spontaneously, in the absence of external triggers. l Seizures cause temporary disturbances of brain functions such as motor control, responsiveness and recall which typically last from seconds to a few minutes. l Seizures may be followed by a post-ictal period of confusion or impaired sensorial that can persist for several hours.

10 -second EEGs: Seizure Evolution Normal Seizure Onset Pre-Seizure Post-Seizure

Why do we care? l Based on 1995 estimates, epilepsy imposes an annual economic burden of $12. 5 billion* in the U. S. in associated health care costs and losses in employment, wages, and productivity. l Cost per patient ranged from $4, 272 for persons** with remission after initial diagnosis and treatment to $138, 602 for persons** with intractable and frequent seizures.

Current Epilepsy Treatment l Pharmacological Therapy l Anti-Epileptic Drugs (AEDs) l Mainstay of epilepsy treatment l Approximately 25 to 30% remain unresponsive l Epilepsy Resective Surgery l Require long-term invasive EEG monitoring to locate a specific, localized part of the brain where the seizures are thought to originate l 50% of pre-surgical candidates do not undergo respective surgery l Multiple epileptogenic zones l Epileptogenic zone located in functional brain tissue l Only 50 -60% of surgery cases result in seizure free

Current Epilepsy Treatment l l l Electrical Stimulation (Vagus nerve stimulator) l Parameters (amplitude and duration of stimulation) arbitrarily adjusted l As effective as one additional AED dose l Side Effects Seizure Prediction? Monitoring Unit? Forecasting Impending Seizures? Seizure Control? Deep Brain Stimulator?

Electroencephalogram (EEG) l l …is a traditional tool for evaluating the physiological state of the brain. …offers excellent spatial and temporal resolution to characterize rapidly changing electrical activity of brain activation …captures voltage potentials produced by brain cells while communicating. In an EEG, electrodes are implanted in deep brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

From Microscopic to Macroscopic Level (Electroencephalogram - EEG)

Electrode Montage and EEGs

Scalp EEG Data Acquisition

Open Problems l l l Is the seizure occurrence random? If not, can seizures be predicted? If yes, are there seizure pre-cursors (in EEGs) preceding seizures? If yes, what data mining techniques can be used to indicate these pre-cursors? Does normal brain activity during differ from abnormal brain activity?

Goals of Research l Test the hypothesis that seizures are not a random process. l Demonstrate that seizures could be predicted l Feature Selection to identify seizure pre-cursors (Statistical Process Control) Demonstrate that normal and abnormal EEGs can be differentiated l Time Series Classification Better understand the epileptogenic process – how seizures are initiated and propagated. l Brain Clustering l l l Develop a closed-loop seizure control device (Brain Pacemaker)

Dimensionality Reduction Chaos Theory

EEGs with the Curse of Dimensionality The brain is a non-stationary system. l EEG time series is non-stationary. l With 200 Hz sampling, 1 hour of EEGs is comprised of 200*60*60*30 = 21, 600, 000 data points = 43. 2 MB (assume 16 -bit ASCI format) l 1 day = 1. 04 GB l 1 week = 7. 28 GB l 20 patients ≈ 0. 15 TB l Kilobytes → Megabytes → Gigabytes → Terabytes

Data Transformation Using Chaos Theory l Measure the brain dynamics from time series: l l Stock Market Currency Exchanges (e. g. , Swedish Kroner) l Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10. 24 seconds = 2048 points. l Maximum Short-Term Lyapunov Exponent l measure the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space l measure the stability/chaoticity of EEG signals

Measure of Chaos

STLmax Profiles Pre-Seizure Onset Post-Seizure

Hidden Synchronization Patterns

How similar are they? Statistics to quantify the convergence of STLmax By paired-T statistic: Per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are Then, we calculate the average value, , and the sample standard deviation, , of . The T-index between EEG signal epochs i and j is defined as

Statistically Quantifying the Convergence

Convergence of STLmax

Why Feature Selection? l l Not every electrode site shows the convergence. Feature Selection: Select the electrodes that are most likely to show the convergence preceding the next seizure.

Feature Selection Quadratic Integer Programming with Quadratic Constraints

Optimization Problem l l Optimization: l We apply optimization techniques to find a group of electrode sites such that … l They are the most converged (in STLmax) electrode sites during 10 -min window before the seizure l They show the dynamical resetting (diverged in STLmax) during 10 -min window after the seizure. l Such electrode sites are defined as “critical electrode sites”. Hypothesis: l The critical electrode sites should be most likely to show the convergence in STLmax again before the next seizure.

Notation and Modeling l l l x is an n-dimensional column vector (decision variables), where each xi represents the electrode site i. l xi = 1 if electrode i is selected to be one of the critical electrode sites. l xi = 0 otherwise. Q is an (n n) matrix, whose each element qij represents the Tindex between electrode i and j during 10 -minute window before a seizure. b is an integer constant. (the number of critical electrode sites) D is an (n n) matrix, whose each element dij represents the Tindex between electrode i and j during 10 -minute window after a seizure. α = 2. 662*b*(b-1), an integer constant. 2. 662 is the critical value of T-index, as previously defined, to reject H 0: “`two brain sites acquire identical STLmax values within 10 -minute window”

Multi-Quadratic Integer Programming l To select critical electrode sites, we formulated this problem as a multiquadratic integer (0 -1) programming (MQIP) problem with … l objective function to minimize the average T-index among electrode sites l a linear constraint to identify the number of critical electrode sites l a quadratic constraint to ensure that the selected electrode sites show the dynamical resetting

Conventional Linearization Approach for Multi-Quadratic 0 -1 Problem

Theoretical Results: MILP formulation for MQIP problem l l Consider the MQIP problem We proved that the MQIP program is EQUIVALENT to a MILP problem with the SAME number of integer variables. Equivalent

Empirical Results: Performance on Larger Problems

Hypothesis Testing Simulation l l Hypothesis: l The critical electrode sites should be most likely to show the convergence in STLmax (drop in T-index below the critical value) again before the next seizure. l The critical electrode sites are electrode sites that l are the most converged (in STLmax ) electrode sites during 10 -min window before the seizure l show the dynamical resetting (diverged in STLmax ) during 10 -min window after the seizure Simulation: l Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (drop in T -index below the critical value) before the next seizure between the electrode sites, which are l Critical electrode sites l Randomly selected (5, 000 times)

Optimal VS Non-Optimal

Simulation - Results

Statistical Process Control: How to automate the system?

Automated Seizure Warning System EEG Signals Continuously calculate STLmax from multichannel EEG. ASWA Select critical electrode sites after every subsequent seizure Give a warning when T-index value drops below a critical value Monitor the average T-index of the critical electrodes

Data Characteristics

Performance Evaluation for ASWS l To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning. l Sensitivity = l False Prediction Rate = average number of false warnings per hour

Training Results Performance characteristics of automated seizure warning algorithm with the best parameter-settings of training data set.

RECEIVER OPERATING CHARACTERISTICS (ROC) l l l ROC curve (receiver operating characteristic) is used to indicate an appropriate trade-off that one can achieve between: the false positive rate (1 -Specificity, plotted on Xaxis) that needs to be minimized the detection rate (Sensitivity, plotted on Y-axis) that needs to be maximized.

Test Results Performance characteristics of automated seizure warning algorithm with the best parameter settings on testing data set.

Validation of the ASWS algorithm l Temporal Properties l l l Surrogate Seizure Time Data Set 100 Surrogate Data Sets Spatial Properties l l Non-Optimized ASWS – Selecting non-optimal electrode sites 100 Randomly Selected Electrodes

Prediction Scores: Surrogate Data and Non-Optimized ASWS

Remarks l l l Optimization as feature selection for brain monitoring Developed an online real-time seizure prediction system Tested on the dataset of l 10 patients suffering from temporal lope seizures l ~90 days (2100 hours) of EEG data l 58 seizures Seizure Prediction l Predicting ~70% of temporal lobe seizures on average l Giving a false alarm rate of ~0. 16 per hour on average What’s next? -fundamental questions on brain physiology

Time Series Classification I Support Vector Machines with Dynamic Time Warping

Other Dynamical Measures: Phase Profiles

Other Dynamical Measures: Entropy H of Attractor

Classification of Physiological States

Support Vector Machines From 1 electrode

Input l Standard SVM Input l l 30 electrodes, 30 data points, 3 dynamical features = 2, 700 features Time Series SVM Input l 30*29 data pairs, 3 dynamical features = 2, 700 – 90 features

Dynamic Time Warping

Preliminary Data Set l l 132 5 -minute epochs of pre-seizure EEGs 300 5 -minute epochs of normal EEGs l Pre-seizure = 0 -30 minutes before seizure l Normal = 10 hours away from seizure

Metrics for Performance Evaluation PREDICTED CLASS Class=Yes Class=No ACTUAL CLASS Class=Yes a b Class=No c d a: TP (true positive); b: FN (false negative); c: FP (false positive); d: TN (true negative)

$Sensitivity and Specificity l l Sensitivity measures the fraction of positive cases that are$

Sensitivity and Specificity l l Sensitivity measures the fraction of positive cases that are classified as positive. Specificity measures the fraction of negative cases classified as negative. Sensitivity = TP/(TP+FN) Specificity = TN/(TN+FP) l l l Sensitivity can be considered as a detection (prediction or classification) rate that one wants to maximize. Maximize the probability of correctly classifying patient states. False positive rate can be considered as 1 -Specificity which one wants to minimize.

Leave-one-out Cross Validation l l Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies. K-fold cross validation: l l Divide all the data into k subsets of equal size. Train a classifier using k-1 groups of training data. Test a classifier on the omitted subset. Iterate k times.

Empirical Results

Automated Seizure Prediction Paradigm Multichannel Com Feature Extraction/ Cluster Analysis Data Acquisition Interface Technology Pattern Recognition VNS User Drug Initiate a variety of therapies (e. g. , electrical stimulation, drug injection)

Related Patents Multi-dimensional multi-parameter time series processing for seizure warning and prediction Patent 7, 263, 467 (Issued on August 28, 2007). Optimization of Multi-dimensional Time Series Processing for seizure warning and prediction Patent 7, 373, 199 (Issued on May 13, 2008). Optimization of spatio-temporal pattern processing for seizure warning and prediction Patent 7, 461, 045 (Issued on December 2, 2008). Multi-dimensional dynamical analysis U. S. Utility Patent application filed on December 21, 2006, Serial No. : 11/339, 606. Closed-Loop State-Dependent Seizure Prevention Systems U. S. Utility Patent application filed on December 19, 2006, Serial No. : 11/641, 292.

Brain Network Models Brain Connectivity Networks Based on f. MRI Data

The Problem l l l Certain neurological diseases are very difficult to diagnose at early stages Functional Magnetic Resonance Imaging (f. MRI) technique provides vast amount of information about structure and function of human brain, but there is lack of methods to analyse these data Computational methods and algorithms based on mathematical models should be applied in order to find and recognize key patterns in this “ocean” of data

Network Models l l l Network models of human brain Partition of the brain into regions of interest Functional interconnections between regions in brain

Connectivity Networks

MRI Data l l Blood flow level as an indicator of neuronal activity Representation of values of signal in spatial voxels as 2 D and 3 D images

f. MRI Data l l The measurements are being performed every 2 seconds over 6 minutes for each voxel of brain of size 2 mm x 2 mm The f. MRI data is therefore a set of time series, corresponding to particular elementary volumes of the brain. In our data set each series contains 180 elements.

f. MRI Data, Vector Representation Z (x, y, z) X 0 Y

Small World Networks l Small world phenomenon first described by Stanley Milgram in 1960. l “Six degrees of separation” l “Erdos number”

From Random Graph to Regular Lattice l l l Random graphs generally have property of low mean shortest path length and low clustering coefficient Regular lattice has high mean shortest path and high cluster coefficient Small world networks have low mean shortest path length while still high clustering coefficient

Random Graph vs Regular Lattice

Small World Network

Quantitative Measures of “Small World” Property l Characteristic path length l Clustering coefficient l Global efficiency l Nodal efficiency

Brain Connectivity Networks l l Brain connectivity networks possess small world properties We predicted, that network characteristics, such as global and local efficiency values, would be decreased for people with Parkinson’s disease.

Nodes in Connectivity Network l l l How to define brain regions – nodes in the network? Clustering problem Standard MNI template

Signal Time Series Form Clusters time

Clustering Problem l l Each data set contains roughly 100 000 of time series, each of them consist of 180 elements Efficient algorithms should be developed in order to solve this problem

Standard MNI Brain Atlas l Partition of the brain into 116 brain regions

Edges in Connectivity Network l l Weighted graph with nodes corresponding to MNI brain regions Weights of edges defined based on correlation between averaged neural activity over the regions

Signal Processing l l Neural activity Head movements during the MR session Respiratory and heart rhythms Noise

Maximal Overlap Discrete Wavelet Transform l l Wavelet is a “small wave” Wavelet transform is a decomposition of initial signal into linear combination of wavelets

Time Series Decomposition by Wavelets

Wavelet Coefficients Correlation l l Inter-regional correlations in resting state f. MRI data are particularly salient at frequencies below 0. 1 Hz Second scale wavelet coefficients correspond to 0. 06 – 0. 12 Hz frequency range

Connectivity Strength l Averaged over the regions signal vectors l Define level 2 wavelet coefficients of averaged signals , . The connectivity between regions A and B is l

Definition of Distance Between Nodes l l l For each time series S = {s 1, s 2, …, sn } of size n there is a corresponding point in ndimensional space For normal vectors x and y the distance between end points is equal to Therefore, (1 – corr(x, y)) may serve as a measure of distance between time series

Geometrical Representation x x - y S = (s 1, s 2, …, sn) y 0

Data Set l 15 healthy controls, 14 Parkinson patients l Each network for each patient consist of 116 nodes

Averaged Connectivity Networks Control Parkinson

Global Network Efficiency Values Control (1. 85 +/- 0. 57), Parkinson (1. 12 +/- 0. 55), independent t-test p-value = 0. 0017

Top 30 Nodal Efficiency Values

Nodal Efficiency Plot Red line – Control set, blue line - PD set

Discussion l l Parkinson’s brain network properties possess measurable alteration in comparison with healthy ones Further research, in particular, different network model, may reveal the pattern in brain networks, which could be used as a diagnosis criteria

Concluding Remarks l l l l Overview of Epilepsy Research Applications of Data Mining and Optimization Techniques Interplay between theory and application Feature Selection Time Series Classification Brain Clustering Brain Network Models

Related Patents Sensor registration by global optimization procedures Patent 7, 653, 513 (Issued January 26, 2010). Atomic Magnetometer Sensor Array Magnetoencephalogram Systems and Method United States Patent Application 20100219820 (Filed April 14, 2008)

References l Handbook of Massive Data Sets, co-editors: J. Abello, P. M. Pardalos, and M. Resende, Kluwer Academic Publishers, (2002).

References “Clustering Challenges on Biological Networks” S. Butenko, W. A. Chaovalitwongse and P. M. Pardalos, World Scientific (2009). l “Feature Selection for Consistent Biclustering via Fractional 0 -1 Programming” (with Stanislav Busygin and Oleg A. Prokopyev), Journal of Combinatorial Optimization, Volume 10, Number 1 (2005), pp. 7 -21. l “Biclustering in Data Mining” (with S. Busygin, and O. Prokopyev), Computers & Operations Research, Volume 35, Issue 9 (2008), pp. 2964 -2987. l “On Biclustering with Features Selection for Microarray Data Set” (with S. Busygin and O. Prokopyev), In (BIOMAT 2005) Proceedings of the International Symposium on Mathematical and Computational Biology (Edited by R. Mondaini & R. Dilao), World Scientific (2006), pp. 367377. l “Biclustering: algorithms and applications in data mining and forecasting” (with P. Xanthopoulos, N. Boyko and N. Fan) In Encyclopedia of Operations Research and Management Science (accepted to appear) Wiley(2010).

References l Quantitative Neuroscience, co-editors: P. M. Pardalos, C. Sackellares, P. Carney, and L. Iasemidis, Kluwer Academic Publishers, (2004). l Biocomputing, co-editors: P. M. Pardalos and J. Principe, Kluwer Academic Publishers, (2002).

References New in 2010: Computational Neuroscience, co-editors: W. A. Chaovalitwongse, P. M. Pardalos, P. Xanthopoulos (Eds. ) Series: Springer Optimization and Its Applications , Vol. 38.

References l Optimization in Medicine, Carlos Alves, , Panos M. Pardalos, Luis Vicente (Eds. ), 2008

References l Handbook of Optimization in Medicine, Panos M. Pardalos, Edwin H. Romeijn (Eds. ), 2009

Reference l l l l W. Chaovalitwongse, L. D. Iasemidis, P. M. Pardalos, P. R. Carney, D. -S. Shiau, and J. C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy Research, 64, 93 -133, 2005. W. Chaovalitwongse, P. M. Pardalos, and O. A. Prokopyev. Electroencephalogram (EEG) time series classification: Applications in epilepsy , Annals of Operations Research, 148, 1 (2006), p 227 -250. Jicong Zhang, Petros Xanthopoulos , Chang-Chia Liu, Panos M. Pardalos. Real-time differentiation of nonconvulsive status epilepticus from other encephalopathies using quantitative EEG analysis: A pilot study“, Epilepsia, 51, 2 (2010), pp. 243 -250 W. Chaovalitwongse , P. M. Pardalos, L. D. Iasemidis, D. -S. Shiau, and J. C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20 (2 -3): 383 -394, 2005. L. D. Iasemidis, P. M. Pardalos, D. -S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P. R. Carney, and J. C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Journal of Clinical Neurophysiology, 116 (3): 532 -544, 2005. P. M. Pardalos, W. Chaovalitwongse, L. D. Iasemidis, J. C. Sackellares, D. -S. Shiau, P. R. Carney, O. A. Prokopyev, and V. A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365 -385, 2004. (INFORMS Pierskalla Best Paper Award 2004) W. Chaovalitwongse , P. M. Pardalos, and O. A. Prokopyev. A New Linearization Technique for Multi -Quadratic 0 -1 Programming Problems. Operations Research Letters, 32(6): 517 -522, 2004. (Rank 5 th in Top 25 Articles in Operations Research Letters)

Thank you for your attention! Questions?

Conference in 2011