- Number of slides: 196
http://4c.ucc.ie/~rmarines/talks/tutorial-IJCAI-09-syllabus.pdf
Combinatorial Optimization for Graphical Models
Rina Dechter, Donald Bren School of Computer Science, University of California, Irvine, USA
Radu Marinescu, Cork Constraint Computation Centre, University College Cork, Ireland
Simon de Givry & Thomas Schiex, Dept. de Mathématique et Informatique Appliquées, INRA, Toulouse, France
With contributed slides by Javier Larrosa (UPC, Spain)
Outline
- Introduction
  - Graphical models
  - Optimization tasks for graphical models
- Inference
  - Variable Elimination, Bucket Elimination
- Search (OR)
  - Branch and Bound and Best-First Search
- Lower-bounds and relaxations
- Exploiting problem structure in search
  - Bounded variable elimination and local consistency
  - AND/OR search spaces (trees, graphs)
- Software
Outline
- Introduction
  - Graphical models
  - Optimization tasks for graphical models
  - Solving optimization problems by inference and search
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
- Software
Combinatorial Optimization
Find a schedule for the satellite that maximizes the number of photographs taken, subject to the on-board recording capacity. Earn 8 cents per invested dollar such that the investment risk is minimized.
Combinatorial Optimization
Assign frequencies to a set of radio links such that interferences are minimized. Find a joint haplotype configuration for all members of the pedigree which maximizes the probability of the data.
Constrained Optimization Example: power plant scheduling 6
Constraint Optimization Problems for Graphical Models
f(A, B, D) has scope {A, B, D}: a cost table over the joint assignments of A, B, D.
Primal graph: variables → nodes; functions/constraints → arcs.
F(a, b, c, d, f, g) = f1(a, b, d) + f2(d, f, g) + f3(b, c, f), over variables A, B, C, D, F, G.
Constraint Networks
Map coloring. Variables: countries (A, B, C, etc.). Values: colors (red, green, blue). Constraints: neighboring countries take different colors (each binary constraint lists its allowed color pairs). Constraint graph over A, B, C, D, E, F, G.
Constraint Networks (soft version)
Map coloring with costs: the listed color pairs of each constraint have cost 0; all other ("others") combinations are penalized. Same constraint graph over A, B, C, D, E, F, G.
Probabilistic Networks
BN = (X, D, G, P), with nodes Smoking, Cancer, Bronchitis, X-Ray, Dyspnoea and CPTs P(S), P(C|S), P(B|S), P(X|C, S), P(D|C, B).
P(S, C, B, X, D) = P(S) · P(C|S) · P(B|S) · P(X|C, S) · P(D|C, B)
MPE: find a maximum-probability assignment, given evidence, i.e., argmax of P(S) · P(C|S) · P(B|S) · P(X|C, S) · P(D|C, B).
Monitoring Intensive Care Patients
The “alarm” network: 37 variables, 509 parameters (instead of 2^37). Nodes include MINVOLSET, PULMEMBOLUS, INTUBATION, VENTLUNG, CATECHOL, HR, BP, etc.
Linkage Analysis
A pedigree of 6 individuals: haplotypes are known for individuals {2, 3}, the genotype for {6}, and the rest are unknown (marked “?”, e.g., A|?, B|?, A|a, B|b).
Pedigree: 6 people, 3 markers
The corresponding network contains, for each individual and marker, locus variables (L, maternal/paternal), phenotype variables (X), and selector variables (S), e.g., L11m, L11f, X11, S15m, …
Influence Diagrams
Influence diagram ID = (X, D, P, R). Task: find the optimal policy.
- Chance variables over domains, with CPTs: Seismic structure, Oil underground, Oil produced, Oil sales, Test result, Market information
- Decision variables: Test, Drill, Oil sale policy
- Reward components: Test cost, Drill cost, Sales cost, Oil sales
- Utility function: combination of the reward components
Graphical Models
A graphical model (X, D, F):
- X = {X1, …, Xn} variables
- D = {D1, …, Dn} domains
- F = {f1, …, fm} functions (constraints, CPTs, CNFs, …), e.g., P(F|A, C)
Operators: combination and elimination (projection).
Tasks:
- Belief updating: Σ_{X−y} Π_j P_j
- MPE: max_X Π_j P_j
- CSP: ⋈_j C_j
- Max-CSP: min_X Σ_j f_j
The primal (interaction) graph connects variables that share a function. All these tasks are NP-hard: exploit problem structure, identify special cases, approximate.
Sample Domains for Graphical Models
- Web pages and link analysis
- Communication networks (cell phone fraud detection)
- Natural language processing (e.g., information extraction and semantic parsing)
- Battlespace awareness
- Epidemiological studies
- Citation networks
- Intelligence analysis (terrorist networks)
- Financial transactions (money laundering)
- Computational biology
- Object recognition and scene analysis
- …
Types of constrained optimization: weighted CSPs, Max-SAT; Most Probable Explanation (MPE); linear integer programs.
Outline
- Introduction
  - Graphical models
  - Optimization tasks for graphical models
  - Solving optimization problems by inference and search
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
- Software
Solution Techniques
Search (conditioning):
- Complete: DFS search, Branch-and-Bound, A* — time exp(n), space linear
- Incomplete: Simulated Annealing, Gradient Descent, Stochastic Local Search
- AND/OR search: time exp(treewidth · log n), space linear
Inference (elimination):
- Complete: Adaptive Consistency, Tree Clustering, Variable Elimination, Resolution — time exp(treewidth), space exp(treewidth)
- Incomplete: Local Consistency, Unit Resolution, Mini-bucket(i)
Hybrids: time exp(pathwidth), space exp(pathwidth).
Combination of Cost Functions
f(A, B) + f(B, C) = f(A, B, C), where both input functions take value 6 on (b, b) and (g, g) and 0 on (b, g) and (g, b).
Combined table: f(b,b,b)=12, f(b,b,g)=6, f(b,g,b)=0, f(b,g,g)=6, f(g,b,b)=6, f(g,b,g)=0, f(g,g,b)=6, f(g,g,g)=12 — e.g., f(b, g, g) = f(b, g) + f(g, g) = 0 + 6 = 6.
Elimination in a Cost Function
Elim(f, B)(a) = min_b f(a, b). For f(A, B) over {b, g, r} with rows f(b, ·) = (4, 6, 1), f(g, ·) = (2, 6, 3), f(r, ·) = (1, 1, 6):
Elim(f, B) = g(A) with g(b) = 1, g(g) = 2, g(r) = 1; then Elim(g, A) = h = 1.
Conditioning a Cost Function
Assign(f, A, b)(B) = f(b, B). For f(A, B) over {b, g, r} with rows f(b, ·) = (6, 0, 3), f(g, ·) = (0, 6, 0), f(r, ·) = (0, 0, 6):
Assign(f_AB, A, b) = g(B) with g(b) = 6, g(g) = 0, g(r) = 3; then Assign(g, B, r) = h = 3.
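The three elementary operations of the last slides — combination (sum), elimination (min-projection), and conditioning (assignment) — can be sketched over dictionary-based cost tables. This is an illustrative Python sketch, not the tutorial's code; the names `combine`, `eliminate`, and `condition` are my own:

```python
from itertools import product

# A cost function is a pair (scope, table): `scope` is a tuple of
# variable names, `table` maps value tuples to costs.

def combine(f, g, domains):
    """Combination: (f + g) over the union of the two scopes."""
    fs, ft = f
    gs, gt = g
    scope = fs + tuple(v for v in gs if v not in fs)
    table = {}
    for vals in product(*(domains[v] for v in scope)):
        asg = dict(zip(scope, vals))
        table[vals] = (ft[tuple(asg[v] for v in fs)]
                       + gt[tuple(asg[v] for v in gs)])
    return scope, table

def eliminate(f, var):
    """Elimination (min-projection): project `var` out of f."""
    fs, ft = f
    i = fs.index(var)
    table = {}
    for vals, c in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = min(table.get(key, float('inf')), c)
    return fs[:i] + fs[i + 1:], table

def condition(f, var, val):
    """Conditioning: restrict f to var = val."""
    fs, ft = f
    i = fs.index(var)
    table = {vals[:i] + vals[i + 1:]: c
             for vals, c in ft.items() if vals[i] == val}
    return fs[:i] + fs[i + 1:], table
```

On the f(A, B) and f(B, C) tables of the combination slide, `combine` reproduces f(b, b, b) = 12, and `eliminate` yields the min-projection onto the remaining variable.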
Conditioning vs. Elimination
Conditioning (search): branching on A = 1, …, k yields k “sparser” subproblems over B, C, D, E, F, G. Elimination (inference): eliminating A yields 1 “denser” problem over the remaining variables.
Outline
- Introduction
- Inference
  - Variable Elimination, Bucket Elimination
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
- Software
Computing the Optimal Cost Solution
OPT = min over all assignments of f(a,b) + f(a,c) + f(a,d) + f(b,c) + f(b,d) + f(b,e) + f(c,e) (combination).
Variable elimination regroups the sum, e.g., f(a,d) + [f(a,c) + f(c,e)] + [f(a,b) + f(b,c) + f(b,d) + f(b,e)], eliminating one variable at a time.
Finding the Optimum: Algorithm elim-opt (Dechter, 1996); Non-serial Dynamic Programming (Bertelè & Brioschi, 1973)
Elimination operator: min-sum over the bucket variable.
bucket B: f(a,b), f(b,c), f(b,d), f(b,e)
bucket C: f(c,a), f(c,e)
bucket D: f(a,d)
bucket E: e = 0
bucket A: → OPT
Generating the Optimal Assignment B: f(a, b) f(b, c) f(b, d) f(b, e) C: f(c, a) f(c, e) D: f(a, d) E: e=0 A: 26
Complexity: Algorithm elim-opt (Dechter, 1996); Non-serial Dynamic Programming (Bertelè & Brioschi, 1973)
Same bucket processing as above; complexity exp(w* = 4), where w* is the induced width (max clique size of the induced graph) along the ordering.
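The bucket-processing scheme of elim-opt can be sketched for the min-sum task as follows. This is an illustrative Python sketch under my own representation (a list of `(scope, fn)` pairs), not the authors' implementation:

```python
from itertools import product

def bucket_elimination(domains, functions, order):
    """Min-sum bucket elimination sketch (illustrative).
    `functions` is a list of (scope, fn) pairs, where fn maps an
    assignment dict to a cost. Returns the optimal (minimal) cost."""
    pos = {v: k for k, v in enumerate(order)}
    buckets = {v: [] for v in order}
    constants = []
    # Place each function in the bucket of its latest variable.
    for scope, fn in functions:
        buckets[max(scope, key=pos.get)].append((scope, fn))
    # Process buckets from the last variable down to the first.
    for var in reversed(order):
        bucket = buckets[var]
        if not bucket:
            continue
        scope = tuple(sorted({u for s, _ in bucket for u in s if u != var},
                             key=pos.get))
        table = {}
        for vals in product(*(domains[u] for u in scope)):
            asg = dict(zip(scope, vals))
            # Eliminate `var`: minimize the sum of the bucket's functions.
            table[vals] = min(sum(fn({**asg, var: x}) for _, fn in bucket)
                              for x in domains[var])
        if scope:
            buckets[max(scope, key=pos.get)].append(
                (scope, lambda a, s=scope, t=table:
                 t[tuple(a[u] for u in s)]))
        else:
            constants.append(table[()])  # fully eliminated: a constant
    return sum(constants)
```

The generated tables have at most w* variables, which is exactly where the exp(w*) complexity comes from.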
Complexity of Bucket Elimination
Bucket elimination is O(r · exp(w*+1)) time and O(n · exp(w*)) space, where r is the number of functions and w* the induced width of the ordering. The effect of the ordering: the induced width differs between orderings of the same constraint graph (e.g., A,B,C,D,E versus E,B,C,D,A). Finding the smallest induced width is hard!
Outline
- Introduction
- Inference
- Search (OR)
  - Branch and Bound and Best-First search
- Lower-bounds and relaxations
- Exploiting problem structure in search
- Software
The Search Space
Variables A, B, C, D, E, F with domains {0, 1}; binary cost functions (tables indexed by value pairs 00, 01, 10, 11):
f1(A,B) = (2, 0, 1, 4); f2(A,C) = (3, 0, 0, 1); f3(A,E) = (0, 3, 2, 0); f4(A,F) = (2, 0, 0, 2); f5(B,C) = (0, 1, 2, 4); f6(B,D) = (4, 2, 1, 0); f7(B,E) = (3, 2, 1, 0); f8(C,D) = (1, 4, 0, 0); f9(E,F) = (1, 0, 0, 2)
Objective function: F = Σ_i f_i, minimized over the OR search tree along the ordering A, B, C, D, E, F.
The Search Space (weighted)
Same cost functions as above. Arc costs are computed from the cost functions that become fully instantiated by the conditioning along the path (their remaining scope is empty).
The Value Function
Same example. The value of a node is the cost of a minimal-cost solution in the subtree below it, computed bottom-up.
An Optimal Solution
Same example. Following minimal node values from the root downward traces an optimal solution (value of a node = minimal-cost solution below it).
Basic Heuristic Search Schemes
A heuristic function f(x^p) computes a lower bound on the best extension of the partial assignment x^p and can be used to guide a heuristic search algorithm. We focus on:
1. Branch-and-Bound: use f(x^p) to prune the depth-first search tree; linear space.
2. Best-First Search: always expand the node with the best (smallest, for minimization) heuristic value f(x^p); needs lots of memory.
Classic Branch and Bound
Each node n is a COP subproblem (defined by the current conditioning). g(n) is the cost of the path to n; h(n) under-estimates the optimal cost below n; f(n) = g(n) + h(n) is a lower bound. Prune if f(n) ≥ UB, where the Upper Bound UB is the best solution found so far.
Best-First vs. Depth-First Branch and Bound
Best-First (A*), which is optimal:
- expands the least number of nodes given h
- requires storing the entire explored search tree
Depth-first Branch-and-Bound:
- uses only linear space
- if it finds an optimal solution early, it expands the same space as Best-First (when the search space is a tree)
- can improve its heuristic function dynamically
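The depth-first branch-and-bound scheme above can be sketched in a few lines. This is an illustrative Python sketch, not the tutorial's code; the names `dfbb`, `g`, and `h` are my own, and `h = 0` stands in for any admissible heuristic:

```python
import math

def dfbb(order, domains, functions, h=lambda asg: 0):
    """Depth-first branch and bound sketch (illustrative).
    `functions` is a list of (scope, fn) pairs; `h` lower-bounds the
    best extension of a partial assignment (h = 0 is always admissible)."""
    best = {'ub': math.inf, 'sol': None}

    def g(asg):
        # Cost of the cost functions fully assigned by `asg`.
        return sum(fn(asg) for scope, fn in functions
                   if all(v in asg for v in scope))

    def expand(i, asg):
        if i == len(order):
            c = g(asg)
            if c < best['ub']:
                best['ub'], best['sol'] = c, dict(asg)
            return
        var = order[i]
        for val in domains[var]:
            asg[var] = val
            if g(asg) + h(asg) < best['ub']:  # prune if f(n) >= UB
                expand(i + 1, asg)
            del asg[var]

    expand(0, {})
    return best['ub'], best['sol']
```

A stronger `h` (e.g., one compiled by mini-buckets, as discussed later) prunes more of the tree without changing the returned optimum.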
How to Generate Heuristics
The principle of relaxed models:
- Mini-Bucket Elimination
- Bounded directional consistency ideas
- Linear relaxation for integer programs
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
  - Bounded variable elimination
    - Mini-Bucket Elimination
    - Generating heuristics using mini-bucket elimination
  - Local consistency
- Exploiting problem structure in search
- Software
Mini-Bucket Approximation
Split a bucket into mini-buckets to bound complexity: bucket(X) = {h1, …, hr, hr+1, …, hn} is partitioned into {h1, …, hr} and {hr+1, …, hn}, which are processed separately. This decreases complexity exponentially but yields only a bound.
Mini-Bucket Elimination
Each mini-bucket is processed by min_B Σ separately:
bucket B: {f(b,e)} | {f(a,b), f(b,d)} (split into two mini-buckets)
bucket C: f(c,e), f(a,c)
bucket D: f(a,d), h^B(a,d)
bucket E: h^B(e), h^C(e,a)
bucket A: h^E(a), h^D(a)
Lb = lower bound
Semantics of Mini-Buckets: Splitting a Node
Variables in different mini-buckets are renamed and duplicated (Kask et al., 2001; Geffner et al., 2007; Choi, Chavira & Darwiche, 2007). Before splitting: network N with node U. After splitting: network N' with duplicates of U.
Mini-Bucket Elimination: Semantics
Duplicating B as B' relaxes the problem; each mini-bucket is processed by min Σ separately:
bucket B: {f(a,b'), f(b',c)} | {f(b,d), f(b,e)} → h^B(a,c) and h^B(d,e)
bucket C: h^B(a,c), f(c,e), f(a,c) → h^C(e,a)
bucket D: f(a,d), h^B(d,e) → h^D(e,a)
bucket E: h^C(e,a), h^D(e,a) → h^E(a)
bucket A: h^E(a) → L = lower bound
MBE-MPE(i): Algorithm approx-mpe (Dechter & Rish, 1997)
Input: i — maximum number of variables allowed in a mini-bucket.
Output: [lower bound (the probability of a suboptimal solution), upper bound].
Example: approx-mpe(3) versus elim-mpe.
Properties of MBE(i)
- Complexity: O(r · exp(i)) time and O(exp(i)) space
- Yields an upper bound and a lower bound
- Accuracy: determined by the upper/lower (U/L) bound ratio
- As i increases, both accuracy and complexity increase
Possible uses of mini-bucket approximations: as anytime algorithms, and as heuristics in search.
Other tasks: similar mini-bucket approximations for belief updating, MAP and MEU (Dechter & Rish, 1997).
Anytime Approximation 46
Empirical Evaluation (Rish & Dechter, 1999)
Benchmarks: randomly generated networks, CPCS networks, probabilistic decoding.
Task: comparing approx-mpe and anytime-mpe versus bucket elimination (elim-mpe).
CPCS networks — medical diagnosis (noisy-OR model). Test case: no evidence. Time in seconds:

Algorithm        cpcs360   cpcs422
elim-mpe         115.8     1697.6
anytime-mpe( )    70.3      505.2
anytime-mpe( )    70.3      110.5
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
  - Bounded variable elimination
    - Mini-Bucket Elimination
    - Generating heuristics using mini-bucket elimination
  - Local consistency
- Exploiting problem structure in search
- Software
Generating Heuristics for Graphical Models (Kask & Dechter, AIJ’01)
Given a cost function F(a, b, c, d, e) = f(a) + f(b,a) + f(c,a) + f(e,b,c) + f(d,b,a), define the evaluation function over a partial assignment as the cost of its best extension:
f*(a, e, D) = min_{b,c} F(a, B, C, D, e) = f(a) + min_{b,c} [f(B,a) + f(C,a) + f(e,B,C) + f(D,a,B)] = g(a, e) + H*(a, e, D)
Generating Heuristics (cont.)
H*(a, e, d) = min_{b,c} [f(b,a) + f(c,a) + f(e,b,c) + f(d,a,b)]
= min_c [f(c,a) + min_b [f(e,b,c) + f(b,a) + f(d,a,b)]]
≥ min_c [f(c,a) + min_b f(e,b,c) + min_b [f(b,a) + f(d,a,b)]]
= min_b [f(b,a) + f(d,a,b)] + min_c [f(c,a) + min_b f(e,b,c)]
= h^B(d,a) + h^C(e,a) = H(a, e, d)
f(a, e, d) = g(a, e, d) + H(a, e, d) ≤ f*(a, e, d)
The heuristic function H is exactly what is compiled during the preprocessing stage of the Mini-Bucket algorithm.
Static MBE Heuristics
Given a partial assignment x^p, estimate the cost of the best extension to a full solution; the evaluation function f(x^p) is computed from the functions recorded by the Mini-Bucket scheme:
bucket B: {f(E,B,C)} | {f(D,A,B), f(B,A)} → h^B(E,C), h^B(D,A)
bucket C: f(C,A), h^B(E,C) → h^C(E,A)
bucket D: h^B(D,A) → h^D(A)
bucket E: h^C(E,A) → h^E(A)
bucket A: f(A), h^E(A), h^D(A)
f(a, e, D) = g(a, e) + H(a, e, D) = f(a) + h^B(D, a) + h^C(e, a); h is admissible.
Heuristic Properties
- The MB heuristic is monotone and admissible
- Computed in linear time
- IMPORTANT: heuristic strength varies with the i-bound: higher i → more preprocessing → stronger heuristic → less search
- Allows a controlled trade-off between preprocessing and search
Experimental Methodology
Algorithms: BBMB(i) — Branch and Bound with MB(i); BBFB(i) — Best-First with MB(i); MBE(i) — Mini-Bucket Elimination.
Benchmarks: random coding (Bayesian), CPCS (Bayesian), random (CSP).
Measures of performance: accuracy given a fixed amount of time (i.e., how close the cost found is to the optimal solution), and trade-off performance as a function of time.
Empirical Evaluation of Mini-Bucket Heuristics: random coding networks (Kask & Dechter, UAI’99, AIJ 2000). Random coding, K = 100, noise = 0.28 and noise = 0.32; each data point is an average over 100 random instances.
Dynamic MB and MBTE Heuristics (Kask, Marinescu & Dechter, UAI’03)
Rather than precompiling, compute the heuristics during search:
- Dynamic MB: use the Mini-Bucket algorithm to produce a bound for any node during search
- Dynamic MBTE: compute heuristics simultaneously for all uninstantiated variables using mini-bucket tree elimination; MBTE is an approximation scheme defined over cluster trees and outputs multiple bounds, one for each variable and value extension, at once
Branch and Bound with Mini-Buckets
- BB with static Mini-Bucket heuristics (sBBMB): heuristic information precompiled before search; static variable ordering; prunes the current variable
- BB with dynamic Mini-Bucket heuristics (dBBMB): heuristic information assembled during search; static variable ordering; prunes the current variable
- BB with dynamic Mini-Bucket Tree heuristics (BBBT): heuristic information assembled during search; dynamic variable ordering; prunes all future variables
Empirical Evaluation
Algorithms — complete: BBBT, BBMB; incomplete: DLM, GLS, SLS, IJGP, IBP (coding).
Measures: time, accuracy (% exact), #backtracks, bit error rate (coding).
Benchmarks: coding networks, Bayesian Network Repository, grid networks (N×N), random noisy-OR networks, random networks.
Real-World Benchmarks (Marinescu, Kask & Dechter, UAI’03)
Average accuracy and time; 30 samples, 10 observations, 30 seconds.
Hybrids of Variable Elimination and Search
Trade off space and time.
Search Basic Step: Conditioning X 1 X 2 X 3 X 4 X 5 61
Search Basic Step: Conditioning • Select a variable X 1 X 2 X 3 X 4 X 5 62
Search Basic Step: Conditioning
Branching on the selected variable X1 (X1 = a, X1 = b, X1 = c, …) generates one subproblem over X2, …, X5 per value.
Search Basic Step: Variable Branching by Conditioning
General principle: condition until tractable, then solve the subproblems efficiently.
Search Basic Step: Variable Branching by Conditioning (cont.)
Example: solve each subproblem by inference, BE(i = 2).
The Cycle-Cutset Scheme: Condition Until Treeness
Cycle-cutset; i-cutset; c(i) = size of the i-cutset.
Space: exp(i); Time: O(exp(i + c(i))).
Eliminate First 67
Eliminate First 68
Eliminate First Solve the rest of the problem by any means 69
Hybrid Variants
- Condition, condition, … and then only eliminate (w-cutset, cycle-cutset)
- Eliminate, eliminate, … and then only search
- Interleave conditioning and elimination (elim-cond(i), VE+C)
Interleaving Conditioning and Elimination (Larrosa & Dechter, CP’ 02) 71
Interleaving Conditioning and Elimination 72
Interleaving Conditioning and Elimination 73
Interleaving Conditioning and Elimination 74
Interleaving Conditioning and Elimination 75
Interleaving Conditioning and Elimination 76
Interleaving Conditioning and Elimination . . . 77
Boosting Search with Variable Elimination (Larrosa & Dechter, Constraints 2003)
At each search node:
- eliminate all unassigned variables with degree ≤ p
- select an unassigned variable A
- branch on the values of A
Properties:
- BB-VE(-1) is depth-first Branch and Bound
- BB-VE(w*) is Variable Elimination
- BB-VE(1) is similar to Cycle Cutset
- BB-VE(2) is well suited to soft local consistencies (adds binary constraints only, independently of the elimination order)
Mendelian Error Detection (Sanchez et al., Constraints 2008)
Given a pedigree and partial observations (genotypings), find the erroneous genotypings such that their removal restores consistency. Checking consistency is NP-complete (Aceto et al., J. Comp. Sci. Tech. 2004). Either minimize the number of genotypings to be removed, or maximize the joint probability of the true genotypes (MPE).
Pedigree problem size: n ≤ 20,000; d = 3–66; e(3) ≤ 30,000.
Pedigree Results
toulbar2 v0.5 with EDAC and binary branching; minimize the number of genotypings to be removed. CPU time in seconds to find and prove optimality on a 3 GHz computer with 16 GB, comparing BB with dom/deg, BB with last conflict, BB-VE(2) with dom/deg, and BB-VE(2) with last conflict.
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
  - Bounded variable elimination
  - Local consistency
    - Equivalence Preserving Transformations
    - Chaotic iteration of EPTs
    - Optimal set of EPTs
    - Improving sequence of EPTs
- Exploiting problem structure in search
- Software
Depth-First Branch and Bound (DFBB)
Variables chosen by dynamic ordering; each node is a COP subproblem (defined by the current conditioning).
(LB) Lower Bound = an under-estimation of the best solution in the subtree, obtained by enforcing local consistency.
(UB) Upper Bound = best solution found so far.
If LB ≥ UB then prune.
Local Consistency in Constraint Networks
Massive local inference:
- time-efficient (local inference, as in mini-buckets)
- infers only small constraints, added to the network; no variable is eliminated
- produces an equivalent, more explicit problem
- may detect inconsistency (pruning the search tree)
Arc consistency: inference within the scope of one constraint.
Arc Consistency (binary CSP)
For a constraint c_AB and variable A: a value of A is kept only if some value of B is compatible with it (projecting c_AB onto A, i.e., eliminating B, and intersecting with c_A).
- Applied iteratively on all constraint/variable pairs
- Confluent, incremental, complexity O(m·d²)
- Empty domain ⇒ inconsistency
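The iterative enforcement described on this slide can be sketched in the AC-3 style. An illustrative Python sketch (not the tutorial's code; the `X < Y` example below is my own):

```python
def revise(domains, c, x, y):
    """Remove values of x that have no support on constraint c(x, y)."""
    removed = False
    for vx in list(domains[x]):
        if not any(c(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed

def arc_consistency(domains, constraints):
    """AC-3 style propagation (illustrative sketch). `constraints`
    maps each directed arc (x, y) to a relation c(vx, vy) -> bool.
    Returns False on a domain wipe-out (inconsistency)."""
    queue = list(constraints)
    while queue:
        x, y = queue.pop()
        if revise(domains, constraints[(x, y)], x, y):
            if not domains[x]:
                return False  # empty domain => inconsistent network
            # Re-examine the arcs pointing into x.
            queue.extend((z, w) for (z, w) in constraints
                         if w == x and z != y)
    return True
```

For example, with X, Y ∈ {1, 2, 3} and the constraint X < Y, propagation shrinks the domains to X ∈ {1, 2} and Y ∈ {2, 3}.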
Arc Consistency and Cost Functions
For a cost function f_AB and a variable A, eliminating B by min-projection yields a unary function g_A (e.g., g_A(v) = 2, g_A(w) = 1). But simply adding g_A to the network changes the total cost: EQUIVALENCE IS LOST.
Shifting Costs (cost compensation)
Subtract the projected cost from the source function f_AB while adding it to the unary g_A (e.g., g_A(v) = 0, g_A(w) = 1), so every complete assignment keeps the same total cost: an Equivalence Preserving Transformation.
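The project-and-compensate move can be sketched directly on cost tables. An illustrative Python sketch (name and tables are my own; this is not the tutorial's code):

```python
def project_unary(f_ab, f_a, b_vals, a_val):
    """Project-and-compensate EPT sketch (illustrative): move
    alpha = min_b f_AB(a_val, b) from the binary cost function f_AB
    into the unary f_A, subtracting it from the source so every
    complete assignment keeps the same total cost."""
    alpha = min(f_ab[(a_val, b)] for b in b_vals)
    for b in b_vals:
        f_ab[(a_val, b)] -= alpha   # subtract from the source...
    f_a[a_val] += alpha             # ...and add to the destination
    return alpha
```

The invariant to check is that f_AB(a, b) + f_A(a) is unchanged for every pair (a, b) — that is precisely what "equivalence preserving" means here.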
Complete Inference vs. Local Inference
Complete inference: combine, eliminate, add & forget; systematic inference; exponential time/space; preserves the optimum; provides the optimum f.
Local consistency: combine, eliminate, add & subtract; massive local inference; space/time-efficient; preserves equivalence; provides a lower bound on f.
Equivalence Preserving Transformation
Shifting costs from f_AB to A: Shift(f_AB, (A, w), 1).
- Arc EPT: shifts cost within the scope of one cost function
- Problem structure preserved
- Can be reversed (e.g., Shift(f_AB, (A, w), −1))
Equivalence Preserving Transformations
Example moves: Shift(f_AB, (A, b), 1); Shift(f_A, ∅, 1); Shift(f_AB, (B, a), 1); Shift(f_AB, (B, a), 1).
- EPTs may cycle
- EPTs may lead to different f_∅
- Which EPTs should we apply?
Local Consistency
- Equivalence Preserving Transformation
- Chaotic iteration of EPTs
- Optimal set of EPTs
- Improving sequence of EPTs
Local Consistency
- Equivalence Preserving Transformation
- Chaotic iteration of EPTs
  - Enforce a local property by one or two EPT(s)
- Optimal set of EPTs
- Improving sequence of EPTs
Node Consistency (NC*) (Larrosa, AAAI 2002)
For any variable A:
- for all a: f_∅ + f_A(a) < k
- there exists a with f_A(a) = 0
Complexity: O(nd). Example with k = 4, f_∅ = 0: Shift(f_C, ∅, 1); Shift(f_A, ∅, −1); Shift(f_A, ∅, 1).
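NC* enforcement can be sketched as two passes: shift each unary minimum into the zero-arity cost f_∅, then prune values whose lower bound reaches the current upper bound k. An illustrative Python sketch (not the paper's algorithm; names are my own):

```python
def node_consistency(domains, unary, f0, k):
    """NC* sketch (illustrative): shift min_a f_X(a) from every unary
    cost function into the zero-arity cost f0, then prune each value a
    with f0 + f_X(a) >= k, where k is the current upper bound."""
    for x in domains:
        alpha = min(unary[x][a] for a in domains[x])
        for a in domains[x]:
            unary[x][a] -= alpha   # now some value has unary cost 0
        f0 += alpha
    for x in domains:
        domains[x] = [a for a in domains[x] if f0 + unary[x][a] < k]
    return f0, domains
```

After the shifts, f0 is a valid lower bound on any complete assignment, which is what justifies the pruning rule.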
Arc Consistency (AC*) (Schiex, CP 2000) (Larrosa, AAAI 2002)
NC*, plus for any f_AB: for all a there exists b with f_AB(a, b) = 0 (b is a support).
Complexity: O(n²d³). Example with k = 4, f_∅ = 1: Shift(f_AC, (C, v), 2); Shift(f_BC, (B, v), 1); Shift(f_BC, (B, w), 1); Shift(f_B, ∅, 1); Shift(f_C, ∅, 2).
Directional AC (DAC*) (Cooper, Fuzzy Sets and Systems 2003)
NC*, plus for all f_AB with A < B: for all a there exists b with f_AB(a, b) + f_B(b) = 0 (b is a full support). Ordering A < B < C.
Complexity: O(ed²). Example with k = 4, f_∅ = 2: Shift(f_BC, (B, v), 1); Shift(f_BC, (B, w), 1); Shift(f_B, ∅, 1); Shift(f_A, ∅, 2).
DAC lb = Mini-Bucket(2) lb
For the ordering A < E < D < C < B, the DAC lower bound coincides with the Mini-Bucket(2) lower bound (messages h^B(d), h^B(e), h^C(e), h^C(a), h^D(a), h^B(a), h^E(∅)).
- DAC provides an equivalent problem: incrementality
- DAC + NC (value pruning) can improve the lb
Other “Chaotic” Local Consistencies
- FDAC* = DAC + AC + NC (Cooper, Fuzzy Sets and Systems 2003; Larrosa & Schiex, IJCAI 2003, AI 2004; Cooper & Schiex, AI 2004): stronger lower bound, O(end³), a good compromise
- EDAC* = FDAC + EAC (existential AC) (Heras et al., IJCAI 2005; Sanchez et al., Constraints 2008): even stronger, O(ed² max{nd, k}), currently among the best practical choices
Local Consistency
- Equivalence Preserving Transformation
- Chaotic iteration of EPTs
- Optimal set of simultaneously applied EPTs
  - Solve a linear problem in rational costs
- Improving sequence of EPTs
Finding an EPT Sequence Maximizing the LB
Bad news: finding a sequence of integer arc EPTs that maximizes the lower bound is an NP-hard problem (Cooper & Schiex, AI 2004).
Good news: a continuous linear formulation
- u_A: cost shifted from A to f_∅
- p_{Aa}^{AB}: cost shifted from f_AB to (A, a)
Maximize Σ u_i subject to non-negativity of costs.
n + m·r·d variables; n·d + m·d^r linear constraints.
Optimal Soft AC (Cooper et al., IJCAI 2007) (Schlesinger, Kibernetika 1976) (Boros & Hammer, Discrete Appl. Math. 2002)
Solved by Linear Programming:
- polynomial time, rational costs (bounded arity r)
- computes an optimal set of EPTs (u_A, p_{Aa}^{AB}) to apply simultaneously
- stronger than AC, DAC, FDAC, EDAC, … (or any local consistency that preserves scopes)
Example
A cyclic problem over X1–X4 on which the chaotic consistencies (AC, DAC, FDAC, EDAC) leave the lower bound unchanged, while the optimal set of EPTs derives f_∅ = 1.
Local Consistency
- Equivalence Preserving Transformation
- Chaotic iteration of EPTs
- Optimal set of EPTs
- Improving sequence of EPTs
  - Find an improving sequence using classical arc consistency in classical CSPs
Virtual Arc Consistency (Cooper et al., AAAI 2008)
Bool(P): a classical CSP derived from P that forbids all non-zero-cost assignments.
If Bool(P) is non-empty: solutions of Bool(P) = optimal solutions of P (cost f_∅).
P is Virtual AC iff AC(Bool(P)) is non-empty.
Properties
Solves the polynomial class of submodular cost functions: in Bool(P) this means the constraints are max-closed, and for max-closed CSPs arc consistency implies consistency.
Binary Submodular Cost Functions
Decomposable into a sum of “Generalized Interval” functions; subsumes “Simple Temporal CSPs with strictly monotone preferences” (Khatib et al., IJCAI 2001).
Enforcing VAC
(Compared on the previous example against AC, DAC, FDAC, EDAC.)
Enforcing VAC on a Binary COP
Iterative process; one iteration takes O(ed²) time and O(ed) space. The number of iterations is possibly unbounded, so stop prematurely (ε threshold).
Hierarchy (special case CSP: k = 1 gives NC, AC, DAC)
NC* ≤ AC* / DAC* ≤ FDAC* ≤ EDAC* ≤ VAC ≤ OSAC.
DAC solves tree-like primal graphs; VAC/OSAC solve submodular cost functions.
Maintained during search: BT < MNC < MAC/MDAC < MFDAC < MEDAC < VAC.
Radio Link Frequency Assignment Problem (Cabon et al., Constraints 1999) (Koster et al., 4OR 2003)
Given a telecommunication network, find the best frequency for each communication link, avoiding interferences. “Best” can be:
- minimize the maximum frequency, with no interference (max)
- minimize the global interference (sum)
Generalizes graph coloring problems: |f1 − f2| > a.
CELAR problem size: n = 100–458; d = 44; m = 1,000–5,000.
CELAR (toulbar2 v0.6 running on a 3 GHz computer with 16 GB)

Local consistency   Time      Nodes
VAC                  18 sec    25·10³
EDAC                  7 sec    38·10³
FDAC                 10 sec    72·10³
AC                   23 sec   410·10³
DAC                 150 sec   2.4·10⁶
NC                  897 sec    26·10⁶

SCEN-06-sub1: n = 14, d = 44, m = 75, k = 2669.
Solver: toulbar2, BB-VE(2), last conflict, dichotomic branching.
CELAR/Graph problems (Cooper et al., AAAI 2008)
toulbar2 v0.6 running on a 3 GHz computer with 16 GB: maintaining VAC during search with last-conflict heuristics, BB-VE(2), binary branching by domain splitting, and premature VAC stop during search.
Closed 2 open problems by maintaining VAC.
CELAR (toulbar2 v0.6 with EDAC, 3 GHz computer with 16 GB), SCEN-06-sub1: n = 14, d = 44, m = 75, k = 2669.

Type of branching      Dom/Degree                Last Conflict (Lecoutre et al., 2006)
N-ary branching        234 sec, 6.1·10⁶ nodes    N/A
Binary branching       197 sec, 6.7·10⁶ nodes    40 sec, 247,000 nodes
Dichotomic branching    75 sec, 1.8·10⁶ nodes     7 sec, 38,000 nodes

N-ary branching with dom/degree and no initial upper bound: 265 sec, 7.2·10⁶ nodes.
CELAR: Local vs. Partial Search with LC (toulbar2 v0.6 running on a 3 GHz computer with 16 GB)
Local consistency enforcement informs variable and value ordering heuristics.
Incomplete solvers:
- toulbar2 with Limited Discrepancy Search (Harvey & Ginsberg, IJCAI 1995), exploiting unary cost functions produced by EDAC
- INCOP (Neveu, Trombettoni and Glover, CP 2004), an intensification/diversification walk meta-heuristic

CELAR SCEN-06        Time          Solution cost
LDS(8), BB-VE(2)     35 sec        3394
INCOP                36 sec/run    3476 (best), 3750 (mean over 10 runs)
Perspectives
- Improve modeling capabilities: global cost functions (Lee and Leung, IJCAI 2009) & virtual GAC, …
- Study stronger soft local consistencies: singleton arc consistency, …
- Extension to other tasks: probabilistic inference, …
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
  - AND/OR search trees (linear space)
  - AND/OR Branch and Bound search
  - AND/OR search graphs (caching)
  - AND/OR search for 0-1 integer programming
- Software
Solution Techniques (recap)
Search (conditioning): DFS search, Branch and Bound — time exp(n), space linear; AND/OR search — time exp(treewidth · log n), space linear; with caching — time exp(pathwidth), space exp(pathwidth).
Inference (elimination): Adaptive Consistency, Tree Clustering, Variable Elimination, Resolution — time exp(treewidth), space exp(treewidth).
Hybrids combine both.
Classic OR Search Space
Same cost functions f1(A,B), …, f9(E,F) and objective as before; the OR search tree along the ordering A, B, C, D, E, F enumerates all complete assignments.
The AND/OR Search Tree
A pseudo tree (Freuder & Quinn 85) of the primal graph guides the search: A is the root, B its child, with C (parent of D) and E (parent of F) below B. OR nodes branch on a variable's values; AND nodes root the independent subproblems given by the pseudo-tree children.
The AND/OR Search Tree (cont.)
A solution subtree contains one value per OR node and all children of each AND node, e.g., (A=0, B=1, C=0, D=0, E=1, F=1).
Weighted AND/OR Search Tree
Same cost functions f1, …, f9 as before. Arc weights (e.g., w(A,0) = 0, w(A,1) = 0) come from the cost functions that become fully assigned along the arc. Node values are evaluated bottom-up: OR nodes minimize, AND nodes sum. The root value here is 5, the optimal cost.
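The bottom-up evaluation (OR = minimize, AND = sum) can be sketched recursively over the pseudo tree. An illustrative Python sketch using the slide's nine cost tables (names `andor_value` and `tab` are my own):

```python
def andor_value(var, asg, children, domains, functions):
    """Bottom-up value of an AND/OR search tree (illustrative sketch):
    an OR node minimizes over its variable's values; an AND node sums
    the arc cost and the values of its pseudo-tree children."""
    best = float('inf')
    for val in domains[var]:
        asg[var] = val
        # Arc cost: cost functions that become fully assigned here.
        w = sum(fn(asg) for scope, fn in functions
                if var in scope and all(u in asg for u in scope))
        v = w + sum(andor_value(c, asg, children, domains, functions)
                    for c in children.get(var, []))
        best = min(best, v)
        del asg[var]
    return best

def tab(scope, t):
    """Binary cost table as a (scope, function) pair."""
    return (scope, lambda a, s=scope, t=t: t[(a[s[0]], a[s[1]])])

# The slide's example: pseudo tree A - B - {C - D, E - F}.
dom = {v: [0, 1] for v in 'ABCDEF'}
children = {'A': ['B'], 'B': ['C', 'E'], 'C': ['D'], 'E': ['F']}
functions = [
    tab(('A', 'B'), {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 4}),
    tab(('A', 'C'), {(0, 0): 3, (0, 1): 0, (1, 0): 0, (1, 1): 1}),
    tab(('A', 'E'), {(0, 0): 0, (0, 1): 3, (1, 0): 2, (1, 1): 0}),
    tab(('A', 'F'), {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 2}),
    tab(('B', 'C'), {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 4}),
    tab(('B', 'D'), {(0, 0): 4, (0, 1): 2, (1, 0): 1, (1, 1): 0}),
    tab(('B', 'E'), {(0, 0): 3, (0, 1): 2, (1, 0): 1, (1, 1): 0}),
    tab(('C', 'D'), {(0, 0): 1, (0, 1): 4, (1, 0): 0, (1, 1): 0}),
    tab(('E', 'F'), {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}),
]
```

With these tables, `andor_value('A', {}, children, dom, functions)` returns 5, matching the root value on the slide; the C- and E-subtrees below each B-value are evaluated independently, which is the source of the AND/OR speedup.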
AND/OR vs. OR Spaces
For this example the AND/OR search tree has 54 nodes, while the OR search tree has 126 nodes.
AND/OR vs. OR Spaces (random graphs with 20 nodes, 20 edges and 2 values per node)

width  depth   OR time (sec)  OR nodes    AND/OR time (sec)  AND nodes  OR nodes
5      10      3.15           2,097,150   0.03               10,494      5,247
4       9      3.13           2,097,150   0.01                5,102      2,551
5      10      3.12           2,097,150   0.03                8,926      4,463
4      10      3.12           2,097,150   0.02                7,806      3,903
5      13      3.11           2,097,150   0.10               36,510     18,255
Complexity of AND/OR Tree Search

- AND/OR tree: Time O(n d^t), which is O(n d^(w* log n)); Space O(n)
- OR tree: Time O(d^n); Space O(n)
(Freuder & Quinn 85), (Collin, Dechter & Katz 91), (Bayardo & Miranker 95), (Darwiche 01)
d = domain size, t = depth of pseudo tree, n = number of variables, w* = treewidth 124
Constructing Pseudo Trees

- AND/OR search algorithms are influenced by the quality of the pseudo tree
- Finding the minimal induced-width / minimal-depth pseudo tree is NP-hard
- Heuristics:
  - Min-Fill (min induced width)
  - Hypergraph partitioning (min depth) 125
Constructing Pseudo Trees

- Min-Fill (Kjaerulff 90)
  - Variables are ordered greedily, picking next the variable with the smallest "fill set"
  - The pseudo tree is a depth-first traversal of the induced graph obtained along the min-fill elimination order
- Hypergraph Partitioning (Karypis & Kumar 00)
  - Functions are vertices of the hypergraph and variables are hyperedges
  - Recursive decomposition of the hypergraph, minimizing the separator size at each step
  - Uses the state-of-the-art software package hMeTiS 126
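A minimal sketch of the min-fill heuristic (our own illustration, not the tutorial's code): repeatedly eliminate the variable whose elimination would add the fewest fill edges, connect its remaining neighbours, and record the largest neighbour set met along the way — that is the induced width of the ordering:

```python
def min_fill_order(variables, edges):
    """Greedy min-fill elimination ordering; returns (ordering, induced width)."""
    adj = {v: set() for v in variables}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    order, width, remaining = [], 0, set(variables)

    def fill(v):
        # Number of edges that eliminating v would add among its remaining neighbours.
        nbrs = [u for u in adj[v] if u in remaining]
        return sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b not in adj[a])

    while remaining:
        v = min(sorted(remaining), key=fill)   # sorted() makes tie-breaking deterministic
        nbrs = [u for u in adj[v] if u in remaining]
        width = max(width, len(nbrs))
        for a in nbrs:                          # add the fill edges
            for b in nbrs:
                if a != b:
                    adj[a].add(b)
        remaining.remove(v)
        order.append(v)
    return order, width

# Primal graph of the running example: the induced width is 2.
edges = [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"), ("B", "C"),
         ("B", "D"), ("B", "E"), ("C", "D"), ("E", "F")]
order, w = min_fill_order("ABCDEF", edges)
print(order, w)
```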
Quality of the Pseudo Trees

Bayesian Networks Repository (hypergraph width/depth vs. min-fill width/depth):
barley 7/13, 7/23 | diabetes 7/16, 4/77 | link 21/40, 15/53 | mildew 5/9, 4/13 | munin1 12/17, 12/29 | munin2 9/16, 9/32 | munin3 9/15, 9/30 | munin4 9/18, 9/30 | water 11/16, 10/15 | pigs 11/20, 11/26

SPOT5 Benchmarks (hypergraph width/depth vs. min-fill width/depth):
spot5 47/152, 39/204 | spot28 108/138, 79/199 | spot29 16/23, 14/42 | spot42 36/48, 33/87 | spot54 12/16, 11/33 | spot404 19/26, 19/42 | spot408 47/52, 35/97 | spot503 11/20, 9/39 | spot505 29/42, 23/74 | spot507 70/122, 59/160 127
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
  - AND/OR search trees
  - AND/OR Branch and Bound search
    - Lower bounding heuristics
    - Dynamic variable orderings
  - AND/OR search graphs (caching)
  - AND/OR search for 0-1 integer programming
- Software & Applications 128
Classic Branch and Bound Search

- Maintain an Upper Bound UB (cost of the best solution found so far)
- At node n: LB(n) = g(n) + h(n), where g(n) is the cost of the path so far and h(n) is a heuristic lower bound on the cost of completing it
- Prune if LB(n) ≥ UB
[Figure: OR search tree with g(n) above node n and h(n) below it] 129
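In code, the pruning rule of this slide looks as follows — a sketch on the running example with the trivial heuristic h = 0 (any admissible lower bound can be plugged in; names are ours):

```python
import math

# Pairwise cost tables of the running example: F[(X, Y)][2*x + y] is the cost of X=x, Y=y.
F = {("A", "B"): (2, 0, 1, 4), ("A", "C"): (3, 0, 0, 1), ("A", "E"): (0, 3, 2, 0),
     ("A", "F"): (2, 0, 0, 2), ("B", "C"): (0, 1, 2, 4), ("B", "D"): (4, 2, 1, 0),
     ("B", "E"): (3, 2, 1, 0), ("C", "D"): (1, 4, 0, 0), ("E", "F"): (1, 0, 0, 2)}
VARS = "ABCDEF"

def g(assig):
    """Cost of the functions fully instantiated by the partial assignment (path cost)."""
    return sum(t[2 * assig[x] + assig[y]] for (x, y), t in F.items()
               if x in assig and y in assig)

def dfbb(h=lambda assig: 0):
    """Depth-first Branch and Bound over the OR space; prunes when g + h >= UB."""
    ub = math.inf

    def expand(i, assig):
        nonlocal ub
        if g(assig) + h(assig) >= ub:     # LB(n) = g(n) + h(n); prune
            return
        if i == len(VARS):
            ub = g(assig)                 # complete assignment improves UB
            return
        for val in (0, 1):
            assig[VARS[i]] = val
            expand(i + 1, assig)
            del assig[VARS[i]]

    expand(0, {})
    return ub

print(dfbb())  # optimal cost
```

A stronger admissible h prunes earlier; with h = 0 the algorithm still returns the optimum but explores more nodes.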
Partial Solution Tree

[Figure: pseudo tree and a partial solution tree T']
A partial solution tree T' stands for a set of full solution trees, e.g. (A=0, B=0, C=0, D=0), (A=0, B=0, C=0, D=1), (A=0, B=1, C=0, D=0), (A=0, B=1, C=0, D=1)
Extension(T') – the solution trees that extend T' 130
Exact Evaluation Function

[Figure: cost functions f1(ABC), f2(ABF), f3(BDE) and a weighted AND/OR tree; v(n) is the exact value below each tip node n of the partial solution tree T']
f*(T') = w(A,0) + w(B,1) + w(C,0) + w(D,0) + v(F) 131
Heuristic Evaluation Function

[Figure: the same weighted AND/OR tree, with heuristic under-estimates h(n) ≤ v(n) at the tip nodes, e.g. h(F) = 5 and h(D,0) = 4]
f(T') = w(A,0) + w(B,1) + w(C,0) + w(D,0) + h(F) = 12 ≤ f*(T') 132
AND/OR Branch and Bound Search

[Figure: AND/OR search tree explored by AOBB; the best solution found so far gives the upper bound UB, and a subtree is pruned whenever f(T') ≥ UB] 133
AND/OR Branch and Bound Search (AOBB) (Marinescu & Dechter, IJCAI'05)

- Associate each node n with a heuristic lower bound h(n) on v(n)
- EXPAND (top-down):
  - Evaluate f(T') and prune the search if f(T') ≥ UB
  - Expand the tip node n
- PROPAGATE (bottom-up):
  - Update the value of the parent p of n
    - OR nodes: minimization
    - AND nodes: summation 134
Heuristics for AND/OR Branch and Bound

In the AND/OR search space h(n) can be computed using any heuristic. We used:
- Static Mini-Bucket heuristics (Kask & Dechter, AIJ'01), (Marinescu & Dechter, IJCAI'05)
- Dynamic Mini-Bucket heuristics (Marinescu & Dechter, IJCAI'05)
- Maintaining local consistency (Larrosa & Schiex, AAAI'03), (de Givry et al., IJCAI'05)
- LP relaxations (Nemhauser & Wolsey, 1988) 135
Mini-Bucket Heuristics

- Static Mini-Buckets: pre-compiled; reduced overhead; less accurate; static variable ordering
- Dynamic Mini-Buckets: computed dynamically; higher overhead; high accuracy; dynamic variable ordering 136
Bucket Elimination

Ordering: (A, B, C, D, E, F, G); buckets processed from G up to A:
- bucket G: f(A,G), f(F,G) → h^G(A,F)
- bucket F: f(B,F), h^G(A,F) → h^F(A,B)
- bucket E: f(B,E), f(C,E) → h^E(B,C)
- bucket D: f(A,D), f(B,D), f(C,D) → h^D(A,B,C)
- bucket C: f(B,C), h^D(A,B,C), h^E(B,C) → h^C(A,B)
- bucket B: f(A,B), h^C(A,B), h^F(A,B) → h^B(A)
- bucket A: h^B(A)

h*(a,b,c) = h^D(a,b,c) + h^E(b,c) 137
Static Mini-Bucket Heuristics MBE(3)

Ordering: (A, B, C, D, E, F, G). Bucket D exceeds the i-bound and is split into mini-buckets, each processed separately:
- mini-bucket {f(A,D)} → h^D(A)
- mini-bucket {f(B,D), f(C,D)} → h^D(B,C)

h(a,b,c) = h^D(a) + h^D(b,c) + h^E(b,c) ≤ h*(a,b,c) 138
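The key property used here — processing mini-buckets separately yields a lower bound on the exact bucket message, because a min of a sum is at least the sum of the mins — can be checked on a toy bucket. This is our own illustration, reusing f1(A,B) and f5(B,C) from the running example as the contents of B's bucket:

```python
# Bucket of B contains f1(A,B) and f5(B,C); tables indexed by [2*u + v].
f1 = (2, 0, 1, 4)   # f1(A,B)
f5 = (0, 1, 2, 4)   # f5(B,C)

def h_exact(a, c):
    """Exact bucket elimination: h*(a,c) = min_b [ f1(a,b) + f5(b,c) ]."""
    return min(f1[2 * a + b] + f5[2 * b + c] for b in (0, 1))

def h1(a):
    """Mini-bucket {f1}: minimized over B alone."""
    return min(f1[2 * a + b] for b in (0, 1))

def h2(c):
    """Mini-bucket {f5}: minimized over B alone."""
    return min(f5[2 * b + c] for b in (0, 1))

# sum of mins <= min of sums: the mini-bucket message lower-bounds the exact one.
for a in (0, 1):
    for c in (0, 1):
        print(a, c, h1(a) + h2(c), "<=", h_exact(a, c))
```

The bound is loose exactly where the two mini-buckets would have picked different values of B.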
Dynamic Mini-Bucket Heuristics MBE(3)

Ordering: (A, B, C, D, E, F, G). The mini-buckets are recomputed at the current search node, where A=a and B=b are already assigned, so all functions are conditioned: f(a,D), f(b,D), f(C,D), f(b,E), f(C,E), f(b,F), f(a,G), f(F,G), …
h(a,b,c) = h^D(c) + h^E(c) = h*(a,b,c) — conditioning makes the bound exact here 139
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
  - AND/OR search trees
  - AND/OR Branch and Bound search
    - Lower bounding heuristics
    - Dynamic variable orderings
  - AND/OR search graphs (caching)
  - AND/OR search for 0-1 integer programming
- Software & Applications 140
Dynamic Variable Orderings (Marinescu & Dechter, ECAI'06)

Variable ordering heuristics:
- Semantic-based: aim at shrinking the size of the search space, based on context and current value assignments (e.g. min domain, min dom/deg, min reduced cost)
- Graph-based: aim at maximizing the problem decomposition (e.g. pseudo tree arrangement)
These are orthogonal forces: use one as primary and break ties with the other 141
Partial Variable Ordering

[Figure: primal graph and a pseudo tree made of chains]
Variable groups/chains: {A, B}, {C, D}, {E, F}; {A, B} is a separator/chain
Instantiate {A, B} before {C, D} and {E, F}
Variables on a chain of the pseudo tree can be instantiated dynamically, based on some semantic ordering heuristic
[a similar idea is exploited by BTD (Jegou & Terrioux 04)] 142
Full Dynamic Variable Ordering

[Figure: primal graph over A–H with domains DA = {0,1}, DB = {0,1,2}, DE = {0,1,2,3}, DC = DD = DF = DG = DH = DE, and cost functions f(AE), f(AB); instantiating A and B splits the problem into independent components P1 = {C, E, G, H} and P2 = {D, F}]
[a similar idea is exploited in #SAT (Bayardo & Pehoushek 00)] 143
Dynamic Separator Ordering

[Figure: primal graph with separator {B, C} splitting the problem into components P1 and P2]
Constraint propagation may create singleton variables in P1 and P2 (changing the problem's structure), which in turn may yield smaller separators
[a similar idea is exploited in SAT (Li & van Beek 04)] 144
Experiments

- Benchmarks: Belief Networks (BN), Weighted CSPs (WCSP)
- Algorithms: AOBB, SamIam (BN), Superlink (genetic linkage analysis), toolbar (i.e., DFBB+EDAC)
- Heuristics: Mini-Bucket heuristics (BN, WCSP), EDAC heuristics (WCSP) 145
Genetic Linkage Analysis (Fishelson & Geiger 02)

[Table: pedigrees ped18, ped25, ped30, ped33, ped39 with parameters (n, d) and (w*, h), comparing Superlink v1.6, SamIam v2.3.2, MBE(i), BB+SMB(i) and AOBB+SMB(i) for i = 12, 16, 20 (time and nodes); AOBB+SMB(i) solves all five instances, while the OR variants and SamIam time or run out of memory on most of them]
Min-fill pseudo tree. Time limit 3 hours. 146
Impact of the Pseudo Tree

[Figure: runtime distribution for hypergraph pseudo trees over 20 independent runs, on the ped30 and ped33 linkage networks] 147
Dynamic Variable Orderings (Bensana et al. 99)

[Table: SPOT5 instances 29, 42b, 54, 408b, 503 with parameters n, c, w*, h, comparing toolbar, BBEDAC, AOEDAC+PVO and DVO+AOEDAC+DSO (time and nodes); the dynamic AND/OR variants dominate]
SPOT5 benchmark. Time limit 2 hours. 148
Summary

- New generation of depth-first AND/OR Branch and Bound search
- Heuristics based on Mini-Bucket approximation (static, dynamic) and local consistency (EDAC)
- Dynamic variable orderings
- Superior to state-of-the-art solvers traversing the classic OR search space 149
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
  - AND/OR search trees
  - AND/OR Branch and Bound search
  - AND/OR search graphs (caching)
    - AND/OR Branch and Bound with caching
    - Best-First AND/OR search
  - AND/OR search for 0-1 integer programming
- Software 150
From Search Trees to Search Graphs n Any two nodes that root identical sub trees or sub graphs can be merged 151
Merging Based on Context

One way of recognizing nodes that can be merged (based on graph structure):
context(X) = the ancestors of X in the pseudo tree that are connected (in the primal graph) to X or to descendants of X
[Figure: pseudo tree of the running example annotated with contexts — A: [ ], B: [A], C: [AB], D: [BC], E: [AB], F: [AE]] 153
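The context definition translates directly into code. On the running example (primal graph and pseudo tree as above) it reproduces the contexts shown on the slide; this is a sketch with names of our choosing:

```python
# Pseudo tree (child -> parent) and primal-graph edges of the running example.
PARENT = {"A": None, "B": "A", "C": "B", "E": "B", "D": "C", "F": "E"}
EDGES = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"),
                                ("B", "C"), ("B", "D"), ("B", "E"), ("C", "D"), ("E", "F")]}

def context(x):
    """Ancestors of x (root first) connected to x or to a descendant of x."""
    sub = {x}                                  # x together with its pseudo-tree descendants
    while True:
        grown = {v for v, p in PARENT.items() if p in sub} - sub
        if not grown:
            break
        sub |= grown
    anc = []                                   # ancestors of x, collected bottom-up
    p = PARENT[x]
    while p is not None:
        anc.append(p)
        p = PARENT[p]
    anc.reverse()                              # root first
    return [a for a in anc if any(frozenset((a, s)) in EDGES for s in sub)]

for v in "ABCDEF":
    print(v, context(v))
```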
AND/OR Search Graph

Same cost functions f1–f9 and objective function as before; contexts: A [ ], B [A], C [AB], D [BC], E [AB], F [AE]
Nodes with identical contexts root identical subproblems, which are computed once and cached (e.g. the cache table for D is indexed by the assignments of B and C)
[Figure: the context-minimal AND/OR search graph] 154
How Big Is The Context?

Theorem: the maximum context size for a pseudo tree is equal to the treewidth of the graph along the pseudo tree.
[Figure: pseudo tree over variables (C, K, H, A, B, E, J, L, N, O, D, P, M, F, G) annotated with contexts; max context size = treewidth] 155
Complexity of AND/OR Graph Search

- AND/OR graph: Time O(n d^w*), Space O(n d^w*)
- OR graph: Time O(n d^pw*), Space O(n d^pw*)
d = domain size, n = number of variables, w* = treewidth, pw* = pathwidth, and w* ≤ pw* ≤ w* log n 156
All Four Search Spaces

- Full OR search tree: 126 nodes
- Context-minimal OR search graph: 28 nodes
- Full AND/OR search tree: 54 AND nodes
- Context-minimal AND/OR search graph: 18 AND nodes
[Figure: the four search spaces for the running example] 157
AND/OR Branch and Bound with Caching (Marinescu & Dechter, AAAI'06)

- Associate each node n with a heuristic lower bound h(n) on v(n)
- EXPAND (top-down):
  - Evaluate f(T') and prune the search if f(T') ≥ UB
  - If not in the cache, expand the tip node n
- PROPAGATE (bottom-up):
  - Update the value of the parent p of n (OR nodes: minimization; AND nodes: summation)
  - Cache the value of n, indexed by its context 158
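Caching by context is essentially a one-line change to the recursive AND/OR evaluation: the value of an OR node depends only on the assignment of its context, so it can be memoized under that key. A sketch on the running example (no heuristic pruning shown; names are ours):

```python
# Pairwise cost tables of the running example: F[(X, Y)][2*x + y] is the cost of X=x, Y=y.
F = {("A", "B"): (2, 0, 1, 4), ("A", "C"): (3, 0, 0, 1), ("A", "E"): (0, 3, 2, 0),
     ("A", "F"): (2, 0, 0, 2), ("B", "C"): (0, 1, 2, 4), ("B", "D"): (4, 2, 1, 0),
     ("B", "E"): (3, 2, 1, 0), ("C", "D"): (1, 4, 0, 0), ("E", "F"): (1, 0, 0, 2)}
CHILDREN = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "E": ["F"], "D": [], "F": []}
CONTEXT = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["B", "C"],
           "E": ["A", "B"], "F": ["A", "E"]}
cache = {}

def value(var, assig):
    key = (var, tuple(assig[a] for a in CONTEXT[var]))   # context identifies the subproblem
    if key in cache:
        return cache[key]
    best = float("inf")
    for val in (0, 1):                                   # OR: minimize over values
        assig[var] = val
        w = sum(t[2 * assig[x] + assig[y]] for (x, y), t in F.items()
                if var in (x, y) and x in assig and y in assig)
        best = min(best, w + sum(value(c, assig) for c in CHILDREN[var]))  # AND: sum
        del assig[var]
    cache[key] = best
    return best

print(value("A", {}))   # optimal cost, with identical subproblems solved once
```

The number of cache entries per variable is bounded by d^|context|, which is how the O(n d^w*) graph-search bound arises.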
Backtrack with Tree Decomposition (Jegou & Terrioux, ECAI 2004)

[Figure: tree decomposition with clusters C1–C4 and separators AB, BC, AE (w = 2), and the corresponding pseudo tree (w = 2)]
BTD:
- AND/OR graph search (caching on separators)
- Partial variable ordering (dynamic inside clusters)
- Maintaining local consistency 159
Backtrack with Tree Decomposition

- Before the search:
  - Merge clusters whose separator size exceeds p
  - Time O(k exp(w')), Space O(exp(p))
  - More freedom for the variable ordering heuristics
- Properties (s: largest separator size):
  - BTD(-1) is Depth-First Branch and Bound
  - BTD(0) solves connected components independently
  - BTD(1) exploits biconnected components
  - BTD(s) is Backtrack with Tree Decomposition 160
Basic Heuristic Search Schemes

A heuristic function f(xp) computes a lower bound on the best extension of the partial assignment xp, and can be used to guide a heuristic search algorithm. We focus on:
1. DF Branch-and-Bound: use the heuristic function f(xp) to prune the depth-first search tree; linear space
2. Best-First Search: always expand the frontier node with the best heuristic value f(xp); needs lots of memory 161
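Best-first search over the OR space can be sketched with a priority queue ordered by f = g + h: with an admissible h, the first complete assignment popped is optimal. Our own illustration on the running example, with the trivial (admissible) h = 0:

```python
import heapq
from itertools import count

# Pairwise cost tables of the running example: F[(X, Y)][2*x + y] is the cost of X=x, Y=y.
F = {("A", "B"): (2, 0, 1, 4), ("A", "C"): (3, 0, 0, 1), ("A", "E"): (0, 3, 2, 0),
     ("A", "F"): (2, 0, 0, 2), ("B", "C"): (0, 1, 2, 4), ("B", "D"): (4, 2, 1, 0),
     ("B", "E"): (3, 2, 1, 0), ("C", "D"): (1, 4, 0, 0), ("E", "F"): (1, 0, 0, 2)}
VARS = "ABCDEF"

def g(assig):
    """Cost of the functions fully instantiated by the partial assignment."""
    return sum(t[2 * assig[x] + assig[y]] for (x, y), t in F.items()
               if x in assig and y in assig)

def best_first(h=lambda assig: 0):
    """Always expand the frontier node with the smallest f = g + h."""
    tie = count()                         # tie-breaker so the heap never compares tuples
    heap = [(0, next(tie), ())]
    while heap:
        fval, _, vals = heapq.heappop(heap)
        if len(vals) == len(VARS):
            return fval                   # first complete node popped is optimal (h admissible)
        for val in (0, 1):
            child = vals + (val,)
            assig = dict(zip(VARS, child))
            heapq.heappush(heap, (g(assig) + h(assig), next(tie), child))
    return None

print(best_first())   # optimal cost
```

Unlike depth-first branch and bound, the frontier (and hence the memory) can grow exponentially, which is exactly the trade-off the slide describes.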
Best-First Principle

- Best-first search expands first the node with the best heuristic evaluation function among all nodes encountered so far
- It never expands nodes whose cost is beyond the optimal one, unlike depth-first search algorithms (Dechter & Pearl, 1985)
- Superior among memory-intensive algorithms employing the same heuristic function 162
Best-First AND/OR Search (AOBF) (Marinescu & Dechter, CPAIOR'07, AAAI'07, UAI'07)

- Maintains the set of best partial solution trees
- Top-down step (EXPAND):
  - Traces down the marked connectors from the root (i.e., the best partial solution tree)
  - Expands a tip node n by generating its successors n'
  - Associates each successor with a heuristic estimate h(n') and initializes v(n') = h(n')
- Bottom-up step (REVISE):
  - Updates the node values v(n) (OR nodes: minimization; AND nodes: summation)
  - Marks the most promising solution tree from the root
  - Labels nodes as SOLVED: an OR node is SOLVED if its marked child is SOLVED; an AND node is SOLVED if all its children are SOLVED
- Terminates when the root node is SOLVED
(specializes Nilsson's AO* to solving COP) (Nilsson, 1984) 163
AOBF versus AOBB

- AOBF with the same heuristic as AOBB is likely to expand the smallest search space
- AOBB improves its heuristic function dynamically during search, whereas AOBF uses only h(n)
- AOBB can use far less memory, e.g. by avoiding dead caches, whereas AOBF keeps the explicated search graph in memory
- AOBB is anytime, whereas AOBF is not 164
Lower Bounding Heuristics

AOBF can be guided by:
- Static Mini-Bucket heuristics (Kask & Dechter, AIJ'01), (Marinescu & Dechter, IJCAI'05)
- Dynamic Mini-Bucket heuristics (Marinescu & Dechter, IJCAI'05)
- LP relaxations (Nemhauser & Wolsey, 1988) 165
Experiments

- Benchmarks: Belief Networks (BN), Weighted CSPs (WCSP)
- Algorithms: AOBB-C (AND/OR Branch and Bound with caching), AOBF-C (best-first AND/OR search), SamIam, Superlink, toolbar (DFBB+EDAC), toolbar-BTD (BTD+EDAC)
- Heuristics: Mini-Bucket heuristics 166
Genetic Linkage Analysis (Fishelson & Geiger 02)

[Table: pedigrees ped30 (w*=23, h=118, n=1016, d=5), ped33 (w*=37, h=165, n=581, d=5), ped42 (w*=25, h=76, n=448, d=5), comparing SamIam, Superlink, MBE(i), BB-C+SMB(i), AOBB+SMB(i), AOBB-C+SMB(i) and AOBF-C+SMB(i) for i = 12–18 (time and nodes); caching and best-first search bring large additional savings over plain AOBB]
Min-fill pseudo tree. Time limit 3 hours. 167
Mastermind Games

[Table: instances mm-04-08-04 (w*=39, h=103), mm-03-08-05 (w*=41, h=111), mm-10-08-03 (w*=51, h=132), comparing MBE(i), BB-C+SMB(i), AOBB+SMB(i), AOBB-C+SMB(i) and AOBF-C+SMB(i) for i = 12–18 (time and nodes); the AND/OR graph-search algorithms dominate]
Min-fill pseudo trees. Time limit 1 hour. toolbar and toolbar-BTD were not able to solve any instance. 168
CELAR

- SCEN-06: n=100, d=44, m=350, optimum=3389
- SCEN-07r: n=162, d=44, m=764, optimum=343592 169
CELAR (Sanchez et al., IJCAI 2009)

toulbar2 v0.8 running on a 2.6 GHz computer with 32 GB:
- Maximum Cardinality Search tree decomposition heuristic
- Root selection: largest (SCEN-06) / most costly (SCEN-07r) cluster
- Last-conflict variable ordering and dichotomic branching
- Closed 1 open problem by exploiting tree decomposition and EDAC

instance | n | d | m | k | p | w | DFBB | BTD | RDS-BTD
SCEN-06 | 100 | 44 | 350 | ∞ | ∞ | 11 | 2588 sec. | 221 sec. | 316 sec.
SCEN-07r | 162 | 44 | 764 | 354008 | 3 | 53 | > 50 days | 6 days | 4.5 days 170
Summary

- New memory-intensive AND/OR search algorithms for optimization in graphical models
- Depth-first and best-first control strategies
- Superior to state-of-the-art OR and AND/OR Branch and Bound tree search algorithms 171
Outline n Introduction Inference Search (OR) Lower-bounds and relaxations n Exploiting problem structure in search n n n n AND/OR search spaces (tree, graph) Searching the AND/OR space AND/OR search for 0 1 integer programming Software 172
0-1 Integer Linear Programming

Applications: VLSI circuit design, scheduling, routing, combinatorial auctions, facility location, …
[Figure: primal graph over A–F] 173
AND/OR Search Tree

[Figure: primal graph, pseudo tree, and the corresponding AND/OR search tree] 174
Weighted AND/OR Search Tree

Arc weights, e.g. w(A,0) = 0 and w(A,1) = 7; node values are computed bottom-up (OR – minimization, AND – summation), e.g. -6 = min(0, -6) and -1 = 5 - 6
[Figure: weighted AND/OR search tree with arc weights and node values] 175
AND/OR Search Graph

Contexts: [A], [BA], [CB], [D], [EA], [F]
16 nodes (graph) vs. 54 nodes (tree)
[Figure: the context-minimal AND/OR search graph] 176
Experiments

- Algorithms:
  - AOBB, AOBF – tree search
  - AOBB+PVO, AOBF+PVO – tree search
  - AOBB-C, AOBF-C – graph search
  - lp_solve 5.5, CPLEX 11.0, toolbar (DFBB+EDAC)
- Benchmarks: combinatorial auctions, MAX-SAT instances
- Implementation: the LP relaxation is solved with the lp_solve 5.5 library; BB (lp_solve) is the baseline solver 177
Combinatorial Auctions Combinatorial auctions from regions-upv distribution with 100 goods and increasing number of bids. Time limit 1 hour. Very large treewidth [68, 184] 178
MAX-SAT Instances (pret)

[Table: instances pret60-40/60/75 (w*=6, h=13) and pret150-40/60/75 (w*=6, h=15), comparing BB and CPLEX against tree search (AOBB, AOBF, AOBB+PVO, AOBF+PVO) and graph search (AOBB-C, AOBF-C), time and nodes; AOBF-C is consistently the fastest]
pret MAX-SAT instances. Time limit 10 hours. The BB solver could not solve any instance. 180
Summary

- New AND/OR search algorithms for 0-1 Integer Programming
- Dynamic variable orderings
- Superior to the baseline OR Branch and Bound from the lp_solve library
- Outperform CPLEX on selected MAX-SAT instances (e.g., pret, dubois) 181
Algorithms for AND/OR Space

- Back-jumping for CSPs (Gaschnig 1977), (Dechter 1990), (Prosser, Bayardo & Miranker, 1995)
- Pseudo-search rearrangement, for any CSP task (Freuder & Quinn 1985)
- Pseudo-tree search for soft constraints (Larrosa, Meseguer & Sanchez, 2002)
- Recursive Conditioning (Darwiche, 2001): explores the AND/OR tree or graph for any query
- BTD: searching tree decompositions for optimization (Jegou & Terrioux, 2004)
- Value Elimination (Bacchus, Dalmao & Pitassi, 2003) 182
Outline
- Introduction
- Inference
- Search (OR)
- Lower-bounds and relaxations
- Exploiting problem structure in search
- Software
  - aolib and toulbar2 software packages
  - Results from the UAI'06, CP'06 and UAI'08 solver competitions 183
Software n Reports on competitions n UAI’ 06 Inference Evaluation n n CP’ 06 Competition n 686 2 ary MAX CSP instances 135 n ary MAX CSP instances CP’ 08 Competition n 57 MPE instances 534 2 ary MAX CSP instances 278 n ary MAX CSP instances UAI’ 08 Competition n 480 MPE instances 184
Toulbar 2 and aolib n toulbar 2 http: //mulcyber. toulouse. inra. fr/gf/project/toulbar 2 (Open source WCSP, MPE solver in C++) n aolib http: //graphmod. ics. uci. edu/group/Software (WCSP, MPE, ILP solver in C++, inference and counting) n Large set of benchmarks http: //carlit. toulouse. inra. fr/cgi bin/awki. cgi/Soft. CSP http: //graphmod. ics. uci. edu/group/Repository 185
UAI’ 06 Competitors n Team 1 n UCLA n n Team 2 n IET n n David Allen, Mark Chavira, Arthur Choi, Adnan Darwiche Masami Takikawa, Hans Dettmar, Francis Fung, Rick Kissh Team 5 (ours) n UCI n n Radu Marinescu, Robert Mateescu, Rina Dechter Used AOBB-C+SMB(i) solver for MPE 186
UAI’ 06 Results Rank Proportions (how often was each team a particular rank, rank 1 is best) 187
CP’ 06 Competitors n Solvers n n n n Abscon. Max (ie, DFBB+MRDAC) aolibdvo (ie, AOBB+EDAC+DVO solver) aolibpvo (ie, AOBB+EDAC+PVO solver) CSP 4 J Max. CSP Toolbar (ie, DFBB+EDAC) Toolbar_BTD (ie, BTD+EDAC+VE) Toolbar_Max. SAT (ie, DPLL+specific EPT rules) Toulbar 2 (ie, DFBB+EDAC+VE+LDS) 188
CP’ 06 Results Overall ranking on all selected competition benchmarks 4 5 2 1 3 The longest dark green bar wins 189
UAI’ 08 Competition n AOBB-C+SMB(i) – (i = 18, 20, 22) n n AOBF-C+SMB(i) – (i = 18, 20, 22) n n AND/OR Best First search with pre compiled mini bucket heuristics (i bound), full caching, static pseudo trees, no constraint propagation Toulbar 2 n n AND/OR Branch and Bound with pre compiled mini bucket heuristics (i bound), full caching, static pseudo trees, constraint propagation OR Branch and Bound, dynamic variable/value orderings, EDAC consistency for binary and ternary cost functions, variable elimination of small degree (2) during search Toulbar 2/BTD n DFBB exploiting a tree decomposition (AND/OR), same search inside clusters as toulbar 2, full caching (no cluster merging), combines RDS and EDAC, and caching lower bounds 190
UAI’ 08 Competition Results 191
UAI’ 08 Competition Results (II) 192
UAI’ 08 Competition Results (III) 193
CP’ 08 Competitors n Solvers n n Abscon. Max (ie, DFBB+MRDAC) CSP 4 J Max. CSP Sugar (SAT based solver) Toulbar 2 (ie, BTD+EDAC+VE) 194
CP’ 08 Results 1 195
CP’ 08 Results 1 196
Conclusions

Only a few principles:
1. Inference and search should be combined (time-space trade-off)
2. AND/OR search should be used
3. Caching in search should be used 197