

• Number of slides: 120

Computational Intelligence – a Possible Solution for Unsolvable Problems
Annamária R. Várkonyi-Kóczy
Dept. of Measurement and Information Systems, Budapest University of Technology and Economics
[email protected] bme.hu

Contents
• Motivation: Why do we need something ”non-classical”?
• What is Computational Intelligence?
• How does CI work?
• About some of the methods of CI
– Fuzzy Logic
– Neural Networks
– Genetic Algorithms
– Anytime Techniques
• Engineering view: Practical issues
• Conclusions – Is CI really a solution for unsolvable problems?
03.10.2006, Tokyo Institute of Technology

Motivation: Why do we need something ”non-classical”?
• Nonlinearity, never before seen spatial and temporal complexity of systems and tasks
• Imprecise, uncertain, insufficient, ambiguous, contradictory information, lack of knowledge
• Finite resources, strict time requirements (real-time processing)
• Need for optimization
• User’s comfort
+ New challenges/more complex tasks to be solved → more sophisticated solutions needed

Never before seen spatial and temporal complexity of systems and tasks
How can we drive in heavy traffic? Many components, a very complex system. Can classical or even AI systems solve it? Not as far as we know. But WE, humans, can. And we would like to build MACHINES able to do the same: drive our car, save fuel, save time, etc.

Never before seen spatial and temporal complexity of systems and tasks
Help:
• Increased computer facilities
• Model-integrated computing
• New modeling techniques
• Approximative computing
• Hybrid systems

Imprecise, uncertain, insufficient, ambiguous, contradictory information, lack of knowledge
• How can I get to Shibuya?
(Person 1: Turn right at the lamp, then straight ahead till the 3rd corner, then right again... NO: better turn to the left)
(Person 2: Turn right at the lamp, then straight ahead till approx. the 6th corner... then I don’t know)
(Person 3: It is in this direction → somewhere...)
• It is raining
• The traffic light is out of order
• I don’t know in which building we have the special lecture (in Building III or...)? And at what time? (Does it start at 3 p.m. or at 2 p.m.? And: on the 3rd or 4th of October?)
• When do I have to start from home and at what time?
Who (a person or a computer) can show me an algorithm to find an OPTIMUM solution?

Imprecise, uncertain, insufficient, ambiguous, contradictory information, lack of knowledge
Help:
• Intelligent and soft computing techniques able to handle the problems
• New data acquisition and representation techniques
• Adaptivity, robustness, ability to learn

Finite resources, strict time requirements (real-time processing)
• It is 10.15 a.m. My lecture starts at 3 p.m. (hopefully the information is correct)
• I am still not finished with my homework
• I have run out of fuel and I don’t have enough money for a taxi
• I am very hungry
• I have promised my Professor to help him prepare a demo in the Lab this morning
I cannot fulfill everything with maximum preciseness

Finite resources, strict time requirements (real-time processing)
Help:
• Low-complexity methods
• Flexible systems
• Approximative methods
• Results for qualitative evaluations & for supporting decisions
• Anytime techniques

Need for optimization
• Traditionally: optimization = precision
• New definition: optimization = cost optimization
• But what is cost!? Precision and certainty also carry a cost

Need for optimization
Let’s look at ”TIME” as a resource:
• The most important thing is to go to the Lab and help my Professor (he is my Professor and I have promised it). I will spend there as long as needed, min. 3 hours
• I have to submit the homework, but I will work in the Lab, i.e. today I will prepare an ”average” and not a ”maximum” level homework (1 hour)
• I don’t have time to eat at home, I will buy a bento at the station (5 minutes)
• The train is more expensive than the bus but takes much less time, i.e. I will go by train (40 minutes)

User’s comfort
• I have to ask the way to the university but, unfortunately, I don’t speak Japanese
• Next time I also want to find my way
• Today it took one and a half hours to get here. How about tomorrow?
• It would be good to get more help
• ...

User’s comfort
Help:
• Modeling methods and representation techniques making it possible to
– handle
– interpret
– predict
– improve
– optimize
the system and
• give more and more support in the processing

User’s comfort
Human language; modularity, simplicity, hierarchical structures
Aims of the processing → aims of the preprocessing: improving the performance of the algorithms, giving more support to the processing
(New) image processing / computer vision: preprocessing (noise smoothing, feature extraction – edge, corner detection), pattern recognition, etc.; 3D modeling, medical diagnostics, etc.; automatic 3D modeling, automatic ...

The most important elements of the solution
• Low-complexity, approximative modeling
• Application of adaptive and robust techniques
• Definition and application of the proper cost function, including the hierarchy and measure of importance of the elements
• Trade-off between accuracy (granularity) and complexity (computational time and resource need)
• Giving support for the further processing
Traditional and AI methods cannot cope with these. But how about the new approaches, about COMPUTATIONAL INTELLIGENCE?

What is Computational Intelligence?
Computer + Intelligence: increased computer facilities + intelligence added by the new methods
L. A. Zadeh, Fuzzy Sets [1965]: “In traditional – hard – computing, the prime desiderata are precision, certainty, and rigor. By contrast, the point of departure of soft computing is the thesis that precision and certainty carry a cost and that computation, reasoning, and decision making should exploit – whenever possible – the tolerance for imprecision and uncertainty.”

What is Computational Intelligence?
• CI can be viewed as a consortium of methodologies which play an important role in the conception, design, and utilization of information/intelligent systems.
• The principal members of the consortium are: fuzzy logic (FL), neuro computing (NC), evolutionary computing (EC), anytime computing (AC), probabilistic computing (PC), chaotic computing (CC), and (parts of) machine learning (ML).
• The methodologies are complementary and synergistic, rather than competitive.
• What is common: they exploit the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, low solution cost, and better rapport with reality.

Computational Intelligence fulfills all five requirements:
(Low-complexity, approximative modeling; application of adaptive and robust techniques; definition and application of the proper cost function, including the hierarchy and measure of importance of the elements; trade-off between accuracy (granularity) and complexity (computational time and resource need); giving support for the further processing)

How does CI work?
1. Knowledge
• Information acquisition (observation)
• Information processing (numeric, symbolic)
• Storage and retrieval of the information
• Search for a ”structure” (algorithm for the non-algorithmizable processing)
Certain knowledge (can be obtained by formal methods): closed, open world (ABSTRACT WORLDS)
Uncertain knowledge (by cognitive methods) (ARTIFICIAL and REAL WORLDS)
Lack of knowledge
Knowledge representation

How does CI work?
1. Knowledge
• In real life nearly everything is optimization
• Ex. 1. Determination of the velocity = calculation of the optimum estimate of the velocity from the measured time and covered distance
• Ex. 2. Determination of the resistance = the optimum estimate of the resistance with the help of the measured current and voltage
• Ex. 3. Analysis of a measurement result = the optimum estimate of the measured quantity in the knowledge of the conditions of the measurement and the measured data
• Ex. 4. Daily timetable
• Ex. 5. Optimum route between two towns
In Ex. 1–3 the criterion of the optimization is unambiguous and can easily be given. Ex. 4–5 are also simple tasks, but the criterion is not unambiguous.

Optimum route: What is optimum? (Subjective, depending on the requirements, taste, and limits of the person)
- We prefer/are able to travel by aeroplane, train, car, ...
Let’s say car is selected:
- the shortest route (min. petrol need), the quickest route (motorway), the most beautiful route with sights (whenever possible I never miss the view of Fuji-san...), where the best restaurants are located, where I can visit my friends, ...
OK, let’s fix the preferences of a certain person:
- But is it summer or winter, is it sunny or raining, how about the road reconstructions, ...
By going into the details we get nearer and nearer to the solution.
Knowledge is needed for the determination of a good descriptive model of the circumstances and goals.
But do we know what kind of weather there will be in two months?

2. Model
• Known model, e.g. an analytic model (given by differential equations) – too complex to be handled
• Lack of knowledge – the information about the system is uncertain or imperfect
We need new, more precise knowledge.
The knowledge representation (model) should be manageable and should tolerate the problems.

Learning and Modeling
New knowledge by learning: unknown, partially unknown, known but too complex to be handled, ill-defined systems
Model by which we can analyze the system and can predict the behavior of the system
+ Criteria (quality measure) for the validity of the model

Input u → unknown system → d; model → y; criteria c: measure of the quality of the model; parameter tuning
1. Observation (u, d, y)
2. Knowledge representation (model, formalism)
3. Decision (optimization, c(d, y))
4. Tuning (of the parameters)
5. Environmental influence (non-observed input, noise, etc.)
6. Prediction ability (for the future input)

Iterative procedure:
We build a system for collecting information
We improve the system by building in the knowledge
We collect the information
We improve the observation and collect more information

Problem → knowledge representation, model
Non-represented part of the problem vs. represented knowledge
Independent space, coupled to the problem by the formalism

3. Optimization
• Valid where the model is valid
• Given a system with free parameters
• Given an objective measure
• The task is to set the parameters which minimize or maximize the qualitative measure
• Systematic and random methods
• Exploitation (of the deterministic knowledge) and exploration (of new knowledge)

Methods of Computational Intelligence
• fuzzy logic – low complexity, easy building of the a priori knowledge into computers, tolerance for imprecision, interpretability
• neuro computing – learning ability
• evolutionary computing – optimization, optimum learning
• anytime computing – robustness, flexibility, adaptivity, coping with the temporal circumstances
• probabilistic reasoning – uncertainty, logic
• chaotic computing – open mind
• machine learning – intelligence

Fuzzy Logic
• Lotfi Zadeh, 1965
• Knowledge representation in natural language, ”computing with words”
• Perceptions
• Value imprecisiation ↔ meaning precisiation

History of fuzzy theory
• Fuzzy sets & logic: Zadeh 1964/1965
• Fuzzy algorithm: Zadeh 1968–(1973)
• Fuzzy control by linguistic rules: Mamdani et al. ~1975
• Industrial applications: Japan 1987– (fuzzy boom), Korea
– Home electronics
– Vehicle control
– Process control
– Pattern recognition & image processing
– Expert systems
– Military systems (USA ~1990–)
– Space research
• Applications to very complex control problems: Japan 1991–, e.g. helicopter autopilot

Areas in which Fuzzy Logic was successfully used:
• Modeling and control
• Classification and pattern recognition
• Databases
• Expert systems
• (Fuzzy) hardware
• Signal and image processing
• Etc.

• Universe of discourse: Cartesian (direct) product of all the possible values of each of the descriptors
• Linguistic variable (linguistic term) [Zadeh]: ”By a linguistic variable we mean a variable whose values are words or sentences in a natural or artificial language. For example, Age is a linguistic variable if its values are linguistic rather than numerical, i.e., young, not young, very young, quite young, old, not very old and not very young, etc., rather than 20, 21, 22, 23, ...”
• Fuzzy set: It represents a property of the linguistic variable. A degree of inclusion is associated with each of the possible values of the linguistic variable (characteristic function)
• Membership value: The degree of belonging to the set.

An Example
• A class of students (e.g. M.Sc. students taking the Spec. Course „Computational Intelligence”)
• The universe of discourse: X
• “Who has a driver’s license?” → a subset of X = a (crisp) SET, described by the CHARACTERISTIC FUNCTION μ(X): 1, 0, 1, 1, ...
• “Who can drive very well?” → a FUZZY SET, described by the MEMBERSHIP FUNCTION μ(X): 0.7, 0, 1.0, 0.8, 0, 0.4, 0.2
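The example above can be sketched in a few lines. This is a minimal illustration, not from the slides: the student names are invented, and the membership degrees follow the pattern of the slide's figures.

```python
# Sketch of the slide's example: a crisp set is a special fuzzy set.
# Student names and degrees are illustrative placeholders.
students = ["Aiko", "Ben", "Chie", "Dan"]

# "Who has a driver's license?" -- characteristic function, values in {0, 1}
has_license = {"Aiko": 1, "Ben": 0, "Chie": 1, "Dan": 1}

# "Who can drive very well?" -- membership function, values in [0, 1]
drives_well = {"Aiko": 0.7, "Ben": 0.0, "Chie": 1.0, "Dan": 0.8}

# Every characteristic-function value is also a legal membership value,
# which is why crisp sets are the {0,1}-valued special case of fuzzy sets.
assert all(v in (0, 1) for v in has_license.values())
assert all(0.0 <= v <= 1.0 for v in drives_well.values())
```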

Definitions
• Crisp sets A and B
• Convex set: A is not convex, as there are a ∈ A, c ∈ A for which d = λa + (1−λ)c ∉ A, λ ∈ [0, 1]. B is convex, as for every x, y ∈ B and λ ∈ [0, 1], z = λx + (1−λ)y ∈ B.
• Subset: if x ∈ A then also x ∈ B: A ⊆ B.

Definitions
• Relative complement or difference: A–B = {x | x ∈ A and x ∉ B}
B = {1, 3, 4, 5}, A–B = {2, 6}. C = {1, 3, 4, 5, 7, 8}, A–C = {2, 6}!
• Complement: Ā = X–A, where X is the universe. Complementation is involutive: complementing twice gives back A.
• Union: A∪B = {x | x ∈ A or x ∈ B}. For crisp sets: A∪Ā = X (law of excluded middle)

Definitions
• Intersection: A∩B = {x | x ∈ A and x ∈ B}. For crisp sets: A∩Ā = ∅ (law of contradiction)
• More properties:
Commutativity: A∪B = B∪A, A∩B = B∩A.
Associativity: A∪B∪C = (A∪B)∪C = A∪(B∪C), A∩B∩C = (A∩B)∩C = A∩(B∩C).
Idempotence: A∪A = A, A∩A = A.
Distributivity: A∪(B∩C) = (A∪B)∩(A∪C), A∩(B∪C) = (A∩B)∪(A∩C).

Membership function
Crisp set: characteristic function μA: X → {0, 1}
Fuzzy set: membership function μA: X → [0, 1]

Some basic concepts of fuzzy sets

Elements  Infant  Adult  Young  Old
5         0       0      1      0
10        0       0      1      0
20        0       .8     .8     .1
30        0       1      .5     .2
40        0       1      .2     .4
50        0       1      .1     .6
60        0       1      0      .8
70        0       1      0      1
80        0       1      0      1

Some basic concepts of fuzzy sets
• Support: supp(A) = {x | μA(x) > 0}. μInfant ≡ 0, so supp(Infant) = ∅. If |supp(A)| < ∞, A can be written as A = μ1/x1 + μ2/x2 + … + μn/xn.
• Kernel (nucleus, core): Kernel(A) = {x | μA(x) = 1}.

Definitions
• Height: height(A) = supx μA(x)
– height(Old) = 1, height(Infant) = 0
– If height(A) = 1, A is normal; if height(A) < 1, A is subnormal
• α-cut: Aα = {x | μA(x) ≥ α}; strong α-cut: Aα+ = {x | μA(x) > α}
– Kernel: Kernel(A) = A1
– Support: supp(A) = A0+
• If A is subnormal, Kernel(A) = ∅
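For a finite fuzzy set, the definitions above translate directly into code. A minimal sketch, using the "Old" membership values from the age table as the example set:

```python
# Support, kernel and alpha-cuts of a finite fuzzy set,
# following the definitions on the slide.
old = {50: 0.6, 60: 0.8, 70: 1.0, 80: 1.0}   # ages with nonzero membership in "Old"

def support(fs):
    return {x for x, mu in fs.items() if mu > 0}

def kernel(fs):
    return {x for x, mu in fs.items() if mu == 1}

def alpha_cut(fs, alpha, strong=False):
    # strong=True gives the strong alpha-cut (strict inequality)
    return {x for x, mu in fs.items() if (mu > alpha if strong else mu >= alpha)}

print(support(old))           # {50, 60, 70, 80}
print(kernel(old))            # {80, 70} -> height(Old) = 1, so Old is normal
print(alpha_cut(old, 0.8))    # ages with membership >= 0.8
```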

Definitions
• Fuzzy set operations defined by L. A. Zadeh in 1964/1965:
• Complement: μĀ(x) = 1 − μA(x)
• Intersection: μA∩B(x) = min(μA(x), μB(x))
• Union: μA∪B(x) = max(μA(x), μB(x))

Definitions
This is really a generalization of the crisp set operations!

μA  μB | min(μA, μB) | max(μA, μB) | 1 − μA
 0   0 |      0      |      0      |   1
 0   1 |      0      |      1      |   1
 1   0 |      0      |      1      |   0
 1   1 |      1      |      1      |   0
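The truth table above can be checked mechanically: on {0, 1} arguments, Zadeh's min/max/1−x operations reduce exactly to crisp AND, OR and NOT. A short sketch:

```python
# Zadeh's fuzzy set operations and their crisp special case.
def f_and(a, b): return min(a, b)   # intersection
def f_or(a, b):  return max(a, b)   # union
def f_not(a):    return 1 - a       # complement

# On crisp {0,1} values they reproduce the classical truth table.
for a in (0, 1):
    for b in (0, 1):
        assert f_and(a, b) == (1 if a and b else 0)
        assert f_or(a, b)  == (1 if a or b else 0)
    assert f_not(a) == (0 if a else 1)

# On genuinely fuzzy values they interpolate between the crisp cases.
print(f_and(0.8, 0.3))   # 0.3
print(f_or(0.8, 0.3))    # 0.8
print(f_not(0.25))       # 0.75
```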

Fuzzy Proposition
• Fuzzy proposition: X is P. ‘Tina is young’, where ‘Tina’: crisp age, ‘young’: fuzzy predicate.
Fuzzy sets expressing linguistic terms for ages; truth claims – fuzzy sets over [0, 1]
• Fuzzy-logic-based approximate reasoning is most important for applications!

CRISP RELATION: some interaction or association between elements of two or more sets.
FUZZY RELATION: various degrees of association can be represented (e.g. 0.5, 0.8, 1, 0.9, 0.6); CR ⊆ FR.
Cartesian (direct) product of two (or more) sets X, Y:
X × Y = {(x, y) | x ∈ X, y ∈ Y}
X × Y ≠ Y × X if X ≠ Y!
More generally: X1 × X2 × … × Xn = {(x1, …, xn) | xi ∈ Xi, i ∈ Nn}

Fuzzy Logic Control
• Fuzzification: converts the numerical value to a fuzzy one; determines the degree of matching
• Defuzzification: converts the fuzzy term to a classical numerical value
• The knowledge base contains the fuzzy rules
• The inference engine describes the methodology to compute the output from the input

Fuzzification
The measured (crisp) value, e.g. 8.4, is converted to a fuzzy set containing one element with membership value = 1 (a singleton):
μ(x) = 1 if x = 8.4, 0 otherwise

Defuzzification
Center of Gravity method (COG)
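For a discretized output universe, the center-of-gravity defuzzifier is just the membership-weighted mean of the sample points. A minimal sketch (the sample set below is an invented symmetric triangle, not from the slide's figure):

```python
# Discretized center-of-gravity (COG) defuzzifier.
def cog(xs, mus):
    """Crisp output = sum(x * mu(x)) / sum(mu(x)) over the sampled universe."""
    den = sum(mus)
    if den == 0:
        return None   # empty fuzzy set: no defined center of gravity
    return sum(x * mu for x, mu in zip(xs, mus)) / den

# A symmetric triangular output set centered at 5 -> COG is exactly 5.
xs  = [3, 4, 5, 6, 7]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]
print(cog(xs, mus))   # 5.0
```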

Specificity of fuzzy partitions
Fuzzy partition A containing three linguistic terms; fuzzy partition A* containing seven linguistic terms

Fuzzy inference mechanism (Mamdani)
• If x1 = A1,i and x2 = A2,i and ... and xn = An,i then y = Bi
The weighting factor wj,i characterizes how far the input xj corresponds to the rule antecedent fuzzy set Aj,i in one dimension.
The weighting factor wi characterizes how far the input x fulfils the antecedents of the rule Ri.

Conclusion
The conclusion of rule Ri for a given observation x is yi

Fuzzy Inference
• Mamdani type

Fuzzy systems: an example
TEMPERATURE → MOTOR_SPEED
Fuzzy systems operate on fuzzy rules:
IF temperature is COLD THEN motor_speed is LOW
IF temperature is WARM THEN motor_speed is MEDIUM
IF temperature is HOT THEN motor_speed is HIGH

Inference mechanism (Mamdani)
[Figure: Temperature = 55 evaluated through Rule 1, Rule 2, Rule 3 → Motor Speed = 43.6]
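The whole Mamdani pipeline for the temperature/motor-speed rules can be sketched end to end. The triangular membership functions below are assumptions (the slide's exact shapes are not recoverable from the transcript), so the crisp result differs from the slide's 43.6:

```python
# Minimal Mamdani controller for the temperature -> motor_speed rules above.
# Membership function shapes are assumed, not taken from the slide's figure.
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 1.0 if x == b else 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu

cold, warm, hot = tri(0, 0, 50), tri(25, 50, 75), tri(50, 100, 100)
low,  med, high = tri(0, 0, 50), tri(25, 50, 75), tri(50, 100, 100)
rules = [(cold, low), (warm, med), (hot, high)]

def infer(t, steps=1000):
    # Clip each consequent at its rule's firing strength (min),
    # aggregate the clipped sets (max), then defuzzify with COG.
    ys = [100 * i / steps for i in range(steps + 1)]
    agg = [max(min(ant(t), cons(y)) for ant, cons in rules) for y in ys]
    return sum(y * m for y, m in zip(ys, agg)) / sum(agg)

speed = infer(55)
print(round(speed, 1))   # a mid-range speed (~54 with these assumed sets)
```

With differently shaped partitions the same code would reproduce the slide's 43.6; only the membership functions change, not the inference mechanism.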

Planning of Fuzzy Controllers
Determination of fuzzy controllers = determination of the antecedents + consequents of the rules
• Antecedents:
– Selection of the input dimensions
– Determination of the fuzzy partitions for the inputs
– Determination of the parameters for the fuzzy variables
• Consequents:
– Determination of the parameters

Fuzzy-controlled Washing Machine (Aptronix Examples)
• Objective: design a washing machine controller which gives the correct wash time even though a precise model of the input/output relationship is not available
• Inputs: dirtiness, type of dirt
• Output: wash time

Fuzzy-controlled Washing Machine
• Rules for our washing machine controller are derived from common sense, data taken from typical home use, and experimentation in a controlled environment. A typical intuitive rule is as follows:
If saturation time is long and transparency is bad, then wash time should be long.

Air Conditioning Temperature Control
• There is a sensor in the room to monitor temperature for feedback control, and there are two control elements, a cooling valve and a heating valve, to adjust the air supply temperature to the room.
• Temperature control has several unfavorable features: non-linearity, interference, dead time, external disturbances, etc. Conventional approaches usually do not result in satisfactory temperature control.
• Rules for this controller may be formulated using statements similar to:
If temperature is low then open heating valve greatly

Air Conditioning Temperature Control – Modified Model
• There are two sensors in the modified system: one to monitor temperature and one to monitor humidity. There are three control elements, a cooling valve, a heating valve, and a humidifying valve, to adjust the temperature and humidity of the air supply.
• Rules for this controller can be formulated by adding rules for humidity control to the basic model:
If temperature is low then open humidifying valve slightly.
This rule acts as a predictor of humidity (it leads the humidity value) and is also designed to prevent overshoot in the output humidity curve.

Smart Cars 1 – Rules
The number of rules depends on the problem. We shall consider only two for the simplicity of the example:
Rule 1: If the distance between two cars is short and the speed of your car is high(er than the other one’s), then brake hard.
Rule 2: If the distance between two cars is moderately long and the speed of your car is high(er than the other one’s), then brake moderately hard.

Smart Cars 2 – Membership Functions
– Determine the membership functions for the antecedent and consequent blocks
– Most frequently 3, 5, or 7 fuzzy sets are used (3 for crude control, 5 and 7 for finer control results)
– Typical shapes (triangular – most frequent)

Smart Cars 3 – Simplify Rules using Codes
– Distance between two cars: X1, speed: X2, braking strength: Y
– Labels – small, medium, large: S, M, L
PL – Positive Large
PM – Positive Medium
PS – Positive Small
ZR – Approximately Zero
NS – Negative Small
NM – Negative Medium
NL – Negative Large
– In the case of X2 (speed), small, medium, and large mean the amount that this car's speed is higher than the car in front.
– Rule 1: If X1 = S and X2 = M, then Y = L
Rule 2: If X1 = M and X2 = L, then Y = M

Smart Cars 4 – Inference
– Determine the degree of matching
– Adjust the consequent block
– Total evaluation of the conclusions based on the rules
To determine the control amount at a certain point, a defuzzifier is used (e.g. the center of gravity). In this case the center of gravity is located at a position somewhat harder than medium strength, as indicated by the arrow.

Advantages of Fuzzy Controllers
• Control design process is simpler
• Design complexity reduced, without need for complex mathematical analysis
• Code easier to write, allows detailed simulations
• More robust, as tests with weight changes demonstrate
• Development period reduced

Neural Networks
• McCulloch & Pitts, 1943; Hebb, 1949
• Rosenblatt, 1958 (Perceptron)
• Widrow–Hoff, 1960 (Adaline)
It mimics the human brain

Neural Networks
Neural nets are parallel, distributed information processing tools which are:
• Highly connected systems composed of identical or similar operational units performing local processing (processing element, neuron), usually in a well-ordered topology
• Possessing some kind of learning algorithm which usually means learning by patterns and also determines the mode of the information processing
• They also possess an information recall algorithm making possible the usage of the previously learned information

Application areas where NNs are successfully used
• One- and multidimensional signal processing (image processing, speech processing, etc.)
• System identification and control
• Robotics
• Medical diagnostics
• Estimation of economic features

Application areas where NNs are successfully used
• Associative memory = content addressable memory
• Classification system (e.g. pattern recognition, character recognition)
• Optimization system (the usually feedback NN approximates the cost function) (e.g. radio frequency distribution, A/D converter, traveling salesman problem)
• Approximation system (any input-output mapping)
• Nonlinear dynamic system model (e.g. solution of partial differential equation systems, prediction, rule learning)

Main features
• Complex, non-linear input-output mapping
• Adaptivity, learning ability
• Distributed architecture
• Fault-tolerant property
• Possibility of parallel analog or digital VLSI implementations
• Analogy with neurobiology

The simple neuron
Linear combinator with non-linear activation: y = φ(Σi wi xi)

Typical activation functions: step, linear sections (piecewise linear), hyperbolic tangent, sigmoid
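The simple neuron above is a one-liner once the activation is chosen. A minimal sketch with invented weights (the slide gives no numeric example):

```python
import math

# A single neuron: linear combination of the inputs, then a
# non-linear activation. Weights/inputs below are illustrative.
def step(s):    return 1.0 if s >= 0 else 0.0
def sigmoid(s): return 1 / (1 + math.exp(-s))
def tanh(s):    return math.tanh(s)

def neuron(x, w, b, act):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear combinator
    return act(s)                                  # non-linear activation

x, w, b = [1.0, 2.0], [0.5, -0.25], 0.0            # s = 0.5 - 0.5 = 0.0
print(neuron(x, w, b, step))      # 1.0
print(neuron(x, w, b, sigmoid))   # 0.5
```

Swapping the activation changes the neuron from a hard classifier (step) to a smooth, differentiable unit (sigmoid, tanh), which is what makes gradient-based learning possible later.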

Classical neural nets
• Static nets (without memory, feedforward networks)
– One layer
– Multilayer
• MLP (Multi-Layer Perceptron)
• RBF (Radial Basis Function)
• CMAC (Cerebellar Model Articulation Controller)
• Dynamic nets (with memory or feedback recall networks)
– Feedforward (with memory elements)
– Feedback
• Local feedback
• Global feedback

Feedforward architectures
One-layer architectures: the Rosenblatt perceptron

Feedforward architectures
One-layer architectures: input → output; tunable parameters (weighting factors)

Feedforward architectures
Multilayer network (static MLP net)

Approximation property
• Universal approximation property holds for some kinds of NNs
• Kolmogorov: any continuous real-valued N-variable function defined over the compact interval [0, 1]^N can be represented with the help of appropriately chosen one-variable functions and the sum operation.

Learning = parameter estimation
• supervised learning
• unsupervised learning
• analytic learning

Supervised learning
Estimation of the model parameters by x, y, d:
Input x → system: d = f(x, n) → d (n: noise)
NN model: y = fM(x, w) → y
Criteria: C(d, y), C = C(ε)
Parameter tuning

Supervised learning
• Criteria function
– Quadratic: C = E[ε²] = E[(d − y)²]
– ...

• Minimization of the criteria
• Analytic solution (only if it is very simple)
• Iterative techniques
– Gradient methods
– Searching methods
• Exhaustive
• Random
• Genetic search

Parameter correction
• Perceptron rule
• Gradient methods
– LMS (least mean squares algorithm)
• ...

LMS (iterative solution based on the temporary error)
• Temporary error: ε(k) = d(k) − y(k) = d(k) − xᵀ(k)w(k)
• Temporary gradient: ∇̂(k) = −2ε(k)x(k)
• Weight update: w(k+1) = w(k) + 2μ ε(k) x(k)
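The LMS update can be exercised on a toy system-identification task. This is a sketch under assumptions: the "unknown system" is an invented two-tap linear system, and the data is synthetic and noiseless.

```python
import random

# Widrow-Hoff LMS: one linear neuron identifying the weights of an
# unknown linear system from (input, desired output) pairs.
random.seed(0)
w_true = [2.0, -1.0]          # invented "unknown system" to be identified
w = [0.0, 0.0]                # model weights, started from zero
mu = 0.05                     # learning factor

for k in range(2000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    d = sum(wt * xi for wt, xi in zip(w_true, x))         # desired output d(k)
    y = sum(wi * xi for wi, xi in zip(w, x))              # model output y(k)
    eps = d - y                                           # temporary error
    w = [wi + 2 * mu * eps * xi for wi, xi in zip(w, x)]  # w(k+1) = w(k) + 2*mu*eps*x(k)

print([round(wi, 2) for wi in w])   # converges close to w_true = [2.0, -1.0]
```

Because each step uses only the temporary (single-sample) error instead of the full expected criterion, LMS is cheap enough to run in real time, which is why it fits the finite-resource theme of the lecture.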

Gradient methods
• The route of the convergence

Gradient methods
• Single neuron with nonlinear activation
• Multilayer network: backpropagation (BP)

Teaching an MLP network: the backpropagation algorithm 84
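A minimal backpropagation sketch for a one-hidden-layer MLP (tanh hidden units, linear output, quadratic criterion); the network size, learning factor and XOR task are illustrative choices:

```python
import numpy as np

def mlp_backprop(X, d, n_hidden=4, mu=0.5, epochs=2000, seed=1):
    """Train a one-hidden-layer MLP by backpropagating the quadratic
    criterion C = 0.5 * mean(eps^2); returns the loss history."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    w2 = rng.normal(scale=0.5, size=n_hidden)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1)                  # forward pass: hidden layer
        y = H @ w2                           # forward pass: linear output
        eps = y - d
        losses.append(0.5 * np.mean(eps ** 2))
        delta2 = eps / len(X)                # dC/dy
        gw2 = H.T @ delta2
        delta1 = np.outer(delta2, w2) * (1 - H ** 2)  # backprop through tanh
        gW1 = X.T @ delta1
        w2 -= mu * gw2                       # gradient steps
        W1 -= mu * gW1
    return losses

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
d = np.array([0., 1., 1., 0.])               # XOR targets
losses = mlp_backprop(X, d)
```

The hidden-layer gradient is the output error propagated backwards through the weights and the derivative of the tanh activation, which is exactly the BP scheme the slide refers to.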

Design of MLP networks • Size of the network (number of layers, number of hidden neurons) • Value of the learning factor µ • Initial values of the parameters • Validation: learning set, test set • Teaching method (sequential, batch) • Stopping criteria (error limit, number of cycles) 85

Modular networks • Hierarchical networks • Linear combination of NNs • Mixture of experts • Hybrid networks 86

Linear combination of networks 87
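A small sketch of why a linear combination of networks helps: averaging members with independent errors reduces the error variance. The sine target and the noise stand-ins for separately trained member networks are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, np.pi, 100)
target = np.sin(x)

# simulate K imperfect "member networks": the true mapping plus
# independent approximation errors (stand-ins for trained NNs)
K = 16
members = [target + rng.normal(scale=0.3, size=x.size) for _ in range(K)]

# equal-weight linear combination of the member outputs
combined = np.mean(members, axis=0)

mse_single = np.mean((members[0] - target) ** 2)
mse_combined = np.mean((combined - target) ** 2)
```

With independent errors the combined mean squared error drops roughly by a factor of K compared with a single member.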

Mixture of experts (MOE) • gating network • experts 88

Decomposition of complex tasks • Decomposition and learning – decomposition before learning – decomposition during learning (automatic task decomposition) • Problem space decomposition – input space decomposition – output space decomposition 89

Example: automatic recognition of numbers (e.g. postal codes) • Binary pictures with 16×16 pixels • Preprocessing (idea: the numbers are composed of edge segments): 4 edge detections + normalization → four 8×8 pictures (i.e. 256 input elements) • Classification by 45 independent networks, each classifying only two of the ten classes (1 or 2, 1 or 3, . . . , 8 or 0, 9 or 0) • The corresponding network outputs are connected to an AND gate; if its output equals 1, the figure is recognized 90
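The 45-network AND-gate aggregation can be sketched as follows; the pairwise "networks" here are hypothetical stubs that compare a single scalar feature, standing in for the trained NNs of the slide:

```python
from itertools import combinations

def make_pairwise(a, b):
    """Hypothetical stub for a trained pairwise network: outputs 1.0 if
    the sample looks more like class a, else 0.0."""
    return lambda x: 1.0 if abs(x - a) < abs(x - b) else 0.0

classes = list(range(10))
pairwise = {(a, b): make_pairwise(a, b) for a, b in combinations(classes, 2)}
assert len(pairwise) == 45        # 10 choose 2 pairwise networks

def recognize(x):
    """AND-gate aggregation: class c is recognized only when every
    pairwise network involving c votes for c."""
    for c in classes:
        votes = []
        for (a, b), net in pairwise.items():
            if c == a:
                votes.append(net(x) == 1.0)
            elif c == b:
                votes.append(net(x) == 0.0)
        if all(votes):
            return c
    return None
```

The aggregation logic (every relevant pairwise output must agree before the AND gate fires) is the point; the stub classifiers merely make the sketch runnable.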

Example: automatic recognition of handwritten figures (e.g. postal codes) • Edge detection + normalization • Edge detection masks: horizontal, vertical, diagonal \, diagonal / 91

Example: automatic recognition of handwritten figures (e.g. postal codes) 92

Genetic Algorithms • John Holland, 1975 • Adaptive method for search and optimization problems • Copies the genetic processes of biological organisms • Natural selection (Charles Darwin: On the Origin of Species) • Multi-point search 93

Successful application areas • Optimization (circuit design, scheduling) • Automatic programming • Machine learning (classification, prediction, weather forecasting, learning of NNs) • Economic systems • Immunology • Ecology • Modeling of social systems 94

The algorithm • Initial population → parent selection → creation of new individuals (crossover, mutation) → quality measure, reproduction → new generation → exit criteria? • If no: continue the algorithm • If yes: select the result, decode • As in biology, in the real world 95

Problem formulation • Selection of the most important features, coding • Fitness function = quality measure (optimality criterion) • Exit criteria • Selection of the population size • Specification of the genetic operations 96

Simple genetic algorithms • Representation = features coded in a binary string (chromosome) • Fitness function = represents the ”viability” (optimality) of the individual • Selection = selecting the parent individuals from the generation (e.g. randomly but fitness-based, i.e. a better chance with a higher fitness value) 97

Simple genetic algorithms • Crossover: from 2 parents, two offspring (one-point, two-point, N-point, uniform) 98

Simple genetic algorithms • Mutation (of the bits (genes)) (single or independent) • Reproduction = who will survive and form the next (new) generation – the individuals with the best fitness values • Exit: after a given number of generations, or depending on the fitness of the best individual or the average of the generation, . . . 99
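The two genetic operators above can be sketched on bit-string chromosomes; the parent strings and mutation probability are illustrative:

```python
import random

random.seed(0)

def one_point_crossover(p1, p2):
    """One-point crossover: two parents produce two offspring by swapping
    their tails after a randomly chosen cut position."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, p=0.001):
    """Flip each bit (gene) independently with probability p."""
    return [g ^ 1 if random.random() < p else g for g in chromosome]

a, b = [0, 1, 1, 0, 1], [1, 1, 0, 0, 0]
c1, c2 = one_point_crossover(a, b)
```

Note that one-point crossover preserves the combined gene pool of the two parents: the two offspring together contain exactly the parents' bits, only rearranged.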

Example for GAs • Maximize the function f(x) = x² where x can take values between 0 and 31 • Start with a population of 4 elements (generated randomly by tossing a coin) • Each element (string) consists of 5 bits (to code numbers between 0 and 31) 100

Example for GAs

number  initial population  x value  f(x)  f(xi)/Σf(x)  selection count
1       01101               13        169  0.14         1
2       11000               24        576  0.49         2
3       01000                8         64  0.06         0
4       10011               19        361  0.31         1
Sum                                  1170  1.00         4
Average                               293  0.25         1
Maximum                               576  0.49         2

101

Example for GAs

mating pool  pair  crossover position  new population  x value  f(x)
01101        2     4                   01100           12        144
11000        1     4                   11001           25        625
11000        4     2                   11011           27        729
10011        3     2                   10000           16        256
Sum                                                             1754
Average                                                          439
Maximum                                                          729

102

Conclusions • The fitness improved significantly in the new generation (both the average and the maximum) • Initial population: randomly chosen • Selection: 4 times by a roulette wheel where ”better” individuals have bigger sectors and thus a bigger chance (the 3rd (worst) string has died out!) • Pairs: the 1-2, 3-4 selections • Position of the crossover: randomly chosen • Mutation: bit by bit with p = 0.001 probability (the generation contains 20 bits, on average 0.02 bits will be mutated – in this example none) 103
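The whole worked example can be sketched as a simple GA loop (roulette-wheel selection, one-point crossover, bitwise mutation); the seed, generation count and population size are illustrative:

```python
import random

random.seed(3)

def decode(bits):
    """Bit string -> integer x in [0, 31]."""
    return int("".join(map(str, bits)), 2)

def fitness(bits):
    return decode(bits) ** 2              # f(x) = x^2

def roulette(pop):
    """Fitness-proportional (roulette-wheel) parent selection."""
    total = sum(fitness(c) for c in pop)
    r = random.uniform(0, total)
    acc = 0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

def next_generation(pop, p_mut=0.001):
    nxt = []
    while len(nxt) < len(pop):
        p1, p2 = roulette(pop), roulette(pop)
        cut = random.randint(1, len(p1) - 1)           # one-point crossover
        for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
            nxt.append([g ^ 1 if random.random() < p_mut else g
                        for g in child])               # bitwise mutation
    return nxt[:len(pop)]

pop = [[random.randint(0, 1) for _ in range(5)] for _ in range(4)]
for _ in range(20):
    pop = next_generation(pop)
best = max(decode(c) for c in pop)
```

The slide's single generation corresponds to one call of `next_generation`; running several generations typically drives the population towards the optimum x = 31.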

Anytime Techniques – Why do we need them? • Large-scale signal processing (DSP) systems, artificial intelligence – limited amount of resources – abrupt changes in • environment • processing system • computational resources (shortage) • data flow (loss) – processing should be continued • Low complexity → lower, but possibly sufficient, accuracy or partial results (for qualitative decisions) → anytime systems 104

Anytime Systems – What do they offer? • Handling abrupt changes due to failures • Fulfilling prescribed response-time conditions (changeable response time) • Continuous operation in case of a serious shortage of necessary data (temporary overload of certain communication channels, sensor failures, etc.) or processing time • Appropriate overall performance for the whole system • Guaranteed response time, known error • Flexibility: available input data, available time, computational power, balance between time and quality (quality: accuracy, resolution, etc.) 105

Anytime systems – How do they work? • Conditions: on-line computing, guaranteed response time, limited resources (changing in time) • Anytime processing: coping with the temporarily available resources to maintain the overall performance • ”Correct” models treatable by the limited resources during limited time: low and changeable complexity, possibility of reallocating the resources, changeable and guaranteed response time / computational need, known error • Tools: iterative algorithms and other types of methods used in a modular architecture 106

• Optimization of the whole system (processing chain) based on intelligent decisions (expert system, shortage indicators) • Algorithms and models of simpler complexity • Temporarily lower accuracy • Data for qualitative evaluations & for supporting decisions • Coping with the temporal conditions • Supporting ‘early’ decision making • Preventing serious alarm situations 107

• Shortage indicators • Intelligent monitor • Special compilation methods during runtime • Strict time constraints for the monitor • The number and the complexity of the executable tasks can be very high → add-in + optimization 108

Missing input samples • Temporary overload of certain communication channels, sensor failures, etc. → the input samples fail to arrive in time or are lost → prediction mechanism (estimation based on previous data) • Example: resonator-based filters 109

Temporal shortage of computing power • Temporary shortage of computing power → the signal processing cannot be performed in time → trade-off between approximation accuracy and complexity: reduction techniques, reduction of the sampling rate, application of less accurate evaluations 110

Temporal shortage of computing power • Examples: application of lower-order filters or transformers (in case of recursive discrete transformers: switching off some of the channels; obvious requirement: to maintain e.g. the orthogonality of the transformations) • Singular Value Decomposition applied to fuzzy models, B-spline neural networks, wavelet functions, Gabor functions, etc. – fuzzy filters, human hearing system, generalized NNs 111

Temporal shortage of computing time • Temporary shortage of computing time → the signal processing cannot be performed in time • Examples: block-recursive filters and filter banks, overcomplete signal representations 112

Anytime algorithms – iterative methods • Evaluate 734/25! (after 1 second: approx. 30 → after 5 seconds: better, 29.3 → after 8 seconds: exactly 29.36) • We build a system for collecting information → we improve the system by building in the knowledge → we collect the information → we improve the observation and collect more information 113
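The 734/25 example can be sketched as an iterative anytime computation: each iteration refines the estimate by one decimal digit and carries a guaranteed error bound, and the caller may stop whenever its time budget runs out:

```python
from itertools import islice

def anytime_divide(num, den):
    """Yield (estimate, error bound) pairs for num/den, gaining one
    decimal digit per iteration -- an anytime iterative algorithm:
    interruptible at any point with a known error."""
    q, r, scale = num // den, num % den, 1
    while True:
        yield q / scale, 1 / scale
        r *= 10
        q = q * 10 + r // den
        r %= den
        scale *= 10

approxs = [a for a, _ in islice(anytime_divide(734, 25), 3)]
# refines 29 -> 29.3 -> 29.36, mirroring the slide's example
```

Taking more iterations trades time for accuracy, which is exactly the time/quality balance anytime systems are built around.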

Anytime algorithms – modular architecture • Units = distinct/different implementations of a task, with the same interface but different performance characteristics: – complexity – accuracy – error – transfer characteristic • Selection 114
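A minimal sketch of the unit-selection idea: a hypothetical monitor picks, from several interchangeable implementations, the most accurate one that still fits the current time budget. The unit table, costs and errors below are invented for illustration:

```python
# hypothetical anytime module: three implementations of the same task
# with an identical interface but different cost/accuracy characteristics
units = [
    {"cost": 0.001, "error": 0.1,   "run": lambda v: round(v, 1)},
    {"cost": 0.01,  "error": 0.01,  "run": lambda v: round(v, 2)},
    {"cost": 0.1,   "error": 0.001, "run": lambda v: round(v, 3)},
]

def select_unit(time_budget):
    """Monitor policy: choose the most accurate unit that fits the
    current time budget; fall back to the cheapest unit otherwise."""
    feasible = [u for u in units if u["cost"] <= time_budget]
    return min(feasible, key=lambda u: u["error"]) if feasible else units[0]

value = select_unit(0.05)["run"](29.3617)   # mid budget -> mid-accuracy unit
```

Because every unit exposes the same interface, the monitor can swap implementations at runtime without disturbing the rest of the processing chain.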

Engineering view: Practical issues • Well-defined mathematical foundation, but there is a gap between theory and implementation • When and which method works better? (the theory cannot give an answer – or is too lazy to think it over?) • How to choose the sizes/parameters/shapes/definitions/etc.? • What if the axioms are inconsistent/incomplete? (the practical possibility can be 0) • Handling of exceptions, e.g. the rule for ”very young” overwrites the rule for ”young” • Good advice: modeling, a priori knowledge, iteration, hybrid systems, smooth systems/parameters (as near to the real world as possible) 115

Accuracy problems • How can we handle accuracy problems if, e.g., we don’t have any input information? • What if, in time-critical applications, not only the stationary responses are to be considered? • How can the different modeling/data-representation methods interpret each other’s results? • New (classical + non-classical) measures are needed 116

Transients • Dynamic systems: changes in the system cause transients • Depend on the transfer function and on the actual implementation of the structure • Strongly related to the ”energy distribution” of the system • Affected by the steps and the reconfiguration ”route” 117

Transients • Must be reduced and treated: – careful choice of the architecture (orthogonal structures have better transients) – multi-step reconfiguration: selection of the number and location of the intermediate steps – estimation of the effect of the transients 118

Is CI really a solution for unsolvable problems? • Yes: the high number of successful applications and the new areas where automation became possible prove that Computational Intelligence can be a solution for otherwise unsolvable problems • Although: with the new methods, new problems have arisen – to be solved by you → Future engineering is unthinkable without Computational Intelligence 119

Conclusions • What is Computational Intelligence? • What is the secret of its success? • How does it work? • What kinds of approaches/concepts are attached? • New problems with open questions 03. 10. 2006 Tokyo Institute of Technology 120