Logical and Rule-Based Reasoning Part II

Logic


Logic for Cognitive Science

Good News:
- Predicate logic is sound and complete. A completely rigorous and correct system for predicate logic can be computerized, so that any valid pattern of reasoning in the language can be discovered by a computer.
- Turing's Thesis: any process that can be expressed with a finite set of rules can be processed by a digital computer that operates on representations in the language of logic.

Bad News:
- A fully correct mechanism for logical problem solving may spend exponential time solving problems. So, practical systems have to sacrifice full correctness to obtain efficiency.
- Turing's Theorem: predicate logic has no decision procedure. Although valid reasoning can always be discovered (eventually) by a logic problem solver, there is no guarantee that invalid reasoning can be identified as such in a finite amount of time.
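The exponential cost mentioned above already appears in propositional logic, the decidable fragment of predicate logic. The sketch below is a brute-force validity checker that tries every truth assignment, so it examines up to 2^n rows for n variables. It assumes, purely for illustration, that formulas are written as Python functions of their variables; a general decision procedure of this kind cannot exist for full predicate logic.

```python
from itertools import product
from typing import Callable, Sequence


def is_valid(formula: Callable[..., bool], variables: Sequence[str]) -> tuple[bool, int]:
    """Brute-force validity check over all truth assignments.

    Returns (valid?, number of assignments examined). In the worst case the
    loop runs 2 ** len(variables) times: the exponential cost.
    """
    checked = 0
    for values in product([False, True], repeat=len(variables)):
        checked += 1
        if not formula(**dict(zip(variables, values))):
            return False, checked  # a counter-model was found
    return True, checked


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


# Modus ponens as one formula: ((p -> q) and p) -> q
modus_ponens = lambda p, q: implies(implies(p, q) and p, q)
# Affirming the consequent: ((p -> q) and q) -> p, a fallacy
affirm_consequent = lambda p, q: implies(implies(p, q) and q, p)

print(is_valid(modus_ponens, ["p", "q"]))       # (True, 4)
print(is_valid(affirm_consequent, ["p", "q"]))  # (False, 2)

# A tautology over 10 variables forces all 2**10 = 1024 rows to be checked.
tautology = lambda **v: any(v.values()) or not any(v.values())
print(is_valid(tautology, [f"x{i}" for i in range(10)]))  # (True, 1024)
```

Each added variable doubles the work for a valid formula, which is why fully correct solvers can become impractical.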

Logic for Cognitive Science

Good News:
- Systems to handle belief, time and other so-called intensional concepts have been developed, and are being adopted by AI researchers.

Bad News:
- Predicate logic doesn't let you take it back. Standard logic uses monotonic reasoning: the more information you have, the more you can prove from it, and adding information never removes a conclusion. What if we find that something we knew is in fact false?

Good News:
- Non-monotonic logics, which are modifications of predicate logic, have been developed. In these systems you can say "All birds fly", and then assert "Penguins don't fly" without causing a contradiction. These are logics with exceptions.

Bad News:
- Predicate logic doesn't (conveniently) let you handle matters of degree. If I write Tall(John), then I have said that John really is tall. There is no way to say he is sort of tall, or somewhat tall. Similarly, you can't (easily) say that the probability that John is tall is 90%.

Good News:
- Many-valued logics, logics of probability, and fuzzy logics allow expression of matters of degree. Fuzzy logics have been found to be quite useful in AI, especially in controlling machines.
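Both "good news" items above can be sketched in a few lines. The first function encodes "birds fly unless known to be exceptional" using negation as failure, so adding a fact can withdraw a conclusion (non-monotonic behaviour). The second is a fuzzy membership function for "tall". The fact format, the names `flies` and `tall`, and the 170 cm / 190 cm thresholds are all illustrative assumptions, not part of any particular logic.

```python
def flies(individual: str, facts: set[tuple[str, str]]) -> bool:
    """Default rule: birds fly unless the fact base marks them as exceptions.

    The exception blocks the conclusion only when it is present in the
    current facts, so learning more can remove a conclusion.
    """
    is_bird = ("bird", individual) in facts or ("penguin", individual) in facts
    exceptional = ("penguin", individual) in facts
    return is_bird and not exceptional


facts = {("bird", "tweety")}
print(flies("tweety", facts))  # True: concluded by default
facts.add(("penguin", "tweety"))
print(flies("tweety", facts))  # False: new information withdrew the conclusion


def tall(height_cm: float, low: float = 170.0, high: float = 190.0) -> float:
    """Fuzzy degree of membership in 'tall' (thresholds are hypothetical).

    0 at or below `low`, 1 at or above `high`, linear in between.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


print(tall(180))  # 0.5: John is "somewhat tall"
```

In classical predicate logic, adding the penguin fact to "all birds fly" would produce a contradiction instead of a revised conclusion, and Tall(John) could only be true or false.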

The Psychological Plausibility of Logic

Primary complaint:
- Logic is a poor model of human reasoning.
- Logic is a fine standard for good reasoning, but not necessarily a description of how human beings actually reason.

There are two kinds of evidence for this claim:
- Logic is too time-intensive: full logical reasoning requires long derivations using inference rules.
- People do not live up to the rules of logic. Recall Wason's card selection task.

The Psychological Plausibility of Logic

The general finding for the Wason card selection task has been replicated again and again on a wide variety of problems.

Tversky & Kahneman, 1974:
- People use heuristics for judgments under uncertainty:
  - Representativeness
  - Availability
  - Anchoring and adjustment
- "People rely on a limited number of heuristic principles which reduce the complex tasks of assessing probabilities and predicting values to simpler judgmental operations. In general, these heuristics are quite useful, but sometimes they lead to severe and systematic errors."

Representativeness

Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Rank order the following statements by their probability, using 1 for the most probable and 8 for the least probable:
a) Linda is a teacher in elementary school.
b) Linda works in a bookstore and takes Yoga classes.
c) Linda is active in the feminist movement.
d) Linda is a psychiatric social worker.
e) Linda is a member of the League of Women Voters.
f) Linda is a bank teller.
g) Linda is an insurance salesperson.
h) Linda is a bank teller and is active in the feminist movement.
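Ranking (h) above (f) violates the conjunction rule of probability: for any statements A and B, P(A and B) ≤ P(A), because everyone who satisfies both statements also satisfies A alone. This error is known as the conjunction fallacy. The sketch below counts over a hypothetical population; the counts are invented for illustration, and the final assertion holds whatever counts are chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    bank_teller: bool
    feminist: bool


def probability(population, predicate) -> float:
    """Relative frequency of `predicate` in the population."""
    return sum(1 for p in population if predicate(p)) / len(population)


# Hypothetical 100 people fitting Linda's description (invented counts).
population = (
    [Person(bank_teller=True, feminist=True)] * 4
    + [Person(bank_teller=True, feminist=False)] * 1
    + [Person(bank_teller=False, feminist=True)] * 75
    + [Person(bank_teller=False, feminist=False)] * 20
)

p_teller = probability(population, lambda p: p.bank_teller)             # statement f
p_both = probability(population, lambda p: p.bank_teller and p.feminist)  # statement h
print(p_teller, p_both)  # 0.05 0.04

# The conjunction selects a subset of the bank tellers, so it can never be more probable.
assert p_both <= p_teller
```

Representativeness leads people to judge (h) as a better match to the description, even though it can never be more probable than (f).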

Representativeness

Which sequence is more likely to be produced by flipping a fair coin?
- HHTHT
- HHHHH

Representativeness

Kahneman & Tversky: people judge the probability of an outcome by the extent to which it is representative of the generating process. Under a fair coin, both sequences are in fact equally likely: each has probability (1/2)^5 = 1/32.

The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works:
- It is easy to imagine how a trick all-heads coin could work: high prior probability.
- It is hard to imagine how a trick "HHTHT" coin could work: low prior probability.
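The prior-probability argument above can be made concrete with Bayes' rule. The sketch compares two hypotheses, "fair coin" and "all-heads trick coin", using exact fractions. The 1/100 prior for a trick coin is a hypothetical number chosen only for illustration.

```python
from fractions import Fraction

FAIR = Fraction(1, 2)


def sequence_prob(seq: str, p_heads: Fraction) -> Fraction:
    """Probability of an exact H/T sequence from independent flips."""
    prob = Fraction(1)
    for flip in seq:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob


# Under a fair coin the two sequences are equally likely.
print(sequence_prob("HHTHT", FAIR), sequence_prob("HHHHH", FAIR))  # 1/32 1/32


def posterior_trick(seq: str, prior_trick: Fraction) -> Fraction:
    """P(all-heads trick coin | seq), with 'fair coin' as the only alternative."""
    joint_trick = prior_trick * sequence_prob(seq, Fraction(1))  # trick coin: always H
    joint_fair = (1 - prior_trick) * sequence_prob(seq, FAIR)
    return joint_trick / (joint_trick + joint_fair)


prior = Fraction(1, 100)  # hypothetical prior that a trick coin is in use
print(posterior_trick("HHHHH", prior))         # 32/131
print(float(posterior_trick("HHHHH", prior)))  # about 0.244
print(posterior_trick("HHTHT", prior))         # 0
```

HHHHH is no less likely than HHTHT under a fair coin, but it is much better evidence for a trick coin that we can easily imagine. That may be part of why it "looks" less random.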

The Psychological Plausibility of Logic

Response to time complexity:
- It can be overcome using a massively parallel system like the brain.
- The poor reasoning of humans does not show that people lack logical machinery in their brains. There are just limitations on how they use it: bad memory, poor attention, etc.

Response to "people aren't logical":
- So what if humans don't use logic? They ought to. Cognitive science is the study of intelligence, not stupidity.

The Psychological Plausibility of Logic

Recall the modified Wason selection task. So, people are in principle logical. In practice, however, they are not, because their ideal logical abilities are confronted with severe limitations in working memory. People:
- cannot generate indefinitely long chains of deductions;
- cannot generate indefinitely many mental models;
- may require a context to help in the selection of the correct logical rules.

Another Approach: dual-process accounts of reasoning

Divide reasoning into two modular components.

System 1:
- Early-evolving set of many systems
- Shared by humans and animals
- Innate processes and associative learning
- Rapid, parallel, automatic
- Only their result is available to consciousness

System 2:
- Late-evolving, unique to humans
- Slow, serial processing involving working memory
- Capable of abstract, hypothetical thought

The Belief-Bias Effect

One of the key methods for demonstrating dual processes in reasoning tasks:
- It seeks to create a conflict between responses based on a process of logical reasoning and responses derived from prior beliefs about the truth of conclusions.

In belief-bias experiments, participants are instructed to treat the problem as a logical reasoning task and to endorse only conclusions that necessarily follow from the given premises. In spite of this, intelligent adult populations (undergraduate students) are consistently influenced by the prior believability of the conclusion as well as by the validity of the argument presented.

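The Belief-Bias Effect

Typically, syllogisms presented for evaluation fall into one of four categories:
- valid, believable conclusion
- valid, unbelievable conclusion
- invalid, believable conclusion
- invalid, unbelievable conclusion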

The Belief-Bias Effect

Participants are substantially influenced by both the logic of the argument and the believability of its conclusion, with more belief bias on invalid arguments. Dual-process accounts propose that, although participants attempt to reason logically in accord with the instructions, the influence of prior beliefs is extremely difficult to suppress and effectively competes for control of the responses made.

[Figure: results from the study of J. St. B. T. Evans, 1988. Green = believable conclusions; red = unbelievable conclusions.]
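A procedure that judges validity purely from logical form cannot show belief bias, because it never looks at what the terms mean. The sketch below checks categorical syllogisms by brute force over the 8 regions of a three-circle Venn diagram: an argument is valid if no way of marking regions empty or non-empty makes the premises true and the conclusion false. It assumes the modern (Boolean) reading, where "All X are Y" carries no existential import. The rose/flower example is a common textbook illustration of an invalid argument with a believable conclusion, not Evans's actual materials.

```python
from itertools import product

TERMS = ("A", "B", "C")
# The 8 regions of a three-circle Venn diagram: for each term, inside it or not.
REGIONS = [dict(zip(TERMS, bits)) for bits in product([False, True], repeat=3)]


# Each statement maps a model (the list of non-empty regions) to True/False.
def all_are(x, y):
    return lambda model: not any(r[x] and not r[y] for r in model)


def none_are(x, y):
    return lambda model: not any(r[x] and r[y] for r in model)


def some_are(x, y):
    return lambda model: any(r[x] and r[y] for r in model)


def some_are_not(x, y):
    return lambda model: any(r[x] and not r[y] for r in model)


def syllogism_valid(premises, conclusion) -> bool:
    """Valid iff no model makes every premise true and the conclusion false.

    Tries all 2**8 = 256 ways of marking the regions empty or non-empty.
    """
    for occupied in product([False, True], repeat=len(REGIONS)):
        model = [r for r, used in zip(REGIONS, occupied) if used]
        if all(p(model) for p in premises) and not conclusion(model):
            return False
    return True


# A = roses, B = flowers, C = things that need water
# "All flowers need water; all roses need water; so all roses are flowers."
print(syllogism_valid([all_are("B", "C"), all_are("A", "C")], all_are("A", "B")))  # False
# "All roses are flowers; all flowers need water; so all roses need water."
print(syllogism_valid([all_are("A", "B"), all_are("B", "C")], all_are("A", "C")))  # True
```

The first argument has a believable conclusion but is invalid. That combination is the one on which human participants show the most belief bias.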

Rules as Mental Representations

Rules:
- People have mental rules.
- People have procedures for using these rules to search a space of possible solutions, and procedures for generating new rules.
- Procedures for using and forming rules produce the behavior.

Rule-based systems:
- manipulation and transformation of symbols

Rule-based systems in AI and cognitive science:
- Newell and Simon, GPS, 1950s-60s
- Expert systems, 1970s-90s: MYCIN
- ACT, 1983: John Anderson
- SOAR: Newell and his students, 1980s; John E. Laird (Univ. Michigan)
- Prolog: logic programming

Example of a rule-based system

How to get from Chicago to Madison?
- IF you want to get to Madison, and you are in Chicago, and you have no car, THEN take the bus.
- IF you want to take the bus from Chicago to Madison, THEN go to the bus depot and buy a ticket.
- IF you want to buy a ticket, THEN get some money.
- IF you want to get some money, THEN go to the bank and withdraw it.
- IF you want to get to Madison and you have a car, THEN take highway 94.

IF: conditions (antecedent)
THEN: action (consequent)
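The rules above work by subgoaling: a goal is replaced by the subgoals on the THEN side of a matching rule. The sketch below encodes the five rules as data and expands a goal into a tree of subgoals. The `(goal, conditions, subgoals)` encoding and the choice to treat goals with no matching rule as primitive actions are assumptions of this sketch.

```python
# (goal, conditions that must hold in the situation, subgoals), paraphrasing the rules
RULES = [
    ("get to Madison", {"in Chicago", "no car"}, ["take the bus"]),
    ("get to Madison", {"have a car"}, ["take highway 94"]),
    ("take the bus", set(), ["go to the bus depot", "buy a ticket"]),
    ("buy a ticket", set(), ["get some money"]),
    ("get some money", set(), ["go to the bank and withdraw it"]),
]


def expand(goal: str, situation: set[str], depth: int = 0) -> list[tuple[int, str]]:
    """Return (depth, goal) pairs for the subgoal tree rooted at `goal`.

    The first rule whose goal matches and whose conditions hold is used;
    a goal with no applicable rule is treated as a primitive action.
    """
    tree = [(depth, goal)]
    for head, conditions, subgoals in RULES:
        if head == goal and conditions <= situation:
            for sub in subgoals:
                tree.extend(expand(sub, situation, depth + 1))
            break
    return tree


for depth, step in expand("get to Madison", {"in Chicago", "no car"}):
    print("  " * depth + step)
```

With the situation `{"in Chicago", "no car"}` the printed tree descends from "get to Madison" through "take the bus" to "go to the bank and withdraw it". With `{"have a car"}` instead, the second rule applies and the tree is just "get to Madison" followed by "take highway 94".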

Production Rules

Rules in rule-based systems are often called production rules. Production rules are one of the most popular and widely used knowledge representation languages; early expert systems used production rules as their main knowledge representation language.

A production rule system consists of three components:
- Working memory: contains the information that the system has gained about the problem so far.
- Rule base: contains information that applies to all the problems that the system may be asked to solve.
- Interpreter: solves the control problem, i.e., decides which rule to execute on each selection-execute cycle (also called the recognize-act cycle).
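The three components above can be shown in one small loop. In this sketch the working memory is a set of facts, the rule base is a list of rules, and the interpreter runs the recognize-act cycle: it matches rules against working memory, resolves conflicts, and fires one rule per cycle. Conflict resolution by specificity (most conditions wins) with simple refraction (a rule may not re-add facts already present) is one common strategy, chosen here as an assumption. The bird rules are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    if_all: frozenset  # facts that must be present in working memory
    unless: frozenset  # facts that must be absent from working memory
    add: frozenset     # facts asserted when the rule fires


def make_rule(name, if_all, add, unless=()):
    return Rule(name, frozenset(if_all), frozenset(unless), frozenset(add))


# Rule base: hypothetical rules, for illustration only.
RULE_BASE = [
    make_rule("feathers->bird", {"has feathers"}, {"bird"}),
    make_rule("birds fly", {"bird"}, {"flies"}, unless={"does not fly"}),
    make_rule("penguin", {"bird", "swims", "black and white"}, {"penguin", "does not fly"}),
]


def run(working_memory: set, rules=RULE_BASE) -> list:
    """Recognize-act cycle. Mutates `working_memory`; returns the rules fired, in order."""
    fired = []
    while True:
        # Recognize: rules that match and would add something new (refraction).
        conflict_set = [
            r for r in rules
            if r.if_all <= working_memory
            and not (r.unless & working_memory)
            and not r.add <= working_memory
        ]
        if not conflict_set:
            return fired  # quiescence: no rule applies
        # Conflict resolution: most specific rule wins; max() keeps the earlier rule on ties.
        chosen = max(conflict_set, key=lambda r: len(r.if_all) + len(r.unless))
        # Act: update working memory.
        working_memory |= chosen.add
        fired.append(chosen.name)


wm = {"has feathers", "swims", "black and white"}
print(run(wm))  # ['feathers->bird', 'penguin']
print("flies" in wm)  # False
```

On the second cycle both "birds fly" and "penguin" match. Choosing the more specific rule first means "does not fly" is in working memory before the default rule can fire. A naive first-match interpreter would have asserted "flies" and "does not fly" together.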

Example: Water Jug Problem

- We have two jugs: one holds 4 gallons and the other 3 gallons of water.
- There are no external measuring devices.
- We can fill up a jug from a pump at any time.
- We can pour water out of a jug, or from one jug into the other.
- The problem is to start from an initial state (each state is the status of the two jugs) and get to a final state by a sequence of legal moves, e.g., from [0, 0] (both jugs are empty) to [2, 0] (the 4-gallon jug has 2 gallons of water and the 3-gallon jug is empty).

Production Rules for the Water Jug Problem

A state [x, y] gives the gallons in the 4-gallon jug (x) and in the 3-gallon jug (y). Each rule has the form (state | condition) → new state.
1. Fill 4-gallon jug: ([x, y] | x < 4) → [4, y]
2. Fill 3-gallon jug: ([x, y] | y < 3) → [x, 3]
3. Empty 4-gallon jug: ([x, y] | x > 0) → [0, y]
4. Empty 3-gallon jug: ([x, y] | y > 0) → [x, 0]
5. From 3-gallon jug into 4-gallon jug, until full: ([x, y] | x + y ≥ 4 and y > 0) → [4, y - (4 - x)]
6. From 4-gallon jug into 3-gallon jug, until full: ([x, y] | x + y ≥ 3 and x > 0) → [x - (3 - y), 3]
7. All of 3-gallon jug into 4-gallon jug: ([x, y] | x + y ≤ 4 and y > 0) → [x + y, 0]
8. All of 4-gallon jug into 3-gallon jug: ([x, y] | x + y ≤ 3 and x > 0) → [0, x + y]

Suppose we start with two empty jugs, i.e., [0, 0]. How do we get to a state with 2 gallons in the 4-gallon jug, such as [2, 0]? (The state [2, 2] itself cannot be reached: every rule leaves at least one jug either empty or full.)
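The eight rules above define a state space, and an interpreter still has to choose which applicable rule to fire. The sketch below uses breadth-first search as the control strategy; that choice is an assumption, since the rules themselves do not fix one. BFS returns a shortest sequence of states. The function names `moves` and `solve` are illustrative.

```python
from collections import deque

CAP4, CAP3 = 4, 3  # jug capacities in gallons


def moves(state):
    """Yield (rule number, new state) for every production rule whose condition holds."""
    x, y = state
    if x < CAP4:
        yield 1, (CAP4, y)                   # fill 4-gallon jug
    if y < CAP3:
        yield 2, (x, CAP3)                   # fill 3-gallon jug
    if x > 0:
        yield 3, (0, y)                      # empty 4-gallon jug
    if y > 0:
        yield 4, (x, 0)                      # empty 3-gallon jug
    if x + y >= CAP4 and y > 0:
        yield 5, (CAP4, y - (CAP4 - x))      # 3 -> 4 until full
    if x + y >= CAP3 and x > 0:
        yield 6, (x - (CAP3 - y), CAP3)      # 4 -> 3 until full
    if x + y <= CAP4 and y > 0:
        yield 7, (x + y, 0)                  # all of 3 into 4
    if x + y <= CAP3 and x > 0:
        yield 8, (0, x + y)                  # all of 4 into 3


def solve(start, goal):
    """Breadth-first search; returns the shortest list of states, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for _, nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


print(solve((0, 0), (2, 0)))
# [(0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2), (2, 0)]
print(solve((0, 0), (2, 2)))  # None
```

The search explores all reachable states and never finds (2, 2), which agrees with the observation that every move leaves one jug empty or full.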

How different from logic?
- Less expressive power than logic: a rule-based system may not have full quantifiers and rules of inference.
- But it can be more computationally efficient, because it focuses on the task to be accomplished.
- Uses processes that are not inherently part of logic, e.g., subgoaling.
- Can be tied in with other processes, such as spreading activation to model human memory (ACT, PI).
- Can be combined with other representations, such as concepts.

Advantages of Rule-Based Systems
- Have been used in many commercial systems (e.g., expert systems).
- Modular, easy to add to.
- Have modeled various kinds of psychological experiments.
- Lots of human knowledge and ability is naturally described in terms of rules.
- Various techniques are known for learning rules automatically.

Expert Systems

A production rule system combined with an inference engine and a user interface.
- It is often implemented as a "conversational" or interactive system.
- Pre-defined questions are designed to narrow the search for rules whose antecedents are satisfied.

Weaknesses
- Inflexibility: to change the behavior of the system, you must change the rules.
- Over-generality: rules may not be able to deal with specific problems that are not part of the knowledge base.
- Control may be difficult: actions are performed only if the rule conditions are satisfied.
- Knowledge acquisition is difficult:
  - It is the most challenging part: knowledge engineering.
  - E.g., in expert systems, the knowledge of many experts about many situations must be encoded as rules (often manually).

Rules and Cognitive Science

John Anderson and ACT-R:
- Cognitive skills are realized by production rules.
- Production rules are organized around a set of goals.
- Complex cognitive processes involve a sequence of production rules.
- Productions are matched against working memory.

Rules are psychologically realistic, because they describe many aspects of skilled behavior and predict the details of that behavior.

Rules and Problem Solving

Forward chaining:
- Do deductive reasoning forward by modus ponens: if p then q; p; so q.
- But use strategies to focus inference, e.g., use the most specific rule.
- Good for planning: start with the initial state and work forward to the goal. Reasoning is search through a space of states and operators.

Backward chaining:
- If p then q; q is the goal; so check whether p can be accomplished.
- Good for diagnosis (explanation) and planning: start with the goal and work backward to the current state.

Bidirectional search: forward and backward.

Most of the successful commercial expert systems are rule-based. Modularity of rules: just add more to the rule base.
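Both directions can be sketched over the same propositional rule base. Forward chaining applies modus ponens until nothing new follows. Backward chaining starts from a goal and recursively tries to establish the premises of a rule that concludes it. The rules using p, q, r, s, t are hypothetical placeholders in the style of the slide's "if p then q".

```python
# Each rule: (set of premises, conclusion), i.e. "if all premises then conclusion".
RULES = [
    ({"p"}, "q"),
    ({"q"}, "r"),
    ({"r", "s"}, "t"),
]


def forward_chain(facts: set[str], rules=RULES) -> set[str]:
    """Data-driven: apply modus ponens repeatedly until no new fact is added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal: str, facts: set[str], rules=RULES, seen=frozenset()) -> bool:
    """Goal-driven: `goal` holds if it is a fact, or some rule concludes it
    and all of that rule's premises can be established in turn."""
    if goal in facts:
        return True
    if goal in seen:  # guard against looping on cyclic rules
        return False
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
    )


print(sorted(forward_chain({"p", "s"})))  # ['p', 'q', 'r', 's', 't']
print(backward_chain("t", {"p", "s"}))    # True
print(backward_chain("t", {"p"}))         # False: nothing establishes s
```

Forward chaining derives everything that follows, relevant or not. Backward chaining only explores rules that bear on the goal, which suits diagnosis-style questions.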

Learning

Rules can be learned:
- Generalization from examples (experience, imitation): from Fa and Ga, infer "all F are G". Induction.
- Rule compilation, chunking (same as transitivity): from p → q and q → r, form p → r. Important for skill acquisition: chunk together several rules into one that can be quickly executed.
- Learning by being told: rote learning.
- Abductive inference uses rules to form hypotheses: babies with ear infections cry; Adam is a baby and is crying; so maybe Adam has an ear infection. Note that this is not a valid inference, but it is a plausible use of backward chaining.
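Two of the learning mechanisms above are simple enough to sketch directly on rules written as (antecedent, consequent) pairs. Rule compilation is shown as plain transitivity, which is how the slide describes it; chunking in real architectures such as SOAR or ACT-R is more elaborate. Abduction is shown as collecting every antecedent that would explain an observation. The extra "hunger → crying" rule is a hypothetical addition that shows abduction can yield several competing hypotheses.

```python
def compile_rules(rules: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Chunking as transitivity: from p->q and q->r add p->r, until nothing new appears."""
    compiled = set(rules)
    while True:
        new = {
            (a, d)
            for (a, b) in compiled
            for (c, d) in compiled
            if b == c and a != d
        } - compiled
        if not new:
            return sorted(compiled)
        compiled |= new


def abduce(observation: str, rules: list[tuple[str, str]]) -> list[str]:
    """Abduction: propose every antecedent that would explain the observation.

    This affirms the consequent, so the results are hypotheses, not deductions.
    """
    return [cause for cause, effect in rules if effect == observation]


print(compile_rules([("p", "q"), ("q", "r")]))
# [('p', 'q'), ('p', 'r'), ('q', 'r')]

baby_rules = [
    ("ear infection", "crying"),
    ("hunger", "crying"),         # hypothetical competing explanation
    ("ear infection", "fever"),
]
print(abduce("crying", baby_rules))  # ['ear infection', 'hunger']
```

The compiled rule p → r reaches r in one step instead of two, which is the speed-up that makes chunking relevant to skill acquisition.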

Psychological Plausibility of Rules
- Production systems have been used to model performance on many tasks, e.g., chess and the Tower of Hanoi.
- Quantitative fit: the power law of learning. The rate of learning slows down with practice; this applies to many kinds of skill acquisition.
- Learning in rats: not just associations, but rules, e.g., if tone then shock. Conditioning is learning rules.
- Learning of social rules, e.g., if given something, say thank you. More interesting: rules used in explaining other people's behavior (abduction).
- Knowledge of physical systems: mental models, e.g., if you turn the key, the engine starts.
- Learning language: grammatical rules, e.g., past tense: learned, turned, shocked. But what about "cryed"? Or go → "goed"?
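The past-tense examples above can be reproduced with a rule plus a list of stored exceptions. A learner who applies only the bare "add -ed" rule produces forms like "cryed". One who has the rule but has not yet stored the exception for "go" produces "goed". The tiny irregular lexicon and the simplified spelling rules are illustrative assumptions; for instance, consonant doubling (stop → stopped) is not handled.

```python
IRREGULAR = {"go": "went", "eat": "ate", "run": "ran"}  # tiny illustrative lexicon


def naive_past(verb: str) -> str:
    """The bare rule 'add -ed', applied to every verb."""
    return verb + "ed"


def past_tense(verb: str, irregulars: dict = IRREGULAR) -> str:
    """Look up stored exceptions first; otherwise apply the regular rule
    with simplified spelling adjustments (no consonant doubling)."""
    if verb in irregulars:
        return irregulars[verb]
    if verb.endswith("e"):
        return verb + "d"                       # bake -> baked
    if verb.endswith("y") and verb[-2:-1] not in "aeiou":
        return verb[:-1] + "ied"                # cry -> cried
    return verb + "ed"                          # learn -> learned


print([past_tense(v) for v in ["learn", "turn", "shock", "cry", "go"]])
# ['learned', 'turned', 'shocked', 'cried', 'went']

# Errors that look like a rule applied where it does not belong:
print(naive_past("cry"), past_tense("go", irregulars={}))  # cryed goed
```

Children's over-regularizations such as "goed" are usually cited as evidence that they have extracted a general rule, since they are unlikely to have heard those forms from adults.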

Rules and Language

Basic questions:
- What representations are required for our ability to understand and produce language?
- How is language learned?

Behaviorist answer:
- Language is based on a set of associations, learned by trial and error.

Chomsky's revolution: Syntactic Structures (1957) rejected associationism:
- Grammars are complex, rule-like structures.
- Universal grammar is innate.
- We are born with a readiness to notice what kind of grammar our native language has.

Rules and Language

Evidence supports Chomsky's view:
- Productivity of language: we can understand sentences that have never been uttered before, e.g., "Colorless green ideas sleep furiously." Compare "Furiously sleep ideas green colorless."
- Ease of language learning: almost all children acquire language with relatively little feedback.

Example of Chomskyan grammar (earlier work):
- The girl kicked the ball. → The ball was kicked by the girl.
- A syntactic transformation produces a new structure. This explains the productivity of language: transformations can produce an unlimited number of structures.

Learning of grammars is relatively easy:
- We have innate expectations about structures and transformations.
- Learning is a kind of abduction: children form hypotheses to explain the utterances they hear.
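The active-to-passive example above can be sketched as a transformation over a structured clause rather than over a particular string of words. Because the rule refers only to roles (NP1, V, NP2), it applies to any clause of that shape, including ones never encountered before. The `Clause` structure is a hypothetical simplification. The participle is supplied rather than computed, and the fixed "was" restricts this toy to singular, past-tense clauses.

```python
from typing import NamedTuple


class Clause(NamedTuple):
    subject: str     # NP1
    verb: str        # past-tense verb
    obj: str         # NP2
    participle: str  # past participle, supplied rather than computed


def active(c: Clause) -> str:
    return f"{c.subject} {c.verb} {c.obj}"


def passive(c: Clause) -> str:
    """Toy passive transformation: NP1 V NP2 => NP2 was V-en by NP1."""
    return f"{c.obj} was {c.participle} by {c.subject}"


s = Clause("the girl", "kicked", "the ball", "kicked")
print(active(s))   # the girl kicked the ball
print(passive(s))  # the ball was kicked by the girl

# The same rule applies to a clause it was never "trained" on.
print(passive(Clause("the dog", "chased", "the cat", "chased")))  # the cat was chased by the dog
```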

Rules and Language

Chomsky's 1980s view: government and binding.
- Less emphasis on transformations.
- More emphasis on constraints on what can count as grammatical.

Innate universal grammar, e.g., the asymmetry of subject and object:
- All languages have nouns, verbs, adjectives, and adpositions (prepositions or postpositions).
- XP → X YP: for each X (verb, noun, adjective, preposition) there is a phrase YP (noun phrase, etc.) that can follow it.
- Children merely need to learn parameters: a set of switches to be set. But the basics of universal grammar are not learned, and could not be learned in the time available.
- Concepts are also innate and preexisting: children just need to learn what labels to apply to them.
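The "parameters as switches" idea can be illustrated with the schema XP → X YP. In this sketch a phrase is a (head, complement) pair, and a single boolean switch decides whether heads precede or follow their complements at every level. This corresponds roughly to the head-direction parameter, with head-initial order as in English and head-final order as in languages such as Japanese. The representation and the function name `linearize` are hypothetical simplifications; actual proposals in the theory are more articulated.

```python
def linearize(phrase, head_initial: bool) -> list[str]:
    """Word order for a phrase built from the schema XP -> X YP.

    `phrase` is either a single word (str) or a (head, complement) pair.
    One switch fixes head/complement order uniformly at every level.
    """
    if isinstance(phrase, str):
        return [phrase]
    head, complement = phrase
    h = linearize(head, head_initial)
    c = linearize(complement, head_initial)
    return h + c if head_initial else c + h


# VP: saw + [NP: pictures + [PP: of + John]]
vp = ("saw", ("pictures", ("of", "John")))
print(linearize(vp, head_initial=True))   # ['saw', 'pictures', 'of', 'John']
print(linearize(vp, head_initial=False))  # ['John', 'of', 'pictures', 'saw']
```

Setting one switch changes the order of every phrase at once. That is the sense in which a learner only has to fix parameters rather than learn each construction separately.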