New Approaches for Data Reduction in Generalized Multi-valued Decision Information System (GMDIS): Case Study of Rheumatic Fever Patients WRSTA 2006, 13 August 2006

By Abd El-Monem M. Kozea, Mohamed M. E. Abd El-Monsef, Soaad Abd El-Badie Attia El-Afify
Mathematics Department, Faculty of Science, Tanta University, Egypt
Email: savvymore@yahoo.com

Outline
- Motivation / Introduction
- Basic Concepts of Rough Sets
- Rheumatic Fever Data: Characteristics
- New Thinking
  - Generalized Multi-Valued Decision Information System (GMDIS)
  - New Approaches for Data Reduction in GMDIS
    - Non-equivalence Relations
    - Topological Spaces
    - Degree of Dependencies in GMDIS
- Reduct Algorithms based on GMDIS
  - Rheumatic Fever GMDIS Reduction: Worked example
- Conclusion and Future Work

Motivation / Introduction (1)
- Rough set (RS) theory was developed by Zdzislaw Pawlak in the early 1980s.
  - RS is based on the idea of equivalence relations, which partition the domain into different classes.
  - It is a mathematical tool for dealing with incomplete data, for inducing approximations of concepts, and for discovering patterns hidden in data.
  - It can be used for feature selection, data reduction, identifying partial or total dependencies in data, handling null values and missing data, and generating decision rules.
- Rough set features:
  - It is applicable to problems with both numeric and descriptive attributes.
  - It is capable of finding all minimal knowledge representations.
  - It is highly automated and based on strict rules.
- A multi-valued information system (MIS) is a generalization of the idea of a single-valued information system (SIS).
  - In a multi-valued information system, attribute functions are allowed to map elements to sets of attribute values.

Motivation / Introduction (2)
- We initiate a new approach for data reduction in GMDIS:
  - The Single-Valued Decision Information System (SDIS) is converted to a GMDIS.
  - Two general relations are defined.
  - New classes are constructed using the general relations.
  - The measure of dependency of the decision on the condition attributes is studied.
- To evaluate the performance of the approach, it is applied to Rheumatic Fever datasets.

Rough Set Theory: Basic Concepts
- Information/Decision Systems (Tables)
- Indiscernibility
- Set Approximation
- Reducts and Core
- Rough Membership
- Dependency of Attributes

Information Systems/Tables

Driving Conditions
Fact No. | Weather | Road    | Time
1        | Misty   | Icy     | Day
2        | Foggy   | Icy     | Night
3        | Misty   | Not icy | Night
4        | Sunny   | Icy     | Day
5        | Foggy   | Not icy | Dusk
6        | Misty   | Not icy | Night

- An IS is a pair (U, A).
- U is a non-empty finite set of objects.
- A is a non-empty finite set of attributes such that $a : U \to V_a$ for every $a \in A$.
- $V_a$ is called the value set of a.
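
As a minimal sketch of how the pair (U, A) can be handled in code, the driving-conditions table can be stored as a dictionary and split into indiscernibility classes, that is, groups of facts with identical values on a chosen attribute subset. The names `TABLE` and `ind_classes` are illustrative assumptions and do not come from the slides.

```python
from collections import defaultdict

# Driving-conditions table from the slide: fact number -> attribute values.
TABLE = {
    1: {"Weather": "Misty", "Road": "Icy", "Time": "Day"},
    2: {"Weather": "Foggy", "Road": "Icy", "Time": "Night"},
    3: {"Weather": "Misty", "Road": "Not icy", "Time": "Night"},
    4: {"Weather": "Sunny", "Road": "Icy", "Time": "Day"},
    5: {"Weather": "Foggy", "Road": "Not icy", "Time": "Dusk"},
    6: {"Weather": "Misty", "Road": "Not icy", "Time": "Night"},
}

def ind_classes(table, attrs):
    """Group objects that agree on every attribute in attrs."""
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        groups[key].add(obj)
    # Order the classes by their smallest member for stable output.
    return sorted(groups.values(), key=min)

classes = ind_classes(TABLE, ["Weather", "Road", "Time"])
print(classes)  # [{1}, {2}, {3, 6}, {4}, {5}]
```

Facts 3 and 6 fall into the same class because no attribute of the table tells them apart.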

Decision Systems/Tables

Driving Conditions (Weather, Road, Time) and Consequence (Accident)
Fact No. | Weather | Road    | Time  | Accident
1        | Misty   | Icy     | Day   | Yes
2        | Foggy   | Icy     | Night | Yes
3        | Misty   | Not icy | Night | Yes
4        | Sunny   | Icy     | Day   | No
5        | Foggy   | Not icy | Dusk  | Yes
6        | Misty   | Not icy | Night | No

- DS: $T = (U, A \cup \{d\})$, where $d \notin A$ is the decision attribute (instead of one, we can consider more decision attributes).
- The elements of A are called the condition attributes.
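
The sketch below applies the standard Pawlak definitions of lower approximation, upper approximation, and degree of dependency (size of the positive region divided by size of U) to the driving decision table. These definitions are the classical ones and are not spelled out on this slide; the names `ROWS`, `partition`, `lower`, and `upper` are illustrative assumptions.

```python
from collections import defaultdict

# Decision table from the slide: fact -> (Weather, Road, Time, Accident).
ROWS = {
    1: ("Misty", "Icy", "Day", "Yes"),
    2: ("Foggy", "Icy", "Night", "Yes"),
    3: ("Misty", "Not icy", "Night", "Yes"),
    4: ("Sunny", "Icy", "Day", "No"),
    5: ("Foggy", "Not icy", "Dusk", "Yes"),
    6: ("Misty", "Not icy", "Night", "No"),
}

def partition(key):
    """Partition the facts by the value of key(row)."""
    groups = defaultdict(set)
    for obj, row in ROWS.items():
        groups[key(row)].add(obj)
    return list(groups.values())

cond_classes = partition(lambda r: r[:3])  # indiscernibility on the conditions
dec_classes = partition(lambda r: r[3])    # Accident = Yes / Accident = No

def lower(X):
    """Union of the condition classes contained in X."""
    return set().union(*[c for c in cond_classes if c <= X])

def upper(X):
    """Union of the condition classes that intersect X."""
    return set().union(*[c for c in cond_classes if c & X])

yes = {o for o, r in ROWS.items() if r[3] == "Yes"}
positive = set().union(*[lower(X) for X in dec_classes])
gamma = len(positive) / len(ROWS)  # classical degree of dependency
print(lower(yes), upper(yes), positive)
```

Facts 3 and 6 share the same conditions but have different consequences, so they lie in the upper approximation of "Accident = Yes" without being in its lower approximation, and the dependency degree is 4/6 rather than 1.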

Information Systems Types
- The concept of an IS was first developed by Grzymala-Busse (1988). There are many types of IS, as follows:
  - Single-valued Information System (SIS): the data takes a single value for each element.
  - Single-valued Decision Information System (SDIS)
  - Multi-valued Information System (MIS): an ordinary information system whose values are sets,
    $MIS = (U, At, \{V_a : a \in At\}, f_a)$
  - Multi-valued Decision Information System (MDIS):
    $MDIS = (U, At \cup D, \{V_a : a \in At\}, f_a)$

Rheumatic Fever Data: Characteristics
- The Rheumatic Fever patient data were obtained from Tanta University Hospital, Egypt.
- All patients are between 9 and 12 years old, with a history of arthritis that began at age 3 to 5 years.
- The disease has many symptoms; it usually starts at a young age and stays with the patient throughout life.
- The following table shows seven patients characterized by 8 symptoms (attributes), which are used to decide the diagnosis for each patient (decision attribute).

Rheumatic Fever Data: Characteristics
Attribute symbol | Refers to      | Values refer to
S                | Sex            | Male, Female
F                | Pharyngitis    | Yes, No
A                | Arthritis      | No arthritis, Began in the knee, Began in the ankle
R                | Carditis       | Affected, Not affected
K                | Chorea         | Yes, No
E                | ESR            | Normal, High
P                | Abdominal Pain | Absent, Present
H                | Headache       | Yes, No
D                | Diagnosis      | Rheumatic Arthritis, Rheumatic Carditis, Rheumatic Arthritis and Carditis

New Thinking (1)
- A multi-valued information system (MIS) is a generalization of the idea of a single-valued information system (SIS).
- In a multi-valued information system, attribute functions are allowed to map elements to sets of attribute values.
- Can we convert an SDIS to an MDIS, and vice versa?

New Thinking (2)
We initiate two methods, based on collecting attributes, to:
- Convert an SIS to an MIS and vice versa.
- Convert an SDIS to an MDIS and vice versa.

Worked Example 1 (SDIS): Rheumatic Fever SDIS Data
[Table: objects x1 to x7 (rows) by attributes S, F, A, R, K, E, P, H, D (columns). Cell values as listed, by column:
S: s2, s1, s1, s1
F: f1, f1, f2, f1
A: a1, a2, a1, a0, a1, a2
R: r1, r1, r2, r1, r1
K: k1, k2, k2, k2
E: e1, e1, e2, e1
P: p1, p1, p2, p1
H: h2, h2, h2
D: d3, d3, d1, d2, d3]

Worked Example 2 (MDIS): Converted Data Description (MDIS)
Attribute symbol | Refers to | Attribute values   | Values refer to
α                | {S, K}    | α1, α2, α3         | S → s1; K → k1; {S, K} → {s2, k2}
β                | {F, A, E} | β1, β2, β3, β4, β5 | F → f1; A → a2; E → e1; {F, A, E} → {f2, a0, e2}
δ                | {R, P, H} | δ1, δ2, δ3, δ4     | R → r1; P → p1; H → h1; {R, P, H} → {r2, p2, h2}
D                | Diagnosis | d1, d2, d3         | Rheumatic arthritis; Rheumatic carditis; Rheumatic arthritis and carditis
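
One possible reading of "collecting attributes" for the group α = {S, K} can be sketched as follows. The coding rules below are an assumption inferred from the description table (α1 marks S = s1, α2 marks K = k1, α3 marks the joint case S = s2 and K = k2); the slides do not state the rule explicitly, and `ALPHA_RULES`, `collect`, and the sample rows are hypothetical.

```python
# Assumed coding rules for the grouped attribute α = {S, K}:
# each code is paired with the single-valued conditions that trigger it.
ALPHA_RULES = [
    ("α1", {"S": "s1"}),
    ("α2", {"K": "k1"}),
    ("α3", {"S": "s2", "K": "k2"}),
]

def collect(row, rules):
    """Return the set of codes whose conditions all hold for this row."""
    return {
        code
        for code, cond in rules
        if all(row.get(attr) == val for attr, val in cond.items())
    }

# Illustrative single-valued rows (hypothetical, not the patient records).
for row in (
    {"S": "s2", "K": "k1"},
    {"S": "s1", "K": "k1"},
    {"S": "s2", "K": "k2"},
    {"S": "s1", "K": "k2"},
):
    print(row, "->", sorted(collect(row, ALPHA_RULES)))
```

Under this reading a single-valued pair of columns becomes one set-valued column, which is how a row can carry a value such as {α1, α2}.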

Worked Example 3 (MDIS): Rheumatic Fever MDIS Data
Object | α        | β            | δ            | D
x1     | {α1, α2} is not used here; see row values below

Generalized Multi-Valued Decision Information System (GMDIS)

Initiated a New Approach
- We initiate a new approach for data reduction in the Generalized Multi-Valued Decision Information System (GMDIS):
  - Convert the SDIS to a GMDIS.
  - Define two general relations on the condition attributes and the decision attribute.
  - Construct new classes using the general relations; these classes are used for data reduction.
  - Study the measure of dependency of the decision on the condition attributes.
- To evaluate the performance of the approach, rheumatic fever datasets were chosen and the reduct approach was applied to assess its ability and accuracy.

Generalized Multi-valued Decision Information System
A Generalized Multi-valued Information System can be defined as follows:
(1) $GMIS = (U, At, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\})$
A Generalized Multi-valued Decision Information System can be defined as follows:
(2) $GMDIS = (U, At \cup D, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\})$

Set Approximations in GMDIS (1)
(1) $\eta_B = \{(x, y) : f_a(x)^{c} \subseteq f_a(y),\ \forall a \in B,\ B \subseteq At\}$
(2) $\eta_B = \{(x, y) : f_a(y) \subseteq f_a(x),\ \forall a \in B,\ B \subseteq At\}$
(3) $\eta_D = \{(x, y) : f_D(x) \text{ depends on } f_D(y)\} = \{(x, y) : f_D(x) \subseteq f_D(y)\}$
The set of all intersections of members of $A_{\eta_a}$, called the Meeting Point Relation (MPR), can be written as:
(4) $\mu_a = \{\mu_\lambda = A_i \cap A_j,\ \mu_\lambda \neq \bigcup_k A_k,\ A_i, A_j \in A_{\eta_a},\ i \neq j\}$
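
As a hedged illustration of relation (2), the sketch below stores the MDIS table of Worked Example 3 with value indices only (for example, {α1, α2} becomes {1, 2}) and computes, for each object x, the class of all y with $f_a(y) \subseteq f_a(x)$. The table entries are as read from that slide; the names `MDIS`, `related`, and `after_set` are assumptions made for this example.

```python
# MDIS table as read from Worked Example 3 (value indices only):
# object -> {attribute: set of value indices}.
MDIS = {
    "x1": {"α": {2},    "β": {1, 2, 4}, "δ": {1, 2},    "D": {3}},
    "x2": {"α": {1, 2}, "β": {1, 2},    "δ": {1, 2, 3}, "D": {3}},
    "x3": {"α": {3},    "β": {1, 2, 4}, "δ": {1, 2},    "D": {3}},
    "x4": {"α": {1},    "β": {1, 2, 4}, "δ": {2},       "D": {1}},
    "x5": {"α": {1},    "β": {4},       "δ": {1},       "D": {2}},
    "x6": {"α": {1},    "β": {1, 2},    "δ": {1, 2},    "D": {3}},
    "x7": {"α": {1},    "β": {1, 3, 4}, "δ": {1, 2, 3}, "D": {3}},
}

def related(x, y, attrs):
    """Relation (2): f_a(y) is a subset of f_a(x) for every a in attrs."""
    return all(MDIS[y][a] <= MDIS[x][a] for a in attrs)

def after_set(x, attrs):
    """All y with (x, y) in the relation, i.e. the class generated by x."""
    return {y for y in MDIS if related(x, y, attrs)}

alpha_classes = {x: after_set(x, ["α"]) for x in MDIS}
for x, cls in alpha_classes.items():
    print(x, sorted(cls))
```

Because the relation is reflexive but not symmetric, the classes can overlap (x1 belongs both to its own class and to the class of x2), which is the point of working with non-equivalence relations.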
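
Set Approximations in GMDIS (2)
(5) $POS_B(D) = \bigcup_{X \in A_{\eta_D}} \underline{X}_{\eta_B},\ B \subseteq At$
where, for any subset $X \subseteq U$, the lower and upper approximations are defined by
(6) $\underline{X}_{\eta_B} = \bigcup \{\eta_{Bx} : \eta_{Bx} \subseteq X\},\ B \subseteq At$
    $\overline{X}_{\eta_B} = \bigcup \{\eta_{Bx} : \eta_{Bx} \cap X \neq \emptyset\},\ B \subseteq At$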
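
Definitions (5) and (6) can be tried out on the α column of Worked Example 3. This sketch assumes relation (2) for the condition classes and treats the decision classes as groups of objects with the same diagnosis (for single-valued decisions, relation (3) reduces to equality). The worked example in the slides uses relation (1), so the numbers below are not meant to reproduce the reported reduct. All identifiers are illustrative.

```python
# α and D columns as read from Worked Example 3 (value indices only).
ALPHA = {"x1": {2}, "x2": {1, 2}, "x3": {3}, "x4": {1},
         "x5": {1}, "x6": {1}, "x7": {1}}
DEC = {"x1": 3, "x2": 3, "x3": 3, "x4": 1, "x5": 2, "x6": 3, "x7": 3}
U = set(ALPHA)

# Classes generated by relation (2): y is in the class of x
# when f_α(y) is a subset of f_α(x).
CLASSES = [{y for y in U if ALPHA[y] <= ALPHA[x]} for x in sorted(U)]

def lower(X):
    """Union of the classes that lie entirely inside X."""
    return set().union(*[c for c in CLASSES if c <= X])

def upper(X):
    """Union of the classes that meet X."""
    return set().union(*[c for c in CLASSES if c & X])

# Decision classes: objects sharing the same diagnosis.
DEC_CLASSES = [{x for x in U if DEC[x] == d} for d in sorted(set(DEC.values()))]

# Positive region: union of the lower approximations of the decision classes.
POS = set().union(*[lower(X) for X in DEC_CLASSES])
print(sorted(POS))  # ['x1', 'x3']
```

Only x1 and x3 generate classes that fit inside a single decision class, so under these assumptions the positive region of α is {x1, x3}.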

Suggested New Technique: Consideration (1)
1. The set of attributes $B \subseteq At$ is called a reduct if $\tau_B \le \tau_D$ and B is minimal, where $\tau_B \le \tau_D$ iff $\forall G \in \tau_B,\ \exists G' \in \tau_D$ s.t. $G \subset G'$, $G' \neq U$.
2. The attribute $a \in At$ is called the principal attribute (PA) if $\tau_a \succ \tau_b,\ \forall a, b \in At,\ b \neq a$; and if $\tau_a = \tau_b$, then both a and b are principal attributes.
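
The ordering used in the first consideration can be transcribed almost literally. The sketch below assumes that each of $\tau_B$ and $\tau_D$ is supplied as a plain collection of subsets of U; how these families are generated from the relations is not covered here, and the function name `is_finer` and the sample families are hypothetical.

```python
def is_finer(tau_B, tau_D, universe):
    """True when every G in tau_B is a proper subset of some G' in tau_D
    with G' different from the whole universe."""
    return all(
        any(G < Gp and Gp != universe for Gp in tau_D)
        for G in tau_B
    )

# Hypothetical families over a four-element universe.
U = frozenset({1, 2, 3, 4})
tau_D = [frozenset({1, 2}), frozenset({3, 4})]

print(is_finer([frozenset({1}), frozenset({3})], tau_D, U))  # True
print(is_finer([frozenset({1, 2})], tau_D, U))               # False
```

The second call fails because {1, 2} is equal to, not properly contained in, a member of the second family; the condition as written requires proper inclusion.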

Suggested New Technique: Consideration (2)
3. The set of attributes of equal highest degree of dependency is the PA of the GMDIS.
If the set of all reducts of any SDIS is $\Psi = \{R_1, R_2, \dots, R_n\}$ and the set of reducts of the GMDIS obtained using the new approach is $\Psi' = \{R_1', R_2', \dots, R_n'\}$, then $\Psi'$ is said to be a refinement of $\Psi$ if $\forall R_i' \in \Psi',\ \exists R_i \in \Psi$ s.t. $R_i' \subseteq R_i$.

Simplified Reducts
The set of simplified reducts, denoted SRED(At), is the set of reducts that remains after omitting from RED(At) every superset of another reduct.

GMDIS Reduction Algorithms
- Algorithm 1: GMDIS Reduct
- Algorithm 2: GMDIS PA Algorithm

GMDIS Reduction Algorithms: GMDIS Reduct
A $GMDIS = (U, At \cup D, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\})$
(1) $R \leftarrow \{\}$
(2) Do
(3)   $GMDIS \leftarrow R$
(4)   Loop $a \in (At - R)$
(5)     If $\tau_{R \cup \{a\}} \le \tau_D$
(6)       $GMDIS \leftarrow R \cup \{a\}$
(7)   $R \leftarrow GMDIS$
(8) Until $\tau_R \le \tau_D$
(9) Return R
R: a minimal attribute subset, $R \subseteq At$, where $R \equiv$ Reduct.

GMDIS Reduction Algorithms: GMDIS PA Algorithm
A $GMDIS = (U, At \cup D, \{\psi_a : a \in At\}, f_a, \{\eta_B : B \subseteq At\})$
(1) $PA \leftarrow \{\}$
(2) Do
(3)   Loop $a \in At$
(4)     Loop $b \in At$
(5)       If $\tau_a \succ \tau_b$
(6)         $PA \leftarrow PA \cup \{a\}$
(7)     End Loop
(8)   End Loop
(9) Return PA
PA: the set of principal attributes, $PA \subseteq At$.

Rheumatic Fever GMDIS Reduction: Worked Example
Applying the new approach to the MDIS Rheumatic Fever data turns it into a GMDIS by using the relation
$\eta_B = \{(x, y) : f_a(x)^{c} \subseteq f_a(y),\ \forall a \in B,\ B \subseteq At\}$
We conclude that $\{\alpha\}$ is the reduct and that it is the PA of the GMDIS; this is the same result obtained using the second consideration.
$RED(At) = \{\alpha\} = \{S, K\}$

Discernibility Matrix versus GMDIS
Rheumatic Fever Data Discernibility Matrix
   | x1              | x2                 | x3           | x4           | x5           | x6 | x7
x1 | ∅               |                    |              |              |              |    |
x2 | ∅               | ∅                  |              |              |              |    |
x3 | ∅               | ∅                  | ∅            |              |              |    |
x4 | {S, R, K}       | {R, K, E, H}       | {S, A, R}    | ∅            |              |    |
x5 | {S, F, A, K, P} | {F, A, K, E, P, H} | {S, F, A, P} | {F, A, R, P} | ∅            |    |
x6 | ∅               | ∅                  | ∅            | {R, E}       | {F, A, E, P} | ∅  |
x7 | ∅               | ∅                  | ∅            | {A, R, H}    | {F, A, P, H} | ∅  | ∅

The discernibility function
$f_{At} = (S \vee R \vee K) \wedge (R \vee K \vee E \vee H) \wedge (S \vee A \vee R) \wedge (S \vee F \vee A \vee K \vee P) \wedge (F \vee A \vee K \vee E \vee P \vee H) \wedge (S \vee F \vee A \vee P) \wedge (F \vee A \vee R \vee K \vee P) \wedge (R \vee E) \wedge (F \vee A \vee E \vee P) \wedge (A \vee R \vee H) \wedge (F \vee A \vee P \vee H)$
$Red(At) = \{\{S \vee R \vee K\}, \{S \vee A \vee R\}, \{S \vee F \vee A \vee P\}, \{F \vee A \vee R \vee K \vee P\}, \{R \vee E\}, \{F \vee A \vee E \vee P\}, \{A \vee R \vee H\}, \{F \vee A \vee P \vee H\}\}$
$RED(At) = \{\alpha\} = \{S, K\}$
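
The step from the full discernibility function to the shorter clause list is an application of the absorption law: a clause that contains another clause is redundant and can be dropped. A minimal sketch, using the eleven clauses as written in the function on this slide; `CLAUSES` and `absorb` are names chosen for this example.

```python
# Clauses of the discernibility function as written on the slide.
CLAUSES = [
    {"S", "R", "K"}, {"R", "K", "E", "H"}, {"S", "A", "R"},
    {"S", "F", "A", "K", "P"}, {"F", "A", "K", "E", "P", "H"},
    {"S", "F", "A", "P"}, {"F", "A", "R", "K", "P"}, {"R", "E"},
    {"F", "A", "E", "P"}, {"A", "R", "H"}, {"F", "A", "P", "H"},
]

def absorb(clauses):
    """Drop every clause that is a proper superset of another clause."""
    # Deduplicate first so that identical clauses are kept exactly once.
    unique = [set(c) for c in {frozenset(c) for c in clauses}]
    return [c for c in unique if not any(other < c for other in unique)]

simplified = absorb(CLAUSES)
# Print each surviving clause as a sorted string of attribute symbols.
print(sorted("".join(sorted(c)) for c in simplified))
```

Three clauses are absorbed ({R, K, E, H} by {R, E}, {S, F, A, K, P} by {S, F, A, P}, and {F, A, K, E, P, H} by {F, A, E, P}), leaving the eight clauses listed on the slide.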

Final Note
The reducts obtained by the GMDIS approach are contained in the reducts obtained on the SDIS using the discernibility matrix, which means that the new approach gives more reduction.

Conclusion
- The new approach for data reduction in GMDIS is considered a generalization of the MDIS case.
  - This approach reduces to Pawlak's approach if the system is single-valued and the relations are equivalence relations.
- It opens the way for other approaches to data reduction if we use recent general topological concepts such as pre-open sets, semi-open sets, etc.
- In many real-life situations, considering attributes one at a time does not represent the actual effect of the attributes, so it is necessary to consider subsets of the attributes as multiple criteria.
- Rheumatic Fever datasets were chosen as an application, and the reduct approach was applied to assess its ability and accuracy.

Acknowledgment
The authors greatly appreciate and thank the following people for their valuable comments and advice:
- Dr. K. E. Sturtz, Air Force Research Laboratory, Wright Patterson Air Force Base, Ohio
- Prof. Aboul Ella Hassanien, Cairo University
- Prof. E. Rady, I.S.S.R., Cairo University
- Dr. A. S. Salama, Pure Mathematics Dept., Faculty of Science, Tanta University

Thank you for your kind attention


