• Number of slides: 28

A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows
Claudio Di Ciccio and Massimo Mecella
Claudio Di Ciccio (cdc@dis.uniroma1.it)
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013)
Wednesday, April 17th, Singapore

Process Mining: Definition
• Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions from a set of recorded real executions (logs).
• ProM [AalstEtAl2009] is one of the most widely used plug-in-based software environments for implementing workflow mining (and other) techniques.
  - www.processmining.org
CIDM 2013, Singapore. A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows

A different context for process mining: Artful processes and knowledge workers
• Artful processes [HillEtAl06]
  - informal processes typically carried out by people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)
  - "knowledge workers" [ACTIVE09]
• Knowledge workers create artful processes "on the fly"
• Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared
  - Loosely structured
  - Highly flexible

MINERful++: Our mining algorithm
• MINERful++ is the workflow discovery algorithm of MailOfMine
• Its input is a collection of strings T and an alphabet $\Sigma_T$
  - Each string t is a trace
  - Each character is an event (an enacted task)
  - The collection represents the log
• Its output is a declarative process model
  - What is a declarative process model?

On the modeling of processes: The imperative model
• Represents the whole process at once
• The most widely used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)

On the modeling of processes: The declarative model
• Rather than using a procedural language to express the allowed sequences of activities, it describes workflows by means of constraints
• The idea is that every task can be performed, except those that do not respect such constraints
• This technique suits processes that are highly flexible and subject to change, such as artful processes
• The notation here is based on [Pesic08, MaggiEtAl11] (ConDec, Declare)
Examples:
• If A is performed, B must be performed, no matter whether before or afterwards (responded existence)
• Whenever B is performed, C must be performed afterwards, and B cannot be repeated until C is done (alternate response)
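The two example constraints can be read procedurally. The Python sketch below checks them on a single trace, assuming (as in the MINERful++ input described earlier) that a trace is a string whose characters are task names. The function names are hypothetical, and this is an illustration of the constraint semantics, not how Declare or MINERful++ evaluate constraints.

```python
def responded_existence(trace: str, a: str, b: str) -> bool:
    """If a occurs anywhere in the trace, b must occur too (before or after)."""
    return a not in trace or b in trace


def alternate_response(trace: str, a: str, b: str) -> bool:
    """Every a must be followed by a b, and a cannot recur before that b."""
    waiting = False  # True while some a is still waiting for its b
    for event in trace:
        if event == a:
            if waiting:  # a repeated before the awaited b: violation
                return False
            waiting = True
        elif event == b:
            waiting = False
    return not waiting  # the last a must have been answered as well


# Tasks A, B, C as in the examples above (one character per event).
print(responded_existence("CBA", "A", "B"))  # True: B occurs, before A
print(alternate_response("BCBC", "B", "C"))  # True
print(alternate_response("BBC", "B", "C"))   # False: B repeated before C
```

A single pass with one flag suffices for alternate response because only the most recent unanswered B matters.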

Declare constraint templates: Constraint templates as Regular Expressions (REs)
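To illustrate the idea of templates as regular expressions, here is a Python sketch that encodes a few templates over single-character task names and checks a whole trace with `re.fullmatch`. The patterns, and the `TEMPLATES` and `holds` names, are assumptions written for this example, not a copy of the deck's template table.

```python
import re

# Illustrative regular expressions (an assumption) over single-character task
# names; {a} and {b} are filled in with the constrained tasks.
TEMPLATES = {
    "Participation":      "[^{a}]*({a}[^{a}]*)+",
    "Uniqueness":         "[^{a}]*({a})?[^{a}]*",
    "Init":               "{a}.*",
    "End":                ".*{a}",
    "RespondedExistence": "[^{a}]*(({a}.*{b}.*)|({b}.*{a}.*))*[^{a}]*",
    "Response":           "[^{a}]*({a}.*{b})*[^{a}]*",
    "AlternateResponse":  "[^{a}]*({a}[^{a}]*{b}[^{a}]*)*[^{a}]*",
    "Precedence":         "[^{b}]*({a}.*{b})*[^{b}]*",
    "Succession":         "[^{a}{b}]*({a}.*{b})*[^{a}{b}]*",
}


def holds(template: str, trace: str, a: str, b: str = "") -> bool:
    """True if the whole trace matches the instantiated template."""
    # Task names are assumed to be plain letters, so no regex escaping is needed.
    pattern = TEMPLATES[template].format(a=a, b=b)
    return re.fullmatch(pattern, trace) is not None


print(holds("RespondedExistence", "CBA", "A", "B"))  # True
print(holds("AlternateResponse", "BBC", "B", "C"))   # False
print(holds("End", "rrpcrpcrcpcn", "n"))             # True
```

Because each template is a regular language, checking a trace is a single full match; unary templates simply ignore the `b` argument.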

On the modeling of processes: Imperative vs. declarative
[Figure panels: Declarative | Imperative]
• Declarative models work better in the presence of a partial specification of the process scheme

A real discovered process model
[Figure: a "Spaghetti process" [Aalst2011.book]]

The declarative specification of an artful process (e.g.): Scenario
• A project meeting is scheduled
• We suppose that a final agenda will be committed ("confirmAgenda") after requests for a new proposal ("requestAgenda"), proposals themselves ("proposeAgenda") and comments ("commentAgenda") have been circulated.
• Shortcuts for tasks (process alphabet):
  - p ("proposeAgenda")
  - r ("requestAgenda")
  - c ("commentAgenda")
  - n ("confirmAgenda")

The declarative specification of an artful process (e.g.): Constraints on activities
• Existence constraints
  1. Participation(n)
  2. Uniqueness(n)
  3. End(n)
  The agenda must be confirmed, only once: it is the last thing to do.
• Relation constraints
  4. Response(r, p)
  5. RespondedExistence(c, p)
  6. Succession(p, n)
  During the compilation:
  4. the proposal follows a request;
  5. if a comment circulates, there has been / will be a proposal;
  6. after the proposal, there will be a confirmation, and there can be no confirmation without a preceding proposal.
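A declarative model is the conjunction of its constraints: a trace is compliant only if it satisfies all of them. The sketch below encodes the six agenda constraints as hand-written regular expressions over the alphabet {p, r, c, n} (an assumed encoding, not the deck's own) and reports which ones a trace violates; `AGENDA_MODEL` and `violations` are hypothetical names.

```python
import re

# The agenda model as a conjunction of six constraints over {p, r, c, n}.
# The regular expressions are an illustrative encoding (an assumption).
AGENDA_MODEL = {
    "Participation(n)":         r"[^n]*n.*",
    "Uniqueness(n)":            r"[^n]*n?[^n]*",
    "End(n)":                   r".*n",
    "Response(r, p)":           r"[^r]*(r.*p)*[^r]*",
    "RespondedExistence(c, p)": r"[^c]*((c.*p.*)|(p.*c.*))*[^c]*",
    "Succession(p, n)":         r"[^pn]*(p.*n)*[^pn]*",
}


def violations(trace: str) -> list[str]:
    """Names of the model's constraints that the trace does not satisfy."""
    return [name for name, rx in AGENDA_MODEL.items()
            if re.fullmatch(rx, trace) is None]


print(violations("rrpcrpcrcpcn"))  # []
print(violations("rpcnn"))         # ['Uniqueness(n)']
print(violations("rcn"))           # ['Response(r, p)', 'RespondedExistence(c, p)', 'Succession(p, n)']
```

The string rrpcrpcrcpcn, used in the run of MINERful++ that follows, satisfies the whole model under this encoding.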

MINERful++: Workflow discovery by constraints inference
• MINERful++ is a two-step algorithm:
  1. Construction of a Knowledge Base (KB)
  2. Constraints inference by means of queries evaluated on the KB
• This allows the discovery of constraints through a faster procedure, on data that are smaller in size than the whole input
• Returned constraints are weighted with their support
  - the normalized fraction of cases in which the constraint is verified over the set of input traces

MINERful++ by example: Workflow discovery by constraints inference
• To see how it works, we follow a run of MINERful++ over a string compliant with the previous example: rrpcrpcrcpcn
• We start with the construction of the algorithm's Knowledge Base

MINERful++ by example: Building the "ownplay" of p and n
• p in rrpcrpcrcpcn
  - p occurred 3 times in 1 string: $\gamma_p(3) = 1$, and $\gamma_p(m) = 0$ for each $m \neq 3$
  - p occurred neither as the first nor as the last character: $g_i(p) = 0$, $g_l(p) = 0$
• n in rrpcrpcrcpcn
  - $\gamma_n(1) = 1$, and $\gamma_n(m) = 0$ for each $m \neq 1$
  - n occurred as the last character in 1 string: $g_i(n) = 0$, $g_l(n) = 1$
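The ownplay counters can be recomputed with a short Python sketch, assuming $\gamma_t(m)$ counts the traces in which task t occurs exactly m times, and $g_i(t)$, $g_l(t)$ count the traces that start and end with t. The function `ownplay` is hypothetical; it recomputes the values from scratch and is not necessarily how MINERful++ builds its Knowledge Base.

```python
from collections import Counter


def ownplay(log: list[str], task: str) -> tuple[Counter, int, int]:
    """Recompute gamma, g_i and g_l for one task over a log of string traces.

    gamma[m] -- number of traces in which `task` occurs exactly m times
    g_i      -- number of traces whose first character is `task`
    g_l      -- number of traces whose last character is `task`
    """
    gamma = Counter(trace.count(task) for trace in log)
    g_i = sum(1 for trace in log if trace[:1] == task)
    g_l = sum(1 for trace in log if trace[-1:] == task)
    return gamma, g_i, g_l


example_log = ["rrpcrpcrcpcn"]
print(ownplay(example_log, "p"))  # (Counter({3: 1}), 0, 0)
print(ownplay(example_log, "n"))  # (Counter({1: 1}), 0, 1)
```

Indexing a `Counter` with an unseen m returns 0, which matches "$\gamma_p(m) = 0$ for each $m \neq 3$".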

MINERful++ by example: Building the "interplay" of p and n
• Looking at the string rrpcrpcrcpcn, with respect to the occurrences of p, n occurred…
  i. never before: 3 times, $\delta_{p,n}(-\infty) = 3$
  ii. 2 characters after: 1 time, $\delta_{p,n}(2) = 1$
  iii. 6 characters after: 1 time, $\delta_{p,n}(6) = 1$
  iv. 9 characters after: 1 time, $\delta_{p,n}(9) = 1$
  v. with repetitions of p in between:
     - onwards: 2 times, $b^{\rightarrow}_{p,n} = 2$
     - backwards: never, $b^{\leftarrow}_{p,n} = 0$

MINERful++ by example: Building the "interplay" of r and p (string rrpcrpcrcpcn)
  d:           -∞   -5   -2   +1   +2   +4   +5   +8   +9
  δ_{r,p}(d):   2    1    2    2    2    1    2    1    1
• $b^{\rightarrow}_{r,p} = 1$, $b^{\leftarrow}_{r,p} = 0$
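A possible recomputation of the interplay counters for one trace, in Python. The readings are assumptions, not confirmed by the deck: $\delta_{a,b}(d)$ counts, for every occurrence of a, each occurrence of b at signed distance d, and $\delta_{a,b}(\pm\infty)$ counts occurrences of a with no b before or after them. $b^{\rightarrow}_{a,b}$ counts occurrences of a followed by another a before the next b (that b must occur), and $b^{\leftarrow}_{a,b}$ does the same reading the trace backwards. Under these readings the sketch reproduces the values stated for p and n, and $b^{\rightarrow}_{r,p} = 1$, $b^{\leftarrow}_{r,p} = 0$. `interplay` and `repetitions` are hypothetical names.

```python
import math
from collections import Counter


def repetitions(trace: str, a: str, b: str) -> int:
    """Occurrences of a followed by another a before the next b (b must occur)."""
    count = 0
    for i, event in enumerate(trace):
        if event != a:
            continue
        rest = trace[i + 1:]
        next_b = rest.find(b)  # -1 if no b follows at all
        if next_b > 0 and a in rest[:next_b]:
            count += 1
    return count


def interplay(trace: str, a: str, b: str) -> tuple[Counter, int, int]:
    """Return (delta, b_forward, b_backward) for the ordered pair (a, b)."""
    pos_a = [i for i, e in enumerate(trace) if e == a]
    pos_b = [j for j, e in enumerate(trace) if e == b]
    delta = Counter()
    for i in pos_a:
        for j in pos_b:
            delta[j - i] += 1  # signed distance: positive means "after"
        if not any(j < i for j in pos_b):
            delta[-math.inf] += 1  # no b before this a
        if not any(j > i for j in pos_b):
            delta[math.inf] += 1  # no b after this a
    return delta, repetitions(trace, a, b), repetitions(trace[::-1], a, b)


trace = "rrpcrpcrcpcn"
delta_pn, fwd_pn, bwd_pn = interplay(trace, "p", "n")
print(delta_pn[-math.inf], delta_pn[2], delta_pn[6], delta_pn[9])  # 3 1 1 1
print(fwd_pn, bwd_pn)  # 2 0
delta_rp, fwd_rp, bwd_rp = interplay(trace, "r", "p")
print(fwd_rp, bwd_rp)  # 1 0
```

The pairwise loop is quadratic in the trace length, which is in line with the quadratic dependency on $|t_{\max}|$ stated in the complexity analysis.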

MINERful++: Computing the support of constraints
• Let $G_r$ and $G_p$ be the number of times that r and p, respectively, appear in the log. We have, e.g.:
• support for Response(r, p)
  - hint: how many times was p not read in the traces after r occurred? In those cases, Response(r, p) does not hold
• support for Succession(r, p)
  - hint: how many times was p not read in the traces after r occurred, and how many times was r not read before p occurred? In those cases, Succession(r, p) does not hold
• …
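Following the hints, a simple event-level estimate counts violating occurrences: Response(a, b) is violated by every a with no later b in its trace, and Precedence(a, b) by every b with no earlier a. This Python sketch applies those counts to the ten-trace log of the following example. The function names are hypothetical, and MINERful++'s exact formulas (including the one for Succession) are not spelled out here, so Succession is not attempted.

```python
LOG = [
    "rrpcrpcrcpcn", "rccrppcpcrcppn", "cccccrrprrprpppn", "cprpn", "rppn",
    "rrrcrcpn", "rpcn", "crprrccppn", "pccn", "crrcccrrpcpccpn",
]


def response_support(log: list[str], a: str, b: str) -> float:
    """1 - (occurrences of a with no later b) / (occurrences of a)."""
    total = violating = 0
    for trace in log:
        for i, event in enumerate(trace):
            if event == a:
                total += 1
                violating += b not in trace[i + 1:]  # bool counts as 0 or 1
    return 1.0 if total == 0 else 1 - violating / total


def precedence_support(log: list[str], a: str, b: str) -> float:
    """1 - (occurrences of b with no earlier a) / (occurrences of b)."""
    total = violating = 0
    for trace in log:
        for i, event in enumerate(trace):
            if event == b:
                total += 1
                violating += a not in trace[:i]
    return 1.0 if total == 0 else 1 - violating / total


print(response_support(LOG, "r", "p"))    # 1.0
print(precedence_support(LOG, "c", "n"))  # 0.9
```

On this log the estimate yields 1.0 for Response(r, p) and 0.9 for Precedence(c, n); the only violation of the latter is trace 5 (rppn), whose n is not preceded by any c.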

MINERful++: Computing the support of constraints

MINERful++ by example: Computing the support of constraints
• Log:
  1. rrpcrpcrcpcn
  2. rccrppcpcrcppn
  3. cccccrrprrprpppn
  4. cprpn
  5. rppn
  6. rrrcrcpn
  7. rpcn
  8. crprrccppn
  9. pccn
  10. crrcccrrpcpccpn
• Support for…
  - Response(r, p): 1.0
  - Succession(r, p): 0.96364
  - Precedence(c, n): 0.9
• The support can be used to prune out those constraints falling under a given threshold
  - e.g., 0.95
  - A threshold equal to 1.0 selects only those constraints that are always valid on the log
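As a point of comparison, the Python sketch below computes support as the plain fraction of traces that fully satisfy a constraint (matched with illustrative, assumed regular expressions) and then prunes by threshold. On this log the trace-level measure agrees with the slide for Response(r, p) (1.0) and Precedence(c, n) (0.9), but gives 0.8 for Succession(r, p) rather than 0.96364, so the figure computed by MINERful++ evidently counts differently. `CONSTRAINTS`, `trace_support` and `prune` are hypothetical names.

```python
import re

LOG = [
    "rrpcrpcrcpcn", "rccrppcpcrcppn", "cccccrrprrprpppn", "cprpn", "rppn",
    "rrrcrcpn", "rpcn", "crprrccppn", "pccn", "crrcccrrpcpccpn",
]

# Illustrative regular expressions (assumed encoding, single-character tasks).
CONSTRAINTS = {
    "Response(r, p)":   r"[^r]*(r.*p)*[^r]*",
    "Succession(r, p)": r"[^rp]*(r.*p)*[^rp]*",
    "Precedence(c, n)": r"[^n]*(c.*n)*[^n]*",
}


def trace_support(pattern: str, log: list[str]) -> float:
    """Fraction of traces that satisfy the constraint as a whole."""
    return sum(re.fullmatch(pattern, t) is not None for t in log) / len(log)


def prune(log: list[str], threshold: float) -> dict[str, float]:
    """Keep only the constraints whose support reaches the threshold."""
    supports = {name: trace_support(rx, log) for name, rx in CONSTRAINTS.items()}
    return {name: s for name, s in supports.items() if s >= threshold}


print(prune(LOG, 0.0))   # {'Response(r, p)': 1.0, 'Succession(r, p)': 0.8, 'Precedence(c, n)': 0.9}
print(prune(LOG, 0.95))  # {'Response(r, p)': 1.0}
```

With a threshold of 1.0 only the constraints valid on every trace survive, here Response(r, p) alone.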

Relation constraint templates subsumption: Constraint templates are not independent of each other
• E.g., a trace like ababcabcc satisfies (w.r.t. b and a):
  - RespondedExistence(a, b), RespondedExistence(b, a), CoExistence(a, b), CoExistence(b, a), Response(a, b), AlternateResponse(a, b), ChainResponse(a, b), Precedence(a, b), AlternatePrecedence(a, b), ChainPrecedence(a, b), Succession(a, b), AlternateSuccession(a, b), ChainSuccession(a, b)
• The mining algorithm should show only the strictest constraint (ChainSuccession(a, b))
• MINERful++ addresses this issue by pruning the returned constraints on the basis of the subsumption hierarchy of constraints
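The subsumption-based pruning can be sketched in Python: check the thirteen relation constraints on ababcabcc with illustrative regular expressions, then drop every satisfied constraint that another satisfied one implies. The `IMPLIES` map is a hand-written assumption covering only these templates. The sketch also ignores support values (all 1.0 here), which the pruning procedure outlined on the next slides takes into account.

```python
import re

TRACE = "ababcabcc"

# Illustrative regular expressions (assumed encoding) for constraints on a, b.
RULES = {
    "RespondedExistence(a, b)":  r"[^a]*((a.*b.*)|(b.*a.*))*[^a]*",
    "RespondedExistence(b, a)":  r"[^b]*((b.*a.*)|(a.*b.*))*[^b]*",
    "CoExistence(a, b)":         r"[^ab]*((a.*b.*)|(b.*a.*))*[^ab]*",
    "CoExistence(b, a)":         r"[^ab]*((b.*a.*)|(a.*b.*))*[^ab]*",
    "Response(a, b)":            r"[^a]*(a.*b)*[^a]*",
    "AlternateResponse(a, b)":   r"[^a]*(a[^a]*b[^a]*)*[^a]*",
    "ChainResponse(a, b)":       r"[^a]*(ab[^a]*)*[^a]*",
    "Precedence(a, b)":          r"[^b]*(a.*b)*[^b]*",
    "AlternatePrecedence(a, b)": r"[^b]*(a[^b]*b[^b]*)*[^b]*",
    "ChainPrecedence(a, b)":     r"[^b]*(ab[^b]*)*[^b]*",
    "Succession(a, b)":          r"[^ab]*(a.*b)*[^ab]*",
    "AlternateSuccession(a, b)": r"[^ab]*(a[^ab]*b[^ab]*)*[^ab]*",
    "ChainSuccession(a, b)":     r"[^ab]*(ab[^ab]*)*[^ab]*",
}

# Hand-written direct implications: "X": [constraints that X implies].
IMPLIES = {
    "ChainSuccession(a, b)": ["AlternateSuccession(a, b)", "ChainResponse(a, b)",
                              "ChainPrecedence(a, b)"],
    "AlternateSuccession(a, b)": ["Succession(a, b)", "AlternateResponse(a, b)",
                                  "AlternatePrecedence(a, b)"],
    "Succession(a, b)": ["Response(a, b)", "Precedence(a, b)",
                         "CoExistence(a, b)", "CoExistence(b, a)"],
    "ChainResponse(a, b)": ["AlternateResponse(a, b)"],
    "AlternateResponse(a, b)": ["Response(a, b)"],
    "Response(a, b)": ["RespondedExistence(a, b)"],
    "ChainPrecedence(a, b)": ["AlternatePrecedence(a, b)"],
    "AlternatePrecedence(a, b)": ["Precedence(a, b)"],
    "Precedence(a, b)": ["RespondedExistence(b, a)"],
    "CoExistence(a, b)": ["RespondedExistence(a, b)", "RespondedExistence(b, a)"],
    "CoExistence(b, a)": ["RespondedExistence(a, b)", "RespondedExistence(b, a)"],
}


def implied_by(name: str) -> set[str]:
    """All constraints transitively implied by `name`."""
    seen: set[str] = set()
    stack = list(IMPLIES.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, []))
    return seen


def prune_redundant(satisfied: list[str]) -> list[str]:
    """Drop every satisfied constraint that another satisfied one implies."""
    redundant = set().union(*(implied_by(c) for c in satisfied))
    return [c for c in satisfied if c not in redundant]


satisfied = [name for name, rx in RULES.items() if re.fullmatch(rx, TRACE)]
print(len(satisfied))              # 13
print(prune_redundant(satisfied))  # ['ChainSuccession(a, b)']
```

Because the implication map is acyclic, a depth-first walk is enough to collect every weaker constraint.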

Relation constraint templates subsumption: Constraint templates are not independent of each other

Relation constraint templates subsumption: A hint on the pruning procedure
[Figure: pruning example on the subsumption hierarchy; support values 1.0, constraints marked ✖]

Relation constraint templates subsumption: A hint on the pruning procedure
[Figure: pruning example on the subsumption hierarchy labeled with support values (0.9, 0.7, 0.8); constraints marked ✖ or ?]

Evaluation: Characteristics of MINERful++
• MINERful++ is
  - Independent of the formalism used for expressing constraints
  - Modular (two-phase)
  - Capable of eliminating redundancy in the process model
  - Fast
• Hardware: Sony VAIO VGN-FE11H, Intel Core Duo T2300 1.66 GHz, 2 MB L2 cache, 2 GB of DDR2 RAM @ 667 MHz

Evaluation: On the complexity of MINERful++
• Linear w.r.t. the number of traces in the log, $|T|$
• Quadratic w.r.t. the size of the traces in the log, $|t_{\max}|$
• Quadratic w.r.t. the size of the alphabet, $|\Sigma_T|$
• Hence, polynomial in the size of the input: $O(|T| \cdot |t_{\max}|^2 \cdot |\Sigma_T|^2)$

Preliminary results: MINERful++ on real data
• Logs extracted from 2 mailboxes [DiCiccioMecella2012]
  - 5 traces
  - 34.75 events each on average
  - 139 events read in total
• Evaluation of the inferred constraints conducted with a domain expert
  - Precision ≈ 0.794

Future work: Research in progress
• Integrate MINERful++ with ProM
  - MINERful++ is already capable of reading/writing XES logs
• Study the effects of errors in logs on the inferred workflow
  - Error-injected synthetic logs can help us conduct an automated analysis of the quality of results
• Refine the estimation calculi for support
• Auto-tune the support threshold
  - Depending on the constraint template, which can be more or less "robust"
• Enlarge the set of discovered constraint templates to the branching Declare constraints

References: Cited articles and resources, in order of appearance
• [Aalst2011.book] van der Aalst, W. M. P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011)
• [AalstEtAl2009] van der Aalst, W. M. P., van Dongen, B. F., Günther, C. W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: The process mining toolkit. In: de Medeiros, A. K. A., Weber, B. (eds.): BPM (Demos). Volume 489 of CEUR Workshop Proceedings, CEUR-WS.org (2009)
• [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S. L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)
• [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the ACTIVE integrated approach. BT Technology Journal 26(2), 165–176 (2009)
• [Pesic08] Pesic, M.: Constraint-based Workflow Management Systems: Shifting Control to Users. Ph.D. Thesis, Technische Universiteit Eindhoven (2008)
• [MaggiEtAl11] Maggi, F. M., Mooij, A. J., van der Aalst, W. M. P.: User-guided discovery of declarative process models. In: CIDM, IEEE (2011) 192–199
• [DiCiccioEtAl2012] Di Ciccio, C., Mecella, M., Scannapieco, M., Zardetto, D., Catarci, T.: MailOfMine – analyzing mail messages for mining artful collaborative processes. In: Data-Driven Process Discovery and Analysis. Springer (2012) 55–81