- Number of slides: 28
A Two-Step Fast Algorithm for the Automated Discovery of Declarative Workflows
Claudio Di Ciccio and Massimo Mecella
Claudio Di Ciccio (cdc@dis.uniroma1.it)
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013)
Wednesday, April the 17th, Singapore

Process Mining
Definition
• Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions from a set of recorded real executions (logs).
• ProM [AalstEtAl2009] is one of the most widely used plug-in based software environments for implementing workflow mining (and other) techniques.
• www.processmining.org

A different context for process mining
Artful processes and knowledge workers
• Artful processes [HillEtAl06]
  § informal processes typically carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.)
  § “knowledge workers” [ACTIVE09]
• Knowledge workers create artful processes “on the fly”
• Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared
  § Loosely structured
  § Highly flexible

MINERful++
Our mining algorithm
§ MINERful++ is the workflow discovery algorithm of MailOfMine
§ Its input is a collection of strings T and an alphabet ΣT
  § Each string t is a trace
  § Each character is an event (enacted task)
  § The collection represents the log
§ Its output is a declarative process model
§ What is a declarative process model?

On the modeling of processes
The imperative model
• Represents the whole process at once
• The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets)

On the modeling of processes
The declarative model
• Rather than using a procedural language to express the allowed sequences of activities, it describes workflows through constraints
• The idea is that every task can be performed, except those which do not respect such constraints
• This technique fits processes that are highly flexible and subject to changes, such as artful processes
• The notation here is based on [Pesic08, MaggiEtAl11] (ConDec, Declare)
  § If A is performed, B must be performed, no matter whether before or afterwards (responded existence)
  § Whenever B is performed, C must be performed afterwards, and B cannot be repeated until C is done (alternate response)

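The two example constraints can be sketched as checks over a single trace. This is a minimal illustration, assuming a trace is a string with one character per task; the function names are hypothetical and do not come from Declare or MINERful++.

```python
def responded_existence(trace: str, a: str, b: str) -> bool:
    """If a occurs in the trace, b must occur too (before or after a)."""
    return a not in trace or b in trace


def alternate_response(trace: str, a: str, b: str) -> bool:
    """Every a is followed by a b, and a is not repeated until that b is seen."""
    pending = False  # True while an occurrence of a still waits for its b
    for event in trace:
        if event == a:
            if pending:
                return False  # a repeated before b arrived
            pending = True
        elif event == b:
            pending = False
    return not pending  # the last a must have been answered as well
```

For instance, with tasks `b` and `c`, the trace `bcbc` satisfies the alternate response check, while `bbc` does not, because `b` is repeated before `c` is done.
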
Declare constraint templates
Constraint templates as Regular Expressions (REs)

On the modeling of processes
Imperative vs. declarative
[Figure: declarative model vs. imperative model]
Declarative models work better in the presence of a partial specification of the process scheme

A real discovered process model
“Spaghetti process” [Aalst2011.book]

The declarative specification of an artful process (e.g.)
Scenario
§ A project meeting is scheduled
§ We suppose that a final agenda will be committed (“confirmAgenda”) after requests for a new proposal (“requestAgenda”), proposals themselves (“proposeAgenda”) and comments (“commentAgenda”) have been circulated.
§ Shortcuts for tasks (process alphabet):
  § p (“proposeAgenda”)
  § r (“requestAgenda”)
  § c (“commentAgenda”)
  § n (“confirmAgenda”)

The declarative specification of an artful process (e.g.)
Constraints on activities
§ Existence constraints
  § The agenda must be confirmed, only once: it is the last thing to do.
    1. Participation(n)
    2. Uniqueness(n)
    3. End(n)
§ Relation constraints
  § During the compilation:
    4. the proposal follows a request: Response(r, p)
    5. if a comment circulates, there has been / will be a proposal: RespondedExistence(c, p)
    6. after the proposal, there will be a confirmation, and there can be no confirmation without a preceding proposal: Succession(p, n)

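Since constraint templates can be read as regular expressions, the six constraints above can be sketched as patterns over the alphabet {p, r, c, n}. The patterns below are one plausible encoding, assuming single-character task names and whole-trace matching; they are an illustration and not necessarily the exact expressions used by MINERful++. The helper name `violated` is hypothetical.

```python
import re

# One regular expression per scenario constraint; a trace complies
# with a constraint when the whole trace matches its expression.
CONSTRAINTS = {
    "Participation(n)": r"[^n]*(n[^n]*)+",
    "Uniqueness(n)": r"[^n]*n?[^n]*",
    "End(n)": r".*n",
    "Response(r, p)": r"[^r]*(r.*p)*[^r]*",
    "RespondedExistence(c, p)": r"[^c]*((c.*p)|(p.*c))*[^c]*",
    "Succession(p, n)": r"[^pn]*(p.*n)*[^pn]*",
}


def violated(trace: str) -> list[str]:
    """Return the names of the constraints that the trace does not satisfy."""
    return [
        name
        for name, pattern in CONSTRAINTS.items()
        if re.fullmatch(pattern, trace) is None
    ]
```

Under these assumptions, `violated("rpcn")` returns an empty list, while the made-up trace `rcn` (a request and a comment, but no proposal) violates Response(r, p), RespondedExistence(c, p) and Succession(p, n).
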
MINERful++
Workflow discovery by constraints inference
§ MINERful++ is a two-step algorithm
  1. Construction of a Knowledge Base (KB)
  2. Constraints inference by means of queries evaluated on the KB
§ This allows the discovery of constraints through a faster procedure, on data which are smaller in size than the whole input
§ Returned constraints are weighted with their support
  § the normalized fraction of cases in which the constraint is verified over the set of input traces

MINERful++ by example
Workflow discovery by constraints inference
§ To see how it works, we follow a run of MINERful++ over a string compliant with the previous example: rrpcrpcrcpcn
§ We start with the construction of the algorithm’s Knowledge Base

MINERful++ by example
Building the “ownplay” of p and n
§ p in rrpcrpcrcpcn
  § p occurred 3 times in 1 string: γp(3) = 1
  § For each m ≠ 3, γp(m) = 0
  § p occurred neither as the first nor as the last character: gi(p) = 0, gl(p) = 0
§ n in rrpcrpcrcpcn
  § n occurred once in 1 string: γn(1) = 1
  § For each m ≠ 1, γn(m) = 0
  § n occurred as the last character in 1 string: gi(n) = 0, gl(n) = 1

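The ownplay counts above can be reproduced with a short sketch. It assumes the log is a list of strings and that γ, gi and gl are plain per-character counters; the function name `ownplay` and the data layout are illustrative choices, not the MINERful++ implementation.

```python
from collections import Counter


def ownplay(log: list[str], alphabet: str):
    """Collect, per character, occurrence counts and first/last positions.

    gamma[ch][m] = number of traces in which ch occurs exactly m times
    g_i[ch]      = number of traces starting with ch
    g_l[ch]      = number of traces ending with ch
    """
    gamma = {ch: Counter() for ch in alphabet}
    g_i, g_l = Counter(), Counter()
    for trace in log:
        for ch in alphabet:
            gamma[ch][trace.count(ch)] += 1
        if trace:
            g_i[trace[0]] += 1
            g_l[trace[-1]] += 1
    return gamma, g_i, g_l


gamma, g_i, g_l = ownplay(["rrpcrpcrcpcn"], "prcn")
```

On the example string this gives `gamma["p"][3] == 1`, `g_i["p"] == 0`, `g_l["p"] == 0`, `gamma["n"][1] == 1` and `g_l["n"] == 1`, matching the values on the slide.
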
MINERful++ by example
Building the “interplay” of p and n
§ Looking at the string rrpcrpcrcpcn, with respect to the occurrences of p, n occurred…
  i. Never before: 3 times, δp,n(-∞) = 3
  ii. 2 characters after: 1 time, δp,n(2) = 1
  iii. 6 characters after: 1 time, δp,n(6) = 1
  iv. 9 characters after: 1 time, δp,n(9) = 1
§ Repetitions in-between (p occurring again before n is reached):
  i. onwards: 2 times, b→p,n = 2
  ii. backwards: never, b←p,n = 0

MINERful++ by example
Building the “interplay” of r and p
String: rrpcrpcrcpcn
distance   -∞   -5   -2   +1   +2   +4   +5   +8   +9
δr,p        2    1    2    2    2    1    2    1    1
b→r,p = 1
b←r,p = 0

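The interplay values for both pairs can be recomputed with the sketch below. It assumes that δ counts every pair of occurrences at a given signed distance, that δ(-∞) counts the occurrences of the first character with no earlier occurrence of the second, and that the in-between counters record a repetition of the first character before the nearest occurrence of the second is reached. These readings are inferred from the worked example; the function name `interplay` is hypothetical.

```python
import math
from collections import Counter


def interplay(log: list[str], a: str, b: str):
    """Return (delta, onwards, backwards) for the ordered pair (a, b)."""
    delta = Counter()
    onwards = backwards = 0
    for trace in log:
        pos_a = [i for i, ch in enumerate(trace) if ch == a]
        pos_b = [i for i, ch in enumerate(trace) if ch == b]
        for i in pos_a:
            for j in pos_b:
                delta[j - i] += 1  # signed distance from a to b
            before = [j for j in pos_b if j < i]
            after = [j for j in pos_b if j > i]
            if not before:
                delta[-math.inf] += 1  # b never occurred before this a
            # a repeated between this a and the next b
            if after and any(i < x < after[0] for x in pos_a):
                onwards += 1
            # a repeated between the previous b and this a
            if before and any(before[-1] < x < i for x in pos_a):
                backwards += 1
    return delta, onwards, backwards


delta_pn, fwd_pn, bwd_pn = interplay(["rrpcrpcrcpcn"], "p", "n")
delta_rp, fwd_rp, bwd_rp = interplay(["rrpcrpcrcpcn"], "r", "p")
```

For (p, n) the sketch yields distances 2, 6 and 9 once each, three occurrences at -∞, two onward repetitions and no backward ones, in line with the slide on p and n.
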
MINERful++
Computing the support of constraints
§ Let Gr and Gp be the number of times in which r and p, respectively, appear in the log. We have, e.g.,
  § support for Response(r, p)
    § hint: how many times was p not read in the traces after r occurred? In those cases, Response(r, p) does not hold
  § support for Succession(r, p)
    § how many times was p not read in the traces after r occurred, or r not read before p occurred? In those cases, Succession(r, p) does not hold
  § …

MINERful++ by example
Computing the support of constraints
 1. rrpcrpcrcpcn
 2. rccrppcpcrcppn
 3. cccccrrprrprpppn
 4. cprpn
 5. rppn
 6. rrrcrcpn
 7. rpcn
 8. crprrccppn
 9. pccn
10. crrcccrrpcpccpn
§ Support for…
  § Response(r, p): 1.0
  § Succession(r, p): 0.96364
  § Precedence(c, n): 0.9
§ The support can be used to prune out those constraints falling under a given threshold
  § E.g., 0.95
  § A threshold equal to 1.0 selects those constraints which are always valid on the log

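The Response and Precedence figures above can be checked with a direct count over the ten traces. The sketch assumes support is one minus the fraction of violating occurrences of the activating task (r for Response(r, p), n for Precedence(c, n)); that normalization is an assumption made for illustration. The sketch also scans the traces directly instead of querying a knowledge base, so it does not reflect how MINERful++ evaluates its queries.

```python
LOG = [
    "rrpcrpcrcpcn", "rccrppcpcrcppn", "cccccrrprrprpppn", "cprpn", "rppn",
    "rrrcrcpn", "rpcn", "crprrccppn", "pccn", "crrcccrrpcpccpn",
]


def response_support(log: list[str], a: str, b: str) -> float:
    """Fraction of occurrences of a that are followed by some b."""
    total = violations = 0
    for trace in log:
        for i, ch in enumerate(trace):
            if ch == a:
                total += 1
                if b not in trace[i + 1:]:
                    violations += 1
    return (total - violations) / total if total else 1.0


def precedence_support(log: list[str], a: str, b: str) -> float:
    """Fraction of occurrences of b that are preceded by some a."""
    total = violations = 0
    for trace in log:
        for i, ch in enumerate(trace):
            if ch == b:
                total += 1
                if a not in trace[:i]:
                    violations += 1
    return (total - violations) / total if total else 1.0


supports = {
    "Response(r, p)": response_support(LOG, "r", "p"),
    "Precedence(c, n)": precedence_support(LOG, "c", "n"),
}
# Keep only the constraints whose support reaches the threshold.
kept = [name for name, s in supports.items() if s >= 0.95]
```

With the 0.95 threshold from the slide, Response(r, p) is kept, while Precedence(c, n) is pruned because trace 5 (`rppn`) confirms the agenda without any comment.
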
Relation constraint templates subsumption
Constraint templates are not independent of each other
§ E.g., a trace like ababcabcc satisfies (w.r.t. a and b):
  • RespondedExistence(a, b), RespondedExistence(b, a), CoExistence(a, b), CoExistence(b, a), Response(a, b), AlternateResponse(a, b), ChainResponse(a, b), Precedence(a, b), AlternatePrecedence(a, b), ChainPrecedence(a, b), Succession(a, b), AlternateSuccession(a, b), ChainSuccession(a, b)
§ The mining algorithm should show only the strictest constraint (ChainSuccession(a, b))
§ MINERful++ addresses this issue by pruning the returned constraints on the basis of the subsumption hierarchy of constraints

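The pruning idea can be sketched as a graph walk: a constraint is dropped when a stricter constraint that implies it is also satisfied. The implication table below follows the usual reading of the Declare relation templates and is an assumption made for illustration; it covers only the case in which all constraints hold with the same support, and the names `IMPLIES`, `implied_by` and `prune` are hypothetical.

```python
# Direct implications between templates on the pair (a, b):
# each key is stricter than the templates it lists.
IMPLIES = {
    "ChainSuccession(a,b)": ["AlternateSuccession(a,b)", "ChainResponse(a,b)", "ChainPrecedence(a,b)"],
    "AlternateSuccession(a,b)": ["Succession(a,b)", "AlternateResponse(a,b)", "AlternatePrecedence(a,b)"],
    "Succession(a,b)": ["CoExistence(a,b)", "Response(a,b)", "Precedence(a,b)"],
    "ChainResponse(a,b)": ["AlternateResponse(a,b)"],
    "AlternateResponse(a,b)": ["Response(a,b)"],
    "Response(a,b)": ["RespondedExistence(a,b)"],
    "ChainPrecedence(a,b)": ["AlternatePrecedence(a,b)"],
    "AlternatePrecedence(a,b)": ["Precedence(a,b)"],
    "Precedence(a,b)": ["RespondedExistence(b,a)"],
    # CoExistence(b,a) is treated as an alias implied by CoExistence(a,b).
    "CoExistence(a,b)": ["CoExistence(b,a)", "RespondedExistence(a,b)", "RespondedExistence(b,a)"],
}


def implied_by(template: str) -> set[str]:
    """All templates reachable from template through IMPLIES."""
    seen: set[str] = set()
    stack = list(IMPLIES.get(template, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES.get(current, []))
    return seen


def prune(satisfied: set[str]) -> set[str]:
    """Keep only the constraints not implied by another satisfied one."""
    redundant: set[str] = set()
    for template in satisfied:
        redundant |= implied_by(template)
    return satisfied - redundant


ALL_SATISFIED = set(IMPLIES) | {
    "RespondedExistence(a,b)", "RespondedExistence(b,a)", "CoExistence(b,a)",
}
```

Applied to the thirteen constraints satisfied by ababcabcc, `prune(ALL_SATISFIED)` leaves only ChainSuccession(a,b), which is the outcome the slide asks for.
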
Relation constraint templates subsumption
A hint on the pruning procedure
[Figure: constraints in the subsumption hierarchy, with ✖ marks on some of them; the support values shown are all 1.0]

Relation constraint templates subsumption
A hint on the pruning procedure
[Figure: constraints in the subsumption hierarchy labelled with their support (values shown: 0.9, 0.8, 0.7), with ✖ and ? marks on some of them]

Evaluation
Characteristics of MINERful++
§ MINERful++ is
  § Independent of the formalism used for expressing constraints
  § Modular (two-phase)
  § Capable of eliminating redundancy in the process model
  § Fast
Hardware: Sony VAIO VGN-FE11H, Intel Core Duo T2300 1.66 GHz, 2 MB L2 cache, 2 GB of DDR2 RAM @ 667 MHz

Evaluation
On the complexity of MINERful++
§ Linear w.r.t. the number of traces in the log, |T|
§ Quadratic w.r.t. the size of traces in the log, |tmax|
§ Quadratic w.r.t. the size of the alphabet, |ΣT|
§ Hence, polynomial in the size of the input: O(|T| · |tmax|² · |ΣT|²)

Preliminary results
MINERful++ on real data
§ Logs extracted from 2 mailboxes
  § [DiCiccioMecella2012]
  § 5 traces
  § 34.75 events each on average
  § 139 events read in total
§ Evaluation of inferred constraints conducted with a domain expert
  § Precision ≈ 0.794

Future work
Research in progress
§ Integrate MINERful++ with ProM
  § MINERful++ is already capable of reading/writing XES logs
§ Study the effects of errors in logs on the inferred workflow
  § Error-injected synthetic logs can help us conduct an automated analysis of the quality of results
§ Refine the estimation calculi for support
§ Auto-tune the support threshold
  § Depending on the constraint template, which can be more or less “robust”
§ Enlarge the set of discovered constraint templates to the branching Declare constraints

References
Cited articles and resources, in order of appearance
• [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011)
• [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: ProM: The process mining toolkit. In: de Medeiros, A.K.A., Weber, B. (eds.): BPM (Demos). Volume 489 of CEUR Workshop Proceedings, CEUR-WS.org (2009)
• [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006)
• [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the active integrated approach. BT Technology Journal 26(2), 165–176 (2009)
• [Pesic08] Pesic, M.: Constraint-based Workflow Management Systems: Shifting Control to Users. Ph.D. Thesis, Technische Universiteit Eindhoven (2008)
• [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declarative process models. In: CIDM, IEEE, 192–199 (2011)
• [DiCiccioEtAl2012] Di Ciccio, C., Mecella, M., Scannapieco, M., Zardetto, D., Catarci, T.: MailOfMine – analyzing mail messages for mining artful collaborative processes. In: Data-Driven Process Discovery and Analysis. Springer, 55–81 (2012)


