
Design, Implementation, and Evaluation of a Virtual Machine for Oz
(Design, Implementierung und Evaluierung einer virtuellen Maschine für Oz)
Ralf Scheidhauer, PS Lab, DFKI
May 18, 1999

Oz
• Developed at DFKI since 1991
• DFKI Oz 1.0 (1995), DFKI Oz 2.0 (1998)
• Mozart 1.0 (1999)
  ◦ 180 000 lines of C++
  ◦ 140 000 lines of Oz
  ◦ 65 000 lines of documentation
• Since 1996 collaboration with SICS and UCL
• Application-strength system: multi-agent systems (DFKI, SICS), computer-bus scheduling (Daimler), gate scheduling (Singapore), NL (SFB), computational biology (LMU), ...

Related Work
• LP, CLP [Warren 77], [Jaffar Lassez 86]
• Concurrency [Saraswat 93]
• AKL [Janson Haridi 90, Janson 94]
• FP [Appel 92]

Overview
• Language L
• Virtual machine
• Implementation
• Evaluation

The Language L
• Core language of Oz
• Presented as an extension of a sublanguage of SML
  ◦ Logic variables
  ◦ Threads
  ◦ Synchronization
  ◦ Dynamic type system
• Extensions via predefined functions
    lvar()        logic variable
    unify(x, y)   unification
    spawn(f)      thread creation

Graph Model
• Integers
• Tuples
• Functions
• Cells (references)
• Constructors
• Strict evaluation of expressions e0 e1 ...
[Diagram: example graph with TUPLE, INT/3, CELL, CON, and INT/5 nodes]

Why Logic Variables?
• Programming techniques: backpatching, difference lists, ...
• Cyclic data structures
• Tail-recursive definition of many functions (append, map, ...)
• Synchronization of threads
• Search
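The tail-recursion point can be illustrated with a small sketch. It is not taken from the slides: the `Var` class, the `(head, tail)` list encoding with `None` as nil, and the helper `to_python_list` are all assumptions made for illustration. Each result cell is created with an unbound tail that the next step fills in, so nothing remains to be done after the "recursive" step.

```python
class Var:
    """Hypothetical logic variable: unbound until bind() is called once."""
    _UNBOUND = object()

    def __init__(self):
        self.value = Var._UNBOUND

    def bind(self, value):
        assert self.value is Var._UNBOUND, "already bound"
        self.value = value


def append(xs, ys):
    """Append two lists encoded as nested (head, tail) pairs, None = nil."""
    result = Var()                    # hole for the whole result
    hole = result
    while xs is not None:
        head, xs = xs
        next_hole = Var()
        hole.bind((head, next_hole))  # cell whose tail is still unknown
        hole = next_hole
    hole.bind(ys)                     # the last hole receives the second list
    return result


def to_python_list(term):
    """Follow bound variables and collect the list elements."""
    out = []
    while True:
        while isinstance(term, Var):
            term = term.value
        if term is None:
            return out
        head, term = term
        out.append(head)


print(to_python_list(append((1, (2, None)), (3, None))))
```

The loop form corresponds to a tail call: the state carried between steps is just the remaining input and the current hole.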

Logic Variables: Creation and Representation
    let val x = lvar() in (4, x, 23) end
[Diagram: TUPLE node with fields INT/4, VAR, INT/23]

Logic Variables: Unification
[Diagram: unify( , ) applied to two TUPLE graphs containing INT/3, INT/2, INT/5 and VAR nodes; in the result the VAR nodes are replaced by the corresponding integers]
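A minimal sketch of unification over this graph model, assuming Python tuples for TUPLE nodes, Python integers for INT nodes, and a hypothetical `Var` class for VAR nodes. The function names `lvar` and `unify` follow the slides; `deref` and `resolve` are helpers introduced here. There is no occurs check, and `resolve` would not terminate on a cyclic graph.

```python
class Var:
    """Hypothetical VAR node; 'ref' is None while the variable is unbound."""
    def __init__(self):
        self.ref = None


def lvar():
    return Var()


def deref(node):
    """Follow bindings until reaching an unbound variable or a value."""
    while isinstance(node, Var) and node.ref is not None:
        node = node.ref
    return node


def unify(a, b):
    """Make two graphs equal by binding variables; raise on a clash."""
    a, b = deref(a), deref(b)
    if a is b:
        return
    if isinstance(a, Var):
        a.ref = b
    elif isinstance(b, Var):
        b.ref = a
    elif isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            unify(x, y)
    elif a != b:
        raise ValueError(f"cannot unify {a!r} with {b!r}")


def resolve(node):
    """Return a copy of the graph with bound variables replaced by values."""
    node = deref(node)
    if isinstance(node, tuple):
        return tuple(resolve(x) for x in node)
    return node


x, y = lvar(), lvar()
unify((3, x, 2), (3, 5, y))
print(resolve((3, x, 2)))
```

After the call, `x` is bound to 5 and `y` to 2, so both tuples denote the same graph.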

Threads
[Diagram: thread 1 evaluating e1, ..., thread n evaluating en, and thread n+1 evaluating f(), all operating on one shared store]
• Creation: spawn(f)
• Synchronization: logic variables (x+y)
• Fairness

Virtual Machine

Model
[Diagram: machine model with X-regs, stack, threads, heap, scheduler, and code]
Code shown in the diagram:
    ...
    move Y3 X0
    move G5 X1
    apply G2 2
    return
    ...

V-Addressing
• Address toplevel variables via V-registers
• Loader builds data on the heap; code contains direct references into the heap
• Example
    fun f(l, u) = map(fn(x)=>h(x)+g(x)+u, l)
• h and g are held in V-registers: reduced memory consumption

Dynamic Code Specialization
    apply V3 2
    specApply V3 2
    fastApply V3
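The slide only names the three instructions, so the following is a sketch of the general idea of an instruction that rewrites itself in the code area, not of the actual Mozart rewrite rules. The tuple encoding of instructions, the `run` function, and the condition for specializing are assumptions, and the intermediate specApply stage is not modelled.

```python
def run(code, v_regs, args):
    """Run a tiny hypothetical instruction list; return the last call result."""
    result = None
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op[0] == "apply":
            # Generic call: look up the V-register on every execution.
            _, reg, arity = op
            target = v_regs[reg]
            # Assumed rule: once the target is known, patch the instruction
            # in place so that later executions skip the register lookup.
            code[pc] = ("fastApply", target, arity)
            result = target(*args[:arity])
        elif op[0] == "fastApply":
            # Specialized call: the target is embedded in the instruction.
            _, target, arity = op
            result = target(*args[:arity])
        pc += 1
    return result


code = [("apply", 3, 2)]
v_regs = {3: lambda a, b: a + b}
print(run(code, v_regs, [1, 2]), code[0][0])
print(run(code, v_regs, [1, 2]), code[0][0])
```

The first run executes the generic instruction and patches it; the second run executes the patched instruction directly.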

Unification in the Machine Model
[Diagram: the same unify( , ) example as before; instead of replacing the VAR nodes, each bound variable becomes a REF node that points to the value it was unified with]

Synchronization = Suspension + Wakeup
[Diagram: a thread evaluating (x+y) with x: VAR and y: VAR; a suspension attached to the variables refers to the thread]

Synchronization = Suspension + Wakeup
• Wakeup: unify(x, 23)
[Diagram: x: REF now points to INT/23, y is still VAR; the thread evaluating (x+y) is handed to the scheduler]
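Suspension and wakeup can be sketched with Python generators standing in for threads. Everything here is an illustrative assumption: the `Var` class with its `suspensions` list, the global `run_queue`, and the convention that a thread yields the variable it is waiting for. Unlike the diagram, the sketch suspends on one variable at a time rather than on x and y together.

```python
from collections import deque


class Var:
    """Hypothetical logic variable with a list of suspended threads."""
    def __init__(self):
        self.bound = False
        self.value = None
        self.suspensions = []


run_queue = deque()            # runnable threads, served round-robin


def bind(var, value):
    """Bind the variable and wake up every thread suspended on it."""
    var.bound, var.value = True, value
    run_queue.extend(var.suspensions)
    var.suspensions = []


def run():
    """Run threads until none is runnable."""
    while run_queue:
        thread = run_queue.popleft()
        try:
            needed = next(thread)       # thread reports what it waits for
        except StopIteration:
            continue                    # thread finished
        if needed.bound:
            run_queue.append(thread)
        else:
            needed.suspensions.append(thread)   # suspension


def adder(x, y, out):
    """Thread body computing x+y; blocks while an operand is unbound."""
    for v in (x, y):
        while not v.bound:
            yield v
    out.append(x.value + y.value)


x, y, out = Var(), Var(), []
run_queue.append(adder(x, y, out))
run()               # thread suspends on x
bind(x, 23)
run()               # thread wakes up, then suspends on y
bind(y, 19)
run()               # thread wakes up and finishes
print(out)
```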

Implementation

Emulator vs. Native Code
Virtual machine implementation:
• Emulator: portable, flexible
• Native code: fast (?)

Threads
• X registers: once per machine, not per thread
  ◦ Save live X registers upon preemption/suspension: pessimistic guess per function
  ◦ Exact determination during GC by code interpretation

Representation of the Graph: Naive
[Diagram: a register points to a heap cell that holds the type INT and the value 23]

Representation of the Graph: Optimized
[Diagram: the register itself holds 23 together with an INT tag; a PTR-tagged word points to a heap cell that carries its own type]
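The difference between the naive and the optimized representation of an integer can be sketched as follows. The tag width, the tag value `INT_TAG`, and the use of a Python list as the heap are assumptions chosen for illustration, not Mozart's actual layout.

```python
TAG_BITS = 4
TAG_MASK = (1 << TAG_BITS) - 1
INT_TAG = 0b0001          # hypothetical tag value

heap = []                 # stand-in for the heap: a list of cells


def naive_int(n):
    """Naive: allocate a (type, value) cell; the register holds its index."""
    heap.append(("INT", n))
    return len(heap) - 1


def naive_value(reg):
    kind, n = heap[reg]   # one heap access per read
    assert kind == "INT"
    return n


def tagged_int(n):
    """Optimized: value and tag share one word; no heap cell is created."""
    return (n << TAG_BITS) | INT_TAG


def is_int(word):
    return (word & TAG_MASK) == INT_TAG


def int_value(word):
    return word >> TAG_BITS   # arithmetic shift keeps the sign


reg = naive_int(23)
word = tagged_int(23)
print(naive_value(reg), int_value(word), len(heap))
```

Only the naive variant touches the heap; the tagged word can be tested and decoded with register operations alone.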

Representation of the Graph: Logic Variables
[Diagram labels: register, heap, 23 with INT tag, VAR, PTR, REF, PTR]

Logic Variables: Optimized
[Diagram labels: register, heap, 23 with INT tag, PTR to a typed cell, VAR, REF held in a register; one variant is annotated "WAM"]

Moving More Tags
[Diagram labels: register, heap, 23 with INT tag, PTR to a typed cell, REF, TPL]

Evaluation

Comparison with Emulators
• Mozart is one of the fastest emulators
• Competitive with OCaml and Java
• Significantly faster than Moscow ML
• Twice as fast as SICStus Prolog and Erlang

Comparison with Native Code Systems
• Few memory accesses (i.e. arithmetic): Mozart is easily one order of magnitude slower
• Memory intensive (symbolic computation)
  ◦ Difference only approx. a factor of 2-3
  ◦ In single cases Mozart is faster than native ML or C++

Threads
• Threads in Mozart are very lightweight
• Leading position both for creation and communication
• Up to nearly 2 orders of magnitude faster than Java (creation)

Summary
• Extended a sublanguage of SML by logic variables and threads
• Machine model
  ◦ V-registers
  ◦ Dynamic code specialization
  ◦ Synchronization
• Implementation
  ◦ Efficient implementation of threads
  ◦ Tagging scheme
• Evaluation
  ◦ Mozart is one of the fastest emulators
  ◦ Compares well with native code systems on its target applications
  ◦ Mozart has very lightweight threads

Backup Slides for the Discussion

Logic Variables vs. Functions
• Runtime
                 speedup
    fibonacci    1.18
    takeuchi     1.45
• Memory (large scale applications)
  ◦ Use approx. 18 % of heap memory
  ◦ Approx. twice as much as objects
  ◦ Approx. as much as records

Memory Profile

Mandelbrot (Floats)
[Chart: relative runtimes per system; bar values in slide order: 1.00, 2.65, 1/1.11, 1/1.58, 1/8.77, 1/11.23, 1.37, 1/39.24]

Quicksort with Lists
[Chart: relative runtimes per system; bar values in slide order: 1.00, 2.43, 1.57, 5.19, 1/2.59, 1/3.69, 1/2.99, 1/3.46]

Quicksort with Arrays
[Chart: relative runtimes per system; bar values in slide order: 1.00, 1.25, 1/1.48, 1/4.01, 1/7.92, 1/1.52, 1/20.86]

Naive Reverse
[Chart: relative runtimes per system; bar values in slide order: 1.00, 1.81, 1.59, 1.51, 11.82, 1.04, 1/1.60, 2.05, 1.70]

Threads: Creation

Threads: fib(20)
[Chart: relative runtimes per system; bar values in slide order: 1.09, 4.73, 708.06, 1/1.14]

Tagging Scheme of Mozart
• 4-bit tag, but only 2 bits lost for the address space (= 1 GB): structures are aligned on word boundaries
• Lists, tuples: no need to unmask before the type test
• REF tag
  ◦ no unmask necessary before the test
  ◦ no unmask before deref
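The address-space arithmetic can be checked with a small sketch. It shows one encoding that is consistent with the numbers on the slide, assuming 32-bit words and 4-byte alignment; the function names and the tag value used in the example are hypothetical, and the special treatment of the REF tag is not modelled.

```python
WORD_BITS = 32
TAG_BITS = 4
ALIGN_BITS = 2                      # 4-byte alignment: low 2 address bits are 0
TAG_MASK = (1 << TAG_BITS) - 1

# Bits left for the address: 32 - 4 payload bits, plus the 2 implied zero bits.
ADDRESS_BITS = WORD_BITS - TAG_BITS + ALIGN_BITS
ADDRESS_SPACE = 1 << ADDRESS_BITS   # 2**30 bytes = 1 GB


def make_ptr(addr, tag):
    """Pack an aligned address and a 4-bit tag into one 32-bit word."""
    assert addr % (1 << ALIGN_BITS) == 0, "address must be word aligned"
    assert 0 <= addr < ADDRESS_SPACE, "address outside the 1 GB range"
    assert 0 <= tag <= TAG_MASK
    # Shifting by only 2 reuses the two zero alignment bits for the tag.
    return (addr << (TAG_BITS - ALIGN_BITS)) | tag


def ptr_tag(word):
    return word & TAG_MASK


def ptr_addr(word):
    return (word & ~TAG_MASK) >> (TAG_BITS - ALIGN_BITS)


top = ADDRESS_SPACE - 4             # highest aligned address
word = make_ptr(top, 0b0101)
print(hex(word), hex(ptr_addr(word)), ptr_tag(word), ADDRESS_SPACE)
```

Without alignment, a 4-bit tag would leave 28 address bits (256 MB); reusing the two alignment bits restores a factor of four.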

Threads
[Diagram labels: thread, task, PC, L, G, X; the PC refers to code such as move Y3 X0, move G5 X1, apply G2 2, ...]

Emulators: Optimization Techniques
• Threaded code
• Instruction collapsing
• Register access
• Specialization
• Example
    move Y5 X3
    move Y6 X1
  Figures given on the slide: 34 and 11 (SPARC)
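Instruction collapsing can be sketched as a peephole pass that merges two adjacent instructions into one combined instruction, which saves one instruction dispatch. The tuple encoding and the name `moveMove` are assumptions for illustration, and the pass ignores jump targets, which a real implementation would have to respect.

```python
def collapse_moves(code):
    """Merge each pair of adjacent 'move' instructions into one 'moveMove'."""
    out = []
    i = 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "move" and code[i + 1][0] == "move"):
            # One combined instruction carries the operands of both moves.
            out.append(("moveMove",) + code[i][1:] + code[i + 1][1:])
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


code = [("move", "Y5", "X3"), ("move", "Y6", "X1"), ("return",)]
print(collapse_moves(code))
```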

Address Modes (Registers)
    name      notation   usage                       liveness
    X         Xi         temp. values, parameters    thread
    local     Li         local variables             fct-body
    global    Gi         free variables              function
    virtual   Vi         constants                   program

Threads
• Fairness: status register with flags PRE, GC, IO, ...; checked on every function call (and return)
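A sketch of the status-register idea, assuming one flag bit per request and a single test in front of every call. The `Machine` class, the bit positions, and the handler behaviour (recording an event) are all hypothetical; only the flag names PRE, GC, and IO come from the slide.

```python
PRE, GC, IO = 1, 2, 4   # hypothetical bit positions in the status register


class Machine:
    def __init__(self):
        self.status = 0      # all flags clear: nothing pending
        self.events = []     # records which handlers ran, for illustration

    def request(self, flag):
        """Set a flag asynchronously, e.g. from a timer or I/O signal."""
        self.status |= flag

    def call(self, f, *args):
        """Function call with the status check in front of it."""
        if self.status:      # one comparison covers every pending request
            self._service()
        return f(*args)

    def _service(self):
        if self.status & PRE:
            self.events.append("preempt")
        if self.status & GC:
            self.events.append("gc")
        if self.status & IO:
            self.events.append("io")
        self.status = 0


m = Machine()
m.call(abs, -1)          # fast path: status is zero
m.request(PRE)
m.request(IO)
m.call(abs, -2)          # slow path: both requests are serviced first
print(m.events)
```

Because the common case is a zero status word, the cost on the fast path is a single test regardless of how many kinds of request exist.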

L
    e ::= x                                        variable
        | n                                        integer
        | (e1, ..., en)                            tuple
        | fn (x1, ..., xn) => e                    function
        | e0(e1, ..., en)                          application
        | let val x = e in e end                   variable declaration
        | let con x in e end                       constructor declaration
        | case e of p1 => e1 | ... | pn => en      pattern matching
Operators
    lvar  : () ->         logic variable
    unify : -> ()         unification
    spawn : (() -> ()     thread creation

add Xi Xk Xn
    Tagged Xi = X[*(PC+1)]; DEREF(Xi);
    if (isInt(getTag(Xi))) {
      Tagged Xk = X[*(PC+2)]; DEREF(Xk);
      if (isInt(getTag(Xk))) {
        int aux = intValue(Xi) + intValue(Xk);
        XPC(3) = oz_int(aux);   // ovflw+shifttag+store
        DISPATCH(4);
      }
    }
[Table: machine instruction counts for the code above, with columns labelled "no derefs", "no type tests", and "overflow"; the cell alignment was lost. Values in slide order: 2, 2, 1+2, 1+1+1, 3+2+2; 0 (2), 0, 0, 2, 0 (2), 3, 3; final row: 277 (11), 23, 17, 6]

Java: JIT vs. Emulator
                         speedup
    quicksort (array)    18.8
    fib (int)            14.2
    fib (float)           4.9
    queens                6.1
    nrev                  2.0
    quicksort (list)      2.3
    fib (thread)          1.1
    mandelbrot            5.4
    deriv (virtual)       1.9