- Number of slides: 92
An Introduction to Parrot • Dan Sugalski • dan@sidhe.org • January 28, 2004
Overview What’s it all about
Purpose • Optimized for Dynamic Languages • Perl 5, Python, Ruby specifically • Run really, really fast • Or at least as fast as reasonable under the circumstances • Easily extendable • Easily embeddable • Play Zork
History How we got where we are
OSCON 2000 • Infamous mug pitching incident • Perl 6 started • Language and software developed separately
Perl 6 -- not too much bigger • That hasn’t lasted • Allison’s talking about that one • The start was smallish, though • Fix the annoyances • Amazing how many things turned out to be annoying
Big language umbrella • Not much semantic difference between Perl 5, Python, and Ruby • Perl 6 was obviously going to borg them and a bit more • Even ML and Haskell haven’t been safe • More concepts have gone in as time has progressed
Parrot went for them all • Yeah, we were getting bored • Had to do something • We liked Ruby and even Python • We hated having multiple interpreters around
Parrot and the Parrot Prank • 2001 April Fools Joke • Perpetrated by Simon Cozens • Parrot -- New language • Perl & Python Amalgam • Pretty funny as these things go
Timeline • The project came first • Then, the Parrot Joke • We grabbed the name
Non-Purpose • Don’t care about non-dynamic languages • Not much, at least • Other people can worry • Engineering tradeoffs favor dynamic languages
True language neutrality is impossible • Vicious sham • All engines have a bias • Even the hardware ones • Processors these days really like C
Architecture How it’s supposed to look
Buzzwords • Register based, object-oriented, language agnostic, threaded, event-driven, async I/O capable virtual machine • No, really
Software goals • Fast • Safe • Extendable • Embeddable • Maintainable
Administrative goals • Resource Efficient • Controllable • Not suck when used as an Apache module • Cautious about whole-system impact
Driving assumptions • C function calls are inexpensive • L1 & L2 caches are large • Memory bandwidth is limited • CPU pipeline flushes are expensive • Interpreter must be fast • JIT a bonus, not a given
Interpreter Core in Pictures [Diagram labels: Interpreter Core, Integer registers, Float registers, String registers, PMC registers, Frame Stack (repeated), User Stack, Control Stack, Lexicals, Globals]
Parser • Source goes in, AST comes out • Built in part on Perl 6 rules engine • Pluggable parser architecture
Compile and optimize (IMCC) • Turns the output of the parser into executable code • Optional optimizing step • Register coloring algorithms provided here
Execution • Interpreter • JIT • C code • Native executables
Base Engine • Bytecode driven • Platform-neutral bytecode • Register-based system • Stacks • Continuation-passing style
Bytecode • Directly executable • Resembles native executable format • Code • Constants • Metadata • No BSS, though
Designed for efficiency • Directly executable • mmap()ped in • Only complex constants (strings, PMCs) need fixup • Converts on size and/or endian mismatch
Platform Neutrality • If native format, used directly • Otherwise endian-swapped • Off-line utility to convert • Only difference is speed hit on startup
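The load-time check described on the two slides above can be sketched in C. This is only an illustration under stated assumptions: the magic word `SKETCH_MAGIC`, the 32-bit word size, and the function names are invented for the example and are not Parrot's actual bytecode header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic word ("PARR"); Parrot's real header is different. */
#define SKETCH_MAGIC 0x50415252u

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Returns 0 if the words were already in native order, 1 if they were
 * swapped in place, and -1 if the magic word matches neither order. */
static int fixup_bytecode(uint32_t *words, size_t n)
{
    assert(words != NULL && n > 0);
    if (words[0] == SKETCH_MAGIC)
        return 0;                        /* native format: use directly */
    if (swap32(words[0]) != SKETCH_MAGIC)
        return -1;                       /* not bytecode at all */
    for (size_t i = 0; i < n; i++)
        words[i] = swap32(words[i]);     /* the one-time startup cost */
    return 1;
}
```

In the native case nothing is touched, which is what lets a file be mapped in and run directly; only foreign-endian files pay the conversion pass.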
Registers • All operations revolve around VM registers • Essentially CPU registers • Four types • Integer • Float • String • PMC • 32 of each
Registers • Parrot’s one RISC concession • Non-load/store must operate on registers or constants • JIT maps VM registers to platform registers if there are some • Otherwise pure (and absolute) memory addressing to VM registers
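As a rough picture of the register file, here is a minimal C sketch with four banks of 32 registers and two ops that touch only registers or constants. The struct layout and all names (`RegisterFile`, `SketchString`, `SketchPMC`, the `op_` functions) are assumptions made for illustration, not Parrot's real definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 32                          /* from the slide: 32 of each */

typedef struct SketchString SketchString;    /* opaque, hypothetical */
typedef struct SketchPMC    SketchPMC;       /* opaque, hypothetical */

typedef struct {
    long          int_reg[NUM_REGS];
    double        num_reg[NUM_REGS];
    SketchString *str_reg[NUM_REGS];
    SketchPMC    *pmc_reg[NUM_REGS];
} RegisterFile;

static int valid_reg(int r) { return r >= 0 && r < NUM_REGS; }

/* add Ix, Iy, Iz: every operand is a register number. */
static void op_add_i_i_i(RegisterFile *r, int dst, int a, int b)
{
    assert(valid_reg(dst) && valid_reg(a) && valid_reg(b));
    r->int_reg[dst] = r->int_reg[a] + r->int_reg[b];
}

/* add Ix, Iy, constant: the last operand is an inline constant. */
static void op_add_i_i_ic(RegisterFile *r, int dst, int a, long c)
{
    assert(valid_reg(dst) && valid_reg(a));
    r->int_reg[dst] = r->int_reg[a] + c;
}
```

Because the registers are plain array slots, an interpreter reaches them by absolute memory addressing, while a JIT is free to keep hot slots in hardware registers.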
Stacks • Six stacks • One general purpose typed stack • Four register backing stacks • Push/pop half register frames in one go • Faster than push/pop of frames to general stack • One control stack
Stacks • Bit of a misnomer • Really tree of stack frames • Confusing, though
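The "half register frames in one go" idea can be sketched as a block copy of 16 registers at a time. This is a simplified illustration: it uses a fixed-depth array where the slides describe a tree of frames, and the type and function names are invented for the example.

```c
#include <assert.h>
#include <string.h>

#define NUM_REGS  32
#define HALF_REGS (NUM_REGS / 2)
#define MAX_DEPTH 64            /* arbitrary fixed depth for this sketch */

typedef struct {
    long saved[MAX_DEPTH][HALF_REGS];
    int  depth;
} IntBackingStack;

/* Save registers [first, first + 16) with a single block copy. */
static void push_half(IntBackingStack *s, const long *regs, int first)
{
    assert(first == 0 || first == HALF_REGS);
    assert(s->depth < MAX_DEPTH);
    memcpy(s->saved[s->depth++], regs + first, sizeof s->saved[0]);
}

/* Restore the most recently saved half frame into the same range. */
static void pop_half(IntBackingStack *s, long *regs, int first)
{
    assert(first == 0 || first == HALF_REGS);
    assert(s->depth > 0);
    memcpy(regs + first, s->saved[--s->depth], sizeof s->saved[0]);
}
```

One `memcpy` per half frame is the reason this beats pushing registers one by one onto the general-purpose stack.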
Continuation Passing Style • Used for calling conventions • Parrot makes heavy use of continuations • If you don’t know they’re there you’ll not care • All Ruby’s fault, really • Hidden from HLL code
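For readers who have not met continuation-passing style, a minimal C sketch follows. The callee never returns a value; it hands its result to a continuation supplied by the caller. The `Continuation` struct and function names are hypothetical and much simpler than a real Parrot continuation, which also captures interpreter state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Continuation Continuation;

struct Continuation {
    void (*resume)(Continuation *self, long result); /* "where to go next" */
    long *out;                                       /* state the continuation carries */
};

/* A continuation that simply stores the result for the caller. */
static void store_result(Continuation *self, long result)
{
    *self->out = result;
}

/* CPS-style callee: "returning" means invoking the continuation. */
static void square_cps(long x, Continuation *k)
{
    assert(k != NULL && k->resume != NULL);
    k->resume(k, x * x);
}
```

A caller builds `Continuation k = { store_result, &r };` and calls `square_cps(7, &k)`; a high-level language compiler would generate this plumbing so user code never sees it.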
Parrot’s data Where the magic lives
Data isn’t passive • Lots of functionality hidden in data • Partly OO • Or as OO as you get in C
Strings • Language neutral • Encapsulate language behavior, encoding, and character set • Annoyingly complex
Basic String Diagram [Diagram labels: Buffer Info, Encoding, Charset, Language, Flags]
Encoding • Represents how the bits are turned into ‘characters’ • Code points, really • Even for non-unicode encodings • Handles transformations from/to storage
Character Set • Which characters the code points represent • Basic character manipulation happens here • Case mangling, substrings • Transformations to other character sets
Language • Nuances of sorting and case mangling • Interpretation of most Asian text when using Unicode • Ignorable if you don’t care
Unicode • Parrot does Unicode • Used as pivot encoding/charset • IBM’s ICU library • Didn’t want to write another badly done Unicode library
Efficiency concerns • Multiple encodings/charsets means less conversion • Transform data only when needed • Strings are mutable • COW system for space/speed efficiency
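Copy-on-write for mutable strings can be sketched as follows. This is a generic reference-counted COW scheme written for illustration; the `Buffer` and `CowString` types and the `cow_` functions are assumptions and do not reflect how Parrot's string headers are actually laid out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *bytes;   /* NUL-terminated only for convenience in this sketch */
    size_t len;
    int    refs;    /* number of string headers sharing this buffer */
} Buffer;

typedef struct { Buffer *buf; } CowString;

static CowString cow_new(const char *bytes, size_t len)
{
    Buffer *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->bytes = malloc(len + 1);
    if (b->bytes == NULL) abort();
    memcpy(b->bytes, bytes, len);
    b->bytes[len] = '\0';
    b->len = len;
    b->refs = 1;
    CowString s = { b };
    return s;
}

/* Copying shares the buffer; no bytes are duplicated yet. */
static CowString cow_copy(CowString s)
{
    s.buf->refs++;
    return s;
}

/* Writing takes a private copy first if the buffer is shared. */
static void cow_set_byte(CowString *s, size_t i, char c)
{
    assert(i < s->buf->len);
    if (s->buf->refs > 1) {
        CowString own = cow_new(s->buf->bytes, s->buf->len);
        s->buf->refs--;            /* give up our share of the old buffer */
        *s = own;
    }
    s->buf->bytes[i] = c;
}

static void cow_release(CowString *s)
{
    if (--s->buf->refs == 0) {
        free(s->buf->bytes);
        free(s->buf);
    }
    s->buf = NULL;
}
```

The copy is cheap until someone writes, which is the space/speed trade the slide refers to.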
The PMC • Represents a HLL variable • Language agnostic • Everything pivots off PMCs
PMC diagram [Diagram labels: Vtable, Flags, Cache, Data Pointer, Metadata, GC handle, Synchronization]
The Vtable • How all the functionality is implemented • Almost everything defers to PMCs • Large part of interpreter logic in PMCs • Allows fast operator overloading and tying
Some vtable operations • Addition • Subtraction • Multiplication • Division • Bitwise operations • Loading • Storing • Comparison • Truth • Type conversion • Logical operations • Finalization
Vtable functions may be Parrot code • How languages implement user operator overloading • Used for perl-style tying • Usable for operator wrapping
PMCs are typed • Types can change • Allows customized behavior • Cuts out some overhead
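A vtable is, at heart, a struct of function pointers that each PMC points to. The sketch below shows dispatch through such a table and a type change done by swapping the pointer. The three-entry `VTable`, the `Undef` and `Integer` types, and every function name are assumptions for illustration; the real vtable has many more entries.

```c
#include <assert.h>
#include <stddef.h>

typedef struct PMC PMC;

typedef struct {
    const char *type_name;
    long (*get_integer)(const PMC *self);
    void (*set_integer)(PMC *self, long value);
    int  (*get_bool)(const PMC *self);
} VTable;

struct PMC {
    const VTable *vtable;
    long          cache;   /* small inline value */
    void         *data;    /* larger payload, unused in this sketch */
};

/* Integer behavior. */
static long int_get(const PMC *p)   { return p->cache; }
static void int_set(PMC *p, long v) { p->cache = v; }
static int  int_bool(const PMC *p)  { return p->cache != 0; }
static const VTable IntVTable = { "Integer", int_get, int_set, int_bool };

/* Undef behavior: storing a value changes the PMC's type. */
static long undef_get(const PMC *p) { (void)p; return 0; }
static int  undef_bool(const PMC *p) { (void)p; return 0; }
static void undef_set(PMC *p, long v)
{
    p->vtable = &IntVTable;          /* a type change is a pointer swap */
    p->vtable->set_integer(p, v);
}
static const VTable UndefVTable = { "Undef", undef_get, undef_set, undef_bool };

/* What an opcode does: no type checks, just call through the vtable. */
static void op_set_p_i(PMC *p, long v)
{
    assert(p != NULL && p->vtable != NULL);
    p->vtable->set_integer(p, v);
}
```

Operator overloading and tying fit the same shape: a language installs its own functions in the table and the opcode never needs to know.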
All PMCs indexable • As array or hash • Operations may be delegated • PMC may be both hash and array • Scalar as well
Multimethod dispatch • Core interpreter functionality • Used for many PMC operations • Beats hand-rolling it • Dispatch surprisingly fast
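One simple way to make dispatch on both operand types fast is a two-dimensional table indexed by the left and right type. The sketch below assumes just two types and an addition operator; the table layout and all names are illustrative and are not taken from Parrot's MMD implementation.

```c
#include <assert.h>
#include <stddef.h>

enum { T_INT, T_NUM, T_COUNT };

typedef struct {
    int type;
    union { long i; double n; } v;
} Value;

typedef Value (*BinaryFn)(Value, Value);

static Value make_int(long i)   { Value r; r.type = T_INT; r.v.i = i; return r; }
static Value make_num(double n) { Value r; r.type = T_NUM; r.v.n = n; return r; }

static Value add_int_int(Value a, Value b) { return make_int(a.v.i + b.v.i); }
static Value add_int_num(Value a, Value b) { return make_num((double)a.v.i + b.v.n); }
static Value add_num_int(Value a, Value b) { return make_num(a.v.n + (double)b.v.i); }
static Value add_num_num(Value a, Value b) { return make_num(a.v.n + b.v.n); }

/* One entry per (left type, right type) pair. */
static const BinaryFn add_table[T_COUNT][T_COUNT] = {
    [T_INT] = { [T_INT] = add_int_int, [T_NUM] = add_int_num },
    [T_NUM] = { [T_INT] = add_num_int, [T_NUM] = add_num_num },
};

/* Dispatch is two array indexes and an indirect call. */
static Value mmd_add(Value a, Value b)
{
    assert(a.type >= 0 && a.type < T_COUNT);
    assert(b.type >= 0 && b.type < T_COUNT);
    return add_table[a.type][b.type](a, b);
}
```

Adding a type means adding a row and a column rather than editing a chain of type tests in every operator.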
Magic all hidden • User code never knows about magic • Allows transparent behaviour changes • One big pivot point for dispatch
Objects • Standard but optional object system • Standard object protocols • Standard object opcodes
Everything can be an object • Objects have attributes • Objects can have methods called on them • All PMCs have get/set attribute vtable entries • All PMCs have a method call entry • Therefore, all PMCs are objects
Objects are cross-language • Obey the protocols and use the facilities and you’re fine • Can even inherit across object systems • Parrot will enforce some invariance
Object system optional • Okay to roll your own • Don’t have to interoperate • Load up your own ops and go for it
Base support for objects • Scoped method caches • Selective cache invalidation • Signature based dispatch in core • Op support • Property and attribute access • Method call • Subclassing • can, is, and does
Assembly Language Because hand-generating bytecode is annoying
Sample
    set N0, 10
    set N1, 0
  loop:
    print "Hello, world!\n"
    add N1, 1            # Could be "inc N1"
    ne N0, N1, loop
    end
Straightforward • Destination, source: add DEST, SOURCE1, SOURCE2 • VAX is not dead • Some magic during assembly
Ops pre-exploded • No actual add op • add_i_i_ic, add_i_i_i, add_p_i_i • Etc… • Assembler chooses right op • No runtime type checking needed • No runtime JIT code analysis needed
Ops pre-exploded • Little extra code needed • Ops source has custom macro preprocessor • Reduces maintenance load
Add example
    inline op add(out INT, in INT, in INT) {
        $1 = $2 + $3;
        goto NEXT();
    }
Add example
    /* add with three integer-register operands */
    opcode_t *
    Parrot_add_i_i_i(opcode_t *cur_opcode, struct Parrot_Interp *interpreter)
    {
        IREG(1) = IREG(2) + IREG(3);
        return (opcode_t *)cur_opcode + 4;   /* opcode word plus three operands */
    }

    /* same op with an inline integer constant as the second operand */
    opcode_t *
    Parrot_add_i_ic_i(opcode_t *cur_opcode, struct Parrot_Interp *interpreter)
    {
        IREG(1) = cur_opcode[2] + IREG(3);
        return (opcode_t *)cur_opcode + 4;
    }
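Op functions of this shape, which take the current opcode pointer and return the next one, plug into a very small run loop. The sketch below is a self-contained toy under stated assumptions: the four-op instruction set, the `Interp` struct, and the use of a NULL return to stop are invented for the example and are not Parrot's actual core.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;

typedef struct {
    long int_reg[32];
} Interp;

typedef opcode_t *(*op_func_t)(opcode_t *cur, Interp *interp);

enum { OP_END, OP_SET_I_IC, OP_ADD_I_I_I, OP_ADD_I_I_IC, OP_COUNT };

static opcode_t *op_end(opcode_t *cur, Interp *in)
{
    (void)cur; (void)in;
    return NULL;                              /* NULL stops the loop */
}

static opcode_t *op_set_i_ic(opcode_t *cur, Interp *in)
{
    in->int_reg[cur[1]] = cur[2];
    return cur + 3;                           /* opcode plus two operands */
}

static opcode_t *op_add_i_i_i(opcode_t *cur, Interp *in)
{
    in->int_reg[cur[1]] = in->int_reg[cur[2]] + in->int_reg[cur[3]];
    return cur + 4;
}

static opcode_t *op_add_i_i_ic(opcode_t *cur, Interp *in)
{
    in->int_reg[cur[1]] = in->int_reg[cur[2]] + cur[3];
    return cur + 4;
}

static const op_func_t op_table[OP_COUNT] = {
    [OP_END]        = op_end,
    [OP_SET_I_IC]   = op_set_i_ic,
    [OP_ADD_I_I_I]  = op_add_i_i_i,
    [OP_ADD_I_I_IC] = op_add_i_i_ic,
};

/* Function-call core: look up the op, call it, follow the returned pointer. */
static void run(opcode_t *pc, Interp *in)
{
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](pc, in);
    }
}
```

Since the assembler already chose `add_i_i_i` or `add_i_i_ic`, the loop does no operand type checking at run time; each op knows exactly what its operands are.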
Very CISCy • I like assembly • Wanted it to be easily targeted • Wanted to be easy to hand-write • Good fit to compiler output • CISC fits interpreters better
Rich instruction set • Side-effect of interoperability • Nifty side effects • Very fast dispatch • Much lower JIT overhead
Extensible instruction set • Loadable on demand • Provides fast access to code • Allows language-specific opcodes • Even writable in Parrot bytecode • Blurs opcode/function/method lines
PIR • Parrot Intermediate Representation • Slightly higher level than assembly • Runs through the optimizer
PIR • Assembly without the annoyances • Infinite number of registers • Function header and parameter setup
Sample (Same as assembly)
    $N0 = 10
    $N1 = 0
  Loop:
    print "Hello, world!\n"
    $N1 = $N1 + 1
    ne $N0, $N1, Loop
    end
Assembly++ • Locals • Register allocation and coloring • Automatic sub creation • Simple expressions • Calling-convention aware
PIR Example 2
    .sub _MAIN prototyped
      .param pmc argv
      .local int count = argv[0]
      _printme(count)
      end

    .sub _printme prototyped
      .param int Max
      .local int Current = 0
    Loop:
      print "Hello, world\n"
      inc Current
      ne Current, Max, Loop
      .pcc_begin_return
      .pcc_end_return
    .end
Register allocation • Infinite number of temps • Lifetimes are traced and managed • Automatic spilling • Single nastiest register task
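To give a feel for coloring and spilling, here is a deliberately simple greedy allocator in C. It is a textbook-style sketch, not IMCC's actual algorithm: the interference matrix, the `SPILLED` marker, and the function name are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 16
#define SPILLED   (-1)

/* interferes[a][b] is true when temps a and b are live at the same time.
 * Each temp gets the lowest-numbered register not used by an earlier,
 * interfering temp; if none is free, the temp is marked as spilled. */
static void color_temps(int n, bool interferes[MAX_TEMPS][MAX_TEMPS],
                        int num_regs, int assigned[MAX_TEMPS])
{
    assert(n >= 0 && n <= MAX_TEMPS && num_regs >= 0);
    for (int t = 0; t < n; t++) {
        assigned[t] = SPILLED;
        for (int r = 0; r < num_regs; r++) {
            bool taken = false;
            for (int u = 0; u < t; u++) {
                if (interferes[t][u] && assigned[u] == r) {
                    taken = true;
                    break;
                }
            }
            if (!taken) {
                assigned[t] = r;
                break;
            }
        }
    }
}
```

With three mutually interfering temps and only two registers, the third temp has nowhere to go and is spilled; temps whose lifetimes never overlap can all share one register.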
Toys and Tools It’s Alive!
Demos • Ncurses demo • Parrot Basic demo • Parrot CGI demo • Real Work demo
Functioning Languages • The gag languages • Befunge • BF • Ook!
Functioning Languages • The real languages • Forth • BASIC • Scheme • DecisionPlus
Functioning Languages • The unfinished languages • Perl 5 • Perl 6 • Python • Ruby
Security Because sometimes people just suck
Security requirements: Handle • Untrustworthy code • Malicious code • Badly written code
Protection Categories • Resource usage • Access • Mistrusted bytecode • Isolated Interpreters
Resource usage • Memory, CPU, IO, and time quotas • Individually settable • May be enabled and disabled on the fly with sufficient privilege
Access • Restrictions on what code can do • Introduces a VMS-style privilege system • Areas of higher and lower privilege
Mistrusted bytecode • Assumes malformed bytecode • Verifies all arguments • Verifies jump destinations • Much slower
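A verifier for mistrusted bytecode might look roughly like the two-pass check below: first decode linearly and validate opcodes and register arguments, then confirm every jump lands on the start of an instruction. The three-op instruction set, absolute jump targets, and the size limit are assumptions made for this sketch, not Parrot's real rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long opcode_t;

enum { OP_END, OP_INC_I, OP_JUMP_IC, OP_COUNT };

/* Words per instruction, including the opcode word itself. */
static const size_t op_length[OP_COUNT] = { 1, 2, 2 };

#define NUM_REGS 32
#define MAX_CODE 1024           /* arbitrary limit for this sketch */

static bool verify(const opcode_t *code, size_t n)
{
    bool starts[MAX_CODE] = { false };
    size_t pc;

    assert(code != NULL);
    if (n == 0 || n > MAX_CODE)
        return false;

    /* Pass 1: check opcodes, lengths, and register arguments. */
    for (pc = 0; pc < n; pc += op_length[code[pc]]) {
        opcode_t op = code[pc];
        if (op < 0 || op >= OP_COUNT)
            return false;                     /* unknown opcode */
        if (pc + op_length[op] > n)
            return false;                     /* truncated instruction */
        if (op == OP_INC_I && (code[pc + 1] < 0 || code[pc + 1] >= NUM_REGS))
            return false;                     /* register out of range */
        starts[pc] = true;
    }

    /* Pass 2: every jump target must be the start of an instruction. */
    for (pc = 0; pc < n; pc += op_length[code[pc]]) {
        if (code[pc] == OP_JUMP_IC) {
            opcode_t target = code[pc + 1];
            if (target < 0 || (size_t)target >= n || !starts[target])
                return false;
        }
    }
    return true;
}
```

Checks like these cost a pass over the code before it runs, and keeping equivalent checks on at run time is the kind of overhead the "much slower" bullet points to.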
Isolated interpreters • Can run code in a separate interpreter • Controlled environment
Quickies Putting a limit on boredom
Events • Async event system built in • One shared, integrated event loop • Everything can use it
IO • All IO asynchronous • Synchronous wrappers provided • Integrated with event system • Under-the-hood thread games where needed
Threads • Designed to be threaded from the ground up • Not the POSIX thread model, alas • Interpreters too heavy-weight • No guarantees of user safety, just interpreter safety
Parrot Development Always ongoing
Getting and installing Parrot • Point releases • Whenever “Big Things” get done • Get good workout before release • Snapshots • Three times a day • For folks without easy CVS access • http://cvs.perl.org/snapshots/parrot/
Getting and installing Parrot • CVS • Full anon access • :pserver:anonymous@cvs.perl.org:/cvs/public • Rsync • From latest CVS tree • rsync -av --delete cvs.perl.org::parrot-HEAD parrot
Builds on • Many Unices • Linux • Mac OS X • *BSD • Solaris • AIX • WinXP • Visual Studio • Cygwin
Regular automated testing • Tinderbox system • Regular checkout, build, and testing • http://tinderbox.perl.org/tinderbox/bdshowbuild.cgi?tree=parrot
Parrot Mailing lists • Parrot-internals@perl.org • Was perl6-internals • Most of the action • Parrot-compilers@perl.org • @parrotcode.org soon, hopefully
Questions?


