392b464fd8d057a8641e8a298887d85f.ppt
- Количество слайдов: 33
Liquid Version Climber: a reproducibility-based approach for upgrading complex software Dennis Shasha joint work with Christophe Pradal, Sarah Cohen. Boulakia, and Patrick Valduriez 1
Personal Story • XYZ. com likes to use open source software. • Free, right? • They want to upgrade My. SQL or Python or some sort of Object-Relational System or Scikit Learn or … • Then the nightmare begins. • Not so free anymore. 2
At First… • I thought this was the fault of open source software. So many bugs. So much more need to upgrade. • Then I thought much better to use a single system that can do everything (paradigm: kdb which is a database and a fast array language rolled into one). • First is partly true. Second works, but not if you would like to use packages. 3
Now… • I like open source. • I like libraries that can squeeze the last ounce out of machine learning algorithms. • I don’t like upgrading (or seeing my students lose time upgrading). 4
Where Does the Time Go? • Set up new system with newest version of software package P and hope for the best. • If this doesn’t work, try other versions of the other packages. • First level of the Inferno… if only Dante could have lived in our time. 5
Questions to Answer • Can this be automated cleanly? – Yes, use reproducibility tools and virtual machines • Can this be automated cleverly? – Yes, depending how much information can be captured. 6
Formalizing the Problem • Given a configuration of packages and versions that work: P 1. v 1, P 2. v 2, … Pk. vk • Given an ordering of some of these packages Pi 1, …, Pij along with permitted versions of all packages V 1, V 2, … Vk • Intuitively, maximize the versions of Pi 1 from Vi 1, then Pi 2 from Vi 2, …. 7
Order Among Configurations • Configuration C 1 = Pi 1. vi 1, Pi 2. vi 2, … Pij. vij, … • Configuration C 1’ = Pi 1. vi 1’, Pi 2. vi 2’, … Pij. vij’, … • C 1 > C 1’ if either vi 1 > vi 1’ or vi 1=vi 1’ and vi 2 > vi 2’, or …. or vi 1=vi 1’ and … vij-1 = vij-1’ and vij>vij’ • If C is some configuration that executes and there is no other configuration C’ that executes such that C’ > C, then C is maximum. 8
Find a Maximum Configuration • Use Repro. Zip (a tool for reproducibility) to capture all the dependencies of the current configuration. • Extract version numbers from packages • Naïve: find every possible combination of versions in Vi 1, … Vij of package and create virtual machines for each one. Try each. 9
Liquid Version Climber 10
Deeper Look at Components • Repro. Zip uses ptrace to track system calls including file opens, starts of processes. • Net effect is that the reprozip image contains all the used packages of a particular execution. • Parser determines which packages are involved. 11
Virtual Machines • The liquid virtual machine is a repository where the packages are all present, but none of the versions are specified. • Each configuration that is tested has all the package versions specified and is “frozen” into a real virtual machine. • Thus the liquid virtual machine plays the role of a generator frozen virtual machines. 12
Improving on Exponential Search • Assumption Pairwise/Global: An execution of configuration P 1. v 1, P 2. v 2, … Pn. vn executes if for all i, j, such that Pi calls Pj, no call from Pi. vi fails on a call to Pj. vj. • Heuristic Implication: If we can detect that the call from Pi. vi to Pj. vj fails, then any configuration that has Pi. vi and Pj. vj will fail. 13
How To Detect Call That Fails • In interpreted languages, we can look at the call stack to determine which call failed. • In compiled languages, we might know only which package has failed. • Sometimes can use minimal pairs. If config P 1. v 1, … Pn. vn works and we change Pi. vi to Pi. vi’ and there is a failure in Pj. vj, then Pi. vi’ must have called Pj. vj. 14
Algorithm Sketch • Restrict versions of each package to permitted ones. • Current = initial working configuration • In order from highest priority package to lowest, use highest version possible. • Whenever an execution fails due to some Pi. vi calling Pj. vj, “play” with versions of Pi and Pj. • Remember bad calls. 15
When “pushing” package p • When optimizing the version for a package p (pushing p), go to the highest desired version first (optimistic). • Upon seeing a call that fails Pi. vi to Pj. vj, if Pi precedes p in the priority order then consider only vi, otherwise consider allowed versions of Pi. Same holds for Pj. 16
Complexity • If M is the maximum number of versions to consider in any package and n is the number of packages in the configuration, then there are potentially n * (n-1) package calls and we might have to test M*M versions, so worst case complexity is: • O(M*M*n*n) • Expensive but sub-exponential 17
Upward Compatibility • Much software says things like “depends on OSX or greater. ” • That is the CUDF model of Treinen and Zacchiroli, where user makes such assertions. • Upward compatibility: If Pj is upward compatible then if Pi. vi calls Pj. vj and succeeds, then Pi. vi will succeed on any Pj. y for y >= vj. 18
Why Should Upward Compatibility Help? • Every time we push a package we start with the highest version (optimistic). • Whenever a call doesn’t work, we consider the highest versions of the caller and callee. • If Pj is upward compatible, then any caller to Pj will have the best chance of working if Pj is at its highest version. 19
New Complexity • n packages in total, c are upward compatible. • M versions per package max. • (M*M)*(n-c)*(n-(c+1)) – non-upward compatible packages calling one another • + (M*M)*c*(n-c) – upward compatible packages calling non-upward compatible ones. • + c – Maybe two tests per package. 20
Experiment • 3 D Canopy reconstruction and light interception • The model is implemented using the Scientific Workflow Open. Alea (Pradal et al. , 08) • Challenge: Testing a large number of genotypes to find those genotypes that work best in a particular environment, e. g. plants should have small leaves in deserts. 21
The model Pradal et al. , SSDBM, 2015 22
Configuration Package Language Synopsis Versioning Adel R / Python Canopy reconstruction & ecophysiology svn MTG Python Multiscale Tree Graph svn Caribu C++ / Python Radiative model (Radiosity) svn RPy 2 R / Python / R wrapper Py. Pi (web) Pandas C / Python Data Analysis Git Plant. GL C++ / Python 3 D library svn Open. Alea Python Scientific Workflow Git Num. Py, Sci. Py, Matplotlib, … Python / C / Fortran Scientific Libraries Git / Py. Pi 23
Configuration Package Versioning Initial Working Configuration (Year) # versions Adel Subversion 2013 340 MTG Subversion 2012 52 Caribu Subversion 2014 20 RPy 2 Py. Pi / Mercurial 2014 11 One run takes around 1 mn. Testing all combinations would take 8 years 24
Demo Parameters • When an error happens do we always know the caller as well as package that fails. Knowcaller=True • Do we try very recent versions first (optimistic) or climb up from the last version that works (pessimistic). • How many versions of the four principal packages (n 1, n 2, n 3, n 4). 25
Demo Case 1: know. Caller=False, optimistic, (4, 3, 4, 2) • Try four configurations from cross-product: • {'adel': ['3417', '4506', '5176', '5650'], 'caribu': ['4122', '4387', '5049'], 'mtg': ['13066', '14424', '16399', '18392'], 'rpy 2': ['2. 5. 1', '2. 7. 0']} • Final configuration was latest of everything. 26
Demo Case 2: know. Caller=False, optimistic, (340, 20, 6) • Once again, tried just four configurations • Final configuration was latest of everything. 27
Demo Case 3: know. Caller=False, optimistic, (300, 50, 20, 10) • This time, versions have been shuffled, so last version in the list is not really the last. • Tested 35 configurations • Final configuration was second to last for first package, and then middle versions for others. • If system can’t determine which package fails, we may not get to optimal. 28
Demo Case 4: know. Caller=False, pessimistic, (5, 3, 3, 3) • Versions out of order again • Tested 9 configurations • Final configuration latestversion for three out of four packages (again, need proper error identification). 29
Demo Case 5: know. Caller=False, pessimistic, (5, 3, 3, 3) • Versions out of order again, but this time underlying system always returned “I don’t know what caused error” • Tested 25 configurations, but always got latest version. • Using optimistic, took 31 configurations. 30
Demo Case 6: know. Caller=True, optimistic, (300, 10, 3) • As in demo 3, versions are shuffled even more • Tested 9 configurations • Final configuration was real best possible one in every case. • If you can identify the errors, you’re golden. 31
Implementation • Liquid Version Climber is a pure Python package • Dependencies are found with Repro. Zip • Each execution is isolated using a Virtual Environment (Virtual Env) • List of available versions are retrieved from git, svn or on public repositores (Py. Pi). • Each execution is run in isolation. Versions are installed using pip. • Errors are found automatically using reprozip (slow) or based on user knowledge or if we don’t know then algorithm will still work. • Soon avaliable on github. 32
Summary • Infrastructure for testing configurations automatically in “sandboxes”. • Algorithms for searching efficiently. • Ready for prime time in 6 months. 33
392b464fd8d057a8641e8a298887d85f.ppt