Better Science with Python
Nick Barnes at AMS, 2012-01-24
climatecode.org
Copyright Climate Code Foundation, license CC-BY

What is the CCF?
• A UK non-profit founded in 2010;
• “to promote the public understanding of climate science…”
• … through software activities.
• Continuing projects started in 2008;
• A few software consultants, currently unpaid part-time;
• Advisory committee of a dozen experts;
• A growing network of climate scientists.

What is the problem?
Scientists have to write code, but:
• They aren’t well-trained;
• They aren’t properly rewarded;
• There is no incentive to publish it.
So science code looks like the industry 30 years ago:
• No version control or configuration management;
• No issue systems or defect tracking;
• No automated testing or test-driven development.
Critically: code is being written for computers, not people.

Clear Climate Code
• Project started in 2008.
• Over-riding goal is clarity: code which interested members of the public can download, run, read and understand.
• Open-source, of course.
• First target NASA GISTEMP:
    • ccc-gistemp.googlecode.com
    • 12 KLOC of Fortran (etc).
    • became 3678 lines of Python
    • (including 1500 of docstrings)
    • fixed minor bugs.
• Fosters new science:
    • one paper out now, more in draft.

Why clarity?
• Original motivation was to answer critics:
    • Not the real code;
    • Can’t be run;
    • Contains “obvious bugs”;
    • “divinci code written by the shortbus crew.”
• But also a key message of software engineering:
    Your target audience is people, not compilers
• Those people are, most often, yourselves.

What is clarity?

    def step1(record_source):
        """An iterator for step 1.  Produces a stream of
        `giss_data.Series` instances.

        :Param record_source:
            An iterable source of `giss_data.Series` instances (which it
            will assume are station records).
        """
        records = comb_records(record_source)
        helena_adjusted = adjust_helena(records)
        combined_pieces = comb_pieces(helena_adjusted)
        without_strange = drop_strange(combined_pieces)
        for record in alter_discont(without_strange):
            yield record

Clear how?
Clear to whom?
Unclear how?
(These three slides each repeat the step1 listing from “What is clarity?” under a new title.)

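The generator-pipeline style of the step1 listing can be tried in isolation. What follows is only a minimal, self-contained sketch: the stage names `drop_invalid`, `to_anomalies` and `pipeline`, the `MISSING` sentinel, and the sample data are hypothetical stand-ins, not functions or values from ccc-gistemp.

```python
# Hypothetical sketch of a generator pipeline in the style of step1.
# None of these stage functions come from ccc-gistemp.

MISSING = 9999.0  # assumed sentinel for a missing reading


def drop_invalid(values):
    """Yield only the readings that are not the MISSING sentinel."""
    for value in values:
        if value != MISSING:
            yield value


def to_anomalies(values, baseline):
    """Yield each reading minus a fixed baseline."""
    for value in values:
        yield value - baseline


def pipeline(source, baseline):
    """Chain the stages; each intermediate name says what the data has become."""
    valid = drop_invalid(source)
    anomalies = to_anomalies(valid, baseline)
    for anomaly in anomalies:
        yield anomaly


readings = [14.0, MISSING, 15.5, 13.0]
result = list(pipeline(readings, baseline=14.0))
print(result)
```

Each stage consumes an iterable and yields records one at a time, so the top-level function reads as a list of named steps, which is the property the step1 slide is showing off.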
Unclear how?

    for m in range(12):
        sum_new = 0.0    # Sum of data in new
        sum = 0.0        # Sum of data in average
        count = 0        # Number of years where both new and average are valid
        for a, n in itertools.izip(average[first_year*12+m: last_year*12: 12],
                                   new[first_year*12+m: last_year*12: 12]):
            if invalid(a) or invalid(n):
                continue
            count += 1
            sum += a
            sum_new += n
        if count < min_overlap:
            continue
        bias = (sum-sum_new)/count

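One possible way to make the fragment above easier to read, sketched in Python 3. The helper name `monthly_bias`, the `MISSING` sentinel, the body of `invalid`, and the sample data are assumptions made for illustration; this is not the ccc-gistemp code. It uses the built-in `zip` where the Python 2 original uses `itertools.izip`, and it computes the mean of the differences, which is mathematically the same quantity as `(sum-sum_new)/count`.

```python
MISSING = 9999.0  # assumed sentinel for a missing value


def invalid(value):
    """Assumed validity test: a value is invalid if it equals the sentinel."""
    return value == MISSING


def monthly_bias(average, new, month, first_year, last_year, min_overlap):
    """Return the mean of (average - new) for one calendar month, over the
    years where both series are valid, or None if fewer than min_overlap
    years overlap.  Both series are flat lists with 12 values per year."""
    months = slice(first_year * 12 + month, last_year * 12, 12)
    pairs = [(a, n) for a, n in zip(average[months], new[months])
             if not (invalid(a) or invalid(n))]
    if len(pairs) < min_overlap:
        return None
    return sum(a - n for a, n in pairs) / len(pairs)


# Two years of made-up data; the series differ only in January (month 0).
average = [1.0] * 12 + [3.0] * 12
new = [0.0] + [1.0] * 11 + [MISSING] + [3.0] * 11
january_bias = monthly_bias(average, new, 0, 0, 2, min_overlap=1)
print(january_bias)
```

Naming the slice and returning `None` for "not enough overlap" replaces the bare `continue` with something a reader can see and test.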
Clarity enables new science
• By promoting “computational thinking” (Wing, NSF),
• Clear code raises new questions…
    • Airport-only trends?
    • Effect of US data?
    • Effect of restricting to long-record stations?
    • Use of land data for ocean cells?
    • Adding more data scraped from met sites?
• …and helps answer them…
• …for both original authors and others.

Why Python?
• Syntax:
    • Very small and simple core language;
    • Clear syntax (compared with Perl, C++, Fortran, etc);
    • Indentation for blocks (huge win although often derided);
    • No type declarations or decorations;
• Semantics:
    • Garbage collection: no code for memory management;
    • First-class functions.
    • “Duck-typing” for maximum code flexibility and re-use;
    • A simple object system;
• Library (“batteries included”):
    • A huge amount of useful functionality;
    • Kept out of the way of the core language: explicit import;
    • Great documentation;
    • One great way to do it (not TMTOWTDI).

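To make the “first-class functions” and “duck-typing” bullets concrete, here is a small sketch. The names `mean`, `apply_to_all` and `Station` are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def apply_to_all(function, series_list):
    """First-class functions: `function` is passed in like any other value."""
    return [function(series) for series in series_list]


class Station:
    """Duck typing: no shared base class; it only needs to be iterable."""

    def __init__(self, readings):
        self.readings = readings

    def __iter__(self):
        return iter(self.readings)


# A list, a tuple, and a Station all work, because each can be iterated.
inputs = [[1.0, 2.0, 3.0], (4.0, 6.0), Station([10.0, 20.0])]
means = apply_to_all(mean, inputs)
maxima = apply_to_all(max, inputs)
print(means, maxima)
```

The same `apply_to_all` serves both a user-defined function and the built-in `max`, with no type declarations anywhere.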
Wait, there’s more:
• Open-source:
    • Zero cost;
    • No licensing trap, for your audience;
    • Future-proof.
• “Interpreted” (i.e. has a really good REPL);
• Long-lived and stable;
• Very portable (and easy to install);
• Easy interfaces to other languages and systems;
• Terrific eco-system;
• A BDFL who is right much more often than he is wrong;
• And probably more.

So: Why not Python?
• Performance;
• Concurrency;
    With a new implementation?
• Many things not in the library (and may never be);
• … so there’s more than one way to do it!
• Package management (TMTOWTDI!);
    Use a distribution?
• Some unpleasant corners (e.g. @decorators, **kwargs, old-style classes);
    Of Python 3?
• 2 vs 3;
• Stability not as good as traditional languages;
• Language direction (e.g. lambda deprecated!).
    Committed to compatibility.

A great language is just the start
• Vital software development skills and tools:
    • Version control;
    • Defect tracking;
    • Code inspection;
    • Automated testing;
    • Automated building;
    • Bundling and delivery;
    • Documentation;
    • Team-work;
    • Publication.
• Many free integrated suites of tools, online and offline.
• Beware: “You can write FORTRAN in any language.”

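As one illustration of the “automated testing” item, here is a minimal sketch using the standard-library `unittest` module. The function `to_celsius` and the test class are hypothetical examples, not taken from any Climate Code Foundation project.

```python
import unittest


def to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0


class ToCelsiusTest(unittest.TestCase):
    def test_freezing_point(self):
        self.assertEqual(to_celsius(32.0), 0.0)

    def test_boiling_point(self):
        self.assertAlmostEqual(to_celsius(212.0), 100.0)


# Load and run the tests directly, without a command-line test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToCelsiusTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once checks like these exist, they can be re-run after every change, which is what makes the testing “automated”.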
Google Summer of Code
• Google pays students to write code ($5000 for 3 months);
• Any open-source project;
• Our 2011 projects:
    • Hannah Aizenman: Common Climate Project;
    • Filipe Fernandes: Extensions to ccc-gistemp;
    • Daniel Rothenberg: Homogenization;
    • (these names might look familiar if you were here yesterday).
• 2012?
    • Program to be announced soon (late Jan);
    • we hope to be accepted as a mentoring org (March);
    • then we will welcome student proposals, or collaborations with scientists.

Open Science
• Accelerating trend towards more openness in science.
• Redefining publication:
    • Open Access;
    • Open Data;
    • Open Knowledge;
    • Open Notebooks;
    • Data-driven intelligence;
• Workshops, conferences, summits;
• There’s a war on: PRISM, RWA;
• Policy studies at AAAS, NSF, Royal Society, etc;
• But no coherent message about open software in science.
• Michael Nielsen: Reinventing Discovery

Science Code Manifesto
Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
Copyright: The copyright ownership and license of any released source code must be clearly stated.
Citation: Researchers who use or adapt science source code in their research must credit the code's creators in resulting publications.
Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.
Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.
sciencecodemanifesto.org

Future Plans
• Changing policies:
    • Transparency;
    • Rewards for all research products.
• Training scientists:
    • Basic techniques (testing, version control, agile, etc);
    • Code publication and reuse.
• Providing resources:
    • White papers, blog posts;
    • Directories.
• Building networks, partnering with institutions;
• Leading by example:
    • ccc-gistemp;
    • ccf-homogenization;
    • etc….

Questions?

Funding
• I say “non-profit”. Approximately “non-revenue”.
• All accounts open.
• Total revenue to date £7037.94 (+ GSoC students).
• Total costs to date £3888.55 (as of 2011-11-18).
• All work unpaid (not counting GSoC students).
• Personal lost income to date probably £30-40K.
• Funding model seeks £150K-£500K annually from corporate or NGO sponsorship (plus some project money from academic collaborations).
• Too much? Not enough? Depends who you ask.
• Open to suggestions!