Data science and economic statistics
Louisa Nolan, Senior Data Scientist
Alex Noyvirt, Ioannis Tsalamanis, Gareth Clews, Rhydian Page
Data Science Campus, Office for National Statistics
22nd GSS Methodology Symposium, 12 July 2017
[Image: the MONIAC (Monetary National Income Analogue Computer), with flows labelled government spending, investment funds, foreign-owned balances and national income]
Using data science to understand the economy
1. Automatic classification of the financial sector using neural networks
2. Using data from ship tracking to understand trade
3. Can we use admin data as a superfast indicator of GDP growth?
1. Automated classification of the financial sector
[Chart: total financial asset levels as a proportion of nominal GDP, by G7 country and sector]
Proposed taxonomy
Key: currently published; to be published 2017; target classifications
Financial corporations
  Monetary financial institutions (MFI)
    Central Bank
    Other monetary financial institutions
      Other deposit-taking
        Ring-fenced
        Other UK-owned
        Foreign-owned
      Money market funds (MMF)
  Financial corporations except MFIs and ICPFs
    Non-money market investment funds
      Collective investment schemes excl. hedge funds
        Institutional
          Open-ended: leveraged; unleveraged
          Closed-ended: leveraged; unleveraged
        Retail
          Open-ended: leveraged; unleveraged
          Closed-ended: leveraged; unleveraged
      Exchange traded funds
      Hedge funds
      Private equity
        Buyout funds
        Other
    Other financial intermediaries, except MFIs and ICPFs
      Financial vehicle corporations engaged in securitisation transactions
      Security and derivative dealers
      Financial corporations engaged in lending (include a split by type of lending, e.g. mortgages, auto, consumer credit, business)
      Specialised financial corporations (incl. central counterparties)
    Financial auxiliaries
    Captive financial institutions and money lenders
  Insurance corporations and pension funds (ICPFs)
    Insurance companies
      Life insurance
      Non-life insurance
    Pension funds
      Defined benefit
      Defined contribution
Project scope
Data sources: Financial Services Survey (FSS); Inter-Departmental Business Register; industry body lists; Companies House; Financial Conduct Authority; Reuters; Bureau van Dijk
Methods: high-speed data linking; web scraping and feature extraction; supervised and unsupervised machine learning -> classification: mapping of companies to subsectors
R&D: unsupervised machine learning -> clustering of groups of similar companies -> clusters of companies with similar activity: a useful classification?
Outputs: granular financial statistics for the enhanced financial accounts
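The slides do not say how the high-speed linker matches records across sources. As an illustration only, the sketch below assumes company names are the linking key and scores candidate pairs with a normalised string similarity from Python's standard `difflib`. The function names, the legal-suffix list and the 0.9 threshold are hypothetical choices, not the Campus's Spark/Scala implementation, which at scale would also need blocking to avoid comparing every pair.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-form suffixes ignored when comparing company names.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "lp"}


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two company names after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link(left: list[str], right: list[str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """For each name in `left`, keep its best match in `right` if the score clears the threshold."""
    links = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b), default=None)
        if best is not None:
            score = similarity(a, best)
            if score >= threshold:
                links.append((a, best, score))
    return links
```

For example, `link(["Acme Holdings Ltd"], ["ACME HOLDINGS LIMITED", "Acme Foods plc"])` pairs the first name with "ACME HOLDINGS LIMITED" at a score of 1.0, because both normalise to "acme holdings".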
Half-time score
• Fuzzy dataset linking
  • highly optimised algorithm (Spark, Scala)
  • ~150 million combinations in 2 hours
• Sector classification from company name alone
  • 15–18% accuracy (19 SIC groups)
• Sector modelling, partly using the FSS validation dataset
  • k-nearest neighbours (k-NN) classification
  • ~60% accuracy (work in progress)
• Next steps:
  • neural networks
  • ensemble approach: combine several weak indicators
  • more data, e.g. share price movements, annual accounts
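The slide reports k-NN sector modelling on FSS validation data but not the features used. A minimal sketch of k-NN classification, assuming each company has already been turned into a numeric feature vector (the features and sector labels below are hypothetical), assigns a company the majority sector among its k nearest labelled companies by Euclidean distance:

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a sector for `query` by majority vote of its k nearest labelled companies.

    train_x: list of numeric feature vectors (hypothetical company features)
    train_y: list of sector labels, one per training vector
    """
    # Euclidean distance from the query to every labelled company, nearest first
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count votes among the k closest; on a tied count, the first-seen (nearer) label wins
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Share of held-out companies whose predicted sector matches the recorded one."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

The accuracy function mirrors how a figure such as "~60% accuracy" would be measured against a labelled validation set; k and the distance metric are tuning choices, not values taken from the talk.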
2. Tracking ships to understand trade
• Can we use shipping as an early indicator for GDP?
• Can we better understand traffic at British ports?
Automatic Identification System
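AIS messages carry a ship identifier (MMSI) and position, so port traffic can in principle be estimated by counting arrivals near a port. The sketch below is illustrative only: it assumes the position reports are already decoded into (MMSI, timestamp, latitude, longitude) records, and the port coordinates and 5 km radius are hypothetical parameters. It counts a port call each time a ship's report falls inside the radius after its previous report was outside (or it had not been seen before).

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Position:
    mmsi: int        # ship identifier carried in AIS messages
    timestamp: int   # seconds since epoch (assumed already decoded)
    lat: float       # degrees
    lon: float       # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_port_calls(positions, port_lat, port_lon, radius_km=5.0):
    """Count arrivals: a report inside the radius whose previous report (same ship) was outside."""
    was_inside = {}  # mmsi -> was the ship inside the radius at its previous report?
    calls = 0
    # Process each ship's reports in time order
    for p in sorted(positions, key=lambda p: (p.mmsi, p.timestamp)):
        now_inside = haversine_km(p.lat, p.lon, port_lat, port_lon) <= radius_km
        if now_inside and not was_inside.get(p.mmsi, False):
            calls += 1
        was_inside[p.mmsi] = now_inside
    return calls
```

A real pipeline would also need to handle gaps between reports, ships at anchor outside the radius, and the shape of each port's area rather than a simple circle.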
3. Superfast indicators of GDP growth
[Chart: GDP growth rate (%), chained volume measure, seasonally adjusted, 1997 Q1 to 2017 Q1; vertical axis -8.0 to 6.0]
How early can we identify negative GDP growth?
Superfast indicators of GDP growth
[Timeline: January, February and March make up quarter 1; April, May and June make up quarter 2. The quarter 1 preliminary estimate, quarter 1 second estimate and quarter 1 quarterly national accounts all follow during quarter 2]
Superfast indicators of GDP growth
[Timeline: January to March make up quarter 1 and April to June make up quarter 2. VAT turnover returns are shown alongside the quarter 1 preliminary estimate, quarter 1 second estimate and quarter 1 quarterly national accounts]
Superfast GDP indicator from VAT turnover
• start simple
• compare each quarter with the same quarter a year earlier, to minimise seasonality
• index = $\dfrac{\#\{\text{companies with } T_{t_0} - T_{t_{-4}} > 0\}}{\text{total number of companies in sample}}$, where $T$ is a company's VAT turnover
• no deflation (yet)
• no outlier treatment (yet)
• no bias adjustment (yet)
• no seasonal adjustment (yet)
• only companies with a turnover (TO) value for both $q_0$ and $q_{-4}$ are included
• test on month 1, 2 and 3 returns
Is this a useful indicator of the direction and broad magnitude of GDP growth?
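The index on this slide is simply the share of companies whose VAT turnover rose against the same quarter a year earlier. A minimal sketch, assuming turnover arrives as a mapping from a hypothetical company reference to its turnover for each quarter, restricts to companies present in both quarters and applies none of the adjustments listed as "not yet" done:

```python
def superfast_index(turnover_q0, turnover_q_minus_4):
    """Share of companies with turnover in quarter q0 above turnover in quarter q-4.

    Both arguments map a company reference (hypothetical key) to VAT turnover.
    Only companies with a value in both quarters count towards the denominator.
    No deflation, outlier treatment, bias or seasonal adjustment is applied.
    """
    common = turnover_q0.keys() & turnover_q_minus_4.keys()
    if not common:
        raise ValueError("no companies with turnover in both quarters")
    # A company counts as 'rising' only if turnover strictly increased
    rising = sum(1 for c in common if turnover_q0[c] - turnover_q_minus_4[c] > 0)
    return rising / len(common)
```

Running it separately on the month 1, month 2 and month 3 returns for a quarter gives the three index series compared on the results slides; an unchanged turnover counts as "not rising", matching the strict inequality in the formula.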
Superfast GDP indicator – results
[Chart, 2008 Q1 to 2017 Q1: index values from month 1, month 2 and month 3 VAT returns (index value, 0 to 0.8); GDP growth at current prices, not seasonally adjusted (CP NSA), on the same quarter a year earlier (Q on Q4, %, -6 to 6); and sample size for months 1, 2 and 3 (0 to 900,000)]
Superfast GDP indicator – results
[Scatter chart: month 3 index value (0.4 to 0.6, with 0.55 labelled) against GDP growth rate, CP NSA, Q on Q4 (%, -4 to 6). Annotated points: 2008 quarter 4, GDP growth = -1.3%; 2013 quarter 2, GDP growth = 4.3%]
Superfast GDP indicator – results
[Scatter chart: month 3 index value (0.4 to 0.6, with 0.55 labelled) against GDP growth rate, CP NSA, Q on Q4 (%, -4 to 6)]
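One simple way to make "direction and broad magnitude" concrete, offered here as an illustration rather than the method behind these charts, is to fit a straight line of GDP growth on the index, read off the index value at which fitted growth is zero, and then check how often the index falls on the correct side of that threshold. The function names are hypothetical and the fit is ordinary least squares:

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def zero_growth_index(index, growth):
    """Index value at which the fitted line predicts zero GDP growth."""
    a, b = fit_line(index, growth)
    return -a / b


def direction_hit_rate(index, growth, threshold):
    """Share of quarters where (index above threshold) agrees with (growth above zero)."""
    hits = sum((i > threshold) == (g > 0) for i, g in zip(index, growth))
    return hits / len(index)
```

The slope gives a rough sense of magnitude (percentage points of growth per unit of index), while the hit rate answers the narrower question of whether the index calls the sign of growth correctly.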
What have we learned so far?
• data science can enhance our understanding of the economy
• an experimental approach allows rapid prototyping
• collaboration with subject matter experts is important
• we need to think about implementation early in the project lifecycle
• (work can be fun)
Contact us
web: www.ons.gov.uk
email: datasciencecampus@ons.gov.uk
Twitter: @DataSciCampus


