- Number of slides: 51
ESRC Census Development Programme Identifying the cash-rich and the cash-poor: Lessons from the Census Rehearsal Dr Paul Williamson Department of Geography
Why is income so important? • Arguably the most direct measure of utility [Yes, I know - “Money can’t buy happiness”] • Helps target Neighbourhood renewal • Helps with planning – [of houses, shops, leisure facilities…] • Consumer marketing • Tax-benefit analysis
• Most requested addition to 2001 Census
The 2001 Census Geography of income:
Other sources of data on income • Benefits data • Government surveys (e.g. GHS, LFS, FES, FRS, NES) • Commercially-held data [Postcode sector and postcode unit estimates] • The Census Rehearsal (1999)
Key Objectives Evaluation of: • Extant methods for small-area income estimation • New approaches • Utility of non-census information (e.g. council tax; house price; benefits data) [ • Methods of imputing income band means ]
Definition of ‘income’ • Income or wealth? • Gross or net income? • Pre or post housing costs? • Adult or household? – Total – Equivalised [Per capita / OECD / McClements]
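The difference between total, per capita and equivalised household income can be sketched as follows. This is a minimal illustration, not the method used in the study: the slide does not say which OECD scale was applied, so the function offers both the original scale (1.0 / 0.7 / 0.5) and the modified scale (1.0 / 0.5 / 0.3), and the household figures are invented. The McClements scale is left out because its weights depend on each child's age.

```python
def per_capita_income(total_income, adults, children):
    """Total household income divided by the number of household members."""
    return total_income / (adults + children)


def oecd_equivalised_income(total_income, adults, children, modified=True):
    """Total household income divided by an OECD equivalence scale.

    Weights: first adult 1.0, then 0.5 per extra adult and 0.3 per child
    (modified scale) or 0.7 and 0.5 (original scale).
    """
    if adults < 1:
        raise ValueError("a household needs at least one adult")
    extra_adult, child = (0.5, 0.3) if modified else (0.7, 0.5)
    scale = 1.0 + extra_adult * (adults - 1) + child * children
    return total_income / scale


# Hypothetical household: two adults, two children, 21,000 pounds a year.
household = {"total_income": 21000.0, "adults": 2, "children": 2}
print(per_capita_income(**household))
print(oecd_equivalised_income(**household))
print(oecd_equivalised_income(**household, modified=False))
```

The same total income yields noticeably different equivalised values depending on the scale, which is why the choice of definition matters before any areas are compared.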
Surrogates • Univariate – % unemployed – % 2+ car households – % residents in Social Classes I + II – % owner-occupation
• Multivariate (deprivation indices) – Carstairs [Unemployment, overcrowding; not owning car; head in Social Class IV or V] – Townsend [Unemployment; overcrowding; not owning home; not owning car] – Breadline [not owning car; not owning house; lone parenthood; social class IV or V; illness; unemployment]
– DTLR Index of Multiple Deprivation 2000 – Green (Wealth) [owning 2+ cars; NS-SEC I or II; High qualifications]
• Geodemographic – SuperProfiles – MOSAIC – GB Profiles
• Model Individual income: – Dale [SOC 2000; economic activity; age; sex; region] – Lee [SOC 2000; economic activity] – Regression (individual and/or ecological) Household income: – Regression (household and/or ecological) – Bramley & Smart (h/h comp.; earners; tenure; area-level deprivation)
The 1999 Census Rehearsal Key features • full census questionnaire + INCOME • Large achieved sample – c. 65,000 households – c. 140,000 individuals • Spatially contiguous
Clustered sampling strategy: – 7 part districts [Excluding NI] – 38 wards – 650 EDs
Potential problems • non-response rate – overall (~50%) – income (~15%) – other variables (5-20%) – full responses for ~55% of achieved sample [individuals and households] • non-response bias
• Banding of income question: “What is your total current gross income from all sources?” Per week: Nil; Less than £60; £60 to £119; £120 to £199; £200 to £299; £300 to £479; £480 or more. Or per year (approximately): Nil; Less than £3,000; £3,000 to £5,999; £6,000 to £9,999; £10,000 to £14,999; £15,000 to £24,999; £25,000 or more
– Only 10% of adults in top band – but problem compounded when individual incomes are aggregated to estimate household income – band mid-point ≠ band mean – are band means area-sensitive?
Source: FRS 1998/9 (Crown Copyright)
Digression: modelling income band means Alternative modelling strategies include: • National mean • Sub-group mean (e.g. by council tax band) • Statistical distributions (log-normal; Pareto) • New variant of log-normal approach with addition of modelled median etc.
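The statistical-distribution strategy can be illustrated with a plain log-normal fit. The sketch below is not the new variant proposed in the talk; it only shows how a mean for the open top band (£25,000 or more) might be derived from two cumulative shares. The 90% share below £25,000 follows from the 10% of adults in the top band; the 70% share below £15,000 is an invented figure, and the function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def fit_lognormal(x1, p1, x2, p2):
    """Return (mu, sigma) such that P(X < x1) = p1 and P(X < x2) = p2."""
    z1, z2 = STD_NORMAL.inv_cdf(p1), STD_NORMAL.inv_cdf(p2)
    sigma = (log(x2) - log(x1)) / (z2 - z1)
    mu = log(x1) - sigma * z1
    return mu, sigma


def open_band_mean(lower, mu, sigma):
    """E[X | X > lower] for a log-normal distribution with parameters mu, sigma."""
    z = (log(lower) - mu) / sigma
    upper_tail = 1.0 - STD_NORMAL.cdf(z)
    return exp(mu + sigma ** 2 / 2) * (1.0 - STD_NORMAL.cdf(z - sigma)) / upper_tail


# Assumed cumulative shares: 70% below 15,000 (invented), 90% below 25,000.
mu, sigma = fit_lognormal(15000.0, 0.70, 25000.0, 0.90)
top_band_mean = open_band_mean(25000.0, mu, sigma)
print(round(top_band_mean))
```

Because the fitted parameters depend on the local band shares, the same procedure run area by area gives area-specific top-band means, unlike a single national mean.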
Results • For all bands sub-group mean best – if possible • For closed-bands, national mean is next best • For open (top) band, new proposed lognormal approach is best, particularly where there is evidence of strong spatial clustering
Results of modelling top income band
• Spatial scale – At what scale does income vary most? • MAUP – 1991 vs 1998/9 boundaries – zones with <10 households or 25 residents excluded from analysis • SOC 2000 / NS-SEC – Lack of alternative SOC 2000 coded data – Therefore have to use Census Rehearsal data – Use partitioned data to avoid unduly advantaging SOC 2000 based approaches
Results
Census Rehearsal Income Distribution
Heterogeneity rules OK! • At ward level the % household reps. in top income-band averaged 9.1% – but ranged from 2.8% to 21.6% • 89% of EDs contained one or more household reps. in top income-band – i.e. in top income-decile of the population
Missing data • Missing data have minimal impact on results – From ‘Raw’ to ‘Ideal’ data, most correlations change by <0.02 – Very few values change by >0.05 – Exception is NS-SEC 8 [by definition!] – Correlations lower for ‘Ideal’ than ‘Raw’ • Surrogates calculated direct from Rehearsal – circumvents data response bias?
Scale • Higher correlations at higher geographies • District effect small but significant – BUT none of the districts is in SE England Overfitting • No significant impact
MAUP • Correlations vary by up to 0.1 between alternative boundaries at same spatial scale BUT • No detectable effect on rankings of surrogate income measures
Adult income (r²)
Regression model (adults) • Age, Age², sex, ethnicity, marital status • Type and tenure of dwelling • Qualifications • Economic (in)activity and health • Mean SOC 2000 and SIC 2000 income • Supervisory status • District of residence
Caveats • ‘Best’ performing surrogates in danger of over-fitting? – For Dale, Lee and Voas mean occupational income calculated directly from Census Rehearsal dataset (no other SOC 2000 sources available at time of analysis) BUT – No significant difference if SOC minor or unit codes used – No significant difference if data partitioned
Household income (r²)
Accuracy • For many purposes relative, rather than absolute, accuracy is most important, i.e. ranking
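Relative accuracy of this kind is commonly summarised with a rank correlation. The sketch below computes Spearman's coefficient between observed area mean income and a surrogate measure; the talk does not say which ranking statistic was used, so this is only one reasonable choice, and the five-area data set is invented.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share this 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values for five areas: mean weekly income and % in NS-SEC I+II.
area_mean_income = [310.0, 420.0, 275.0, 510.0, 365.0]
pct_nssec_1_2 = [18.0, 29.0, 15.0, 38.0, 31.0]
print(spearman(area_mean_income, pct_nssec_1_2))
```

A surrogate can rank areas almost perfectly while still being a poor predictor of income in pounds, which is the distinction the slide draws.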
Other data sources • <1% of unexplained spatial variation in income attributable to area level effects • House price has no significant impact – could be due to data problems • Council tax band has small but significant effect [for areas of enumeration district size and below] • Lack of utility counter-intuitive? – current value ≠ purchase price – purchase income ≠ current income
Conclusions (I) • Best approaches capture 80-90% of spatial variation in income, even for smallest spatial units • But considerable within-area heterogeneity • Best approaches are regression or subgroup mean based • Conventional deprivation indices a poor second to % social class / NS-SEC I+II
Conclusions (II) • Geodemographic classifications at best perform as well as % NS-SEC I+II, and perform best for areas of ward size and above • Qualified support for use of statistical distributions in modelling top income band means
Implications Moral for marketers: • Target people, not places Moral for policy makers: • Deprivation indices not the best proxy for income • ONS ward income estimates (based on ecological regression) likely to perform well
Longer term • Consider external correlates (e.g. IMD 2000; benefits data) • Lobby for Census Office to create small-area income estimate – by imputing income on Census microdata – include non-census information (?)
Acknowledgements • House price data were taken from the Experian Limited Postal Sector Data, ESRC/JISC Agreement. • Grateful thanks are due to the Census Custodians of England, Wales and Scotland for granting permission to access the Census Rehearsal dataset. • A debt of gratitude is also owed to a number of staff at the Office for National Statistics, in particular Keith Whitfield and Philip Clarke. • Finally, thanks are due to David Voas for undertaking some of the preparatory work for this project. • All analyses and conclusions remain the sole responsibility of Dr Paul Williamson.
Definitions (I) • NS-SEC I+II: % persons aged 16-74 in NS-SEC I or II • Townsend: Multiple deprivation indicator based on % economically active unemployed; % overcrowded households; % households with no car and % of households not owner occupied • Green (Wealth): Affluence indicator based on % households with 2+ cars; % persons aged 16-74 in NS-SEC I and % adults with high educational qualifications • PCA_96: Geodemographic classification based on principal components analysis of 20 normalised census variables; individuals in each of 96 area types assumed to have mean income of all persons in area type • Voas: Alternative geodemographic classification, in which five census variables are divided into above or below median, one variable into thirds; with all cross-tabulated to give a total of 96 discrete area types
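A multiple deprivation indicator of the Townsend type is usually built by standardising each component across areas and summing the results. The sketch below shows that construction under simplifying assumptions: components are combined as unweighted z-scores with no prior transformation (published versions may transform some components first), and the area percentages and field names are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def deprivation_index(areas, components):
    """Sum of z-scores across components; higher means more deprived."""
    columns = [z_scores([area[c] for area in areas]) for c in components]
    return [sum(col[i] for col in columns) for i in range(len(areas))]


# The four Townsend components named on the slide, with hypothetical field names.
COMPONENTS = ["pct_unemployed", "pct_overcrowded", "pct_no_car", "pct_not_owner_occupied"]

# Invented percentages for three areas.
areas = [
    {"pct_unemployed": 4.0, "pct_overcrowded": 1.5, "pct_no_car": 15.0, "pct_not_owner_occupied": 20.0},
    {"pct_unemployed": 9.0, "pct_overcrowded": 4.0, "pct_no_car": 40.0, "pct_not_owner_occupied": 55.0},
    {"pct_unemployed": 6.0, "pct_overcrowded": 2.5, "pct_no_car": 28.0, "pct_not_owner_occupied": 35.0},
]
scores = deprivation_index(areas, COMPONENTS)
print(scores)
```

The resulting score is relative to the set of areas supplied, so indices calculated on different area sets are not directly comparable.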
Definitions (II) • Dale: Income imputed given mean income for population sub-group defined by sex, SOC 2000 minor group, economic activity (missing; employed full-time; employed part-time; self-employed; other), age (missing; 0-15; 16-19; 20-29; 30-49; 50+) [maximum of 4860 valid sub-groups] • Lee: Income imputed given mean income for population sub-group defined by SOC 2000 minor group, economic activity (child; not applicable; employed full-time; employed part-time; self-employed; unemployed; retired; other inactive) [maximum of 649 valid sub-groups]
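Sub-group mean imputation of the kind described for Dale and Lee can be sketched in a few lines. The example uses a Lee-style grouping (occupation group and economic activity); the donor records, group codes and the fallback to the overall mean for unseen sub-groups are illustrative assumptions rather than details taken from the study.

```python
from collections import defaultdict


def subgroup_means(donors, keys):
    """Mean income of donor records within each sub-group defined by keys."""
    totals = defaultdict(lambda: [0.0, 0])
    for record in donors:
        cell = totals[tuple(record[k] for k in keys)]
        cell[0] += record["income"]
        cell[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def impute_income(targets, keys, means, fallback):
    """Assign each target record the mean income of its sub-group."""
    return [means.get(tuple(t[k] for k in keys), fallback) for t in targets]


KEYS = ("soc_minor", "activity")  # hypothetical field names

# Invented donor records with known income.
donors = [
    {"soc_minor": "111", "activity": "full-time", "income": 30000.0},
    {"soc_minor": "111", "activity": "full-time", "income": 34000.0},
    {"soc_minor": "411", "activity": "full-time", "income": 14000.0},
    {"soc_minor": "411", "activity": "part-time", "income": 7000.0},
]
# Invented target records whose income is unknown.
targets = [
    {"soc_minor": "111", "activity": "full-time"},
    {"soc_minor": "411", "activity": "part-time"},
    {"soc_minor": "922", "activity": "full-time"},  # sub-group absent from donors
]

means = subgroup_means(donors, KEYS)
overall_mean = sum(d["income"] for d in donors) / len(donors)
imputed = impute_income(targets, KEYS, means, fallback=overall_mean)
print(imputed)
```

Adding sex and age band to `KEYS` would give a Dale-style grouping; finer groupings leave more sub-groups empty, so the fallback rule matters more.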
Definitions (III) • Voas (individual): Regression model for adult income (children assumed to have 0 income); INCOME^0.5 predicted given: mean income by SOC 2000 unit; mean income by Industry category; age²; residents²; rooms and cars; plus dummy variables for sex, white, full-time student, married, Single/Widowed/Divorced, Long-term ill, No qualifications, GCSE or equivalent, A levels or equivalent, Undergraduate degree or equivalent, employed full-time, employed part-time, self-employed, unemployed, retired, permanently sick, other economically inactive excluding pensioners and students, Semi-detached, terrace, flat, caravan, privately rented, social rented, employed manager or supervisor and district of residence • Voas (household): Regression model for total household income; HHINC^0.5 predicted given same set of predictors as for Voas (individual), but based only upon head of household’s characteristics
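The idea of regressing the square root of income on predictors and squaring the fitted value can be shown with a deliberately reduced model. The sketch below uses a single predictor (mean income of the person's occupation) and ordinary least squares written out by hand; the real models use many more predictors, and the data values and function names here are invented.

```python
from math import sqrt


def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Invented training data: occupational mean income and observed adult income.
occ_mean_income = [9000.0, 12000.0, 15000.0, 21000.0, 30000.0]
income = [8100.0, 12100.0, 14400.0, 22500.0, 28900.0]

# Fit on the square-root scale, as in the INCOME^0.5 specification.
intercept, slope = fit_simple_ols(occ_mean_income, [sqrt(v) for v in income])


def predict_income(occ_mean):
    """Predict sqrt(income) and square it to return to the income scale."""
    return (intercept + slope * occ_mean) ** 2


print(round(predict_income(18000.0)))
```

Squaring a prediction made on the square-root scale tends to understate the mean on the original scale, so a correction may be needed if absolute rather than relative accuracy is the goal.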


