WORKSHOP ON SCANNER DATA
Geneva, 10 May 2010
Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)

Historical overview – NL
Supermarkets
• Mid-1990s: first contacts with chain(s)
• 2002: first implementation: 1/2 chain(s)
  - Yearly Laspeyres (labour intensive)
    · Construction of yearly basket of items
    · Manual linking of items to COICOP groups
    · Manual replacement of disappearing items
  - Reduction of ca. 10 000 monthly price quotes in field survey

Historical overview – NL, cont.
Supermarkets
• 2010: extension: 6 chains
  - Monthly chained Jevons (efficient process)
    · No manual linking of items
    · No explicit replacements
  - Extra reduction of ca. 5 000 monthly price quotes in field survey

Historical overview – N
• 1997: first contact with one chain
  - Gradually contact with more chains
  - Implementation in the CPI
    · Only price information of specific representative items
• 2002: scanner data from all the chains (no questionnaires: big incentive)
• Aug 2005: expanded use for COICOP 01
  - Price and quantity information for all items in representative outlets

Questions to be answered when dealing with scanner data
• How/where to acquire scanner data?
• Which statistical method?
• How to link items to COICOP?
• How to deal with all kinds of particularities in the data?
• Development of new computer system?

Source of scanner data
• Market research companies
  - Cleaned data
  - (Very) expensive
  - Two-stage delivery chain (timeliness)
• Companies/chains
  - Raw data
  - Cheap (NL/N do not pay)
  - Direct contact with original supplier

Negotiations with companies
• Time-consuming process
  - Negotiations can take up to a year or more, including meetings, sending test data, analysing data etc.
  - Be aware of some company establishing costs, e.g. preparing the data extractions
• Can company provide what you want/need?
  - E.g. information to link items to COICOP automatically

Negotiations with companies, cont.
• Focus on advantages for companies
  - Minor costs once established (just a copy of their sales administration)
  - No questionnaires or monthly visits of price collectors
• Other incentives for companies?
  - Money: not likely
  - Information
    · E.g. company price development compared to overall price development

Negotiations with companies, cont.
• Establishing good routines with the companies is essential
  - Strict time schedules
  - No changes in formats once implemented

Pre-production work
• Take your time analysing the data
  - Enormous amount of data
    · N: over 300 000 price observations each month, divided into about 14 000 items
  - Build shadow system (prototype)
    · Compare the new price indexes based on scanner data with the old method for a certain period of time before implementation
    · Discover possible problems in advance: unexpected situations will arise for sure

Pre-production work
• Ideas for analysing the data:
  - Is the same EAN always the same item?
  - Extreme price changes
  - Structurally specific price development at the beginning or end of an EAN's life cycle
    · Risk of bias!
  - All kinds of dynamics in the data
  - Missing prices
  - Do properties of the data change over time?
  - Etc.

Methodology / IT-system
• Find methodology that:
  - Delivers good indexes (e.g. no bias)
  - Can deal with all particularities in the data
• Build IT-system that supports the chosen methodology
• Learn from experiences of other countries using scanner data

Properties of data
Consequences for methodology NL and N
• High attrition rate of items

Properties of data, cont.
Consequences for methodology NL and N
• How to deal with high attrition rate of items
  - NL: monthly chained index
  - N: monthly chained index

Properties of data, cont.
Consequences for methodology NL and N
• Sales: low prices combined with enormous increase in quantities sold

Properties of data, cont.
Consequences for methodology NL and N
• Consequences of sales:
  - Single observations can have extremely high influence on elementary index
  - Risk of bias when applying monthly chaining and explicit weights
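The risk of bias from chaining with explicit weights can be illustrated with a toy calculation. The sketch below uses made-up prices and quantities for two hypothetical items (not data from the presentation): item A is on sale in month 1 with a large quantity spike, and everything returns to its starting values in month 2. A direct Laspeyres index correctly ends at 1.0, while the monthly chained Laspeyres index does not.

```python
def laspeyres(p0, q0, p1):
    """Laspeyres price index from period 0 to period 1 (period-0 quantities as weights)."""
    return sum(p1[i] * q0[i] for i in p0) / sum(p0[i] * q0[i] for i in p0)

# Hypothetical data: item A is on sale in month 1 and back to normal in month 2.
prices = [
    {"A": 1.0, "B": 1.0},
    {"A": 0.5, "B": 1.0},
    {"A": 1.0, "B": 1.0},
]
quantities = [
    {"A": 10, "B": 10},
    {"A": 100, "B": 10},  # quantity spike during the sale
    {"A": 10, "B": 10},
]

# Chained index: product of month-to-month Laspeyres links.
chained = 1.0
for t in range(1, len(prices)):
    chained *= laspeyres(prices[t - 1], quantities[t - 1], prices[t])

# Direct index: compares the last month with the first month only.
direct = laspeyres(prices[0], quantities[0], prices[-1])

print(f"chained Laspeyres: {chained:.3f}, direct Laspeyres: {direct:.3f}")
```

In this toy case the chained index ends well above 1 although prices and quantities are back where they started, which is the upward drift that chained Laspeyres indexes tend to show on data with sales.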

Properties of data, cont.
Consequences for methodology NL and N
• Bias is not just theoretical!
  - Example for detergents

Formula      Weekly index               Monthly index
             I(200835; 200501=100)      I(200808; 200501=100)
Laspeyres    7 794 207.27               11 301.04
Paasche      0.0000033                  0.88
Fisher       5.10                       99.89
Törnqvist    7.40                       101.53
Jevons       78.76                      91.75
Walsh        33.78                      107.72

Properties of data, cont.
Consequences for methodology NL and N
• How to deal with sales?
  - NL: crude weighting on item level: w=0 or 1
  - N: manual checks of price ratios that contribute most to elementary results: "critical observations"

Properties of data, cont.
Consequences for methodology NL and N
• Implausible price changes
  - Changes of +5000% and -99% do actually occur
  - NL: price changes (p_t / p_{t-1}) of more than a factor 4 are deleted
  - N: price changes (p_t / p_{t-1}) of more than a factor 3 are deleted
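A minimal sketch of such a filter on price ratios is given below. The slide only states the factors (4 for NL, 3 for N). Applying the limit symmetrically to increases and decreases is an assumption, and the function name is hypothetical.

```python
def is_plausible(p_prev, p_cur, max_factor=4.0):
    """Return True if the price ratio p_cur / p_prev stays within a factor max_factor.

    Assumption: the limit applies symmetrically, so both a rise above
    max_factor and a fall below 1 / max_factor are treated as implausible.
    """
    ratio = p_cur / p_prev
    return 1.0 / max_factor <= ratio <= max_factor

# The extreme changes mentioned on the slide would both be removed:
print(is_plausible(1.00, 51.00))  # +5000%
print(is_plausible(1.00, 0.01))   # -99%
```

Passing `max_factor=3.0` gives the stricter limit quoted for N.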

Properties of data, cont.
Consequences for methodology NL and N
• Temporarily missing prices

Properties of data, cont.
Consequences for methodology NL and N
• How to deal with temporarily missing prices:
  - NL: imputation of missing prices
  - N: no adjustments, but imputing prices is considered for the near future

Properties of data, cont.
Consequences for methodology NL and N
• Quality differences
  - Items with the same EAN are considered to be identical
  - Items with different EANs are treated as different items (no matching)
• How to deal with quality differences:
  - NL: only adjustment in exceptional cases: manual interference
  - N: no adjustment

Actual method – NL
• Data received:
  - For each item each week:
    · EAN
    · Short description
    · (Chain specific) product group, used to link items to COICOP automatically
    · Expenditures
    · Quantities sold

Actual method – NL, cont.
• Price of item:
  - Unit value based on first three weeks of month
• Unweighted price index at elementary level:
  - Monthly chained Jevons on selection of items
• Weighted price index for higher aggregates:
  - Yearly chained Laspeyres
  - Weights based on scanner data of all 52 weeks of previous year
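The two elementary-level steps above (unit value price, then an unweighted Jevons index) can be sketched as follows. This is an illustration under assumptions, not the production code of Statistics Netherlands: the function names and the input layout (weekly expenditures and quantities per item, prices as dictionaries keyed by EAN) are hypothetical.

```python
import math

def unit_value(weekly_expenditures, weekly_quantities):
    """Unit value price: total expenditure divided by total quantity sold."""
    return sum(weekly_expenditures) / sum(weekly_quantities)

def jevons(prices_prev, prices_cur):
    """Unweighted geometric mean of price ratios over items priced in both months."""
    common = prices_prev.keys() & prices_cur.keys()
    if not common:
        raise ValueError("no matched items between the two months")
    log_sum = sum(math.log(prices_cur[i] / prices_prev[i]) for i in common)
    return math.exp(log_sum / len(common))

# Price of one item from its first three weeks of the month (made-up figures).
price = unit_value([10.0, 20.0, 30.0], [5, 10, 15])

# Month-to-month link; the chained index is the product of successive links.
link = jevons({"ean1": 1.0, "ean2": 2.0}, {"ean1": 1.1, "ean2": 2.2})
```

Matching only items present in both months is what makes monthly chaining cope with the high attrition rate of items: each link uses the items available at that moment.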

Actual method – NL, cont.
• Item selection at elementary level
  - Items with low expenditures: w=0
  - Other items: w=1
• Threshold of low (average) expenditure share
  - Example: threshold = 1% for χ=2 and N=50
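The selection rule can be sketched in code. The threshold formula itself did not survive on this slide, so the sketch assumes a threshold of 1/(χ·N), which reproduces the example above (1% for χ=2 and N=50). It further assumes that N is the number of matched items in the elementary aggregate and that the "average" share is the mean of the shares in the two months compared. All names are hypothetical.

```python
def selection_threshold(chi, n_items):
    """Assumed threshold on the average expenditure share: 1 / (chi * N)."""
    return 1.0 / (chi * n_items)

def item_weights(shares_prev, shares_cur, chi=1.25):
    """Crude 0/1 weights: 1 if the item's average share exceeds the threshold, else 0."""
    items = shares_prev.keys() & shares_cur.keys()
    threshold = selection_threshold(chi, len(items))
    return {
        i: 1 if (shares_prev[i] + shares_cur[i]) / 2 > threshold else 0
        for i in items
    }

# Made-up expenditure shares for four items in two consecutive months.
shares = {"a": 0.60, "b": 0.30, "c": 0.05, "d": 0.05}
weights = item_weights(shares, shares)  # threshold = 1 / (1.25 * 4) = 0.2
```

With these shares, items a and b receive weight 1 and the two small items receive weight 0.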

Actual method – NL, cont.
• Determination of threshold value
• Simulations lead to:
  - Optimal value: χ=1.25
  - Ca. 50% of items are excluded (on average)
  - Elementary index based on 80 to 85% of total expenditures
• Elementary level (chain dependent) comparable with COICOP 6

Actual method – NL, cont.
• Refinements:
  - Extreme price changes are excluded (factor 4)
  - Missing prices are imputed
  - Dump prices at the end of an item's life cycle are excluded (see paper)

Actual method – NL
What advantages were achieved?
• Indexes are of higher quality
  - Compared with old method scanner data
  - Compared with field survey
• Response burden for companies is lower
  - No price collection in the shops
• Efficiency gains?
  - Yes: more or less automatic production process
  - Investment costs (IT-system) were (very) high

Illustrations (two chart slides)
• Price indexes based on five supermarkets

Actual method – N
• Data received:
  - For each item in the midweek of the month:
    · EAN/PLU
    · Short description
    · (Chain specific) product group
    · Calculated average price
    · Quantity sold
    · Expenditure

Actual method – N, cont.
• Sample of representative outlets
  - Stratified by chain and concept
• Matching EAN/PLU with COICOP 6
• Weighted Jevons price index at elementary level with expenditure shares of current and base period
  - Monthly chained Törnqvist index
• Scanner data weights between the COICOP 6 groups
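A weighted Jevons index that uses the average of base-period and current-period expenditure shares is the Törnqvist index. The sketch below shows one month-to-month link. The function name and the dictionary layout (prices and quantities keyed by item code) are assumptions for illustration, not the Statistics Norway implementation.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Törnqvist price index from period 0 to period 1 over matched items.

    Each log price ratio is weighted by the mean of the item's expenditure
    shares in the two periods.
    """
    items = p0.keys() & p1.keys()
    e0 = sum(p0[i] * q0[i] for i in items)  # total expenditure, period 0
    e1 = sum(p1[i] * q1[i] for i in items)  # total expenditure, period 1
    log_index = 0.0
    for i in items:
        s0 = p0[i] * q0[i] / e0
        s1 = p1[i] * q1[i] / e1
        log_index += 0.5 * (s0 + s1) * math.log(p1[i] / p0[i])
    return math.exp(log_index)

# Made-up example: every price doubles, so the index is 2 whatever the quantities.
link = tornqvist({"a": 1.0, "b": 2.0}, {"a": 3, "b": 4},
                 {"a": 2.0, "b": 4.0}, {"a": 1, "b": 5})
```

The monthly chained index is the running product of such links.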

Actual method – N, cont.
• Higher aggregates:
  - Yearly chained Laspeyres
  - Weights from HES (NR as of 2011)
• Exclude strongly seasonal items only available for a certain period of the year
• Manual control and possibly exclusion of extreme contributions to elementary results
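Aggregation to higher levels with fixed weights can be sketched as a weighted arithmetic mean of the lower-level indexes, which is the Laspeyres-type form when the weights are base-period expenditure shares. The function name and the figures are hypothetical. In practice the weights would come from the sources named above.

```python
def laspeyres_aggregate(weights, indexes):
    """Weighted arithmetic mean of lower-level indexes with fixed base-period weights."""
    total_weight = sum(weights.values())
    return sum(weights[g] * indexes[g] for g in weights) / total_weight

# Made-up weights and elementary indexes for two COICOP groups.
aggregate = laspeyres_aggregate({"bread": 0.6, "milk": 0.4},
                                {"bread": 110.0, "milk": 100.0})
```

For a yearly chained index the weights are replaced once a year and the new series is linked to the old one.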

Actual method – N
What advantages were achieved?
• Indexes of higher quality?
  - New methodology led to reduction of e.g. sampling and measurement errors, but also to new biases
  - Much more data: more detailed price indexes
  - Considering both prices and quantities
  - Many indexes have improved, others have not
• Low response burden for companies
  - No questionnaires
• Efficiency gains?
  - Automatic production process which requires some manual interference
    · Resources demanded not much higher than before
  - High investment costs (IT-system)

New methodology
• Newly developed index (Ivancic, Diewert, Fox)
  - Rolling year GEKS price index
• Source:
  - GEKS algorithm of purchasing power parities (International Comparison Programme)
  - GEKS index is transitive by construction
    · Chained index equals direct index
    · No chain drift
  - A geometric mean of direct superlative price indexes

New methodology, cont.
• The GEKS index between entities j and k is built from bilateral indexes (Törnqvist or Fisher) between entities j and l (l = 1, ..., M) and between entities k and l, respectively
• Purchasing power parities: entity is country
• Scanner data: entity is month
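A compact sketch of the GEKS calculation with months as entities follows. It assumes the standard form of the index, the geometric mean over all months l of the ratio P(j,l) / P(k,l), with a Törnqvist index as the bilateral formula. The function names and the toy data (the same sale pattern that makes a chained Laspeyres index drift) are illustrative assumptions.

```python
import math

def tornqvist(p0, q0, p1, q1):
    """Bilateral Törnqvist price index from period 0 to period 1 over matched items."""
    items = p0.keys() & p1.keys()
    e0 = sum(p0[i] * q0[i] for i in items)
    e1 = sum(p1[i] * q1[i] for i in items)
    log_index = sum(
        0.5 * (p0[i] * q0[i] / e0 + p1[i] * q1[i] / e1) * math.log(p1[i] / p0[i])
        for i in items
    )
    return math.exp(log_index)

def geks(prices, quantities, j, k):
    """GEKS index from month j to month k, using every month l as a link."""
    def bilateral(a, b):
        return tornqvist(prices[a], quantities[a], prices[b], quantities[b])
    n_months = len(prices)
    log_sum = sum(math.log(bilateral(j, l) / bilateral(k, l)) for l in range(n_months))
    return math.exp(log_sum / n_months)

# Hypothetical data: item A on sale in month 1, back to normal in month 2.
prices = [{"A": 1.0, "B": 1.0}, {"A": 0.5, "B": 1.0}, {"A": 1.0, "B": 1.0}]
quantities = [{"A": 10, "B": 10}, {"A": 100, "B": 10}, {"A": 10, "B": 10}]

# Chaining GEKS links gives the same result as the direct GEKS comparison.
chained = geks(prices, quantities, 0, 1) * geks(prices, quantities, 1, 2)
direct = geks(prices, quantities, 0, 2)
```

Because months 0 and 2 are identical in this toy data, the index from month 0 to month 2 is 1, whether computed directly or by chaining.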

New methodology, cont.
• Expanding time period leads to revising all previous GEKS indexes
• Solution: rolling version (chaining)
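The rolling version can be sketched as follows. The slide does not give the details, so this assumes what the literature on rolling year GEKS commonly describes: a 13-month window, with each new month linked to the published series through the latest month-on-month movement from the window ending in that month. The `bilateral` argument is any function returning a bilateral index between two months. All names are hypothetical.

```python
import math

def geks(bilateral, periods, j, k):
    """GEKS index from j to k using only the months in `periods` as links."""
    log_sum = sum(math.log(bilateral(j, l) / bilateral(k, l)) for l in periods)
    return math.exp(log_sum / len(periods))

def rolling_geks(bilateral, n_periods, window=13):
    """Rolling GEKS series with month 0 as the reference (index = 1.0).

    The first window is computed as an ordinary GEKS. Afterwards, each new
    month is chained on with the GEKS movement from the previous month,
    computed over the most recent `window` months, so published figures
    are never revised.
    """
    first = list(range(min(window, n_periods)))
    index = [geks(bilateral, first, 0, t) for t in first]
    for t in range(window, n_periods):
        periods = list(range(t - window + 1, t + 1))
        index.append(index[-1] * geks(bilateral, periods, t - 1, t))
    return index

# Check of the mechanics with a transitive toy bilateral built from a price level.
levels = [1.0, 1.1, 1.2, 0.9, 1.5]
series = rolling_geks(lambda a, b: levels[b] / levels[a], len(levels), window=3)
```

With a bilateral index that is already transitive, as in the toy check, the rolling series simply reproduces the underlying price level relative to month 0.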

RYGEKS and NL
• RYGEKS was specifically developed for Statistics Netherlands as a remedy for the lack of weighting at elementary level
  - Not (yet) applied in practice
  - Used as benchmark
    · Finding optimal value of the threshold
  - Current method (NL) resembles RYGEKS quite well (on average)
    · No bias found

RYGEKS and NL: Illustrations (three chart slides)

RYGEKS and NL, cont.
• Plans for near future:
  - Shadow system based on RYGEKS indexes
  - Continuous benchmark for current method
  - Implementation when RYGEKS is widely accepted?
    · More (international) analysis needed

RYGEKS and N
• RYGEKS indexes tested on Norwegian scanner data at different levels:
  - EAN, elementary and aggregated COICOP levels
• For COICOP 01, a monthly chained Törnqvist index was compared with a monthly chained RYGEKS index
• The results indicate some bias in the Törnqvist index

RYGEKS and N, cont.
• Small deviations for many COICOP aggregates
  - Milk, cheese and eggs; Oils and fats; Vegetables; Fish

RYGEKS and N, cont.
• While others show more deviations
  - Meat; Sugar, jam and chocolate

RYGEKS and N, cont.

RYGEKS and N, cont.
• Causes of bias:
  - Missing prices
  - Seasonal items (not excluded)
  - Price and quantity oscillating over time
• Shadow system for calculating RYGEKS indexes on a monthly basis established
  - Too early to be implemented

Scanner data in other branches?
• NL:
  - Expanding to other branches desirable
  - Data available (e.g. durables)
  - Problem of quality changes
  - Analysis needed
• N:
  - Continuously working to expand scanner data
    · Increasing pressure from chains and outlets
  - Data available for pharmaceutical products, wine and spirits (state monopoly) and petrol
    · Mostly price information implemented
  - Have tried to cover clothing, but matched item model unsuccessful