Chapter 6 ASSESSMENT AND IMPROVEMENT OF DATA COLLECTION PROGRAMME
- International Monetary Fund
- Published Date: June 2002
6.1. This chapter deals with the fourth line of action in the non-observed economy (NOE) measurement strategy, namely improvement of the basic data collection programme so as to reduce the amount of production that is non-observed. Assessment of non-observed activities and their indirect measurement during compilation of the national accounts (as described in Chapters 4 and 5) will almost certainly reveal weaknesses in the basic data. Addressing these weaknesses is the basis for long-term improvement in NOE measurement. Thus the objective of this chapter is to provide an assessment template for systematic review of the basic data collection programme and for identification of potential improvements. The template comprises a description of the desirable components and characteristics of the programme, accompanied by a list of review points. Whereas the presentation in Chapters 4 and 5 is from a national accountant’s perspective, the discussion in this chapter is aimed primarily at the survey statisticians responsible for data collection.
6.2. Exhaustive coverage of activities within the production boundary is just one aspect – admittedly a very important aspect – of data quality. Thus, NOE measurement cannot be handled independently of other quality assessment and improvement initiatives. It must be harmonised with them. There is a wide range of survey design and quality management documentation available, including the new IMF Data Quality Assessment Framework summarised in Chapter 4. However, such material covers a much broader range of quality issues than simply exhaustive coverage. Thus, the aim of this chapter is to extract the essentials that have some bearing, directly or indirectly, on NOE measurement and to present them within a comprehensive assessment template that takes account of other quality improvement considerations.
6.3. The assessment template covers the following points:
Statistical data requirements. Identifying the data requirements of major users is the starting point for defining programme content. Inadequate understanding of user needs may result in misallocation of resources and outputs that do not match user expectations. In a nutshell, how much do users care about the NOE and are their data needs being met?
Institutional framework. The legislative framework, organisational structure, planning and quality management practices all affect the ability of a programme to meet its user needs efficiently and to deal with problems such as the NOE. In short, will the institutional framework support efforts to improve NOE measurements and, if not, how can it be changed to do so?
Conceptual framework. Identification and use of appropriate standards is vital for integration of data from the usual wide range of data sources. Does the programme make use of the international standards?
Data collection mechanisms. Choices have to be made in the selection of administrative and survey data sources. Have the right choices been made from the NOE perspective?
Survey frames. The quality of the business register is the main determinant of the coverage of enterprise surveys and the extent to which the data they produce are consistent with one another. A general-purpose household area frame provides an operational integrating framework for household surveys and determines their coverage. Are improved survey frames likely to be a source of significant improvements in dealing with the NOE?
Survey design principles and practices. Good survey design is vital in addressing coverage, response and reporting problems whether or not they are NOE related. Are there improvements in survey design practices that could address NOE problems?
Enhancing programme content. Assessment of the extent to which the basic programme provides the exhaustive coverage required for the national accounts was discussed in Chapter 5. Having done everything possible to improve existing surveys and data from administrative sources, what new data collections should be added?
Relationship with NOE analytical framework. The final section relates the assessment back to the analytical framework.
6.2. Statistical Data Requirements
6.4. The starting point for the review of the data collection programme is the set of economic statistics required by the users and the uses to which these statistics are put. Users and uses define the data requirements that the programme has to address. Given that there are many diverse users and uses, it is helpful to classify them into broad categories. The impact of the various components of the NOE on the statistical outputs can then be assessed more easily in terms of their effects on the major types of use and user.
6.5. As an example, the major uses to which users put the data can be classified as macro-economic analysis (structural and short term), micro-economic analysis (short term, industry based, activity based, and business dynamics), and regional analysis. Users can be grouped under seven broad headings:
internal statistical office users, specifically including the national accounts area;
national government – the national bank, and the ministries dealing with economic affairs, finance, treasury, industry, trade, employment, environment;
regional and local governments;
business community – individual large businesses and business associations;
trade unions and non-governmental organisations;
academia – universities, colleges, schools, research institutes, etc.;
media – newspapers, radio and TV stations, magazines, etc.
6.6. The table in Annex 3.1 indicates how, based on broad use and user categories, data requirements may be summarised. This table should be regarded as illustrative only. In fact, there is no international standard stipulating exactly what statistics a national statistical office should produce, although the Special Data Dissemination Standard (SDDS) and the General Data Dissemination Standard (GDDS), created and maintained by the International Monetary Fund (2002a, 2002b), and the Eurostat (1998) statistical requirements compendium go part way in this direction.
6.7. From the perspective of exhaustive measurement of GDP, the most important data requirements are those associated with GDP compilation. However, although the 1993 SNA defines precisely the structure of the national accounts and the corresponding data components, it does not specify how, or at what level of detail, these data should be obtained. The table in Annex 3.2, prepared by the Interstate Statistical Committee of the Commonwealth of Independent States, is an example of how the basic data requirements may be identified more explicitly. The first column of the table summarises the minimum data needed for compilation of the production account, the generation of income account (compiled for industries or for the economy as a whole) and the goods and services account. With these data, GDP can be estimated by all three methods (production, income and expenditure). Annex 3.3 indicates the minimum data for compilation of sector accounts.
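For reference, the three methods of estimating GDP correspond to the standard 1993 SNA identities for GDP at market prices, summarised below in our own notation: P_i and IC_i denote output and intermediate consumption of industry i at basic prices, and CE, GOS and GMI denote compensation of employees, gross operating surplus and gross mixed income. Each identity draws on a different part of the basic data programme, which is why the minimum data requirements span several accounts.

```latex
\begin{aligned}
\text{Production:}\quad \mathrm{GDP} &= \sum_{i}\left(P_i - IC_i\right) + \text{taxes on products} - \text{subsidies on products}\\
\text{Income:}\quad \mathrm{GDP} &= CE + GOS + GMI + \text{taxes on production and imports} - \text{subsidies}\\
\text{Expenditure:}\quad \mathrm{GDP} &= \text{final consumption expenditure} + \text{gross capital formation} + \text{exports} - \text{imports}
\end{aligned}
```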
Have the national accounts data requirements been identified and made known to the branch statisticians?
Have survey statisticians documented their procedures for collection, editing, estimation and preparation of data outputs in such a way that potential NOE problems can be identified?
Have deficiencies in the data outputs with respect to national accounts requirements been documented and made known to the survey statisticians?
Have the main users, uses and corresponding data requirements been analysed and documented?
How concerned are the main users about any of the NOE problem areas? The bigger their concerns, the more effort NOE measurements merit.
6.3. Institutional Framework
6.8. Satisfying data requirements requires an institutional framework within which to collect, process and disseminate the data. An important element of this framework is the legislation under which the statistical office operates. Typically this is set out in one or more statistics acts and in accompanying or supplementary government regulations, which should provide for:
the right to collect data;
the obligation to ensure that the data collected are used only for statistical purposes except with the express permission of respondents and except for certain types of data that are in any case publicly available;
the right to access for statistical purposes data that have been collected for government administrative purposes;
the obligation to ensure that no individual data are released either consciously or unwittingly;
freedom from political interference in the timing or content of data releases, i.e., independence of the statistical office from political pressure;
designation of an agency responsible for overall co-ordination of official statistics within the country; and
arrangements for appointment and removal of the chief statistician of that agency that do not depend upon political whim.
6.9. Without such safeguards it is difficult to address data problems. Eurostat (1999a) presents a generic statistics act that can be used as a starting point for development of a statistics act tailored to a specific country.
Is there a statistics act, or equivalent, providing appropriate legislation along the lines noted above?
Does the spirit of the legislation permeate the organisational culture? Is political interference in the timing and content of data dissemination resisted? Are confidentiality rules respected and seen to be respected? Does fear that confidentiality will be breached result in NOE problems such as non-response or misreporting?
Does the national statistical office have access to administrative data? Are there administrative sources that could provide additional coverage?
6.3.2. Organisational Structure
6.10. Another aspect of the institutional framework is the organisational structure of the national statistical office. Typically it comprises the following functions:
economic data collection, processing and analysis – comprising subject matter areas concerned with the collection of economic data by surveys and from administrative sources and the processing and analysis of these data;
social data collection, processing and analysis – comprising subject matter areas concerned with social data;
national accounts, balance of payments and economic analysis – areas concerned with integration and analysis of data from surveys and administrative sources;
marketing and dissemination – assessing user requirements, segmenting users into groups, managing output;
concepts, standards and methods – developing, promoting, and monitoring use of a common conceptual framework, survey best practice, and quality management;
information technology – developing and promoting effective use of data processing, data and metadata management, and communications technology;
management and services – budgeting, planning, personnel, pay, training, etc. – required in any organisation, not just a statistical agency.
6.11. These functions may be combined or split in a variety of ways. For example, the economic and social subject matter areas may not be separated; instead there may be a split into business register, data collection and capture, and data analysis functions. The functions may be divided between central and regional offices. Furthermore, they may be divided among several agencies, hence the need to distinguish the national statistical office (the lead statistical agency) from the national statistical system. However, regardless of the particular arrangements, all the functions should be present. The Fundamental Principles of Official Statistics (United Nations Statistical Commission, 1994) provide guidance.
Does the statistical office recognise and attach appropriate importance to these functions in its structure? Is the present structure out of date? Does it enable communication within and between functions, in particular between data collection and national accounts areas?
Does the structure unnecessarily constrain conception and implementation of improvements in NOE measurement?
6.3.3. Planning Framework
6.12. The national statistical office should have a strategic master plan that outlines the major initiatives and statistical outputs envisaged over the next five years or so. It should be accompanied by a multi-year plan that indicates the provisional allocation of resources by project and function over the next, say, three years, and by a definitive allocation of the budget for the current year. The multi-year plan and budget allocations should be updated on an annual cycle. They provide the basis for and expression of priority setting amongst the many ongoing programmes and development initiatives that are competing for resources. The Multi-annual Integrated Statistical Programme (MISP) framework developed by Eurostat (1999b) indicates the sort of information that needs to be developed and maintained.
Does the statistical office have a multi-year plan? If not, developing one is a priority.
Are there procedures for involving external and internal data users in the planning process?
How can NOE related initiatives best be included within the planning framework?
6.3.4. Quality Consciousness and Organisational Culture
6.13. The performance of an organisation is often thought of in terms of two aspects – effectiveness (the organisation is doing the right thing) and efficiency (the organisation is doing the thing right). To ensure effective and efficient operations a statistical office should have a quality programme. Measurement of the NOE should be seen as just one, albeit very important, factor in the quality programme. The quality programme should be based on the sort of principles that have been popularised in the total quality management literature over the last two decades, including (see Colledge and March, 1993):
Customer (user) focus: user satisfaction is a paramount goal; establish partnerships with users; define quality of statistical products in terms of fitness for use.
Supplier (respondent) focus: establish a partnership relationship with respondents by ensuring that the reporting characteristics of respondents are well understood, that respondents are motivated to report, and that the burden on them is minimised.
Internal partnerships: improve the quality of the data collection, processing and dissemination process by viewing it as a chain of customer-supplier interfaces.
Continuous improvement: define quality, encourage small-scale initiatives to improve it, and develop and use quality measures.
Reengineering: start from scratch in large-scale redesign initiatives.
Total employee involvement: value all staff members, promote their involvement, give them responsibility and resources to make improvements, and recognise their achievements.
Quality management programme: the programme does not necessarily involve a “quality unit”, but it does require quality to be championed and budgeted by a very senior staff member or committee so that it is high on the organisation’s agenda and permeates the organisation’s culture.
6.14. At the core of a quality programme is the definition of data output quality. Here quality is interpreted broadly in terms of fitness for use, not just accuracy. As elaborated by Eurostat (1999d), Brackstone (1999), Carson (2001) and others, the dimensions of quality typically include:
relevance: the data serve well the identified needs;
accuracy: the data are sufficiently accurate for the purposes to which they are targeted and their limitations in respect of accuracy are made known;
timeliness: the data are produced sufficiently early to be useful;
presentation: the data are well publicised, easy to access and easy to understand;
consistency: the data are not subject to major revisions or to differences across the various media in which they are disseminated; and
coherence: the data can be analysed meaningfully in conjunction with previous data from the same survey or other related data.
6.15. Data output quality is a measure of the effectiveness of a statistical office’s performance. The efficiency aspect is reflected in the quality of the office’s internal structure and processes, whose desirable characteristics are as follows.
The statistical office is a learning organisation where creative ideas are encouraged and nurtured, and where problems can be discussed openly rather than driven underground for fear of repercussions.
There is communication, co-operation and data sharing between staff in all areas, and at all staff levels, from the highest levels of management down. Staff are aware of data and developments in other areas that might help them in their own work.
Staff in data collection areas go beyond their traditional data gathering and processing role and assess the quality of their statistical outputs. Where possible, they cross check their outputs with data from other areas in the agency, knowing the degree of consistency they should expect to find.
Staff are open to new ideas. They do not have a tendency to defend existing methodologies simply because of workload pressures and the need to meet tight deadlines.
Senior management play a leadership role in developing an environment where critical assessment of existing methodologies is encouraged, together with a willingness to accept and adopt the changes required to resolve identified shortcomings in the basic data collection programme.
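The cross-checking described above, in which survey staff compare their outputs with data from other areas knowing the degree of consistency to expect, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the division-level turnover figures and the per-division tolerances are all hypothetical, and the default tolerance of 5 per cent is an arbitrary placeholder.

```python
def cross_check(survey, admin, tolerance, default_tolerance=0.05):
    """Flag divisions where a survey estimate departs from a related source
    by more than the relative discrepancy normally expected.

    survey, admin: dict mapping division code -> turnover (hypothetical data).
    tolerance: dict mapping division code -> expected maximum relative gap.
    Returns a dict of division -> relative gap (None if no comparable figure).
    """
    flagged = {}
    for division, survey_value in survey.items():
        admin_value = admin.get(division)
        if not admin_value:
            # No usable comparison figure: flag for manual follow-up.
            flagged[division] = None
            continue
        # Positive gap: the survey shows less than the other source.
        gap = (admin_value - survey_value) / admin_value
        if abs(gap) > tolerance.get(division, default_tolerance):
            flagged[division] = round(gap, 3)
    return flagged


survey = {"15": 940.0, "45": 600.0, "52": 1200.0}
admin = {"15": 1000.0, "45": 800.0, "52": 1210.0}
tol = {"15": 0.08, "45": 0.10, "52": 0.05}
print(cross_check(survey, admin, tol))  # {'45': 0.25}
```

In this made-up example only construction (division 45) is flagged: the survey total is a quarter below the comparison source, which may point to undercoverage but could equally reflect definitional differences, so the flag is a prompt for investigation rather than a conclusion.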
Are the survey statisticians responsible for analysing and validating the statistical outputs they produce?
Are the survey statisticians and national accountants in open and regular two-way communication, sharing data, problems and ideas? In particular, are survey staff aware of, and involved in addressing, NOE problems?
Is open expression of problems possible? Are creative ideas encouraged and followed up?
Are senior managers fully supportive of quality management in general and are they funding initiatives to improve measurement of the NOE in particular?
Do senior managers deal effectively with critical comment from users about the quality of the statistics produced by the agency, in particular with criticisms that the NOE is not being measured?
Are staff adequately trained in survey design and operations? Are survey and questionnaire design guidelines and manuals readily available?
6.3.5. Data and Metadata Management
6.17. Data acquired from individual surveys and administrative sources are typically processed and analysed in separate organisational units, each focusing on a particular subject matter area. To ensure appropriate communication between these staff and with the final users, all the metadata – i.e., the information about the definitions, sources and methods required for accessing, combining, interpreting and using the data – should be stored in a commonly accessible metadata repository. Furthermore, to ensure that the resulting statistical outputs can be readily combined and viewed collectively, they should be brought together in an output database that can be accessed by all staff and, with appropriate security and possibly payment, by external users. This database may physically contain the data. Alternatively, it may be a conceptual store that appears to be a single integrated repository but actually interfaces seamlessly with data physically stored in a number of separate databases. An integrated output database facilitates horizontal confrontation of data from different sources and encourages survey staff to go beyond the purely vertical data flows within individual organisational units. Further details are given in papers by Sundgren (1997) and Colledge (1999), in a Eurostat (1999c) handbook on information technology, and in the material from the Statistical Output Database Seminar Series (Statistics Canada, 2001) and the UN/ECE Work Sessions on Statistical Metadata (UN/ECE, 2002).
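The "conceptual store" alternative in paragraph 6.17 can be illustrated with a small sketch: one query interface that reads from datasets kept in separate stores and returns each figure together with its definitions, sources and methods metadata. The class names, dataset names and metadata text are hypothetical; a production system would sit on real databases rather than in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Definitions, sources and methods for one dataset (hypothetical fields)."""
    definition: str
    source: str
    method: str


class OutputDatabase:
    """Appears to users as one repository, but delegates to separate stores."""

    def __init__(self):
        self._stores = {}    # dataset name -> {key: value}, stored separately
        self._metadata = {}  # dataset name -> Metadata, in a common repository

    def register(self, name, store, metadata):
        self._stores[name] = store
        self._metadata[name] = metadata

    def get(self, name, key):
        """Return the value together with the metadata needed to interpret it."""
        return self._stores[name].get(key), self._metadata[name]

    def confront(self, key, *names):
        """Horizontal confrontation: look up the same key in several datasets."""
        return {name: self._stores[name].get(key) for name in names}


db = OutputDatabase()
db.register("retail_survey", {"52": 1200.0},
            Metadata("Retail turnover", "Monthly retail survey", "Sample survey"))
db.register("vat_returns", {"52": 1210.0},
            Metadata("Taxable turnover", "VAT administration", "Administrative source"))
print(db.confront("52", "retail_survey", "vat_returns"))
# {'retail_survey': 1200.0, 'vat_returns': 1210.0}
```

Returning metadata with every value is a deliberate choice in this sketch: a user comparing the two figures immediately sees that one is a survey estimate of retail turnover and the other is taxable turnover from an administrative source.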
Are the definitions, sources and methods metadata required to identify and analyse NOE problems readily available?
Are the data required to identify and analyse NOE problems readily available and easy to combine and analyse?
6.4. Conceptual Framework
6.18. The effective integration of data from a broad range of administrative sources and statistical surveys depends upon the definition and use by a statistical office of a common conceptual framework for its data collection programme. This framework should be based on the relevant international standards, supplemented as necessary by country specific standards and operational practices.
6.19. International standards alone are not sufficient. For example, the 1993 SNA defines the data items required to compile GDP, but it does not specify how they are to be obtained from the data actually available in business records, which are based on country specific legislation and accounting standards. It indicates the need for a breakdown of large enterprises into smaller producing units, but it does not specify how these units should be derived. It designates Resolution II of the Fifteenth (1993) International Conference of Labour Statisticians as the basis for defining the informal sector, but does not specify a precise operational definition. It stipulates the use of ISIC Rev 3 for classification by industrial activity but provides no standards for classification by geography or size, which are also needed for sampling and analytic purposes. In summary, the international standards need to be extended along the lines described in the following paragraphs.
6.4.2. Statistical Units
6.20. As noted in Section 2.3.2, the 1993 SNA and ISIC Rev 3 suggest that an enterprise engaged in a range of different activities and/or at a number of different locations be divided into smaller, more homogeneous producing units that can be more precisely classified and that collectively represent the enterprise as a whole. Partitioning an enterprise by reference to its activities results in one or more kind of activity units. Partitioning an enterprise by reference to its various locations results in one or more local units. Using both methods of partitioning simultaneously results in one or more establishments. The 1993 SNA does not specify the operational procedures by which these statistical units should be delineated nor in what circumstances they should actually be used, although the European Council Regulation on Statistical Units (European Commission, 1993) provides some additional guidance.
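The three ways of partitioning an enterprise described above can be sketched in a few lines. The sketch assumes, hypothetically, that the enterprise's operations are already described as (activity, location, output) records; real delineation rules, such as those in the EU regulation on statistical units, are considerably more involved. The activity codes are ISIC Rev 3 classes used purely for illustration.

```python
from collections import defaultdict


def partition(records, key):
    """Sum output over the records that share the same partitioning key."""
    units = defaultdict(float)
    for activity, location, output in records:
        units[key(activity, location)] += output
    return dict(units)


# Hypothetical enterprise: meat processing at two sites plus a retail outlet.
records = [
    ("1511", "North", 300.0),
    ("1511", "South", 100.0),
    ("5211", "South", 50.0),
]

kind_of_activity_units = partition(records, lambda act, loc: act)
local_units = partition(records, lambda act, loc: loc)
establishments = partition(records, lambda act, loc: (act, loc))

print(kind_of_activity_units)  # {'1511': 400.0, '5211': 50.0}
print(local_units)             # {'North': 300.0, 'South': 150.0}
print(len(establishments))     # 3
```

Each partition sums to the same enterprise total (450.0 here), reflecting the requirement that the smaller units collectively represent the enterprise as a whole.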
6.21. The simplest arrangement of all is to have no breakdown of enterprises and, for better partitioning of the data, to request each enterprise to report its data broken down by kind of activity and location. However, this is in effect asking each enterprise to make its own division without guidance on how to identify the units into which its activities should be subdivided. It does not work well, as an enterprise cannot be expected to understand the breakdown that is required.
6.22. On the other hand, dividing enterprises into kind of activity units, local units and establishments is heavily resource intensive, in terms of both the investigations required and the computer systems. So the benefits of maintaining four different types of standard statistical unit (enterprise, kind of activity unit, local unit and establishment) may not justify the costs. There is no single “right” standard statistical units model, i.e., set of standard statistical units. In practice, a national statistical office must identify the standard statistical units that it intends to maintain, based on the enterprise structures that are typically found in the country and the size of the statistical office budget. It is even possible that an additional standard unit should be defined for statistical purposes, comprising groups of enterprises that are linked by ownership and control. This is appropriate when there are groups of enterprises that are, in effect, operating like single enterprises and need to be treated as such. In summary, the key principle in the choice of the statistical units model is that it should be as simple as possible whilst providing sufficient detail to meet user needs.
6.23. The practical process of dividing enterprises into smaller producing units (or combining them) in accordance with a statistical units model is referred to as profiling. Profiling procedures include the specifications for handling births, deaths, mergers and other changes of enterprises and any associated producing units, as discussed later.
6.4.4. Classification by Industry
6.24. ISIC Rev 3 provides the international standard classification by industry. It has four levels which, beginning with the most detailed, are Class, Group, Division and Section (Tabulation Category). Additional breakdown of some industry classes may be appropriate, depending upon the circumstances in the country.
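Because the ISIC levels are nested, the higher levels can be derived mechanically from a four-digit class code: the group is the first three digits, the division the first two, and the section follows from a table of division ranges. The sketch below assumes the division-to-section ranges of ISIC Rev 3 as we understand them; they should be checked against the official United Nations structure before any real use.

```python
# (first division, last division, section letter), assumed ISIC Rev 3 ranges
SECTION_RANGES = [
    (1, 2, "A"), (5, 5, "B"), (10, 14, "C"), (15, 37, "D"), (40, 41, "E"),
    (45, 45, "F"), (50, 52, "G"), (55, 55, "H"), (60, 64, "I"), (65, 67, "J"),
    (70, 74, "K"), (75, 75, "L"), (80, 80, "M"), (85, 85, "N"), (90, 93, "O"),
    (95, 95, "P"), (99, 99, "Q"),
]


def isic_levels(class_code: str) -> dict:
    """Return the Class, Group, Division and Section of a four-digit class."""
    if len(class_code) != 4 or not class_code.isdigit():
        raise ValueError(f"expected a four-digit class code, got {class_code!r}")
    division = int(class_code[:2])
    for first, last, letter in SECTION_RANGES:
        if first <= division <= last:
            return {
                "class": class_code,
                "group": class_code[:3],
                "division": class_code[:2],
                "section": letter,
            }
    raise ValueError(f"division {class_code[:2]} does not exist in the table")


print(isic_levels("1511"))
# {'class': '1511', 'group': '151', 'division': '15', 'section': 'D'}
```

A country-specific extension of some classes (for example, a fifth digit) would leave this derivation unchanged as long as the first four digits remain the ISIC class.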
6.4.5. Classification by Geography
6.25. As countries are quite different in shape and size, there is no international standard for geographical classification, although the EU provides guidance for its Member States. The statistical office should design and promote a national standard. The factors to be taken into account are:
user needs for geographic breakdown;
the area boundaries that are of most utility for sample stratification and data collection; and
existing administrative boundaries – in addition to the fact that users may require data for administrative areas, it is cheaper for the statistical office if another organisation is responsible for defining and maintaining the geographical descriptions.
6.4.6. Classification by Size
6.26. In comparison with people, or even households, enterprises are very heterogeneous. In particular they can vary enormously in size and hence have very different impacts upon the statistical aggregates to which they contribute. Thus classification by size is vital for sampling and data collection purposes, and useful for analysis. The European Community Council Recommendation on structural business statistics (European Commission, 1997) specifies a classification by size, but there is no international standard. Experience suggests that, for most data collection purposes, a classification of producing units into four basic size groups based on number of employees along the following lines is useful:
Large: more than X employees – typically X is in the range 50-200;
Medium: less than X but more than Y employees – typically Y is in the range 20-100;
Small: between Y and Z employees – typically Z is in the range 1-5;
Micro: 0 employees.
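A minimal sketch of assigning units to the four size groups above, with the boundaries X, Y and Z as parameters. The default values are arbitrary picks from the quoted ranges, and two points are our assumptions rather than the text's: units with exactly X employees are treated as Medium, and Micro is coded as "fewer than Z employees", which coincides with "0 employees" when Z = 1 and keeps the four groups exhaustive for larger Z.

```python
def size_group(employees: int, x: int = 100, y: int = 20, z: int = 1) -> str:
    """Classify a producing unit by number of employees.

    Assumed boundary handling: Large > X; Y < Medium <= X;
    Z <= Small <= Y; Micro < Z (i.e. 0 employees when Z = 1).
    """
    if not z <= y < x:
        raise ValueError("boundaries must satisfy Z <= Y < X")
    if employees > x:
        return "Large"
    if employees > y:
        return "Medium"
    if employees >= z:
        return "Small"
    return "Micro"


print([size_group(n) for n in (0, 1, 20, 21, 100, 101)])
# ['Micro', 'Small', 'Small', 'Medium', 'Medium', 'Large']
```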
6.27. Sometimes no distinction is made between small and micro and/or between medium and large. Sometimes the micro group is split into market producers and producers for own final use, in accordance with 1993 SNA (Para 6.52). All boundaries, X, Y, and Z, may be varied according to the industrial division of the units being classified in order that the producing units within the large (or large and medium) categories account for a specified percentage of the total production within each division.
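Varying a boundary by division so that the large category accounts for a specified share of production can be done by walking down the units in order of size. The sketch below, which is one possible approach rather than a prescribed method, returns the largest X such that units with more than X employees produce at least the target share of the division's output. The function name and the division data are hypothetical.

```python
from collections import defaultdict


def calibrate_large_threshold(units, target_share):
    """Largest X such that units with > X employees reach target_share of output.

    units: list of (employees, output) pairs for one industrial division.
    """
    if not 0 < target_share <= 1:
        raise ValueError("target_share must be in (0, 1]")
    output_by_size = defaultdict(float)
    for employees, output in units:
        output_by_size[employees] += output
    total = sum(output_by_size.values())
    if total <= 0:
        raise ValueError("division has no output")
    covered = 0.0
    # Add whole employment levels, starting from the largest units.
    for employees in sorted(output_by_size, reverse=True):
        covered += output_by_size[employees]
        if covered / total >= target_share:
            # Units with at least `employees` employees reach the target.
            return employees - 1
    # Guard against floating-point rounding: all units together.
    return min(output_by_size) - 1


division_units = [(500, 600.0), (120, 200.0), (40, 100.0), (8, 60.0), (0, 40.0)]
print(calibrate_large_threshold(division_units, 0.75))  # 119
```

In this made-up division, the two units with 120 or more employees produce 80 per cent of output, so a target of 75 per cent sets X = 119. The same routine could be reapplied to the remaining units to place Y.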
Are survey staff familiar with all aspects of the conceptual framework that are important in terms of NOE definition and analysis?
Has a standard model for statistical units been defined? Is it too simple to provide adequate industrial and geographical breakdown of data? Is it too complex to be implemented in practice?
Are classifications by industry, geography and size based on international and national standards? Does misclassification result in a significant undercoverage of enterprises?
Have the procedures for transforming from business to national accounting concepts been defined and fully understood by survey and national accounts staff?
6.5. Data Collection Mechanisms
6.28. To produce the data outputs required, a statistical office collects and transforms basic data from the institutional units – corporations, government units, households and non-profit institutions serving households – in their roles as producers, consumers, investors, income earners, etc. There are two basic mechanisms for collecting economic data: access to data already being collected for administrative purposes, and direct surveys by the statistical office. The relative merits of these two mechanisms, and the basis for choosing between them, are discussed in the following paragraphs. In either case, however, the original providers of the data are the same, namely the institutional units, and the original sources of the data are the same, namely the records kept by these units. Typically, these records are set up by the units in response to legislated administrative requirements or simply for internal purposes, to assist the units in managing their operations. In the case of corporations, for example, corporate law requires certain accounting reports, and tax laws require income tax returns and records of payroll deductions for employees. Only a very few data items, for example the opinions sought in business tendency surveys, do not depend upon such records. Where appropriate records are not maintained by the units being queried, the statistical office may persuade the respondents to set up special records for reporting purposes, for example to collect data on household spending patterns, but this is a difficult and expensive process. Thus, the records kept by enterprises and households typically set a limit to the data that can actually be obtained from these units, whatever the user requirements.
6.5.2. Administrative Sources
6.29. Administrative processes are set up in response to legislation and regulation. Each regulation (or related group of regulations) results in a register of the institutional units – enterprises, persons, etc. – bound by that regulation and in data resulting from application of the regulation. Typically the register and data are referred to collectively by the statistical office as an administrative source. Administrative sources thus produce two types of data that can be used by the statistical office for statistical purposes:
registration data, describing the institutional units that are required to register under the legislation – useful in building and maintaining lists of units as the starting points for surveys; and
transaction data, describing the transactions administered under the legislation – useful to supplement or replace surveys.
6.30. Each administrative register of enterprises is potentially usable by the statistical office to create and maintain a single statistical register as the starting point for data collection from enterprises. As elaborated later in this chapter, such a statistical register is commonly termed the business register and contains a list and details of enterprises (and other statistical units) on the basis of which survey samples are selected.
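Building a single statistical register from several administrative registers can be sketched as a merge keyed on a business identifier that records which sources list each unit. The sketch assumes, hypothetically, that all registers share a common identifier; in practice registers often lack one and must be linked by name-and-address matching. All identifiers and field names are illustrative.

```python
def build_business_register(*registers):
    """Merge (source_name, {business_id: details}) registers into one register.

    Each entry records the set of sources listing the unit; for each attribute
    the first non-missing value encountered is kept.
    """
    register = {}
    for source_name, units in registers:
        for business_id, details in units.items():
            entry = register.setdefault(business_id, {"sources": set()})
            entry["sources"].add(source_name)
            for field, value in details.items():
                if value is not None:
                    entry.setdefault(field, value)
    return register


tax = {"B1": {"name": "Alpha Ltd", "isic": "1511"},
       "B2": {"name": "Beta Traders", "isic": None}}
employers = {"B1": {"name": "Alpha Ltd", "employees": 240},
             "B3": {"name": "Gamma Works", "employees": 12}}

br = build_business_register(("tax", tax), ("employer", employers))
missing_from_employer = sorted(b for b, e in br.items() if "employer" not in e["sources"])
print(missing_from_employer)  # ['B2']
```

Tracking the sources for each unit makes coverage gaps visible: an enterprise with no employees, like the hypothetical B2, would be missed entirely if the register were built from employer registration alone.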
6.31. Administrative data have some significant advantages relative to survey data. It is invariably cheaper for the statistical office to acquire data from an administrative source than to conduct a survey. Furthermore, administrative sources provide complete coverage of the population to which the administrative process applies and generally have very high response rates.
6.32. On the other hand, the fact that the administrative processes are not under statistical office control limits their data coverage, content, quality and consistency, and hence their use. An administrative process such as employer registration almost inevitably does not use the standard statistical definitions of the corresponding units and data items. It does not cover enterprises that are not employers. The administrative data on numbers of employees and wages and salaries will not be sufficient for all statistical purposes. The classifications of the data, for example by industry, are unlikely to be in exact accordance with statistical standards and may not be based on coding procedures that are sufficiently reliable for statistical purposes. Furthermore, administrative processes are subject to change in response to new legislation, with little (or no) regard for the impact on statistical series.
6.5.3. Statistical Surveys
6.33. Administrative data alone are not sufficient to meet all the needs of the basic programme. Thus they must be supplemented by statistical surveys,¹ i.e., direct collections by the statistical office of data for statistical purposes. Conducting surveys is the main activity of the statistical office.
6.34. The advantages of surveys over data from administrative sources are that the data items to be collected and the collection and processing procedures are under statistical office control. Also, in principle, respondents have less reason to deliberately misreport as the statistical office guarantees that the data it collects are strictly confidential and that they will not be used for administrative purposes. The disadvantages of surveys are higher costs, higher non-response rates, and additional respondent burden. Another problem is that in practice respondents may not trust the confidentiality clause.
6.35. Given its budget, an NSO has to choose what surveys to conduct as part of its regular programme, and of what type these surveys should be. Surveys may be divided into five general types according to the units sampled and/or contacted: enterprise surveys; household surveys; mixed household enterprise surveys; indirect enterprise surveys; and price surveys.
6.36. Enterprise surveys are those in which enterprises (or statistical units belonging to these enterprises) constitute the sampled units, the reporting units from which data are obtained, and the observation units about which data are obtained. By contrast, in household surveys the sampled, reporting and observation units are households. In mixed household-enterprise surveys the sampled units and initial reporting units are households but the final observation units are enterprises. In indirect enterprise surveys, the reporting enterprises are asked for data about a different set of enterprises, i.e., the observation units do not belong to the reporting units. An example would be a survey of city markets in which the market administrators are asked about the numbers and turnover of the market traders. Price surveys are those used to obtain data on prices, which may involve collection from enterprises or households, or direct observation of prices in the market.
6.37. Surveys may also be classified as list based or area based depending upon the source of the list of enterprises or households from which the survey sample is drawn. In a list based survey, the initial sample is selected from a pre-existing list of enterprises or households. In an area based survey, the initial sampling units are a set of geographical areas. After one or more stages of selection, a sample of areas is identified within which enterprises or households are directly listed. From this list, the sample is drawn and data obtained.
6.38. Each type of survey has its own particular characteristics and appropriate uses, as described in the following paragraphs.
6.39. In a list based enterprise survey, the initial sample is selected from a pre-existing list of enterprises. Typically the list is supplied from the business register (described later) that is maintained by the statistical office to support a range of surveys. Sometimes the survey list is derived from another administrative register. In an area based enterprise survey, a sample of areas is selected, within each of which enterprises are enumerated and then sampled. List based enterprise surveys are generally preferred to area based surveys for the following reasons:
A list-based survey is more efficient from a sampling perspective. Because the area based approach involves cluster sampling, it requires a larger sample than a list based survey to achieve a given level of precision.
It may be difficult to enumerate the enterprises within an area. While retail enterprises are likely to be readily visible, service enterprises that carry out their work in other locations are usually difficult to identify.
Maintenance of a list of enterprises via a general purpose business register is cheaper than maintenance of an area based list, except for very small enterprises.
Area based sampling is inappropriate for large or medium sized enterprises that operate in several areas because of the difficulty of collecting data from just those parts of the enterprises that lie within the areas actually selected. Furthermore, in order to avoid inadvertently missing parts of the enterprise, it is usually considered preferable to collect data from the whole of an enterprise, not just a part of it.
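The sampling-efficiency point in the first item can be quantified with Kish's approximate design effect, deff ≈ 1 + (m − 1)ρ, where m is the average number of enterprises enumerated per selected area and ρ is the intra-cluster correlation of the variable of interest. The sketch below assumes equal-sized clusters; the values of m and ρ are purely illustrative, not figures from any actual survey.

```python
def design_effect(mean_cluster_size: float, intracluster_corr: float) -> float:
    """Kish's approximate design effect for cluster sampling with equal-sized clusters."""
    return 1.0 + (mean_cluster_size - 1.0) * intracluster_corr


def required_sample_size(n_srs: float, mean_cluster_size: float, intracluster_corr: float) -> float:
    """Sample size an area (cluster) design needs to match the precision of a
    list-based simple random sample of size n_srs."""
    return n_srs * design_effect(mean_cluster_size, intracluster_corr)


# Illustrative values only: 10 enterprises listed per selected area, rho = 0.05.
print(required_sample_size(400, 10, 0.05))  # roughly 580 enterprises instead of 400
```

Even a modest intra-cluster correlation inflates the required sample noticeably, and the inflation grows with the number of enterprises listed per area.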
6.40. Thus, area based enterprise surveys are typically only used for collection of data from small enterprises (particularly agricultural smallholdings) and then only when no adequate list exists. Even in these circumstances, a mixed household-enterprise survey (described below) may be preferable. Those area based enterprise surveys that do exist are usually supplementary to a list based survey. An example is the long-standing area frame component of the US Bureau of the Census Retail Trade Survey (Monsour, 1976).
6.41. Household surveys are valuable in providing coverage of production by household enterprises that are too small to be recorded in any readily usable administrative list of enterprises. Because household surveys are already conducted to collect labour force and household expenditure data, questions on production activities can be added at relatively little extra cost. This generally makes a household survey cheaper than an area based enterprise survey for the same purpose. However, the responding unit is a person in a household, not an enterprise, so the data that can be collected about the activities of the enterprise may be correspondingly more limited.
6.42. Some statistical offices maintain, or can access, population or household registers, at least for urban areas, and thus can conduct list-based household surveys. However, there are few such registers, so most household surveys are area-based.
Mixed household-enterprise surveys
6.43. In a mixed household-enterprise survey, a sample of households is selected and each household is asked whether any of its members is an entrepreneur, i.e., the sole proprietor of, or a partner in, an unincorporated enterprise. Data for all the enterprises thereby identified (or for a sub-sample of them) are then collected – either immediately from the respondent reporting on behalf of the enterprise or in a subsequent stage of data collection. Thus the feature of a mixed household-enterprise survey that distinguishes it from a household survey is that it collects information about enterprises per se, whereas a household survey collects information about the persons in a household, including possibly their personal contributions to enterprises.
6.44. Mixed household-enterprise surveys can thus provide coverage of small enterprises that are not included in list based enterprise surveys. However, they suffer from similar disadvantages to area based enterprise surveys, namely the inefficiency of the sample design and the difficulty of handling enterprises with production units in more than one location.
6.45. In addition, an enterprise that is a partnership may be reported by each of its partners who may be in different households. The duplication of coverage that this implies has to be allowed for in the survey estimation system. This is the feature that distinguishes a mixed household-enterprise survey from an area based enterprise survey, as, in the latter case, enterprises are directly identified and listed (hopefully) without duplication. The process of producing an unduplicated list is the reason why area based enterprise surveys are generally more expensive than mixed household-enterprise surveys.
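One common way of allowing for this duplication, assumed here for illustration rather than prescribed by the text, is a multiplicity (weight-share) adjustment: each household's report of an enterprise is divided by the number of distinct households in which the enterprise's partners live, so that, summed over all households that could report it, the enterprise counts exactly once. The record layout and field names are hypothetical, and the approach assumes the questionnaire asks in how many households the partners live.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EnterpriseReport:
    """One enterprise as reported by one sampled household (hypothetical layout)."""
    household_weight: float   # design weight of the reporting household
    output: float             # output of the whole enterprise, as reported
    partner_households: int   # distinct households containing a partner (1 for a sole proprietor)


def estimate_total_output(reports: Iterable[EnterpriseReport]) -> float:
    """Weighted total in which each partnership is shared among its partner households."""
    total = 0.0
    for r in reports:
        if r.partner_households < 1:
            raise ValueError("partner_households must be at least 1")
        total += r.household_weight * r.output / r.partner_households
    return total


# A partnership spread over two households, both enumerated (weight 1): counted once.
census = [EnterpriseReport(1.0, 40.0, 2), EnterpriseReport(1.0, 40.0, 2)]
print(estimate_total_output(census))  # 40.0
```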
6.46. In summary, mixed household-enterprise surveys are sometimes preferred to household surveys or area based enterprise surveys for estimating the production of small units that are excluded from list based enterprise surveys.
Indirect enterprise surveys
6.47. An example of an indirect survey is where the enterprises that administer city markets are asked for data about the holders of the market stalls. This sort of survey provides only limited data about the observation units and often only in aggregate form.
6.48. Producer and consumer prices are usually collected by entirely separate surveys from those used to measure production or expenditure. With few exceptions, these survey samples are not probability samples, as the items selected for pricing and the enterprises from which the prices are collected are chosen purposively. Thus exhaustive coverage is not a goal.
Choice of survey type
6.49. Table 6.1 illustrates how registration, sampling and surveying mechanisms could vary according to size of enterprise.
Table 6.1

| Registration and collection characteristics | Small and micro | Medium | Large |
|---|---|---|---|
| In business register | No/yes | Yes | Yes |
| Need for profiling | No | No | Yes |
| Collection vehicle | Interview/self-completion short form | Self-completion long form | Self-completion long form |
Have all administrative data sources been thoroughly examined to determine to what extent the data they contain can be used to support the statistical programme? Are there unused administrative sources that would help address NOE coverage and misreporting problems?
Is there scope for partnerships with administrative agencies with a view to enhancing administrative sources to better satisfy statistical office data needs, in particular to address NOE coverage problems?
Are the existing surveys the most appropriate type for the size of enterprise being contacted?
Is there any way in which the present set of surveys can be combined or split, thus releasing resources that could be applied to better NOE measurement?
Are there any surveys in the present programme that can be eliminated totally, thus releasing resources that could be applied to better NOE measurement?
6.6. Survey Frames and Business Register
6.6.1. Survey Frame Requirements and Characteristics
6.50. The starting point for every survey is the survey frame, i.e., the set of units subject to sampling and the details about those units required for stratification, sampling and contact purposes. The set of units and data are collectively referred to as frame data. The survey frame has more influence than any other aspect of survey design upon the coverage of the survey and hence on measurement of the NOE.
6.51. Ideally the frame for a survey should contain all the units that are in the survey target population, without duplication or superfluous units. Associated with each unit should be all the data items required for efficient stratification and sample selection, for example, industrial, geographical and size codes, and these data should be accurate and up to date. Also associated with each unit should be its contact information: name, address and description of the unit, telephone number and, preferably, a contact name, all of which should likewise be accurate and up to date. The extent to which survey frames fall short of these requirements in practice largely determines the size of the NOE statistical deficiency problem area.
6.6.2. Need for Business Register
6.52. The frame for (almost) every list-based enterprise survey belonging to the basic programme should be derived from a single, general-purpose business register maintained by the statistical office.² There are two basic reasons for using a single business register. First, and most importantly, the business register operationalises the selected model of statistical units and facilitates classification of units according to the agreed conceptual standards for all surveys. If survey frames are independently created and maintained, there is no means of guaranteeing that the surveys are properly co-ordinated with respect to the coverage they provide. For example, there may be unintentional duplication of coverage by surveys that are supposed to have mutually exclusive target populations, or some enterprises may fall between the cracks and not be covered by any survey. Second, it is more efficient for a single organisational unit within the statistical office to be responsible for frame maintenance than for each survey unit to create the frames for each of its surveys.
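The co-ordination problem can be illustrated by checking survey frames that are meant to be mutually exclusive and jointly exhaustive against the business register. The sketch below is hypothetical: unit identifiers and survey names are invented, and it assumes each frame is available as a set of register identifiers.

```python
from collections import Counter


def check_frame_coordination(register_units: set[str],
                             frames: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (units appearing in more than one frame, register units in no frame).

    Assumes the frames are intended to partition the register population.
    """
    counts = Counter(unit for frame in frames.values() for unit in frame)
    duplicated = {unit for unit, n in counts.items() if n > 1}
    uncovered = register_units - set(counts)
    return duplicated, uncovered


register = {"E1", "E2", "E3", "E4"}
frames = {"manufacturing": {"E1", "E2"}, "services": {"E2", "E3"}}
dup, gap = check_frame_coordination(register, frames)
print(dup, gap)  # {'E2'} {'E4'}
```

When all frames are drawn from a single register with one set of classification rules, both sets should be empty by construction; independently maintained frames offer no such guarantee.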
6.53. In some countries, enterprise survey frames are derived from lists created during periodic enterprise censuses or from a specially maintained area frame. This is not an ideal arrangement. At the very least there should be a permanent business register containing the very large enterprises in view of the special treatment that these must be accorded on account of their geographic or industrial diversity.
6.54. The only surveys that need not be based on a business register are those where the frame is derived from a well-defined administrative process and for which there is no requirement for co-ordination with other surveys. An example might be a survey of registered banking organisations that collects financial information specific only to the banking industry.
6.6.3. Construction of Business Register
6.55. In principle, a business register can be built from scratch and maintained by the statistical office by enumerating all the enterprises within the country. However, this is an impossibly expensive process. Thus, the starting point for a business register is invariably one or more administrative (business) registers, that is, registers of enterprises that are created and maintained to support the administration of regulations. The ideal administrative register would be one that provided complete, up-to-date coverage of all enterprises within the 1993 SNA production boundary, without duplication or inclusion of defunct units, and that contained all the appropriate frame data, i.e., the classification and contact items required for sampling and data collection. However, given the broad range of enterprises within the production boundary, including household enterprises, even those with no market output, there is no such perfect source. Thus the choice of administrative registers on which to base the business register is a compromise.
6.56. In some countries (for example France) the administrative register used by the statistical office to underpin the business register results from a regulation that specifically takes account of statistical needs and that is actually administered by the statistical office. In this case coverage and content are likely to be very good, though not perfect. In most countries, however, use is made of an administrative register maintained for another purpose, for example, value added tax in New Zealand and pay-as-you-earn income tax deductions in Australia. The resulting business register is inevitably deficient in coverage and content. The greater the difference between the set of enterprises within the 1993 SNA production boundary and the set covered by the administrative register underlying the business register, the greater the risk of non-observed production.
6.57. The coverage and content of a business register can be improved by incorporating data from several administrative sources. This is illustrated by the following examples from Canada and Ukraine.
6.58. Until June 1997 the Statistics Canada business register was based on one primary administrative source, namely payroll deduction accounts maintained by Revenue Canada. Thus it included only enterprises with employees. Since 1997 the business register has been using three additional sources maintained by Revenue Canada, namely incorporated tax accounts, goods and services tax accounts and import/export accounts. Blending these data was made possible by the introduction of a single business number for all enterprises. Castonguay and Monty (2000) provide more details.
6.59. The State Statistics Committee of Ukraine maintains a business register for administrative and statistical purposes, known as the Unified State Register of Enterprises and Organisations of Ukraine. As described in World Bank (2001), the register is based on essentially three groups of administrative sources, but the Statistics Committee has responsibility for assigning the enterprise identification numbers.
Most enterprises are registered under national legislation by the district or regional administration in their locality. Registration is a prerequisite for obtaining appropriate permissions, including being able to open a bank account. An enterprise may be registered as a legal person or may operate as an entrepreneurship under the legal framework of one or more natural persons. A legal person may also register a geographically separate part of itself (division, affiliate, etc.) as a local unit. The business register records all legal persons and local units thus registered.
Enterprises engaged in certain regulated types of activity (e.g. banking, stock exchange) and other legal persons are registered by the bodies that administer the legislation under which they exist, e.g. the Ministry of Justice, the State Tax Administration, the Ministry for Foreign Economic Relations and Trade, the State Committee for Religious Affairs, and the Commission for Securities and the Stock Market. These are also recorded in the business register.
Other enterprises, including government budgeted organisations, professional associations, non-market co-operatives and associations of apartment owners are not required to register formally their economic activities. However, these enterprises are required to complete a registration card for statistical purposes and they are recorded in the business register.
6.60. The key feature of both of these examples is a common enterprise identification coding system. In fact, use of multiple administrative sources is practical only if they are known to contain mutually exclusive sets of enterprises or if they share a common identification coding system that allows records for the same enterprise to be brought together. Experience has shown that trying to identify units across registers in the absence of a common identifier is impossibly expensive unless one of the registers is very small.
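With a common identifier, combining sources reduces to a keyed merge. The sketch below uses hypothetical source names, identifiers, activity codes and field names, loosely modelled on the Canadian case (payroll and goods and services tax accounts sharing a business number). The precedence rule, under which the first source listed wins, is an illustrative assumption; a real register would apply more elaborate rules.

```python
def combine_sources(sources: dict[str, dict[str, dict[str, str]]]) -> dict[str, dict]:
    """Merge administrative sources keyed by a common business identifier.

    sources maps a source name to {business_id: {field: value}}. Sources are
    processed in insertion order, so an earlier source's value for a field
    takes precedence over a later one (illustrative rule only).
    """
    register: dict[str, dict] = {}
    for source_name, records in sources.items():
        for business_id, fields in records.items():
            unit = register.setdefault(business_id, {"sources": set()})
            unit["sources"].add(source_name)   # record which sources contain the unit
            for field, value in fields.items():
                unit.setdefault(field, value)  # keep the first value seen
    return register


# Hypothetical data: BN001 is an employer also registered for the sales tax.
sources = {
    "payroll": {"BN001": {"activity": "4711"}},
    "gst": {"BN001": {"activity": "4719", "name": "Corner Shop"},
            "BN002": {"activity": "5610"}},
}
register = combine_sources(sources)
```

Without the shared key, the same merge would require matching on names and addresses, which is the expensive process the text warns against.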
6.61. Even with a common identification coding system, multiple administrative sources should be used with care. The gain in coverage resulting from the incorporation of data from an additional administrative source may not justify the increase in cost, particularly if the additional source is of poor quality. To take a specific example based on the Ukrainian case described above, suppose that the primary source for a business register is an administrative source based on legislation that requires all enterprises other than those in the household sector to register. Suppose, furthermore, that all individuals operating a household enterprise are required under personal tax legislation to file a personal tax return reporting their business earnings, and that the resulting list of personal tax returns is made available to the statistical office. Should it be used as a second source? The benefit is that the tax returns provide coverage of household enterprises that are not covered by the primary source. Furthermore, there is no duplication between the two sources, as household enterprises are not registered under the primary source. However, there are some quality problems to consider. First, there are likely to be a very large number of household enterprises, of which quite a large proportion will go out of business each year. Thus the tax list contains many records that refer to inactive or dead businesses. Second, the industrial activity codes associated with the tax records are likely to be unreliable. Finally, there are a significant number of duplicates within the tax list, because each of the partners in a joint proprietorship files a tax return. These quality problems have to be addressed if the tax list is to be incorporated in the business register and used for sampling purposes. The question to be answered is whether the additional coverage that the tax list provides justifies the costs of dealing with the problems that it brings.
The alternative, and quite possibly better, approach is to obtain coverage of household enterprises through an entirely different mechanism, such as a household survey based on an area frame.
6.62. The administrative register usually provides a list of legal entities, or some breakdown of these entities, to suit the administrative purpose for which it is designed. Typically it does not provide a list of enterprises broken down into establishments (or other statistical units) according to the statistical office units model and classified by activity. The information to support such a breakdown is gathered by business register staff using a form of register survey commonly referred to as profiling. Profiling procedures include all the rules for identifying the enterprises and other units defined in the statistical units model.
6.63. Profiling of enterprises often requires personal visits by statistical office staff and tends to be expensive and resource intensive. Thus typically it is restricted to large enterprises. It is rarely worthwhile subdividing medium or smaller enterprises, even if they are engaged in a variety of activities. First, the enterprise may not actually be able to report data for subdivisions of itself. Second, the loss of information by not subdividing is usually statistically insignificant and less than the errors in trying to obtain a subdivision.
6.64. In summary, the development of a business register can be seen along a continuum of gradually expanding scope and complexity:
at a minimum, comprising a list of large enterprises divided into establishments;
including medium and small sized enterprises derived from a single primary administrative source;
including additional enterprises and data from supplementary administrative sources.
6.6.4. Register Maintenance: Dealing with Enterprise Dynamics
6.65. Enterprises do not remain the same over time. The institutional units that own them may merge or amalgamate; they may split up or go out of business; they may change production activities, they may move location; and so on. New enterprises may be created (births), existing enterprises may cease to exist (deaths), and ongoing enterprises may change activity. Births, deaths, and changes of classification of enterprises must all be fully defined, and the corresponding business register procedures must be articulated. For example, it must be clearly stated whether an enterprise can be deemed to continue existence through a change of ownership, or whether a change of ownership inevitably means the death of an enterprise and the birth of another. For practical reasons, these procedures depend upon the sources of information for updating the business register. There are three basic sources, namely administrative sources, feedback from enterprise surveys, and business register surveys, as further described below.
6.66. Given the large number of small enterprises in any market economy, it is vital that maintenance of the business register be automated to the maximum extent possible. This means that the frame data for small enterprises are maintained essentially by updating the register from administrative sources. Updating must be substantially automated, as there is neither the time nor the resources for register staff to verify all the frame data received from each source. Staff effort should be focussed on collecting and verifying frame data for the medium and larger enterprises that cannot be automatically updated.
6.67. Administrative registers are notorious for containing inactive units. Thus, it is vital to make use of any information from administrative sources that can indicate whether an enterprise is active or not. For example, if the administrative source contains information about enterprises required to make payroll deductions on behalf of employees, then the date of the last recorded deduction and the total value of deductions over the preceding year and a half are good indicators of enterprise activity. An absence of deductions suggests that the enterprise is inactive, at least as an employer. This information can be used to reduce the number of inactive enterprises on the register.
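A minimal sketch of such an activity flag, assuming the register holds the date of the last payroll deduction and the total deducted over the preceding 18 months for each enterprise. The 365-day gap threshold is a hypothetical parameter chosen for illustration, not a recommended value.

```python
from datetime import date
from typing import Optional


def probably_active_employer(last_deduction: Optional[date],
                             deductions_last_18_months: float,
                             reference_date: date,
                             max_gap_days: int = 365) -> bool:
    """Flag an enterprise as an active employer from payroll-deduction records.

    max_gap_days is an illustrative threshold (hypothetical), not a standard.
    """
    if last_deduction is None or deductions_last_18_months <= 0:
        return False  # no deductions: probably inactive, at least as an employer
    return (reference_date - last_deduction).days <= max_gap_days


ref = date(2002, 6, 30)
print(probably_active_employer(date(2002, 3, 31), 12500.0, ref))  # True
print(probably_active_employer(date(2001, 2, 28), 800.0, ref))    # False: last deduction too long ago
print(probably_active_employer(None, 0.0, ref))                   # False: no deductions
```

An enterprise flagged as inactive need not be deleted outright; it could, for instance, be excluded from sampling until a later administrative update shows renewed activity.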
6.68. Notwithstanding comprehensive use of all the administrative sources available, the data obtained will be somewhat deficient in terms of activity classification, of contact information, and of the ability to track an unincorporated enterprise through a change of owners. The sale of an enterprise may well appear in the business register as the death of an enterprise and the birth of another, in line with the changes recorded in the underlying administrative register.
Feedback from enterprise surveys
6.69. Feedback from enterprise surveys is a vital source for maintaining the business register as regards medium and large enterprises. However, for small enterprises that are sampled with probability less than one in repeating surveys, updating information has to be applied carefully so as not to bias future survey samples. For example, suppose that when a particular quarterly survey is first conducted, the sample is found to contain 30% dead enterprises. (This is not an improbable figure.) Suppose further that, based on this sample information, the dead enterprises are removed from the business register, and that the survey sample for the next quarter comprises the live units (70%) from the previous sample plus a 30% replacement drawn afresh from the register. This new sample contains only about 9% (30% of 30%) dead units. It is thus no longer representative of the register population, in which the proportion of dead enterprises is still nearly 30%, assuming that the survey sample is a relatively small proportion of the population. There are proportionally too many live enterprises in the sample. If the weighting procedures do not take this into account by, in effect, including the dead enterprises that were found in the earlier samples, the result will be an upward bias in the estimates. Furthermore, the bias will grow with each survey repetition.
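The arithmetic of this example can be traced quarter by quarter. The sketch assumes, as the paragraph does, that the sample is a small fraction of the register (so the register's dead share stays at 30%), that every dead unit found is dropped and replaced by a fresh draw, and that no live units die between quarters. It also assumes the estimator expands the sampled live units to the whole register count, dead units included.

```python
def sample_dead_shares(register_dead_share: float, quarters: int) -> list[float]:
    """Expected share of dead units in each quarter's sample under the rotation described."""
    shares = [register_dead_share]
    for _ in range(quarters - 1):
        # only the fresh replacements (a share equal to last quarter's dead share) can be dead
        shares.append(shares[-1] * register_dead_share)
    return shares


def relative_bias(sample_dead_share: float, register_dead_share: float) -> float:
    """Ratio of the expected estimate of a total to its true value when sampled live
    units are expanded to the whole register (dead units included)."""
    return (1 - sample_dead_share) / (1 - register_dead_share)


for q, s in enumerate(sample_dead_shares(0.30, 4), start=1):
    print(f"quarter {q}: {s:.1%} dead in sample, estimate x{relative_bias(s, 0.30):.2f}")
```

Under these assumptions the estimate of a total is unbiased in the first quarter, overstated by about 30% in the second (0.91 / 0.70) and by about 39% in the third (0.973 / 0.70), which is why the dead units found must in effect stay in the weighting.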
Business register surveys and profiling
6.70. Register updating information that cannot be obtained from the administrative source on which the register is based, or from survey feedback, has to be obtained by business register surveys (sometimes termed nature of business surveys) and profiling operations conducted by business register staff. Re-profiling large enterprises is a resource intensive activity but vital to keep the register up to date when institutional units owning large enterprises go through complicated changes like mergers, amalgamations, split-offs, etc.
6.6.5. Area Frames for Household and Enterprise Surveys
6.71. For household surveys, the direct equivalent of the business register is a household register. However, in most countries there is no administrative source on the basis of which a household register can be readily constructed and maintained. Thus an area frame is commonly used as the starting point for household surveys, in particular for labour force and household budget surveys that are likely to form part of the basic programme.
6.72. Construction and maintenance of a household area frame involves:
division of the country into area segments, using information about the numbers of households in each segment obtained from the population and housing census;
selection of a representative sample of segments;
a two-, three- or even four-stage design, typically involving different treatment of urban and rural areas, the penultimate stage being enumeration of all households within the areas selected and the final stage being selection of a sample of these households;
systematic maintenance of the selected areas and enumerated households;
replacement of the frame following the next census when new information on the numbers of households in each area is available.
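The selection of segments is commonly carried out with probability proportional to size (PPS), using the census household counts as measures of size. The sketch below shows systematic PPS selection for the first stage only; it is one standard technique, assumed here for illustration rather than taken from the text, and the segment sizes are invented. Listing and sub-sampling households within the selected segments are not shown.

```python
import random
from itertools import accumulate
from typing import Optional, Sequence


def pps_systematic(sizes: Sequence[float], n: int, start: Optional[float] = None) -> list[int]:
    """Select n segment indices with probability proportional to size (systematic PPS).

    sizes are measures of size such as census household counts. A segment whose
    size exceeds the sampling interval can be selected more than once.
    """
    total = float(sum(sizes))
    interval = total / n
    if start is None:
        start = random.random() * interval  # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected: list[int] = []
    i = 0
    for k in range(n):
        point = start + k * interval
        # segment i covers the half-open range [cumulative[i-1], cumulative[i])
        while i < len(cumulative) - 1 and cumulative[i] <= point:
            i += 1
        selected.append(i)
    return selected


# Five hypothetical segments with 600 households in total; select 3 with a fixed start.
print(pps_systematic([120, 80, 200, 50, 150], 3, start=50.0))  # [0, 2, 4]
```

Each segment's selection probability is n × size / total (for example 3 × 120 / 600 = 0.6 for the first segment), so households sub-sampled at a fixed number per selected segment end up with roughly equal overall probabilities. For area based enterprise surveys, the same routine would take counts of enterprises rather than households as the measures of size.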
Is there a single, general-purpose business register? At a minimum there should be a business register containing the large, multi-industry and/or multi-region enterprises that require profiling.
Is the business register based on the most suitable administrative sources? What enterprises within the 1993 SNA production boundary are not covered?
What is the time lag between the birth of an enterprise and its appearance on the business register? Is this a serious source of undercoverage?
What is the quality of industrial and size classification of the units in the business register? Is misclassification likely to be a source of NOE and if so how can it be addressed?
What are the proportions of duplicate, inactive and dead units in the business register? Are they so large as to prevent efficient surveying of small enterprises?
What are the proportions of units in the business register with wrong or missing contact details? Are they so large as to prevent efficient surveying of small enterprises?
Are large units with multiple activities in more than one region profiled in accordance with the units model?
Is the business register actually used as the source of survey frames for enterprise surveys, or do some surveys use other sources, thus risking duplication or omission in coverage?
Is the information about enterprises obtained by surveys fed back to the register? If so, is it fed back in such a way as to avoid survey bias?
Are area frame maintenance procedures adequate? Is the frame kept current or is there a risk that some areas that may be important from the NOE perspective are not properly represented?
6.73. Similar design principles are used for area based enterprise surveys as for area based household surveys. However, as the ultimate object is to enumerate enterprises, the size criterion used in delineating the areas is the number of enterprises rather than the number of households.
6.7. Survey Design Principles and Practices
6.74. An important part of the statistical office infrastructure is a set of standards and best practices for survey design, data collection, processing and dissemination, and for the use of data from administrative sources. Application of internationally or nationally accepted methodology not only ensures that good practices are incorporated but also speeds up the process of survey design and facilitates the use of standard computer software. However, although there are international regulations, standards and guidelines providing a solid conceptual framework, as discussed in Chapter 2, relatively few of them provide up-to-date guidance on survey design methodology, and even fewer deal with the use of administrative data. The survey design handbook by Eurostat (1998) is a good starting point. It references the relevant European Commission regulations, directives and recommendations, and other manuals. Beyond this, each statistical office must develop its own best practices. A number of documents produced by individual statistical offices are useful. For example, “Quality Guidelines” (Statistics Canada, 1998) and “Statistical Quality Checklist” (Office for National Statistics, 1998) provide good design and planning checklists for surveys and administrative collections. There are also many textbooks on sample and survey design, of which Cochran (1977) is probably the best known. The following paragraphs provide a summary and review points applicable to each individual survey, with particular reference to the NOE measurement problems of undercoverage, non-response and misreporting.
6.7.2. Survey Objectives, Users and Uses
6.75. The starting point for design of each survey within the programme is identification of the primary groups envisaged as users of the data and the basic uses that they make of the data. These uses are interpreted concretely in the form of the main statistical output tables that are required and the frequencies with which they are needed. The outputs are then expressed in terms of the input data to be requested from enterprises or households that will report to the survey.
6.76. The target population, i.e., the set of enterprises (or establishments), or households, about which the data are required, must be established. In particular, for enterprise surveys it must be decided if the estimates are to include small and micro enterprises. Typically this decision will depend on the balance between the extent of the deficiencies in the estimates caused by their omission and the costs of their inclusion. If they are excluded they become part of the NOE.
6.77. The possible sources of the required data within the enterprises or households should be identified, and the feasibility of acquiring the data thereby determined. Target response rates should be specified, as should the maximum allowable response burden, particularly in the case of enterprise surveys. The resource and operational constraints within which the survey must operate have to be articulated, as they influence all aspects of the design.
Can the target population be increased to provide better coverage of activities inside the 1993 SNA production boundary? If so, can the costs associated with providing the additional coverage be justified?
Are the data items consistent with the 1993 SNA framework?
6.7.3. Collection Vehicles and Questionnaire Design
6.78. For many enterprise surveys, mail-out/mail-back of self-completion questionnaires is the most efficient collection method. Both the mail-out and the mail-back may be by regular mail, telefax or e-mail, depending on the respondents’ preferences. For enterprise surveys covering a limited set of variables, collection by telephone may be possible and quicker, though usually more costly. In industrialised countries, personal interviews are usually inappropriate, although they may be used in the collection of complex information, for example in conjunction with profiling very large enterprises to establish data supply procedures or to solve data supply problems. In developing countries, on the other hand, labour costs are generally low enough to justify personal interviews, particularly where literacy rates are low and enterprise accounting is not well developed. Rapid progress in electronic processing and communications technology means that the ultimate goal of automated data collection, directly from enterprise computer to statistical office computer, may soon become feasible.
6.79. For household surveys, personal or telephone interview is more frequently appropriate than mail questionnaires.
6.80. The design of questionnaires has a significant impact on response rates and incidence of misreporting. Questionnaire design is a specialised craft, involving knowledge of accounting practices (how enterprises or households keep their records), of the cognitive reactions of respondents (how they interpret questions), of subsequent data capture procedures (how easily statistical office staff are able to convert the responses into electronic form), and of the underlying data requirements (how data are to be transformed to 1993 SNA concepts). In a nutshell, the questionnaire must:
indicate the purposes for which the data are being collected and the confidentiality provisions;
motivate the recipient to respond, for example by briefly explaining the uses of the data;
be concise yet clear, with adequate but not too many instructions and an attractive layout; and
ask only for data that are needed and that can reasonably be provided without undue respondent burden.
6.81. Testing the questionnaire before it is used, and evaluating it afterwards, are essential parts of the survey design process.
Is every question on the questionnaire essential? Are there questions for which the responses are never captured or captured but never used, thus contributing to unnecessary respondent burden and increased risk of non-response?
Could a questionnaire with a reduced set of questions be used for small enterprises, thus reducing the risk of non-response?
Is the layout of the questionnaire attractive or does it contribute to non-response and misreporting?
Does the questionnaire emphasise the confidentiality of the results?
Does the questionnaire contain clear instructions? Do respondents understand the questions? Has their comprehension of the questions ever been assessed?
Are answers to the questions readily available from the records maintained by enterprises?
6.7.4. Sampling and Estimation
6.82. Typically the sample design for an enterprise survey has the following characteristics:
identification of the set of enterprises in the business register that are in scope for the survey;
stratification by size to improve sample efficiency, with not more than 4 strata recommended, unless the size measures on the business register are known to be very accurate;
stratification by geographical area, primarily to meet user needs;
stratification by industrial activity, primarily to meet user needs;
identification of a design data item (usually the most important single item) on which to base the size of the sample and allocation across strata;
sampling enterprises in the largest size strata with certainty (as data from these units are vital);
sampling enterprises in the other size strata with probability depending upon size in such a way that, after weighting, the sampled units in different strata each tend to make roughly the same contribution to the total value of the design data item;
control of sample overlap between successive occasions of repeated surveys, in particular, controlled rotation of the sample;
control of sample overlap between separate surveys, for example by assignment of a random number to each enterprise used by all surveys for selection purposes.
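The design features above, in particular size stratification, a take-all stratum of the largest enterprises, and overlap control through a random number attached to each enterprise, can be sketched as follows. This is a minimal illustration assuming Bernoulli-type selection with permanent random numbers (PRNs): within each size stratum, a unit is selected if its PRN falls in an interval of length equal to the stratum sampling fraction. The names `Enterprise`, `assign_prns`, `size_stratum` and `select_sample`, the stratum bounds and the fractions are hypothetical; an actual office would set fractions so that weighted units contribute roughly equally to the design data item, as described above.

```python
import random
from dataclasses import dataclass


@dataclass
class Enterprise:
    ent_id: str
    size: float  # size measure held on the business register, e.g. employment
    prn: float   # permanent random number in [0, 1), kept for the life of the unit


def assign_prns(ent_ids, seed=12345):
    """Give each enterprise a permanent random number shared by all surveys."""
    rng = random.Random(seed)
    return {ent_id: rng.random() for ent_id in ent_ids}


def size_stratum(size, bounds):
    """Index of the size stratum; bounds are the ascending upper limits of the lower strata."""
    for index, upper in enumerate(bounds):
        if size < upper:
            return index
    return len(bounds)  # largest size stratum


def select_sample(frame, bounds, fractions, start=0.0):
    """Select, in each size stratum, the units whose PRN lies in the interval
    [start, start + fraction) taken modulo 1. A fraction of 1.0 makes the
    stratum take-all (selected with certainty). Returns (unit, stratum, weight)."""
    sample = []
    for unit in frame:
        stratum = size_stratum(unit.size, bounds)
        fraction = fractions[stratum]
        if fraction >= 1.0:
            sample.append((unit, stratum, 1.0))
        elif (unit.prn - start) % 1.0 < fraction:
            # The inclusion probability is the fraction, so the design weight is its inverse.
            sample.append((unit, stratum, 1.0 / fraction))
    return sample
```

Because the PRNs never change, moving `start` by a small amount between occasions rotates only part of the sample out, and giving each survey its own start point limits the number of surveys that select the same enterprise.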
6.83. In extracting the frame for an enterprise survey from the business register, it may be desirable to have a size cut-off, i.e., not to include enterprises beneath a certain size because of the absence or unreliability of their frame data or the insignificance of the economic activity they represent. If significant economic activity is thus missed, an area-based survey can be used to provide complementary coverage. The choice between providing complete coverage of all enterprises in the business register and omitting the small enterprises in favour of an additional area-based survey depends on the relative costs. The complete coverage option may involve sampling and collecting data from enterprises in a poorly defined list frame. The alternative option means having to create and maintain an area frame and a separate survey.
6.84. For a household survey, the design features typically include:
use of a general purpose household area frame;
a two, three, or four stage design – typically involving different treatment of urban and rural areas, with the last stage being selection of households enumerated within selected areas;
systematic rotation of samples.
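To show how the stages of a multi-stage household design combine, the sketch below computes the base weight of a household in a two-stage design: areas selected with probability proportional to a size measure, then a fixed number of households selected by simple random sampling within each enumerated area. It is a minimal illustration under those assumptions only; the function and parameter names are hypothetical, and three- or four-stage designs multiply in further stage probabilities in the same way.

```python
def two_stage_weight(n_areas, area_measure, total_measure, households_taken, households_listed):
    """Base weight for a household in a two-stage design (hypothetical parameter names).

    Stage 1: n_areas areas selected with probability proportional to a size measure
             (area_measure for this area, total_measure over the whole area frame).
    Stage 2: households_taken households selected by simple random sampling from the
             households_listed households found when the selected area is enumerated.
    """
    p_area = n_areas * area_measure / total_measure
    if not 0 < p_area <= 1:
        raise ValueError("area inclusion probability must lie in (0, 1]; "
                         "very large areas should be selected with certainty")
    if not 0 < households_taken <= households_listed:
        raise ValueError("households_taken must be between 1 and households_listed")
    p_household = households_taken / households_listed
    # The overall inclusion probability is the product of the stage probabilities.
    return 1.0 / (p_area * p_household)
```

When the number of households listed in an area equals its frame size measure, the weight reduces to `total_measure / (n_areas * households_taken)` for every household, i.e. the design is self-weighting; out-of-date size measures make the weights vary.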
Could a census be replaced by a sample survey?
Is the sample design appropriate?
Are there procedures to control the number of questionnaires received by any one enterprise?
What is the coverage of small enterprises? Is there scope for increasing coverage?
6.7.5. Respondent Interface: Data Collection, Capture, and Follow-up
6.85. Effort should be made at every design stage to minimise response burden. For enterprises, the costs incurred in responding to surveys are every bit as real as the costs to the statistical office of collecting data. While households might not calculate the costs, a high response burden due to long and complicated questionnaires will negatively affect response rates. Response burden can be kept to a minimum by efficient sample design, clarity of questionnaires and flexibility in the time when interviews take place and in how respondents may supply information.
6.86. Before a respondent receives a questionnaire for the first time, especially one that is to be repeated, there should be some form of initial contact in which the purpose of the survey and the reporting arrangements are explained. Ideally, enterprise respondents should be offered a variety of response options, for example mail-back, fax or telephone, from which they can choose the one that suits them best. This is more trouble for the statistical office but raises response rates. Likewise, household respondents should have a choice of types and times of interview. Data capture procedures must be decided well in advance. There must be a well-defined follow-up strategy, without which response rates are unlikely to be acceptable.
What is the current response rate? Should it be improved by introducing more rigorous follow up procedures?
Are efforts being made to establish a good rapport with the enterprises in the sample?
Have the procedures for handling non-response been articulated? Are non-responding enterprises all assumed to be inactive and imputed as zero thus contributing to the NOE?
6.7.6. Editing, Imputation and Estimation
6.87. The aims of editing are the detection and elimination of errors. Whilst editing is essential in assuring quality, there has to be a balance between the resources applied to editing and those invested elsewhere. For example it is better to spend effort in eliminating poorly phrased questions on a questionnaire than in trying to correct the wrong responses received as a result of the poor questions. In quality management terms, the focus should be on upstream quality assurance not on trying to “inspect the quality” into the data by editing.
6.88. Large random errors by respondents can usually be picked up through plausibility checks on the data, for example by comparing reported values with previous values, or by comparing ratios of reported values with reasonable bounds for the type of enterprise. However, small random errors cannot be detected by these means. Nor can sustained, systematic errors, such as the under-reporting of income or the exaggeration of costs associated with the NOE.
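The two kinds of plausibility check mentioned above can be sketched as follows: a period-to-period change check and a ratio-bounds check. This is a minimal illustration; the function names, the 50% change threshold and the example bounds are assumptions, not recommended values. As the paragraph notes, a respondent who consistently under-reports by a modest amount will pass both checks.

```python
def period_change_flags(current, previous, max_rel_change=0.5):
    """Items whose relative change since the previous occasion exceeds max_rel_change."""
    flags = []
    for item, value in current.items():
        prior = previous.get(item)
        if prior is None or prior == 0:
            continue  # no usable comparison; other checks must cover this item
        if abs(value - prior) / abs(prior) > max_rel_change:
            flags.append(item)
    return flags


def ratio_flags(record, bounds):
    """Ratios (numerator, denominator) that fall outside their (low, high) bounds."""
    flags = []
    for (numerator, denominator), (low, high) in bounds.items():
        num, den = record.get(numerator), record.get(denominator)
        if num is None or not den:
            continue  # missing item or zero denominator: ratio cannot be checked
        if not low <= num / den <= high:
            flags.append((numerator, denominator))
    return flags
```

For example, with hypothetical bounds of 5 to 25 for turnover per employee, a return showing turnover of 300 and 10 employees, against turnover of 100 on the previous occasion, is flagged by both checks.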
6.89. It is impossible to eliminate all errors, nor is it necessary to do so, as some errors have a negligible effect on the estimates. Thus in enterprise surveys, editing effort should be focused on those particular data item responses, often termed influential observations, that will have the most significant impact upon the main estimates. In particular, very large enterprises are usually a source of influential observations and their data should be individually checked.
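One common way of focusing editing effort on influential observations is a selective-editing score: the weighted difference between a reported value and an expected value (for example the previous value or an imputed value), expressed as a share of the estimated total. The sketch below assumes that simple score form and a 1% review threshold; both, and the record field names, are illustrative assumptions rather than a prescribed method.

```python
def influential_records(records, threshold=0.01):
    """Rank records by a simple selective-editing score and keep those above threshold.

    Each record is a dict with keys 'id', 'weight', 'reported' and 'expected'
    (hypothetical field names). The score is the weighted gap between the reported
    and the expected value, as a share of the estimated total.
    """
    total = sum(r["weight"] * r["reported"] for r in records)
    if total == 0:
        raise ValueError("estimated total is zero; scores are undefined")
    scored = [
        (r["id"], r["weight"] * abs(r["reported"] - r["expected"]) / abs(total))
        for r in records
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rec_id, score) for rec_id, score in scored if score >= threshold]
```

Records are reviewed in descending order of score, so staff time goes first to the responses that would move the main estimates most; records below the threshold are accepted without manual review.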
6.90. Outliers are a particular category of influential observations for enterprise surveys. They are observations that are correct but that are unusual in the sense that they do not represent the population from which they have been sampled and hence will tend to distort the estimates. A typical example would be the response from a large retail supermarket that was by mistake included in the smallest size stratum and thus sampled as if it were a small store. The simplest treatment for outliers is to reduce their sampling weight to one so that they can only (and correctly) represent themselves. The rest of the sample must then be reweighted appropriately to make up for the loss.
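The weight-to-one treatment and the accompanying reweighting can be sketched for a single stratum as follows. This minimal illustration assumes the remaining sampled units share the rest of the stratum population count equally, so that the weights still add up to that count; the function name is hypothetical, and identifying the outliers is assumed to have been done beforehand.

```python
def outlier_weights(sampled_ids, stratum_population, outlier_ids):
    """Weights for one stratum after outlier treatment: each outlier gets weight 1
    (it represents only itself) and the remaining sampled units share the rest of
    the stratum population equally, so the weights still sum to the population count."""
    outliers = [u for u in sampled_ids if u in outlier_ids]
    others = [u for u in sampled_ids if u not in outlier_ids]
    if not others:
        raise ValueError("every sampled unit is an outlier; the stratum needs review")
    other_weight = (stratum_population - len(outliers)) / len(others)
    weights = {u: 1.0 for u in outliers}
    weights.update({u: other_weight for u in others})
    return weights
```

In a stratum of 100 small stores sampled at 5 (design weight 20), a misclassified supermarket would otherwise stand for 20 supermarkets; after treatment it stands for itself and each of the other four sampled stores stands for 24.75.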
6.91. The values of individual data items that are missing from the original response or believed to be in error should not be automatically interpreted as zeroes. They should be imputed by one of the following types of methods:
(for monthly or quarterly repeating surveys) carry forward the value for the enterprise from the previous survey occasion, possibly adjusting the value to reflect the average increase (decrease) of the data item reported by other respondents in the stratum;
(for monthly or quarterly repeating surveys) carry forward the value for the enterprise from the same survey occasion in the previous year, adjusted to reflect the average increase (decrease) of the data item in the stratum;
if no previous data for the enterprise are available, impute the value from a responding enterprise that is judged to be similar, or impute the stratum mean.
6.92. Preferably imputation should be automated, not only to save time but also to ensure consistency of treatment.
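An automated version of the imputation rules above might look like the following sketch for one data item within one stratum. It assumes the stratum trend is measured as the ratio of totals over enterprises reporting on both occasions, and it uses the stratum mean as the fallback (the similar-enterprise option is not shown). All names are hypothetical. Note that a missing value is never set to zero.

```python
def stratum_trend(current, reference):
    """Ratio of current to reference totals over units reporting the item on both occasions."""
    common = [u for u, v in current.items() if v is not None and reference.get(u) is not None]
    reference_total = sum(reference[u] for u in common)
    if reference_total == 0:
        return 1.0  # assumption: no usable trend, so carry values forward unchanged
    return sum(current[u] for u in common) / reference_total


def impute_item(current, reference):
    """Impute one data item within one stratum.

    current:   unit -> reported value, or None if missing or rejected at editing
    reference: unit -> value on the reference occasion (previous month or quarter,
               or the same occasion in the previous year)
    """
    trend = stratum_trend(current, reference)
    reported = [v for v in current.values() if v is not None]
    stratum_mean = sum(reported) / len(reported) if reported else None
    imputed = {}
    for unit, value in current.items():
        if value is not None:
            imputed[unit] = value                    # keep the reported value
        elif reference.get(unit) is not None:
            imputed[unit] = reference[unit] * trend  # carry forward, adjusted by the stratum trend
        else:
            imputed[unit] = stratum_mean             # no history: stratum mean (None if nobody reported)
    return imputed
```

Passing the previous occasion or the same occasion a year earlier as `reference` covers the first two rules; the same code applied to every stratum gives the consistency of treatment that manual imputation lacks.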
6.93. Total non-response from an enterprise or household that has never responded is not usually dealt with by imputation but rather by re-weighting the sample to include only the respondents. This approach can also be used to deal with missing individual data items but is not commonly used, as it requires different weights for different data items on an individual record. It is essential that the weights be revised to reflect the absence of non-respondents and the reweighting of outliers. If not, estimates will be biased towards zero, contributing towards the NOE.
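Reweighting for total non-response can be sketched as follows: within each stratum, the responding units are given weights that sum to the stratum population count. This minimal illustration assumes non-response is unrelated to the data item within a stratum and does not combine the adjustment with outlier treatment; the function names are hypothetical.

```python
def nonresponse_weights(strata):
    """Reweight respondents within each stratum to the stratum population count.

    strata: stratum -> (population_count, {unit_id: responded})
    Returns unit_id -> weight for responding units only.
    """
    weights = {}
    for stratum, (population, response) in strata.items():
        respondents = [u for u, responded in response.items() if responded]
        if not respondents:
            raise ValueError(f"no respondents in stratum {stratum!r}; collapse strata first")
        weight = population / len(respondents)
        for unit in respondents:
            weights[unit] = weight
    return weights


def estimate_total(values, weights):
    """Weighted (expansion) estimate of a total over the responding units."""
    return sum(weights[u] * values[u] for u in weights)
```

With a stratum of 100 enterprises, four sampled and two responding with values 10 and 30, the reweighted estimate is 2000; keeping the design weight of 25 and treating the two non-respondents as zeros would give 1000, the downward bias described above.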
Are influential observations identified and dealt with?
Are item non-responses being inappropriately imputed as zeros thus contributing to the NOE?
Are the data being appropriately reweighted to allow for non-response?
6.7.7. Analysis, Dissemination, Revision, and Evaluation
6.94. One of the best ways of finding the strengths and deficiencies of data is to use them in a wide range of analyses. Comparisons with data from other sources are invaluable in checking the quality and coherence of the data and identifying possible problems. The national accounts provide a very good framework for confrontation of data from different sources.
6.95. Dissemination is an important aspect of quality defined in the broad sense. Unless users are aware that the data exist and are able to access and understand them, the survey might just as well not have taken place. User needs regarding format, media and style should be identified and taken into account in building the dissemination mechanisms. Confidentiality of data for individual enterprises or households must be preserved or respondents will fail to report or will misreport. Data items should be accompanied by metadata defining them and explaining how they have been produced, thus enabling users to determine how suitable the data are for their purposes.
6.96. The revision policy for each series has to be articulated in accordance with user needs. Frequent large revisions suggest that timeliness should be reduced, i.e., first estimates released later, in order to improve accuracy. An absence of revisions suggests that timeliness could be improved.
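Whether revisions are "too many and too large" can be monitored with simple summary statistics. The sketch below computes the mean revision and the mean absolute revision between first published and later estimates of a series; these are widely used summary measures, but the choice of measures, and any threshold for acting on them, is an assumption left to each office's revision policy. The function name is hypothetical.

```python
def revision_summary(first_estimates, later_estimates):
    """Mean revision and mean absolute revision between first published and later estimates."""
    if len(first_estimates) != len(later_estimates) or not first_estimates:
        raise ValueError("need two non-empty series of equal length")
    revisions = [later - first for first, later in zip(first_estimates, later_estimates)]
    mean_revision = sum(revisions) / len(revisions)
    mean_absolute_revision = sum(abs(r) for r in revisions) / len(revisions)
    return mean_revision, mean_absolute_revision
```

A large mean absolute revision points to imprecise first estimates; a persistently positive mean revision suggests that first estimates are systematically too low.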
Are the data weaknesses encountered during national accounts compilation being fed back and discussed with survey staff?
Are adequate metadata made available to users? Can users readily see the scope for NOE?
Are NOE problems being articulated and considered during the evaluation process?
Are contact points for further information publicised?
6.7.8. Data from Administrative Sources
6.98. With the possible exception of sampling, the design principles for surveys have their counterpart in relation to the collection of data from administrative sources. Even sample design is appropriate in cases where data from an administrative source are actually sampled. The main challenge is to persuade the administrative authorities to take account of the statistical needs and to apply appropriate design principles. Thus, for each particular source used, the statistical office should establish an agreement with the corresponding administrative agency. The agreement should specify the content and quality of the data to be provided, the mechanism for transfer of the data from the administrative source to the statistical office, and the procedures for introducing changes to the administrative processes in such a way as to minimise the impact upon the statistical office. If possible, the statistical office should influence the legislation itself in order to secure the best possible supply of data.
Are NOE related data problems being discussed with administrative authorities?
6.8. Additional Surveys and Administrative Data Collections
6.99. The previous sections in this chapter focus on potential improvements that can be brought about through better organisation of the existing data collection infrastructure and programme. This section considers enhancements through additions to the programme, either new surveys or use of different administrative sources. Such additions can be conducted on a regular or occasional basis. Additional regular surveys augment the basic data output and may thus contribute to reduction in the incidence of non-observed activities. Occasional surveys may contribute to better understanding and measurement of non-observed activities in compiling the national accounts, i.e., reduction of non-measured activities. The decision whether a new survey or administrative collection should be regular or occasional will vary according to the particular situation of the national statistical office.
Additional Regular and Occasional Surveys
6.100. The existing programme of regular surveys should be examined to determine what (if any) additional surveys should be conducted to fill the data gaps. Annex 3.4 gives an indication of the sort of regular surveys included in the basic programme of a national statistical office in a developed country. The appropriate choice will depend upon the size and nature of the NOE related data problems, the options for using administrative sources, the potential respondent burden involved, and the resources available. Some examples of surveys that can be introduced on a regular or occasional basis are as follows:
City market surveys to measure retail trade in small enterprises. Booleman (1998) provides an example.
Surveys of services to measure turnover by small enterprises. Masakova (2000) provides an example of a sample survey for small enterprises.
Agricultural production surveys to measure informal sector and household production of specific commodities for own use. Masakova (2000) provides examples.
Informal sector surveys. Specialised surveys of the informal sector conducted occasionally or regularly may also serve to improve the information base for the national accounts. These are described in detail in Chapter 10.
Qualitative surveys. Opinion surveys can be used to monitor incidence and causes of non-observed production, typically asking respondents for their impressions of the industry as a whole rather than their personal involvement in the underground or illegal activities. Examples were given in Chapter 4 and are further discussed in Chapter 8 in connection with underground production.
Additional Administrative Sources
6.101. It is also appropriate to evaluate administrative data sources not currently being accessed by the statistical office and to determine to what extent the data they contain could be accessed and used to support the basic data collection programme in general and to reduce NOE in particular. Annex 3.5 gives an indication of the sort of administrative sources included in the basic programme of a developed western national statistical system. The appropriate choice will depend upon the size and nature of the data problems relating to the NOE, the availability, accessibility, content and coverage of the administrative sources, the options for direct survey, and the resources available.
6.9. Relationship to NOE Analytical Framework
6.102. In concluding this chapter, it is worthwhile reiterating what the basic programme can be expected to measure and its inherent limitations in terms of the NOE problem areas and the types of statistical deficiency.
Undercoverage of enterprises
6.103. Although it is undoubtedly possible to make improvements, the basic data collection programme cannot be expected to deal completely with undercoverage of enterprises. The business register can only provide coverage of those small enterprises that are included in the administrative sources on which it is based. Whilst it may be possible to mount an occasional large-scale household survey to cover the missing small enterprises, including those involved in production for own final use, it is unlikely that this is affordable as a regular part of the basic programme. Adjustments must be made within the national accounts to compensate for undercoverage of the basic programme, as described in Chapter 5.
Underreporting by enterprises
6.104. Likewise, although editing and plausibility checks can pick up many reporting errors, the basic programme cannot provide a mechanism for detecting sustained, deliberate, and widespread underreporting by enterprises of their activities. Again these require special investigations and adjustments within the national accounts.
Non-response by enterprises
6.105. On the other hand, non-response problems should be fully addressed within the basic programme. Entries missing from returned questionnaires should be appropriately imputed and the data re-weighted to allow for questionnaires that have not been returned. Thus non-response should not be a source of non-observed activities for which corrections have to be made during the national accounts compilation.
6.106. In summary, improvements to the basic programme are the appropriate way to tackle all NOE problems due to statistical deficiencies. There is also scope to address, in part at least, the coverage problems that arise because of the small size and lack of registration of enterprises that are in the informal sector or involved in household production for own final use. However, special surveys and adjustments within the national accounts are needed to deal with most underground and illegal activities.
6.107. Prioritisation and choice of options for improvement depend upon the particular situation of the national statistical system, including the current state of its basic programme, the likely magnitudes of the improvements in terms of reduction of non-observed activities, the resource implications, and the resources available. This is further discussed in the following chapter. Furthermore, as previously noted, the NOE measurement programme must be blended with, and considered a part of, the statistical office strategic objectives and quality improvement framework as a whole.
Here, the term survey is assumed to include a census as a particular type of survey in which all units are in the sample.
More precisely this should be referred to as a statistical business register to distinguish it from other (administrative) business registers, but when the context is clear the qualifier statistical is usually omitted.