Report On Sample Design And Estimation Procedures

ANNUAL AND QUARTERLY LABOUR FORCE SURVEYS - REPORT ON SAMPLE DESIGN AND ESTIMATION PROCEDURES

Dr David Steel

February 1997

1. INTRODUCTION

1. The Irish Labour Force Survey (LFS) provides valuable information about the labour market such as estimates of the number of people unemployed and employed classified by age, sex, marital status, geographic area, type of job, industry and occupation. It also provides estimates of the number of private households classified by the size of household and the social class of the head of household and the composition of family units. A Labour Force Survey was first conducted in Ireland in 1975 and was conducted biennially until 1983 from which time it has been conducted during April and May of each year. It is planned to conduct the annual LFS for the last time in 1997 and to introduce a quarterly LFS in the summer quarter (June-August) of 1997 to provide a regular and timely picture of the labour market.

2. The purpose of this report is:

to briefly review the annual LFS and comment on the likely precision of estimates of different magnitudes,
to comment on the design proposed for a continuous quarterly LFS and on the likely precision of estimates of different magnitudes and of quarter-to-quarter change, and to suggest possible alternative approaches.

Section 2 considers the annual LFS and in Section 3 the design and estimation issues associated with a quarterly LFS are considered.

2. THE ANNUAL LABOUR FORCE SURVEY

Design

3. The annual LFS is a household survey using stratified multi-stage area sampling and face-to-face interviewing. The key features of the sample design and estimation procedures are summarized below.

Sample Size

4. In the 1995 survey a sample of 46,800 private dwellings was selected. A response rate of 96 percent was achieved, resulting in a sample of 148,100 people of whom 111,200 were aged 15 or more. This corresponds to 4.1 percent of the usual resident population of Ireland. The independent estimate of the usual resident population at the time of the survey was 3,582,200 and the total number of households was estimated from the survey to be 1,145,700. The number of people aged 15 or more in the population was 2,701,300. These figures imply an average number of people per household of 3.13, of whom 2.36 are aged 15 or more. This sample size was considered adequate to give reliable key estimates at the State level and for the eight planning regions.
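The ratios quoted in paragraph 4 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; the variable names are not taken from the report.

```python
# Figures quoted in paragraph 4 for the 1995 annual LFS.
sample_people = 148_100
population = 3_582_200
population_15_plus = 2_701_300
households = 1_145_700

# Share of the usual resident population included in the sample.
sampling_fraction = sample_people / population
# Average household size, overall and for people aged 15 or more.
people_per_household = population / households
adults_per_household = population_15_plus / households

print(f"sampling fraction: {sampling_fraction:.1%}")
print(f"people per household: {people_per_household:.2f}")
print(f"aged 15+ per household: {adults_per_household:.2f}")
```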

Stratification

5. An initial stratification into eight types of strata was performed. The strata types are listed in Table 1. Within each of these area types strata were formed using counties or large towns. This procedure resulted in a total of 171 strata.

Selection Methods

6. Within each stratum a sample of private and non-private dwellings was selected. This was achieved by selecting a sample of geographically defined Primary Sampling Units (PSUs) and selecting a sample of dwellings from each selected PSU. In the lower density strata, namely other urban, mixed urban/rural and rural areas, census Enumeration Areas (EAs) were used as PSUs. EAs have an average of about 340 private dwellings. Within each stratum a specified number of PSUs was selected using equal probability systematic sampling from a geographically ordered list of PSUs. In the remaining higher density strata "blocks" of approximately 75 dwellings were used as PSUs. Each EA was allocated a number of blocks in proportion to the number of households recorded in the population census. A sample of blocks was then selected using equal probability sampling. EAs in which a selected block fell were then segmented into the appropriate number of blocks and the required block was selected. This segmentation was done in the central office using information obtained in the population census. These blocks were used as the PSUs in these strata. Table 1 gives the number of PSUs and the approximate number of households selected in each type of strata. A list of habitable dwellings within each selected PSU was compiled by visual enumeration in the field. Private dwellings were then selected using a sampling interval which for PSUs with 250 or fewer private dwellings was 1 in 2. For PSUs with 251 to 450 private dwellings a 1 in 3 sampling interval was used, for those with between 451 and 600 a 1 in 4 sampling interval was used and if there were more than 600 a 1 in 5 sampling interval was adopted. All households and usual residents in selected dwellings were included in the sample. Information on labour force status and related variables was collected for people aged 15 or more.
Information was obtained directly from 77 percent of those aged 15 or more and in the remaining cases the information was provided by a responsible adult in the household. This information was collected and published according to a respondent's perception of the principal economic status (PES) and the concepts adopted by the International Labour Office (ILO).
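The within-PSU sampling intervals described in paragraph 6 amount to a simple lookup on PSU size. The following is a minimal sketch of that rule; the function name is hypothetical and the example PSU sizes are chosen only for illustration.

```python
def dwelling_sampling_interval(private_dwellings: int) -> int:
    """Return k for the '1 in k' interval applied within a selected PSU."""
    if private_dwellings <= 250:
        return 2
    if private_dwellings <= 450:
        return 3
    if private_dwellings <= 600:
        return 4
    return 5

# Illustrative PSU sizes: a block, an average EA, and two larger EAs.
for size in (75, 340, 500, 750):
    k = dwelling_sampling_interval(size)
    print(f"{size} dwellings: 1 in {k}, about {round(size / k)} selected")
```

Under this rule the largest cluster arises for a PSU of 450 dwellings sampled at 1 in 3, which is consistent with the clusters of up to 150 dwellings mentioned in paragraph 9.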

7. Non-private dwellings (NPDs) such as convents, hospitals and hotels were included in the list of dwellings and selected in the same way as PDs provided the estimated number of usual residents was 15 or fewer. In these small NPDs all usual residents were included as a single non-private household. Larger NPDs in selected PSUs were included in the sample and a sampling interval of 1 in 4 was applied to select a sample of usual residents where the number of usual residents was 200 or fewer, a 1 in 6 sampling interval was used when there were between 201 and 500 usual residents and a 1 in 10 sample was selected when there were more than 500 usual residents. This resulted in 275 non-private households in the 1995 survey.

8. The sample was selected so that there was an approximate 25% overlap with the previous year's sample at the dwelling level.

Clustering

9. The selection method used means that the sample of selected households was geographically clustered. The average number of households per PSU was 140 but the actual number in a particular PSU would have varied considerably because of the use of EAs and blocks as PSUs in different strata. There were 8,200 PSUs of which 1033 were selected. This resulted in an average number of households selected per selected PSU of about 45. There was presumably a wide variation about this average, since, in those strata in which EAs were used as PSUs, clusters of up to 150 dwellings were possible.

Field Work

10. The field work was carried out by 430 interviewers. Besides interviewing at selected dwellings, these interviewers carried out the visual enumeration which involved listing all habitable dwellings and determining the number of usual residents in NPDs. The visual enumeration stage was undertaken over a two week period and the interviewing was done over a four week period. There were 40 field supervisors who selected the sample of dwellings from the list provided by the interviewers. There were also 8 area supervisors, one for each planning region.

Weighting and Estimation

11. The sample design and selection procedures resulted in households having different probabilities of selection. Non-response and variation in the composition of households caused the composition of the resulting sample by age and sex to differ from independently derived population estimates. For these reasons a weight was calculated for each person in the sample. The calculation of the weight was a three step process undertaken separately for private and non-private households.

The first step reflected the sampling rate achieved within the selected PSU for PDs and small NPDs. For each large NPD in the sample this step involved weighting up to the usual resident population in that NPD.
The second step was a ratio type adjustment which inflated from the PSU level to the stratum level for private households on the basis of 1991 census figures on the number of private households. For small and large NPDs combined, the sample was weighted by the ratio of the institutional population in a region to the institutional population of NPDs selected in the region.
The final step consisted of forcing the weighted estimates of the usual resident population in each region by age and sex categories to conform to independently derived population estimates.
Household and family estimates were produced using the weight of the first person listed in the household or family.
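The three steps can be sketched for private households as follows. This is a simplified illustration, not the procedure actually programmed for the survey: the function name, the data layout and the toy figures are all hypothetical, and the second step is assumed to be a single stratum-level ratio of census household counts.

```python
from collections import defaultdict

def person_weights(people, stratum_ratio, cell_population):
    """Three-step weights for a list of sampled people.

    people: dicts with 'interval' (the 1-in-k interval used in the PSU),
            'stratum' and 'cell' (region by age by sex category).
    stratum_ratio: assumed ratio of census households in the stratum to
            census households in the selected PSUs of that stratum.
    cell_population: independent population estimate for each cell.
    """
    # Steps 1 and 2: within-PSU interval times the PSU-to-stratum ratio.
    pre = [p["interval"] * stratum_ratio[p["stratum"]] for p in people]

    # Step 3: scale so weighted totals in each cell match the benchmarks.
    cell_sum = defaultdict(float)
    for p, w in zip(people, pre):
        cell_sum[p["cell"]] += w
    return [w * cell_population[p["cell"]] / cell_sum[p["cell"]]
            for p, w in zip(people, pre)]

# Toy data, invented purely to exercise the function.
people = [
    {"interval": 2, "stratum": "A", "cell": "F 15-24"},
    {"interval": 3, "stratum": "A", "cell": "F 15-24"},
    {"interval": 2, "stratum": "B", "cell": "M 15-24"},
]
weights = person_weights(people, {"A": 10.0, "B": 8.0},
                         {"F 15-24": 100.0, "M 15-24": 40.0})
print(weights)
```

After the final step the weights in each cell sum to the benchmark for that cell, which is what forces the survey estimates to agree with the independent population estimates.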

Reliability of Estimates

12. A key element of the reliability of the estimates produced from the survey is the sampling error produced by the fact that the estimates are based on a sample of households and people. Being a probability sample it is in theory possible to calculate estimates of the likely sampling error by estimating standard errors directly from the survey. The estimation of standard errors would have to take into account the key features of the complexity of the sample design and estimation procedures used. In particular the stratification, use of multi-stage unequal probability sampling of households and age-sex post stratification would need to be accounted for. Estimation of such standard errors is not easily carried out using standard statistical packages such as SAS or SPSS. Specialist software such as SUDAAN or WESVAR can be used or purpose built software developed. This approach would take some time to carry out.

13. An approximate indication of the likely standard errors can be obtained by calculating the standard error that would apply if a simple random sample (SRS) of people was taken and a judgment made on how the sample design and estimation methods adopted affect the standard errors. The ratio of the standard error of an estimate obtained from a particular sample design to that of an SRS of the same size is called the design factor or deft. This approach is particularly suited to estimates of the proportion or number of people in categories e.g. unemployed. For an SRS of people the standard error for the proportion of people in category "c" of the population is $\sqrt{P_c(1-P_c)/k}$ and the estimate of the number of people has standard error of $K\sqrt{P_c(1-P_c)/k}$, where $P_c$ is the proportion of people in the category and $k$ and $K$ are the number of people in the sample and population respectively. For an SRS of people the resulting standard errors for different sized estimates are shown in the column labelled deft=1.0 in Table 2.

14. The key features of the design and estimation procedures used in the annual LFS that will have a beneficial effect on the deft are:

the geographic stratification,
the geographic ordering used in selecting the PSUs,
the age-sex post stratification.

The key features that will have a detrimental effect on the deft are:

the geographic clustering, i.e. sampling PSUs,
the selection of households,
the variation in the probabilities of selection.

15. With an average of 45 dwellings selected in each selected PSU the level of geographic clustering in the ALFS is relatively high. An analysis of 1991 census data indicates that the net effect of geographic stratification, household sampling, use of post-stratification and the geographic clustering results in a deft of 2.22 for total ILO unemployment and 1.73 for total ILO employment. The resulting standard errors are also given in Table 2. In Table 3 these standard errors are expressed as a percentage of the estimate, i.e. as relative standard errors (RSEs).

16. The figures in Tables 2 and 3 can be used to obtain a range within which the standard error for a particular estimate is likely to lie. The standard errors corresponding to a deft of 1.0 can be used as a lower bound, since it is very unlikely that for any estimate the net effect of the design and estimation procedure used would produce an estimate more reliable than an SRS of people. For unemployment estimates the standard errors corresponding to deft=2.22 can be used as a guide to the upper limit and for employment estimates those corresponding to deft=1.73 can be used. For categories of unemployment and employment that are not geographically defined, such as the number of people employed in retailing, the effect of geographic clustering and hence the associated deft will probably be lower than for total unemployment or employment and so the ranges given in Tables 2 and 3 are reasonable. For example an estimate of 10,000 unemployed probably has a relative standard error somewhere between 5 and 11 percent. For geographically defined sub-groups it would be advisable to assume a deft the same as for total unemployment or total employment as appropriate. In fact in some regions the within PSU homogeneity may be higher, leading to higher defts.
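The range quoted for an estimate of 10,000 unemployed can be approximately reproduced from the SRS formula in paragraph 13 and the deft of 2.22. The sketch below assumes that k and K refer to people aged 15 or more (111,200 in the sample and 2,701,300 in the population, from paragraph 4); the report does not state this explicitly, and the function names are hypothetical.

```python
import math

def srs_standard_error(estimate: float, K: int, k: int) -> float:
    """Standard error of an estimated count under an SRS of k people from K."""
    p = estimate / K
    return K * math.sqrt(p * (1 - p) / k)

def rse_range(estimate: float, K: int, k: int, deft_upper: float):
    """Lower (deft = 1.0) and upper (deft = deft_upper) relative standard errors."""
    lower = srs_standard_error(estimate, K, k) / estimate
    return lower, lower * deft_upper

K = 2_701_300  # assumed: population aged 15 or more
k = 111_200    # assumed: sample aged 15 or more
low, high = rse_range(10_000, K, k, deft_upper=2.22)
print(f"RSE between {low:.1%} and {high:.1%}")
```

With these assumptions the bounds come out close to 5 and 11 percent, in line with the range given in the text.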

17. What constitutes an acceptable standard error for an estimate depends on how it is to be used. However, the figures in Table 3 can give some guidance on how useful estimates of different sizes might be, taking into account the likely RSE associated with them. Essentially these figures suggest that for estimates less than 1,000 the value of the survey lies in indicating that the number is small, and that little attention should be given to the specific estimate obtained. Estimates greater than 10,000 are likely to have an RSE of less than 11 percent and generally can be considered reliable. Estimates between 1,000 and 10,000 may be useful but should be interpreted carefully, taking into account the likely standard errors associated with them.

3. QUARTERLY LABOUR FORCE SURVEY

Proposed Design

18. An annual LFS provides a periodic picture of the state of the labour market but does not provide timely information that allows early indication of important changes. A quarterly survey will provide much more regular and timely information. It is proposed that the quarterly LFS will also be a household survey using a stratified multi-stage area sampling approach and face-to-face interviewing. The key features of the proposed sample design and estimation procedures are summarized below.

Sample Size

19. A sample of approximately 46,000 private dwellings is proposed. Reliable key estimates are required at the State level and for the eight planning regions.

Stratification

20. An initial stratification into the eight regions is proposed. Within each region a stratum will be formed for each of the eight area types. Because not all regions contain each area type this will result in 58 strata.

Selection Methods

21. The selection methods planned are very similar to those used in the annual LFS. The main difference is that blocks will be used as PSUs in all areas and not just the higher density strata. Within each stratum a sample of private and non-private dwellings will be selected. This will be achieved by selecting a sample of geographically defined Primary Sampling Units (PSUs) and selecting a sample of dwellings from each selected PSU. The PSUs will consist of blocks of approximately 75 dwellings. In all strata the EAs containing selected blocks will be segmented into the appropriate number of blocks. This segmentation will be done in the central office using information obtained in the population census. These blocks will be used as the PSUs. There are about 3,400 EAs and it is proposed to have 15,300 blocks in the population. Within each stratum a number of PSUs will be selected using equal probability systematic sampling from a geographically ordered list of PSUs. The number of PSUs selected in each stratum will be proportional to the number of PSUs in the stratum, thus giving each dwelling the same chance of selection in each stratum. A list of habitable dwellings within each selected PSU will be compiled by visual enumeration in the field. All private dwellings and NPDs with less than 15 usual residents will be selected from the selected PSUs. All households and people in selected dwellings will be included in the sample. Information on labour force status and related variables will be collected for people aged 15 or more on a PES and ILO basis. Larger NPDs in selected PSUs will be included in the sample and a sampling interval applied, depending on the number of usual residents, of between 1 in 4 and 1 in 10 to select a sample of the usual residents.

22. Selected dwellings will be included in the survey for five consecutive quarters. The sample is to be selected so that there is an approximate 80% overlap at the dwelling level with the previous quarter's sample and a 20% overlap with the sample four quarters before. Because people and households move between surveys the actual overlap will be less than these theoretical values.

Clustering

23. The selection method means that the sample is geographically clustered. There are 15,300 PSUs of which 625 are to be selected. This results in an average number of households selected per selected PSU of about 75. The approximate number of PSUs and households that would be selected in each area type is shown in Table 1. The selected blocks will be allocated to five rotation groups and each quarter all the blocks in one rotation group will be rotated i.e. replaced by another block, which will usually be the next block in the list of blocks within the stratum.

Field Work

24. The field work is to be carried out by 125 interviewers who will each enumerate 5 blocks and therefore an average of 368 dwellings per quarter. These interviewers will carry out the visual enumeration. The survey is to be continuous so that interviewing will be spread evenly over the thirteen weeks of the quarter. Hence there will be about 3540 dwellings in the sample each week for the State and an average of 28.3 per interviewer per week. Assuming the sample is spread across each rotation group each week, this implies an average of 5.7 dwellings per block per week for each interviewer, that is each week an interviewer will enumerate an average of 28.3 dwellings spread over five blocks. There will be 10 field supervisors who select the sample of dwellings from the list provided by the interviewers.
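The workload figures in paragraph 24 follow from the proposed sample size of about 46,000 dwellings (paragraph 19). A brief sketch of the arithmetic, with illustrative variable names:

```python
dwellings_per_quarter = 46_000  # proposed sample size, paragraph 19
interviewers = 125
blocks_per_interviewer = 5
weeks_per_quarter = 13

# Dwellings handled by one interviewer over the quarter.
per_interviewer_quarter = dwellings_per_quarter / interviewers
# Dwellings in the sample each week for the State.
per_week_state = dwellings_per_quarter / weeks_per_quarter
# Weekly dwellings per interviewer, then per block.
per_interviewer_week = per_week_state / interviewers
per_block_week = per_interviewer_week / blocks_per_interviewer

print(per_interviewer_quarter)
print(round(per_week_state, -1))
print(round(per_interviewer_week, 1))
print(round(per_block_week, 1))
```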

Weighting and Estimation

25. The sample design and selection procedures proposed will lead to people having approximately the same probabilities of selection provided the same proportion of PSUs is selected in each stratum. There will be some variation in the sampling interval applied in large NPDs and if a block is found to have considerably more than 75 dwellings a sampling interval may be applied to select dwellings. These cases will lead to some variation in the probabilities of selection. Variation in probabilities of selection should, in general, be avoided in household surveys unless it is done deliberately to place proportionately more sample in areas of higher variation or lower cost, or to ensure reasonable sample sizes in geographic areas for which separate estimates are required. Non-response and random variation will cause the resulting sample composition by age and sex to differ from independently derived population estimates. For these reasons a weight will be calculated for each person in the sample using the same approach as used in the annual LFS.

Design Issues

26. In developing the design for the QLFS the objective is to produce as reliable estimates as possible for the funds available. The key estimates to consider are:

quarterly estimates of unemployment and employment levels at the State and planning regional level,
estimates of the change in unemployment and employment between consecutive quarters and quarters a year apart.

Before indicating the likely reliability of these estimates under the proposed design some comments will be made concerning the proposed design.

Sample Size

27. The proposed sample size in terms of households is the same as the annual LFS. The sample will be more geographically clustered and so the likely standard errors will be higher than those associated with the annual LFS.

Stratification

28. The geographic stratification is not as fine as in the annual LFS, although the use of geographic ordering of PSUs within strata should still mean that different area types are approximately proportionately represented. However, there could still be some advantages in forming more strata. For example if a proposed stratum consists of several towns which have different predominant industries then making each town a stratum will ensure they are proportionately represented. A further advantage is in variance estimation. The estimates, both of levels and changes, produced from a quarterly LFS will receive considerable attention and it will be important to periodically produce empirical estimates of the standard errors associated with the estimates to enable them to be interpreted appropriately. For replication type variance estimation procedures, the degrees of freedom of the variance estimators are determined by the number of strata. Hence having more strata will produce more reliable estimates of standard errors.

Selection Methods

29. A basic two stage area sampling method seems appropriate although the high degree of clustering needs to be considered carefully. In some low density areas the resulting sample will be spread thinly and a three stage approach, in which PSUs consist of a number of EAs and blocks are selected from selected PSUs, may be cost-effective. Assuming the sampling fraction for PSUs is the same in each stratum, the probability of selection of households will be approximately equal which for a household survey is approximately optimal. There will be some variation in selection probabilities due to the number of PSUs selected in each stratum being an integer. Also if a block is found to have a much larger number of dwellings than 75 a 1 in 2 or higher sampling interval will be applied, but hopefully such cases will be rare. There will be variation in the number of dwellings selected from each stratum because of the variation in the size of blocks. There will inevitably be some variation in block size since it will not be feasible to form blocks with clearly defined boundaries with exactly the same number of households in the census. Moreover, there will be some inaccuracies in the census data and there will also be changes since the census and in the future as areas are redeveloped and new residential areas established. There will also be variation in the representation of the different age-sex categories due to the variation in the composition of households and differences in response rates. The weighting procedures will partly account for these factors. However, it is desirable to develop some system whereby areas subject to major changes in the number of dwellings through development or redevelopment can be identified and the blocks formed in those areas redefined so that they are close to equal in size. The sample in these areas can then be reselected.

30. Rotation of a very large NPD in or out of the sample may contribute to the volatility of the estimates over time. The selection of NPDs may be improved by identifying very large ones, for example those with more than 50 or 100 usual residents and maintaining a list sample of such NPDs.

Rotation Pattern

31. The annual LFS has obtained a very high response rate. For the quarterly LFS it is proposed that an in-for-5 rotation scheme be used. It is difficult to judge what impact this will have on the effective response rate. However there will be a degree of cumulative non-response in a particular panel over time and so limiting the length of time a panel is in the survey is advisable. Having as high a sample overlap as possible between surveys is beneficial for the estimates of change implying we should keep a panel in the sample for as long as possible. In developing a good rotation scheme these two factors need to be balanced.

Clustering

32. The proposed design is highly clustered. In developing the appropriate degree of clustering the costs of the survey must be balanced with the reliability of the estimates produced.

33. In considering costs it is necessary to take account of the quarterly and continuous nature of the survey. The following cost components apply.

Forming the blocks - under the proposed design 625 EAs containing selected blocks are segmented into blocks in the office.
Recruitment and training of interviewers.
Visual enumeration - this will be done for each selected block initially and a re-enumeration will be carried out each quarter to check for changes.
Travel by the interviewer to each block - because of non-contact and a particular block being included each week, each block will be visited several times in each week of the quarter.
Travel by the interviewer within each selected block.
Time spent interviewing at responding dwellings.

The actual formula by which interviewers are paid may not directly show these costs, but the payments will have to implicitly reflect these components especially if the interviewing positions are to be attractive to the staff involved.

34. The proposed design reduces costs by having a smaller number of interviewers, therefore reducing the recruitment and training costs compared with the annual LFS. The number of EAs that are initially segmented into blocks is proportional to the number of blocks selected. The actual time spent interviewing depends on the number of dwellings in the sample and does not depend on the number of blocks in the sample. The cost of the initial visual enumeration depends on the number of blocks included in the survey over the life of the sample. While 625 blocks are included in the sample initially, 125 new blocks will have to be included each quarter. Hence over a five year period 625+19*125=3000 blocks will have to be included in the sample and subjected to visual enumeration. Towards the end of the five year period a small number of EAs will have had all blocks exhausted. These EAs will have to be replaced by other EAs which will need to be segmented into blocks. The cost of visual re-enumeration is determined by the number of blocks in the sample in any quarter. It is not clear how travel costs depend on the number of blocks in the sample. Under the proposed selection method the selected blocks will be geographically uniformly spread and so increasing the number of blocks over which an interviewer's workload is spread will not greatly affect the total size of the interviewer's workload. However, there will be some increase in the travel within the workload through having more blocks in the sample.
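The count of blocks requiring visual enumeration follows directly from the rotation pattern: all 625 blocks are new in the first quarter and one rotation group of 125 blocks is replaced in each later quarter. A minimal sketch, with a hypothetical function name:

```python
def blocks_enumerated(quarters: int, initial_blocks: int = 625,
                      rotation_groups: int = 5) -> int:
    """Blocks needing initial visual enumeration over a number of quarters."""
    # One rotation group is replaced in every quarter after the first.
    new_per_quarter = initial_blocks // rotation_groups
    return initial_blocks + (quarters - 1) * new_per_quarter

print(blocks_enumerated(20))  # five years of quarterly surveys
```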

35. Clustering of the sample increases the standard errors of the estimates. This increase arises from the fact that blocks will differ in their unemployment and employment rates and this variation is greater than would be experienced if blocks were effectively randomly formed. Hence part of the variation in the estimates comes from the sampling of blocks and under the proposed design only 625 are included in the sample in each quarter. The implications of this for the standard errors depend on the between block variation or, equivalently, the within block homogeneity. The effect of geographic clustering of the sample can be approximately reflected by the factor $1 + (\bar{n} - 1)\delta$. Here $\delta$ is a measure of within block homogeneity relevant to the sample design and estimation methods used and $\bar{n}$ is the average number of dwellings selected per selected block. This factor tells us approximately how much higher the variances obtained from the clustered household sample would be relative to an unclustered sample. By taking a less clustered sample, i.e. including more blocks in the sample, we will reduce the standard errors but for a fixed total sample size the costs will be increased. However, a design can be developed which includes more blocks but keeps the costs at the budgeted level by reducing the total number of dwellings in the sample. With an appropriate choice of the clustering this can be done in such a way that the standard errors are lower than under the proposed design. This will be considered in detail in the next section.

Field Work

36. Another feature of the proposed design is that only 125 interviewers are used. This introduces another dimension of clustering which will increase variances by a factor of $1 + (\bar{n}_I - 1)\delta_I$, where $\bar{n}_I$ is the average interviewer workload size and $\delta_I$ is the within interviewer homogeneity. This reflects the fact that interviewers can affect the responses of respondents. For the proposed design $\bar{n}_I = 375$, so even if $\delta_I$ is quite small, e.g. 0.01, the factor is 4.7. This factor is often not properly reflected in the variance estimates, but nevertheless will affect the variability of the estimates. Increasing the number of interviewers increases the recruitment and training costs, and would reduce the amount of work each obtains. However, the beneficial effect on the variance needs to be taken into account. Having more interviewers means that it may be easier to handle interviewer absences that would occur under a continuous survey. Also if other household surveys are conducted then there is capacity to handle these.
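The interviewer clustering factor in paragraph 36 can be evaluated directly. The sketch below uses a hypothetical function name and the values quoted in the text; the same function applies to the block clustering factor of paragraph 35 with the block values substituted.

```python
def clustering_factor(mean_cluster_size: float, homogeneity: float) -> float:
    """Variance inflation 1 + (n_bar - 1) * delta for a clustered sample."""
    return 1 + (mean_cluster_size - 1) * homogeneity

# Interviewer workload of 375 dwellings with within interviewer homogeneity 0.01.
variance_factor = clustering_factor(375, 0.01)
print(round(variance_factor, 2))         # effect on variances
print(round(variance_factor ** 0.5, 2))  # corresponding effect on standard errors
```

Because the factor applies to variances, the corresponding effect on standard errors is its square root.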

37. For a continuous quarterly survey in which interviewing is carried out in each week of the quarter, there are a number of options for how to spread the sample over time and space.

Weighting

38. The same general approach to weighting can be used.

Reliability of Estimates

39. Using the results of the analysis of 1991 census information and adjusting for the fact the average size of PSUs is smaller in the proposed QLFS than in the annual LFS an indication of the likely deft can be obtained. This gives a deft of 3.23 for ILO unemployment and 2.51 for ILO employment. Tables 4 and 5 show the resulting standard errors and RSEs respectively. For estimates of unemployment the standard errors corresponding to a deft of 3.22 should be taken as an upper limit and for employment those corresponding to a deft of 2.51. These tables clearly show the increase in standard errors that may occur, for example for total unemployment the RSE is likely to be 3.7 percent compared with 2.5 percent in the annual LFS. A suggestion has been made to use only 500 blocks and in this case, assuming the same size blocks are used, the standard errors would increase by a factor of $\sqrt{625/500} \approx 1.12$.

40. Table 6 shows the likely RSEs for estimates of total ILO unemployment and employment for the proposed quarterly LFS for the eight planning regions. These figures assume that the same probability of selection is used in each region so that the sample size in each region is proportional to the population in that region. It has also been assumed that the design effects are the same in each region. While the RSEs are higher for regions with smaller populations, the variation does not appear to justify moving away from an equal probability design.

41. For estimates of change assumptions about the correlation of the survey estimates over time need to be made. A simple but effective model is that the correlation between the estimates $y_{t+s}$ and $y_t$ at times $t+s$ and $t$ respectively is $k(s)h(s)\rho(s)$. Here $k(s)$ is the theoretical sample overlap between samples $s$ quarters apart, $h(s)$ is a factor to allow for households and people moving between surveys and $\rho(s)$ is an effective population correlation of the characteristic of interest. For the proposed rotation scheme $k(s) = 1 - s/5$ for $s = 1, \ldots, 5$ and it was assumed that $h(s) = 0.9(0.99)^{3s}$. The value of $\rho(1)$ was taken as 0.70 and 0.93 for unemployment and employment respectively and $\rho(4)$ was taken as 0.40 and 0.87 respectively. Table 7 shows the likely standard errors for quarterly and annual changes in the estimates of unemployment and employment. For the proposed sample design and rotation scheme the standard errors of the estimates of quarterly changes are about 3.7 percent and 0.67 percent of the corresponding level estimate for total unemployment and employment respectively.
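The correlation model in paragraph 41 can be turned into an approximate standard error for a change. The sketch below assumes that the level estimates at the two time points have the same variance, so that the standard error of the change is the standard error of the level multiplied by the square root of 2(1 - correlation). That relationship is a standard result under this assumption and is not stated in the report; the function names are hypothetical.

```python
import math

def overlap(s: int, quarters_in_sample: int = 5) -> float:
    """Theoretical sample overlap k(s) for samples s quarters apart."""
    return max(0.0, 1 - s / quarters_in_sample)

def movers_factor(s: int) -> float:
    """Assumed allowance h(s) for households and people moving."""
    return 0.9 * 0.99 ** (3 * s)

def correlation(s: int, rho: float) -> float:
    """Correlation k(s) * h(s) * rho(s) between estimates s quarters apart."""
    return overlap(s) * movers_factor(s) * rho

def change_se_ratio(s: int, rho: float) -> float:
    """SE of the change relative to the SE of a level (equal variances assumed)."""
    return math.sqrt(2 * (1 - correlation(s, rho)))

for label, s, rho in [("unemployment, quarterly", 1, 0.70),
                      ("employment, quarterly", 1, 0.93),
                      ("unemployment, annual", 4, 0.40),
                      ("employment, annual", 4, 0.87)]:
    print(f"{label}: ratio {change_se_ratio(s, rho):.3f}")
```

For quarterly change in unemployment the ratio is close to 1.0, so a level RSE of 3.7 percent gives a standard error of change of about 3.7 percent of the level, in line with the figure quoted above.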

Alternative Designs

42. The proposed design is very clustered. It is possible to develop a more efficient design which, for approximately the same cost, will deliver key estimates with lower standard errors. The more information that is available on the cost and variance structure, the better the design that can be developed. However, even with only limited information available it should be possible to come up with a more efficient design.

43. The basic principle can be illustrated as follows. Suppose the cost of the survey depends on the number of blocks $m$ and the number of dwellings $n = m\bar{n}$ in the survey through the cost function $\text{Cost} = C_0 + C_1 m + C_2 n$. Here $C_1$ reflects the cost of including a block in the sample and $C_2$ the cost of including a dwelling. Assuming that the factor reflecting the effect on the variance of taking different numbers of dwellings per selected block is $1 + (\bar{n} - 1)\delta$, the theoretical optimum choice of $\bar{n}$ is given by $\sqrt{(C_1/C_2)(1-\delta)/\delta}$. Table 8 shows the optimum choice for $C_1/C_2$ varying from 1.0 to 10.0 and δ varying from 0.01 to 0.1. Based on an analysis of within EA homogeneity in the 1991 census, and making an adjustment for the fact that blocks are smaller than EAs and therefore likely to be more homogeneous, values of δ in the range 0.07 to 0.1 are likely for estimates of total ILO unemployment and employment. Categories of unemployment and employment are likely to have lower values of δ. Whilst this approach simplifies the cost structure and we do not know δ or $C_1/C_2$ precisely, the figures in Table 8 suggest that we should be considering a less clustered design, with $\bar{n}$ in the range 3 to 15. In this approach the total sample size is determined by the cost function, so for different choices of $\bar{n}$ the total sample size should be altered to keep the total cost at approximately the desired level.
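The optimum cluster size formula and the cost constraint can be sketched as below. The function names and the budget figures in the last line are hypothetical and serve only to show how the total sample size follows from the cost function once the cluster size is chosen.

```python
import math

def optimum_cluster_size(cost_ratio: float, delta: float) -> float:
    """Optimum dwellings per block: sqrt((C1/C2) * (1 - delta) / delta)."""
    return math.sqrt(cost_ratio * (1.0 - delta) / delta)

def variance_factor(n_bar: float, delta: float) -> float:
    """Clustering effect on the variance: 1 + (n_bar - 1) * delta."""
    return 1.0 + (n_bar - 1.0) * delta

def affordable_dwellings(budget: float, c0: float, c1: float, c2: float,
                         n_bar: float) -> float:
    """Total dwellings n for a fixed budget.

    From Cost = C0 + C1*m + C2*n with m = n / n_bar:
    n = (Cost - C0) / (C1 / n_bar + C2).
    """
    return (budget - c0) / (c1 / n_bar + c2)

# Grid of optimum cluster sizes over the ranges considered in the text
for ratio in (1, 2, 3, 4, 5, 10):
    row = [optimum_cluster_size(ratio, d) for d in (0.01, 0.05, 0.07, 0.10)]
    print(ratio, [f"{x:.1f}" for x in row])

# Hypothetical budget and unit costs, for illustration only
print(affordable_dwellings(budget=1000.0, c0=100.0, c1=10.0, c2=2.0, n_bar=5.0))
```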

44. This approach gives only a rough guide to the degree of clustering which may be appropriate. The cost of the survey is more complicated than reflected by the simple cost function. Moreover, for a repeated survey, the cost structure depends on how the sample is designed over time, i.e. we must consider the cost of the survey over time and how the sample is spread over the weeks within each quarter.

45. To explore the implications of a less clustered design consider the following design:

use blocks with an average of 75 dwellings as already proposed,
stratify as originally proposed,
select 1 in 6 blocks with equal probability,
within selected blocks identify 5 'clusters': the first cluster would consist of the 1st, 6th, 11th, ... listed dwellings, the second cluster would consist of the 2nd, 7th, 12th, ... listed dwellings, etc.,
initially select one of the clusters at random and, when the dwellings in that cluster are due for rotation, replace them by the next cluster in the block.

This leads to a 1 in 30 sample, i.e. about 38,000 dwellings with an average of 15 dwellings selected from each block. The approximate number of PSUs and households that would be selected in each area type under this design is shown in Table 1. Such a design will produce estimates with lower standard error than the proposed design unless δ is less than 0.004.
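The break-even value of δ can be checked with a short calculation. The sketch assumes that the variance of an estimate is proportional to the clustering factor 1 + (n̄ - 1)δ from paragraph 43 divided by the number of dwellings, and takes the sample sizes as 625 x 75 and 2550 x 15 dwellings; the function names are hypothetical.

```python
def relative_variance(n_dwellings: float, n_bar: float, delta: float) -> float:
    """Variance up to a constant: clustering factor divided by sample size."""
    return (1.0 + (n_bar - 1.0) * delta) / n_dwellings

def break_even_delta(n1: float, n_bar1: float, n2: float, n_bar2: float) -> float:
    """delta at which two designs have equal variance.

    Solves (1 + a1*d)/n1 = (1 + a2*d)/n2 with a = n_bar - 1.
    """
    a1, a2 = n_bar1 - 1.0, n_bar2 - 1.0
    return (n1 - n2) / (n2 * a1 - n1 * a2)

original_n = 625 * 75    # original proposal: 625 blocks, all 75 dwellings taken
new_n = 2550 * 15        # new proposal: 2550 blocks, 15 dwellings taken per block

print(round(break_even_delta(original_n, 75, new_n, 15), 4))

# Above the break-even value the less clustered design has the lower variance
print(relative_variance(new_n, 15, 0.05) < relative_variance(original_n, 75, 0.05))
```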

46. Within this basic approach the design parameters can be varied. For example the sampling rate of PSUs could be altered to reduce the total sample size further. Also the number of clusters formed within PSUs could be changed. An attraction of having 5 clusters is that it means that a block can be used for a minimum of 1+4*5=21 surveys under an in-for-5 rotation scheme. This implies that once the initial sample of blocks is set up there are no additional blocks in the sample for over five years. By using an in-for-6 scheme the blocks will last a minimum of 1+4*6=25 surveys. Alternatively forming 6 clusters in each block would imply blocks lasting a minimum of 1+5*5=26 surveys under an in-for-5 rotation scheme. A statistical advantage of this design is that rotation occurs within blocks which has a beneficial effect on the volatility of the quarterly series.
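The cluster formation described in paragraph 45 and the block lifetimes quoted here can be illustrated as follows. The function names are hypothetical, and the dwelling list is simply the numbers 1 to 75 standing in for the listed dwellings of one block.

```python
def form_clusters(listed_dwellings: list, num_clusters: int = 5) -> list:
    """Systematic clusters: cluster i takes every num_clusters-th dwelling from position i."""
    return [listed_dwellings[i::num_clusters] for i in range(num_clusters)]

def min_surveys_per_block(num_clusters: int, in_for: int) -> int:
    """Minimum number of surveys a block lasts.

    The first cluster may be in sample for as little as one survey; each of
    the remaining clusters stays in for the full in_for surveys.
    """
    return 1 + (num_clusters - 1) * in_for

block = list(range(1, 76))            # 75 listed dwellings in one block
clusters = form_clusters(block, 5)
print(clusters[0][:3], clusters[1][:3], [len(c) for c in clusters])

print(min_surveys_per_block(5, 5),    # 5 clusters, in-for-5
      min_surveys_per_block(5, 6),    # 5 clusters, in-for-6
      min_surveys_per_block(6, 5))    # 6 clusters, in-for-5
```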

47. The new design will have some additional costs compared with the initial proposal, but the lower total sample should compensate for these additional costs. If this is not the case, the basic design can be refined to produce the required costs. The new proposal may not be as expensive per dwelling as it first appears, if the costs over a five year period are taken into account. Consider each of the cost components identified in paragraph 32.

Forming the blocks - under the original design each of the 625 EAs containing selected blocks is segmented into blocks in the office, whereas in the new proposal 2550 EAs have to be segmented.
Recruitment and training of interviewers - the new proposal does not necessarily mean more interviewers are needed. Under the original proposal an interviewer would enumerate 375 dwellings spread over 5 blocks in a quarter; under the new proposal each interviewer would enumerate an average of 304 dwellings spread over about 20 blocks. There is a case for increasing the number of interviewers because of the impact of interviewer effects, as was discussed in paragraph 36.
Visual enumeration - under the initial proposal 625 blocks are initially set up, but at each quarter 125 of these are replaced by new blocks which have to be set up. Thus over a five year period 625+19*125=3000 blocks are included each of which has to be subjected to an initial visual enumeration. Under the new proposal 2550 blocks are initially included in the sample, but there is no need to replace any blocks for over five years. So in fact there is a saving in the initial visual enumeration of 450 blocks over this period. There will be some increase in the cost of visual re-enumeration since the number of blocks involved each quarter is increased from 500 to 2550.
Travel by the interviewer to each selected block - under the original proposal an interviewer would enumerate an average of 5.8 dwellings in each of five blocks each week. If each block is included each week in the new proposal then an interviewer would visit each of 20 blocks each week to enumerate an average of 1.15 dwellings. Effectively the weekly sample would be an evenly spread sample through the interviewer workload area. There would be an increase in travel time, but it would not be a factor of 4 - it could be something more like a factor of 2. If this adds significantly to the costs it could be reduced by not spreading the enumeration of each block across the quarter. For example, a particular block could be allocated to one week of the quarter, which would lead to 196 blocks being in sample in each week. With 125 interviewers this gives an average of 1.57 blocks per interviewer per week. Alternatively 196 interviewers could be used who each do one block per week. This introduces some clustering in time but this will have less impact than the geographic clustering. An alternative would be to form 10 clusters in each block with an average of 7.5 dwellings per cluster and enumerate two of these smaller clusters in different weeks of the quarter. This could be organised so that each interviewer visited 2 or 3 blocks per week. Hence the cost due to travel to blocks can be made about the same as in the original proposal.
Travel by the interviewer within each selected block - this is increased under the new proposal since instead of travelling around 625 blocks each quarter the interviewers travel around 2550.
Time spent interviewing at responding dwellings - this should be the same per dwelling in both designs, and because of the reduction in the total number of dwellings in the sample this component of the cost should be reduced by about 18 per cent.
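The workload figures quoted in the cost components above follow from simple arithmetic, sketched below. The constants are taken from the text (13 weeks per quarter, 125 interviewers, 20 quarters in five years); the function and variable names are hypothetical.

```python
WEEKS_PER_QUARTER = 13
INTERVIEWERS = 125
QUARTERS_IN_FIVE_YEARS = 20

def blocks_enumerated(initial_blocks: int, replaced_per_quarter: int,
                      quarters: int) -> int:
    """Blocks needing an initial visual enumeration over a number of quarters."""
    return initial_blocks + (quarters - 1) * replaced_per_quarter

# Original proposal: 625 blocks, 125 replaced each quarter
original_blocks = blocks_enumerated(625, 125, QUARTERS_IN_FIVE_YEARS)
# New proposal: 2550 blocks, none replaced within five years
new_blocks = blocks_enumerated(2550, 0, QUARTERS_IN_FIVE_YEARS)
print(original_blocks, new_blocks, original_blocks - new_blocks)

# Dwellings enumerated per block per week when every block is visited each week
print(round(75 / WEEKS_PER_QUARTER, 1), round(15 / WEEKS_PER_QUARTER, 2))

# Blocks in sample per week if each block is allocated to a single week
blocks_per_week = 2550 / WEEKS_PER_QUARTER
print(round(blocks_per_week), round(blocks_per_week / INTERVIEWERS, 2))

# Dwellings per interviewer per quarter under each proposal
print(625 * 75 // INTERVIEWERS, round(38_000 / INTERVIEWERS))
```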

48. The reduced clustering in the new proposal should reduce the deft to about 1.63 for total ILO unemployment and 1.29 for total ILO employment. Tables 4 and 5 show the likely standard errors for the estimates of unemployment and employment under the initially proposed design and the new proposal. These figures show that unless deft is close to 1.0 the new proposal gives estimates with considerably lower standard errors despite the lower total sample size. For example, the relative standard error on the estimate of total unemployment would be reduced from 3.7 to 2.1 percent and for total employment from 0.80 to 0.45 percent. Even for those characteristics which are not geographically clustered the new proposal would result in standard errors only a little higher. For example, an estimate of 10,000 would have a relative standard error increase from 5.0 to 5.4 percent. These gains carry through to the estimates of quarterly and annual change as shown in Table 7, which shows the standard error of the quarterly change in unemployment is reduced from about 6,500 to 3,600.

49. Table 6 shows that the new proposal should provide estimates of total ILO unemployment and employment for regions with relative standard errors of less than 10 percent and no more than about 2 percent respectively.

50. Because of the higher number of blocks selected a less clustered design has advantages in representing new residential development and redevelopment of existing areas and large NPDs.

51. Another aspect of the design that should be considered is the rotation pattern. There are advantages in keeping selected dwellings in sample for longer in terms of increasing the quarterly and annual overlap. For example, under an in-for-5 rotation scheme the quarterly overlap is 80 percent and the annual overlap is 20 percent. If an in-for-6 rotation scheme is used these overlap factors increase to 83 and 33 percent respectively. An in-for-8 rotation scheme increases the overlap factors to 87 percent and 50 percent. The gains through increasing the overlap are summarized in Table 7. These figures suggest that, in terms of standard errors on the estimates of changes, the advantages of keeping selected households in the sample for longer are small for unemployment estimates. However, for quarterly changes in employment there is an appreciable gain, with the reduction associated with a change from an in-for-5 to an in-for-8 rotation scheme being equivalent to a 21 percent increase in sample size. There is also likely to be some cost saving in including selected households for longer since the first interview is usually the most costly. In assessing the benefit of increasing the length of time in the survey, allowance must be made for sample attrition and a judgement made on how best to balance the two factors.
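The overlap percentages and the "equivalent increase in sample size" can be reproduced with the sketch below. It assumes that variance is inversely proportional to sample size, so that a ratio of standard errors converts to a sample size ratio by squaring; the function names are hypothetical and the standard errors are the quarterly employment figures for the original proposal in Table 7.

```python
def overlap(s: int, in_for: int) -> float:
    """Theoretical sample overlap between surveys s quarters apart."""
    return max(0.0, 1.0 - s / in_for)

def equivalent_sample_increase(se_before: float, se_after: float) -> float:
    """Proportional increase in sample size giving the same SE reduction,
    ASSUMING variance is inversely proportional to sample size."""
    return (se_before / se_after) ** 2 - 1.0

for in_for in (5, 6, 8):
    quarterly = 100 * overlap(1, in_for)
    annual = 100 * overlap(4, in_for)
    print(f"in-for-{in_for}: quarterly {quarterly:.1f}%, annual {annual:.1f}%")

# Quarterly change in employment, in-for-5 versus in-for-8 (Table 7)
print(round(100 * equivalent_sample_increase(8480, 7710)))
```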

52. The block size of an average of 75 dwellings is reasonable. Using a smaller block size would reduce the visual enumeration and within-block travel costs but is likely to increase the within-block homogeneity, further increasing the deft. Conversely, increasing the block size would increase these costs but reduce the within-block homogeneity. An additional cost saving would apply if the EAs were used as PSUs, since the process of block formation would not be necessary. However, the resulting cost associated with visual enumeration and within-PSU travel would be very high. Thus the investment in forming blocks should be more than offset by the reduced visual enumeration and travel costs.

53. Selecting PSUs with equal probability means that there will be some variance associated with the variation in the size of the PSUs. For example, suppose a stratum consists of low and high unemployment areas; then in expectation these areas are represented proportionately, but variation of the PSU sizes will lead to variation in the representation of the area types, which will affect the variation in the unemployment estimate. This implies that, if possible, PSUs should be formed with little variation in the number of dwellings. Some variation is inevitable but if the variation becomes large then steps to control the impact should be considered. If blocks are used as the first stage of selection then the size of blocks needs to be restricted to between approximately 50 and 100 dwellings. One way of reducing the impact of the variation in PSU sizes while retaining equal selection probabilities for households is to use probability proportional to size (PPS) selection of PSUs. If the first stage of selection is the selection of EAs, which are then formed into blocks, then the size of the PSUs has already been determined, and so if this approach is adopted a check should be made of the variation in the sizes of the EAs. If the coefficient of variation of the EA sizes exceeds 0.2 then PPS selection can be considered. An alternative would be to stratify the EAs by size or at least separately sample the very large and small EAs. However, an analysis of 1991 census data suggests that, if the use of population benchmarks in the estimation is taken into account, there is very little additional gain in using PPS selection for EAs and this probably also applies to blocks.
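The check on the variation of PSU sizes could look like the sketch below. The EA sizes are invented for illustration, the function names are hypothetical, and the use of the population standard deviation (rather than the sample version) is an assumption of this sketch.

```python
from statistics import mean, pstdev

def coefficient_of_variation(sizes: list) -> float:
    """Standard deviation of PSU sizes divided by their mean."""
    return pstdev(sizes) / mean(sizes)

def consider_pps(sizes: list, threshold: float = 0.2) -> bool:
    """True when the size variation exceeds the threshold suggested in the text."""
    return coefficient_of_variation(sizes) > threshold

# Hypothetical dwelling counts for the EAs in one stratum
even_stratum = [70, 75, 80]
uneven_stratum = [50, 75, 100]

print(round(coefficient_of_variation(even_stratum), 3), consider_pps(even_stratum))
print(round(coefficient_of_variation(uneven_stratum), 3), consider_pps(uneven_stratum))
```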

54. Further efficiencies may be obtained by varying the design in different areas. The overall sampling fraction should be the same in all areas. Such an allocation is close to optimal for a categorical variable unless the proportion of people in the category varies considerably between areas, say by a factor of more than five. A uniform sampling fraction also simplifies estimation and gives flexibility in production of estimates for geographic areas. However, the simple formula for the optimal cluster size given in paragraph 43 suggests that in areas with a high ratio of block to dwelling costs we can use a higher degree of clustering (although the higher costs can also be accompanied by an increase in δ, so that the increase may not be large). The size of the PSUs in terms of number of dwellings can also be varied. In low density areas a block size of 75 dwellings may lead to blocks which are very large in area, so that the cost of the visual enumeration and within-block travel is high. In such areas smaller blocks could be used; for example, blocks of 30 dwellings could be formed and a 1 in 2 sample selected within them. This leads to smaller blocks, but means that more blocks will have to be replaced over time. In fact compact blocks of 15 dwellings could be formed and all dwellings selected. This approach will increase the clustering of the sample in these areas and should only be used in low density rural areas.

55. In some areas additional stages of selection can also be introduced. In very low density areas the blocks over which an interviewer's workload is spread may be very large, leading to very large travel costs. Costs can be reduced here by forming PSUs consisting of several EAs. For example, suppose a sample of 300 dwellings is required in an area which consists of 30 EAs. A two stage design would result in selecting blocks spread over these EAs that are widely separated, and thus lead to one interviewer incurring a lot of travel or the recruitment of two or more interviewers each doing a small workload. An alternative would be to use a three stage design and form, say, 6 PSUs each consisting of 5 EAs and then select two PSUs and 150 dwellings from each PSU, possibly using smaller blocks. Again this approach leads to a more clustered sample and so should only be used where it is clear that the dwelling density is low enough to justify its use.

56. To provide a guide on when to use three stage sampling and compact blocks it is useful to develop some guidelines on the size of an interviewer's workload. Consider the rural and mixed rural/urban strata. The dwelling density in these areas is probably 6 to 7 dwellings per square km. Hence on average a block of 75 dwellings will be about 11.5 square kms (equivalent to a square of side 3.4 kms or a circle of radius 1.9 kms). A workload of 375 dwellings would, under the proposed design, correspond to 9166 dwellings in the population, which corresponds to an area of 1410 square kms (equivalent to a square of side 37.6 kms or a circle of radius 21.2 kms). These figures refer to the average dwelling density. There will be some areas in which the dwelling density is considerably lower. For example, suppose that in an area the dwelling density is 1.5 dwellings per square km; then a block of 75 dwellings would have an area of 50 square kms (a square of side 7.1 kms or a circle of radius 4.0 kms) and a workload of 375 dwellings corresponds to an area of 6108 square kms (a square of side 78 kms or a circle of radius 44.1 kms). It would be useful to develop a limit on the density below which smaller blocks and three stage sampling are used.
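The area calculations in this paragraph can be sketched as below. The density of 6.5 dwellings per square km is taken as the midpoint of the 6 to 7 range, and the circle radius is that of a circle with the same area, sqrt(area / pi); the function names are hypothetical.

```python
import math

def area_sq_km(dwellings: float, density_per_sq_km: float) -> float:
    """Land area needed to contain a given number of dwellings."""
    return dwellings / density_per_sq_km

def square_side_km(area: float) -> float:
    """Side of a square with the given area."""
    return math.sqrt(area)

def circle_radius_km(area: float) -> float:
    """Radius of a circle with the given area."""
    return math.sqrt(area / math.pi)

cases = [
    ("block, average density", 75, 6.5),
    ("workload, average density", 9166, 6.5),
    ("block, low density", 75, 1.5),
    ("workload, low density", 9166, 1.5),
]
for label, dwellings, density in cases:
    a = area_sq_km(dwellings, density)
    print(f"{label}: {a:.1f} sq km, "
          f"square side {square_side_km(a):.1f} km, "
          f"circle radius {circle_radius_km(a):.1f} km")
```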

57. There may also be some benefit in using towns as PSUs in the stratum composed of small towns.

Other Considerations

58. There are several issues which are not of great importance when the survey is originally introduced but which will become more important over time. Until data for several years are available seasonal adjustment will not be possible. This means that the survey estimates will have to be treated with caution since much of the movement in the figures will be due to seasonal factors. To help assess the trends in the series attention will be given to the change relative to the same quarter in the previous year. However, once seasonally adjusted figures can be produced the value of the series will increase considerably and so will the attention they receive. Some issues that should be considered in the three to five year period after the survey is introduced are discussed below.

Trend estimation

59. Seasonal adjustment of the quarterly series will greatly assist in the assessment of the underlying trends and in the identification of important changes in the labour market. However, the seasonally adjusted series still includes the effect of the irregular components. There are methods available that can be used to smooth the seasonally adjusted series to greatly reduce the impact of the irregular component of the series. These methods should be considered as part of the planning for the introduction of seasonal adjustment.

Rolling Quarterly Estimates

60. The survey will produce estimates for the standard quarters. However, under a continuous survey, estimates can be produced for any 13 week period and each estimate would be based on the quarterly sample size and all blocks would be included. This permits production of quarterly estimates each month to provide more timely updates of the picture of the labour force. There may be little value in this approach until seasonally adjusted series can be produced. In this approach the monthly series of quarterly estimates can be directly seasonally adjusted. This produces a simple smoothing of the underlying monthly series. While possible implementation of this option is some time in the future, it has implications for the survey design. The sample should be relatively evenly spread over the weeks in the quarter and all weeks should be included in the sample. Besides ensuring proper representation of the weeks in the quarter, this gives flexibility in how the series can be analysed in the future. The problem is then allowing for interviewers taking leave and other absences. This can be handled by having more interviewers, for example by having twice the number of interviewers who essentially interview in alternate weeks. Absences and leave could then be handled by interviewers doing the workload of each other as required. While this will increase the cost of recruiting and training interviewers, it will give more flexibility and reduce the impact of interviewer effects.
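A rolling 13 week estimate can be sketched as a moving average of weekly estimates. This assumes the weekly samples are of equal size, so that the estimate for any 13 week window is approximately the mean of the weekly estimates in it; the weekly values below are invented for illustration and the function name is hypothetical.

```python
def rolling_quarterly(weekly_estimates: list, window: int = 13) -> list:
    """Mean of each run of `window` consecutive weekly estimates.

    ASSUMES equal weekly sample sizes, so a simple mean approximates the
    pooled estimate for the window.
    """
    return [
        sum(weekly_estimates[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(weekly_estimates))
    ]

# Hypothetical weekly unemployment estimates (thousands) for 17 weeks
weekly = [176, 171, 180, 174, 169, 177, 182, 173, 170, 178, 175, 181, 172,
          168, 179, 174, 183]
rolled = rolling_quarterly(weekly)
print(len(rolled), [round(x, 1) for x in rolled])
```

Taking every fourth or fifth window from such a series would give the monthly series of quarterly estimates described in the paragraph.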

Monthly and Weekly estimates

61. In a continuous survey the sample is spread over all the weeks in the quarter and so in theory estimates could be produced for each week in the quarter. These estimates would be based on a small sample size of approximately 3,600 households per week and, since there would be no overlap in the sample between weeks within a quarter, the estimates of change between weeks would be very volatile and could not be used directly. However, it may be possible to develop a trend estimation procedure using these weekly estimates which could be used as an aid to analysing the trends in the series. Similarly, monthly estimates could be produced which again could not be used directly but may provide useful trend estimates.

Composite estimation

62. Composite estimation is an approach to estimation in repeated samples in which the overlapping and non-overlapping components of the sample are weighted differently. The gains are usually marginal for estimates of unemployment but more appreciable for employment estimates. This approach introduces added complexity in the estimation procedures and the benefits need to be considered carefully. However, new methods of composite estimation have been developed recently which appear to give appreciable gains for estimates of change for both unemployment and employment. In the longer term these approaches should be evaluated.

Variance estimation

63. For some methods of variance estimation it is desirable to have an even number of PSUs selected in each stratum. It is then useful to give each selected PSU a variance group number of 1 or 2.

Numbering System

64. It is important to record information on the survey data files from the beginning of the quarterly survey that will easily permit various analyses and output from the survey. There should be codes recorded on each person's record that indicate the week of interview, the rotation group, the interviewer who obtained the information, the block in which the selected dwelling is located and whether the information was obtained directly from the respondent or some other responsible adult. Such codes will enable investigation of rotation group and interviewer effects and also give flexibility in the geographic output that can be produced from the survey.

Household Estimates

65. The use of person weights to calculate household and family estimates can cause inconsistencies between estimates. Methods of integrating person and household weights have been developed and are under evaluation by other statistical agencies. These methods could be considered for the LFS if discrepancies between estimates are found to be important.

Table 1: Sample Allocation to Strata Types(c)

	Annual LFS		Quarterly LFS Original Proposal (a)		Quarterly LFS New Proposal (b)
Stratum Type	Household	PSUs	Household	PSUs	Household	PSUs
County Borough	10,400	312	10,400	141	8,600	574
Suburb of County Borough	5,900	198	5,900	80	4,800	325
Urban/Rural around County Borough	230	12	230	3	190	13
Large Towns	6,800	287	6,800	92	5,600	377
Urban/Rural around Large Towns	130	6	130	2	110	7
Small Towns	3,400	39	3,400	46	2,800	187
Other Urban Rural	7,100	80	7,100	97	5,900	395
Rural	12,100	99	12,100	165	10,000	673
Total	46,000	1033	46,000	625	38,000	2550

(a) 47,000 households, 75 per block
(b) 38,000 households, 15 per block
(c) The number of households and PSUs are proportional to the 1991 population in private households and for the annual LFS are only indicative

Table 2: Standard Errors for Annual LFS

Size of Estimate	deft = 1.0	deft = 2.22	deft = 1.73
100	49	110	85
500	110	240	190
1,000	160	350	270
5,000	350	770	600
10,000	490	1,100	850
50,000	1,100	2,400	1,900
100,000	1,500	3,400	2,700
175,300	2,000	4,400	3,500
200,000	2,100	4,700	3,700
500,000	3,100	7,000	5,500
1,000,000	3,900	8,700	6,800
1,267,000	4,000	9,000	7,000
1,442,700	4,000	9,000	7,000

Table 3: Relative Standard Errors for Annual LFS (%)

Size of Estimate	deft = 1.0	deft = 2.22	deft = 1.73
100	49	110	85
500	22	49	38
1,000	16	35	27
5,000	7.0	15	12
10,000	5.0	11	8.5
50,000	2.2	4.9	3.8
100,000	1.5	3.4	2.7
175,300	1.1	2.5	2.0
200,000	1.1	2.4	1.8
500,000	0.63	1.4	1.1
1,000,000	0.39	0.87	0.68
1,267,000	0.32	0.71	0.55
1,442,700	0.28	0.62	0.49

Table 4: Standard Errors for Quarterly LFS

	Original Proposal (a)			New Proposal (b)
Size of Estimate	deft = 1.0	deft = 3.22	deft = 2.51	deft = 1.0	deft = 1.63	deft = 1.29
100	49	160	120	54	89	70
500	110	350	280	120	200	160
1,000	160	500	390	170	280	220
5,000	350	1,100	870	380	630	500
10,000	490	1,600	1,200	540	890	700
50,000	1,100	3,500	2,700	1,200	2,000	1,600
100,000	1,500	4,900	3,800	1,700	2,800	2,200
175,000	2,000	6,400	5,000	2,200	3,600	2,800
200,000	2,100	6,800	5,300	2,300	3,800	3,000
500,000	3,100	10,000	7,900	3,500	5,700	4,500
1,000,000	3,900	13,000	9,800	4,300	7,100	5,600
1,267,000	4,000	13,000	10,000	4,500	7,300	5,800
1,442,000	4,000	13,000	10,000	4,500	7,300	5,800

(a) 47,000 households, 75 per block
(b) 38,000 households, 15 per block

Table 5: Relative Standard Errors for Quarterly LFS (%)

	Original Proposal (a)			New Proposal (b)
Size of Estimate	deft = 1.0	deft = 3.22	deft = 2.51	deft = 1.0	deft = 1.63	deft = 1.29
100	49	160	120	54	89	70
500	22	71	55	24	40	31
1,000	16	50	39	17	28	22
5,000	7.0	22	17	7.7	13	9.9
10,000	5.0	16	12	5.4	8.9	7.0
50,000	2.2	7.0	5.5	2.4	3.9	3.1
100,000	1.5	4.9	3.8	1.7	2.8	2.2
175,000	1.1	3.7	2.9	1.3	2.1	1.6
200,000	1.1	3.4	2.7	1.2	1.9	1.5
500,000	0.63	2.0	1.6	0.70	1.1	0.90
1,000,000	0.39	1.3	1.0	0.43	0.71	0.56
1,267,000	0.32	1.0	0.80	0.35	0.58	0.45
1,442,000	0.28	0.90	0.70	0.31	0.51	0.40

(a) 47,000 households, 75 per block
(b) 38,000 households, 15 per block

Table 6: Relative Standard Errors for Regional Estimates, Quarterly LFS (%)(c)

	Original Proposal (a)		New Proposal (b)
Region	Unemployment	Employment	Unemployment	Employment
Border	12	2.5	6.6	1.4
Dublin	6.1	1.4	3.4	0.80
Mid-East	12	2.5	6.6	1.4
Midland	17	3.5	9.5	2.0
Mid-West	15	2.7	8.2	1.5
South-East	11	2.5	6.1	1.4
South-West	9.6	2.2	5.4	1.2
West	13	2.6	7.3	1.5
State	3.7	0.80	2.1	0.45

(a) 47,000 households, 75 per block
(b) 38,000 households, 15 per block
(c) Assumes same design effect in each region.

Table 7: Standard Errors on Quarterly and Annual Changes

	Original Proposal (a)				New Proposal (b)
	Unemployment		Employment		Unemployment		Employment
Rotation Scheme	Quarterly Change	Annual Change	Quarterly Change	Annual Change	Quarterly Change	Annual Change	Quarterly Change	Annual Change
in for 5	6,490	7,840	8,480	9,560	3,640	4,390	4,820	5,440
in for 6	6,360	7,780	8,150	9,310	3,570	4,360	4,630	5,290
in for 8	6,190	7,710	7,710	8,980	3,470	4,320	4,380	5,100

(a) 47,000 households, 75 per block
(b) 38,000 households, 15 per block

Table 8: Optimum Cluster Size

C1/C2	δ = 0.01	0.02	0.03	0.04	0.05	0.07	0.10
1	9.9	7.0	5.7	4.9	4.4	3.6	3.0
2	14	9.9	8.0	6.9	6.2	5.2	4.2
3	17	12	9.8	8.5	7.5	6.3	5.2
4	20	14	11	9.8	8.7	7.3	6.0
5	22	16	13	11	9.7	8.2	6.7
10	32	22	18	16	14	12	9.5
