Methodology

CSO Frontier Series Research Paper

CSO research publication, , 11am
Frontier Series Output

CSO Frontier Series outputs may use new methods which are under development and/or data sources which may be incomplete, for example new administrative data sources. Particular care must be taken when interpreting the statistics in this release.
Learn more about CSO Frontier Series outputs.

Methodology

This section gives an overview of the methodological approach taken to producing population estimates from administrative data. The results presented in this release are based upon linking administrative data sets from a range of public service bodies, which are listed in the Background Notes. A series of rules is then applied to decide who should be included in or excluded from the population.

This project involves creating an estimate of the number of people usually resident in Ireland in April 2021. The rationale for this approach is that almost every person who usually lives in Ireland has some level of interaction with the State, either directly or indirectly through a spouse or dependant, for example through taxation, benefit or pension payments, or enrolment in education. Administrative data records from other government bodies allow the CSO to identify all persons who interacted with the State around April 2021. This interaction is then used as a proxy for being present in the country at that time.

Pseudonymisation

Before using personal administrative data for statistical purposes, the CSO removes all identifying personal information. This includes name, address, the Personal Public Service Number (PPSN) and the Eircode. The PPSN is a unique number used by people in Ireland to access social welfare benefits, personal taxation and other public services. The Eircode is a unique geographical code identifying the location of every dwelling in the State. A pseudonymised Protected Identifier Key (PIK) is created by the CSO when the PPSN is removed. This PIK is unique and non-identifiable and is only used by the CSO. A similar PIK is created when the Eircodes are removed from person records.

Using the PIKs enables the CSO to link and analyse data for statistical purposes, while protecting the security and confidentiality of the individual data. All records in the matched datasets are pseudonymised and the results are in the form of statistical aggregates which do not identify any individuals.
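
The source does not describe how the CSO derives a PIK. Purely as an illustration of the general idea, the sketch below uses a keyed hash (HMAC-SHA256), one common pseudonymisation technique, to turn an identifier into a stable key that cannot be reversed without the secret. The function name `make_pik`, the key and the sample identifier are all hypothetical and are not taken from CSO practice.

```python
import hashlib
import hmac


def make_pik(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonymous key for an identifier (illustrative only)."""
    # Normalise so that trivial formatting differences map to the same key.
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key and identifier; neither resembles real CSO values.
key = b"example-secret-held-only-by-the-statistics-office"
pik_a = make_pik("EXAMPLE-0001", key)
pik_b = make_pik(" example-0001 ", key)  # same identifier, different formatting
```

Because the same identifier always yields the same key, records for one person can be linked across datasets without the original identifier being present in any of them.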

Identification of data sources

The first step taken in this project was to identify available data sources that contain people who are resident in Ireland based on their interactions with public sector bodies. A preliminary analysis of administrative data available to the CSO was carried out and multiple suitable datasets were identified. The following key requirements needed to be met:

  • the datasets were linkable using the pseudonymised version of the PPSN
  • there was good coverage of pseudonymised PPSN in the dataset to facilitate linking
  • the dataset was available for time periods relevant to deriving usual residence in April 2021

The approach taken was to examine cohorts of the population and assess what available data could be used to ensure persons active in that cohort were included in the population count. The cohorts were broadly age-based and consisted of the following groups:

  • Pre-school children
  • Primary school children
  • Secondary school children
  • Post-secondary and Third level students
  • Employees
  • Self-employed persons
  • Welfare recipients
  • Spouses or partners not included in other administrative data sets
  • Pensioners

Many persons belong to more than one of these cohorts, but this does not affect their inclusion in the population. Linkage across multiple administrative datasets through the pseudonymised version of the PPSN used as a unique identifier ensures that persons who appear in more than one data source will be counted only once in the population. 
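
The deduplication described above can be sketched as a union of identifiers across sources. This is a minimal illustration, assuming each dataset has already been reduced to a set of pseudonymised keys; the source names and key values are made up for the example.

```python
def combine_sources(sources):
    """Map each pseudonymised key to the set of source names it appears on."""
    seen = {}
    for source_name, piks in sources.items():
        for pik in piks:
            seen.setdefault(pik, set()).add(source_name)
    return seen


# Hypothetical cohort files, each already reduced to a set of keys.
sources = {
    "employees": {"pik_01", "pik_02", "pik_03"},
    "third_level_students": {"pik_03", "pik_04"},
    "welfare_recipients": {"pik_02", "pik_05"},
}

linked = combine_sources(sources)
# Each person is counted once, however many sources they appear on.
population_count = len(linked)
```

Keeping the list of sources per person, rather than only the count, also makes it possible to apply source-specific rules later.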

Timeliness of administrative data sets

One of the principal challenges when deriving an administrative population count is the timing of when the various administrative data sets become available to the CSO. This was a key factor in deciding the ‘reference date’ for the population estimates, which is the specific time period for which the population is being estimated. In order to optimise the value of the estimates, it is necessary to minimise the time lag between the reference date and the publication of the population estimates. ‘Real time’ data sources, such as the PMOD dataset received from the Revenue Commissioners and the DSP Payments file from the Department of Social Protection, are available to the CSO within weeks of being created, whilst some other sources are available with up to a 14-month time lag. An April 2021 reference date was selected to maximise the use of available datasets. The changing landscape in respect of timeliness and access to administrative data sources will determine any changes in methodology in future iterations of these estimates. The full list of datasets used in this publication is in the Background Notes.

Combining administrative data sets

After the datasets covering the various population cohorts were identified, the next step was to combine them to create a ‘raw’ population file containing all persons who had any interaction with public sector bodies in a time period around April 2021. This resulted in a file of approximately 5.5 million people. In order to refine this figure to those persons who usually lived in Ireland and exclude persons who may have only been in the country temporarily, it was then necessary to create the ‘usually resident’ population.

Usual residence

Usual residence is a widely used concept in demographic statistics. A person’s usual residence is defined as follows:

The place where a person normally spends their daily period of rest, regardless of temporary absences for purposes of recreation, holidays, visits to friends and relatives, business, medical treatment or religious pilgrimage.

The following qualifications apply to usual residence.

Only the following persons shall be considered to be usual residents of a specific geographical area in question:

  1. those who have lived in their place of usual residence for a continuous period of at least 12 months before the reference time (date); or
  2. those who arrived in their place of usual residence during the 12 months before the reference date with the intention of staying there for at least one year.

Where the circumstances described in point 1 or 2 cannot be established, "usual residence" shall mean the place of legal or registered residence.

In surveys and censuses, the concept of usual residence can be established through direct questions. This is not possible using administrative data. Instead, usual residence is established through patterns of interaction with public sector bodies. Persons who display continuous interaction with the State, as recorded in administrative data, for 12 months around a reference period are deemed to be usually resident. The rules used to determine usual residence in this publication are outlined in the section below.

The population concept used in the Census of Population is the ‘De facto’ population. This is defined as all persons present in a country on a specific night, irrespective of where they usually live. This may mean that persons who usually live outside a country but are temporarily present in it on the night of the census are included, whereas persons who usually live in the country but are temporarily abroad are excluded. It is not possible to produce a de facto population count from administrative data.

Usual residence rules

The administrative data used in this report was not initially created for measuring the population, but rather for service delivery and the day-to-day operations of public bodies. However, activity in administrative data can be a sign of presence in the State. A set of rules was developed to align the population as closely as possible to the definition of usual residence.

Using a reference period of April 2021, rules were applied to include or exclude individuals. These rules varied depending on how frequently the CSO receives each dataset. Some datasets, for example the Primary Online Database of primary school children, are received annually by the CSO. Other datasets, for example the PMOD dataset, are received monthly. Having access to more frequent datasets allows more detailed rules to be applied in deciding whether a person was usually resident in April 2021. Details on the frequency of receipt of each dataset used in this project are provided in the Background Notes.

Annual datasets

The key rule applied to annual datasets is that all persons appearing on these datasets are counted as usual residents unless:

  1. There is an indicator on the dataset to flag that they are not resident in Ireland or
  2. There is an indicator on the dataset to flag that they have left Ireland

Monthly datasets

There are several key rules used to decide whether persons appearing on monthly datasets are counted as usually resident. It should be noted that the months under consideration to decide usual residence are determined by the completeness and timeliness of datasets and by the need to allow for publication approximately 12 to 14 months after the reference period. For this project the months included were January 2020 to December 2021. Monthly datasets include the PMOD file from the Revenue Commissioners and the DSP Payments file from the Department of Social Protection.

There are four key groups of people who we are trying to identify in the monthly datasets. The four groups and their criteria for inclusion or exclusion in the usually resident population are outlined below.

  1. The first group of interest are usually resident persons who are continuously active over a long period of time. Most people are in this group.
    • Criteria: Include persons who were active in monthly datasets between January 2020 and December 2021 over a period of 12 months including the reference month (April 2021)
  2. The second group are persons who are potentially outgoing migrants and would not be considered usual residents.
    • Criteria: Exclude persons who were not active in monthly datasets between April and December 2021
  3. The third group are persons who are in Ireland for less than a year, such as seasonal workers, and would not be considered usual residents.
    • Criteria: Exclude persons who were active for a period of less than 12 months in the datasets between January 2020 and December 2021
  4. The fourth group are persons who are potentially incoming migrants and would be considered usual residents.
    • Criteria: Include persons who were first active between February and April 2021 and were still active in December 2021
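
The four criteria above can be sketched as a single classification function. This is only one possible reading of the rules, not the CSO's implementation, and it rests on several assumptions: activity is represented as a set of `(year, month)` tuples; "a period of 12 months" is read as at least 12 distinct active months in the window; "first active" means first active within the window; and the incoming-migrant test is applied first, since such persons necessarily have fewer than 12 active months. All names and labels are hypothetical.

```python
WINDOW_START = (2020, 1)
WINDOW_END = (2021, 12)
REFERENCE_MONTH = (2021, 4)


def month_range(start, end):
    """Return the set of (year, month) tuples from start to end inclusive."""
    months = set()
    year, month = start
    while (year, month) <= end:
        months.add((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def classify_monthly(active_months):
    """Assign a person to one of the four groups (illustrative reading)."""
    in_window = {m for m in active_months if WINDOW_START <= m <= WINDOW_END}

    # Group 4: first active February to April 2021 and still active in December 2021.
    if in_window and (2021, 2) <= min(in_window) <= REFERENCE_MONTH \
            and WINDOW_END in in_window:
        return "include_incoming"
    # Group 2: no activity between April and December 2021.
    if not any(REFERENCE_MONTH <= m <= WINDOW_END for m in in_window):
        return "exclude_outgoing"
    # Group 3: active for fewer than 12 months in the window.
    if len(in_window) < 12:
        return "exclude_short_stay"
    # Group 1: at least 12 active months, including the reference month.
    if REFERENCE_MONTH in in_window:
        return "include_continuous"
    # Not covered by the four published criteria as read here.
    return "unresolved"
```

For example, `classify_monthly(month_range((2021, 5), (2021, 9)))` falls into the short-stay group, while a person active in every month of the window falls into the continuous group.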

In cases where persons flagged as potential outgoing migrants or seasonal workers appear on annual datasets, they will be included in the population. For example, a third-level student who was also employed for a 5-month period would be included in the population.
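
The interaction between the annual and monthly rules can be sketched as follows. This is a minimal illustration, assuming the monthly outcome has already been reduced to a boolean; the parameter names are hypothetical. The source does not say what happens when a person qualifies on monthly data but is flagged as non-resident on an annual dataset, so treating the monthly evidence as sufficient in that case is an assumption.

```python
def is_usually_resident(monthly_include, on_annual_dataset,
                        annual_flag_not_resident=False,
                        annual_flag_left=False):
    """Combine the annual and monthly rules for one person (illustrative)."""
    # Annual rule: counted unless a flag says not resident or has left Ireland.
    annual_include = on_annual_dataset and not (
        annual_flag_not_resident or annual_flag_left
    )
    # Presence on an annual dataset overrides a monthly exclusion.
    return bool(monthly_include) or annual_include


# A third-level student who was also employed for five months:
student = is_usually_resident(monthly_include=False, on_annual_dataset=True)
# A seasonal worker who appears on no annual dataset:
seasonal = is_usually_resident(monthly_include=False, on_annual_dataset=False)
```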

Additional rules

Two further universal rules were applied to all datasets. These were:

  1. Persons who were born after 1 April 2021 were excluded from the count, and
  2. Persons who died on or before 1 April 2021 were excluded from the count.
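
The boundary conditions in these two rules are easy to get wrong, so a short sketch may help. It assumes dates of birth and death are available as `datetime.date` values; the function name is hypothetical.

```python
from datetime import date

REFERENCE_DATE = date(2021, 4, 1)


def passes_birth_death_rules(date_of_birth, date_of_death=None):
    """Return True if a person is kept under the two universal rules."""
    # Rule 1: born after the reference date means excluded.
    if date_of_birth > REFERENCE_DATE:
        return False
    # Rule 2: died on or before the reference date means excluded.
    if date_of_death is not None and date_of_death <= REFERENCE_DATE:
        return False
    return True
```

Note the asymmetry: a person born on 1 April 2021 is kept, while a person who died on 1 April 2021 is excluded.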

Information about births and deaths and the timing of them was also available to the CSO through administrative data sources.

The application of these rules to the ‘raw’ population resulted in the ‘usually resident’ population figure of 5.28 million.

Alternative rules

The rules outlined above represent a subjective assessment by the CSO of how to derive the usually resident population from the administrative data sources currently available to the office. They have been devised based on a range of considerations including the variables available on the datasets, the frequency of their creation by the source department or agency and the timeliness of their receipt by the CSO. It is possible that alternative sets of rules could be used, which would result in different population figures at State and sub-State levels. For illustrative purposes, the CSO has applied two different rule sets to demonstrate alternative ways in which the usually resident population could be interpreted.

  1. A ‘signs of life’ approach, which includes all persons with any evidence of activity in the calendar year of the reference period. Apart from basic exclusions for persons who are deceased or have emigrated, no activity thresholds or exclusion rules have been applied.

    This population count is estimated at 5.48 million persons.

  2. An approach that narrows the activity period to a 12-month window around the reference period of April 2021. This alternative rule set relies on real-time, monthly administrative data as follows:

    1. include persons who were active in October 2020 and still active in September 2021
    2. include persons in annual datasets unless:
      1. there is an indicator on the dataset to flag that they are not resident in Ireland or
      2. there is an indicator on the dataset to flag that they have left Ireland
    3. exclude persons who were born after 1 April 2021
    4. exclude persons who died on or before 1 April 2021

    This population count is estimated at 5.1 million persons.
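
The second alternative rule set can be sketched in a few lines. This is an illustration under stated assumptions rather than the CSO's code: monthly activity is a set of `(year, month)` tuples, "still active" is read as activity in both endpoint months, and the birth and death checks are passed in as a precomputed boolean. All names are hypothetical.

```python
def alt_rule_include(active_months, on_annual_dataset=False,
                     annual_flagged_away=False, passes_birth_death=True):
    """Apply the narrower 12-month rule set to one person (illustrative)."""
    # Exclusions for births after, and deaths on or before, 1 April 2021.
    if not passes_birth_death:
        return False
    # Monthly rule: active in October 2020 and still active in September 2021.
    monthly = (2020, 10) in active_months and (2021, 9) in active_months
    # Annual rule: present and not flagged as non-resident or as having left.
    annual = on_annual_dataset and not annual_flagged_away
    return monthly or annual


# Active at both ends of the window:
kept = alt_rule_include({(2020, 10), (2021, 4), (2021, 9)})
# First active in February 2021, so not active in October 2020:
dropped = alt_rule_include({(2021, 2), (2021, 9), (2021, 12)})
```

The second example shows why this rule set yields a lower count: a recent arrival who would qualify as an incoming migrant under the main rules is not included here unless they also appear on an annual dataset.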

Attribute data

Attribute data provides additional insight into the population and includes key analysis variables such as date of birth, sex and nationality. The attribute data that appears in this publication is sourced from the various administrative data files used to construct the population. It is presented in this report in order to display the scope of what is currently possible in using the National Data Infrastructure to produce census-type outputs. The range of variables available in administrative data sets is currently limited in comparison with what can be produced from a full Census of Population.

Geography

Geography describes the location of persons and dwellings. It can be high level (e.g. country or county) or low level (e.g. Local Electoral Area or Electoral Division). The production of population statistics at lower levels of geography is a key objective of this publication. One of the key strengths of the Census of Population is the ability to produce data for a wide range of geographies every 5 years. The demand from data users for more frequent low-level geography statistics on the population is growing. There will also soon be EU legislation enacted that will require all member states to produce annual population and related attribute data for low levels of geography. This is currently not possible in Ireland outside of a census year.

The key variable in adding geography to the population file in this project is the Eircode PIK. When a person's record has an associated Eircode PIK, it is possible to produce statistics on that person at all levels of geography. Eircode PIKs were assigned by identifying the address associated with the person during the reference period. If no Eircode was associated with that address, a matching exercise was conducted to establish whether there was an Eircode PIK for that person on an alternative dataset. Where no Eircode PIK was found, the CSO matched the address to an address database to allocate an Eircode PIK. In some cases, particularly when addresses were non-unique, it was not possible to allocate an Eircode PIK. In these cases, a Small Area Code was generally allocated. Small Area Codes are the lowest level of geography used in the Census of Population publications, but in the absence of an Eircode PIK they cannot be used to allocate persons to individual dwellings, which is required for producing certain household statistics. For this publication, it was possible to directly assign an Eircode PIK to approximately 80% of the usually resident population. In the remaining 20% of cases a Small Area Code was assigned.
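
The fallback sequence described above can be sketched as a simple cascade. This is a minimal illustration, assuming each lookup is available as a dictionary; the parameter names, level labels and sample values are hypothetical, and real address matching would be considerably more involved.

```python
def assign_geography(person_pik, address, primary_eircodes,
                     alternative_eircodes, address_database, small_areas):
    """Return (level, code) using the first lookup that succeeds."""
    # Step 1: Eircode PIK on the person's reference-period address record.
    code = primary_eircodes.get(person_pik)
    if code:
        return ("eircode_pik", code)
    # Step 2: Eircode PIK for the same person on an alternative dataset.
    code = alternative_eircodes.get(person_pik)
    if code:
        return ("eircode_pik", code)
    # Step 3: match the address text against an address database.
    code = address_database.get(address)
    if code:
        return ("eircode_pik", code)
    # Step 4: fall back to a Small Area Code, e.g. for non-unique addresses.
    code = small_areas.get(address)
    if code:
        return ("small_area", code)
    return ("unassigned", None)


# Hypothetical lookups for three people.
primary = {"pik_01": "epik_A"}
alternative = {"pik_02": "epik_B"}
addresses = {}
areas = {"Main Street, Example Town": "sa_123"}

result_1 = assign_geography("pik_01", "1 Example Road", primary, alternative, addresses, areas)
result_2 = assign_geography("pik_02", "2 Example Road", primary, alternative, addresses, areas)
result_3 = assign_geography("pik_03", "Main Street, Example Town", primary, alternative, addresses, areas)
```

Recording the level alongside the code matters because, as noted above, only records with an Eircode PIK can be allocated to individual dwellings.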

The geocoding exercises focused on a number of administrative data sets. These were:

  • Revenue Commissioners: Local Property Tax Returns
  • Department of Social Protection: Central Records System
  • Department of Social Protection: Pandemic Unemployment Payments
  • Higher Education Authority: Third Level Enrolments
  • Health Service Executive: COVAX Dataset
  • Residential Tenancies Board: Tenancies
  • Student Universal Support Ireland: Further and Higher education grants

These files are regarded as robust sources of geography as they reflect activity close to or in the reference period used in this publication. This means that the address data in them is likely to be up to date. Some administrative data sets received by the CSO contain address information, but not an indicator as to whether the address is up to date or when it was last validated by the public sector body compiling it.

COVAX access is under Section 11. It is important to note that COVAX vaccination data is used for geocoding purposes only and not as an IPEADS activity indicator. The CSO’s use of COVAX data for official statistics is fully governed by CSO data protection protocols. The CSO’s access to sensitive and confidential health records is also underpinned by the written permission of the Minister for Health and provided for under Section 30 of the Statistics Act 1993 - ‘Use of Records of Public Authorities for Statistical Purposes’. This permission has been duly granted by the Minister.

Using the geography information allocation described here allowed for assignment of a Small Area Code to 99.9% of the usually resident population count. However, data has not been produced at Electoral Division or Small Area level in this publication, as it was deemed insufficiently robust. As coverage of Eircode on administrative data files received by the CSO increases over time, it should become possible to produce reliable population statistics at Small Area level with improved family coding. Eircode coverage on administrative data sets improves when public sector bodies collect the Eircode whenever members of the public provide them with address information.