CSO Frontier Series outputs may use new methods which are under development and/or data sources which may be incomplete, for example new administrative data sources. Particular care must be taken when interpreting the statistics in this release.
Learn more about CSO Frontier Series outputs.
This section gives an overview of the methodological approach taken to producing population estimates from administrative data. The results presented in this release are based on linking administrative data sets from a range of public service bodies, which are listed in the Background Notes. A series of rules is then applied to decide who should be included in or excluded from the population.
This project involves creating an estimate of the number of people usually resident in Ireland in April 2022. The rationale for this approach is that almost every person who usually lives in Ireland has some level of interaction with the State, either directly or indirectly through a spouse or dependant, for example through taxation, benefit or pension payments, or enrolment in education. Administrative data records from other government bodies allow the CSO to identify all persons who interacted with the State around April 2022. This interaction is then used as a proxy for being present in the country at that time.
Before using personal administrative data for statistical purposes, the CSO removes all identifying personal information. This includes name, address, the Personal Public Service Number (PPSN) and the Eircode. The PPSN is a unique number used by people in Ireland to access social welfare benefits, personal taxation and other public services; the Eircode is a unique geographical code identifying the location of every dwelling in the State. A pseudonymised Protected Identifier Key (PIK) is created by the CSO when the PPSN is removed. This PIK is unique and non-identifiable and is only used by the CSO. A similar PIK is created when the Eircodes are removed from a person's records.
Using these PIKs enables the CSO to link and analyse data for statistical purposes, while protecting the security and confidentiality of the individual data. All records in the matched datasets are pseudonymised and the results are in the form of statistical aggregates which do not identify any individuals.
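The release does not describe how a PIK is generated. Purely as an illustration of the general idea, the sketch below derives a stable pseudonym from an identifier using a keyed hash (HMAC-SHA256) and drops the directly identifying fields. The key, the field names and the sample identifier values are all hypothetical, and this should not be read as the CSO's actual procedure.

```python
import hashlib
import hmac

# Hypothetical secret key. In any real system it would be stored securely
# and kept separate from the data; this value is a placeholder.
SECRET_KEY = b"example-key-not-real"


def make_pik(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    normalised = identifier.strip().upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise(record: dict) -> dict:
    """Replace the PPSN and Eircode with PIKs and drop name and address."""
    eircode = record.get("eircode")
    return {
        "person_pik": make_pik(record["ppsn"]),
        "eircode_pik": make_pik(eircode) if eircode else None,
        # Only non-identifying analysis variables are carried forward
        # (hypothetical field name).
        "year_of_birth": record.get("year_of_birth"),
    }


# Made-up sample record; the identifier values are not real.
sample = {
    "name": "Example Person",
    "address": "1 Example Street",
    "ppsn": "0000000A",
    "eircode": "X00 XX00",
    "year_of_birth": 1990,
}
print(sorted(pseudonymise(sample).keys()))
```

Because the same identifier always maps to the same pseudonym under a given key, records for one person can still be linked across datasets without the original PPSN being present.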
The first step taken in this project was to identify available data sources that contain people who are resident in Ireland based on their interactions with public sector bodies. A preliminary analysis of administrative data available to the CSO was carried out and multiple suitable datasets were identified. The following key requirements needed to be met:
The approach taken was to examine cohorts of the population and assess what available data could be used to ensure persons active in that cohort were included in the population count. The cohorts were broadly age-based and consisted of the following groups:
Many persons belong to more than one of these cohorts, but this does not affect their inclusion in the population. Linkage across multiple administrative datasets through the pseudonymised version of the PPSN used as a unique identifier ensures that persons who appear in more than one data source will be counted only once in the population.
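As a minimal sketch of why linkage on a single pseudonymised identifier prevents double counting, the example below combines several datasets and keys the result on the PIK, so that each person appears once however many sources they occur in. The dataset names and PIK values are invented for illustration.

```python
from collections import defaultdict


def build_raw_population(datasets: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each PIK to the set of dataset names in which it appears."""
    sources_by_pik: dict[str, set[str]] = defaultdict(set)
    for name, piks in datasets.items():
        for pik in piks:
            sources_by_pik[pik].add(name)
    return dict(sources_by_pik)


# Invented example data: pik_b and pik_c appear in more than one source,
# and pik_c appears twice within the same source.
datasets = {
    "employment": ["pik_a", "pik_b"],
    "education": ["pik_b", "pik_c"],
    "welfare": ["pik_c", "pik_c", "pik_d"],
}

population = build_raw_population(datasets)
print(len(population))  # 4 distinct persons
```

Retaining the set of contributing sources for each PIK also makes it possible to apply different inclusion rules depending on which datasets a person appears in.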
One of the principal challenges when deriving an administrative population count is the timing with which the various administrative data sets become available to the CSO. This was a key factor in deciding the ‘reference date’ for the population estimates, which is the specific time period for which the population is being estimated. In order to optimise the value of the estimates, it is necessary to minimise the time lag between the reference date and the publication of the population estimates. ‘Real time’ data sources, such as the PMOD dataset received from the Revenue Commissioners and the DSP Payments file from the Department of Social Protection, are available to the CSO within weeks of being created, whilst some other sources are available with a time lag of up to 14 months. An April 2022 reference date was selected to maximise the use of available datasets. The changing landscape in respect of timeliness and access to administrative data sources will determine any changes in methodology in future iterations of these estimates. The full list of datasets used in this publication is in the Background Notes.
After the datasets covering the various population cohorts were identified, the next step was to combine them to create a ‘raw’ population file containing all persons who had any interaction with public sector bodies in a time period around April 2022. This resulted in a file of approximately 5.5 million people. In order to refine this figure to those persons who usually lived in Ireland and exclude persons who may have only been in the country temporarily, it was then necessary to create the ‘usually resident’ population.
Usual residence is a widely used concept in demographic statistics. A person’s usual residence is defined as follows:
The place where a person normally spends their daily period of rest, regardless of temporary absences for purposes of recreation, holidays, visits to friends and relatives, business, medical treatment or religious pilgrimage.
The following qualifications apply to usual residence.
Only the following persons shall be considered to be usual residents of a specific geographical area in question:
Where the circumstances described in point (i) or (ii) cannot be established, "usual residence" shall mean the place of legal or registered residence.
In surveys and censuses, the concept of usual residence can be established through direct questions. This is not possible using administrative data. Instead, usual residence is established through patterns of interaction with public sector bodies. Persons who display continuous interaction with the State, as recorded in administrative data, for 12 months around a reference period are deemed to be usually resident. The rules used to determine usual residence in this publication are outlined in the section below.
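The idea of continuous interaction around a reference period can be sketched as follows. This is one possible reading only: it assumes monthly activity records and treats a person as usually resident if they have an unbroken run of at least 12 active months that includes the reference month. The function names and that interpretation are assumptions for illustration; the rules actually applied in this publication vary by dataset.

```python
def month_index(year: int, month: int) -> int:
    """Convert a (year, month) pair to a single running month number."""
    return year * 12 + (month - 1)


def has_continuous_activity(active_months, reference=(2022, 4), run_length=12):
    """Return True if the unbroken run of active months containing the
    reference month is at least `run_length` months long (assumed rule)."""
    ref = month_index(*reference)
    active = {month_index(y, m) for y, m in active_months}
    if ref not in active:
        return False
    # Extend the run backwards and forwards from the reference month.
    start = ref
    while start - 1 in active:
        start -= 1
    end = ref
    while end + 1 in active:
        end += 1
    return end - start + 1 >= run_length


# Active every month from October 2021 to September 2022 (12 months).
steady = [(2021, m) for m in range(10, 13)] + [(2022, m) for m in range(1, 10)]
# Active only from January 2022 to May 2022 (5 months).
short_stay = [(2022, m) for m in range(1, 6)]

print(has_continuous_activity(steady))      # True
print(has_continuous_activity(short_stay))  # False
```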
The population concept used in the Census of Population is the ‘De facto’ population. This is defined as all persons present in a country on a specific night, irrespective of where they usually live. This may mean that persons who usually live outside a country but are temporarily present in it on the night of the census are included, whereas persons who usually live in the country but are temporarily abroad are excluded. It is not possible to produce a de facto population count from administrative data.
The administrative data used in this report was not initially created for measuring the population, but rather for service delivery and the day-to-day operations of public bodies. However, activity in administrative data can be a sign of presence in the State. A set of rules was developed to align the population as closely as possible to the definition of usual residence.
Using a reference period of April 2022, rules were applied to include or exclude individuals. These rules varied depending on how frequently the CSO receives each dataset. Some datasets, for example the Primary Online Database of primary school children, are received annually by the CSO. Other datasets, for example the PMOD dataset, are received monthly. Access to more frequent datasets allows more detailed rules to be applied in deciding whether a person was usually resident in April 2022. Details on the frequency of receipt of each dataset used in this project are provided in the Background Notes.
The key rule applied to annual datasets is that all persons appearing on these datasets are counted as a usual resident unless:
There are several key rules used to decide whether persons appearing on monthly datasets are counted as usually resident. It should be noted that the months under consideration to decide usual residence are determined by the completeness and timeliness of datasets and to allow for publication approximately 12-14 months after the reference period. For this project the months included were January 2021 to December 2022. Monthly datasets include the PMOD file from the Revenue Commissioners and the DSP Payments file from the Department of Social Protection.
There are four key groups of people who we are trying to identify in the monthly datasets. The four groups and their criteria for inclusion or exclusion in the usually resident population are outlined below.
In cases where persons flagged as potential outgoing migrants or seasonal workers also appear on annual datasets, they are included in the population. For example, a third-level student who was also employed for a 5-month period would be included in the population.
Two further universal rules were applied to all datasets. These were:
Information about births and deaths and the timing of them was also available to the CSO through administrative data sources.
The application of these rules to the ‘raw’ population resulted in the ‘usually resident’ population figure of 5.33 million.
The rules outlined above represent a subjective assessment by the CSO of how to derive the usually resident population from the administrative data sources currently available to the office. They have been devised based on a range of considerations including the variables available on the datasets, the frequency of their creation by the source department or agency and the timeliness of their receipt by the CSO. It is possible that alternative sets of rules could be used, which would result in different population figures at State and sub-State levels. For illustrative purposes, the CSO has applied two different rule sets to demonstrate alternative ways in which the usually resident population could be interpreted.
This population count is estimated at over 5.5 million persons.
Narrowing the 12-month activity period to strictly within the reference period April 2022. This alternative rule set relies on real-time, monthly administrative data as follows:
Attribute data provides additional insight into the population and includes key analysis variables such as date of birth, sex and nationality. The attribute data that appears in this publication are sourced from the various administrative data files used to construct the population. They are presented in this report in order to display the scope of what is currently possible in using the National Data Infrastructure to produce census-type outputs. The range of variables available in administrative data sets is currently limited in comparison with what can be produced from a full Census of Population.
Geography describes the location of persons and dwellings. It can be high level (e.g. country or county) or low level (e.g. Local Electoral Area or Electoral Division). The production of population statistics at lower levels of geography is a key objective of this publication. One of the key strengths of the Census of Population is the ability to produce data for a wide range of geographies every 5 years. The demand from data users for more frequent low-level geography statistics on the population is growing. There will also soon be EU legislation enacted that will require all member states to produce annual population and related attribute data for low levels of geography. This is currently not possible in Ireland outside of a census year.
The key variable in adding geography to the population file in this project is the Eircode PIK. When a person's record has an associated Eircode PIK, it is possible to produce statistics on that person at all levels of geography. Eircode PIKs were assigned by identifying the address with the latest administrative activity associated with the person. If that address had an associated Eircode PIK, it was assigned to the person. If no Eircode PIK was associated with the address, a matching exercise was conducted to establish whether there was an Eircode PIK for that person on an alternative dataset. Where no Eircode PIK was found, the CSO matched the address to an address database to allocate an Eircode PIK. In some cases, particularly when addresses were non-unique, it was not possible to allocate an Eircode PIK. In these cases, a Small Area Code was generally allocated. Small Area Codes are the lowest level of geography used in the Census of Population publications, but in the absence of an Eircode PIK they cannot be used to allocate persons to individual dwellings, which is required for producing certain household statistics. For this publication, it was possible to directly assign an Eircode PIK to over 90% of the usually resident population, and the matching exercise brought this percentage close to 99%. In the remaining cases, a Small Area Code was assigned.
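The sequence of geography assignment steps described above can be sketched as a simple fallback chain. The field name, the lookup functions and the sample codes below are hypothetical stand-ins for the matching exercises, which the release does not specify in detail.

```python
from typing import Callable, Optional, Tuple

# A lookup takes a person record and returns a code, or None if no match.
Lookup = Callable[[dict], Optional[str]]


def assign_geography(
    person: dict,
    alternative_dataset: Lookup,
    address_database: Lookup,
    small_area: Lookup,
) -> Tuple[str, Optional[str]]:
    """Return (level, code) using the first step that yields a result."""
    # Step 1: Eircode PIK already attached to the most recently active address.
    code = person.get("latest_address_eircode_pik")
    if code:
        return ("eircode_pik", code)
    # Step 2: Eircode PIK for the person on an alternative dataset.
    # Step 3: match the address text against an address database.
    for lookup in (alternative_dataset, address_database):
        code = lookup(person)
        if code:
            return ("eircode_pik", code)
    # Step 4: fall back to a Small Area Code.
    code = small_area(person)
    if code:
        return ("small_area", code)
    return ("unassigned", None)


# Invented lookup tables keyed on a person PIK or an address string.
alt = {"pik_b": "eir_002"}
addr_db = {"2 Example Road": "eir_003"}
sa = {"Main Street": "sa_104"}

lookups = (
    lambda p: alt.get(p["person_pik"]),
    lambda p: addr_db.get(p.get("address", "")),
    lambda p: sa.get(p.get("address", "")),
)

people = [
    {"person_pik": "pik_a", "latest_address_eircode_pik": "eir_001"},
    {"person_pik": "pik_b", "address": "Unknown"},
    {"person_pik": "pik_c", "address": "2 Example Road"},
    {"person_pik": "pik_d", "address": "Main Street"},
]
results = [assign_geography(p, *lookups) for p in people]
print(results)
```

Recording the level at which each person was geocoded keeps it clear which records can support dwelling-level household statistics and which can only be placed in a Small Area.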
The key focus of the geocoding exercises was on a number of administrative data sets. These were:
These files are regarded as robust sources of geography as they reflect activity close to or in the reference period used in this publication. This means that the address data in them is likely to be up to date. Some administrative data sets received by the CSO contain address information, but not an indicator as to whether the address is up to date or when it was last validated by the public sector body compiling it.
COVAX access is under Section 11. It is important to note that COVAX vaccination data is used for geocoding purposes only and not as an IPEADS activity indicator. The CSO’s use of COVAX data for official statistics is fully governed by CSO data protection protocols. The CSO’s access to sensitive and confidential health records is also underpinned by the written permission of the Minister for Health and provided for under Section 30 of the Statistics Act 1993 - ‘Use of Records of Public Authorities for Statistical Purposes’. This permission has been duly granted by the Minister.
Using the geography information allocation described here allowed for assignment of a Small Area Code to 99.9% of the usually resident population count. As coverage of Eircode on administrative data files received by the CSO increases over time, it should become possible to produce reliable population statistics at Small Area level with improved family coding. Eircode coverage on administrative data sets improves when public sector bodies collect the Eircode whenever members of the public provide them with address information.