CSO Frontier Series outputs may use new methods which are under development and/or data sources which may be incomplete, for example new administrative data sources. Particular care must be taken when interpreting the statistics in this release.
Learn more about CSO Frontier Series outputs.
This new report, ‘Irish Population Estimates from Administrative Data Sources’ (IPEADS), aims to estimate the usually resident population of Ireland in April 2020. Part of the CSO Frontier Series of outputs, it uses pseudonymised administrative data from public sector bodies to produce experimental estimates of the population of Ireland, with breakdowns by sex, age, nationality, marital status and economic status at three levels of geography: State, County and Electoral Division.
The quality of administrative population counts ultimately depends on the quality, relevance and availability of administrative data to the CSO. Note that individual records on administrative data sets were not checked or corrected. The purpose of this project is to demonstrate the potential for administrative data to be used to produce detailed annual demographic statistics.
Protected Identifier Keys
Before using personal administrative data for statistical purposes, the CSO removes all identifying personal information. This includes the Personal Public Service Number (PPSN), a unique number used by people in Ireland to access social welfare benefits, personal taxation and other public services. A pseudonymised Protected Identifier Key (PIK) is created by the CSO when the PPSN is removed. This PIK is unique and non-identifiable and is only used by the CSO.
Using the PIK enables the CSO to link and analyse data for statistical purposes, while protecting the security and confidentiality of the individual data. All records in the matched datasets are pseudonymised and the results are in the form of statistical aggregates which do not identify any individuals.
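This section does not say how a PIK is derived from a PPSN. A common way to build such a key is a keyed hash, so the sketch below assumes HMAC-SHA256 with a secret key known only to the organisation creating the keys; this is an illustrative choice, not the CSO's documented method. It shows the two properties the linkage relies on: the same PPSN always produces the same PIK, and the PPSN itself is removed from the output record. The key, the field names `ppsn` and `pik`, and the PPSN values are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in a real system it would be stored securely
# and never released outside the organisation that creates the keys.
SECRET_KEY = b"hypothetical-secret-key"


def make_pik(ppsn: str, key: bytes = SECRET_KEY) -> str:
    """Map a PPSN to a pseudonymised key (assumed HMAC-SHA256 scheme)."""
    # Normalise formatting so the same PPSN always yields the same key.
    normalised = ppsn.strip().upper().replace(" ", "")
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise(records: list[dict]) -> list[dict]:
    """Replace the 'ppsn' field with a 'pik' field, dropping the identifier."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k != "ppsn"}
        clean["pik"] = make_pik(rec["ppsn"])
        out.append(clean)
    return out
```

Because two datasets pseudonymised with the same key produce matching PIKs for the same person, they can be joined on `pik` without either dataset carrying the PPSN.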
The linkage and analysis were undertaken by the Central Statistics Office (CSO) for statistical purposes in line with the Statistics Act, 1993 and the CSO Data Protocol. The related transparency notice is available here.
The CSO is committed to broadening the range of high-quality information it provides on societal and economic change. The large increase in the volume and variety of secondary data in recent years poses a range of challenges and opportunities for national statistical institutes. Joining secondary data sources safely across public service bodies, while adhering to statistical and data protection legislation, can provide new analysis and outputs to support decision-making and accountability in a way that is not possible using discrete datasets. Furthermore, a coordinated approach to data integration can lead to cost savings, greater efficiency and a reduction in duplication.
The CSO has a formal role in coordinating the integration of statistical and administrative data across public service bodies that together make up the Irish Statistical System (ISS). Underpinning this integration is the development of a National Data Infrastructure (NDI) – a platform for linking data across the administrative system using unique identifiers for individuals, businesses and locations. The data linking for statistical purposes is carried out by the CSO on pseudonymised datasets using only those variables which are relevant to the research being undertaken. A strong focus on data integration, which requires the collection and storage of identifiers such as PPSN and Eircodes, is a priority of the ISS in its goal of improving the analytical capacity of the system.
Data protection is a core principle of the CSO and is central to the development of the NDI. As well as the strict legal protections set out in the Statistics Act, 1993, and other existing regulations, we are committed to ensuring compliance with all data protection requirements. These include the Data Sharing and Governance Act (2019) and the General Data Protection Regulation (GDPR, EU 2016/679).
Children included in IPEADS are identified using the following data sources:
Child Benefit (CB)
The Child Benefit dataset contains information on Child Benefit payments made to the parents/guardians of eligible children. Data is supplied by the Department of Social Protection on an annual basis. The CRS Client file (see Central Records System (CRS) below) is used to identify children born in the year prior to the reference date for whom Child Benefit is not yet being paid.
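The rule above combines two sources: children already on the Child Benefit file, plus recent births from the CRS that are not yet on it. A minimal sketch follows, assuming "the year prior to the reference date" means the 12 months immediately before it (the text could also mean the previous calendar year) and assuming hypothetical CRS fields `pik` and `date_of_birth`.

```python
from datetime import date


def one_year_before(d: date) -> date:
    """Same day one year earlier, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # 29 February has no counterpart in the previous year
        return d.replace(year=d.year - 1, day=28)


def child_piks(cb_piks: set[str], crs_clients: list[dict], reference_date: date) -> set[str]:
    """Children on Child Benefit plus recent CRS births not yet on Child Benefit.

    Assumes each CRS record is a dict with hypothetical keys 'pik' and
    'date_of_birth', and a 12-month window ending at the reference date.
    """
    start = one_year_before(reference_date)
    newborns = {
        c["pik"]
        for c in crs_clients
        if start <= c["date_of_birth"] < reference_date and c["pik"] not in cb_piks
    }
    return cb_piks | newborns
```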
Primary Pupils Database (POD)
The Primary Pupils Database contains data, collected by the Department of Education, on each pupil enrolled in each recognised primary school. Data is supplied on an annual basis.
Post-Primary Pupils Database (PPPDB)
The Post-Primary Pupil Database is currently the only national archive of student enrolment at post-primary schools. Individual and personal data on each student enrolled in each recognised post-primary school are collected by the Department of Education. Data is supplied on an annual basis.
Primary Care Reimbursement Service (PCRS – GMS)
The PCRS is responsible for making payments to healthcare professionals (doctors, dentists, pharmacists and optometrists/ophthalmologists) for the free or reduced-cost services they provide to the public across a range of community health schemes. The PCRS is the infrastructure through which the HSE delivers a significant proportion of Primary Care to the public. PCRS also manages the National Medical Card Unit (NMCU), which was established in 2011 to process all Medical Card and GP Visit Card applications at a national level. Data is supplied by the HSE on an annual basis.
Students included in IPEADS are identified using the following data sources:
Higher Education Authority (HEA)
The Higher Education Authority data provides details on annual enrolments and graduations from the publicly funded universities and institutes of technology in Ireland. Data is supplied by the HEA on an annual basis.
Programme Learner Support System (PLSS)
The Programme Learner Support System is used by SOLAS (an tSeirbhís Oideachais Leanúnaigh agus Scileanna) to manage course information, learner records and reporting. SOLAS is the Further Education and Training Authority and provides a clear, integrated pathway for learners seeking to enrol in Further Education and Training. Data is supplied by SOLAS on an annual basis.
Quality and Qualifications Ireland (QQI)
Quality and Qualifications Ireland was formed by amalgamating the former Further Education and Training Awards Council (FETAC), the Higher Education and Training Awards Council (HETAC), the Irish Universities Quality Board (IUQB) and the National Qualifications Authority of Ireland (NQAI). Data is supplied on an annual basis.
Student Universal Support Ireland (SUSI)
The Student Universal Support Ireland dataset contains funding information for all higher and further education grants. SUSI offers funding to eligible students in approved full-time, third-level education. Data is supplied on an annual basis.
HEA Springboard
The HEA Springboard and ICT dataset provides information on students who have undertaken HEA Springboard or ICT courses, including course details and basic demographic information for enrolled students. Data is supplied by the HEA on an annual basis.
Employees, pensioners and persons in receipt of welfare payments included in IPEADS are identified using the following data sources:
DSP Payments (DSP)
The Department of Social Protection’s real-time data from the Business Object Model implementation (BOMi) and the Integrated Short-Term Payments System (ISTS) contains information on welfare payments, including State Pension, unemployment benefit and Child Benefit (adults only). Data is supplied monthly.
PAYE Modernisation (PMOD)
The Revenue Commissioners’ PAYE Modernisation (PMOD) dataset contains information on payslip submissions of persons in employment and on occupational pensions from 2019 onwards. Data is supplied monthly.
Self-employed persons included in IPEADS are identified using the following data source:
Form 11 Income Tax returns (ITForm11)
The ITForm11 dataset contains the annual income tax returns of the self-employed. Because of the nature of self-assessment, data for a calendar year is only complete three years after the reference year, although the majority of records are available about 14 months after the reference year.
Linking the administrative datasets on a pseudonymised version of the PPSN (the PIK), used as a unique identifier, ensures that persons who appear in more than one data source are counted only once in the population.
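The deduplication described above amounts to taking the union of PIKs across sources. The sketch below shows this with hypothetical source names and PIK values: it records which sources each person appears on, and the population count is the number of distinct PIKs rather than the total number of records.

```python
from collections import defaultdict


def link_sources(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each PIK to the set of data sources on which it appears.

    'sources' maps a (hypothetical) source name to the PIKs found on it.
    Each person appears once in the result however many sources list them.
    """
    presence: defaultdict[str, set[str]] = defaultdict(set)
    for source_name, piks in sources.items():
        for pik in piks:
            presence[pik].add(source_name)
    return dict(presence)
```

For example, with PIKs `p1` on PMOD and HEA, `p2` on PMOD and DSP, and `p3` on DSP only, there are five source records but three people; `len(link_sources(...))` gives 3.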
Data sources used to assign geography and other attribute variables:
Residential Tenancies Board (RTB) Register
The Residential Tenancies Board register contains information on all tenancies registered by landlords, both private and Approved Housing Bodies (AHB). Data is supplied by the RTB on a quarterly basis.
Local Property Tax (LPT)
The LPT file contains one record (the most recent LPT return) for each property in the State. An LPT return is not used as an indicator of activity; it is used, together with other sources, to determine location. Data is supplied by the Revenue Commissioners on an annual basis.
Central Records System (CRS)
The Central Records System is a legacy system within the Department of Social Protection (DSP) that holds data on DSP customers drawn from different systems within the Department. Data from the CRS used in this analysis includes information on age, sex, address, nationality and relationships (for example dependent children and marital status). Data is supplied by the DSP on a quarterly basis.
Address data from the HEA dataset records students' usual place of residence outside term time and is used to assign up-to-date, off-campus geography to students.
Note: some individuals who are administratively active may not live in the State, for example professionals who commute to work from Northern Ireland or individuals living abroad who receive a State pension. Persons are excluded from the population count where the administrative data sources listed above provide an indicator that they are usually resident outside Ireland.
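The exclusion rule above can be expressed as a set difference: anyone flagged as resident outside Ireland on any source is removed from the administratively active population. The per-source flag sets below are hypothetical; the text does not say which indicators each source provides.

```python
def usually_resident(active_piks: set[str], abroad_flags: dict[str, set[str]]) -> set[str]:
    """Remove persons flagged as resident outside Ireland on any source.

    'abroad_flags' maps a (hypothetical) source name to the PIKs that source
    flags as living abroad; a flag on any one source is enough to exclude.
    """
    flagged = set().union(*abroad_flags.values())
    return active_piks - flagged
```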
Information about where people live is contained in many of the administrative data sources used to derive the estimated population count. The quality of this location information varies between sources and depends heavily on the coverage and accuracy of the address information on each dataset. Where available, geocoded location data was first sourced from the RTB, LPT and HEA datasets. If good-quality geocoded location data was not available on these datasets, data from the CRS was used.
The presence of an accurate Eircode with an address on administrative data sources significantly enhances the statistical value of that address for publications such as this one. It facilitates data linkage and more accurate spatial analysis.
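The source-priority rule for location can be sketched as an ordered fallback. The text names RTB, LPT and HEA as the preferred sources and the CRS as the fallback, but does not rank the first three among themselves, so the order below is an assumption, as is the hypothetical `good_geocode` quality flag on each candidate location.

```python
from typing import Optional

# Assumed order of preference. Only "RTB/LPT/HEA before CRS" comes from the
# text; the ranking among RTB, LPT and HEA is illustrative.
SOURCE_PRIORITY = ["RTB", "LPT", "HEA", "CRS"]


def assign_location(candidates: dict[str, dict]) -> Optional[tuple[str, dict]]:
    """Return (source, location) from the highest-priority usable source.

    'candidates' maps a source name to a location dict. Primary sources are
    used only if their (hypothetical) 'good_geocode' flag is set; the CRS
    address is accepted as the fallback whenever it is present.
    """
    for source in SOURCE_PRIORITY:
        loc = candidates.get(source)
        if loc is None:
            continue
        if source == "CRS" or loc.get("good_geocode"):
            return source, loc
    return None  # no location available from any source
```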
Approximately 60% of the person records in this publication had Eircodes associated with them in the administrative data sets from which they were sourced. For the remaining records, the address data in administrative data sources was matched against the national address database to add an Eircode. This database contains the geographical details of over 2.3 million residential and commercial address points across the State. When this was not possible, primarily in the case of non-unique addresses, a Small Area code was added to the record in lieu of an Eircode.
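The three-step Eircode assignment described above (keep an existing Eircode, otherwise match the address to the national address database, otherwise fall back to a Small Area code) can be sketched as follows. The address normalisation, the dictionaries standing in for the national address database and the Small Area lookup, and all codes in the example are hypothetical; real address matching is considerably more involved than an exact string lookup.

```python
from typing import Optional


def normalise(address: str) -> str:
    """Crude normalisation: upper-case, treat commas as spaces, collapse spaces."""
    return " ".join(address.upper().replace(",", " ").split())


def assign_location_code(
    record: dict,
    address_index: dict[str, list[str]],
    small_area_index: dict[str, str],
) -> tuple[str, Optional[str]]:
    """Return ('eircode', code) or ('small_area', code) for one person record."""
    # 1. Keep an Eircode already present on the administrative record.
    if record.get("eircode"):
        return "eircode", record["eircode"]
    # 2. Otherwise match the address against the (hypothetical) address index.
    key = normalise(record.get("address", ""))
    matches = address_index.get(key, [])
    if len(matches) == 1:
        return "eircode", matches[0]
    # 3. No unique match (e.g. a non-unique address): use the Small Area code.
    return "small_area", small_area_index.get(key)
```

Requiring exactly one match before assigning an Eircode reflects the point in the text that non-unique addresses are the main reason for falling back to a Small Area code.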
Population totals at Electoral Division and Small Area level are not provided as part of this publication as they are not yet deemed to be of sufficient quality. Improved coverage of up-to-date Eircodes on the administrative datasets received by the CSO will facilitate the production of more robust population statistics at the lowest levels of geography. See also Methodology.
As outlined in the Population Estimates chapter, data on nationality is derived from information collected by the DSP. For many individuals this data may have been collected several years ago and, in some cases, people may no longer identify with the nationality recorded here. In particular, many people have become Irish citizens by naturalisation in the last 10 years or so. This more recent status may not be reflected in these statistics and may partly explain why the figures in this report differ from those reported elsewhere, such as in the census. Passport and citizenship data could help improve the currency of the nationality statistics produced in this publication.
Nationality is classified using the international standard for country codes (ISO 3166) and presented as follows:
Irish | French | Polish | American (US) | Nigerian |
UK | German | Portuguese | Australian | Pakistani |
Bulgarian | Hungarian | Romanian | Brazilian | South African |
Croatian | Italian | Slovak | Chinese | Other nationalities |
Czech | Latvian | Spanish | Filipino | Not stated |
Dutch | Lithuanian | Other EU28 | Indian | All nationalities |
It can be challenging to ascertain a person’s ‘principal’ economic status (PES) from administrative data. In this publication, a person’s economic status is generally determined by the administrative dataset(s) on which they are active around the reference period. For example, a person with regular monthly activity recorded on PMOD will usually be assigned an economic status of ‘at work’.
Where persons appear on multiple datasets around the reference period, economic status is assigned using a hierarchy of criteria. For example, a student at university who also works will be assigned an economic status of ‘Student or pupil’.
Persons receiving the Pandemic Unemployment Payment (PUP) who were in employment in the months before the COVID-19 pandemic are assigned a principal economic status of ‘at work’.
An economic status of ‘unemployed’ is assigned to persons in receipt of an unemployment payment from the DSP around the reference period.
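The hierarchy described above can be sketched as an ordered set of rules, where the first rule that applies wins. Only some of the ordering is stated in the text (student above at work; PUP with prior employment counted as at work; an unemployment payment giving unemployed). The placement of ‘at work’ above ‘unemployed’, the treatment of PUP without prior employment as unemployed, the ‘other’ fallback, and the boolean field names are all assumptions for illustration, not the full IPEADS hierarchy.

```python
def principal_economic_status(activity: dict) -> str:
    """Assign a principal economic status from hypothetical activity flags.

    Expected (hypothetical) boolean keys: in_education, regular_pmod, pup,
    employed_pre_pandemic, unemployment_payment. Rules are checked in order.
    """
    # Stated in the text: students who also work are 'Student or pupil'.
    if activity.get("in_education"):
        return "Student or pupil"
    # Stated: regular monthly PMOD activity usually means 'at work'.
    if activity.get("regular_pmod"):
        return "at work"
    # Stated: PUP recipients employed before the pandemic count as 'at work'.
    if activity.get("pup") and activity.get("employed_pre_pandemic"):
        return "at work"
    # Stated for unemployment payments; extending it to other PUP
    # recipients is an assumption.
    if activity.get("unemployment_payment") or activity.get("pup"):
        return "unemployed"
    return "other"  # assumed catch-all for statuses not covered here
```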
Note: The official labour force and unemployment estimates are published in the Labour Force Survey (LFS). The principal economic status concept reported in the LFS is derived from a person’s subjective assessment of their own economic status. The LFS also uses the International Labour Organization (ILO) definition of unemployment, which is derived by asking respondents a specific set of questions. It is not possible to glean the same information from administrative data. Because the underlying methodologies for producing PES in the LFS/Census and in IPEADS are very different, the figures in these publications will differ.
The economic sector classification (NACE) is based on the ‘Statistical Classification of Economic Activities in the European Community, Rev. 2 (2008)’, which can be accessed on the Eurostat website. NACE groups organisations according to their business activities. NACE codes have been assigned to persons at work using data from the PAYE Modernisation (PMOD) dataset or, for self-employed persons, from Form 11 Income Tax returns (ITForm11).
The data sources for this analysis were selected based on their capacity to meet the following criteria:
The administrative data sources contributing to this project vary in respect of the criteria listed here. The administrative data landscape is constantly developing. The CSO will continue to assess the quality and availability of existing and new data sources as they become available for inclusion and further development of this project.
There are several different ways to undertake the data collection process in a population and housing census. Table 4.1 outlines the range of census data collection methods common throughout UNECE countries. Although the census in Ireland is primarily traditional, Ireland is classified as combined in the UNECE census wiki summary table below because the census uses the national address database and some administrative data for the purposes of imputation.
Table 4.1 Censuses of the 2020 round: plans and practices of UNECE countries for the population and housing censuses of the 2020 round
Main Census Type | Description | Countries |
---|---|---|
Register-Based | Register-based census: based on data from registers and administrative sources, with no field data collection; may also include data from existing surveys not conducted for census purposes | Austria, Belgium, Denmark, Iceland, Latvia, Lithuania, Netherlands, Norway, Spain, Sweden, Turkey, Finland, Slovenia |
Combined | Combined census: some data obtained directly from registers or administrative sources, while other data are collected through field data collection conducted specifically for census purposes, covering the whole population or only a sample | Armenia, Belarus, Czechia, Estonia, Germany, Hungary, Israel, Luxembourg, North Macedonia, Poland, Romania, Slovakia, Ireland, Liechtenstein, Italy, Switzerland |
Traditional | Traditional census: full field enumeration, whether in person or online; registers and administrative sources may be used to support the enumeration but not directly to obtain census data | Albania, Azerbaijan, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Georgia, Greece, Kazakhstan, Kyrgyzstan, Malta, Mexico, Monaco, Montenegro, Portugal, Republic of Moldova, Russian Federation, Serbia, United Kingdom, United States of America, Uzbekistan, Canada, Ukraine, Tajikistan, Turkmenistan |
Rolling | Rolling census: based on cumulative continuous sample survey covering whole country over an extended period | France |
Source: UNECE Censuses of the 2020 round