Methodology

CSO statistical release, , 11am
A CSO Frontier Series Output

This publication is categorised as a CSO Frontier Series Output. Particular care must be taken when interpreting the statistics in this release, as it may use new methods that are under development and/or data sources that may be incomplete, for example new administrative data sources.

Cohort Definitions

The primary data sources for the analysis in this research paper are the HEA database of annual higher education enrolments, the SOLAS Programme Learner Support System (PLSS) database, which contains details of further education activity for each calendar year, and the SUSI database, which contains information on financial supports for education. The term 'student' is used to describe an individual in higher education and 'learner' describes an individual undertaking further education.

Pseudonymisation 

Before using personal administrative data for statistical purposes, the CSO removes all identifying personal information. This includes name, address, the Personal Public Service Number (PPSN) and the Eircode. The PPSN is a unique number used by people in Ireland to access social welfare benefits, personal taxation and other public services; the Eircode is a unique geographical code identifying the location of every dwelling in the State. When the PPSN is removed, the CSO creates a pseudonymised Protected Identifier Key (PIK), referred to as the CSOPPSN. This PIK is unique and non-identifiable and is only used by the CSO. A similar PIK, the EircodePIK, is created when Eircodes are removed from an individual's records.

Using these PIKs enables the CSO to link and analyse data for statistical purposes, while protecting the security and confidentiality of the individual data. All records in the matched datasets are pseudonymised and the results are in the form of statistical aggregates which do not identify any individuals. 
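This release does not describe how PIKs are generated. Purely as an illustration of how a pseudonymised key can still support record linkage, the sketch below derives a key with a keyed hash (HMAC-SHA256) under a secret key. This is a swapped-in technique, not the CSO's method, and the function names, normalisation rule, secret key and identifiers are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Tuple


def make_pik(identifier: str, secret_key: bytes) -> str:
    """Derive a pseudonymised key from an identifier (hypothetical scheme).

    The same identifier and key always give the same output, so records can be
    linked, but the identifier cannot be recovered without the secret key.
    """
    # Normalise spacing and case so formatting differences do not break linkage
    normalised = identifier.replace(" ", "").upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link_by_pik(left: Dict[str, dict], right: Dict[str, dict]) -> Dict[str, Tuple[dict, dict]]:
    """Inner-join two datasets that are already keyed by PIK."""
    return {pik: (left[pik], right[pik]) for pik in left.keys() & right.keys()}


# Usage with made-up identifiers: the raw identifier is not stored, only its PIK
key = b"example-secret-held-by-the-statistics-office"
enrolments = {make_pik("1234567T", key): {"course": "BSc"}}
supports = {make_pik("1234567t ", key): {"grant": True}}
linked = link_by_pik(enrolments, supports)
```

Because the key depends on a secret, anyone without it cannot rebuild the mapping from identifiers to PIKs, while datasets pseudonymised with the same key still join on equal PIKs.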

Honours and General Degree New Entrants

New entrants are defined as full-time undergraduate first-year students entering higher education for the first time. A small number of new entrant records have a missing or invalid CSOPPSN and therefore cannot be matched to other administrative data sources. After excluding these records, the remaining new entrants for academic years 2012-2017 are pooled together to ensure large enough numbers in each outcome category. Data on student graduations were available up to and including calendar year 2022, meaning that all students have at least a five-year window to complete their degree course. A small number of the students counted as not completing their degree may still be on course and complete in later years. The final cohort has only one record per individual, which is convenient for data matching and analysis.
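The cohort construction can be sketched in Python, assuming each enrolment record carries a CSOPPSN and the start year of the academic year of entry. The record layout, the `is_valid_key` rule and the function names are hypothetical; the CSO's validity checks are not described in this release.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

LAST_GRADUATION_YEAR = 2022  # graduation data available up to and including 2022


@dataclass(frozen=True)
class NewEntrant:
    cso_ppsn: Optional[str]  # pseudonymised key; None or "" when missing
    entry_year: int          # first year of the academic year, e.g. 2012 for 2012/13


def is_valid_key(key: Optional[str]) -> bool:
    # Hypothetical validity rule: treat missing or blank keys as invalid
    return bool(key and key.strip())


def build_cohort(records: Iterable[NewEntrant], first: int = 2012, last: int = 2017) -> List[NewEntrant]:
    """Drop unmatchable records, pool the entry years and keep one record per person."""
    cohort = {}
    for rec in records:
        if not is_valid_key(rec.cso_ppsn) or not first <= rec.entry_year <= last:
            continue
        cohort.setdefault(rec.cso_ppsn, rec)  # first record seen for a key is kept
    return list(cohort.values())


def completion_window(entry_year: int) -> int:
    """Calendar years of graduation data available after entry."""
    return LAST_GRADUATION_YEAR - entry_year
```

The 2017/2018 entrants have the shortest follow-up, 2022 - 2017 = 5 years, which is where the minimum five-year window comes from.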

Post Leaving Certificate Learners

Since 2016, PLC activity has been recorded on the PLSS database, the centralised system hosted by SOLAS that collects learner data from most SOLAS-funded further education courses. For each calendar year, the records of those who finish a PLC course are submitted to the PLSS database.

Learners often finish more than one PLC course, but for the purposes of this analysis only one record per individual was included. Where an individual finished more than one course in the same year, a single record was selected according to a hierarchy based on the outcome reported in the PLSS, in the following order of preference from highest to lowest: Full completer, Partial completer, Early leaver. Where a learner finished courses in more than one year between 2017 and 2021, the earliest instance was kept.

A number of learner records had a missing or invalid CSOPPSN and therefore could not be matched to other administrative data sources. After excluding these records, the remaining records for 2017-2021 were pooled together to ensure large enough numbers in each outcome category.
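The PLC selection rules can be expressed as a single ordering: earliest year first, then the best outcome within that year. A minimal sketch, assuming a hypothetical record layout with one row per finished course; the class, field and function names are illustrative and are not PLSS field names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Lower rank = preferred outcome, following the hierarchy described above
OUTCOME_RANK = {"Full completer": 0, "Partial completer": 1, "Early leaver": 2}


@dataclass(frozen=True)
class PlcRecord:
    cso_ppsn: Optional[str]  # pseudonymised key; None or "" when missing
    year: int                # calendar year the course finished
    outcome: str             # one of the OUTCOME_RANK keys


def select_one_per_learner(records: Iterable[PlcRecord], first: int = 2017, last: int = 2021) -> Dict[str, PlcRecord]:
    """Keep the earliest year per learner and, within that year, the best outcome."""
    chosen: Dict[str, PlcRecord] = {}
    for rec in records:
        if not rec.cso_ppsn or not first <= rec.year <= last:
            continue  # unmatchable, or outside the pooled years
        key = (rec.year, OUTCOME_RANK[rec.outcome])  # KeyError flags an unexpected outcome
        current = chosen.get(rec.cso_ppsn)
        if current is None or key < (current.year, OUTCOME_RANK[current.outcome]):
            chosen[rec.cso_ppsn] = rec
    return chosen
```

Comparing `(year, rank)` tuples applies both rules at once: a record from an earlier year always wins, and the outcome hierarchy only breaks ties within the same year.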

Dependency for students and learners who did not receive SUSI support

The analysis in this research paper required students aged 23 years and over who did not receive SUSI support to be classified according to their estimated SUSI dependency status.

How does SUSI determine dependency status?

The SUSI dependency status (applicant class) determines which incomes are included in the grant assessment. For those classified as independent, their own income and the income of their spouse or civil partner are included. For those classified as dependent, the assessment is based on their own income plus the income of their parent(s) or legal guardian(s). SUSI uses the following rules to determine dependency status:

  • If the student or learner is under 23 on 1 January of the year of their first point of entry into further or higher education, they are automatically classed as dependent.
  • If the student or learner is 23 years or over on 1 January of the year of their first point of entry into further or higher education, they may be classed as either mature dependent or mature independent.
  • To be classed as mature independent, the student or learner must have been living independently of their parent(s) or legal guardian(s) since October of the year prior to their first point of entry or re-entry into further or higher education, and must provide evidence of this.
  • In the absence of documents to confirm independent residency, they can only apply as mature dependent.
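The dependency rules above can be sketched as a small Python function. This is a simplified illustration, not SUSI's assessment logic: the function and parameter names are hypothetical, re-entry is not modelled separately, and the October residency test is reduced to a single yes/no flag, `evidenced_independent_living`.

```python
from datetime import date


def age_on(reference: date, date_of_birth: date) -> int:
    """Completed years of age on the reference date."""
    before_birthday = (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day)
    return reference.year - date_of_birth.year - int(before_birthday)


def susi_applicant_class(date_of_birth: date, first_entry_year: int, evidenced_independent_living: bool) -> str:
    """Apply the dependency rules as summarised above (simplified sketch)."""
    # Age is taken on 1 January of the year of first entry
    age = age_on(date(first_entry_year, 1, 1), date_of_birth)
    if age < 23:
        return "dependent"
    # Mature applicants need documented independent living since October of the prior year
    if evidenced_independent_living:
        return "mature independent"
    return "mature dependent"
```

For example, someone born on 2 January 1990 who first enters in 2013 is still 22 on 1 January 2013 and is therefore classed as dependent, whereas someone born a day earlier is 23 and is assessed as mature.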

How was dependency estimated for those who did not receive SUSI support?

The following process was implemented to estimate the likely dependency status of those not in receipt of SUSI support:

  • The CSOPPSNs of higher education students aged 23 years and over who received SUSI support were linked with the Census of Population 2016 Analysis dataset. This dataset contains information about the relationship between each student and the person who completed the Census form. This household information was used to estimate whether the students were living independently or with their parent(s) or legal guardian(s) on Census night.
  • An AI model was developed and trained on the characteristics that best predict whether an individual aged 23 years and over who received SUSI support was classed by SUSI as mature independent or mature dependent.
  • The model considers two factors: the age of the student or learner, and whether they were recorded as a son or daughter of the person who completed the Census form. The older an individual, the more likely they are to be classed as independent, and those recorded as a son or daughter are more likely to be classed as dependent.
  • The model reduces the overall proportion of students or learners who are misclassified while also keeping the misclassification rates for the mature independent and mature dependent categories balanced.
  • The model was applied to those aged 23 years and over who did not receive SUSI support and was used to predict whether they were likely dependent or independent.
  • Statistical tests suggest that misclassification may be as high as 24% in either category. Therefore, caution should be used when interpreting analysis that classifies those who did not receive SUSI support as either mature independent or mature dependent.
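The release does not specify the type of AI model used. As a hedged stand-in, the sketch below fits a two-feature logistic regression (age and a son/daughter flag) by batch gradient descent, then chooses the decision threshold that minimises the larger of the two class-specific misclassification rates, which is one way to keep errors balanced between the mature independent and mature dependent categories. The model type, all names, the learning rate, the number of epochs and the example data are assumptions.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Tuple

# Training record: (age, is_son_or_daughter 0/1, is_mature_independent 0/1)
Record = Tuple[float, int, int]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class DependencyModel:
    age_mean: float
    age_sd: float
    w_age: float = 0.0
    w_child: float = 0.0
    bias: float = 0.0
    threshold: float = 0.5

    def prob_independent(self, age: float, is_child: int) -> float:
        z = (age - self.age_mean) / self.age_sd  # standardised age
        return _sigmoid(self.bias + self.w_age * z + self.w_child * is_child)

    def predict_independent(self, age: float, is_child: int) -> bool:
        return self.prob_independent(age, is_child) >= self.threshold


def class_error_rates(model: DependencyModel, records: List[Record], t: float) -> Tuple[float, float]:
    """(share of independents predicted dependent, share of dependents predicted independent).

    Assumes both classes are present in records.
    """
    indep = [model.prob_independent(a, c) for a, c, y in records if y == 1]
    dep = [model.prob_independent(a, c) for a, c, y in records if y == 0]
    return sum(p < t for p in indep) / len(indep), sum(p >= t for p in dep) / len(dep)


def fit_dependency_model(records: List[Record], lr: float = 0.5, epochs: int = 2000) -> DependencyModel:
    ages = [a for a, _, _ in records]
    model = DependencyModel(statistics.fmean(ages), statistics.pstdev(ages) or 1.0)
    n = len(records)
    for _ in range(epochs):
        # Batch gradient of the log-likelihood
        g_age = g_child = g_bias = 0.0
        for age, child, y in records:
            resid = y - model.prob_independent(age, child)
            g_age += resid * (age - model.age_mean) / model.age_sd
            g_child += resid * child
            g_bias += resid
        model.w_age += lr * g_age / n
        model.w_child += lr * g_child / n
        model.bias += lr * g_bias / n

    # Threshold: minimise the larger class error rate, then the sum of both rates
    candidates = sorted({model.prob_independent(a, c) for a, c, _ in records})

    def balance_score(t: float) -> Tuple[float, float]:
        e_indep, e_dep = class_error_rates(model, records, t)
        return max(e_indep, e_dep), e_indep + e_dep

    model.threshold = min(candidates, key=balance_score)
    return model
```

The fitted model would be trained on SUSI-supported individuals and then applied with `predict_independent` to those without support. On real data the two class error rates would not be zero, which is why the estimated classifications carry the caution noted above.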

Adjacency for students and learners who did not receive SUSI support

The analysis in this research paper required students who did not receive SUSI support to be classified according to their estimated adjacency status.

How does SUSI determine adjacency status?

SUSI pays different rates of support depending on the adjacency status of each eligible applicant, with those classified as non-adjacent receiving higher grant rates. The following rules are used to determine adjacency status:

  • Adjacency refers to the distance between the student's normal residence and the college. It is determined by measuring the distance along the shortest, most direct route using a web service that provides detailed information about geographical regions and sites.
  • For dependent / mature dependent students, the normal residence is the permanent or home address of their parent(s) or legal guardian(s).
  • For mature independent students, the normal residence is their permanent or home address.
  • For the cohorts examined in this paper, the adjacent rate was paid when a student's or learner's college was less than 45km from their normal residence, and the higher non-adjacent rate was paid when the college was 45km or more from their normal residence.
  • From academic year 2022/2023 onwards, the adjacency cut-off distance was reduced from 45km to 30km.
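Once a distance has been measured, the rules above reduce to a cut-off that depends on the academic year and on whose address counts as the normal residence. A minimal Python sketch with hypothetical function names; the 45km branch is only claimed for the cohorts examined in this paper, and earlier historical cut-offs are not modelled.

```python
def adjacency_cutoff_km(academic_year_start: int) -> float:
    """Cut-off distance, e.g. academic_year_start=2022 for 2022/2023."""
    # 45km applied to the cohorts in this paper; 30km from 2022/2023 onwards
    return 30.0 if academic_year_start >= 2022 else 45.0


def adjacency_status(distance_km: float, academic_year_start: int) -> str:
    """'non-adjacent' (higher rate) at or beyond the cut-off, otherwise 'adjacent'."""
    return "non-adjacent" if distance_km >= adjacency_cutoff_km(academic_year_start) else "adjacent"


def normal_residence(applicant_class: str, parental_address: str, own_address: str) -> str:
    """Address the distance is measured from, by dependency class."""
    # Mature independent students use their own home address; everyone else uses the parental one
    return own_address if applicant_class == "mature independent" else parental_address
```

A college 35km from home therefore attracted the adjacent rate for the cohorts in this paper but would attract the non-adjacent rate from 2022/2023 onwards.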

How was adjacency estimated for those who did not receive SUSI support?

The following process was implemented to estimate the likely adjacency status of those not in receipt of SUSI support:

  • The CSOPPSNs of the higher education student cohorts were linked with the Census of Population 2016 Analysis dataset. This dataset also contains Eircode information, in the form of EircodePIKs, for Census night addresses.
  • Eircodes were not introduced until 2015, and while EircodePIK coverage in administrative data has greatly improved in recent years, coverage was sparse in the higher education datasets for 2016 and 2017. Therefore, the EircodePIK from the Census of Population 2016 was used to estimate adjacency for new entrants in academic years 2012-2017 who did not receive SUSI support.
  • Using the EircodePIK from the Census of Population 2016 Analysis dataset, the Small Area location for each student who did not receive SUSI support was estimated.
  • Using publicly available college Eircodes, the Haversine (or great circle) distance between the college and the student’s estimated Small Area was calculated.
  • The Haversine distance is calculated within the CSO to ensure strict confidentiality. However, it is systematically lower than the distance produced by the SUSI method, which also incorporates the road network and uses third-party software. Therefore, the Haversine distance is adjusted upwards using a linear transformation, so that larger adjustments are made for larger distances.
  • Students with an adjusted distance of less than 45km are classified as adjacent, and those with an adjusted distance of 45km or more are classified as non-adjacent.
  • An EircodePIK could not be found for approximately 10% of those who did not receive SUSI support, and these individuals are classified as having unknown adjacency.
  • Statistical tests also suggest that misclassification may be as high as 7% for those assigned adjacency with this method. Therefore, caution should be used when interpreting analysis that classifies those who did not receive SUSI support as either adjacent or non-adjacent.
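The estimation steps above can be sketched in Python. This assumes each Small Area is represented by a single (latitude, longitude) point, such as its centroid, and that college Eircodes have already been converted to coordinates. The linear adjustment coefficients are invented placeholders because the release does not publish them, and `None` stands in for a missing EircodePIK.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical coefficients: the adjustment actually used is not published
ADJ_INTERCEPT_KM = 2.0
ADJ_SLOPE = 1.25  # slope > 1, so the upward adjustment grows with distance

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def adjust_distance(haversine: float) -> float:
    """Linear upward adjustment towards a road-network-style distance."""
    return ADJ_INTERCEPT_KM + ADJ_SLOPE * haversine


def estimate_adjacency(small_area: Optional[LatLon], college: LatLon, cutoff_km: float = 45.0) -> str:
    """Classify as adjacent, non-adjacent or unknown (no EircodePIK match)."""
    if small_area is None:
        return "unknown"
    distance = adjust_distance(haversine_km(small_area, college))
    return "adjacent" if distance < cutoff_km else "non-adjacent"
```

With a slope above 1, the added distance `ADJ_INTERCEPT_KM + (ADJ_SLOPE - 1) * d` increases with the straight-line distance `d`, matching the description that larger adjustments are made for larger distances.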