CSO Synthetic House Price Dataset

Introduction

The Central Statistics Office (CSO) has created a Synthetic House Price Dataset as a statistical training tool for educators in universities and elsewhere. The dataset contains about 170,000 synthetic house price records, which will enable students to learn in a practical way about correlation and the statistical modelling of house prices. It does not contain any real house price records, which ensures that it meets the strict confidentiality requirements of the Statistics Act 1993 and the privacy requirements of the General Data Protection Regulation (GDPR).

What is the difference between the Synthetic House Price Dataset and the dataset used to compile the Residential Property Price Index (RPPI)?

The RPPI is compiled from a dataset of residential property transactions, obtained by the CSO from eStamping, Building Energy Rating and other data sources. The raw data used in compiling the RPPI is strictly confidential.

The Synthetic House Price Dataset has been generated by applying several statistical masking techniques. These produce a dataset that is anonymous and materially different from the RPPI raw data, but that preserves the aggregate correlations between the variables which affect house prices. The techniques used to construct the synthetic data prevent the identification of individual transactions or dwellings. Because the Synthetic House Price Dataset is materially different from the RPPI dataset, it cannot be used to replicate RPPI results.

How is the Synthetic House Price Dataset useful as a training tool?

While confidentiality is guaranteed, the Synthetic House Price Dataset retains the aggregate correlations and relationships between variables (e.g. selling price and floor area of a house) that are relevant for the statistical modelling of house prices. Educators in universities, national statistical institutes, international organisations and elsewhere will be able to use this dataset to teach students about the specific regression techniques used for modelling house prices.

How does the CSO ensure the confidentiality of data it collects?

All information provided to the CSO under the Statistics Act 1993 is treated as strictly confidential and is used only for statistical purposes. This is guaranteed by law under Sections 32 and 33 of the Act, and also by EU statistical legislation. Confidentiality is the cornerstone of all the CSO's work, and the generation of the Synthetic House Price Dataset meets these high standards.

How does an educator get access to the Synthetic House Price Dataset?

The CSO will provide the Synthetic House Price Dataset, upon application and at the discretion of the Director General of the CSO, to teaching staff of educational or statistical institutions who agree to the following conditions of use:

1) The Synthetic House Price Dataset will only be used for training and educational purposes.

2) The CSO is to be acknowledged as the source of the Synthetic House Price Dataset.

3) In all coursework material, training workshops and related reports, the dataset is to be referred to as the CSO Synthetic House Price Dataset, or simply as a synthetic dataset, and not as Ireland's Residential Property Price Index dataset.

4) The CSO is not liable in any way for the usage of the Synthetic House Price Dataset. All analyses, inferences and interpretations of the data, and all consequences thereof, are entirely the responsibility of the data user.

5) Neither the Synthetic House Price Dataset, nor full or partial copies of it, nor any record-level derivatives of it may be made available to any third party. When the dataset is made available to students, the following conditions apply:

  1. The dataset may only be used for training purposes.
  2. Students may not make copies of the dataset (except for course purposes).
  3. Students may not disclose the dataset to third parties.

6) The CSO will maintain a register of educators who receive the Synthetic House Price Dataset. See the note on data protection in relation to this register below.

Who do you contact to get access to the synthetic dataset?

Please email requests for access to the synthetic dataset to the CSO's Residential Property Section at rppi@cso.ie. The request should state the reason for access and include organisation details (if any) and any other relevant information. The CSO may ask for further information in response to a request. The granting or refusal of a request for access to the Synthetic House Price Dataset will be at the sole discretion of the Director General of the CSO.

Note on data protection under the General Data Protection Regulation (GDPR) – information for educators who receive the CSO Synthetic House Price Dataset:

The CSO will keep a register of individuals who receive the CSO Synthetic House Price Dataset. The register will record each recipient's full name, related organisation and the date on which the synthetic dataset was made available. This information will be kept indefinitely. It is collected in order to administer the provision of the Synthetic House Price Dataset, to ensure that the dataset is used in line with the specified conditions, and to compile a record of individuals in possession of the synthetic dataset. For the same purposes, the CSO will also retain indefinitely its correspondence (including letters, emails, etc.) with persons who request access to the Synthetic House Price Dataset and with persons granted such access. The information will not be used for any other purpose, but it may be subject to disclosure under Freedom of Information legislation.