Differentially private synthetic linked 2011 Census and mortality

Details

Summary

Synthetic 2011 Census data linked with death data, both created through statistical modelling so that they are non-sensitive.

Description

These are synthetic data, created by the Office for National Statistics Data Science Campus by statistically modelling the original data (Census 2011 linked with mortality data) and then using those models to generate new data values that reproduce the original data’s statistical properties. Any biases already present in the real data may propagate through to this synthetic version. The data were created using a differentially private algorithm for categorical data synthesis. This algorithm seeks to preserve the most important statistical properties of the data while protecting the confidentiality of the data contributors according to a mathematical definition of privacy. The privacy budget is epsilon = 1.0. The dataset will be used in the IDS to facilitate analysis and innovation whilst maintaining the principle of data minimisation. It will also facilitate data access within the IDS while preventing disclosure of confidential respondent information. This dataset is also known as Synthetic Linked Census 2011 Mortality. Data are available only to provider-approved projects.
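To illustrate the general idea of differentially private categorical synthesis, the sketch below adds Laplace noise to the counts of a single categorical variable and then samples synthetic values from the noisy distribution. It is not the algorithm used to produce this dataset, which is not specified here. The function names, the toy age bands and the choice of the Laplace mechanism are all assumptions made for illustration, as is the add-or-remove-one-record definition of neighbouring datasets (which gives the counts a sensitivity of 1).

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_synthesise(records, categories, epsilon, n_synthetic, seed=None):
    """Hypothetical helper: epsilon-DP synthesis of one categorical column.

    `categories` must be a fixed, publicly known list; deriving it from
    the sensitive data would itself leak information.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Assumption: neighbouring datasets differ by adding or removing one
    # record, so the histogram has L1 sensitivity 1 and scale = 1 / epsilon.
    noisy = [
        max(counts.get(c, 0) + laplace_noise(1.0 / epsilon, rng), 0.0)
        for c in categories
    ]
    # Clamping and sampling are post-processing and do not weaken the
    # privacy guarantee. Fall back to uniform weights if all counts clamp to 0.
    weights = noisy if sum(noisy) > 0 else [1.0] * len(categories)
    return rng.choices(categories, weights=weights, k=n_synthetic)


# Toy example with made-up age bands (not real Census categories or data).
age_bands = ["0-15", "16-64", "65+"]
original = ["0-15"] * 20 + ["16-64"] * 60 + ["65+"] * 20
synthetic = dp_synthesise(original, age_bands, epsilon=1.0, n_synthetic=100, seed=0)
print(Counter(synthetic))
```

A smaller epsilon means more noise and stronger privacy, at the cost of synthetic counts that track the original less closely. Real synthesis methods for linked, multi-variable data also have to model relationships between variables, which this single-column sketch does not attempt.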

Documentation:

Details of any additional information regarding this dataset

About this data

Data creator: Office for National Statistics
Temporal coverage: 01 January 2011 to 31 December 2020
Frequency: Historical
Dataset theme: Health
Restrictions for access: Access for all accredited researchers
Project approval: Projects must be accredited and have approval from the data owner
Search keywords: Synthetic data, Census 2011, Mortality, Linked, MSOA

Metadata

Dataset theme (main category for the topic of the resource): Health
Dataset resource type (the type of the dataset resource): Statistical Output - Experimental
Geographic coverage (the geographic area covered by the dataset): England and Wales
Temporal coverage (the timeframes covered by these data): 01 January 2011 to 31 December 2020
Frequency (the frequency at which the dataset resource is published or updated): Historical
Geographic level (the lowest level of geography covered by the dataset): Middle layer Super Output Area
Data creator (the name of the organisation that produced or published this resource): Office for National Statistics
Data contributors (the name(s) of any organisation(s), other than the data supplier or provider, that have data which are included in this dataset resource):
Licensing status (the license used for making this resource available and defining how it can be used): Restricted
Disclosure control (the standard or bespoke disclosure control rules that apply to this dataset): Standard disclosure rules apply
Restrictions for access (restrictions for users and researchers accessing data in Google Cloud Platform): Access for all accredited researchers
Research outputs (the approval route researchers must take for their outputs to be published): Research outputs must be approved by the data owner
Project approval (the project approvals needed for this resource): Projects must be accredited and have approval from the data owner
Research disclaimer (the disclaimer required to be published with the outputs for this dataset): A disclaimer must be published with research outputs
Acronym (any other names or acronyms used to refer to the data):
Provenance (details of how this dataset resource came to be generated):