Data linkage

What is data linkage?


Data linkage brings together multiple sources of data that relate to the same person. When a person undertakes different courses at different educational institutions - say a course at TAFE or an undergraduate degree at university - information on their participation is collected.

Linking this information with current LSAY survey responses has the potential to transform the value of LSAY, both for survey participants and for the research and policy community that uses the data, by:

  • improving the quality of the data by providing more detailed, accurate and objective data about educational participation and attainment
  • increasing the richness and depth of information by linking to data that might be outside the scope of LSAY
  • providing an opportunity to remove some of the more detailed questions from the survey and ask more engaging, attitudinal items instead. This reduces the effort required to complete the survey and enhances the survey experience for participants.

How to access linked data


LSAY records for the Y15 cohort have now been linked to the following data sources:

  • ACARA My School data
  • National Assessment Program — Literacy and Numeracy (NAPLAN)
  • Senior secondary administrative data
  • National VET Provider Collection
  • Higher Education Statistics Collection.

Access to the linked data is restricted and available via a formal request and registration process managed by the Australian Data Archive (ADA). See our How to access LSAY data page for more information.

For more information 

  • See the 'Data linkage' section of the LSAY Y15 user guide for detailed information about the linkage methodologies used for each data source, consent rates for each linkage, and references to additional resources.
  • See the 'Data linkage' worksheet located in the LSAY variable listing and metadata for a listing of the variables contained in the linked datasets.

How are data linked?


Participants are asked for their consent to link their LSAY records to other educational data sets, and only those who consent will have their data linked. Participants can withdraw their consent at any time; however, any data that have already been linked will be retained and will continue to be available to data users.

LSAY data are linked via deterministic linking, which compares an identifier or a group of identifiers across databases; a link is made when these identifiers match. Types of identifiers used to match LSAY data and other administrative data include contact details (name, address, date of birth), school information (name, suburb, postcode) or a participant’s unique student identifier (USI).
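As a rough illustration of deterministic linking in general, the Python sketch below links two small sets of records when a chosen group of identifiers matches exactly after light normalisation. It is not the LSAY linkage code: the field names (`survey_id`, `admin_id`, `usi`), the sample values, the normalisation rules and the choice to skip ambiguous matches are all assumptions made for the example.

```python
def normalise(value):
    """Lower-case a value and collapse whitespace so trivial differences do not block a match."""
    return " ".join(str(value).strip().lower().split())


def link_key(record, fields):
    """Build the comparison key from the chosen identifier fields."""
    return tuple(normalise(record[field]) for field in fields)


def deterministic_link(survey, admin, fields):
    """Return (survey_id, admin_id) pairs whose identifiers match exactly.

    A link is made only when exactly one administrative record shares the key;
    ambiguous matches are skipped (an assumption made for this sketch).
    """
    index = {}
    for row in admin:
        index.setdefault(link_key(row, fields), []).append(row)

    links = []
    for row in survey:
        candidates = index.get(link_key(row, fields), [])
        if len(candidates) == 1:
            links.append((row["survey_id"], candidates[0]["admin_id"]))
    return links


# Hypothetical records; the identifier values are made up.
survey_records = [
    {"survey_id": "S1", "usi": "ABC123"},
    {"survey_id": "S2", "usi": "XYZ999"},
]
admin_records = [
    {"admin_id": "A1", "usi": " abc123 "},
    {"admin_id": "A2", "usi": "QRS555"},
]

links = deterministic_link(survey_records, admin_records, ["usi"])
print(links)
```

Passing several field names, for example a name, date of birth and postcode, would require all of them to match before a link is made.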

Privacy and data linkage

Personal information is handled in the strictest confidence in accordance with the Australian Privacy Principles.  Respondent contact details are only ever held by Wallis (the LSAY fieldwork contractor) or the agencies authorised to do the linkage and are stored on secure servers located within Australia.

All LSAY linkage projects are conducted using the separation principle, which ensures the separation of personal identifying information (e.g. names, addresses, and unique identifiers) from administrative and survey data. A linkage key is created to link the datasets, so at no point are survey data and identifying information contained in the same file. For more information on the linkage process specific to each project, see the fact sheets above.
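The idea behind the separation principle can be sketched in a few lines of Python. This is a simplified illustration only, not the process used by NCVER or the linkage agencies: the field names, the sample records and the use of a random token as the linkage key are assumptions made for the example.

```python
import secrets


def separate(records, identifying_fields):
    """Split each record into an identifier row and a content row that share only a linkage key."""
    identifier_file, content_file = [], []
    for record in records:
        # Random key that carries no personal information (an assumption for this sketch).
        key = secrets.token_hex(8)
        identifier_file.append(
            {"linkage_key": key, **{f: record[f] for f in identifying_fields}}
        )
        content_file.append(
            {
                "linkage_key": key,
                **{f: v for f, v in record.items() if f not in identifying_fields},
            }
        )
    return identifier_file, content_file


# Hypothetical records; the names and addresses are made up.
records = [
    {"name": "Alex Example", "address": "1 Sample St", "highest_qualification": "Certificate III"},
    {"name": "Sam Sample", "address": "2 Test Rd", "highest_qualification": "Bachelor degree"},
]

identifier_file, content_file = separate(records, ["name", "address"])

# The content file can be analysed or linked further using the key alone.
print(sorted(content_file[0].keys()))
```

In this sketch the two outputs would be held separately, so that anyone working with the content file sees the linkage key but never the names or addresses.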

Anonymised linked data are stored on secure servers by NCVER and the Australian Data Archive (ADA), where they are made available to researchers. The data may be used for research purposes only and cannot be used for commercial or financial gain. When researchers use the data, information is always grouped together to ensure no individual can be identified. See our privacy notice for more information.

Overview of the separation principle for LSAY data linkage