Data Linking Toolkit: All About Data Linking

What is Data Linking?

Data linking is the process of combining data from two or more sources to create a new, richer dataset. For our purposes, the data from two (or more) data sets are assumed to be record-level (child, family, teacher/provider, etc.), and records that appear in both data sets are matched to each other. Data sets can be linked manually by finding records in one data set and matching them with records in another. For example, data from two data sets may be manually linked for a specific, one-time query. Increasingly, however, data linking relies on technology to find and associate matching records from both data sources. For example, an algorithm may be developed to link data in a routine process scheduled at periodic intervals (e.g., quarterly, annually). In some cases, data linking may be ongoing (e.g., nightly uploads). Ongoing linking of this kind is often considered data integration, a more complex and specific type of data linking.
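To make the idea of technology-assisted matching concrete, the Python sketch below shows deterministic (exact-match) linking, assuming both data sets share a unique identifier. The field names (`child_id`, `outcome_score`, `service_hours`), the helper `link_on_id`, and the records are hypothetical and for illustration only; this is not a DaSy-provided tool.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def link_on_id(
    left: List[Record], right: List[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Deterministic (exact-match) linking on a shared unique identifier.

    Returns (linked, unmatched_left, unmatched_right).
    Assumes `key` is unique within `right`; if it is not, the last
    record with a given key wins in the index below.
    """
    # Index the right-hand data set by identifier for fast lookup.
    right_index = {rec[key]: rec for rec in right}
    linked: List[Record] = []
    unmatched_left: List[Record] = []
    matched_keys = set()
    for rec in left:
        match = right_index.get(rec[key])
        if match is None:
            unmatched_left.append(rec)
        else:
            # Combine the fields from both sources into one linked record.
            linked.append({**rec, **match})
            matched_keys.add(rec[key])
    unmatched_right = [rec for rec in right if rec[key] not in matched_keys]
    return linked, unmatched_left, unmatched_right


# Hypothetical example: child outcome records and service delivery records.
outcomes = [
    {"child_id": "A1", "outcome_score": 4},
    {"child_id": "B2", "outcome_score": 6},
]
services = [
    {"child_id": "A1", "service_hours": 12},
    {"child_id": "C3", "service_hours": 8},
]
linked, only_outcomes, only_services = link_on_id(outcomes, services, "child_id")
print(linked)  # [{'child_id': 'A1', 'outcome_score': 4, 'service_hours': 12}]
```

Returning the unmatched records from each side, rather than discarding them, makes it easy to review match rates and investigate records that failed to link.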

What are the Benefits of Linking?

The primary benefit of linking data is a richer set of information that supports program improvement through analyses that were previously not possible. More data to answer more questions can inform program improvement and ultimately result in better programs and services for children and families. A second benefit is that linking can support internal administrative efficiencies. For example, linking child outcome data by region may suggest where specific technical assistance would be most useful. Another benefit is that data linking requires no additional resources or effort from those providing services or entering data. Because the data already exist in multiple locations, the administrative effort to link them adds no data collection burden. Finally, in many cases data linking is likely to increase collaboration between partners. Because data linking requires at least two partners, often in different agencies, to define and agree on the parameters of the partnership, a linking process, and the technology to join their data, the partners have more opportunity to work collaboratively.

What are the Types of Data Linking?

Data linking ranges from relatively simple to complex (see Table 1). At the simple end, a single program within one agency wants to match records from two data sets under its control. Staff are familiar with both data systems and with all the attributes of the data they contain (e.g., field definitions). This requires little more than adherence to existing data governance policies and straightforward internal staff effort. At the other end of the spectrum, two or more programs in different agencies want to integrate their data and update each other's data systems in near real time (full data integration). This would require complex agreements, updated cross-agency data governance, alignment of data fields (possibly redefining terms to create alignment), extensive technical programming, updated security and permissions, substantial time, and cross-agency support at the highest levels. Data linking for Part C and/or Part B 619 programs often falls between these two extremes.

Table 1: Types of data linking

Single Program, Single Agency: Data Linking
  • Example: A state agency links vendor-provided child outcome data with child service delivery data.
  • Description: Two or more datasets from a single program are connected using unique identifiers or through probabilistic matching. Element definitions across datasets have not been modified for consistency.
  • Requires MOU: No
  • One-Time Event Option: Yes
  • Point in Time/Longitudinal: Both

Multi-Program, Single Agency: Data Linking
  • Example: A state educational agency (SEA) connects student assessment data with high school graduation data.
  • Description: Two or more datasets from multiple programs in one agency are connected using unique identifiers or probabilistic matching. Element definitions across datasets have not been modified for consistency.
  • Requires MOU: Yes
  • One-Time Event Option: Yes
  • Point in Time/Longitudinal: Both

Multi-Program, Single Agency: Data Integration
  • Example: An SEA includes IDEA Part B 619 data in its Statewide Longitudinal Data System (SLDS).
  • Description: Multiple programs within one agency contribute data to a single data system or data warehouse. Element definitions across datasets have been modified for consistency.
  • Requires MOU: Yes
  • One-Time Event Option: Yes
  • Point in Time/Longitudinal: Longitudinal only

Multi-Program, Multi-Agency: Data Linking (Federated Model)
  • Example: Two agencies link Part C data to Part B 619 data to support transition notification.
  • Description: Two or more datasets from multiple programs in multiple agencies are connected using unique identifiers or probabilistic matching. Element definitions across datasets have not been modified for consistency.
  • Requires MOU: Yes
  • One-Time Event Option: No
  • Point in Time/Longitudinal: Both

Multi-Program, Multi-Agency: Data Integration (Centralized Model)
  • Example: A state agency develops an Early Childhood Integrated Data System (ECIDS) to share data across the Departments of Health, Education, and Family and Protective Services.
  • Description: Data from participating agencies are consolidated into one database or data warehouse. Element definitions across datasets have been modified for consistency.
  • Requires MOU: Yes
  • One-Time Event Option: No
  • Point in Time/Longitudinal: Longitudinal only
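When partners do not share a unique identifier, the table notes that records can be connected through probabilistic matching. One widely used approach is Fellegi-Sunter scoring, sketched minimally below in Python. The fields, the m/u probabilities in `FIELD_PROBS`, the thresholds in `classify`, and the example records are all hypothetical assumptions chosen for illustration, not values from any DaSy resource or state system.

```python
import math
from typing import Dict, Tuple

# Hypothetical m/u probabilities per field (assumed for illustration):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "birth_date": (0.98, 0.001),
}


def normalize(value: str) -> str:
    """Lowercase and trim so trivial formatting differences still agree."""
    return value.strip().lower()


def match_weight(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Sum Fellegi-Sunter log2 agreement/disagreement weights across fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if normalize(a[field]) == normalize(b[field]):
            total += math.log2(m / u)  # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Map a weight to match, possible match (clerical review), or non-match.

    The thresholds are hypothetical; real projects tune them on reviewed pairs.
    """
    if weight >= upper:
        return "match"
    if weight > lower:
        return "review"
    return "non-match"


# Hypothetical records from two programs.
child_part_c = {"first_name": "Maria", "last_name": "Lopez", "birth_date": "2021-03-14"}
child_part_b = {"first_name": "maria ", "last_name": "LOPEZ", "birth_date": "2021-03-14"}
dob_differs = {"first_name": "Maria", "last_name": "Lopez", "birth_date": "2021-04-13"}
other_child = {"first_name": "James", "last_name": "Chen", "birth_date": "2019-11-02"}

print(classify(match_weight(child_part_c, child_part_b)))  # match
print(classify(match_weight(child_part_c, dob_differs)))   # review
print(classify(match_weight(child_part_c, other_child)))   # non-match
```

This sketch compares a single pair of records with exact field agreement. Production linkage typically estimates m and u from the data, uses "blocking" to avoid comparing every possible pair, and applies string-similarity comparisons so that typos do not count as full disagreements.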

What Do Program Staff Need to Do to Prepare?

All types of data linking, whether within a single program, between two programs in the same agency, or among multiple programs in multiple agencies, require pre-work. This pre-work is separate from the actual matching and linking of records. Examples of pre-work include the following.
(etc)

Once the groundwork has been established and planned for, the technology and methods used for matching, and the actual data linking, are relatively straightforward.

Data Linking Steps

Once partners have agreed, in principle, to move forward with data linking, the steps below will need to occur. Each step is associated with existing DaSy resources.

  • Formalize Data Linking Partnership
  • Complete Data Linking Technical Work
    • Activities and Resources to Support Data Linking Technical Assistance Providers
  • Link the Data
    • Activities and Resources to Support Data Linking Technical Assistance Providers
  • Sustain Data Linking
    • Activities and Resources to Support Data Linking Technical Assistance Providers