What’s the Difference Between Data Sharing, Data Linking and Data Integration?

Part Three in our Data Linking Blog Series

Authors: Bruce Bull and Denise Mauzy
Contributor: Margo Smith

Data, data, everywhere. And how do we get that data to answer our questions? Data from other sources can sometimes help. You can share, link, and/or integrate data from other sources with your Part C and Part B 619 data to answer critical questions [1] that cannot otherwise be answered. The terms data sharing, data linking, and data integration are sometimes used inaccurately and interchangeably. In all cases, sharing, linking, or integrating data starts with a desire to use data from other sources to better inform your Part C or Part B 619 program.

Data Sharing

Data Sharing Aggregate Example

A state audiology organization requests county totals from the Part B 619 program of the number of children who received services for hearing impairments over a 12-month period.

Data Sharing Record-Level Example

The Part C program requests monthly hearing screening results of children who failed an initial screening from the state Early Hearing Detection and Intervention (EHDI) program to confirm that all children with possible hearing loss have been referred to Part C in a timely manner.

The simplest of the three concepts, data sharing is defined as providing partners with access to information they can’t access in their own data systems. In most cases, data sharing requires little more than running a few simple queries on existing data and formatting the results to meet the request. Data are shared in aggregate or at the record level.

Sharing aggregate data is providing summary information. Aggregate data generally provides a high-level overview of a program. Aggregate data often consists of counts, totals, and/or percentages. Often these summary data are broken out by smaller meaningful subgroups (e.g., counties, race/ethnicity).
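As a rough illustration of an aggregate request like the audiology example, the sketch below counts children per county from record-level rows. The field names (`child_id`, `county`, `service`), the function name, and the sample rows are hypothetical and are not drawn from any actual Part B 619 data system.

```python
from collections import Counter

# Hypothetical record-level service rows; field names are assumptions.
service_rows = [
    {"child_id": "B001", "county": "Adams", "service": "hearing"},
    {"child_id": "B001", "county": "Adams", "service": "hearing"},  # same child, second entry
    {"child_id": "B002", "county": "Adams", "service": "speech"},
    {"child_id": "B003", "county": "Baker", "service": "hearing"},
    {"child_id": "B004", "county": "Adams", "service": "hearing"},
]

def county_totals(rows, service):
    """Return {county: number of distinct children} for one service type."""
    counted = set()
    totals = Counter()
    for row in rows:
        key = (row["county"], row["child_id"])
        # Count each child once per county, even with repeated service entries.
        if row["service"] == service and key not in counted:
            counted.add(key)
            totals[row["county"]] += 1
    return dict(totals)

print(county_totals(service_rows, "hearing"))
```

In this sketch only the county totals would be passed to the requesting partner; the individual rows stay with the program.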

Sharing record-level data is providing information about individual records (e.g., person, program). Record-level data may or may not be deidentified. Deidentified data has all personally identifiable information (PII) removed. When PII is shared, a data sharing agreement (DSA) is recommended and often required. The DSA governs all aspects of the shared data (use, analysis, dissemination, destruction, etc.).
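One way to picture removing PII before sharing record-level data is sketched below. The list of PII fields and the sample row are assumptions made for illustration; which fields actually count as PII depends on your program's privacy guidance and any DSA in place.

```python
# Hypothetical set of fields treated as PII; the real list is a policy
# decision, not something this sketch can determine.
PII_FIELDS = {"first_name", "last_name", "dob", "address"}

def deidentify(rows, pii_fields=PII_FIELDS):
    """Return copies of the rows with the listed PII fields removed."""
    return [
        {field: value for field, value in row.items() if field not in pii_fields}
        for row in rows
    ]

# Hypothetical screening record; field names are assumptions.
screening_rows = [
    {
        "first_name": "Ana",
        "last_name": "Lee",
        "dob": "2021-03-04",
        "county": "Adams",
        "screen_result": "refer",
    },
]

shared = deidentify(screening_rows)
print(shared)
```

Dropping named fields is a simplification. Whether the remaining data are truly deidentified is a judgment for data stewards, and the original rows are left unchanged so the program keeps its full records.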

Data Linking

Data Linking Example (internal program)

The Part B 619 program desires to link records in their statewide child-based data system with preschooler outcome data that exists in a separate vendor-supported data system.

Data Linking Example (cross agency)

The state Part C program and Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) work together to link records to learn the percentage of Part C children and families that also receive WIC benefits and services.

Data linking is connecting information about a record for an entity (e.g., child, service provider, service) in one data source with information related to that same record in another data source. It connects two or more sources of information about the same record. Data linking requires some technical skill and considerably more time and effort than data sharing. If the data sources are from two different programs, linking also requires collaboration between data stewards and/or subject matter experts associated with the sources. If the data to be linked are from two or more agencies, these agencies should formalize their data partnership through a DSA and the implementation of a data partnership management plan before data are linked.

Technical skill is needed to match records from different data sources so that desired record-level information can be linked (“joined”). Finding the same record in each data set requires developing matching criteria. When linking data, the source data sets are preserved and a new data set is created. The resulting new data set is used for analysis.
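The matching step can be sketched as follows, using the Part C and WIC scenario as a backdrop. This is a minimal sketch that assumes exact matching on a normalized name plus date of birth; the field names, IDs, and sample records are all hypothetical, and real matching criteria are usually more forgiving and are agreed on by the data partners.

```python
def match_key(record):
    """Build a matching key from normalized name and date of birth."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],
    )

def link_records(part_c_rows, wic_rows):
    """Return a new list of linked rows; the source lists are not modified."""
    wic_by_key = {match_key(row): row for row in wic_rows}
    linked = []
    for child in part_c_rows:
        wic_row = wic_by_key.get(match_key(child))
        if wic_row is not None:
            linked.append({"part_c_id": child["part_c_id"], "wic_id": wic_row["wic_id"]})
    return linked

# Hypothetical source data sets.
part_c = [
    {"part_c_id": "C1", "first_name": "Ana", "last_name": "Lee", "dob": "2021-03-04"},
    {"part_c_id": "C2", "first_name": "Sam", "last_name": "Ortiz", "dob": "2020-11-19"},
    {"part_c_id": "C3", "first_name": "Kai", "last_name": "Brown", "dob": "2021-07-02"},
    {"part_c_id": "C4", "first_name": "Mia", "last_name": "Chen", "dob": "2020-01-15"},
]
wic = [
    {"wic_id": "W9", "first_name": " ana", "last_name": "LEE ", "dob": "2021-03-04"},
    {"wic_id": "W7", "first_name": "Kai", "last_name": "Brown", "dob": "2021-07-02"},
    {"wic_id": "W5", "first_name": "Sam", "last_name": "Ortiz", "dob": "2019-11-19"},  # DOB differs, so no match
]

linked = link_records(part_c, wic)  # the new data set used for analysis
percent_in_wic = 100 * len(linked) / len(part_c)
print(linked, percent_in_wic)
```

Both source lists are left untouched, and the new `linked` list is what would be analyzed, for example to estimate the share of Part C children who also appear in WIC records.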

Data Integration

Data Integration Example

A state’s Department of Education desires to investigate the long-term education outcomes of children who received early intervention services. The Department of Education and the state lead agency for Part C develop a partnership to integrate selected Part C PII and outcome data into Education’s data system. Over time, the partners will be able to access a longitudinal data set to investigate educational outcomes of groups of children who received and who did not receive IDEA Part C, Part B 619, and/or school age services.

The most complex of the three data concepts is data integration. Similar to data linking, data integration requires record matching. However, instead of creating a new external data set with linked data, integrating data usually results in merging (integrating) data from the original data sets into a new, more comprehensive data set. Both technical and business processes are used to integrate the multiple data sources. Technical processes are developed to extract data from at least one data set, transform (modify) the data to fit the destination data set, then load (populate) the destination data set according to established rules. Extract, transform, and load (ETL) requires a substantial number of automated technical processes. In the end, there is at least one updated data set with record-level data from multiple sources.
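The extract, transform, and load steps can be sketched in a few small functions. This is an illustrative outline only: it assumes record matching has already produced a shared `state_id`, and the field names, date format, and sample data are hypothetical rather than taken from any real Part C or Education system.

```python
from datetime import datetime

def extract(part_c_rows):
    """Extract: pull only the fields selected for integration."""
    return [
        {"state_id": r["state_id"], "exit_date": r["exit_date"], "outcome": r["outcome"]}
        for r in part_c_rows
    ]

def transform(rows):
    """Transform: reshape values to fit the destination's conventions."""
    transformed = []
    for r in rows:
        iso_date = datetime.strptime(r["exit_date"], "%m/%d/%Y").date().isoformat()
        transformed.append({
            "state_id": r["state_id"],
            "part_c_exit_date": iso_date,
            "part_c_outcome": r["outcome"].strip().upper(),
        })
    return transformed

def load(destination, rows):
    """Load: update matching destination records; return unmatched IDs."""
    unmatched = []
    for r in rows:
        target = destination.get(r["state_id"])
        if target is None:
            unmatched.append(r["state_id"])  # left for review under the partners' rules
            continue
        target.update({k: v for k, v in r.items() if k != "state_id"})
    return unmatched

# Hypothetical source (Part C) and destination (Education) data.
part_c_source = [
    {"state_id": "S100", "exit_date": "06/30/2021", "outcome": " met ", "home_phone": "555-0100"},
    {"state_id": "S200", "exit_date": "01/15/2022", "outcome": "not met", "home_phone": "555-0101"},
]
education = {"S100": {"grade": "K"}, "S300": {"grade": "1"}}

unmatched = load(education, transform(extract(part_c_source)))
print(education, unmatched)
```

Unlike the linking case, the destination data set itself is updated here. Fields not selected for integration (such as `home_phone` in this sketch) never leave the source, and unmatched records are reported rather than silently dropped.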

Data integration across programs or agencies requires a comprehensive and ongoing DSA and data partnership management plan. These then support the detailed steps of integration, the responsibilities of those handling the data, and the governance of the integrated data.

Ready for More?

Whether sharing, linking, or integrating data, more meaning can come from careful additions of data. Part C and Part B 619 data can inform other programs/agencies, and other program/agency data can inform Part C and Part B 619. The understanding gained through sharing, linking, and integrating data can help Part C and Part B 619 program leaders make better decisions that more effectively improve programs and outcomes for children and families.

Reach out to your DaSy State Liaison to discuss opportunities to share, link, or integrate data. DaSy’s Data Linking Toolkit is available to support Part C and Part B 619 staff with all aspects of data linking.

Resources

Read Part Four of our Data Linking Blog Series: A Data Linking Success Story in North Carolina

About the Authors

Bruce Bull

Bruce Bull is a DaSy TA provider. He has worked directly in the IDEA data world since 1996 as a state Part B and Part C data manager, developer of data collection systems, and as a TA provider with six OSEP-funded TA Centers.

Denise Mauzy

Denise Mauzy is a DaSy TA provider. She designs and delivers technical assistance on early childhood, child and family services, data governance and management, and data systems. She has worked extensively with education and human service program staff on the development of data governance and management policies and procedures, including data linking.

Margo Smith

Margo Smith is a DaSy consultant providing communications and other support for DaSy TA products. She has a background in journalism, data visualization, and data use in early childhood care and education TA.

Published April 2022.


[1] Many of the DaSy Critical Questions for Part C and Part B 619 require multiple sources of information.