Data Linking Toolkit: Potential Risks of Data Linking

While there are numerous benefits to data linking, there are also limitations, potential risks, and costs. DaSy shares some common risks below to support proactive planning efforts (adapted from Introduction to Data Sharing & Integration, AISP). Considering limitations and risks is part of data stewardship and helps Part C or Part B 619 program staff be mindful of issues and plan mitigation strategies. This toolkit includes selected resources that facilitate discussions of some common limitations and risks to help inform planning and mitigation efforts. Although DaSy lists a number of risks below, many are small or non-existent. For example, Part C and Part B 619 data are usually available and of high quality. Part C and Part B 619 staff can minimize disclosure and misinterpretation risks by implementing strong data governance policies. In addition, strong stakeholder engagement can help mitigate the risk of replicating structural racism.

  • Unavailable data: Important information may not be routinely captured in records or may not be in a format that lends itself to efficient handling. For example, case notes may be inconsistently captured or contained within scanned documents associated with child records.
  • Poor data quality: Part C and Part B 619 data are collected by local entities across the state and entered on an ongoing basis by many staff. Local staff have different roles and varying levels of expertise and training (e.g., understanding of Part C or Part B 619 requirements, data entry, data cleaning). Also, in many states, different data systems are used across local programs. This results in data systems designed with different standards, especially regarding error-checking logic. Collectively, these differences may result in underlying issues with data quality, such as entry errors, missing data, incomplete data, inaccurate data, or untimely data. Therefore, when linking data, program staff should assess whether the data are of sufficient quality for subject matter experts to have confidence in the results of the analyses. (If Part C or Part B 619 data quality is a limitation, contact DaSy for technical assistance to discuss methods to increase data quality.)
  • Misinterpretation: Like most data, Part C and Part B 619 data can be misinterpreted without proper context and an understanding of Individuals with Disabilities Education Act (IDEA) programs, services, and collection cycles. For example, data misinterpretations can result from inappropriate analytics (e.g., using a mean or standard deviation with ordinal or nominal data) or incorrect variable assumptions (e.g., does PK stand for primary key or prekindergarten?). Incorrect conclusions can also occur when predictive tools are used inappropriately (e.g., multiple regression on child outcome data from a small subset of the expected population). In such cases, although the statistical process may be fine, the lack of appropriate context or sufficient data would generate results that should not be interpreted as accurate or acceptable. Therefore, it is important to have subject matter experts work with staff to analyze any linked data and review all analyses and resulting documents.
  • Unauthorized and unintended disclosure of personally identifiable information: Whenever Part C and Part B 619 data are prepared for linking, there is a risk of improper handling and unauthorized access. This can happen by accident or through a security breach. Either would result in a disclosure of private information. With appropriate safeguards in place, such instances are rare, but they remain a potential risk. Similarly, there is a risk of unintended disclosure if results from the data linking are reported in a way that allows for the identification of individuals (e.g., a minimum cell size standard that is too low).
  • Replicating structural racism: Considering administrative data to be race-neutral can lead to system-level data use that unintentionally replicates structural racism. Data partners should determine how information will be used or perceived and should consider an analysis of current racial disparities around the related issue or topic. (For a more nuanced discussion of balancing risk vs. benefit, see AISP’s Toolkit for Centering Racial Equity Throughout Data Integration.)
  • Harming individuals: Certain uses of Part C and Part B 619 data may carry a risk of harm to individuals. For example, some educators may be unintentionally biased if they know students previously received IDEA services. To mitigate this risk, all relevant stakeholders should carefully evaluate the potential ways that linked Part C and Part B 619 data could be used or misused. The benefit to individuals, communities, and society at large must outweigh the risks when linking data.
  • Costs: Technical and staff costs to develop, implement, and maintain data linking can vary significantly depending on the complexity and frequency of data linking activities. Linking data from two data sets within the same Part C or Part B 619 program is relatively quick and inexpensive. Within a single program, no approvals are needed, and the data manager or another skilled person already assigned to support the program can usually use available tools (e.g., Microsoft Excel). On the other hand, linking data from two programs in two separate agencies that have no current data sharing agreement takes much more time and money. In this case, both data partners must engage legal staff to develop their sharing agreement. Also, technical staff from both programs need time to secure and prepare the data before linking and to conduct the technical linking activities. Sometimes, data linking activities require more complex tools that are unavailable to Part C or Part B 619 program staff.
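
The minimum cell size safeguard mentioned under unintended disclosure can be illustrated with a short sketch. This is a minimal illustration, not a DaSy-endorsed method: the function name `suppress_small_cells`, the group labels, and the threshold of 10 are all hypothetical, and the actual threshold should come from the program's data governance policy.

```python
from typing import Dict, Optional

# Hypothetical threshold for illustration only; the toolkit does not specify one.
# Use the value set in your program's data governance policy.
MIN_CELL_SIZE = 10


def suppress_small_cells(
    counts: Dict[str, int], min_cell_size: int = MIN_CELL_SIZE
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with any count below the threshold replaced by None."""
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


if __name__ == "__main__":
    # Made-up counts of children per reporting group in a linked data set.
    example_counts = {"Group A": 42, "Group B": 3, "Group C": 10}
    for group, n in suppress_small_cells(example_counts).items():
        print(group, "suppressed" if n is None else n)
```

Suppressing individual cells may not be sufficient on its own. If row or column totals are also published, a suppressed value can sometimes be recovered by subtraction, so reporting rules often call for additional (complementary) suppression.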

Published July 2022.