
Clinical Data Management: Data Integration vs. Data Reconciliation

Clinical trials and research studies generate significant amounts of data to support study endpoints.

Data is complex and fascinating, originating from a variety of sources, including patients, sites, labs, wearables, and ePRO, just to name a few. This makes precision everything. This data collection includes receiving electronic external data, as well as utilizing Clinical Data Management (CDM) systems, such as an Electronic Data Capture (EDC) database, where key data points are entered by site research personnel from source documents and paper or electronic medical records.

The data collected can pass through two critical processes: data integration and data reconciliation.

The terms sound similar, but they are not interchangeable. In fact, one of the top CDM questions we receive from Sponsors is, “What is the difference between data integration and data reconciliation? Aren’t they the same?”

In this article, we will outline data integration vs. data reconciliation and explore why the distinction matters.[1]

What does electronic external data mean?

Electronic external data is defined as ‘electronic data’ that is collected outside of the EDC. Let’s start by looking at the types of data this includes:

  • Patient Reported Outcomes: Patient Reported Outcomes (PRO) can be captured on a paper source document and entered into the EDC by site research personnel, or the term may include electronic Patient Reported Outcomes (ePRO). With ePRO, the subject enters data (e.g., dosing compliance, diaries, questionnaires) directly into a device, such as a tablet or cell phone, avoiding the need for paper documentation.[2]
  • Central Laboratory Data: Safety, efficacy and exploratory measures depend on external data sources provided by specialty laboratories that analyze and assess important data points. This may include cardiac imaging data, radiology results, safety labs (e.g., chemistry, hematology, urinalysis, coagulation), pharmacokinetics (PK), pharmacodynamics (PD), genetics, or biomarkers, among others. These external data sources produce a vast array of data points that are not entered into the EDC.[3]
  • Randomization Data: Interactive Response Technology (IRT) can be part of the EDC as a service provided by the EDC vendor or a separate third-party system that is not connected to the EDC.


What does data integration mean?

Data integration in CDM means funneling all sources of clinical research data into the EDC. The end purpose is to display results data on the EDC screen.

CDM data integration requires EDC back-end programming, time for programming validation, and recurring maintenance of the data connections. The external data vendor also needs to be aware of the request, because the vendor's technical expertise is needed to support the EDC back-end programming: the vendor provides the outgoing programming that connects the data systems through web services or an Application Programming Interface (API).

This also requires programmatic manipulation of the raw, external data file to configure the external data file to fit the configuration requirements of the EDC system – and the process can be precarious. Any data manipulation could degrade the quality of the original raw, external data. Reconfiguring these files, even with validation, might introduce manual errors in the programming code which can affect the dataset.
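
To make the mapping step concrete, here is a minimal Python sketch of reshaping one raw vendor record into an EDC-style record. The column names, the `FIELD_MAP` mapping, and the date formats are assumptions made for illustration only; a real integration follows the EDC vendor's own specifications.

```python
from datetime import datetime

# Hypothetical mapping from vendor column names to EDC field names.
FIELD_MAP = {
    "SUBJID": "subject",
    "VISIT": "visit",
    "LBTEST": "test",
    "LBORRES": "result",
    "LBDTC": "collection_date",
}


def to_edc_record(raw):
    """Rename vendor columns to EDC field names and normalize the date."""
    missing = [col for col in FIELD_MAP if col not in raw]
    if missing:
        raise ValueError(f"raw record is missing columns: {missing}")
    record = {edc: raw[vendor].strip() for vendor, edc in FIELD_MAP.items()}
    # Assumption: the vendor sends DD-MM-YYYY and the EDC expects ISO 8601.
    parsed = datetime.strptime(record["collection_date"], "%d-%m-%Y")
    record["collection_date"] = parsed.date().isoformat()
    return record


def result_unchanged(raw, record):
    """Check that the reported result survived the mapping untouched."""
    return raw["LBORRES"].strip() == record["result"]


raw = {"SUBJID": "1001", "VISIT": "Week 4", "LBTEST": "ALT",
       "LBORRES": "5.10", "LBDTC": "05-03-2024"}
record = to_edc_record(raw)
print(record, result_unchanged(raw, record))
```

The result is deliberately kept as text rather than converted to a number, so a reported value of "5.10" is not silently rewritten as 5.1. Checks of this kind are one way to catch the quality degradation described above.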

What does data reconciliation mean?

Data reconciliation is ensuring that the electronic external data required per study protocol are collected and reviewed at the intervals/time points required.

CDM data reconciliation is a data review process that compares unique identifiers in the EDC data such as subject number, visit, nominal time point, collection dates and collection times with the same data points in the electronic external data source datasets. The data points to be reconciled are defined at the project level through discussions between the Sponsor, CRO and electronic external data vendor and documented in a data cleaning plan. Discrepancies between the EDC data and the external data source are identified by CDM, and those discrepancies are addressed by the external data vendor, Clinical Research Associate (CRA), or site. After data reconciliation discrepancies are communicated to the appropriate party (e.g., through site data queries, vendor communication, Sponsor teleconferences, etc.), the data are corrected to ensure both the EDC and electronic external data are reconciled and matching.

Some examples of discrepancies uncovered during data reconciliation include missing records, duplicate entries, incorrect formatting, broken relationships across data sources, inaccurate values, or empty fields.
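
The comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a validated CDM tool: the field names in `KEY_FIELDS` and `COMPARE_FIELDS` are hypothetical, and in practice the data points to reconcile come from the data cleaning plan.

```python
from collections import Counter

# Hypothetical identifiers and comparison fields; the real ones are
# defined per project in the data cleaning plan.
KEY_FIELDS = ("subject", "visit", "timepoint")
COMPARE_FIELDS = ("collection_date", "collection_time")


def record_key(rec):
    """Build the unique identifier used to match a record across sources."""
    return tuple(rec[f] for f in KEY_FIELDS)


def reconcile(edc_records, vendor_records):
    """Return a list of (issue, key, detail) tuples for CDM follow-up."""
    issues = []
    # Duplicate entries within either source.
    for name, records in (("EDC", edc_records), ("vendor", vendor_records)):
        counts = Counter(record_key(r) for r in records)
        for key, n in counts.items():
            if n > 1:
                issues.append((f"duplicate in {name}", key, f"{n} records"))
    edc = {record_key(r): r for r in edc_records}
    vendor = {record_key(r): r for r in vendor_records}
    # Records present in one source but not the other.
    for key in sorted(edc.keys() - vendor.keys()):
        issues.append(("missing in vendor", key, ""))
    for key in sorted(vendor.keys() - edc.keys()):
        issues.append(("missing in EDC", key, ""))
    # Empty or mismatched values on matched records.
    for key in sorted(edc.keys() & vendor.keys()):
        for field in COMPARE_FIELDS:
            e = edc[key].get(field, "")
            v = vendor[key].get(field, "")
            if not e or not v:
                issues.append(("empty field", key, field))
            elif e != v:
                issues.append(("mismatch", key, f"{field}: EDC={e} vendor={v}"))
    return issues
```

Each returned tuple would then be routed to the appropriate party, for example as a site data query or a vendor communication.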

Data transfer agreements (DTA) and data transfer specifications (DTS) are developed between the external data vendor and the data recipient to ensure agreement on, and a shared understanding of, the following:

  • Contact information (sender and recipient)
  • Granular details of data handling (e.g., addressing screen failures, patient numbering, unscheduled visits, canceled tests, comments, blinded data)
  • File type (e.g., .TXT for flat or ASCII format, .XLS/.XLSX for MS Excel, .SAS7BDAT for SAS datasets)
  • File structure (dataset attributes such as column names, column length, type (num/char), column labels, description (e.g., format of dates, times))
  • Frequency of transfers
  • Method of transfer (e.g., sFTP, email password-protected ZIP)
  • Code lists (e.g., visits, time points, test names, units)
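
As an illustration of how the file structure and code list items above might be checked on receipt, the following Python sketch validates a pipe-delimited transfer against a small specification. The `SPEC` contents, column names, and delimiter are hypothetical; actual values are whatever the DTS for the study defines.

```python
import csv
import io

# Hypothetical specification; real column names, lengths, types, and
# code lists come from the study's DTS.
SPEC = {
    "SUBJID": {"type": "char", "length": 10},
    "VISIT": {"type": "char", "length": 20,
              "codes": {"Screening", "Week 4", "Unscheduled"}},
    "LBORRES": {"type": "num", "length": 12},
}


def check_transfer(text, spec=SPEC, delimiter="|"):
    """Return a list of problems found in a delimited transfer file."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = reader.fieldnames or []
    if header != list(spec):
        return [f"header {header} does not match expected {list(spec)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col, rule in spec.items():
            value = row[col] or ""
            if len(value) > rule["length"]:
                problems.append(
                    f"line {line_no}: {col} exceeds length {rule['length']}")
            if rule["type"] == "num" and value:
                try:
                    float(value)
                except ValueError:
                    problems.append(
                        f"line {line_no}: {col} is not numeric: {value!r}")
            if "codes" in rule and value not in rule["codes"]:
                problems.append(
                    f"line {line_no}: {col} value {value!r} is not in the code list")
    return problems
```

Running a check like this on every transfer, before reconciliation starts, keeps structural problems separate from genuine data discrepancies.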


How is data integration different from data reconciliation?

Data integration requires a data connection, which involves a technical mapping and programming effort to funnel data from an external data source into the EDC so that the data points are displayed on the EDC screen. In contrast, data reconciliation refers to receiving and managing external data in its native format for cleaning and analysis.

A common misconception is that all external data sources must be integrated directly into the EDC. Integration is a nice-to-have, but it adds time to study start-up and should be treated as optional, because the data can still be viewed in its native form or directly from the source. For the purposes of data analysis, Biometrics (Clinical Data Management and Biostatistics) can fully support handling datasets from multiple sources to perform data cleaning and statistical analysis. From the Sponsor and medical reviewer perspectives, aggregate clinical data and patient-specific data can be reviewed using reports and tools outside of the EDC, such as programmed patient profiles or data visualization software (e.g., JReview).

Why does data integration vs data reconciliation matter?

Data integration and data reconciliation are both critical elements in a well-designed CDM plan, but they are also aspects that are heavily impacted by the CRO partner the Sponsor chooses to execute the protocol. Protect your endpoints by selecting a CRO that has the expertise and experience to make sure your final data set is as representative and accurate as possible.




1. For the purposes of this article, the discussion of data integration and data reconciliation does not include EMR, ePRO, or randomization capabilities built into the EDC. This article also does not address processes unrelated to clinical subject data, such as data pushes from the EDC to outside systems to support grants or site payments, or project tracking in a Clinical Trial Management System (CTMS).

2. ePRO collection can be a part of the EDC as a service provided by the EDC vendor or a separate third-party system that is not dynamically connected to the EDC.

3. As a side note, for safety labs, this does not include local labs where the results are entered by the research site into the EDC from local laboratory result reports.