The HCSRN’s expertise in working with complex clinical and claims data is unparalleled in breadth and depth.  

The centerpiece of the HCSRN data infrastructure is our Virtual Data Warehouse (VDW). The VDW is a common data model that facilitates multi-site research while protecting the privacy and security of patient data, as well as proprietary health practice information. Originally developed in 2001 by the Cancer Research Network, the VDW now supports studies of cancer, drug safety, cardiovascular disease, mental health, and the organization and delivery of health care services. HCSRN organizations agree on which data to make available for research and derive standard definitions and formats for those data. Rather than creating a centralized database, however, each HCSRN member organization maintains control of its own data via a “distributed” or “federated” model.
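To make the distributed model concrete, here is a minimal Python sketch. It assumes a toy setup in which each site holds its records in memory and a query is a Python predicate. `Site`, `federated_count`, and the `age`/`dx` fields are hypothetical names, not part of any HCSRN tool. The point is the direction of data flow: the query travels to the data, and only aggregate counts come back.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical record layout for illustration; not the actual VDW specification.
Record = dict[str, object]


@dataclass
class Site:
    """One member organization. Its patient-level records never leave this object."""
    name: str
    records: list[Record] = field(default_factory=list, repr=False)

    def run_local(self, query: Callable[[Record], bool]) -> int:
        # The query executes against local data; only an aggregate count is returned.
        return sum(1 for r in self.records if query(r))


def federated_count(sites: list[Site], query: Callable[[Record], bool]) -> dict[str, int]:
    """Send the same query to every site and collect only per-site counts."""
    return {site.name: site.run_local(query) for site in sites}


if __name__ == "__main__":
    site_a = Site("Site A", [{"age": 67, "dx": "C50.9"}, {"age": 45, "dx": "E11.9"}])
    site_b = Site("Site B", [{"age": 72, "dx": "C50.9"}])

    # Shared field names mean one query definition runs unchanged at every site.
    def breast_cancer_65_plus(r: Record) -> bool:
        return int(r["age"]) >= 65 and str(r["dx"]).startswith("C50")

    counts = federated_count([site_a, site_b], breast_cancer_65_plus)
    print(counts)  # {'Site A': 1, 'Site B': 1}
```

Because every site uses the same variable names and formats, a single query definition can run everywhere without per-site rewriting. In practice the query would more likely be a program distributed to site data managers than an in-memory predicate.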

The HCSRN’s VDW has served as the blueprint for numerous other common data models, including those used by the FDA’s Sentinel Initiative, PCORI’s National Patient-Centered Clinical Research Network (PCORnet), and Kaiser Permanente’s Center for Effectiveness and Safety Research (CESR).

The VDW is a cornerstone of HCSRN collaborative research, protecting privacy and fostering standardization.



Centralized Governance and Management

  • HCSRN Board: Sets overall policy and direction for VDW content, resources, and access.
  • VDW Operations Committee: Manages development activities across the HCSRN and provides technical input.
  • VDW Data Area Workgroups: Define, maintain, and interpret data file specifications, propose new variables, identify site-specific issues with data standards, and provide scientific input for each data area.
  • VDW Implementation Group: Site data managers and others who extract data from local systems, convert it to standard VDW structures, ratify the data specifications, and share best practices.


  • We use published data standards where available (e.g., ICD-10, CPT, LOINC, SNOMED) and create our own when necessary.
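Applying a published standard usually means mapping each site's local codes onto it. The following is a minimal sketch that assumes a hypothetical crosswalk from invented local lab codes to LOINC, along with a hypothetical output layout (`person_id`, `loinc`, `local_code`, `result_num`, `unmapped`). The two LOINC codes shown are real, but the actual VDW laboratory specification defines its own variables.

```python
# Hypothetical local-to-LOINC crosswalk; the local codes on the left are invented.
LOCAL_TO_LOINC: dict[str, str] = {
    "LAB_HBA1C": "4548-4",    # Hemoglobin A1c/Hemoglobin.total in Blood
    "LAB_GLU_SER": "2345-7",  # Glucose [Mass/volume] in Serum or Plasma
}


def standardize_lab(local_row: dict) -> dict:
    """Map one site-specific lab row onto a shared layout keyed by a standard code."""
    local_code = local_row["test_code"].strip().upper()
    loinc = LOCAL_TO_LOINC.get(local_code)  # None when no standard code is mapped
    return {
        "person_id": local_row["person_id"],
        "loinc": loinc,
        "local_code": local_code,  # retained so provenance can be documented
        "result_num": float(local_row["value"]),
        "unmapped": loinc is None,  # flagged for review instead of silently dropped
    }


if __name__ == "__main__":
    rows = [
        {"person_id": "p1", "test_code": " lab_hba1c ", "value": "7.2"},
        {"person_id": "p2", "test_code": "LAB_XYZ", "value": "1.0"},
    ]
    for row in rows:
        print(standardize_lab(row))
```

Keeping the local code next to the standard one, and flagging rather than discarding unmapped rows, leaves a trail that data area workgroups could use to decide whether a site-specific code needs a new mapping.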

Hardware and Software

  • Each site needs hardware and software (mainly SAS®) to store, retrieve, process, and manage datasets. VDW files are separate from health plan files. Many sites maintain a research data warehouse that is distinct from, but compatible with, the health system’s enterprise data warehouse.


  • Sites contribute rigorous data documentation (e.g., data provenance and any local variations) to a password-protected Web portal. The key VDW Specifications (variable name, length, format, etc.) are published on our website.
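Because the published specifications fix each variable's name, length, and format, conformance can be checked automatically. The following is a minimal sketch. It assumes a hypothetical three-variable excerpt (`person_id`, `birth_date`, `sex`) rather than the real VDW demographics specification, and it uses Python types as stand-ins for SAS character and numeric types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableSpec:
    name: str
    kind: type   # expected Python type, standing in for a SAS character or numeric type
    length: int  # maximum length, checked for character values only


# Hypothetical excerpt; the published VDW specifications define the real variables.
DEMOGRAPHICS_SPEC = [
    VariableSpec("person_id", str, 12),
    VariableSpec("birth_date", str, 10),  # assumed ISO 8601 text, YYYY-MM-DD
    VariableSpec("sex", str, 1),
]


def conformance_errors(row: dict, spec: list[VariableSpec]) -> list[str]:
    """List every way one row departs from the specification."""
    expected = {v.name for v in spec}
    errors = [f"unexpected variable: {name}" for name in sorted(set(row) - expected)]
    for v in spec:
        if v.name not in row:
            errors.append(f"missing variable: {v.name}")
            continue
        value = row[v.name]
        if not isinstance(value, v.kind):
            errors.append(f"{v.name}: expected {v.kind.__name__}, got {type(value).__name__}")
        elif isinstance(value, str) and len(value) > v.length:
            errors.append(f"{v.name}: length {len(value)} exceeds {v.length}")
    return errors


if __name__ == "__main__":
    good = {"person_id": "A0001", "birth_date": "1950-01-31", "sex": "F"}
    bad = {"person_id": "A0001", "birth_date": 19500131, "sex": "FEM", "zip": "12345"}
    print(conformance_errors(good, DEMOGRAPHICS_SPEC))  # []
    print(conformance_errors(bad, DEMOGRAPHICS_SPEC))
```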

Quality Control

  • Periodic checks look at ranges, cross-field agreement, implausible data patterns, and cross-site comparisons. Quality control outputs are reviewed by VDW personnel at least annually.
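The kinds of checks listed above can be sketched in a few lines. This minimal illustration assumes encounter rows with hypothetical `age`, `admit`, and `discharge` fields, plus per-site rates reported by each site. The thresholds and the median-based outlier rule are assumptions chosen for the example, not the network's actual procedures.

```python
from datetime import date
from statistics import median

# Thresholds below are illustrative assumptions, not HCSRN quality-control rules.
MAX_AGE = 115
MAX_STAY_DAYS = 365


def row_issues(enc: dict) -> list[str]:
    """Range, cross-field, and implausible-pattern checks on one encounter row."""
    issues = []
    if not 0 <= enc["age"] <= MAX_AGE:  # range check
        issues.append("age out of range")
    stay = (enc["discharge"] - enc["admit"]).days
    if stay < 0:  # cross-field agreement: discharge cannot precede admission
        issues.append("discharge before admit")
    elif stay > MAX_STAY_DAYS:  # implausible pattern
        issues.append("implausibly long stay")
    return issues


def outlier_sites(rates: dict[str, float], tolerance: float = 0.5) -> list[str]:
    """Cross-site comparison: flag sites whose rate differs from the median by more than `tolerance` (relative)."""
    mid = median(rates.values())
    if mid == 0:
        return []
    return sorted(site for site, rate in rates.items() if abs(rate - mid) / mid > tolerance)


if __name__ == "__main__":
    enc = {"age": 130, "admit": date(2020, 1, 5), "discharge": date(2020, 1, 1)}
    print(row_issues(enc))  # ['age out of range', 'discharge before admit']
    # e.g. inpatient encounters per 1,000 members, by site (made-up numbers)
    print(outlier_sites({"Site A": 10.0, "Site B": 11.0, "Site C": 25.0}))  # ['Site C']
```

A flagged site is not necessarily wrong. A real difference in population or care patterns can produce the same signal, so outputs like these are inputs to human review rather than verdicts.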

Data Availability

  • Each institution’s VDW data remain at its site until a study-specific need arises. Only the minimum necessary data are then extracted, and only after ethical, contractual, and HIPAA requirements are met.
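Once approvals are in place, the extraction itself amounts to filtering rows and columns. The following is a minimal sketch that assumes the approved variable list and cohort definition come from the study protocol. `extract_minimum_necessary`, the `person_id` field, and the sequential study-ID format are hypothetical. The returned crosswalk is meant to stay at the site, so the shared extract carries no local identifier.

```python
from typing import Callable


def extract_minimum_necessary(
    rows: list[dict],
    approved_vars: list[str],
    in_cohort: Callable[[dict], bool],
) -> tuple[list[dict], dict[str, str]]:
    """Keep only approved variables for cohort members, keyed by a study-specific ID."""
    crosswalk: dict[str, str] = {}  # local person_id -> study ID; retained at the site
    extract = []
    for row in rows:
        if not in_cohort(row):
            continue
        pid = row["person_id"]
        if pid not in crosswalk:
            # Sequential IDs keep the example deterministic; a site might randomize them.
            crosswalk[pid] = f"S{len(crosswalk) + 1:05d}"
        record = {"study_id": crosswalk[pid]}
        record.update({var: row[var] for var in approved_vars})
        extract.append(record)
    return extract, crosswalk


if __name__ == "__main__":
    rows = [
        {"person_id": "p1", "age": 70, "zip": "98101", "dx": "C50.9"},
        {"person_id": "p2", "age": 40, "zip": "98102", "dx": "C50.9"},
    ]
    extract, _ = extract_minimum_necessary(rows, ["age", "dx"], lambda r: r["age"] >= 65)
    print(extract)  # [{'study_id': 'S00001', 'age': 70, 'dx': 'C50.9'}]
```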




  • Because the underlying data are collected for treatment, payment, and operations rather than for research, they require careful curation and preparation before use in research projects.

  • Source data may vary substantially within and across sites.

  • Health plans frequently change their information systems, often requiring adaptation or re-implementation at sites.

  • Sharing data beyond project collaborators may be complicated for technical, regulatory, and political reasons.

  • Maintaining a data infrastructure requires resources. Project-specific grant funding does not support the level of cross-site and cross-project maintenance and knowledge sharing that is needed.

  • Moreover, it takes time to gain concurrence for new variables or data areas, develop clear specifications to guide implementers and end-users, and implement the new variables across all sites.  


The VDW is an example of a distributed (or federated) data-sharing model based on electronic medical record (EMR), claims, and administrative health care data. It is suited to multi-site health services and population health research and can also support pragmatic and traditional clinical trials. With planning and ongoing funding, it yields standardized data across institutions and over time. Benefits for multi-site research include:

  • Improved data efficiency, accuracy, and completeness

  • Analytical precision plus patient and institutional protections

  • More generalizable results