Overview of Data Resources
The HCSRN’s expertise in working with complex clinical and claims data is unparalleled in breadth and depth.
The centerpiece of the HCSRN data infrastructure is our Virtual Data Warehouse (VDW). The VDW is a Common Data Model that facilitates multi-site research while protecting the privacy and security of patient data, as well as proprietary health practice information. Originally developed in 2001 by the Cancer Research Network, the VDW now supports studies of cancer, drug safety, cardiovascular disease, mental health, and the organization and delivery of health care services. HCSRN organizations agree on data to make available for research and derive standard definitions and formats – but rather than creating a centralized database, each HCSRN member organization maintains control of its own data via a “distributed” or “federated” model.
The HCSRN’s VDW has served as the blueprint for numerous other common data models, including those used by the FDA’s Sentinel Initiative, PCORI’s National Patient-Centered Clinical Research Network (PCORnet), and Kaiser Permanente’s Center for Effectiveness and Safety Research (CESR).
The VDW is a cornerstone of HCSRN collaborative research, protecting privacy and fostering standardization.
Key Attributes of the VDW
Centralized Governance and Management
- HCSRN Board: Sets overall policy and direction on content, resources, and access.
- VDW Operations Committee: Coordinates VDW development activities across the HCSRN sites, supports data area work group leads, provides technical input and coordinates communications and meetings.
- VDW Data Area Workgroups: Define, maintain, and interpret data file specifications, propose new variables, identify site-specific issues with data standards, and provide scientific input for each data area.
- VDW Implementation Group: Site data managers and other analyst/programmers who extract data from local systems, convert it to standard VDW structures, ratify the data specifications, and share best practices.
- We use published data standards where available (e.g., ICD-10, CPT, LOINC, SNOMED) and create our own when necessary.
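The conversion step carried out by site data managers (extract from local systems, convert to a standard structure) can be sketched as follows. This is only an illustration: sites mainly use SAS, and every field name, code value, and the specification itself are hypothetical placeholders, not the published VDW specifications.

```python
from datetime import date, datetime

# Hypothetical common-format specification: variable name -> expected type.
# Illustrative only; these are not the published VDW variable definitions.
COMMON_SPEC = {"person_id": str, "birth_date": date, "sex": str}

# Hypothetical mapping from one site's local sex codes to a shared code set.
LOCAL_SEX_CODES = {"1": "M", "2": "F"}


def to_common_format(local_row: dict) -> dict:
    """Convert one record from a (hypothetical) local layout to the common layout."""
    row = {
        "person_id": str(local_row["PatientNumber"]),
        "birth_date": datetime.strptime(local_row["DOB"], "%m/%d/%Y").date(),
        "sex": LOCAL_SEX_CODES.get(local_row["SexCode"], "U"),  # "U" = unknown
    }
    # Check the converted record against the specification before loading it.
    for name, expected_type in COMMON_SPEC.items():
        if not isinstance(row[name], expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
    return row


local = {"PatientNumber": 10042, "DOB": "03/15/1960", "SexCode": "2"}
print(to_common_format(local))
```

Because every site converts to the same structure, a program written once against the common layout can in principle run unchanged at each site.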
Hardware and Software
- Each site needs hardware and software (mainly SAS®) to store, retrieve, process and manage datasets. VDW files are separate from health plan files. Many sites maintain a research data warehouse that is distinct from, but compatible with, the health system’s enterprise data warehouse.
- Sites contribute rigorous data documentation (e.g., data provenance and any local variations) to a password-protected Web portal. The key VDW Specifications (variable name, length, format, etc.) are published on our VDW Data Model page.
- Periodic quality checks examine value ranges, cross-field agreement, implausible data patterns, and cross-site comparisons. Quality control outputs are reviewed by VDW personnel at least annually.
- Each institution’s VDW data remain at its site until a study-specific need arises. Only the minimum necessary data are extracted, and only after ethical, contractual, and HIPAA requirements are met.
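The kinds of periodic checks listed above (ranges, cross-field agreement, cross-site comparisons) might look like the following minimal sketch. The field names, the 0 to 120 year age bound, and the median-based tolerance rule are assumptions made for illustration; they are not the HCSRN's actual quality control programs, which sites run mainly in SAS.

```python
import statistics
from datetime import date


def check_record(rec: dict, today: date) -> list[str]:
    """Return a list of problems found in one (hypothetical) enrollment record."""
    problems = []
    # Range check: age implied by birth_date must be plausible (assumed 0-120 years).
    age_years = (today - rec["birth_date"]).days / 365.25
    if not 0 <= age_years <= 120:
        problems.append("birth_date out of range")
    # Cross-field agreement: enrollment cannot end before it starts...
    if rec["enroll_end"] < rec["enroll_start"]:
        problems.append("enroll_end before enroll_start")
    # ...and cannot start before the person was born.
    if rec["enroll_start"] < rec["birth_date"]:
        problems.append("enroll_start before birth_date")
    return problems


def flag_outlier_sites(rates: dict[str, float], tolerance: float) -> list[str]:
    """Cross-site comparison: flag sites whose rate is far from the cross-site median."""
    median = statistics.median(rates.values())
    return sorted(site for site, rate in rates.items() if abs(rate - median) > tolerance)


record = {
    "birth_date": date(1980, 1, 1),
    "enroll_start": date(2010, 1, 1),
    "enroll_end": date(2009, 1, 1),
}
print(check_record(record, today=date(2024, 1, 1)))
print(flag_outlier_sites({"site_a": 0.02, "site_b": 0.03, "site_c": 0.25}, tolerance=0.10))
```

A flagged record or site is a prompt for human review, not proof of an error; a site can differ from the others for legitimate reasons such as its patient population.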
Learn more about the VDW data domains and specifications on the VDW Data Model page.
Insights and Notable Challenges
- Because the underlying data are collected for treatment, payment, and operations rather than for research, they require careful curation and preparation before use in research projects.
- Source data may vary substantially within and across sites.
- Health plans frequently change their information systems, often requiring adaptation or re-implementation at sites.
- Sharing data beyond project collaborators may be complicated for technical, regulatory, and political reasons.
- Maintaining a data infrastructure requires resources. Project-specific grant funding does not support the level of cross-site and cross-project maintenance and knowledge sharing that is needed.
- Moreover, it takes time to gain concurrence for new variables or data areas, develop clear specifications to guide implementers and end-users, and implement the new variables across all sites.
Benefits of a Common Data Model
The VDW is an example of a distributed (or federated) data-sharing model based on EMR, claims, and administrative health care data. It is applicable to multi-site health services and population health research and can also support pragmatic and traditional clinical trials. With planning and ongoing funding, it yields comparable data across institutions and over time.
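As a rough illustration of the distributed model, the sketch below runs the same query function separately against each site's records and returns only aggregate counts for combination. The function names, record fields, and the small-cell suppression threshold are all hypothetical assumptions; this is not a description of how HCSRN studies actually exchange data.

```python
from collections import Counter


def run_local_query(local_records: list[dict], min_cell_size: int = 5) -> dict:
    """Runs inside one site: row-level data stay local, only counts are returned."""
    counts = Counter(r["age_group"] for r in local_records if r["has_condition"])
    # Assumed safeguard: suppress small cells before anything leaves the site.
    return {group: n for group, n in counts.items() if n >= min_cell_size}


def combine(site_results: dict[str, dict]) -> dict:
    """Runs at the coordinating site: sums the aggregates returned by each site."""
    total = Counter()
    for result in site_results.values():
        total.update(result)
    return dict(total)


# Hypothetical local data held separately at two sites.
site_a = [{"age_group": "40-64", "has_condition": True}] * 6 \
       + [{"age_group": "65+", "has_condition": True}] * 2 \
       + [{"age_group": "65+", "has_condition": False}] * 3
site_b = [{"age_group": "40-64", "has_condition": True}] * 7 \
       + [{"age_group": "65+", "has_condition": True}] * 5

results = {"site_a": run_local_query(site_a), "site_b": run_local_query(site_b)}
print(combine(results))
```

The design point is that the query travels to the data rather than the data to the query, so each organization keeps control of what leaves its site.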
Benefits for multi-site research include:
- Improved data efficiency, accuracy, and completeness
- Analytical precision plus patient and institutional protections
- More generalizable results
- Data that have been standardized across all HCSRN sites