Metadata solution for data lakes
Healthcare organizations are becoming increasingly data-focused. A critical step in that direction is selecting a robust and extensible metadata management solution. Metadata is essential for understanding data lineage and quality, and it also supports data governance and improves data accessibility, which in turn helps users select and use the available data resources effectively. Semedy’s solution is designed to support all data lake metadata requirements, including different types of data obtained from multiple sources and iterative cycles of integration and refinement. This solution allows an organization to:
Catalog, organize, and manage metadata describing multiple types of data resources – from large datasets to detailed data models and discrete data elements
Enable metadata exploration (queries) to confirm data availability, provenance, and level of detail, including access restrictions (data use agreements)
Enable automated metadata updates directly from data sources – capture current state and track changes (evolution)
Provide extensible metadata models to capture fine-grained details – drill down to discrete data elements with bindings to code systems and mappings to reference terminologies
Provide semantic tagging at multiple levels – essential to help find and categorize available data resources
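As an illustration of the "capture current state and track changes" capability listed above, the following minimal Python sketch compares two snapshots of a data source's structure. It is not CKMS code: the function name `diff_snapshots`, the snapshot format (a mapping from column name to type), and the example columns are all assumptions made for this sketch.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two {column_name: type} snapshots of one data source."""
    shared = set(previous) & set(current)
    return {
        # Columns present only in the newer snapshot.
        "added": sorted(set(current) - set(previous)),
        # Columns present only in the older snapshot.
        "removed": sorted(set(previous) - set(current)),
        # Columns present in both snapshots whose type differs.
        "changed": sorted(k for k in shared if previous[k] != current[k]),
    }


# Hypothetical snapshots taken on two successive automated imports.
old = {"patient_id": "string", "glucose_value": "int"}
new = {"patient_id": "string", "glucose_value": "float", "collected_at": "timestamp"}

print(diff_snapshots(old, new))
# {'added': ['collected_at'], 'removed': [], 'changed': ['glucose_value']}
```

Storing each diff alongside its import timestamp would give a simple history of how a data resource has evolved.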
Semedy’s metadata management solution, implemented using our Clinical Knowledge Management System (CKMS), includes extensible metadata models, ETL pipelines for import and export, and configurable views, queries, and reports. During the initial implementation stage, simple descriptive metadata is imported automatically from each available data resource. These initial metadata records can then be progressively refined and linked to information models, which in turn are linked to data dictionaries, terminologies, and value sets. Proprietary models can be cross-referenced to open standards (e.g., FHIR and OMOP CDM). All types of metadata artifacts are curated and managed within CKMS, enabling the development of reusable phenotype and cohort definitions that can be tested and validated using synthetic test data.
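The progressive refinement described above (a descriptive record first, then a link to an information model whose elements carry code system bindings and cross-references to open standards) can be pictured with a small Python sketch. It is illustrative only and does not reflect the actual CKMS schema or API. All class names, field names, the example data set, and the chosen mapping targets are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataElement:
    name: str
    code_system: Optional[str] = None  # binding to a code system, e.g. "LOINC"
    # Cross-references to open standards, keyed by standard name.
    standard_refs: dict[str, str] = field(default_factory=dict)


@dataclass
class InformationModel:
    name: str
    elements: list[DataElement] = field(default_factory=list)


@dataclass
class DatasetRecord:
    name: str
    source: str
    model: Optional[InformationModel] = None  # None until a curator links a model

    def is_refined(self) -> bool:
        return self.model is not None


# Stage 1: descriptive metadata imported from the data resource (hypothetical values).
record = DatasetRecord(name="lab_results_2023", source="s3://example-lake/labs/")

# Stage 2: the record is refined by linking it to an information model.
glucose = DataElement(
    name="glucose_value",
    code_system="LOINC",
    standard_refs={
        "FHIR": "Observation.valueQuantity",
        "OMOP CDM": "measurement.value_as_number",
    },
)
record.model = InformationModel(name="LabResult", elements=[glucose])
```

Keeping the model link optional mirrors the staged approach: a data set can be cataloged immediately and enriched later without changing its record identity.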
The demonstration will use our CKMS platform to show how metadata is represented, searched, visualized, and cross-referenced, using examples created by Semedy and examples drawn from commonly used sources. We will demonstrate how to search and review data sets and their corresponding models, how to identify the data use agreement associated with a given data set, and how to determine which data sets contain data elements of interest. We will also illustrate how other Semedy solutions (e.g., Information Models, Terminologies, Patient Cohorts, and Synthetic Health Data) can be combined to provide a comprehensive metadata solution for data lakes.
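Two of the questions named above, which data sets contain a data element of interest and which data use agreement governs a given data set, can be expressed as simple catalog lookups. The Python sketch below is a toy stand-in rather than the CKMS query interface. The catalog layout, the function names, and every data set name and agreement identifier are invented for illustration.

```python
from typing import Optional

# Hypothetical catalog entries; names, elements, and agreement IDs are invented.
CATALOG = [
    {"name": "lab_results", "elements": {"patient_id", "glucose_value"}, "dua": "DUA-001"},
    {"name": "encounters", "elements": {"patient_id", "admit_date"}, "dua": "DUA-002"},
    {"name": "public_reference", "elements": {"zip_code"}, "dua": None},
]


def datasets_with_element(catalog: list, element: str) -> list:
    """Return the sorted names of data sets that contain the given data element."""
    return sorted(d["name"] for d in catalog if element in d["elements"])


def data_use_agreement(catalog: list, dataset_name: str) -> Optional[str]:
    """Return the agreement ID for a data set, or None if it has no agreement.

    Raises KeyError when the data set is not in the catalog.
    """
    for d in catalog:
        if d["name"] == dataset_name:
            return d["dua"]
    raise KeyError(dataset_name)


print(datasets_with_element(CATALOG, "patient_id"))  # ['encounters', 'lab_results']
print(data_use_agreement(CATALOG, "lab_results"))    # DUA-001
```

Raising an error for an unknown data set, instead of returning None, keeps "not cataloged" distinct from "cataloged without an agreement".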