AI Training Data Transparency
Scienza Health, Inc. is committed to transparency in how our clinical screening models are developed, trained, and validated. This disclosure is provided in accordance with California Assembly Bill 2013 (AB 2013), which requires developers of generative AI systems to publish documentation about the data used to train those systems.
What We Build
Scienza Health develops digitalhumanOS™, a clinical screening platform, and GIA®, a Digital Human® that screens patients for 46 cognitive, behavioral, and neurological conditions through natural conversation. Our platform uses Voice AI, Computer Vision, and Speech Biomarker analysis to identify clinical risk factors.
Training Data Sources
Our clinical screening models are trained and validated using the following data categories:
- De-identified clinical records — sourced from licensed data partnerships with electronic health record platforms covering post-acute and long-term care settings. All data is de-identified in accordance with HIPAA Safe Harbor or Expert Determination methods before use in model training.
- Peer-reviewed clinical research — published studies on speech biomarkers, cognitive assessment, and neurological screening from institutions including Beth Israel Deaconess Medical Center, NIH, and MIT. See our clinical research page for the full citation list.
- Validated clinical assessment instruments — established screening tools including the MoCA (Montreal Cognitive Assessment), MMSE (Mini-Mental State Examination), BIMS (Brief Interview for Mental Status), PHQ-9 (Patient Health Questionnaire-9), GAD-7 (Generalized Anxiety Disorder-7), and AIMS (Abnormal Involuntary Movement Scale), used as reference standards for model calibration and accuracy benchmarking.
- Proprietary clinical interaction data — data generated through clinical deployments with informed patient consent, used to refine screening accuracy and expand condition coverage.
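The de-identification step described above can be illustrated with a minimal sketch. This is not Scienza Health's actual pipeline; the field names and the (abbreviated) identifier list below are hypothetical examples of the HIPAA Safe Harbor approach, which removes direct identifiers and coarsens quasi-identifiers such as age and ZIP code.

```python
# Illustrative Safe Harbor-style redaction sketch. Field names and the
# identifier set are hypothetical; the real standard enumerates 18
# identifier categories.
SAFE_HARBOR_FIELDS = {
    "name", "street_address", "phone", "email", "ssn",
    "mrn", "account_number", "ip_address", "photo_url",
}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    clean = {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}
    # Safe Harbor aggregates all ages over 89 into a single category.
    if isinstance(clean.get("age"), int) and clean["age"] > 89:
        clean["age"] = "90+"
    # Safe Harbor keeps at most the first three digits of the ZIP code.
    if "zip" in clean:
        clean["zip"] = str(clean["zip"])[:3] + "XX"
    return clean
```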
Data Scale
Our screening models are trained and validated on approximately 12.3 million de-identified patient records, comprising roughly 27 billion clinical data points.
Data Governance
All training data is subject to the following governance controls:
- HIPAA compliance — all patient data is de-identified before use in model training. No protected health information (PHI) is used in training datasets.
- Bias monitoring — models are evaluated across demographic groups including age, sex, race, ethnicity, and primary language to identify and mitigate performance disparities. Quarterly audits are conducted.
- Human-in-the-loop — all screening results require clinician review before any clinical action is taken. Our models assist clinicians; they do not replace clinical judgment.
- FDA device establishment registration — our platform is registered with the FDA and is subject to the regulatory requirements that apply to clinical software.
- AES-256 encryption — all data at rest and in transit is encrypted. Access is controlled through role-based permissions with audit logging.
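The bias-monitoring control above amounts to evaluating model performance separately for each demographic group and flagging the largest gap. The following is a hedged sketch of that idea; the group labels and the disparity calculation are illustrative, not Scienza Health's actual audit procedure.

```python
# Illustrative subgroup audit: per-group accuracy plus the maximum
# disparity between the best- and worst-performing groups.
from collections import defaultdict

def subgroup_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} over parallel label/prediction/group lists."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}

def max_disparity(acc_by_group):
    """Gap between the best and worst subgroup accuracy."""
    values = list(acc_by_group.values())
    return max(values) - min(values)
```

A quarterly audit of this shape would compare `max_disparity` against a predefined tolerance and trigger mitigation when the gap exceeds it.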
Model Accuracy and Validation
Our screening models have been validated against peer-reviewed clinical benchmarks. Published accuracy figures include:
| Condition | Metric | Value |
|---|---|---|
| Depression | Accuracy | 81.6% |
| PTSD | Accuracy | 80.0% |
| Anxiety | Accuracy | 77.5% |
| Parkinson’s Disease | AUC | 0.97 |
| Cognitive Decline | Accuracy | 70.8% |
Source: peer-reviewed clinical validation studies; see our clinical research page for full citations.
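The table reports two metric types: classification accuracy (the share of correct screens) and AUC (the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one). A minimal sketch of both, on synthetic data unrelated to the cited studies:

```python
# Synthetic-data sketch of the two metrics in the table above.
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Rank-based AUC via the Mann-Whitney U statistic (ties count half)."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.97 means a positive case outranks a negative one about 97% of the time, which is why AUC rather than raw accuracy is often reported for screening models.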
What Our Models Do Not Do
- They do not diagnose. They screen and flag risk for clinician review.
- They do not make treatment decisions.
- They do not operate autonomously without clinician oversight.
- They do not use personally identifiable patient information in training.
Contact
For questions about our training data practices, model governance, or this disclosure, contact us at support@scienzahealth.com with the subject line "AI Transparency Inquiry."
Frequently Asked Questions
Does GIA comply with California AB 2013 AI transparency requirements?
California Assembly Bill 2013 requires developers of generative AI systems to disclose information about training data sources, data governance, and model capabilities. GIA® by Scienza Health publishes a complete AB 2013 disclosure covering all training data categories, clinical validation methodology, accuracy metrics by condition, and explicit statements of what the screening models do not do. The disclosure is updated with each model version and available at scienzahealth.com/ai-transparency.
Does GIA comply with California AB 3030 healthcare AI disclosure requirements?
California Assembly Bill 3030 requires healthcare facilities using AI-generated communications to include clear disclosure that the content was generated or assisted by artificial intelligence. GIA® by Scienza Health includes AB 3030-compliant disclosures on all screening outputs — structured medical notes, biomarker results, and session summaries all carry explicit AI-generation labeling before they reach the clinician for review and submission to the EHR.
What AI training data does Scienza Health use for clinical screening models?
The voice biomarker and speech analysis models powering GIA® by Scienza Health are trained on 12.3 million patient records and 27 billion clinical data points. Training data is sourced from de-identified clinical datasets — no personally identifiable patient information is used in model training. All training data sources, governance practices, and validation methodology are disclosed per California AB 2013 at scienzahealth.com/ai-transparency.
How does Scienza Health disclose AI-generated healthcare communications?
Every screening output generated by GIA® by Scienza Health — structured medical notes, biomarker results, CPT code assignments, and session summaries — carries explicit disclosure that the content was generated by artificial intelligence, in compliance with California AB 3030. The clinician reviews all AI-generated outputs before submission to the EHR. No AI-generated clinical content enters the permanent medical record without human authorization and review.
This disclosure was last updated in April 2026. Scienza Health, Inc. · Newport Beach, California.