AI Training Data Transparency
In compliance with California Assembly Bill 2013 (AB 2013), Scienza Health provides the following disclosures regarding the data used to train our artificial intelligence systems.
Training Data Disclosures
The following nine categories of information describe the data used to develop and train Scienza Health's AI-powered healthcare solutions, including digitalhumanOS™ and Gia™ workflows.
1. Data Sources & Ownership
De-identified clinical event data from partner long-term care facilities; voice biomarker datasets obtained through a strategic partner; and publicly available cognitive assessment benchmarks. All data sources are properly licensed or owned by Scienza Health or its partners.
2. Dataset Purpose & Relevance
Training machine learning models for cognitive screening (0.89 AUC, using a Lancet-validated methodology); voice biomarker pattern recognition for early detection of cognitive decline; and healthcare workflow automation for intake, follow-up, and clinical documentation.
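For readers unfamiliar with the AUC figure cited above, the following is a minimal illustrative sketch of how an AUC (area under the ROC curve) can be computed. It is not Scienza Health's evaluation code; the function name and the sample labels and scores are hypothetical. AUC can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (hypothetical data).
    scores: iterable of model scores, same length as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(positives) * len(negatives))


# Hypothetical example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value of 0.89 means that roughly 89% of such pairs are ordered correctly.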
3. Number of Data Points
Our AI models are trained on 11M+ clinical events and 2M patient records, representing diverse populations across memory care, skilled nursing, assisted living, and primary care settings.
4. Copyright, Trademark & Patent Status
All training data is properly licensed from healthcare data partners or owned by Scienza Health. Proprietary algorithms and methodologies are patented where applicable. digitalhumanOS™ and Gia™ are trademarks of Scienza Health.
5. Data Acquisition Method
Licensed from healthcare data partners under data use agreements; collected through IRB-approved research studies with informed consent; aggregated from consented clinical deployments with appropriate data governance protocols.
6. Personal Information Status
All data is de-identified in compliance with the HIPAA Safe Harbor method (45 CFR 164.514(b)(2)). No directly identifiable patient information is included in training datasets. Protected health information (PHI) is never used in model training without proper de-identification.
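As an illustration of what Safe Harbor de-identification involves, the sketch below applies a few of its rules to a single record: direct identifiers are dropped, dates are reduced to the year, ages of 90 and above are grouped into one category, and ZIP codes are truncated to three digits (or replaced with 000 for sparsely populated areas). This is a simplified sketch under stated assumptions, not Scienza Health's pipeline: the field names, the `deidentify` function, and the `low_population_zip3` parameter are hypothetical, and the full Safe Harbor method covers 18 identifier types.

```python
from datetime import date

# Hypothetical field names; a real schema would differ and cover all 18
# Safe Harbor identifier types.
DIRECT_IDENTIFIER_FIELDS = {"name", "street_address", "phone", "email", "ssn", "mrn"}


def deidentify(record, low_population_zip3=frozenset()):
    """Return a copy of `record` with a subset of Safe Harbor rules applied."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop direct identifiers entirely
        if key == "zip":
            zip3 = str(value)[:3]
            # Three-digit areas with 20,000 or fewer people are reported as 000.
            out["zip3"] = "000" if zip3 in low_population_zip3 else zip3
        elif key == "age":
            # Ages over 89 are aggregated into a single "90 or older" category.
            out["age"] = "90+" if value >= 90 else value
        elif isinstance(value, date):
            out[key + "_year"] = value.year  # keep only the year of any date
        else:
            out[key] = value
    return out


# Hypothetical record for illustration only.
sample = {
    "name": "Jane Doe",
    "zip": "94110",
    "age": 93,
    "admitted": date(2021, 3, 14),
    "event_type": "fall",
}
cleaned = deidentify(sample)
```

Here `cleaned` retains only the three-digit ZIP prefix, the age category, the admission year, and the event type.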
7. Synthetic Data Usage
Synthetic data augmentation is used for edge cases and underrepresented scenarios following healthcare machine learning best practices. Synthetic data generation methodology follows established protocols to ensure clinical validity.
8. Collection Timeframe
Data collection period: 2010-2025. First use in AI model training: 2024. Training datasets are updated quarterly to incorporate new validated clinical data while maintaining data quality standards.
9. Data Cleaning Methodology
HIPAA de-identification protocols applied to all data; outlier detection and removal using statistical methods; quality validation against established clinical standards; demographic representation balancing to reduce bias.
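The disclosure above does not name the specific statistical methods used for outlier removal. As one common example, the sketch below uses the interquartile range (IQR) rule, which discards values lying more than 1.5 times the IQR outside the first and third quartiles. The function name, the multiplier `k`, and the sample readings are assumptions made for illustration.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is a common default."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical measurements with one implausible reading (95).
readings = [10, 12, 11, 13, 12, 11, 95]
filtered = remove_outliers_iqr(readings)
```

The IQR rule is based on quartiles rather than the mean and standard deviation, so a single extreme value does not distort the thresholds used to detect it.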
Additional Information
Governance Framework
All AI systems operate within our 5-layer governance framework, ensuring human oversight, full audit trails, and no autonomous clinical decisions. Learn more on our Governance page.
Data Subject Rights
For information about your privacy rights and how we handle personal data, please review our Privacy Policy.
Contact Us
For questions about our AI training data practices, please contact us at support@scienzahealth.com or call 1-888-816-1534.
Last Updated: January 2026
Effective Date: January 1, 2026
This page is published in compliance with California Assembly Bill 2013 (AB 2013).
