AI Training Data Transparency
Scienza Health, Inc. is committed to transparency in how our clinical screening models are developed, trained, and validated. This disclosure is provided in accordance with California Assembly Bill 2013 (AB 2013), which requires developers of generative AI systems to publish documentation about the data used to train those systems.
What We Build
Scienza Health develops digitalhumanOS™, a clinical screening platform, and GIA®, a Digital Human® that screens patients for 46 cognitive, behavioral, and neurological conditions through natural conversation. Our platform uses Voice AI, Computer Vision, and Speech Biomarker analysis to identify clinical risk factors.
Training Data Sources
Our clinical screening models are trained and validated using the following data categories:
- De-identified clinical records — sourced from licensed data partnerships with electronic health record platforms covering post-acute and long-term care settings. All data is de-identified in accordance with HIPAA Safe Harbor or Expert Determination methods before use in model training.
- Peer-reviewed clinical research — published studies on speech biomarkers, cognitive assessment, and neurological screening from institutions including Beth Israel Deaconess Medical Center, NIH, and MIT. See our clinical research page for the full citation list.
- Validated clinical assessment instruments — established screening tools including the MoCA (Montreal Cognitive Assessment), MMSE (Mini-Mental State Examination), BIMS (Brief Interview for Mental Status), PHQ-9 (Patient Health Questionnaire-9), GAD-7 (Generalized Anxiety Disorder-7), and AIMS (Abnormal Involuntary Movement Scale), used as reference standards for model calibration and accuracy benchmarking.
- Proprietary clinical interaction data — data generated through clinical deployments with informed patient consent, used to refine screening accuracy and expand condition coverage.
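The de-identification step described above can be illustrated with a minimal sketch. This is not Scienza Health's actual pipeline; the field names and the (abbreviated) identifier list below are hypothetical examples of the HIPAA Safe Harbor approach, which removes direct identifiers and coarsens quasi-identifiers such as age and ZIP code.

```python
# Illustrative Safe Harbor-style redaction sketch. Field names and the
# identifier set are hypothetical; the real standard enumerates 18
# identifier categories.
SAFE_HARBOR_FIELDS = {
    "name", "street_address", "phone", "email", "ssn",
    "mrn", "account_number", "ip_address", "photo_url",
}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    clean = {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}
    # Safe Harbor aggregates all ages over 89 into a single category.
    if isinstance(clean.get("age"), int) and clean["age"] > 89:
        clean["age"] = "90+"
    # Safe Harbor keeps at most the first three digits of the ZIP code.
    if "zip" in clean:
        clean["zip"] = str(clean["zip"])[:3] + "XX"
    return clean
```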
Data Scale
Our screening models are trained and validated on approximately 12.3 million de-identified patient records, comprising roughly 27 billion clinical data points.
Data Governance
All training data is subject to the following governance controls:
- HIPAA compliance — all patient data is de-identified before use in model training. No protected health information (PHI) is used in training datasets.
- Bias monitoring — models are evaluated across demographic groups including age, sex, race, ethnicity, and primary language to identify and mitigate performance disparities. Quarterly audits are conducted.
- Human-in-the-loop — all screening results require clinician review before any clinical action is taken. Our models assist clinicians; they do not replace clinical judgment.
- FDA device establishment registration — our platform is registered with the FDA and is subject to the regulatory requirements that apply to clinical software.
- AES-256 encryption — all data at rest and in transit is encrypted. Access is controlled through role-based permissions with audit logging.
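The bias-monitoring control above amounts to evaluating model performance separately for each demographic group and flagging the largest gap. The following is a hedged sketch of that idea; the group labels and the disparity calculation are illustrative, not Scienza Health's actual audit procedure.

```python
# Illustrative subgroup audit: per-group accuracy plus the maximum
# disparity between the best- and worst-performing groups.
from collections import defaultdict

def subgroup_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} over parallel label/prediction/group lists."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}

def max_disparity(acc_by_group):
    """Gap between the best and worst subgroup accuracy."""
    values = list(acc_by_group.values())
    return max(values) - min(values)
```

A quarterly audit of this shape would compare `max_disparity` against a predefined tolerance and trigger mitigation when the gap exceeds it.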
Model Accuracy and Validation
Our screening models have been validated against peer-reviewed clinical benchmarks. Published accuracy figures include:
| Condition | Metric | Value |
|---|---|---|
| Depression | Accuracy | 81.6% |
| PTSD | Accuracy | 80.0% |
| Anxiety | Accuracy | 77.5% |
| Parkinson’s Disease | AUC | 0.97 |
| Cognitive Decline | Accuracy | 70.8% |
Source: peer-reviewed clinical validation studies; see our clinical research page for full citations.
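The table reports two metric types: classification accuracy (the share of correct screens) and AUC (the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one). A minimal sketch of both, on synthetic data unrelated to the cited studies:

```python
# Synthetic-data sketch of the two metrics in the table above.
def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Rank-based AUC via the Mann-Whitney U statistic (ties count half)."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.97 means a positive case outranks a negative one about 97% of the time, which is why AUC rather than raw accuracy is often reported for screening models.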
What Our Models Do Not Do
- They do not diagnose. They screen and flag risk for clinician review.
- They do not make treatment decisions.
- They do not operate autonomously without clinician oversight.
- They do not use personally identifiable patient information in training.
Contact
For questions about our training data practices, model governance, or this disclosure, contact us at support@scienzahealth.com with the subject line "AI Transparency Inquiry."
Frequently Asked Questions
Does GIA comply with California AB 2013 AI transparency requirements?
California Assembly Bill 2013 requires developers of generative AI systems to disclose information about training data sources, data governance, and model capabilities. GIA® by Scienza Health publishes a complete AB 2013 disclosure covering all training data categories, clinical validation methodology, accuracy metrics by condition, and explicit statements of what the screening models do not do. The disclosure is updated with each model version and available at scienzahealth.com/ai-transparency.
Does GIA comply with California AB 3030 healthcare AI disclosure requirements?
California Assembly Bill 3030 requires healthcare facilities using AI-generated communications to include clear disclosure that the content was generated or assisted by artificial intelligence. GIA® by Scienza Health includes AB 3030-compliant disclosures on all screening outputs — structured medical notes, biomarker results, and session summaries all carry explicit AI-generation labeling before they reach the clinician for review and submission to the EHR.
What AI training data does Scienza Health use for clinical screening models?
The voice biomarker and speech analysis models powering GIA® by Scienza Health are trained on 12.3 million patient records and 27 billion clinical data points. Training data is sourced from de-identified clinical datasets — no personally identifiable patient information is used in model training. All training data sources, governance practices, and validation methodology are disclosed per California AB 2013 at scienzahealth.com/ai-transparency.
How does Scienza Health disclose AI-generated healthcare communications?
Every screening output generated by GIA® by Scienza Health — structured medical notes, biomarker results, CPT code assignments, and session summaries — carries explicit disclosure that the content was generated by artificial intelligence, in compliance with California AB 3030. The clinician reviews all AI-generated outputs before submission to the EHR. No AI-generated clinical content enters the permanent medical record without human authorization and review.
This disclosure was last updated in April 2026. Scienza Health, Inc. · Newport Beach, California.