KEY LEARNINGS
- The behavior of machine-learning systems is shaped by statistical patterns in their training data, not by explicitly programmed rules.
- The principle of 'Garbage In, Garbage Out' is amplified in AI, where data errors or biases create systemic failures at scale.
- Data quality extends beyond accuracy to include completeness, consistency, timeliness, and representativeness.
- Bias often enters systems through historical data that reflects past societal inequalities rather than malicious engineering.
- Effective data governance requires documenting data provenance and composition using tools like Datasheets for Datasets.
FURTHER READING
- Datasheets for Datasets (Gebru et al.): foundational paper on dataset documentation.
- NIST SP 1270, Managing Bias in AI: NIST guidance on identifying and managing bias in AI systems.
- Google PAIR, Data Collection Guide: practical guidance on preparing data for AI.
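REFERENCES
- Gebru, T., et al. (2021). Datasheets for Datasets. Communications of the ACM.
- Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of the Conference on Fairness, Accountability and Transparency (PMLR 81).
- NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1.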
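The data-quality dimensions named in the key learnings (completeness, consistency, timeliness, representativeness) can be made concrete as simple checks. The sketch below is a minimal illustration under stated assumptions: records are plain Python dicts, and the field names (`age`, `country`, `updated`), the allowed country codes, the 365-day freshness window, and the reference population shares are all hypothetical choices, not part of any standard or library.

```python
from datetime import date

# Hypothetical example records; real data would come from your own dataset.
records = [
    {"age": 34, "country": "US", "updated": date(2024, 5, 1)},
    {"age": None, "country": "US", "updated": date(2021, 1, 15)},
    {"age": 51, "country": "usa", "updated": date(2024, 3, 9)},
    {"age": 27, "country": "NG", "updated": date(2024, 4, 20)},
]


def completeness(rows, fields):
    """Share of required field values that are present (not None)."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0


def consistency(rows, field, allowed):
    """Share of non-missing values in `field` drawn from an agreed vocabulary."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(v in allowed for v in values) / len(values) if values else 1.0


def timeliness(rows, field, today, max_age_days):
    """Share of dated rows whose `field` is at most `max_age_days` days old."""
    dates = [r[field] for r in rows if r.get(field) is not None]
    fresh = sum((today - d).days <= max_age_days for d in dates)
    return fresh / len(dates) if dates else 1.0


def representation_gaps(rows, field, reference):
    """Observed share minus expected share for each group in `reference`.

    Negative values mean a group is under-represented relative to the
    assumed reference population.
    """
    values = [r.get(field) for r in rows]
    n = len(values)
    return {g: (values.count(g) / n if n else 0.0) - share
            for g, share in reference.items()}


today = date(2024, 6, 1)
print("completeness:", round(completeness(records, ["age", "country", "updated"]), 3))
print("consistency:", consistency(records, "country", {"US", "NG", "IN"}))
print("timeliness:", timeliness(records, "updated", today, max_age_days=365))
print("gaps:", representation_gaps(records, "country", {"US": 0.5, "NG": 0.3, "IN": 0.2}))
```

In this toy data the `"usa"` row fails the consistency check and is also not counted toward `US` in the representation gaps, so a formatting problem quietly feeds into a representativeness measurement. The thresholds and reference shares are assumptions; choosing them is itself a governance decision worth recording in a datasheet.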