KEY LEARNINGS
  • The behavior of a machine learning system is shaped mainly by the statistical patterns in its training data, not by explicitly programmed rules.
  • The principle of 'Garbage In, Garbage Out' is amplified in AI, where data errors or biases create systemic failures at scale.
  • Data quality extends beyond accuracy to include completeness, consistency, timeliness, and representativeness.
  • Bias often enters systems not through malicious engineering but through historical data that reflects past societal inequalities.
  • Effective data governance requires documenting data provenance and composition using tools like Datasheets for Datasets (Gebru et al., 2021).
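
Two of the quality dimensions listed above, completeness and representativeness, can be measured directly. The following is a minimal sketch using only the Python standard library, not a definitive method. The function names, the record layout, the `age` and `group` fields, and the reference shares are all hypothetical and chosen purely for illustration.

```python
from collections import Counter


def completeness(records, fields):
    """Return, per field, the fraction of records with a non-missing value."""
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in fields
    }


def representation_gap(records, group_field, reference_shares):
    """Return, per group, the observed share minus the expected share.

    A positive value means the group is over-represented relative to the
    reference population; a negative value means it is under-represented.
    """
    if not records:
        raise ValueError("records must not be empty")
    total = len(records)
    counts = Counter(r.get(group_field) for r in records)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical toy dataset: one missing age, and group B under-sampled.
records = [
    {"age": 34, "group": "A"},
    {"age": None, "group": "A"},
    {"age": 51, "group": "A"},
    {"age": 29, "group": "B"},
]

# Assumed reference population: groups A and B are equally common.
report = completeness(records, ["age", "group"])
gaps = representation_gap(records, "group", {"A": 0.5, "B": 0.5})

print(report)
print(gaps)
```

Numbers like these are the kind of composition facts a datasheet would record. Which gaps are acceptable depends on the application and cannot be decided by the code alone.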

REFERENCES
  • Gebru, T., et al. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86-92.
  • Buolamwini, J., & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of Machine Learning Research, 81, 77-91.
  • NIST. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1.