
Data Governance for AI Systems in KSA: A Legal Perspective

PeopleSafetyLab | March 10, 2026 | 7 min read

The patient data sat in the model's training set for eighteen months before anyone noticed.

A Riyadh hospital had deployed an AI system to predict patient readmissions—a smart use of machine learning to improve care and reduce costs. But somewhere between the data scientists and the compliance team, a critical question got lost: Did patients consent to having their medical histories fed into an algorithm that would learn their patterns forever?

By the time the oversight surfaced, the model had already processed 47,000 patient records. The hospital faced a choice most Saudi organizations will confront in the coming years: retroactive consent, model deletion, or something in between—and the regulatory framework that should have guided the decision wasn't built for this moment.

This is the governance gap AI creates. Saudi Arabia's Personal Data Protection Law (PDPL) and National Data Management Office (NDMO) regulations were written for databases, not neural networks. But the kingdom's AI ambitions—Vision 2030's smart cities, NEOM's cognitive infrastructure, the healthcare sector's diagnostic algorithms—demand we think differently about data governance today.


The Regulatory Moment: Why Now

Saudi Arabia's data protection landscape transformed in September 2023 when the PDPL implementing regulations took effect. Organizations had one year to comply. That deadline passed in September 2024, yet many companies still treat AI systems as exempt from the rules governing traditional data processing.

They aren't.

The PDPL's definition of "processing" encompasses any operation performed on personal data—collection, storage, use, disclosure, and derivation. That last word matters. When an AI model learns from data and generates new inferences about individuals, it's deriving information. Under Saudi law, derived data is still personal data if it relates to an identified or identifiable individual.

The NDMO's Data Classification Policy adds another layer. It requires organizations to classify data by sensitivity level and apply appropriate controls. AI systems don't change this obligation—they complicate it. A model trained on internal data might produce outputs that deserve a higher classification than any single input.

Recent enforcement signals suggest regulators are watching. In late 2025, the NDMO issued guidance specifically addressing AI data practices, emphasizing that automated decision-making systems require the same consent foundations as human-driven processes. The message was clear: AI doesn't create regulatory shortcuts. It creates regulatory complexity.


The AI Data Paradox: Minimization vs. Intelligence

Traditional data protection rests on the principle of data minimization—collect only what you need, retain it only as long as necessary, delete it when you're done. It's a principle that makes intuitive sense for transactional systems. You don't need a customer's entire purchase history to process a single return.

But AI systems are hungry. They improve with more data, more variety, more historical depth. A fraud detection model needs years of transaction patterns to recognize subtle anomalies. A natural language processing system needs millions of sentences to understand Arabic dialects. Data minimization, applied rigidly, can make AI systems worse at their jobs—or render them impossible to build.

This tension creates a governance challenge unique to AI:

Training Data Accumulation. Unlike traditional databases where you query and move on, AI training sets persist. The data used to train a model in 2024 might still influence predictions in 2029. Retention policies designed for operational databases don't account for this permanence.

Inference vs. Collection. An AI system might collect seemingly innocuous data points—mouse movements, typing patterns, response times—and infer protected characteristics: age, disability, emotional state. The PDPL governs personal data, not the method by which it was obtained. Inferred data is still personal data.

Model Memory. Large language models and similar systems can memorize training examples. Researchers have demonstrated extracting sensitive information from models by crafting specific prompts. A governance framework that treats models as separate from training data misses this leakage risk.

Purpose Drift. An organization might collect customer service transcripts for quality assurance, then realize the same data could train a chatbot. New purpose, same data—but does the original consent cover the new use? The PDPL requires purpose limitation, and AI systems make purpose boundaries porous.


Governance Steps That Actually Work

The organizations that navigate this landscape successfully share a pattern: they build governance into AI development, not around it. Here's what that looks like in practice:

1. Map Data Flows Before Model Selection

Before choosing an AI architecture, map every data source the system will touch. Document:

  • What data exists and where it originates
  • What consent or legal basis supports each use
  • How long each data type should be retained
  • Who can access it at each stage (collection, preprocessing, training, inference)

This mapping often reveals surprises. A seemingly straightforward use case—"analyze customer feedback for sentiment"—might involve audio recordings, transcripts, metadata about when and where feedback was given, and linked account information. Each layer has different consent implications.
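
One way to keep that mapping from drifting out of date is to hold it as structured records rather than a one-off spreadsheet. The sketch below is a minimal illustration in Python; the field names and the example feedback sources are hypothetical, not terms taken from the PDPL or NDMO guidance.

```python
from dataclasses import dataclass, field

# Minimal data-flow inventory. Field names and example entries are
# illustrative; adapt them to your own classification and consent records.
@dataclass
class DataSource:
    name: str                 # what the data is
    origin: str               # where it comes from
    legal_basis: str          # consent, contract, legal obligation, ...
    retention_days: int       # how long it may be kept
    access: dict = field(default_factory=dict)  # stage -> roles allowed

sources = [
    DataSource("feedback_audio", "customer service calls", "consent",
               retention_days=365,
               access={"collection": ["contact-center"],
                       "training": ["ml-team"]}),
    DataSource("feedback_transcripts", "derived from feedback_audio", "",
               retention_days=365,
               access={"training": ["ml-team"],
                       "inference": ["chatbot-service"]}),
]

# Surface gaps before a model is chosen: any source without a documented
# legal basis is a question for compliance, not for the data scientists.
for source in sources:
    if not source.legal_basis:
        print(f"No legal basis recorded: {source.name} (origin: {source.origin})")
```

Even this small amount of structure turns "do we have a legal basis for this?" into a question the pipeline can ask automatically, before a single model is trained.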

2. Implement Purpose Boundaries in Code

Policy documents say "data will only be used for X." Technical controls ensure that's actually true. This means:

  • Access controls that prevent training pipelines from pulling data beyond their scope
  • Audit logs that track which datasets feed which models
  • Automated alerts when a model attempts to access data outside its purpose

The NDMO's Data Governance Framework requires organizations to demonstrate compliance, not just claim it. Technical controls provide that demonstration.
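
What such a control looks like depends on the stack, but the core idea fits in a few lines. The sketch below assumes datasets are registered with the purposes they may serve and that pipelines request data through a single gate; the registry, dataset names, and pipeline names are all illustrative.

```python
import logging

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("data-access-audit")

# Illustrative registry: each dataset is tagged with the purposes it may serve.
DATASET_PURPOSES = {
    "support_transcripts": {"quality_assurance"},
    "transaction_history": {"fraud_detection"},
}

def load_for_purpose(dataset: str, purpose: str, pipeline: str):
    """Single gate through which pipelines request training data.

    Every request is logged, and requests outside the dataset's registered
    purpose are refused rather than silently served.
    """
    allowed = DATASET_PURPOSES.get(dataset, set())
    audit.info("pipeline=%s dataset=%s purpose=%s allowed=%s",
               pipeline, dataset, purpose, purpose in allowed)
    if purpose not in allowed:
        raise PermissionError(f"{dataset} is not approved for purpose '{purpose}'")
    return f"<{dataset} records>"  # actual loading would happen here

# A chatbot-training pipeline asking for QA transcripts trips the boundary:
try:
    load_for_purpose("support_transcripts", "chatbot_training", "chatbot-train-v1")
except PermissionError as exc:
    audit.warning("blocked: %s", exc)
```

The audit log is the part that matters most here: it is evidence that the boundary exists, not just a policy that says it should.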

3. Design for Deletion

The PDPL grants individuals the right to request deletion of their personal data. AI systems complicate this right. If a customer asks you to delete their data, you can remove it from your operational database. But what about the model trained on that data?

There's no perfect solution yet, but responsible approaches include:

  • Maintaining training data inventories so you know which models contain which individuals' data
  • Implementing machine unlearning techniques where feasible (an emerging field, but progressing)
  • Building model retraining schedules that allow periodic "forgetting" of old data
  • Documenting the deletion process honestly—telling individuals what you can and cannot remove
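
Of these, the training data inventory is the piece most organizations can build today. Here is a minimal sketch, assuming each training run records which data subjects' records it ingested; the model names and identifiers are hypothetical.

```python
from datetime import date

# Illustrative inventory: each trained model version records which data
# subjects' records were in its training set, and when it was trained.
TRAINING_INVENTORY = {
    "readmission-model-v3": {"trained": date(2025, 1, 15),
                             "subjects": {"P-1042", "P-2210", "P-7781"}},
    "readmission-model-v4": {"trained": date(2025, 9, 1),
                             "subjects": {"P-2210", "P-9903"}},
}

def handle_deletion_request(subject_id: str) -> dict:
    """List every model that learned from this person's records, so retraining
    or unlearning can be scheduled and the reply to the individual can be
    honest about what can and cannot be removed."""
    affected = [model for model, info in TRAINING_INVENTORY.items()
                if subject_id in info["subjects"]]
    return {
        "subject": subject_id,
        "operational_data": "delete immediately",
        "models_to_retrain_or_unlearn": affected,
    }

print(handle_deletion_request("P-2210"))
```

With that record in hand, the response to a deletion request writes itself: operational data goes immediately, and each affected model gets a retraining or unlearning date rather than a vague promise.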

4. Conduct Algorithmic Impact Assessments

Before deploying an AI system that processes personal data, assess:

  • What decisions will the system influence or automate?
  • Could those decisions harm individuals if the system is biased or wrong?
  • What data does the system need, and is that data representative?
  • How will individuals know AI was involved in decisions about them?

Saudi regulators haven't formally mandated AI impact assessments, but the PDPL's accountability principle suggests they're coming. Organizations that adopt the practice now will be ahead.
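
There is no official template to copy yet, so the structure of the assessment is up to you. One possible shape, with every field name an assumption rather than a regulatory requirement, is a record that answers the questions above and can be versioned alongside the model:

```python
from dataclasses import dataclass, asdict
import json

# One possible shape for an impact assessment record; not an official template.
@dataclass
class AlgorithmicImpactAssessment:
    system: str
    decisions_influenced: list       # what the system decides or informs
    potential_harms: list            # harm if the system is biased or wrong
    data_needed: list
    representativeness_notes: str
    disclosure_to_individuals: str   # how people learn AI was involved
    requires_human_review: bool

assessment = AlgorithmicImpactAssessment(
    system="readmission-predictor",
    decisions_influenced=["discharge planning", "follow-up scheduling"],
    potential_harms=["under-served patient groups deprioritized"],
    data_needed=["diagnosis codes", "admission history"],
    representativeness_notes="skews toward large urban hospitals",
    disclosure_to_individuals="notice in discharge paperwork",
    requires_human_review=True,
)

# Store the record alongside the model artifact so it is versioned with it.
print(json.dumps(asdict(assessment), indent=2, ensure_ascii=False))
```

Keeping the record next to the model artifact means the assessment gets reviewed whenever the model does, not written once and forgotten.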

5. Build Explainability Into Requirements

The PDPL doesn't explicitly require algorithmic explainability, but it does require that data subjects understand how their data is used. An AI system that makes unexplainable decisions about people creates compliance risk.

This doesn't mean every AI must be fully interpretable—some complex models resist explanation. But it does mean:

  • Documenting model logic and decision factors
  • Providing meaningful information to affected individuals
  • Building human review processes for high-stakes decisions
  • Avoiding "black box" systems where no one can articulate why a decision was made
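
In practice, much of this comes down to recording, for each consequential decision, what the model saw and which factors mattered, in terms a reviewer can read. A minimal sketch, assuming the model exposes per-feature contributions (for example SHAP values or simple coefficients); the factor names and threshold are illustrative:

```python
def decision_record(subject_id: str, score: float,
                    contributions: dict, review_threshold: float = 0.7) -> dict:
    """Capture why a decision was made, in terms a reviewer can read.

    `contributions` maps human-readable factors to their signed influence on
    the score; the attribution method (SHAP, coefficients, rules) is an
    implementation choice, not prescribed here.
    """
    top_factors = sorted(contributions.items(),
                         key=lambda kv: abs(kv[1]), reverse=True)[:3]
    return {
        "subject": subject_id,
        "score": score,
        "top_factors": [f"{name} ({weight:+.2f})" for name, weight in top_factors],
        "needs_human_review": score >= review_threshold,  # high stakes -> human
    }

print(decision_record(
    "P-1042", 0.82,
    {"prior admissions (12 months)": +0.31,
     "medication adherence": -0.12,
     "age band": +0.05},
))
```

A record like this gives the human reviewer something to review, and gives the organization an answer when a data subject asks how the decision was reached.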

The Governance Dividend

Organizations that treat AI data governance as a constraint—a box to check, a risk to mitigate—miss something important. Good governance is also good AI.

Models trained on well-governed data are more reliable. They generalize better because they're not memorizing data they shouldn't have. They're more maintainable because you know what went into them. They're more trustworthy because you can explain their behavior to stakeholders, regulators, and the people they affect.

The Saudi organizations that will thrive in the AI era aren't those that move fastest and ask for forgiveness. They're the ones that build governance into their systems from the first design document—treating compliance not as overhead but as infrastructure.

The patient data at that Riyadh hospital? The hospital chose a middle path: they retrained the model without the problematic data, implemented consent for future AI use, and documented everything. The model's predictions got slightly worse. The organization's risk posture got dramatically better.

That trade-off—slightly worse predictions, dramatically better governance—is one Saudi organizations will face repeatedly. The ones that make it consciously, with clear eyes about both sides, will be the ones that build AI systems worth trusting.


PeopleSafetyLab researches AI governance, safety, and compliance for organizations building in the Kingdom of Saudi Arabia. We help companies navigate the gap between AI ambition and regulatory reality.


PeopleSafetyLab

Expert in AI Safety and Governance at PeopleSafetyLab. Dedicated to building practical frameworks that protect organizations and families, ensuring ethical AI deployment aligned with KSA and international standards.
