
Data Security and Stewardship at Baza in the Age of AI

To understand how we think about data security at Baza, you need to understand the rapidly evolving world we are building in.

Three years ago, most of us were using LLMs to generate poems and rap lyrics for amusement. Today, a man in Australia used ChatGPT to design a custom vaccine for his dying dog, and AI agents can work autonomously for days to build full websites and apps. The pace of change has been staggering.

On the other hand, those of us who have been in the technology industry for a while know that any technology advancement that goes mainstream should be met with a healthy dose of skepticism (remember NFTs?). While LLMs open up opportunities we could only dream of, they also carry real risk: at a high level, for the safety of human civilization, and at an individual level, for the safe handling of our data and digital identities.

At Baza, we understand the need for a delicate balance between taking advantage of LLMs where they make sense in the software we build and staying firm on the principles of good data engineering. Keeping up with the AI-driven changes in the tech industry feels like drinking from a firehose, but we are committed to doing it so that Baza as a platform stands the test of time.

Why Data Security Is Non-Negotiable for Analytics Platforms Like Baza

Baza is an African-built, AI-native data intelligence platform for African financial institutions. Our target customers collectively manage financial data for over a billion people across the continent, and that number is growing. When we integrate with these institutions, we become part of that trust network. For that reason, a strong data security practice is table stakes and non-negotiable.

At Baza, we don’t think of security as a checklist or a certification to achieve, but as a constantly evolving practice that is foundational to software and data engineering. It has to be, since AI has exponentially increased the attack surface that bad actors can exploit, from software supply chain attacks to prompt injections. What follows is a look at the specific security practices we have in place at Baza and the thinking behind them.

Why Data Stewardship In Addition To Security?

Security protects what is inside our infrastructure, but stewardship goes further: it means we care about what happens to your data even when it’s outside our direct control. Baza exists as a node in a connected graph of customers, software vendors, and cloud infrastructure providers. A supply chain attack on any part of that graph can ripple through all of us quickly. That is why we don’t just lock down our systems; we actively think about how data flows through the entire network and where the weak points are.

Security Practice 1: Multi-Tenancy Approach and Defense-in-Depth

This security practice is industry-standard among well-architected software services; it is not specific to Baza.

Defense-In-Depth is a philosophy that no single layer of security should be trusted alone.

So just because you entered a bank, for instance, doesn’t mean you now have access to the vault. Defense-in-depth for software is sophisticated and challenging to implement well, and that is exactly why we invest in doing it at Baza.

Three-Layer Security Model

Software As A Service is inherently multi-tenant - you can think of it as a commercial building where you are renting an entire level or a specific commercial space next to other businesses. Getting access to the building does not grant you access to a specific level.

Layer 1: Database Isolation With Row-Level Security (RLS):

Row-Level Security (RLS) is a database-level mechanism that scopes every single query to the authenticated tenant, meaning that even if you manage to log in successfully, you must pass our RLS check before you can access any sensitive data.
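As a sketch, here is what tenant scoping with Postgres-style RLS can look like. The table, column, and session-setting names (`transactions`, `tenant_id`, `app.tenant_id`) are illustrative assumptions, not Baza’s actual schema:

```python
# Hypothetical RLS setup: once the policy below is in place, the
# database itself filters every row to the current tenant, even if
# the application-side query forgets a WHERE clause.
CREATE_POLICY = """
ALTER TABLE transactions ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON transactions
    USING (tenant_id = current_setting('app.tenant_id'));
"""

def set_tenant_context(tenant_id: str) -> str:
    """Return the statement that binds the database session to one
    tenant; all subsequent SELECTs are scoped to that tenant's rows."""
    return f"SET app.tenant_id = '{tenant_id}'"
```

The defense-in-depth benefit is that even a bug in application-level filtering cannot leak another tenant’s rows, because the database refuses to return them.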

Layer 2: Application Isolation with workspaces and roles:

Every tenant also has application-level isolation via workspaces. This is where, for example, you can separate the finance team from engineering if they use Baza for different purposes. We also provide three default roles for each workspace: Admin, Editor, and Viewer. We stress-test these boundaries regularly to make sure data never leaks across them.

Layer 3: Fail-Secure Design:

When in doubt, we deny access. Any request that is missing a role, workspace, tenant identifier, or required metadata is rejected outright. We never fall back to a default role or grant partial access when context is incomplete.
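A fail-secure check can be sketched in a few lines. The role names follow the article; the context structure itself is an illustrative assumption:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestContext:
    tenant_id: Optional[str]
    workspace_id: Optional[str]
    role: Optional[str]

ALLOWED_ROLES = {"Admin", "Editor", "Viewer"}

def authorize(ctx: RequestContext) -> bool:
    # Deny outright if any required field is missing: never fall back
    # to a default role or grant partial access on incomplete context.
    if not (ctx.tenant_id and ctx.workspace_id and ctx.role):
        return False
    return ctx.role in ALLOWED_ROLES
```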


Security Practice 2: Handling Data Exchange with AI Models

This is where things get very specific with the Baza Platform. Building an agentic SaaS platform like Baza means there is a lot of data exchange that happens between us and LLM Providers like Anthropic, Google, and OpenAI. Here’s how we do it:

A Quick Note: AI Agents vs. AI Models

Before diving into the details, it helps to clarify two terms we use throughout this article:

  • AI models (LLMs) are the large language models provided by companies like Anthropic, Google, and OpenAI. They process text, reason about data, and generate responses. On their own, they have no access to your data.
  • Baza AI agents are software that we build at Baza that sit between you and these models. Our agents decide which analytics tools to use, which data to retrieve, and how to frame requests to the models. They apply your organization’s security policies and access controls before any data is sent externally.

What Happens Before We Make Requests to LLMs

PII and Sensitive Data Obfuscation Before Sending Requests to LLMs

We do not send raw customer data to LLMs. Our agents have access to purpose-built analytics tools (functions designed for specific tasks like aggregation, trend analysis, and anomaly detection) to call when doing computations. We cannot avoid sending data to LLMs entirely, because that would defeat the purpose of using them, but what we send is summarized, aggregated, and masked where necessary.

  • Sensitive fields (such as emails, phone numbers, national IDs, and payment card numbers) are detected and masked before data leaves Baza’s infrastructure. Where masking isn’t enough, we redact, but we are intentional about where we apply redaction because over-redacting can make AI agents less useful. For example, if an agent needs to search your inbox by sender name and that name gets redacted, the search returns nothing. We protect what needs to be protected without breaking the task the agent is trying to perform.
  • We developed internal algorithms to provide region-aware sensitive data masking for African markets. We are actively extending coverage across the continent.
  • We go beyond simple pattern matching, drawing inspiration from established validation techniques.
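As a deliberately simplified illustration of masking (our production detection is region-aware and goes well beyond pattern matching), the idea looks like this:

```python
import re

# Toy patterns for two sensitive-field types. These regexes are
# simplified assumptions for illustration, not Baza's production
# detection rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def mask_sensitive(text: str) -> str:
    """Replace detected sensitive values with labeled placeholders
    before any text leaves our infrastructure."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_MASKED]", text)
    return text
```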

What Happens When We Send Requests to LLMs

At Baza, we route all communication with AI providers through an AI Gateway. This single layer lets us switch between models, enforce policies, and manage authentication without rewriting application code.

An AI gateway is a centralized layer that sits between your application and multiple AI model providers. It handles routing, authentication, rate limiting, and policy enforcement so that your application code doesn’t have to manage each provider individually.
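A minimal sketch of the routing idea, with illustrative provider endpoints and a policy hook standing in for masking, rate limiting, and other enforcement:

```python
# Hypothetical gateway routing table; endpoints shown for
# illustration only.
PROVIDERS = {
    "anthropic": "https://api.anthropic.com/v1/messages",
    "openai": "https://api.openai.com/v1/chat/completions",
}

def route_request(model: str, payload: dict, policies: list) -> dict:
    """Resolve a provider from a namespaced model id (e.g.
    'anthropic/claude-...'), apply each policy to the payload, and
    return the routed request."""
    provider = model.split("/", 1)[0]
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    for policy in policies:  # e.g. PII masking, rate limiting
        payload = policy(payload)
    return {"url": PROVIDERS[provider], "payload": payload}
```

Because every call goes through one function like this, swapping providers or tightening a policy is a single change rather than a rewrite of application code.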

How Data Flows Between Baza and AI Providers

In addition, an AI Gateway also provides specific configurations that we use to protect the privacy of our customer data such as:

Zero Data Retention (ZDR)

When you use an LLM through its consumer interface (chat.openai.com, claude.ai, etc.), your data may be used to train future models. Business API access is different. We only use AI models that provide Zero Data Retention via API, meaning providers are contractually obligated to process and then discard your data; it can never be used for model training.

US- and Europe-Hosted Open-Source LLMs

Proprietary models from Google, Anthropic, and OpenAI used to dominate the frontier. However, newer open-source models are catching up quickly and at a fraction of the price. We want to pass those cost savings to our customers while maintaining strong data protection. Because these models are open source, many are hosted by inference providers in the US and EU, jurisdictions with established data protection frameworks and regulatory oversight. Open-source models also open the door for Baza to run fine-tuned models on our own infrastructure, further reducing how much data ever leaves our systems. We are actively working on this and will share more information soon.

Security Practice 3: Handling Arbitrary Code and Generated SQL When Our Agent Custom Tools Are Not Enough

The reality is that most AI agents look great in demos but fall apart in the real world. Even the best frontier LLMs as of March 2026 generate correct analytical SQL only about 4 times out of 10, according to some industry benchmarks. On top of that, AI-generated SQL carries real security risks: injection attacks, prompt manipulation, and the possibility of destructive queries running against your database. Text-to-SQL (generating queries from natural language) is a powerful equalizer for data analytics, but we don’t pretend the risks aren’t there.

Our purpose-built tools cover about 80% of use cases; when they can’t, we allow AI agents to generate custom code or SQL. So what exactly do we do to protect our customers’ data?

Code Sandbox

This is a new and emerging practice that is unlocking AI Agent abilities. The idea is that if you give an AI agent a safe and self-contained sandbox (outside of your application) it can generate code to perform a task without you having to write custom tools (the purpose-built analytics functions mentioned earlier). We still like our custom tools because Baza works on clearly defined scopes (unlike general programming for example), but since we can’t possibly cover every data use case, a sandbox is still necessary.
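As a much-simplified illustration of the isolation idea (a real sandbox also restricts network access, the filesystem, and resource usage), agent-generated code can be run in a separate process with a hard timeout:

```python
import subprocess
import sys
import tempfile
import textwrap

def run_in_sandbox(code: str, timeout: float = 5.0) -> str:
    """Execute generated Python in an isolated child process and
    return its stdout. Illustrative only: production sandboxing layers
    on network, filesystem, and resource isolation."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(textwrap.dedent(code))
        path = f.name
    result = subprocess.run(
        [sys.executable, "-I", path],  # -I: Python isolated mode
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```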

Multi-Layer Read-Only Enforcement For SQL

We only allow read-only database connections. This is intentionally limiting: we want to give our agents the least permissions needed, and we will re-evaluate as this segment of AI-generated code matures.

Query Engine Level

We enforce read-only mode at the query engine level as an additional gate, so even if an agent generates a write query, it fails here.

Validation Layer

We parse and inspect every query, discarding dangerous functions and write/modify operations before they ever reach the database. We also auto-inject guardrails into every query, such as timeouts, LIMIT clauses, and tenant-, workspace-, and user-scoping rules, and we automatically prevent bulk data exfiltration.
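A toy version of this validation layer might look like the following. A production validator parses the full query AST rather than matching keywords, and also injects tenant and workspace scoping; this sketch only shows rejection plus a LIMIT guardrail:

```python
import re

# Simplified deny-list of write/DDL keywords; illustrative only.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|copy)\b", re.I
)

def validate_and_guard(sql: str, row_limit: int = 10_000) -> str:
    """Reject write/DDL statements outright, then append a LIMIT
    clause to cap bulk data exfiltration."""
    if FORBIDDEN.search(sql):
        raise PermissionError("write or DDL operation rejected")
    guarded = sql.rstrip().rstrip(";")
    if not re.search(r"\blimit\b", guarded, re.I):
        guarded += f" LIMIT {row_limit}"
    return guarded
```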

Audit Trail And Agent Memory

Every query execution is logged and reviewed after use. We implement our agents so that they continuously get better at generating safe, accurate queries and automatically reject any request that is suspicious or insecure.

Security Practice 4: Handling External Source Integrations

In addition to databases, we support external integrations across cloud storage, email, and productivity platforms through a trusted and secure industry vendor. Baza never handles your external integration credentials, and you can grant us as much or as little access as you wish.

For database credentials:

  • We encrypt credentials in transit and at rest using industry-standard cryptography with versioned key envelopes.
  • We do not store your database credentials in our application databases. We use dedicated secret management infrastructure.

For files in your data vault:

  • We use short-lived scoped access with time-limited signed URLs scoped to individual file objects and don’t allow persistent broad permissions.
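The idea behind time-limited, object-scoped signed URLs can be sketched with a generic HMAC scheme. The signing key, host, and parameter names below are assumptions for illustration, not our actual implementation:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"demo-signing-key"  # illustrative; real keys live in a KMS

def sign_url(object_path: str, ttl_seconds: int = 300) -> str:
    """Issue a URL valid only for one file object and a short window."""
    expires = int(time.time()) + ttl_seconds
    msg = f"{object_path}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "sig": sig})
    return f"https://files.example.com{object_path}?{query}"

def verify(object_path: str, expires: int, sig: str) -> bool:
    """Reject expired links and signatures for any other object."""
    if time.time() > expires:
        return False
    msg = f"{object_path}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the signature binds the path and expiry together, the same URL cannot be reused for a different file or after the window closes.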

Security Practice 5: Role-Based and Resource-Based Access Control

We practice a combination of role-based and resource-based access control.

  • Hierarchical Role-Based Access Control (RBAC): each tenant has a SuperAdmin role and each workspace has Viewer, Editor, and Admin roles that are precisely scoped to perform specific actions.
  • Resource-Based Access Control: Access for each resource (files, reports, dashboards, integrations, settings etc) is validated at the database level before being returned.
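Combining the two models, a simplified authorization check might look like this. The role names come from above; the permission table and resource shape are illustrative:

```python
# Hypothetical role-to-action mapping for a workspace.
ROLE_PERMISSIONS = {
    "Viewer": {"read"},
    "Editor": {"read", "write"},
    "Admin": {"read", "write", "manage"},
}

def can_access(role: str, action: str, resource: dict, workspace_id: str) -> bool:
    # RBAC check: the role must grant this action...
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    # ...and resource-based check: the resource must belong to the
    # caller's workspace (validated at the database level in practice).
    return resource.get("workspace_id") == workspace_id
```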

Security Practice 6: Observability and Monitoring

All the security practices above are only as good as our ability to know when something goes wrong. You can build the strongest walls, but if you can’t tell when someone is trying to climb them, they don’t matter much.

  • Security event tracking: we track tenant access blocks, workspace access denials, and anomalous patterns using structured event logging. We are constantly monitoring for unusual activity across the platform and adding new security improvements where it makes sense.
  • Automated Alerts: We set up automated alerts on security-relevant events so that our team can respond before our users notice anything is off.
  • Transparent Disclosure: If there is ever a breach, we are committed to transparent and timely disclosure. We will work with affected customers to understand what happened, what data was involved, and what we are doing to fix it.

When we find a security exposure risk, we prioritize fixing it over feature development.

Conclusion And Our Commitment Going Forward

With this article, we hope you gained some visibility into how the team at Baza thinks about security and how we implement these practices to stay at the cutting edge. We want to emphasize once again that security is a continuous practice, never one-and-done. We will continue to review our data handling as regulations evolve across African markets.

We welcome feedback and comments from both our customers and the broader data and security community on where we can improve and how we can serve the entire ecosystem better going forward.

If you’re a financial institution looking for an analytics platform you can trust with your data, apply for early access or reach out directly at info@usebaza.com. We welcome the hard questions.