AI Blog & Insights | Miguel Cabrita

Padlock and folders on a neutral surface, data governance and access control

The Guardian story about “rogue AI agents” is useful for one simple reason: it compresses into a single article a mistake that’s already entering ordinary businesses [1].

I’m not talking about strange labs or science fiction scenarios. I’m talking about teams building internal AI apps and connecting them to CRM, email, proposals, contracts, tickets, shared drives, and operational tools. The pattern’s always similar. The idea starts small, the prototype works, someone asks “can you connect it to everything else while you’re at it,” and suddenly the company has software touching real data without anyone having dealt with the boring part. There’s a name for this now: vibe coding. You describe what you want, the model writes it, it runs on the first try. The problem is that “works” means “does the thing”: it says nothing about what data it touches, what access it requests, or what happens when it connects to something real.

That boring part is what matters. The news calls it “rogue agents.” The actual pain is wide permissions, shared accounts, poorly thought-out connectors, badly managed logs, and personal data dropped into the pipeline because it was convenient.

There’s a lot of AI conversation that treats the problem as somewhere far off, in the models, the labs, the big names. For an SME, the problem is usually much closer. It’s in the internal app someone knocked together in a hurry because they wanted to save 20 minutes a day, and ended up creating a system with access to client, staff, or candidate data.

Where this starts going wrong

A company doesn’t need to build an “autonomous agent” to get into serious trouble. A few lazy choices in a row will do it.

The first one’s obvious. Someone pastes personal data into a tool because it’s faster than structuring the process. CVs, interview notes, client emails, tickets with names and contacts, sales notes, meeting summaries. Nothing breaks immediately. The risk gets in anyway.

The second is giving the app more access than it needs. A folder becomes the whole drive. A pipeline becomes the entire CRM. An assistant that was supposed to suggest text gets the ability to export, write, or send. In Irregular’s tests that fed the Guardian piece, the agents found paths because the paths were there, open [2].

The third is mixing instructions and content without much discipline. When an app reads documents, wiki pages, emails, tickets, or any content retrieved via search, the boundary between data and instructions gets a lot more fragile. OWASP already treats prompt injection as one of the central practical risks in LLM-based systems [5]. That stopped being academic curiosity a while back.

The fourth is spreading connectors everywhere and pretending this is still an innocent prototype. SharePoint, Google Drive, Slack, Microsoft 365, CRM, ERP, ticketing, internal APIs, MCP servers, third-party automations. Each connection opens another door. It also opens another question someone should have asked first: what data can leave through here, who approved this, and who revokes the access when the experiment ends.

The fifth is using a shared technical account and moving on as if auditability were a detail. When there’s an incident, the conversation degrades fast. Who ran the app? What data did it see? What output did it produce? Was there human review? If nobody can answer without guessing, you’re already working backwards.

This is where I see companies trip up. The tool still feels like a prototype, but the regulatory frame already shifted.

As soon as an internal AI app touches emails, CRM, documents, tickets, CVs, HR notes, identifiable outputs, or logs containing personal data, it’s already inside GDPR scope [3][7]. The technology might feel new. The questions are old.

What data is going in?

For what purpose?

Who has access?

How long does everything stay stored?

Which vendor is in the loop?

Is there a contract, administrative visibility, and enough control over retention and downstream use?

The temptation is to treat all of this as technical detail. That tends to be expensive. The EDPB reiterated in its AI opinion that these assessments still happen case by case, and that calling a poorly anonymised system “anonymous” doesn’t fix anything [3]. Worth keeping in mind too: usefulness isn’t a legal basis.

The CNIL hits the same point more operationally. Collecting lots of data because the tool can handle it is a choice with consequences. Keeping prompts and outputs forever because storage is cheap is the same kind of choice [7]. When a company builds an internal app with memory, logs, integrations, and real data, every convenience decision will be visible later.

Employee data deserves extra care. Whenever the app touches assessments, internal notes, metrics, or HR processes, the conversation gets more sensitive. I’ve seen too many people try to solve this with “consent.” It’s a lazy and weak answer.

And NIS2?

NIS2 enters the conversation, but not in the superficial way you already see around.

Not every company building an internal AI app is automatically in scope for NIS2. At the same time, many companies will feel real pressure through contracts, client requirements, tighter supply chains, and minimum controls that stop being optional.

In practice, NIS2 pulls the conversation toward topics these apps also hit quickly: incident management, continuity, supply chain, MFA, access control, evidence, response capability [8]. For some organisations that’ll be a direct obligation. For many others, it’ll be market-imposed discipline.

That was the central idea in my recent piece on NIS2 in Portugal. The topic rarely stays in the legal lane. It ends up in operations. That’s where it hurts.

What I’d do before connecting one of these apps to production

I like fast prototypes. I use them. They’re good for figuring out quickly whether there’s real value. The mistake happens when the prototype picks up connectors, memory, and permissions, and nobody stops to call it a system.

Before connecting something like this to production, I’d want 5 things sorted.

First, a use case with boundaries. Is the app going to summarise, search, suggest, approve, send, or what exactly?

Second, defined data classes. What can go in and what stays out?

Third, its own identity. Dedicated service account, minimum permissions, read/write separation, clean revocation.

Fourth, human review at the points where the cost of getting it wrong spikes: external sends, exports, public links, permission changes, financial approvals, sensitive operational actions.

Fifth, useful logs and limited retention. Enough to investigate. Not an eternal dump of prompts, outputs, and personal data.

None of this is glamorous. It shouldn’t be. Most serious internal AI work today has this texture: operational discipline, not demos.

The slow problem shows up after the demo

That’s what the Guardian piece helps remind you of. The damage rarely starts with one big failure moment. It starts with a sequence of small concessions that each seemed reasonable at the time.

“Give it access to this folder too.”

“Connect it to email so it’s more useful.”

“Use this shared technical account, we’ll sort it later.”

“Save everything for now, we’ll review.”

“It’s just internal.”

I’ve seen this story too many times, in different shapes, to treat it as a detail. When the app touches live systems and real data, it stopped being a weekend experiment.

If your company is already testing internal assistants, copilots, or AI automations connected to CRM, email, documents, or other operational tools, a short governance and implementation review is worth doing first. What data goes in, what access exists, which tools make sense, what stays sandboxed, and what controls need to be in place before rollout.

That’s exactly the kind of work I do in my services.

Fast AI prototypes, slow compliance problems: when internal apps become a GDPR risk

Where this starts going wrong

And NIS2?

What I’d do before connecting one of these apps to production

The slow problem shows up after the demo

Sources

Fast AI prototypes, slow compliance problems: when internal apps become GDPR risk

Where this starts going wrong

GDPR enters earlier than most people expect

And NIS2?

What I’d do before connecting one of these apps to production

The slow problem shows up after the demo

Sources