Trusted AI: the important role of guardian agents
AI agents have the power to transform enterprise workflows, but without proper safeguards they can easily go off course. In this blog, we explore the growing need for Guardian Agents like Vectara’s Hallucination Correction Agent.
5-minute read time
One should absolutely be excited about a future where Agents help elevate workflows, the speed and convenience of business, and partnerships to new levels of productivity and collaboration. We can already see the beginnings of transformative impact across industries, but the journey is not as straight a path as some might think. There is gold to be won here, but the road will be full of ditches and pitfalls if the transition is not done right.
Perhaps it is more fair to say that one should be cautiously optimistic about the path to agentic transformation, as it will be a journey and not an overnight accomplishment. We will, without a doubt, have to embrace it to stay relevant in the future, but the key is to embrace it in the right way, with the right partners, and step by step.
So what are some of the pitfalls? Let’s first tackle the limitations that come from missing context and a lack of common sense. The fact remains that AI does not understand our intentions. It processes what you tell it in natural language and mimics human-like communication in doing so, and it will do exactly what you tell it to do. But here is the thing: AI operates within some very non-human constraints. It acts only according to the data it was trained on, producing what is most likely given the internet-scale training set or the data you have provided yourself. AI has no other context or understanding of the world unless you supply it. Under these conditions, AI will indeed do exactly what you tell it to do, but the result may come out very wrong if you are not careful and specific enough. The output is a mathematical optimum, with no human EQ or common sense involved.
Let’s use an example: you have termites infesting your home, you want to get rid of them, and you want to hire the lowest-cost service that will remove them. You assign an AI Agent system to accomplish this task on your behalf. The Agent system finds a service that eliminates termites and is very cheap, and it books the service for you. Two days later you come home to a burned-down house and a $50 bill. The service included burning down your house. It did, however, complete the task you gave it: it took care of the termites, and it was cheap. The goals were clear and the mission accomplished, under the context and constraints given to the Agent system. Yet it went horribly wrong. The Agent did not understand the intent and did not have enough context.
Sometimes agents do the right thing even in ambiguous circumstances, but there is no guarantee they won’t do something completely different if you don’t enhance or constrain the context. And sometimes an agent accomplishes the right goal, but for the wrong reasons. Victoria Krakovna maintains a well-known list of specification gaming examples, which has mostly been referenced in the context of AI training. Awareness of this type of shortcoming (or shortcutting?) will become 10x more important in an agentic context.
The discussion of Agentic Guardrails is growing in volume and importance as the Agentic wave takes form. Given the dynamic nature of these future agentic workflows, rules-based guardrails are an approach from the past: rules are fragile and easily circumvented in this new world. To make Agentic real in the enterprise and allow Agents to perform transactions on behalf of humans or departments, it is very clear to me that there will need to be a whole span of “helper modules”. What are helper modules? I’m not sure the category has been given a name yet. Some refer to them as critics, but that is only one sub-category. Others are producing reasoning models, which are another way to “help” accuracy. But why not also add EQ models? Validation models? Sanity-checking models? Common sense models? Call them what you want, but these Guardian Agents must be part of your productionization plan if you are indeed on a mission to make Agentic happen to its full extent (and not just human-assisted), and thereby help transform your business.
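To make the pattern concrete, here is a minimal sketch (in Python) of what a Guardian Agent can look like in an agentic workflow: a worker agent proposes an action, and a separate guardian reviews it against explicit constraints before anything is executed. The class names, the budget and irreversibility checks, and the `review` method are illustrative assumptions for this post, not a specific product API.

```python
from dataclasses import dataclass, field

@dataclass
class ProposedAction:
    """An action a worker agent wants to take on the user's behalf."""
    description: str
    cost_usd: float
    irreversible: bool

@dataclass
class Verdict:
    approved: bool
    reasons: list = field(default_factory=list)

class GuardianAgent:
    """A 'helper module' that reviews proposals before they are executed."""

    def __init__(self, max_cost_usd: float, allow_irreversible: bool = False):
        self.max_cost_usd = max_cost_usd
        self.allow_irreversible = allow_irreversible

    def review(self, action: ProposedAction) -> Verdict:
        reasons = []
        if action.cost_usd > self.max_cost_usd:
            reasons.append(f"cost ${action.cost_usd:.2f} exceeds budget ${self.max_cost_usd:.2f}")
        if action.irreversible and not self.allow_irreversible:
            reasons.append("irreversible actions require human approval")
        return Verdict(approved=not reasons, reasons=reasons)

# The worker agent found the cheapest way to "take care of the termites".
proposal = ProposedAction(
    description="Eliminate termites by burning down the house",
    cost_usd=50.0,
    irreversible=True,
)

guardian = GuardianAgent(max_cost_usd=500.0)
verdict = guardian.review(proposal)
if verdict.approved:
    print("Executing:", proposal.description)
else:
    print("Blocked:", "; ".join(verdict.reasons))
```

In a real deployment the guardian would itself be a model reasoning over the proposal, its context, and the user’s intent rather than a handful of hard-coded checks, but the contract stays the same: nothing is executed until an independent agent has signed off.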
Vectara is on a mission to enable Trusted AI in the enterprise. We offer an Enterprise RAG platform with the highest accuracy, fine-grained access control, data leakage protection, transparency, and auditability. Our RAG platform is optimized at every step of the pipeline to prevent and minimize hallucinations. But RAG is just one part of the Trusted AI journey. We are advancing to the next phase: making agentic transformation safe for the enterprise. To start unlocking Agentic potential in the enterprise, we are excited to announce the first of our Guardian Agents: Vectara’s Hallucination Correction Agent (VHC), which corrects hallucinations automatically, without human assistance.
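For a rough picture of where hallucination correction fits, the sketch below shows a generic RAG loop in which a guardian step compares the generated answer against the retrieved evidence and substitutes a corrected answer when the claim is unsupported. Every function body here is a stand-in for illustration only; this is not Vectara’s API and not how VHC is implemented.

```python
def retrieve(query: str) -> list[str]:
    # Stand-in for retrieval from an enterprise corpus.
    return ["The warranty covers manufacturing defects for 24 months."]

def generate(query: str, passages: list[str]) -> str:
    # Stand-in for the LLM generation step (here it hallucinates a number).
    return "The warranty covers manufacturing defects for 36 months."

def is_supported(answer: str, passages: list[str]) -> bool:
    # Stand-in for a hallucination-detection model scoring the answer against the evidence.
    return any(answer.strip() == p.strip() for p in passages)

def correct(answer: str, passages: list[str]) -> str:
    # Stand-in for a correction model that rewrites the answer to match the evidence.
    return passages[0]

query = "How long is the warranty?"
passages = retrieve(query)
answer = generate(query, passages)
if not is_supported(answer, passages):   # the guardian catches the hallucination...
    answer = correct(answer, passages)   # ...and fixes it without human assistance
print(answer)  # -> "The warranty covers manufacturing defects for 24 months."
```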