Vectara

Introducing Sub-agents

Smarter, modular multi-agent workflows


Built-in Delegation for AI Agents

When agents face complex, multi-step tasks, they often run into context window limits or need specialized capabilities. Sub-agents provide built-in delegation that lets an agent spawn other agents to handle specific subtasks. The main problem this solves is context management: a large task can be broken down and processed in separate contexts. It also enables specialization and modular agent architectures.

What Makes Sub-agents Different?

Think about code review. You could build a single monolithic agent that tries to handle everything from security analysis to documentation checks to performance optimization. But a central agent that tries to handle all of this can run into context problems. It can get confused by the many different instructions it has to follow, and each subtask can consume too much context. For example, security policies could take up thousands of tokens that the main agent would need to ignore for most of its tasks.

Sub-agents let the model spend more tokens on each subtask without bloating the main agent's context, while improving both quality and latency. Your main agent delegates to specialized sub-agents, each configured for its specific domain:

  • The security analysis agent is configured with code analysis tools and security-focused instructions.
  • The documentation agent has tools for referencing API docs and technical writing guidelines.

Each sub-agent operates independently and returns its results to the parent agent, often producing better results than the main agent could on its own while preserving the main agent's context. Multiple sub-agents can run in parallel, reducing overall execution time.

Key Benefits

This separation offers several distinct advantages:

Context isolation: Sub-agents maintain their own conversation history, separate from the parent agent's. Good context management is critical to agent performance because it lets an agent focus on only the most relevant tokens. Isolation prevents context pollution, where unrelated information from multiple tasks bleeds together and confuses the model.
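To make the isolation concrete, here is a minimal Python sketch using a hypothetical in-memory `Session` class (not a Vectara API). The sub-agent's session absorbs the bulky intermediate content, while the parent's history grows only by the sub-agent's final answer. The "2 issues found" result is a hard-coded stub.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical in-memory session: each agent keeps its own message history."""
    name: str
    history: list = field(default_factory=list)


def run_sub_agent(session: Session, task: str) -> str:
    # Everything the sub-agent reads and produces stays in its own history.
    session.history.append(f"user: {task}")
    session.history.append("tool: <thousands of tokens of security policy text>")
    final = f"summary of '{task}': 2 issues found"  # stubbed result
    session.history.append(f"assistant: {final}")
    return final


parent = Session("parent")
security = Session("security_sub_agent")

parent.history.append("user: review this pull request")
result = run_sub_agent(security, "check auth changes for vulnerabilities")
# The parent records only the sub-agent's final answer, as a tool result.
parent.history.append(f"tool_result: {result}")

print(len(parent.history), len(security.history))  # 2 3
```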

Specialized configuration: Each sub-agent can have distinct instructions, tool configurations, and behavior patterns tuned for its task. Your research agent might have broad web search access, while your data processing agent focuses on corpus queries and structured indexing.

Reusability: Once you've built a solid code review sub-agent, any parent agent can invoke it. You don't have to copy and paste instructions and tool configurations across multiple agent definitions.

Composability and testing: You can test and benchmark each sub-agent independently for its specific task. This makes it easier to iterate on individual components, measure performance improvements, and roll out changes without affecting the entire system.
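Because a sub-agent is a self-contained unit, a small evaluation harness can score it in isolation. The sketch below is hypothetical: `stub_security_agent` stands in for a real call to a deployed sub-agent, and the substring check is a deliberately simple pass criterion you would likely replace with something stricter.

```python
from typing import Callable


def evaluate(agent: Callable[[str], str], cases: list) -> float:
    """Return the fraction of (prompt, expected) cases whose output contains expected."""
    passed = sum(1 for prompt, expected in cases if expected in agent(prompt))
    return passed / len(cases)


def stub_security_agent(prompt: str) -> str:
    # Hypothetical stand-in for invoking a deployed security sub-agent.
    if "eval(" in prompt:
        return "CRITICAL: use of eval() on untrusted input"
    return "No issues found"


cases = [
    ("def f(x): return eval(x)", "CRITICAL"),
    ("def add(a, b): return a + b", "No issues"),
]
print(evaluate(stub_security_agent, cases))  # 1.0
```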

Parallel execution: The parent agent can invoke multiple sub-agents in parallel, reducing end-to-end execution time. Instead of sequentially running security analysis, then documentation checks, then performance review, all three can run simultaneously.
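The time savings come from running independent calls concurrently. In this sketch, `fake_sub_agent` is a hypothetical stand-in that sleeps to simulate network latency. With `asyncio.gather`, total wall time tracks the slowest call rather than the sum of all three.

```python
import asyncio
import time


async def fake_sub_agent(name: str, seconds: float) -> str:
    # Stand-in for a remote sub-agent call; the sleep simulates latency.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def review() -> list:
    # gather runs the coroutines concurrently and returns results in call order.
    return await asyncio.gather(
        fake_sub_agent("security", 0.2),
        fake_sub_agent("documentation", 0.2),
        fake_sub_agent("performance", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(review())
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.1f}s")  # roughly 0.2s rather than 0.6s
```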

Resource management: Sessions can be ephemeral (created fresh each time), persistent (reused across calls), or LLM-controlled (the model decides when to create new sessions versus resuming existing ones).

Sub-agents Architecture

The architecture has three core pieces: the parent agent, the sub-agent tool, and the sub-agent itself. The parent agent has tools it can use, such as search tools, calculation tools, and sub-agent tools. A sub-agent tool is simply another tool in this toolkit. Instead of calling a function or API, the sub-agent tool invokes another agent. That invoked agent is the sub-agent, and it can be any agent that exists in the Vectara platform. The diagram below illustrates this architecture:

Configuring the Parent Agent with a Sub-Agent

You configure a sub-agent tool just like any other tool in the agent's tool_configurations (see the full agent configuration structure for all available options):
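A configuration might look roughly like the following, written as a Python dict that mirrors a JSON request body. Only `tool_configurations`, `sub_agent_configuration`, `agent_key`, and the three session-mode names come from this post. The `type` and `session_mode` field names and the tool name `security_reviewer` are assumptions for illustration, so check the agent configuration reference for the exact schema. The validator is likewise a hypothetical local check.

```python
VALID_SESSION_MODES = {"persistent", "ephemeral", "llm_controlled"}

# Hypothetical shape: "type" and "session_mode" are assumed field names.
parent_agent = {
    "name": "code_review_orchestrator",
    "tool_configurations": {
        "security_reviewer": {
            "type": "sub_agent",
            "sub_agent_configuration": {
                "agent_key": "security_analysis_agent",
                "session_mode": "llm_controlled",
            },
        },
    },
}


def validate_sub_agent_tool(tool: dict) -> None:
    """Raise ValueError if a sub-agent tool config is missing required pieces."""
    cfg = tool.get("sub_agent_configuration")
    if cfg is None:
        raise ValueError("sub_agent_configuration is required")
    if not cfg.get("agent_key"):
        raise ValueError("agent_key is required")
    # llm_controlled is described in this post as the default mode.
    mode = cfg.get("session_mode", "llm_controlled")
    if mode not in VALID_SESSION_MODES:
        raise ValueError(f"unknown session mode: {mode}")


for name, tool in parent_agent["tool_configurations"].items():
    validate_sub_agent_tool(tool)
    print(name, "ok")
```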

The sub_agent_configuration is required and specifies which agent to invoke and how to manage sessions. This configuration is defined only by the user; the LLM can neither see nor modify these settings. Any agent that you've already created in the Vectara platform can act as a sub-agent. There's no special configuration needed to make an agent "sub-agent compatible." The parent agent never directly manipulates the sub-agent's configuration or state. It just says "review this code" and gets back results.
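The "just another tool" idea can be expressed as a shared interface. In this hypothetical sketch, `SubAgentTool` exposes the same `call` method as an ordinary search tool, and its `agent_key` is fixed at construction time from user configuration, so the LLM-facing arguments contain only the task `message`. `fake_run_agent` stands in for invoking an existing agent on the platform; none of these class names are Vectara APIs.

```python
from typing import Callable, Protocol


class Tool(Protocol):
    name: str

    def call(self, args: dict) -> dict: ...


class SearchTool:
    name = "web_search"

    def call(self, args: dict) -> dict:
        return {"results": [f"result for {args['query']}"]}


class SubAgentTool:
    """Looks like any other tool, but calling it runs another agent."""

    def __init__(self, name: str, agent_key: str, run_agent: Callable[[str, str], str]):
        self.name = name
        self._agent_key = agent_key  # set from user config, never chosen by the LLM
        self._run_agent = run_agent

    def call(self, args: dict) -> dict:
        reply = self._run_agent(self._agent_key, args["message"])
        return {"sub_agent_response": reply}


def fake_run_agent(agent_key: str, message: str) -> str:
    # Hypothetical stand-in for invoking an existing platform agent.
    return f"[{agent_key}] handled: {message}"


tools: dict = {
    t.name: t
    for t in [
        SearchTool(),
        SubAgentTool("security_reviewer", "security_analysis_agent", fake_run_agent),
    ]
}
print(tools["security_reviewer"].call({"message": "review this code"}))
```

Because the parent only ever sees a `call` method, swapping a function-backed tool for an agent-backed one requires no change to how the parent invokes it.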

Session Modes

To give users explicit control over session behavior, sub-agents support three session modes that control how conversation state persists.

persistent mode always reuses the same session for a given sub-agent tool configuration within the parent agent's session. The first invocation creates a session, and all subsequent calls resume it. This works well when you want the sub-agent to accumulate knowledge across multiple invocations, for example a research assistant that builds understanding across several queries from the parent agent.

ephemeral mode creates a fresh session for every invocation. Each call starts with a clean slate, guaranteeing no state leakage between requests. Use this when you want completely independent sub-agent executions.

llm_controlled mode (the default) lets the language model decide on each call whether to resume an existing sub-agent session or create a new one, effectively choosing between persistent and ephemeral behavior. This gives the model flexibility to maintain context when it helps and to start fresh when appropriate.

The examples below show exactly how each mode works across multiple sub-agent tool calls for the same sub-agent in a parent agent session:
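As a rough illustration of those three behaviors, the hypothetical `SubAgentSessions` class below models only how a session key is chosen on each call under each mode. It is a local simulation, not Vectara's implementation. In `llm_controlled` mode the model either passes back a key it received earlier or omits it to start fresh.

```python
import itertools
from typing import Optional


class SubAgentSessions:
    """Hypothetical model of how each session mode picks a session key."""

    def __init__(self, mode: str):
        if mode not in {"persistent", "ephemeral", "llm_controlled"}:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._ids = itertools.count(1)
        self._persistent_key: Optional[str] = None
        self._created: set = set()

    def _new(self) -> str:
        key = f"sess_{next(self._ids)}"
        self._created.add(key)
        return key

    def resolve(self, requested_key: Optional[str] = None) -> str:
        # requested_key is ignored outside llm_controlled mode.
        if self.mode == "ephemeral":
            return self._new()  # every call starts from a clean slate
        if self.mode == "persistent":
            if self._persistent_key is None:
                self._persistent_key = self._new()  # first call creates the session
            return self._persistent_key  # later calls resume it
        # llm_controlled: resume a key the model passed back, or start fresh.
        if requested_key is None:
            return self._new()
        if requested_key not in self._created:
            raise PermissionError(f"unknown session key: {requested_key}")
        return requested_key


persistent = SubAgentSessions("persistent")
print([persistent.resolve() for _ in range(3)])  # ['sess_1', 'sess_1', 'sess_1']

ephemeral = SubAgentSessions("ephemeral")
print([ephemeral.resolve() for _ in range(3)])  # ['sess_1', 'sess_2', 'sess_3']

llm = SubAgentSessions("llm_controlled")
first = llm.resolve()        # model omits a key: new session
again = llm.resolve(first)   # model passes the key back: session resumed
fresh = llm.resolve()        # model decides to start over
print(first, again, fresh)   # sess_1 sess_1 sess_2
```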

Invoking a Sub-Agent

Making the Call

The parent agent invokes a sub-agent by making a tool call, just as it would call any other tool. The invocation includes the following parameters:

The agent_key identifies which agent to invoke as the sub-agent (this comes from the tool configuration rather than the LLM).

The message parameter contains the detailed task instructions. This is where the parent agent tells the sub-agent exactly what work needs to be done.

The optional session_tti_minutes (time-to-idle) controls session expiration. This is the duration of inactivity before the sub-agent’s session is automatically deleted, measured from the last event in the session. Set it based on how long you expect the parent agent conversation to continue. A quick task might use five minutes; a multi-hour research session might use three hours.
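The sketch below shows one way to assemble these parameters, assuming the LLM-generated arguments arrive as a dict. The `agent_key` is taken from the user-defined configuration and rejected if the model tries to supply it, and a helper checks time-to-idle expiry. The function names are hypothetical, and whether expiry happens at exactly `session_tti_minutes` or just after is an assumption.

```python
from datetime import datetime, timedelta


def build_sub_agent_call(tool_config: dict, llm_args: dict) -> dict:
    """Merge the user-defined agent_key with the arguments the LLM generated."""
    if "agent_key" in llm_args:
        raise ValueError("agent_key comes from configuration, not from the LLM")
    call = {
        "agent_key": tool_config["sub_agent_configuration"]["agent_key"],
        "message": llm_args["message"],
    }
    if "session_tti_minutes" in llm_args:  # optional parameter
        call["session_tti_minutes"] = llm_args["session_tti_minutes"]
    return call


def is_expired(last_event: datetime, tti_minutes: int, now: datetime) -> bool:
    """Time-to-idle: idle time is measured from the session's last event."""
    return now - last_event >= timedelta(minutes=tti_minutes)


config = {"sub_agent_configuration": {"agent_key": "security_analysis_agent"}}
call = build_sub_agent_call(
    config, {"message": "Review auth.py for injection risks.", "session_tti_minutes": 5}
)
last_event = datetime(2025, 1, 1, 12, 0)
print(is_expired(last_event, 5, last_event + timedelta(minutes=4)))  # False
print(is_expired(last_event, 5, last_event + timedelta(minutes=6)))  # True
```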

Sharing Artifacts

An artifact is a piece of content (e.g., an uploaded file) stored in an agent session's workspace (see our artifacts blog post for more details on how artifacts work). When invoking a sub-agent, the parent agent can specify which artifacts from its own session workspace to share.

The sub-agent gets access to these artifacts in its own session workspace. It can read them, convert them (for example, PDF to markdown), or create derivative artifacts. All artifact operations happen in the sub-agent's workspace rather than the parent's, keeping state properly isolated.

The purpose field matters. It tells the sub-agent why you're sharing this particular artifact, helping it understand context without requiring the parent agent to repeat information.

Security note: The parent agent can only share artifacts that are present in its own session's workspace. Attempting to share an artifact the parent doesn't own throws an error.
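Here is a minimal sketch of those sharing rules, using plain dicts as hypothetical workspaces and an assumed `artifact_id` field alongside the `purpose` field described above. Every requested artifact is checked against the parent's workspace before anything is copied, and the sub-agent receives its own copy, so derivative work does not touch the parent's state.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    artifact_id: str  # hypothetical identifier field
    content: str


def share_artifacts(parent_ws: dict, sub_ws: dict, shares: list) -> None:
    """Copy shared artifacts (with their purpose) into the sub-agent's workspace."""
    # Validate first so a bad request shares nothing at all.
    for share in shares:
        if share["artifact_id"] not in parent_ws:
            raise PermissionError(f"{share['artifact_id']!r} is not in the parent's workspace")
    for share in shares:
        original = parent_ws[share["artifact_id"]]
        sub_ws[original.artifact_id] = {
            "artifact": Artifact(original.artifact_id, original.content),  # a copy
            "purpose": share["purpose"],
        }


parent_ws = {"spec_pdf": Artifact("spec_pdf", "%PDF-1.7 ...")}
sub_ws: dict = {}
share_artifacts(parent_ws, sub_ws, [
    {"artifact_id": "spec_pdf", "purpose": "API spec to check the pull request against"},
])
# The sub-agent converts its own copy; the parent's artifact is unchanged.
sub_ws["spec_pdf"]["artifact"].content = "# API spec (converted to markdown)"
print(parent_ws["spec_pdf"].content)  # %PDF-1.7 ...
```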

What The Sub-Agent Tool Call Returns

The sub-agent processes the message with its own instructions and tools, and the tool call then returns its result to the parent agent.

The parent agent receives both the response text and a session key. In LLM-controlled mode, the model can use this session key to resume the sub-agent session in subsequent calls.

The sub_agent_response contains only the sub-agent's final output message. Even if the sub-agent made multiple tool calls, went through extended thinking, or generated intermediate responses during its execution, the parent agent sees only the end result. This filtering is intentional: it keeps implementation details from overwhelming the parent agent.
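The filtering can be pictured as a reduction over the sub-agent's event stream. In this hypothetical sketch, the event types, the `session_key` field name, and the overall shape of the returned dict are assumptions. The point is that thinking, tool calls, and intermediate messages are dropped, and only the final response, plus the session key, reaches the parent.

```python
def finish_sub_agent_run(events: list, session_key: str) -> dict:
    """Reduce a sub-agent's full event stream to what the parent agent sees."""
    final_messages = [e["text"] for e in events if e["type"] == "final_response"]
    if not final_messages:
        raise RuntimeError("sub-agent produced no final response")
    return {"sub_agent_response": final_messages[-1], "session_key": session_key}


events = [
    {"type": "thinking", "text": "Check the auth module first..."},
    {"type": "tool_call", "text": "static_analysis(path='auth.py')"},
    {"type": "intermediate_response", "text": "Scanning dependencies..."},
    {"type": "final_response", "text": "1 critical issue: SQL injection in login()."},
]
print(finish_sub_agent_run(events, "sess_42"))
```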

When configuring sub-agents, ensure their instructions emphasize producing complete, self-contained final responses since that's all the parent agent will receive. For example, a security review sub-agent's instructions might include: "After completing your analysis, provide a summary that includes: (1) the number of issues found by severity, (2) a brief description of each critical issue, and (3) recommended next steps. The parent agent will only see your final response, so make it actionable and complete."


Session Visibility and Security

Sub-agent sessions created by a parent agent belong to that parent. Other agents cannot access them, and neither can external callers. This isolation prevents information leakage between agent interactions.

When you look at a sub-agent's session list, you'll see sessions created by various parent agents, but each parent can only access its own sub-agent sessions. The system enforces this boundary automatically.

If a parent agent tries to specify a session key that it didn't create (in LLM-controlled mode), the system rejects the call. This prevents session hijacking attacks where one agent tries to access another's conversation state.
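The ownership rule amounts to a lookup from each sub-agent session key to the parent session that created it. The registry below is a hypothetical in-memory sketch of that check, not the platform's actual mechanism. Listing filters by owner, and resuming a key created by another parent (or a key that does not exist) is rejected.

```python
class SubAgentSessionRegistry:
    """Hypothetical registry mapping sub-agent session keys to their creator."""

    def __init__(self) -> None:
        self._owner: dict = {}
        self._count = 0

    def create(self, parent_session: str) -> str:
        self._count += 1
        key = f"sub_sess_{self._count}"
        self._owner[key] = parent_session
        return key

    def resume(self, parent_session: str, key: str) -> str:
        # Unknown keys map to None, so they fail the same check as foreign keys.
        if self._owner.get(key) != parent_session:
            raise PermissionError(f"{parent_session} may not access {key}")
        return key

    def visible_to(self, parent_session: str) -> list:
        return [k for k, owner in self._owner.items() if owner == parent_session]


registry = SubAgentSessionRegistry()
key_a = registry.create("parent_A")
key_b = registry.create("parent_B")
print(registry.visible_to("parent_A"))  # ['sub_sess_1']
registry.resume("parent_A", key_a)      # allowed
try:
    registry.resume("parent_A", key_b)  # parent_A did not create this session
except PermissionError as err:
    print("rejected:", err)
```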

Advanced: Composing Multi-Level Agent Workflows

Sub-agents can invoke other sub-agents, enabling complex multi-level workflows. This composition pattern lets you build sophisticated systems from simple, focused agents.

Example: Code Review Orchestrator

Consider a comprehensive code review agent that delegates to multiple specialized sub-agents, where some sub-agents themselves use other sub-agents.

The security_analysis_agent, for instance, uses its own sub-agents for specialized checks.

How It Works

  1. User submits a pull request to the Code Review Orchestrator
  2. Orchestrator uses web_search to look up best practices for the programming language
  3. Orchestrator invokes security_reviewer sub-agent with the code changes
  4. Security Analysis Agent uses its own sub-agents and tools:
    • dependency_scanner sub-agent checks for vulnerable dependencies
    • static_analysis tool checks code patterns
    • compliance_checker sub-agent verifies regulatory requirements
  5. Security Analysis Agent synthesizes results and returns to Orchestrator
  6. Orchestrator invokes documentation_checker sub-agent
  7. Orchestrator combines all results into a comprehensive review

This multi-level composition happens transparently. The Orchestrator doesn't need to know that the Security Analysis Agent uses its own sub-agents; it just receives the final security report. Each level operates independently with its own context, tools, and session management.
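The layering can be sketched with plain Python functions standing in for agents. Everything here is hypothetical: real sub-agents would be LLM-driven calls, the findings are hard-coded stubs, and the web_search step is omitted. What the sketch does show is the shape of the delegation: the orchestrator calls `security_analysis_agent`, which fans out to its own helpers and hands back a single synthesized report.

```python
def dependency_scanner(code: str) -> str:  # level-2 sub-agent (stub)
    return "dependencies: 1 vulnerable package"


def static_analysis(code: str) -> str:  # a plain tool, not a sub-agent
    return "patterns: eval() on user input" if "eval(" in code else "patterns: clean"


def compliance_checker(code: str) -> str:  # level-2 sub-agent (stub)
    return "compliance: ok"


def security_analysis_agent(code: str) -> str:
    # Level 1 sub-agent: delegates to its own sub-agents and tools...
    findings = [dependency_scanner(code), static_analysis(code), compliance_checker(code)]
    # ...but returns only one synthesized report to its caller.
    return f"Security report ({len(findings)} checks): " + "; ".join(findings)


def documentation_checker(code: str) -> str:
    return "docs: ok" if '"""' in code else "docs: missing docstring"


def code_review_orchestrator(code: str) -> dict:
    # Top level: sees each sub-agent's final output, never their internals.
    return {
        "security": security_analysis_agent(code),
        "documentation": documentation_checker(code),
    }


print(code_review_orchestrator("def f(x):\n    return eval(x)\n"))
```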

Conclusion

Sub-agents turn agent delegation from a custom implementation challenge into a built-in capability. Instead of building coordination logic, managing separate API calls, and handling state synchronization yourself, you configure sub-agent tools and let the system handle the mechanics.

The parent agent focuses on orchestration while sub-agents focus on specialized tasks, and the Vectara platform manages sessions, artifacts, and security boundaries. This separation lets you build agent systems that are both modular and maintainable: complex behavior emerges from composing simple, well-defined agents rather than from monolithic prompts that try to do everything.

Whether you're splitting a code review agent into security and performance specialists, building multi-stage research workflows, or creating agents that delegate to domain experts, sub-agents provide the foundation for scalable agent architecture.
