Returning images and files from tools enables real agentic feedback loops on completely new modalities.

For example, instead of dumping all the data into an agent and hoping for the best, you can generate a visualization or analyze PDF reports, then let the agent provide insights based on that output, just like a real data analyst.

This saves space in your context window and unlocks autonomous agentic workflows for many new use cases:
Agents can check websites autonomously and iterate until all elements are properly positioned, enabling them to tackle complex projects without manual screenshot feedback.
Brand Asset Generation
Provide brand guidelines, logos, and messaging, then let agents iterate on image and video generation (including Sora 2) until outputs fully match your expectations.
Screen-Aware Assistance
Build agents that help visually impaired individuals navigate websites, or create customer support agents that can see the user’s current webpage and assist more effectively.
Data Analytics
Generate visual graphs and analyze PDF reports, then let agents provide insights based on these outputs without overloading the context window.

    from agency_swarm import BaseTool, ToolOutputFileContent
    from agency_swarm.tools.utils import tool_output_file_from_path, tool_output_file_from_url
    from pydantic import Field


    class FetchReferenceReport(BaseTool):
        """Return a reference PDF hosted remotely."""

        source_url: str = Field(
            default="https://raw.githubusercontent.com/VRSEN/agency-swarm/main/examples/data/sample_report.pdf",
            description="Remote file to share",
        )

        def run(self) -> ToolOutputFileContent:
            return ToolOutputFileContent(file_url=self.source_url)


    class FetchLocalReport(BaseTool):
        """Return a report stored on disk."""

        path: str = Field(default="examples/data/sample_report.pdf", description="Local file path")

        def run(self) -> ToolOutputFileContent:
            return tool_output_file_from_path(self.path)


    class FetchRemoteReport(BaseTool):
        """Return a remote file using the helper."""

        archive_url: str = Field(default="https://example.com/document.pdf", description="File to expose")

        def run(self) -> ToolOutputFileContent:
            return tool_output_file_from_url(self.archive_url)

When you choose file_data, include filename to hint a download name; URL-based outputs rely on the remote server metadata instead.
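To illustrate pairing file_data with filename, here is a minimal stdlib-only sketch. It assumes file_data carries the file contents as base64 text, which the text above does not spell out, and `build_file_data_fields` is a hypothetical helper rather than part of agency_swarm. The returned dictionary stands in for the keyword arguments you might pass to ToolOutputFileContent.

```python
import base64
from pathlib import Path


def build_file_data_fields(path: str) -> dict[str, str]:
    """Return hypothetical keyword arguments for an inline file output."""
    file_path = Path(path)
    # Assumption: file_data is expected as base64-encoded text.
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    # The filename acts as the download-name hint for the inline payload.
    return {"file_data": encoded, "filename": file_path.name}
```

If that assumption holds for your version, something like `ToolOutputFileContent(**build_file_data_fields("report.pdf"))` would carry both the payload and its name hint.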
tool_output_file_from_path only supports PDF files.
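Because the helper is limited to PDFs, it may be worth checking a file before handing it over. The sketch below assumes that checking for the leading `%PDF-` signature is good enough for your inputs. `is_pdf` is a hypothetical helper, not an agency_swarm function.

```python
from pathlib import Path

# PDF files conventionally begin with this signature.
PDF_MAGIC = b"%PDF-"


def is_pdf(path: str) -> bool:
    """Return True when the file exists and starts with the PDF signature."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        # Read only as many bytes as the signature needs.
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC
```

A tool could call `is_pdf(self.path)` first and return a plain text message when the check fails, rather than passing an unsupported file to tool_output_file_from_path.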
Need to load local files without custom logic? Use the built-in LoadFileAttachment tool instead of creating a custom tool. It handles both images and PDFs and uses these same utility functions under the hood.

    from agency_swarm import Agent, BaseTool, ToolOutputImage
    from pydantic import Field
    import base64
    import matplotlib.pyplot as plt
    import io


    class GenerateChartTool(BaseTool):
        """Generate a bar chart from data."""

        data: list[float] = Field(..., description="Data points for the chart")
        labels: list[str] = Field(..., description="Labels for each data point")

        def run(self) -> ToolOutputImage:
            """Generate and return the chart as a base64-encoded image."""
            # Create the chart
            fig, ax = plt.subplots()
            ax.bar(self.labels, self.data)

            # Convert to base64
            buf = io.BytesIO()
            plt.savefig(buf, format='png')
            buf.seek(0)
            image_base64 = base64.b64encode(buf.read()).decode('utf-8')
            plt.close()

            # Return in multimodal format
            return ToolOutputImage(image_url=f"data:image/png;base64,{image_base64}")


    # Create an agent with the tool
    agent = Agent(
        name="DataViz",
        instructions="You generate charts and visualizations for data analysis.",
        tools=[GenerateChartTool]
    )

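The chart tool above returns its image as a base64 data URL. The encoding step on its own can be sketched with the standard library alone. `to_data_url` and `from_data_url` are hypothetical names used for illustration and are not part of agency_swarm.

```python
import base64


def to_data_url(raw: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL with the given MIME type."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL back into its MIME type and raw bytes."""
    header, encoded = data_url.split(",", 1)
    # Strip the "data:" scheme and the ";base64" marker to leave the MIME type.
    mime_type = header.removeprefix("data:").removesuffix(";base64")
    return mime_type, base64.b64decode(encoded)
```

Decoding the URL back to bytes is a convenient way to unit test an image-producing tool without sending anything to a model.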
function_tool decorators and BaseTool classes both support multimodal outputs in the exact same way.