
Replicate
Need image generation, text completion, audio transcription, or video processing? Your AI agent taps into Replicate's massive model catalog to create predictions, search for specialized models, manage deployments, and retrieve results, bringing the power of thousands of AI models into every conversation.




Your AI agent orchestrates Replicate's model infrastructure, running predictions, browsing model catalogs, managing deployments, and retrieving outputs without touching a terminal.
See how product teams, developers, and creatives use AI agents to run machine learning models, generate content, and manage AI infrastructure through natural conversation.
A content writer needs a hero image for a blog post about sustainable energy. They describe the scene to the AI Agent, which creates a prediction using a Stable Diffusion model on Replicate with the text prompt. The generated image returns in 15 seconds. The writer gets a custom illustration without submitting a design request, and the blog publishes on schedule.
A product manager exploring speech-to-text capabilities asks the agent to find the best transcription models. The AI Agent searches Replicate's catalog for 'whisper' and 'speech recognition,' returns the top models with usage stats and descriptions, and even runs a test prediction on the most promising option. The PM evaluates models in minutes instead of days of research.
A machine learning engineer has training data ready and wants to fine-tune SDXL with brand-specific imagery. They give the AI Agent the base model, the training data URL, and the destination model. The agent creates the training job on Replicate and reports back with the job ID. The engineer monitors progress through follow-up messages without SSH-ing into any infrastructure.

FAQs
The agent uses Replicate's Models Predictions Create endpoint, specifying the model owner, model name, and input parameters as a JSON object. It can wait synchronously for up to 60 seconds for results, or fire off the prediction and check status later. The endpoint supports any model in Replicate's catalog including FLUX, Llama, Whisper, and thousands more.
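Under the hood, this maps to a POST against Replicate's model predictions endpoint, with the `Prefer: wait` header requesting a synchronous result. A minimal sketch using the `requests` library (the helper names here are illustrative, not Tars's internal implementation):

```python
import os
import requests

API_BASE = "https://api.replicate.com/v1"

def build_prediction_request(owner, name, inputs, wait_seconds=60):
    """Assemble the URL, headers, and body for a model prediction request."""
    return {
        "url": f"{API_BASE}/models/{owner}/{name}/predictions",
        "headers": {
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            # Ask Replicate to hold the connection open for up to wait_seconds
            # and return the finished prediction if it completes in time.
            "Prefer": f"wait={wait_seconds}",
        },
        "json": {"input": inputs},
    }

def create_prediction(owner, name, inputs):
    req = build_prediction_request(owner, name, inputs)
    resp = requests.post(req["url"], headers=req["headers"],
                         json=req["json"], timeout=90)
    resp.raise_for_status()
    return resp.json()  # includes id, status, and output once finished
```

If the model finishes within the wait window, the response already contains the `output`; otherwise it comes back still in progress with an `id` the agent can poll later.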
Yes. The agent uses Replicate's Search endpoint to query the entire model catalog by keyword. It returns matching models with descriptions, owner information, and version details. You can also browse curated collections or list public models sorted by creation date or latest version to discover new capabilities.
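Replicate's search endpoint is unusual in that it uses the nonstandard HTTP QUERY method with a plain-text body. A hedged sketch of a search call plus a small helper that condenses results into one-line summaries (the helper is illustrative, not part of Replicate's API):

```python
import os
import requests

def search_models(query):
    """Search Replicate's public catalog via HTTP QUERY with a plain-text body."""
    resp = requests.request(
        "QUERY",
        "https://api.replicate.com/v1/models",
        headers={
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "text/plain",
        },
        data=query,  # plain-text search terms, e.g. "whisper speech recognition"
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

def summarize_models(results, limit=5):
    """Condense raw model records into one-line summaries for a chat reply."""
    return [
        f"{m['owner']}/{m['name']}: {m.get('description') or 'no description'}"
        for m in results[:limit]
    ]
```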
Tars requires your Replicate API token, which you generate from your Replicate account settings page. The token authenticates all API requests as a bearer token. It grants access to predictions, models, deployments, files, and training jobs associated with your account or organization.
No. Prediction results, model metadata, deployment configurations, and file data are all fetched from Replicate's API in real time. Tars does not cache generated images, text outputs, or any model artifacts. Each prediction request and result retrieval hits the live Replicate API.
Yes. The agent creates deployments with specified hardware (GPU type), scaling parameters (min/max instances), and model versions. It can list all deployments, get details for a specific one, or delete deployments that are offline. This gives your team infrastructure management capabilities directly through conversation.
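Deployment creation boils down to one POST to `/v1/deployments` with the model, a pinned version, the hardware SKU, and scaling bounds. A minimal sketch (the hardware strings and helper names are examples, not an exhaustive list):

```python
import os
import requests

API_BASE = "https://api.replicate.com/v1"

def build_deployment(name, model, version, hardware="gpu-t4",
                     min_instances=0, max_instances=1):
    """Request body for creating a deployment (POST /v1/deployments)."""
    return {
        "name": name,                    # deployment name within your account
        "model": model,                  # e.g. "stability-ai/sdxl"
        "version": version,              # pinned model version id
        "hardware": hardware,            # GPU SKU, e.g. "gpu-t4", "gpu-a100-large"
        "min_instances": min_instances,  # 0 lets the deployment scale to zero
        "max_instances": max_instances,
    }

def create_deployment(body, token=None):
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    resp = requests.post(
        f"{API_BASE}/deployments",
        headers={"Authorization": f"Bearer {token}"},
        json=body,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```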
Replicate's web interface requires navigating to specific model pages and configuring inputs manually, and its CLI requires terminal access and command knowledge. Tars AI Agents let anyone on your team describe what they need in plain language; the agent handles model selection, input formatting, and result delivery conversationally.
Yes. Using Replicate's Trainings Create endpoint, the agent starts training jobs with a base model version, your training data (as a URL to a zip file or hosted dataset), and a destination model for the fine-tuned output. It supports webhook notifications for training completion, so your team gets alerted when the job finishes.
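A training job targets a specific base model version and names a destination model for the fine-tuned output. A hedged sketch of assembling that request (the example destination and input keys are placeholders; the exact input schema depends on the base model):

```python
API_BASE = "https://api.replicate.com/v1"

def build_training_request(base_owner, base_name, version, destination,
                           input_params, webhook=None):
    """URL and body for starting a fine-tune on a base model version."""
    body = {
        "destination": destination,  # e.g. "my-org/sdxl-brand-style"
        "input": input_params,       # e.g. {"input_images": "https://example.com/data.zip"}
    }
    if webhook:
        body["webhook"] = webhook    # Replicate POSTs here when training finishes
    return {
        "url": (f"{API_BASE}/models/{base_owner}/{base_name}"
                f"/versions/{version}/trainings"),
        "body": body,
    }
```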
If the prediction does not complete within the wait_for period (max 60 seconds), Replicate returns a prediction object with 'processing' status. The agent can then poll the prediction by ID to check when it completes. For long-running tasks like training or high-resolution generation, the agent manages the async flow and reports results when ready.
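The async flow above can be sketched as a simple poll loop against `GET /v1/predictions/{id}`, stopping once the prediction reaches one of Replicate's terminal statuses (`succeeded`, `failed`, or `canceled`); the interval and timeout values are illustrative:

```python
import os
import time
import requests

API_BASE = "https://api.replicate.com/v1"
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def is_terminal(status):
    """True once a prediction can no longer change state."""
    return status in TERMINAL_STATUSES

def poll_prediction(prediction_id, interval=2.0, timeout=300):
    """Fetch a prediction by id until it reaches a terminal status."""
    headers = {"Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}"}
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = requests.get(f"{API_BASE}/predictions/{prediction_id}",
                            headers=headers, timeout=30)
        resp.raise_for_status()
        prediction = resp.json()
        if is_terminal(prediction["status"]):
            return prediction  # output (or error) is now populated
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} still running after {timeout}s")
```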
Don't limit your AI Agent to basic conversations. Watch how to configure and add powerful tools that make your agent smarter and more functional.

Privacy & Security
At Tars, we take privacy and security very seriously. We are compliant with GDPR, ISO, SOC 2, and HIPAA.