
HoneyHive
Your ML team needs to log events, run evaluations, and check experiment metrics. Your AI agent interfaces with HoneyHive to start sessions, manage datasets, retrieve experiment results, and update metrics, making observability part of the conversation instead of a separate dashboard.




Your agent logs model events, manages evaluation datasets, starts experiment runs, and retrieves metrics from HoneyHive, turning LLM observability into a conversational workflow.
See how AI teams use agents to interact with HoneyHive for logging, evaluation, and experimentation, keeping model observability embedded in daily workflows.
A prompt engineer finishes an A/B test between two system prompts. They ask the agent for results. Your AI agent fetches the experiment run from HoneyHive, retrieves aggregated metrics including accuracy, latency, and cost per call, and presents a comparison. The engineer decides which prompt to promote to production in minutes. No dashboard navigation needed.
Customer support flags a conversation where the AI gave an incorrect answer. The QA lead tells the agent to add it as a test case. The agent appends the input, expected output, and metadata to the HoneyHive evaluation dataset. The regression suite grows organically from real failures. Future prompt changes get tested against actual edge cases.
An ML engineer completes an overnight batch inference job and needs to log all results. They trigger the agent, which sends model events to HoneyHive in bulk with inputs, outputs, durations, and token counts. The observability dashboard immediately reflects the new data. The engineer reviews performance trends without writing a single logging script.

HoneyHive FAQs
The agent calls HoneyHive's batch model events endpoint with an array of event objects containing inputs, outputs, durations, token counts, and metadata. Events are associated with a session and project. HoneyHive processes them for dashboards, evaluators, and alerting. Single and batch logging are both supported.
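To make the batch payload concrete, here is a minimal sketch of how such a request could be shaped and sent. The base URL, the `/events/batch` path, and the field names are assumptions for illustration; the exact schema is defined by HoneyHive's API reference.

```python
import json
import urllib.request

# Assumed base URL and endpoint path -- check HoneyHive's API reference.
HONEYHIVE_API = "https://api.honeyhive.ai"

def build_batch_events(project, session_id, records):
    """Shape raw inference records into HoneyHive-style event objects.

    Field names are illustrative of the payload described above:
    inputs, outputs, durations, token counts, and metadata,
    associated with a session and project."""
    return {
        "events": [
            {
                "project": project,
                "session_id": session_id,
                "event_type": "model",
                "inputs": r["inputs"],
                "outputs": r["outputs"],
                "duration": r["duration_ms"],
                "metadata": {"tokens": r.get("tokens")},
            }
            for r in records
        ]
    }

def log_batch(api_key, payload):
    """POST the batch to the (assumed) batch events endpoint."""
    req = urllib.request.Request(
        f"{HONEYHIVE_API}/events/batch",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Single-event logging follows the same shape with a one-element `events` array, which is why both modes can share one code path.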
Yes. The agent can start evaluation runs by specifying event IDs, dataset IDs, and project context through HoneyHive's API. It can also end runs and retrieve results with aggregated metrics like average, median, p95, or custom functions. This covers both automated and human evaluation workflows.
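The run lifecycle above can be sketched as a pair of helpers. The request body fields and endpoint paths are assumptions for illustration, not HoneyHive's exact schema:

```python
import json
import urllib.request

HONEYHIVE_API = "https://api.honeyhive.ai"  # assumed base URL

def start_run_payload(project, dataset_id, event_ids, name):
    """Illustrative body for creating an evaluation run that ties
    event IDs and a dataset ID to a project, as described above."""
    return {
        "project": project,
        "name": name,
        "dataset_id": dataset_id,
        "event_ids": event_ids,
        "status": "pending",
    }

def post_json(api_key, path, payload):
    """Generic authenticated POST helper (paths are assumptions)."""
    req = urllib.request.Request(
        f"{HONEYHIVE_API}{path}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Sketch of the lifecycle: create the run, let evaluators execute,
# then mark it completed and fetch aggregated results (hypothetical
# paths and response fields):
#   run = post_json(key, "/runs", start_run_payload(...))
#   post_json(key, f"/runs/{run['run_id']}", {"status": "completed"})
```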
Tars uses your HoneyHive API key, which you generate from your account settings. This key authenticates all API calls including session management, event logging, dataset operations, and metric retrieval. You can rotate the key anytime from the HoneyHive dashboard.
No. Tars sends data directly to HoneyHive's API and does not retain copies. Model inputs, outputs, evaluation datasets, and experiment metrics are stored exclusively in your HoneyHive account. Tars handles only the API request and response during the conversation.
Yes. The agent can list all projects, filter by name, and create or update projects through HoneyHive's API. When logging events or managing datasets, the agent specifies the target project name, so your team can work across multiple AI applications from a single conversation.
The dashboard requires manual navigation to view experiments, manage datasets, and inspect events. With Tars, your team asks questions like 'How did experiment run_123 perform?' or 'Add this failing case to the QA dataset' and gets results instantly. Observability becomes part of the engineering conversation.
Yes. The agent calls HoneyHive's update metric endpoint to modify names, descriptions, evaluator prompts, code snippets, thresholds, and production enablement flags. This lets your team adjust quality criteria conversationally as requirements evolve.
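As a sketch, an update request body could be validated against the editable fields listed above before it is sent. The field names below are illustrative, not HoneyHive's exact schema:

```python
def metric_update(metric_id, **changes):
    """Build an update-metric request body, restricted to the kinds of
    fields described above (names here are assumptions)."""
    allowed = {
        "name",
        "description",
        "prompt",          # evaluator prompt
        "code_snippet",
        "threshold",
        "enabled_in_prod", # production enablement flag
    }
    unknown = set(changes) - allowed
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return {"metric_id": metric_id, **changes}
```

Validating locally like this lets the agent reject a malformed conversational request before making the API call.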
When retrieving experiment results, the agent can specify aggregation functions including average, min, max, median, p90, p95, p99, sum, and count. This gives your team flexibility to analyze results from different statistical perspectives without running manual calculations.
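To make the aggregation options concrete, here is a local illustration of what each function computes over a list of per-event scores; HoneyHive performs these server-side, and the percentile method shown (nearest-rank) is an assumption:

```python
import math
import statistics

def aggregate(values, fn):
    """Compute one of the named aggregations over numeric scores."""
    vals = sorted(values)
    if fn == "average":
        return sum(vals) / len(vals)
    if fn == "min":
        return vals[0]
    if fn == "max":
        return vals[-1]
    if fn == "median":
        return statistics.median(vals)
    if fn == "sum":
        return sum(vals)
    if fn == "count":
        return len(vals)
    if fn in {"p90", "p95", "p99"}:
        # nearest-rank percentile: smallest value with at least
        # q% of the data at or below it
        q = int(fn[1:]) / 100
        return vals[max(0, math.ceil(q * len(vals)) - 1)]
    raise ValueError(f"unknown aggregation: {fn}")
```

For example, over per-call latencies, `median` describes the typical request while `p95` or `p99` surfaces tail behavior, which is why having both available matters when comparing prompts.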
Don't limit your AI agent to basic conversations. Watch how to configure and add powerful tools that make your agent smarter and more functional.

Privacy & Security
At Tars, we take privacy and security very seriously. We are compliant with GDPR, ISO, SOC 2, and HIPAA.