Apify

Web data extraction that talks back

Your AI agent integrates with Apify during customer conversations. Apify is the leading web scraping and automation platform with 10,000+ ready-made Actors for scraping websites like Google, Amazon, Instagram, and LinkedIn. Your agent can run scrapers on demand, retrieve extracted data, schedule recurring crawls, and manage large-scale data extraction projects through natural conversation.

Chosen by 800+ global brands across industries

Scraping power through natural dialogue

Run 10,000+ pre-built scrapers, manage datasets, schedule crawls, and monitor runs. Extract data from Google, Amazon, Instagram, and LinkedIn through conversation.

Run Web Scrapers and Automation Actors

Agent executes pre-built scrapers from the Apify Store or custom Actors with specific inputs. Runs Actors synchronously for immediate results or asynchronously for longer jobs. Configures memory allocation, timeout limits, and webhook notifications for each run.
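The run configuration described above can be sketched as a request to the Apify API v2. The helper below is hypothetical (not part of any Apify SDK), and the default memory and timeout values are illustrative only:

```python
# Sketch: composing an Actor run request for the Apify API v2.
# build_run_request and its defaults are illustrative, not an official API.

def build_run_request(actor_id: str, token: str, memory_mb: int = 1024,
                      timeout_secs: int = 300, sync: bool = False) -> dict:
    """Return URL, query params, and headers for starting an Actor run."""
    # In URL paths, the "/" in an Actor ID is written as "~"
    # (e.g. apify/web-scraper -> apify~web-scraper).
    path_id = actor_id.replace("/", "~")
    # A synchronous run can return dataset items directly; an async run
    # returns a run object whose status is polled later.
    endpoint = "run-sync-get-dataset-items" if sync else "runs"
    return {
        "url": f"https://api.apify.com/v2/acts/{path_id}/{endpoint}",
        "params": {"memory": memory_mb, "timeout": timeout_secs},
        "headers": {"Authorization": f"Bearer {token}"},
    }
```

Sending this with any HTTP client, with the Actor's JSON input as the request body, starts the run.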

Retrieve and Export Scraped Data

Agent fetches results from datasets in JSON, CSV, XML, Excel, or RSS formats. Applies pagination, field filtering, and sorting to large result sets. Flattens nested data, skips empty items, and transforms output for immediate use in conversations.
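A dataset export like the one above is driven by query parameters on the dataset-items endpoint. A minimal sketch, assuming the `format`, `fields`, `skipEmpty`, `desc`, and `clean` parameters of Apify API v2; the helper itself is hypothetical:

```python
from urllib.parse import urlencode

def dataset_items_url(dataset_id: str, fmt: str = "json", fields=None,
                      limit: int = 100, offset: int = 0,
                      skip_empty: bool = True, desc: bool = False) -> str:
    """Build a dataset-items export URL (illustrative helper)."""
    # clean=true strips Apify's internal fields from each item.
    params = {"format": fmt, "limit": limit, "offset": offset, "clean": "true"}
    if fields:  # restrict output to selected fields
        params["fields"] = ",".join(fields)
    if skip_empty:
        params["skipEmpty"] = "true"
    if desc:    # newest items first
        params["desc"] = "true"
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?" + urlencode(params)
```

Paginating a large result set is then a matter of advancing `offset` by `limit` until fewer than `limit` items come back.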

Store Files and Configuration in Key-Value Stores

Agent creates and manages key-value stores for screenshots, PDFs, configuration files, and scraped output. Retrieves specific records by key, lists available keys with prefix filtering, and checks record existence without downloading content.
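The existence check mentioned above can be done without transferring the record body. A sketch with hypothetical helper names, assuming the standard key-value-store record endpoint:

```python
# Sketch: addressing key-value store records (helper names are illustrative).

def record_url(store_id: str, key: str) -> str:
    """URL of a single record in an Apify key-value store."""
    return f"https://api.apify.com/v2/key-value-stores/{store_id}/records/{key}"

def record_exists_request(store_id: str, key: str) -> dict:
    # A HEAD request reports whether the record exists without downloading
    # its content (useful for large screenshots or PDFs).
    return {"method": "HEAD", "url": record_url(store_id, key)}
```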

Manage URL Queues for Distributed Crawling

Agent creates request queues, batch-adds up to 25 URLs at a time, and retrieves queue head for processing. Supports locking requests for exclusive access, setting retry counts, and marking URLs as handled after successful crawling.
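The 25-URL batch limit above means a long URL list has to be chunked before it is sent. A minimal sketch (the helper is hypothetical; using the URL as `uniqueKey` is a common deduplication default, not a requirement):

```python
def to_batches(urls: list[str], batch_size: int = 25) -> list[list[dict]]:
    """Split URLs into request payloads of at most `batch_size` items,
    matching the 25-request cap on batch additions."""
    batches = []
    for i in range(0, len(urls), batch_size):
        chunk = urls[i:i + batch_size]
        # uniqueKey deduplicates requests within the queue; the URL itself
        # is a simple default.
        batches.append([{"url": u, "uniqueKey": u} for u in chunk])
    return batches
```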

Schedule Recurring Scrapes with Cron Expressions

Agent creates schedules using cron syntax or shortcuts like @daily and @weekly. Configures timezone, exclusive execution to prevent overlapping runs, and actions that trigger specific Actors or tasks. Retrieves schedule logs to track execution history.
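A schedule like the one above boils down to a small JSON payload. A sketch assuming the `cronExpression`, `timezone`, `isExclusive`, and `actions` fields of Apify's schedules endpoint; the helper and shortcut table are illustrative:

```python
# Common cron shortcuts expanded to five-field expressions (illustrative).
SHORTCUTS = {"@daily": "0 0 * * *", "@weekly": "0 0 * * 0", "@monthly": "0 0 1 * *"}

def build_schedule(cron_expr: str, actor_id: str,
                   timezone: str = "UTC", exclusive: bool = True) -> dict:
    """Payload for creating a schedule that runs one Actor (sketch)."""
    return {
        "cronExpression": SHORTCUTS.get(cron_expr, cron_expr),
        "timezone": timezone,
        "isExclusive": exclusive,  # skip a run if the previous one is still going
        "isEnabled": True,
        "actions": [{"type": "RUN_ACTOR", "actorId": actor_id}],
    }
```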

Monitor Runs, Builds, and Retrieve Logs

Agent checks run status, retrieves execution logs for debugging, and monitors resource usage. Aborts running jobs gracefully with state persistence, resurrects failed runs to continue from last checkpoint, and updates status messages visible in the Apify Console.
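The monitoring loop above is essentially a mapping from run status to next step. A sketch using Apify's run status values; the action names and the helper itself are hypothetical:

```python
def next_action(status: str) -> str:
    """Map an Apify run status to the agent's follow-up step (sketch)."""
    if status == "SUCCEEDED":
        return "fetch_results"
    if status in {"FAILED", "TIMED-OUT"}:
        return "resurrect"         # continue from the last checkpoint
    if status in {"ABORTED", "ABORTING"}:
        return "report_aborted"
    return "keep_polling"          # READY, RUNNING, TIMING-OUT
```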


Use Cases

Web data extraction workflows

Real scenarios where your AI agent extracts leads, monitors prices, and collects social media data during customer conversations.

Lead Generation from Company Websites

Customer provides a list of company URLs and asks for contact information. Your AI Agent runs a contact scraper Actor with those URLs as input, waits for completion, retrieves the dataset containing emails, phone numbers, and social profiles, then presents qualified leads organized by company. The agent handles pagination for large result sets and formats output for CRM import.

Competitive Price Monitoring

Customer wants daily competitor pricing updates. Your AI Agent creates a scheduled task using an e-commerce scraper Actor, configures it to run every morning at 8 AM in their timezone, and stores results in a named dataset. During conversations, the agent retrieves the latest prices, compares with historical data from previous runs, and alerts the customer to significant price changes.

Social Media Data Collection for Research

Customer needs Instagram hashtag data for market research. Your AI Agent runs the Instagram Hashtag Scraper Actor with specified hashtags, then retrieves posts, engagement metrics, and profile information from the resulting dataset. The agent filters by date range, sorts by engagement, and summarizes trends, all without the customer navigating any scraping interface.

Try Apify


FAQs

Frequently Asked Questions

What are Apify Actors and how does the AI agent use them?

Actors are serverless cloud programs that run on Apify's infrastructure. They perform web scraping, data extraction, and automation tasks. The agent can run any Actor by providing its ID (like 'apify/web-scraper') and input parameters. The agent retrieves results from the Actor's default dataset or key-value store output.

Can the agent run pre-built scrapers from the Apify Store?

Yes. The Apify Store has 10,000+ ready-made Actors for popular sites like Google Maps, Amazon, LinkedIn, Instagram, and Twitter. The agent can run any public Actor by specifying its username and name (e.g., 'apify/google-search-scraper') along with the required input fields. No coding is needed: just provide search terms, URLs, or other inputs.

How does the agent handle scraping jobs that take longer than 5 minutes?

For long-running jobs, the agent runs Actors asynchronously and polls for completion. It can set waitForFinish up to 60 seconds per request, then retrieve the run status and continue waiting. For very long jobs, the agent stores the run ID and checks back later. It can also set webhooks to receive notifications when runs complete.
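The long-polling described above hinges on the `waitForFinish` query parameter, which holds the request open for at most 60 seconds per call. A sketch with a hypothetical helper that clamps the value:

```python
def poll_url(run_id: str, wait_secs: int = 60) -> str:
    """Build a run-status URL; waitForFinish long-polls up to 60 s per call."""
    wait = max(0, min(wait_secs, 60))  # the API caps waitForFinish at 60
    return f"https://api.apify.com/v2/actor-runs/{run_id}?waitForFinish={wait}"
```

The agent calls this URL in a loop, stopping once the returned run status is terminal.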

Can the agent schedule recurring scrapes with cron expressions?

Yes. The agent creates schedules using standard cron syntax (like '0 8 * * *' for 8 AM daily) or shortcuts like @daily, @weekly, and @monthly. It configures the target Actor or task, input parameters, memory allocation, and timezone. Schedules can be set to exclusive mode to prevent overlapping runs if a previous execution is still running.

How does the agent retrieve and format scraped data from datasets?

The agent fetches dataset items with support for JSON, CSV, XML, Excel, and RSS formats. It can filter specific fields, skip empty items, apply sorting, paginate through large result sets (up to 250,000 items), and flatten nested JSON. For example, after running a product scraper, the agent can retrieve only price and title fields sorted by price.

Can the agent manage request queues for distributed crawling?

Yes. The agent creates named request queues, adds URLs individually or in batches of 25, and retrieves pending requests for processing. It supports request locking for distributed crawlers where multiple workers process the same queue, handles retry logic, and marks requests as handled after successful processing.

What happens if a scraping run fails or gets blocked?

The agent can resurrect failed or timed-out runs to continue from the last checkpoint, preserving already scraped data. It retrieves execution logs to diagnose blocking issues. For runs that need to be stopped, the agent can abort gracefully, giving the Actor 30 seconds to save state before force-stopping. This enables resuming later.

How does the integration authenticate with Apify?

The integration uses Apify API tokens passed in the Authorization header as Bearer tokens. You generate a token in the Apify Console under Integrations and add it to Tars. The token grants access to your Actors, datasets, key-value stores, and schedules. API rate limits are 250,000 requests per minute globally and 60 requests per second per resource.
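With rate limits like those above, a client that hits HTTP 429 should back off before retrying. This is a generic retry-delay sketch, not an Apify API; the base and cap values are illustrative:

```python
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff delay in seconds for retrying after HTTP 429.

    attempt 0 -> 0.5 s, attempt 1 -> 1 s, attempt 2 -> 2 s, ... capped at 30 s.
    """
    return min(cap, base * (2 ** attempt))
```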

How to add Tools to your AI Agent

Supercharge your AI Agent with Tool Integrations

Don't limit your AI Agent to basic conversations. Watch how to configure and add powerful tools that make your agent smarter and more capable.

Privacy & Security

We’ll never let you lose sleep over privacy and security concerns

At Tars, we take privacy and security very seriously. We are compliant with GDPR, ISO, SOC 2, and HIPAA.

GDPR
ISO
SOC 2
HIPAA

Still scrolling? We both know you're interested.

Let's chat about AI Agents the old-fashioned way. Get a demo tailored to your requirements.

Schedule a Demo