Apify

Web data extraction that talks back

Your AI agent integrates with Apify during customer conversations. Apify is the leading web scraping and automation platform with 10,000+ ready-made Actors for scraping websites like Google, Amazon, Instagram, and LinkedIn. Your agent can run scrapers on demand, retrieve extracted data, schedule recurring crawls, and manage large-scale data extraction projects through natural conversation.

Chosen by 800+ global brands across industries

Scraping power through natural dialogue

Run 10,000+ pre-built scrapers, manage datasets, schedule crawls, and monitor runs. Extract data from Google, Amazon, Instagram, and LinkedIn through conversation.

Run Web Scrapers and Automation Actors

Agent executes pre-built scrapers from the Apify Store or custom Actors with specific inputs. Runs Actors synchronously for immediate results or asynchronously for longer jobs. Configures memory allocation, timeout limits, and webhook notifications for each run.
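The run configuration described above can be sketched as a request to the Apify API v2. The helper below is hypothetical (not part of any Apify SDK), and the default memory and timeout values are illustrative only:

```python
# Sketch: composing an Actor run request for the Apify API v2.
# build_run_request and its defaults are illustrative, not an official API.

def build_run_request(actor_id: str, token: str, memory_mb: int = 1024,
                      timeout_secs: int = 300, sync: bool = False) -> dict:
    """Return URL, query params, and headers for starting an Actor run."""
    # In URL paths, the "/" in an Actor ID is written as "~"
    # (e.g. apify/web-scraper -> apify~web-scraper).
    path_id = actor_id.replace("/", "~")
    # A synchronous run can return dataset items directly; an async run
    # returns a run object whose status is polled later.
    endpoint = "run-sync-get-dataset-items" if sync else "runs"
    return {
        "url": f"https://api.apify.com/v2/acts/{path_id}/{endpoint}",
        "params": {"memory": memory_mb, "timeout": timeout_secs},
        "headers": {"Authorization": f"Bearer {token}"},
    }
```

Sending this with any HTTP client, with the Actor's JSON input as the request body, starts the run.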

Retrieve and Export Scraped Data

Agent fetches results from datasets in JSON, CSV, XML, Excel, or RSS formats. Applies pagination, field filtering, and sorting to large result sets. Flattens nested data, skips empty items, and transforms output for immediate use in conversations.
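A dataset export like the one above is driven by query parameters on the dataset-items endpoint. A minimal sketch, assuming the `format`, `fields`, `skipEmpty`, `desc`, and `clean` parameters of Apify API v2; the helper itself is hypothetical:

```python
from urllib.parse import urlencode

def dataset_items_url(dataset_id: str, fmt: str = "json", fields=None,
                      limit: int = 100, offset: int = 0,
                      skip_empty: bool = True, desc: bool = False) -> str:
    """Build a dataset-items export URL (illustrative helper)."""
    # clean=true strips Apify's internal fields from each item.
    params = {"format": fmt, "limit": limit, "offset": offset, "clean": "true"}
    if fields:  # restrict output to selected fields
        params["fields"] = ",".join(fields)
    if skip_empty:
        params["skipEmpty"] = "true"
    if desc:    # newest items first
        params["desc"] = "true"
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?" + urlencode(params)
```

Paginating a large result set is then a matter of advancing `offset` by `limit` until fewer than `limit` items come back.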

Store Files and Configuration in Key-Value Stores

Agent creates and manages key-value stores for screenshots, PDFs, configuration files, and scraped output. Retrieves specific records by key, lists available keys with prefix filtering, and checks record existence without downloading content.
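The existence check mentioned above can be done without transferring the record body. A sketch with hypothetical helper names, assuming the standard key-value-store record endpoint:

```python
# Sketch: addressing key-value store records (helper names are illustrative).

def record_url(store_id: str, key: str) -> str:
    """URL of a single record in an Apify key-value store."""
    return f"https://api.apify.com/v2/key-value-stores/{store_id}/records/{key}"

def record_exists_request(store_id: str, key: str) -> dict:
    # A HEAD request reports whether the record exists without downloading
    # its content (useful for large screenshots or PDFs).
    return {"method": "HEAD", "url": record_url(store_id, key)}
```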

Manage URL Queues for Distributed Crawling

Agent creates request queues, batch-adds up to 25 URLs at a time, and retrieves queue head for processing. Supports locking requests for exclusive access, setting retry counts, and marking URLs as handled after successful crawling.
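The 25-URL batch limit above means a long URL list has to be chunked before it is sent. A minimal sketch (the helper is hypothetical; using the URL as `uniqueKey` is a common deduplication default, not a requirement):

```python
def to_batches(urls: list[str], batch_size: int = 25) -> list[list[dict]]:
    """Split URLs into request payloads of at most `batch_size` items,
    matching the 25-request cap on batch additions."""
    batches = []
    for i in range(0, len(urls), batch_size):
        chunk = urls[i:i + batch_size]
        # uniqueKey deduplicates requests within the queue; the URL itself
        # is a simple default.
        batches.append([{"url": u, "uniqueKey": u} for u in chunk])
    return batches
```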

Schedule Recurring Scrapes with Cron Expressions

Agent creates schedules using cron syntax or shortcuts like @daily and @weekly. Configures timezone, exclusive execution to prevent overlapping runs, and actions that trigger specific Actors or tasks. Retrieves schedule logs to track execution history.
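A schedule like the one above boils down to a small JSON payload. A sketch assuming the `cronExpression`, `timezone`, `isExclusive`, and `actions` fields of Apify's schedules endpoint; the helper and shortcut table are illustrative:

```python
# Common cron shortcuts expanded to five-field expressions (illustrative).
SHORTCUTS = {"@daily": "0 0 * * *", "@weekly": "0 0 * * 0", "@monthly": "0 0 1 * *"}

def build_schedule(cron_expr: str, actor_id: str,
                   timezone: str = "UTC", exclusive: bool = True) -> dict:
    """Payload for creating a schedule that runs one Actor (sketch)."""
    return {
        "cronExpression": SHORTCUTS.get(cron_expr, cron_expr),
        "timezone": timezone,
        "isExclusive": exclusive,  # skip a run if the previous one is still going
        "isEnabled": True,
        "actions": [{"type": "RUN_ACTOR", "actorId": actor_id}],
    }
```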

Monitor Runs, Builds, and Retrieve Logs

Agent checks run status, retrieves execution logs for debugging, and monitors resource usage. Aborts running jobs gracefully with state persistence, resurrects failed runs to continue from last checkpoint, and updates status messages visible in the Apify Console.
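The monitoring loop above is essentially a mapping from run status to next step. A sketch using Apify's run status values; the action names and the helper itself are hypothetical:

```python
def next_action(status: str) -> str:
    """Map an Apify run status to the agent's follow-up step (sketch)."""
    if status == "SUCCEEDED":
        return "fetch_results"
    if status in {"FAILED", "TIMED-OUT"}:
        return "resurrect"         # continue from the last checkpoint
    if status in {"ABORTED", "ABORTING"}:
        return "report_aborted"
    return "keep_polling"          # READY, RUNNING, TIMING-OUT
```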


Use Cases

Web data extraction workflows

Real scenarios where your AI agent extracts leads, monitors prices, and collects social media data during customer conversations.

Lead Generation from Company Websites

Customer provides a list of company URLs and asks for contact information. Your AI Agent runs a contact scraper Actor with those URLs as input, waits for completion, retrieves the dataset containing emails, phone numbers, and social profiles, then presents qualified leads organized by company. The agent handles pagination for large result sets and formats output for CRM import.

Competitive Price Monitoring

Customer wants daily competitor pricing updates. Your AI Agent creates a scheduled task using an e-commerce scraper Actor, configures it to run every morning at 8 AM in their timezone, and stores results in a named dataset. During conversations, the agent retrieves the latest prices, compares with historical data from previous runs, and alerts the customer to significant price changes.

Social Media Data Collection for Research

Customer needs Instagram hashtag data for market research. Your AI Agent runs the Instagram Hashtag Scraper Actor with specified hashtags, then retrieves posts, engagement metrics, and profile information from the resulting dataset. The agent filters by date range, sorts by engagement, and summarizes trends, all without the customer navigating any scraping interface.

Try Apify


FAQs

Frequently Asked Questions

What are Apify Actors and how does the AI agent use them?

Actors are serverless cloud programs that run on Apify's infrastructure. They perform web scraping, data extraction, and automation tasks. The agent can run any Actor by providing its ID (like 'apify/web-scraper') and input parameters. The agent retrieves results from the Actor's default dataset or key-value store output.

Can the agent run pre-built scrapers from the Apify Store?

Yes. The Apify Store has 10,000+ ready-made Actors for popular sites like Google Maps, Amazon, LinkedIn, Instagram, and Twitter. The agent can run any public Actor by specifying its username and name (e.g., 'apify/google-search-scraper') along with the required input fields. No coding is needed: just provide search terms, URLs, or other inputs.

How does the agent handle scraping jobs that take longer than 5 minutes?

For long-running jobs, the agent runs Actors asynchronously and polls for completion. It can set waitForFinish up to 60 seconds per request, then retrieve the run status and continue waiting. For very long jobs, the agent stores the run ID and checks back later. It can also set webhooks to receive notifications when runs complete.
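The long-polling described above hinges on the `waitForFinish` query parameter, which holds the request open for at most 60 seconds per call. A sketch with a hypothetical helper that clamps the value:

```python
def poll_url(run_id: str, wait_secs: int = 60) -> str:
    """Build a run-status URL; waitForFinish long-polls up to 60 s per call."""
    wait = max(0, min(wait_secs, 60))  # the API caps waitForFinish at 60
    return f"https://api.apify.com/v2/actor-runs/{run_id}?waitForFinish={wait}"
```

The agent calls this URL in a loop, stopping once the returned run status is terminal.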

Can the agent schedule recurring scrapes with cron expressions?

Yes. The agent creates schedules using standard cron syntax (like '0 8 * * *' for 8 AM daily) or shortcuts like @daily, @weekly, and @monthly. It configures the target Actor or task, input parameters, memory allocation, and timezone. Schedules can be set to exclusive mode to prevent overlapping runs if a previous execution is still running.

How does the agent retrieve and format scraped data from datasets?

The agent fetches dataset items with support for JSON, CSV, XML, Excel, and RSS formats. It can filter specific fields, skip empty items, apply sorting, paginate through large result sets (up to 250,000 items), and flatten nested JSON. For example, after running a product scraper, the agent can retrieve only price and title fields sorted by price.

Can the agent manage request queues for distributed crawling?

Yes. The agent creates named request queues, adds URLs individually or in batches of 25, and retrieves pending requests for processing. It supports request locking for distributed crawlers where multiple workers process the same queue, handles retry logic, and marks requests as handled after successful processing.

What happens if a scraping run fails or gets blocked?

The agent can resurrect failed or timed-out runs to continue from the last checkpoint, preserving already scraped data. It retrieves execution logs to diagnose blocking issues. For runs that need to be stopped, the agent can abort gracefully, giving the Actor 30 seconds to save state before force-stopping. This enables resuming later.

How does the integration authenticate with Apify?

The integration uses Apify API tokens passed in the Authorization header as Bearer tokens. You generate a token in the Apify Console under Integrations and add it to Tars. The token grants access to your Actors, datasets, key-value stores, and schedules. API rate limits are 250,000 requests per minute globally and 60 requests per second per resource.
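With rate limits like those above, a client that hits HTTP 429 should back off before retrying. This is a generic retry-delay sketch, not an Apify API; the base and cap values are illustrative:

```python
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff delay in seconds for retrying after HTTP 429.

    attempt 0 -> 0.5 s, attempt 1 -> 1 s, attempt 2 -> 2 s, ... capped at 30 s.
    """
    return min(cap, base * (2 ** attempt))
```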

How to add Tools to your AI Agent

Supercharge your AI Agent with Tool Integrations

Don't limit your AI Agent to basic conversations. Watch how to configure and add powerful tools that make your agent smarter and more capable.

Privacy & Security

We’ll never let you lose sleep over privacy and security concerns

At Tars, we take privacy and security very seriously. We are compliant with GDPR, ISO, SOC 2, and HIPAA.

GDPR
ISO
SOC 2
HIPAA

Still scrolling? We both know you're interested.

Let's chat about AI Agents the old-fashioned way. Get a demo tailored to your requirements.

Schedule a Demo