AI & DATA

Every Reddit post,
as clean JSON

Subreddits, search, user streams, or specific threads — one input schema, four sources, no OAuth. RAG-ready markdown output for LLM pipelines.

4 sources

Subs + Search + Users + IDs

$0.003

Per Post

No OAuth

Public JSON Endpoints

RAG-ready

Markdown Output

USE CASES

What you can do with this data

🤖

RAG for AI Agents

Feed Reddit discussions into vector DBs. Agents answer questions citing real user experiences, not hallucinated ones.

📊

Brand & Trend Monitoring

Track mentions of your product, competitors, or category across subreddits. Detect sentiment shifts before they hit Twitter.

🔍

Market Research

Mine niche subreddits for pain points, feature requests, pricing complaints. Turn threads into a product roadmap.

📈

Content Inspiration

Which topics win upvotes in your niche? Extract the top posts each week and feed them to GPT for blog or newsletter outlines.

📧

Lead Discovery

Search for "looking for X" or "recommendations for Y" queries. Filter by subreddit and build warm-lead lists.

📝

Dataset Building

Train classifiers on labeled subreddits. Export posts + comments for fine-tuning LLMs or building domain-specific agents.

OUTPUT FIELDS

Fields extracted per post

Post ID + permalink

Subreddit name

Title + selftext

Author (or null if deleted)

Score + upvote ratio

Number of comments

Created timestamp (UTC)

External URL (link posts)

Link flair text

NSFW + spoiler flags

Media URL (video / image)

Gallery URLs (multi-image)

Full comment tree

Comment depth + parent_id

Markdown block (RAG mode)

Thumbnail URL
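The fields above can be pictured as one dataset record. This is a sketch only — the key names are assumptions based on the field list and may not match the Actor's output one-to-one:

```python
# One dataset record, sketched — key names are assumptions based on the
# field list above and may not match the Actor's output exactly.
post = {
    "id": "t3_1abcdef",
    "permalink": "https://www.reddit.com/r/LocalLLaMA/comments/1abcdef/",
    "subreddit": "LocalLLaMA",
    "title": "Best embedding model for Reddit data?",
    "selftext": "Looking for recommendations...",
    "author": None,                  # null when the account is deleted
    "score": 412,
    "upvote_ratio": 0.97,
    "num_comments": 87,
    "created_utc": 1717200000,       # UTC epoch seconds
    "url": "https://example.com/article",  # external URL for link posts
    "link_flair_text": "Discussion",
    "over_18": False,                # NSFW flag
    "spoiler": False,
    "media_url": None,               # video / image URL when present
    "gallery_urls": [],              # multi-image posts
    "thumbnail": "https://b.thumbs.redditmedia.com/abc.jpg",
    "comments": [],                  # flat list: body, score, depth, parent_id
    "markdown": None,                # filled in RAG mode
}
```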

HOW IT WORKS

Three steps to structured data

01

Pick a source

Subreddit names, search query, user handles, or specific post IDs — any combination works in one input.
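Under assumed field names (illustrative only — the Actor's documented input schema may differ), a run that combines all four source types could look like:

```python
import json

# Illustrative input only — these field names are assumptions, not the
# Actor's documented schema. The point: all four source types coexist
# in a single input object.
run_input = {
    "subreddits": ["LocalLLaMA", "MachineLearning"],  # subreddit listings
    "searchQueries": ["rag pipeline"],                # Reddit search
    "users": ["spez"],                                # user streams
    "postIds": ["1abcdef"],                           # specific threads
    "includeComments": True,
    "maxCommentDepth": 3,
    "outputFormat": "rag",                            # full | minimal | rag
}

print(json.dumps(run_input, indent=2))
```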

02

Scrape

The Actor fetches Reddit's public JSON through residential proxies: cookie warmup, automatic retries on 429/503, and an old.reddit.com fallback on 403.
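The retry/fallback loop in this step can be sketched roughly as follows (a simplified illustration, not the Actor's source; the transport is injected so any HTTP client fits):

```python
import time
from urllib.parse import urlsplit, urlunsplit

def fetch_reddit_json(url, get, retries=4, backoff=2.0):
    """Sketch of the retry/fallback behavior described above.
    `get(url)` must return a (status_code, body) tuple."""
    status = None
    for attempt in range(retries):
        status, body = get(url)
        if status == 200:
            return body
        if status in (429, 503):
            # Rate limited or unavailable: back off and retry.
            time.sleep(backoff * (attempt + 1))
            continue
        if status == 403 and "old.reddit.com" not in url:
            # Blocked on the main domain: retry against old.reddit.com.
            url = urlunsplit(urlsplit(url)._replace(netloc="old.reddit.com"))
            continue
        break
    raise RuntimeError(f"gave up on {url} (last status {status})")
```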

03

Consume

Choose full, minimal, or RAG-markdown output. Stream to a vector DB, export CSV for analysis, or webhook results to your agent.
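As one illustrative consume step, a few fields can be flattened to CSV with the standard library (the records here are hypothetical stand-ins for dataset items):

```python
import csv
import io

# Hypothetical records standing in for items fetched from the run's dataset.
posts = [
    {"id": "t3_1abc", "subreddit": "LocalLLaMA", "title": "RAG tips", "score": 412},
    {"id": "t3_1abd", "subreddit": "MachineLearning", "title": "Eval harness", "score": 98},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "subreddit", "title", "score"])
writer.writeheader()
writer.writerows(posts)
print(buf.getvalue())
```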

COMPARISON

Why this actor vs alternatives

| Feature | This Actor | Reddit PRAW (DIY) | Other Reddit actors |
| --- | --- | --- | --- |
| Auth required | None | OAuth app + token | Varies |
| Price per post | $0.003 | Free, but self-hosted | $0.003–$0.010 |
| Sources in one input | Subs + search + users + IDs | Custom code per source | Usually one per actor |
| Comment trees | Optional, depth-limited | Manual pagination | Varies |
| RAG markdown output | Built-in | Write your own | Rare |
| Proxy rotation | Residential by default | Manual | Usually datacenter |

FAQ

Frequently asked questions

Does it need a Reddit API key or OAuth?

No. The actor hits Reddit’s public .json endpoints, which Reddit itself serves for every page. No OAuth app, no token rotation, no quota tier.

How does it avoid rate limits?

It runs through Apify residential proxies by default with automatic retry/backoff, plus an old.reddit.com fallback when the main domain returns 403. During validation, 40 rapid requests returned clean JSON.

Can I get the comment tree?

Yes — toggle Include comments and set max depth. Each post’s comments are returned as a flat list with parent_id, depth, score, and body for reconstruction or vector indexing.
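As a sketch, that flat list can be folded back into a tree using only `id` and `parent_id` (assuming each comment carries an `id`, which is not listed above; the post's own fullname marks top-level comments):

```python
def build_tree(comments):
    """Reconstruct nesting from a flat comment list. Assumes each comment
    has `id` and `parent_id`; a parent_id pointing at the post itself
    (not another comment) marks a top-level comment."""
    by_id = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for node in by_id.values():
        parent = by_id.get(node["parent_id"])
        (parent["replies"] if parent else roots).append(node)
    return roots
```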

What’s the output format for LLMs?

Three formats: full (every Reddit field), minimal (key fields only), and rag (single markdown block per post + top comments, ready to drop into a vector DB).
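For illustration, a rag-style block could be assembled like this (the layout is an assumption — the Actor's actual `rag` format may differ):

```python
def to_rag_markdown(post, top_comments, max_comments=5):
    """Assemble one markdown block per post plus top comments —
    an illustrative layout, not the Actor's exact rag format."""
    lines = [
        f"# {post['title']}",
        f"r/{post['subreddit']} | score {post['score']} | {post['num_comments']} comments",
        "",
        post.get("selftext", ""),
        "",
        "## Top comments",
    ]
    for c in top_comments[:max_comments]:
        lines.append(f"- ({c['score']}) {c['body']}")
    return "\n".join(lines)
```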

Can I combine sources in one run?

Yes. One input schema supports subreddit listings, search queries, user streams, and specific post IDs simultaneously — one actor instead of four.

START NOW

Scrape Reddit at scale

One run = thousands of posts with comments. Feed your RAG, train your classifier, or monitor your brand.

Start Scraping Reddit