AI & DATA
Every Reddit post,
as clean JSON
Subreddits, search, user streams, or specific threads — one input schema, four sources, no OAuth. RAG-ready markdown output for LLM pipelines.
4 sources
Subs + Search + Users + IDs
$0.003
Per Post
No OAuth
Public JSON Endpoints
RAG-ready
Markdown Output
USE CASES
What you can do with this data
RAG for AI Agents
Feed Reddit discussions into vector DBs. Agents answer questions citing real user experiences, not hallucinated ones.
Brand & Trend Monitoring
Track mentions of your product, competitors, or category across subreddits. Detect sentiment shifts before they hit Twitter.
Market Research
Mine niche subreddits for pain points, feature requests, pricing complaints. Turn threads into a product roadmap.
Content Inspiration
What topics win upvotes in your niche? Extract top posts per week, feed to GPT for blog/newsletter outlines.
Lead Discovery
Search for "looking for X" or "recommendations for Y" queries. Filter by subreddit and build warm-lead lists.
Dataset Building
Train classifiers on labeled subreddits. Export posts + comments for fine-tuning LLMs or building domain-specific agents.
OUTPUT FIELDS
Fields extracted per post
Post ID + permalink
Subreddit name
Title + selftext
Author (or null if deleted)
Score + upvote ratio
Number of comments
Created timestamp (UTC)
External URL (link posts)
Link flair text
NSFW + spoiler flags
Media URL (video / image)
Gallery URLs (multi-image)
Full comment tree
Comment depth + parent_id
Markdown block (RAG mode)
Thumbnail URL
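Put together, the fields above make up one dataset record per post. A hypothetical example in Python (the exact key names here are illustrative, not guaranteed by the Actor's output):

```python
# Hypothetical shape of one output record; key names are
# illustrative and may differ from the Actor's actual output.
example_post = {
    "id": "t3_1abc2de",
    "permalink": "https://www.reddit.com/r/python/comments/1abc2de/example/",
    "subreddit": "python",
    "title": "Example post title",
    "selftext": "Body text of the post.",
    "author": None,               # null when the account is deleted
    "score": 1523,
    "upvoteRatio": 0.97,
    "numComments": 241,
    "createdUtc": "2024-05-01T12:34:56Z",
    "url": "https://example.com/article",  # external URL for link posts
    "linkFlairText": "Discussion",
    "over18": False,              # NSFW flag
    "spoiler": False,
    "mediaUrl": None,             # video / image URL when present
    "galleryUrls": [],            # multi-image posts
    "thumbnailUrl": None,
}
```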
HOW IT WORKS
Three steps to structured data
Pick a source
Subreddit names, search query, user handles, or specific post IDs — any combination works in one input.
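A combined input might look like this. The field names below are a hypothetical sketch; check the Actor's actual input schema for the real ones:

```python
# Hypothetical run input mixing all four sources in a single run.
# Field names are illustrative; consult the Actor's input schema.
run_input = {
    "subreddits": ["MachineLearning", "LocalLLaMA"],
    "searchQueries": ["best embedding model"],
    "users": ["spez"],
    "postIds": ["t3_1abc2de"],
    "includeComments": True,
    "maxCommentDepth": 3,
    "outputFormat": "rag",   # or "full" / "minimal"
}
```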
Scrape
The Actor fetches Reddit’s public JSON via residential proxy, warming up cookies, retrying on 429/503, and falling back to old.reddit.com on 403.
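The retry-and-fallback loop can be sketched in plain Python. This is a minimal stand-in, not the Actor's implementation: it omits the residential proxy and cookie warm-up, and the user-agent string is made up:

```python
import json
import time
import urllib.error
import urllib.request

USER_AGENT = "reddit-scraper-sketch/0.1"  # placeholder UA string

def fallback_url(url: str) -> str:
    """Swap the main domain for old.reddit.com, the 403 fallback."""
    return url.replace("www.reddit.com", "old.reddit.com")

def fetch_json(url: str, max_retries: int = 4) -> dict:
    """Fetch a public Reddit .json endpoint with retry/backoff.

    Sketch only: the Actor additionally routes requests through
    residential proxies and warms up cookies first.
    """
    for attempt in range(max_retries):
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code in (429, 503):
                time.sleep(2 ** attempt)   # exponential backoff
            elif err.code == 403:
                url = fallback_url(url)    # retry on old.reddit.com
            else:
                raise
    raise RuntimeError(f"gave up after {max_retries} attempts: {url}")
```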
Consume
Full / minimal / RAG-markdown output. Stream to vector DB, export CSV for analysis, or webhook to your agent.
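The RAG-markdown format collapses a post and its top comments into one chunk. A rough sketch of what such a renderer does (the Actor's actual template may differ):

```python
def to_rag_markdown(post: dict, comments: list) -> str:
    """Render one post plus its top comments as a single markdown
    block, roughly what a 'rag' output mode produces. Field names
    ('title', 'subreddit', 'score', ...) are assumed, not taken
    from the Actor's documented schema."""
    lines = [
        f"# {post['title']}",
        f"r/{post['subreddit']} · score {post['score']} · "
        f"{post['numComments']} comments",
        "",
        post.get("selftext", ""),
        "",
        "## Top comments",
    ]
    for c in comments:
        indent = "  " * c.get("depth", 0)   # nest replies by depth
        lines.append(f"{indent}- ({c['score']}) {c['body']}")
    return "\n".join(lines)
```

Each returned string is one self-contained chunk, ready to embed and upsert into a vector store.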
COMPARISON
Why this actor vs alternatives
| Feature | This Actor | Reddit PRAW (DIY) | Other Reddit actors |
|---|---|---|---|
| Auth required | None | OAuth app + token | Varies |
| Price per post | $0.003 | Free but self-host | $0.003–$0.010 |
| Sources in one input | subs + search + users + IDs | Custom code per source | Usually one per actor |
| Comment trees | Optional, depth-limited | Manual pagination | Varies |
| RAG markdown output | Built-in | Write your own | Rare |
| Proxy rotation | Residential by default | Manual | Usually datacenter |
FAQ
Frequently asked questions
Does it need a Reddit API key or OAuth?
No. The actor hits Reddit’s public .json endpoints, which Reddit itself serves for every page. No OAuth app, no token rotation, no quota tier.
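For reference, appending `.json` to almost any Reddit URL returns the machine-readable version of that page. A few example endpoint patterns (the IDs and queries here are made up):

```python
# Public .json endpoints for each source type; no auth header needed,
# just a user agent. IDs and query strings below are placeholders.
subreddit_top = "https://www.reddit.com/r/python/top.json?t=week&limit=100"
search = "https://www.reddit.com/search.json?q=looking%20for%20a%20crm"
user_posts = "https://www.reddit.com/user/spez/submitted.json"
thread = "https://www.reddit.com/comments/1abc2de.json"
```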
How does it avoid rate limits?
Runs through Apify residential proxy by default with automatic retry/backoff, plus an old.reddit.com fallback when the main domain returns a 403. During validation, 40 rapid requests returned clean JSON.
Can I get the comment tree?
Yes — toggle Include comments and set max depth. Each post’s comments are returned as a flat list with parent_id, depth, score, and body for reconstruction or vector indexing.
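Reconstructing the nested tree from that flat list is a few lines of Python. A sketch, assuming each comment carries an `id` and a `parent_id`, where a `parent_id` that matches no comment in the list marks a top-level comment:

```python
def build_tree(comments: list) -> list:
    """Rebuild the nested comment tree from a flat comment list.

    Assumes each dict has 'id' and 'parent_id'; a parent_id not
    present among the comments (e.g. the post itself) marks a root.
    """
    # Copy each comment and give it an empty replies list.
    by_id = {c["id"]: {**c, "replies": []} for c in comments}
    roots = []
    for c in by_id.values():
        parent = by_id.get(c["parent_id"])
        if parent is not None:
            parent["replies"].append(c)   # attach under its parent
        else:
            roots.append(c)               # top-level comment
    return roots
```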
What’s the output format for LLMs?
Three formats: full (every Reddit field), minimal (key fields only), and rag (single markdown block per post + top comments, ready to drop into a vector DB).
Can I combine sources in one run?
Yes. One input schema supports subreddit listings, search queries, user streams, and specific post IDs simultaneously — one actor instead of four.
START NOW
Scrape Reddit at scale
One run = thousands of posts with comments. Feed your RAG, train your classifier, or monitor your brand.
Start Scraping Reddit →