Reddit Scraper: Extract Posts and Comments Without API Approval

Direct Answer: What Does the Reddit Scraper Extract?

The Reddit Scraper on Apify extracts public Reddit posts, comment trees, subreddit listings, search results, and user submissions without requiring Reddit API approval, OAuth credentials, or PRAW scripts. You choose subreddits, a Reddit search query, user handles, or specific post IDs, set maxItems, and get structured JSON or RAG-ready Markdown in minutes. The live actor is Reddit Posts Scraper, priced as pay-per-event on Apify.

Reddit made API access impractical for many businesses in 2023 when it moved from a free API to paid access priced at $0.24 per 1,000 requests, a change that led several popular third-party apps to shut down. Commercial access also requires an approval process with no guarantee of acceptance. Scraping public pages avoids the approval step.

If you need Reddit monitoring as a repeatable business workflow, use the web scraping automation setup to connect scheduled Apify runs to Sheets, alerts, or a research database.

Current production signal as of May 15, 2026: the actor had 52 recent runs, a 100% success rate in the Apify revenue pulse, 447 charged results, 3 paying users, Store rank 887, and a displayed Store price from $0.003 per delivered post/result.

What Data Fields You Get

Every result from the Reddit Posts Scraper includes a standardized set of fields that cover everything you need for analysis, monitoring, or content research:

| Field | Description |
| --- | --- |
| `title` | Full post title |
| `selftext` / `selftext_markdown` | Post text content where Reddit exposes it |
| `score` | Score at time of scrape |
| `upvote_ratio` | Ratio of upvotes to total votes |
| `num_comments` | Total number of comments |
| `author` | Reddit username of the poster |
| `subreddit` | Subreddit the post belongs to |
| `created_utc` | UTC datetime of original post |
| `permalink` | Direct Reddit permalink |
| `url` | External URL for link posts |
| `link_flair_text` | Post flair label assigned by author or moderator |
| `engagementScore` | 0-100 score based on score, comments, and upvote ratio |
| `commentDensity` | Comment-to-score ratio for discussion-heavy filtering |
| `contentType` | text, link, media, gallery, or video |
| `sourceMode` / `sourceValue` | Which input source produced the row |
| `comments` | Optional: comment tree with author, body, score, depth, and parent ID |
| `markdown` | Optional: RAG-ready Markdown block when `outputFormat` is `rag` |

The comment data is particularly useful for sentiment analysis. You are not just getting the top-level post opinion. You are getting the community’s full reaction, including dissenting views, edge case reports, and specific pain points.
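As a rough illustration of working with the optional `comments` field, the sketch below groups comments by depth and pulls out low-scoring replies as a crude proxy for dissent. The sample row is made up, and the key names inside each comment (`body`, `score`, `depth`) and the flat-list shape are assumptions based on the field description, so inspect a real row before relying on them.

```python
from collections import defaultdict

# Hypothetical row shaped like the field table. The comment key names and the
# flat-list layout are assumptions, not confirmed output of the actor.
row = {
    "title": "Which CRM did you abandon?",
    "score": 120,
    "num_comments": 3,
    "comments": [
        {"author": "a", "body": "Switched after the price hike.", "score": 40, "depth": 0},
        {"author": "b", "body": "Same here, support was slow.", "score": 12, "depth": 1},
        {"author": "c", "body": "Disagree, it works fine for us.", "score": -3, "depth": 1},
    ],
}


def comments_by_depth(post):
    """Group comment bodies by nesting depth (0 = top-level reply)."""
    grouped = defaultdict(list)
    for comment in post.get("comments") or []:
        grouped[comment.get("depth", 0)].append(comment["body"])
    return dict(grouped)


def dissenting(post, max_score=0):
    """Return comments scored at or below max_score, a rough dissent signal."""
    return [c for c in post.get("comments") or [] if c.get("score", 0) <= max_score]


print(comments_by_depth(row))
print([c["body"] for c in dissenting(row)])
```

If the real output nests replies inside their parents instead of using a flat list, the same idea applies with a recursive walk.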

Use Cases by Role

Marketers: Brand and Competitor Monitoring

Reddit is where real opinions live. Unlike social media platforms optimized for positivity and engagement, Reddit rewards honest feedback. Users call out bad products by name, share screenshots of poor customer support, and warn communities about misleading pricing.

With the Reddit Scraper, marketers can:

Monitor brand mentions across relevant subreddits without setting up manual alerts
Track how competitors are being discussed in communities where their buyers spend time
Identify recurring complaints about competitor products and turn them into positioning angles
Watch for emerging narratives before they become PR problems

A practical workflow: scrape your brand name and three competitor names weekly from subreddits where your target buyers are active. Export to a spreadsheet, sort by upvote count, and review the top 20 posts. You will find positioning opportunities that no keyword tool surfaces.
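If you would rather do the sort-and-review step in code than in a spreadsheet, a minimal sketch could look like the following. It assumes rows are dicts with the `title`, `selftext`, and `score` fields from the table above; the brand names are placeholders.

```python
def mentions(row, term):
    """True if the term appears in the post title or body (case-insensitive)."""
    text = f"{row.get('title') or ''} {row.get('selftext') or ''}".lower()
    return term.lower() in text


def top_mentions(rows, terms, n=20):
    """Keep rows mentioning any term, then return the n highest-scoring ones."""
    hits = [r for r in rows if any(mentions(r, t) for t in terms)]
    return sorted(hits, key=lambda r: r.get("score") or 0, reverse=True)[:n]


# Placeholder data: "Acme" and "Globex" stand in for your brand and a competitor.
sample = [
    {"title": "Acme pricing is confusing", "selftext": "", "score": 85},
    {"title": "Weekly thread", "selftext": "Anyone tried Globex?", "score": 240},
    {"title": "Unrelated post", "selftext": "Nothing here", "score": 900},
]

for post in top_mentions(sample, ["Acme", "Globex"]):
    print(post["score"], post["title"])
```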

Founders: Product Research and Validation

Reddit is the fastest way to validate a product assumption without a survey. Real users describe real problems in real language. The vocabulary they use is the vocabulary your landing page should use.

Run a scrape on r/startups, r/entrepreneur, r/SaaS, or any vertical-specific subreddit. Filter by posts asking for tool recommendations, describing workflow frustrations, or mentioning the problem you are solving. You get qualitative research at scale, collected in minutes, without recruiting participants or writing a research instrument.

For example: if you are building a project management tool, scrape r/productivity and r/projectmanagement for the phrase “I switched from” or “I hate that [tool name]”. The complaints you find are your feature roadmap.
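The phrase filter described here can also run after the scrape, on the exported rows. The sketch below is one way to do that with a regular expression; the two phrases come from the example above, and the row shape (`title`, `selftext`) is assumed from the field table.

```python
import re

# Phrases from the example above; extend the alternation for your own product.
COMPLAINT_PATTERN = re.compile(r"\bI switched from\b|\bI hate that\b", re.IGNORECASE)


def complaint_posts(rows):
    """Return rows whose title or body contains one of the complaint phrases."""
    matched = []
    for row in rows:
        text = f"{row.get('title') or ''}\n{row.get('selftext') or ''}"
        if COMPLAINT_PATTERN.search(text):
            matched.append(row)
    return matched


# Made-up rows for illustration only.
sample = [
    {"title": "Why I switched from spreadsheets", "selftext": "Too much manual work."},
    {"title": "Tool roundup", "selftext": "i hate that exports are paywalled."},
    {"title": "Hiring thread", "selftext": "Looking for a designer."},
]

for post in complaint_posts(sample):
    print(post["title"])
```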

SEOs: Content Gap Discovery

Reddit surfaces demand for content that Google Search Console does not show you. Questions with thousands of upvotes and no good answers represent gaps in the information ecosystem. Those gaps are ranking opportunities.

Search Reddit for your core topic, sort results by upvote count, and look for questions with high engagement but no authoritative answer in the top comments. Write the definitive answer as a blog post. Reddit’s organic links and the genuine search demand behind those questions will help it rank.

This approach pairs naturally with the Apify web scraping platform because you can automate the entire research pipeline: scheduled scrapes feed into a spreadsheet, which your content team reviews weekly. The same data also supports competitive analysis, market research, demand generation, and go-to-market planning, and the questions you collect can feed directly into your editorial calendar.

Researchers: Trend Analysis and Sentiment Tracking

Academic and market researchers use Reddit data to understand how public sentiment shifts over time on specific topics. Unlike surveys, Reddit data is longitudinal, unprimed, and produced without the researcher’s presence affecting responses.

Scrape the same set of subreddits monthly and track changes in:

Volume of posts on a topic
Average upvote scores (indicating community agreement)
Sentiment in comment threads
Emergence of new terminology or framing

This kind of trend analysis is useful for investor research, product category forecasting, and competitive intelligence.
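To make the monthly tracking concrete, here is a small sketch that buckets rows by month and reports post volume and average score. It assumes `created_utc` arrives either as epoch seconds or as an ISO 8601 string; the article describes it only as a UTC datetime, so check which form your export uses.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def month_key(created_utc):
    """Convert epoch seconds or an ISO 8601 string to a 'YYYY-MM' bucket."""
    if isinstance(created_utc, (int, float)):
        moment = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(str(created_utc).replace("Z", "+00:00"))
    return moment.strftime("%Y-%m")


def monthly_summary(rows):
    """Return {month: {'posts': count, 'avg_score': mean score}} sorted by month."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[month_key(row["created_utc"])].append(row.get("score") or 0)
    return {
        month: {"posts": len(scores), "avg_score": mean(scores)}
        for month, scores in sorted(buckets.items())
    }


# Made-up rows covering both timestamp forms.
sample = [
    {"created_utc": 1704067200, "score": 10},
    {"created_utc": "2024-01-15T12:00:00Z", "score": 30},
    {"created_utc": "2024-02-03T08:30:00+00:00", "score": 5},
]

print(monthly_summary(sample))
```

Sentiment and terminology tracking would need a text-analysis step on top of this; volume and average score are the two measures that fall straight out of the raw fields.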

How to Configure the Reddit Scraper

Setup takes under five minutes with no code required:

Step 1: Open the actor

Go to https://apify.com/tugelbay/reddit-posts-scraper and click “Try for free”. You will need a free Apify account.

Step 2: Set your input

The actor accepts four source modes:

postIds: fetch specific Reddit threads by ID, with optional comments
users: fetch submitted posts from public user profiles
search: search Reddit-wide or inside a subreddit with native Reddit operators such as subreddit:SaaS "looking for"
subreddits: fetch listings such as hot, new, top, rising, or controversial

Additional configuration options:

maxItems: set how many results you want (affects cost directly)
includeComments: toggle whether to fetch full comment threads
sort: choose between hot, new, top, rising, and controversial
timeframe: filter top or controversial posts by hour, day, week, month, year, or all
outputFormat: choose full, minimal, or rag Markdown output
skipNsfw: filter posts marked over_18
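Before spending credits on a run, it can help to sanity-check an input against the options listed above. The sketch below does that using only the option names and allowed values named in this section; the exact value types (for example, `subreddits` as a list of strings) are assumptions, and the actor's own input schema remains the authority.

```python
ALLOWED_SORT = {"hot", "new", "top", "rising", "controversial"}
ALLOWED_TIMEFRAME = {"hour", "day", "week", "month", "year", "all"}
ALLOWED_OUTPUT = {"full", "minimal", "rag"}
SOURCE_KEYS = ("subreddits", "search", "users", "postIds")
TIMEFRAME_SORTS = {"top", "controversial"}


def validate_input(cfg):
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = []
    if not any(cfg.get(key) for key in SOURCE_KEYS):
        problems.append("no source set; expected one of " + ", ".join(SOURCE_KEYS))
    if "sort" in cfg and cfg["sort"] not in ALLOWED_SORT:
        problems.append(f"unknown sort: {cfg['sort']!r}")
    if "timeframe" in cfg:
        if cfg["timeframe"] not in ALLOWED_TIMEFRAME:
            problems.append(f"unknown timeframe: {cfg['timeframe']!r}")
        elif cfg.get("sort") not in TIMEFRAME_SORTS:
            problems.append("timeframe only applies to top or controversial sorts")
    if "outputFormat" in cfg and cfg["outputFormat"] not in ALLOWED_OUTPUT:
        problems.append(f"unknown outputFormat: {cfg['outputFormat']!r}")
    max_items = cfg.get("maxItems")
    if max_items is not None and (not isinstance(max_items, int) or max_items < 1):
        problems.append("maxItems must be a positive integer")
    return problems


# Assumed shape: subreddits given as a list of names without the r/ prefix.
print(validate_input({"subreddits": ["SaaS"], "sort": "top", "timeframe": "week", "maxItems": 50}))
print(validate_input({"sort": "best"}))
```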

Step 3: Run and export

Click “Start”. The actor runs on Apify’s cloud infrastructure using residential proxies. When complete, download results as JSON, CSV, or Excel. You can also connect the output directly to Google Sheets via Apify’s native integration or Zapier.
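If you prefer to start runs from a script instead of the console, the Apify Python client (`apify-client` on PyPI) can do the same thing. The sketch below is a minimal example under a few assumptions: the actor ID is read off the Store URL in Step 1, the token lives in an `APIFY_TOKEN` environment variable, and the input keys are the option names from Step 2. It only calls Apify when a token is present.

```python
import os

ACTOR_ID = "tugelbay/reddit-posts-scraper"  # assumed from the Store URL in Step 1


def build_run_input(query, max_items=50):
    """Build a search-mode input using the option names listed in Step 2."""
    return {
        "search": query,
        "sort": "top",
        "timeframe": "month",
        "maxItems": max_items,
        "includeComments": False,
        "outputFormat": "full",
    }


def run_and_fetch(run_input, token):
    """Start the actor, wait for it to finish, and return the dataset rows."""
    from apify_client import ApifyClient  # requires: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    run_input = build_run_input('subreddit:SaaS "looking for"', max_items=25)
    if token:
        rows = run_and_fetch(run_input, token)
        print(f"fetched {len(rows)} rows")
    else:
        print("APIFY_TOKEN not set; input would be:", run_input)
```

Keeping `maxItems` small on the first run is a cheap way to confirm the field names before scaling up.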

Step 4: Schedule for ongoing monitoring

In the actor settings, set a schedule (daily, weekly, monthly). Apify will run the scrape automatically and store results in the dataset. This is the foundation of any automated monitoring workflow.

Practical Example: Scraping r/startups for Product Feedback

Suppose you are building a tool for early-stage founders and want to understand what they complain about most in their current stack.

Input configuration:

{
 "search": "subreddit:startups \"product feedback\" OR \"customer feedback\"",
 "sort": "top",
 "timeframe": "month",
 "maxItems": 200,
 "includeComments": true,
 "maxComments": 20,
 "maxCommentDepth": 2,
 "outputFormat": "rag",
 "skipNsfw": true
}

What you get:

Up to 200 of the month’s top posts from r/startups that match “product feedback” or “customer feedback”, each with up to 20 comments nested at most two levels deep. In the results, you can expect to find:

Founders describing which tools they abandoned and why
Specific feature requests that recur across multiple posts
Price sensitivity signals (“too expensive for early stage”, “worth it after Series A”)
Comparisons between competitors written by actual users, not review sites

The total Store-side result cost for this run is based on delivered dataset items at the live Apify pay-per-event price. Comment extraction can increase runtime and proxy usage, so check the live pricing box before running large scheduled jobs.
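For a back-of-the-envelope budget, the arithmetic is simply delivered items times the per-result price. The sketch below uses the $0.003 figure quoted earlier in this article as an assumption; it is a displayed "from" price, so the live price may be higher and any comment or proxy-related charges are not modeled.

```python
from decimal import Decimal

# Assumption: the "from $0.003 per delivered post/result" price quoted in this
# article. Replace it with the live Store price before budgeting.
PRICE_PER_RESULT = Decimal("0.003")


def estimate_cost(items, price=PRICE_PER_RESULT):
    """Estimated result cost in USD for one run delivering `items` rows."""
    return price * items


def monthly_cost(items_per_run, runs_per_month, price=PRICE_PER_RESULT):
    """Estimated result cost in USD for a schedule of identical runs."""
    return estimate_cost(items_per_run, price) * runs_per_month


print("one 200-item run:", estimate_cost(200))
print("weekly for a month:", monthly_cost(200, 4))
```

At the assumed price, the 200-item run above works out to $0.60, and four weekly runs to $2.40.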

Pricing vs the Reddit API

The contrast between scraping and the official Reddit API is significant:

| Criterion | Reddit Official API | Apify Reddit Scraper |
| --- | --- | --- |
| Approval required | Yes, with no guarantee | No |
| Setup time | Days to weeks | Under 5 minutes |
| Cost structure | Per-request, tiered pricing | Pay per delivered result on Apify |
| Rate limits | Strict, varies by tier | Managed by actor |
| Comment access | Full via API | Full via scraper |
| Historical data | Limited | Sortable by time period |
| Code required | Yes (OAuth, pagination) | No |

For most business use cases, pay-per-result pricing is easier to control than maintaining your own API client, proxy pool, retries, and dataset pipeline. A typical brand monitoring job pulling a few hundred posts per week costs only when rows are delivered, and gives you raw data you can process however you need.

The Reddit Posts Scraper is priced on Apify’s Pay Per Event model. Always verify the current live price on the Apify Store before large runs because pricing configuration can change independently of this article.

Limitations to Know Before You Start

Reddit’s layout changes periodically. The actor is maintained to handle these changes, but immediately after a Reddit redesign there may be a brief window where some fields return null values. Check the actor’s changelog before running critical jobs.

Comment depth is configurable but has limits. Very large threads (500+ comments) take longer to process and may cost more. For most use cases, limiting maxCommentDepth to two or three levels and capping maxComments is sufficient.

Deleted posts and shadowbanned users are not retrievable. If a post was removed by moderators or the user account was banned, the content is gone from Reddit’s public interface and the scraper cannot access it.

Subreddits with 18+ restrictions require appropriate account configuration. NSFW subreddits are accessible but may require additional setup depending on actor version.

This is not a real-time stream. The scraper pulls snapshots. If you need near-real-time monitoring, schedule runs at shorter intervals (for example, hourly) and accept higher costs.

How It Avoids Blocks

Reddit actively rate-limits bots and scrapers that use data center IP addresses. The Reddit Posts Scraper uses Apify residential proxy by default, warms up Reddit cookies, retries temporary 429 / 503 responses, falls back to old.reddit.com, and can use a Firecrawl rescue path when both Reddit endpoints fail.

This is the same approach used by professional data providers charging thousands per month for Reddit data. On Apify, the proxy infrastructure is included in the per-result pricing, so you are not paying separately for proxies.

The actor also handles automatic retries on failed requests and writes dataset rows incrementally. You do not need to think about OAuth approval, token rotation, pagination, or rebuilding comment trees. Set your inputs, run, and collect results.
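The actor's internals are not published in this article, but the general pattern it describes (retry temporary 429 / 503 responses with backoff, then fall back to old.reddit.com) can be sketched as follows. This is an illustration of the technique, not the actor's code: the function names, delay values, and the injected `fetch` callable are all hypothetical.

```python
import random
import time

RETRYABLE = {429, 503}
HOSTS = ("https://www.reddit.com", "https://old.reddit.com")


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), backing off on retryable statuses."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status not in RETRYABLE:
            break
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))
    return status, body


def fetch_with_fallback(fetch, path, **kwargs):
    """Try each host in order and stop at the first 200 response."""
    status, body = None, None
    for host in HOSTS:
        status, body = fetch_with_retry(fetch, host + path, **kwargs)
        if status == 200:
            break
    return status, body


# Demo with a fake fetcher that is rate-limited twice before succeeding.
attempts = []


def fake_fetch(url):
    attempts.append(url)
    return (429, "") if len(attempts) < 3 else (200, "ok")


print(fetch_with_fallback(fake_fetch, "/r/startups/top.json", sleep=lambda s: None))
```

Passing `fetch` and `sleep` in as arguments keeps the sketch testable without touching the network or actually waiting.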

Getting Started

The Reddit Posts Scraper is available at https://apify.com/tugelbay/reddit-posts-scraper.

Free Apify accounts include $5 in monthly credits, which covers several hundred results for initial testing. Paid plans start at $49/month and include significantly more compute and storage.

If you are new to Apify and want to understand the broader platform before running your first scrape, the overview at Apify: The Web Scraping Platform Marketers Actually Need covers how actors work, what other data sources are available, and how to integrate Apify output with your existing marketing stack.

Reddit data is among the most valuable and most underused research assets available to marketers and product teams. The API waitlist is not the obstacle it used to be.

FAQ

Is scraping Reddit legal or against terms of service?

Reddit’s terms prohibit automated scraping. This tool is provided for educational and research purposes. Users are responsible for compliance with applicable laws and Reddit’s policies. For production applications, the official Reddit API provides compliant access. That said, many researchers and marketers operate scrapers at small scale for competitive intelligence and market research. Be respectful of rate limits and do not overload Reddit’s infrastructure.

How accurate is the sentiment in comment threads?

Raw comment text is accurate, and the actor can add a lightweight positive / negative / neutral label with its built-in lexicon scorer when includeSentiment is enabled. For high-stakes analysis, treat that label as a first-pass filter and run your own NLP or LLM classifier downstream. Upvote counts still serve as a proxy for community agreement: high upvotes indicate community consensus, negative or low scores indicate dissent or poor reception.
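To show what a lexicon scorer of this kind does, and why it is only a first-pass filter, here is a deliberately tiny sketch. The word lists are invented for illustration and are not the actor's lexicon.

```python
import re

# Toy word lists for illustration only; a real lexicon is far larger.
POSITIVE = {"love", "great", "worth", "recommend", "fast"}
NEGATIVE = {"hate", "expensive", "slow", "broken", "abandoned"}


def label(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in [
    "Great tool, worth it after Series A",
    "Too expensive for early stage and slow support",
    "We use it daily",
    "Not great",  # negation is invisible to a plain word count
]:
    print(label(comment), "|", comment)
```

The last example comes out as positive because word counting ignores negation, which is exactly the kind of miss a downstream NLP or LLM classifier is meant to catch.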

Can I identify deleted comments?

No. Deleted or removed comments do not appear in Reddit’s public interface, so the scraper cannot access them. You see only what is currently visible to anyone on Reddit. One side effect is useful for filtering: the comments you capture are the ones that survived moderation, so removed spam and rule-breaking posts are already excluded.

How do I handle rate limiting or blocks?

The scraper uses residential proxies, warmup cookies, retry/backoff, and an old.reddit.com fallback to reduce block risk. Occasional blocks can still happen on high-volume runs. Re-run the failed queries; proxy rotation usually resolves the issue on retry. If you consistently hit blocks on specific subreddits, reduce maxItems, lower comment depth, or increase the time between scheduled runs.

Can I schedule automated weekly or monthly scrapes?

Yes. Use Apify’s built-in scheduler to run the actor on any cron schedule. Results accumulate in your dataset, creating a historical trend log without manual intervention. This is ideal for competitive monitoring: set up a weekly scrape of subreddits where your competitors are mentioned, and your dataset tracks how sentiment shifts over time. Monthly sentiment tracking on r/startups or r/SaaS gives you a regular pulse on market trends.

How far back can I scrape historical posts?

The scraper retrieves posts through Reddit’s sort filters (new, hot, top, rising, controversial) and timeframe filters (hour, day, week, month, year, all) where Reddit exposes them. Older posts remain retrievable if they have not been deleted or removed, but how far back a single run reaches depends on how many items Reddit returns for that listing or search, so one run is not a full archive. For trend analysis over a year, run monthly extractions and accumulate the results. This creates a longitudinal dataset you can analyze for patterns and sentiment shifts.
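When you accumulate monthly runs, the same post will often show up in more than one snapshot. A simple way to handle that, sketched below, is to key rows by `permalink` and let the newest snapshot win; the row shape is assumed from the field table, and the sample data is made up.

```python
def merge_runs(existing, new_rows):
    """Merge two snapshots keyed by permalink; rows from new_rows replace older ones."""
    merged = {row["permalink"]: row for row in existing}
    for row in new_rows:
        merged[row["permalink"]] = row
    return list(merged.values())


# Made-up snapshots: one post appears in both months with an updated score.
january = [
    {"permalink": "/r/SaaS/comments/abc/", "score": 40},
    {"permalink": "/r/SaaS/comments/def/", "score": 12},
]
february = [
    {"permalink": "/r/SaaS/comments/abc/", "score": 95},
    {"permalink": "/r/SaaS/comments/ghi/", "score": 7},
]

archive = merge_runs(january, february)
print(len(archive), "unique posts")
```

If you want to study how a post's score changes over time, keep every snapshot and add a scrape date to each row instead of overwriting.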

Last verified: May 15, 2026
