The “Webpage To Markdown Scraper” LeadTables Data Module

Zach's Favs, Good For Beginners, Cheap, LeadTables Data Module

Data Acquisition Style
- Purchase (instant access - e.g. from broker)
- Scraped / Human Labor 2 - Semi-automatic (e.g. with AI or partial tool assistance)

Full Content:

The “Webpage To Markdown Scraper” Data Module lets you store a clean markdown snapshot of a webpage directly in your LeadTable.

This is REALLY useful when you later hand the snapshot off to an AI Data Module to analyze or extract data from.

Choose which page you want to capture (for example: a company’s homepage, pricing page, or careers page).
Run the module on the leads you care about.
You’ll get a text (markdown) snapshot of the page, plus a few helpful metadata fields, saved right into your LeadTable.

Turn websites into usable data: Instead of clicking around tabs and pages, you can search, filter, and analyze website content like any other column.
Power better AI workflows: Feed clean page text into downstream AI enrichments (summaries, positioning analysis, ICP fit, personalization, and more).
Qualify leads faster: Quickly spot red flags (thin sites, outdated content, unclear offering) or positive signals (strong positioning, clear pricing, hiring, new product pages).
Create repeatable reviews: Save the content in your table so your team can review the same snapshot later.
Extract useful data: You can have an AI Data Module or Formula Data Module pull out useful snippets from the page content to use as personalization markers, lead filters, and more!
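
As a rough illustration of the kind of extraction a downstream step might do on the Scraped Markdown column, here is a minimal Python sketch. The helper names `first_heading` and `has_signal` are hypothetical and are not part of LeadTables; a real Formula or AI Data Module would express this in its own way.

```python
import re
from typing import Iterable, Optional


def first_heading(markdown: str) -> Optional[str]:
    """Return the text of the first markdown heading, or None if there is none."""
    match = re.search(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    return match.group(1) if match else None


def has_signal(markdown: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive check for any of the given keywords in the page text."""
    lowered = markdown.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


page = "# Acme Rockets\n\nWe're hiring! See our pricing page."
print(first_heading(page))                      # Acme Rockets
print(has_signal(page, ["hiring", "careers"]))  # True
```

The heading could serve as a personalization marker, and the keyword check as a simple lead filter.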

Scraped Markdown (text) — A clean text version of the page content.
Scrape Status (text) — Whether the scrape succeeded or failed.
Scrape Word Count (number) — A quick “how much content did we actually get?” signal.
Page Title (text) — The page title (when available).
Page Description (text) — The page description (when available).
Page Language (text) — The page’s language (when available).
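
To make the output fields concrete, here is a minimal Python sketch of one row's shape. The class name, the `"success"`/`"failed"` status strings, and the whitespace-based word count are all assumptions for illustration; the module's actual values and counting rule are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResult:
    """Hypothetical row shape mirroring the module's output columns."""
    scraped_markdown: str
    scrape_status: str                      # assumed values: "success" or "failed"
    scrape_word_count: int
    page_title: Optional[str] = None        # only when available
    page_description: Optional[str] = None  # only when available
    page_language: Optional[str] = None     # only when available


def build_result(markdown: Optional[str], metadata: dict) -> ScrapeResult:
    """Build a row from the page text plus whatever metadata was available."""
    text = markdown or ""
    return ScrapeResult(
        scraped_markdown=text,
        scrape_status="success" if text.strip() else "failed",
        # Assumption: words are whitespace-separated tokens.
        scrape_word_count=len(text.split()),
        page_title=metadata.get("title"),
        page_description=metadata.get("description"),
        page_language=metadata.get("language"),
    )


row = build_result("# Acme\n\nWe build rockets.", {"title": "Acme", "language": "en"})
print(row.scrape_status, row.scrape_word_count, row.page_title)  # success 5 Acme
```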

Page URL: The page you want to capture for each lead. Usually you’d just set this to their homepage, but if you’ve already identified subpages within their site, you could pass those in too.
Main Content Only?: When enabled, we try to exclude navigation, headers, and footers. The result is cleaner and has a lower word count, which makes it cheaper for an LLM to analyze.

Some websites block automated access, require logins, or show heavy cookie/consent overlays. Those pages may fail or return limited content.
“Main content only” is a best-effort cleanup. It usually improves readability, but occasionally hides useful parts of a page.
Website content changes over time, so rerunning later may produce different results.

This module is powered by Firecrawl.dev.
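
For readers curious what a comparable request looks like outside LeadTables, here is a minimal Python sketch against Firecrawl's hosted scrape endpoint. It assumes the v1 REST API (`/v1/scrape` with `formats` and `onlyMainContent` in the JSON body); how LeadTables actually calls Firecrawl is not documented here, and the function names and the `api_key` placeholder are hypothetical.

```python
import json
import urllib.request

# Assumption: Firecrawl's hosted v1 scrape endpoint; check their docs for the current version.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url: str, main_content_only: bool, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a scrape request for a single page."""
    body = {
        "url": page_url,                        # corresponds to the Page URL input
        "formats": ["markdown"],                # ask for a markdown snapshot
        "onlyMainContent": main_content_only,   # corresponds to Main Content Only?
    }
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(page_url: str, main_content_only: bool = True, api_key: str = "YOUR_API_KEY") -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_scrape_request(page_url, main_content_only, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `scrape("https://example.com")` with a real API key would perform a live request, so it is left out of the sketch.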

Start small (test on a few rows) to make sure you’re pulling the right page, then deploy to larger segments.
Use Scrape Word Count to quickly spot pages that likely didn’t scrape cleanly (very low word count) and/or websites that are inactive “parked pages.”
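
If you export your table, the word-count check above could be sketched in Python roughly like this. The 50-word cutoff, the `"success"` status string, and the `flag_row` helper are assumptions for illustration; tune the threshold on your own test rows.

```python
LOW_WORD_THRESHOLD = 50  # hypothetical cutoff; tune it on a small test batch


def flag_row(row: dict, threshold: int = LOW_WORD_THRESHOLD) -> str:
    """Label a row as 'failed', 'review', or 'ok' based on its scrape columns."""
    # Assumption: a successful scrape is recorded as the string "success".
    if row.get("Scrape Status") != "success":
        return "failed"
    if (row.get("Scrape Word Count") or 0) < threshold:
        return "review"  # possibly blocked, near-empty, or a parked page
    return "ok"


rows = [
    {"Page URL": "https://example.com", "Scrape Status": "success", "Scrape Word Count": 812},
    {"Page URL": "https://example.org", "Scrape Status": "success", "Scrape Word Count": 14},
    {"Page URL": "https://example.net", "Scrape Status": "failed", "Scrape Word Count": 0},
]
for row in rows:
    print(row["Page URL"], flag_row(row))
```

Rows labeled "review" are the ones worth opening by hand before you spend AI credits on them.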