How to Find Shopify Stores by Category (Without Buying a $12K List)
Find Shopify stores by category, app stack, country, and revenue band. Honest walkthrough of the free APIs, the paid lists, and a $0.008/store path.
If you sell into the Shopify ecosystem, you have probably already tried to find Shopify stores by category and bounced off a wall. Either you bought a $4-12K static CSV from StoreLeads or Charm, or you wrote a /products.json scraper that worked for a weekend and then got blocked by Cloudflare on Monday morning. Neither of those is a sustainable lead source for a sales team that needs fresh prospects every week.
I run a portfolio of B2B Apify actors at seibs.co and I have spent the last few months rebuilding this exact problem from scratch for SDRs at Klaviyo competitors, Recharge competitors, Gorgias competitors, and three different headless-commerce agencies. This post is what I wish somebody had written for me two years ago: the actual mechanics, the limits of every approach, and a path that costs less than a cup of coffee per 100 verified stores.
The Shopify-prospecting market is split into four buckets and every one of them has a real flaw.
Paid static lists (StoreLeads, Charm, MyIP.ms, BuiltWith). These are the default. They are also stale. StoreLeads refreshes most of its app-install signals on a 30-90 day cadence depending on tier, BuiltWith is the same story, and Charm is monthly at best. For a Klaviyo competitor trying to catch a brand within the 14-day "they just installed Klaviyo, they hate it" window, monthly is a miss. Pricing also starts around $99-499/mo and climbs fast when you want filters that actually work (revenue band, multi-app exclusion, country).
Manual /products.json scraping. Shopify exposes a public /products.json endpoint by default on every store. Hit https://allbirds.com/products.json?limit=250&page=1 and you get the catalog. The problems: (a) Cloudflare challenges raw requests calls, (b) you have to source domains from somewhere else first, (c) you still need to parse 30+ different script footprints to detect installed apps, and (d) the moment you hit ~50 stores in parallel from one IP you are throttled.
Apollo / ZoomInfo / Crunchbase filters. None of these have a real "is on Shopify" field. You can sometimes filter by "uses Shopify" but accuracy is bad and there is no app-stack detection. They are firmographic, not technographic.
Google dorking (site:myshopify.com "<keyword>"). Works for tiny lists. Falls apart past 100 results because Google's site: operator is heavily sampled and you can't filter by revenue, country, or installed apps.
The gap: a fresh, technographic, filter-on-the-server lookup that bills only for stores matching your criteria.
Before you pick a tool, get clear on which axis of "category" you mean - the cost difference is 10x.
| Filter axis | Example | Where the data lives |
|---|---|---|
| Product category | "candles", "supplements", "footwear" | Top collections + product titles in /products.json |
| App stack | "uses Recharge", "no Klaviyo", "Yotpo + Gorgias" | Script footprints in homepage HTML |
| Revenue band | "$50K-$500K MRR" | Heuristic from product count x avg price |
| Country / currency | "US-only", "EUR" | /products.json currency + homepage hreflang |
| Plan tier | "Shopify Plus only" | X-ShopId header + shopify-plus HTML refs |
| Brand signals | "founder-led", "B-corp", "made in USA" | /about and /pages/our-story |
Most sales teams I talk to actually want a combination - "non-Klaviyo Shopify Plus stores in the US doing supplements with 50+ products" - and that combination is exactly what the paid lists can't filter on without an enterprise tier.
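To make that combination concrete, here is a minimal sketch of the filter as a predicate over one output record. The field names apps_installed, shopify_plan_hint, primary_country, and top_categories appear elsewhere in this post; product_count and the exact record shape are assumptions for illustration, not a documented schema.

```python
def matches_icp(record: dict) -> bool:
    """True for non-Klaviyo Shopify Plus stores in the US selling
    supplements with 50+ products. Field names are assumptions."""
    apps = {a["app_id"] for a in record.get("apps_installed", [])}
    plan = (record.get("shopify_plan_hint") or {}).get("plan")
    categories = [c.lower() for c in record.get("top_categories", [])]
    return (
        "klaviyo" not in apps
        and plan == "shopify-plus"
        and record.get("primary_country") == "US"
        and any("supplement" in c for c in categories)
        and record.get("product_count", 0) >= 50  # hypothetical field
    )
```

Every clause is a cheap dictionary lookup, so the expensive part is producing the record, not filtering it.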
The minimum viable pipeline:
```
+----------------+     +----------------+     +-------------------+
| Domain source  | --> | Shopify verify | --> | App-stack detect  |
| (Common Crawl, |     | /products.json |     | (regex script     |
| TLD zone)      |     | + Liquid fb    |     | footprints)       |
+----------------+     +----------------+     +-------------------+
                                                        |
                                                        v
                                              +-------------------+
                                              | Catalog + revenue |
                                              | estimate          |
                                              +-------------------+
```
You will need: a residential proxy pool (Bright Data, Oxylabs, or Apify Proxy - budget ~$500/mo for serious volume), curl_cffi with Chrome 131 TLS fingerprinting because vanilla requests gets Cloudflare-blocked, and a regex library of ~40 known script footprints (static.klaviyo.com, cdn.smile.io, rechargepayments.com/shopify, etc).
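The footprint library is the simplest piece to picture. A minimal sketch, using only the three footprints named above and assuming the homepage HTML has already been fetched; the app IDs and the detect_apps name are illustrative, and a real library would carry ~40 patterns:

```python
import re

# Script footprints named in this post; a production list would be much longer.
FOOTPRINTS = {
    "klaviyo": re.compile(r"static\.klaviyo\.com", re.I),
    "smile_io": re.compile(r"cdn\.smile\.io", re.I),
    "recharge": re.compile(r"rechargepayments\.com/shopify", re.I),
}


def detect_apps(html: str) -> list[str]:
    """Return the sorted app IDs whose footprint appears in the HTML."""
    return sorted(app for app, pattern in FOOTPRINTS.items() if pattern.search(html))
```

This only sees what the homepage serves, which is why footprint detection can miss apps that load no script there.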
Realistic build cost: 60-100 engineering hours plus $500-$1500/mo recurring proxy + infra. Worth it if Shopify prospecting is your core product. Wasteful if it's a side input to a sales motion.
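For a sense of what the catalog step involves, here is a rough sketch of paging through /products.json. The fetch argument is a hypothetical hook that takes a URL and returns the response body as text; proxy rotation and TLS impersonation would live inside it. Stopping on a short page is an assumption about how the endpoint paginates, not documented behavior.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode


def products_url(domain: str, page: int, limit: int = 250) -> str:
    """Build the public catalog URL for one page of a store."""
    return f"https://{domain}/products.json?{urlencode({'limit': limit, 'page': page})}"


def iter_products(
    domain: str,
    fetch: Callable[[str], str],  # hypothetical: URL in, response text out
    limit: int = 250,
    max_pages: int = 20,
) -> Iterator[dict]:
    """Yield products page by page until a short or empty page appears."""
    for page in range(1, max_pages + 1):
        payload = json.loads(fetch(products_url(domain, page, limit)))
        products = payload.get("products", [])
        yield from products
        if len(products) < limit:
            break  # a short page is treated as the last page
```

Injecting fetch keeps the paging logic testable without network access and lets you swap HTTP clients without touching it.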
You upload a domain list (sourced from any of: Crunchbase D2C tags, Common Crawl, BuiltWith free CSV exports, your own past lead list, or a competitor backlink dump from Ahrefs) and get back verified Shopify stores with app stack, revenue estimate, and contact info attached. The actor handles proxies, Cloudflare, and the footprint regex library so you don't.
I wrote one of these: seibs.co/shopify-store-discovery. It costs $0.008 per verified store record, $0.005 if 3+ apps are detected, and $0.005 when contact data is actually extracted - non-Shopify and filtered-out domains are never charged. There are a couple of other Shopify actors in the Apify Store you should compare against - StoreLeads has an API too if you have the budget. The point isn't "use mine," the point is that prebuilt scrapers exist and they cost a few dollars per thousand stores.
```python
from apify_client import ApifyClient

# Token from https://console.apify.com/account/integrations
client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("seibs.co/shopify-store-discovery").call(run_input={
    "mode": "deep_analyze",
    "domains": [
        "allbirds.com", "rothys.com", "warbyparker.com",
        "brooklinen.com", "casper.com", "harrys.com",
    ],
    "excluded_apps": ["klaviyo"],  # find non-Klaviyo stores
    "required_apps": ["recharge"],  # that DO use Recharge
    "min_product_count": 50,
    "min_estimated_monthly_revenue_usd": 50000,
    "extract_contact": True,
    "concurrency": 4,
    "use_apify_proxy": True,
    "apify_proxy_groups": ["RESIDENTIAL"],
})

# Stream records from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if not item.get("is_shopify"):
        continue
    apps = ", ".join(a["app_id"] for a in item.get("apps_installed", []))
    # The revenue estimate may be missing or null, so fall back to an empty dict
    rev = (item.get("estimated_monthly_revenue_usd") or {}).get("midpoint")
    print(f"{item['domain']:30s} | ${rev or 0:>8,} MRR | apps: {apps}")
    print(f" contact: {item.get('best_contact_email')}")
```
Run output looks roughly like:
```
luminouswick.co | $250,000 MRR | apps: recharge, smile_io, judgeme
 contact: hello@luminouswick.co
```
If you need a one-off TAM count (e.g. "how many US Shopify Plus stores in beauty over $1M MRR for a board deck"), the static lists are still fine. Don't use them for ongoing outbound - the install dates are too stale to catch the moment-of-need window.
The Apify actor (and any scraper) needs a seed list of domains to verify. Cheap sources:
- .myshopify.com redirects from the Common Crawl WAT files.
- r/skincareaddiction, r/coffee, r/menswear link to D2C brands constantly. Use seibs.co/reddit-topic-watcher to mine these.
- #shopifystore, #smallbusiness, niche tags like #supplementsforwomen. Most carry a bio link to the store.

For Shopify specifically you don't need a clean list - the verifier will reject non-Shopify domains for free.
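Even a messy seed list benefits from collapsing URLs to bare hostnames before you submit it, so the same store is not checked twice. A minimal sketch using only the standard library; the helper names are illustrative and stripping "www." is a simplifying assumption:

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def to_domain(raw: str) -> Optional[str]:
    """Reduce a URL or bare hostname to a lowercase domain, or None if unusable."""
    raw = raw.strip().lower()
    if not raw:
        return None
    if "//" not in raw:
        raw = "//" + raw  # lets urlsplit treat a bare hostname as the netloc
    host = urlsplit(raw).hostname
    if not host or "." not in host:
        return None
    return host[4:] if host.startswith("www.") else host


def dedupe_seeds(raw_items: Iterable[str]) -> list[str]:
    """Return unique domains in first-seen order."""
    seen: dict[str, None] = {}
    for item in raw_items:
        domain = to_domain(item)
        if domain:
            seen.setdefault(domain, None)
    return list(seen)
```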
| Use case | Sample filter |
|---|---|
| Klaviyo conquest list | excluded_apps=["klaviyo"], apps_count >= 4, min_revenue=50000 |
| Recharge churn-risk outbound | required_apps=["recharge"], days_since_last_publish > 90 |
| Headless-agency target list | shopify_plan_hint.plan="shopify-plus", theme_name in ["Dawn","Impulse","Empire"] |
| Gorgias TAM by country | country="US", excluded_apps=["gorgias","zendesk","reamaze"], apps_count >= 5 |
| Reviews-app competitive switch | required_apps=["yotpo"], lookback for renewal-window outreach |
| Subscription-launch alert | has_subscriptions=true, run weekly, dedupe by domain |
| Founder-led brand fund pipeline | founder_names != null, min_revenue=10000, niche keyword filter |
| Influencer affiliate fit list | category keyword + social_links.tiktok != null |
The output ships with best_contact_email (ranked founder > sales > info), social links, theme version, and a revenue_estimate_confidence so you can weight your list.
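As an illustration of the founder > sales > info ranking, here is one way such a ranking could work. This is a sketch under assumptions, not the actor's actual logic: the role heuristics (a founder's first name in the local part, a small set of sales mailboxes, everything else as info) are hypothetical.

```python
from typing import Optional, Sequence

ROLE_PRIORITY = {"founder": 0, "sales": 1, "info": 2}
SALES_LOCALS = {"sales", "partnerships", "wholesale"}  # hypothetical set


def classify(email: str, founder_names: Sequence[str]) -> str:
    """Guess the role behind an address from its local part."""
    local = email.split("@", 1)[0].lower()
    first_names = {n.split()[0].lower() for n in founder_names if n.strip()}
    if any(name in local for name in first_names):
        return "founder"
    if local in SALES_LOCALS:
        return "sales"
    return "info"


def pick_best_contact(emails: Sequence[str], founder_names: Sequence[str] = ()) -> Optional[str]:
    """Return the highest-priority address; ties keep input order."""
    if not emails:
        return None
    return min(emails, key=lambda e: ROLE_PRIORITY[classify(e, founder_names)])
```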
A few things any Shopify-discovery pipeline cannot do well, including mine:
App detection is footprint-based, not authoritative. The actor looks for static.klaviyo.com in homepage HTML. If a brand uses Klaviyo for transactional email only and serves no widget on the homepage, the footprint is invisible and you'll get a false negative. Conversely if a developer left a Klaviyo <script> tag in but disabled the integration in the Klaviyo dashboard, you'll get a false positive. Treat the apps list as ~85-90% precision, not 100%.
Revenue estimates are heuristic, expect +/-50% error. It's product_count x avg_price x estimated_orders/day band with Shopify Plus, subscription, and B2B refinements. Use it for relative ranking ("over $100K MRR vs under $10K") not absolute MRR. There is a revenue_estimate_confidence: low|medium|high field on each record - filter to high if you're doing financial modeling.
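One way to respect that error bar is to tier stores only when the whole +/-50% band clears a threshold. A small sketch; the tier names are illustrative and the thresholds are the two quoted above:

```python
def revenue_band(midpoint_usd: float, error: float = 0.5) -> tuple[float, float]:
    """Return the (low, high) range implied by a +/-error heuristic estimate."""
    return midpoint_usd * (1 - error), midpoint_usd * (1 + error)


def revenue_tier(midpoint_usd: float) -> str:
    """Assign a tier only when the entire band sits on one side of a threshold."""
    low, high = revenue_band(midpoint_usd)
    if low >= 100_000:
        return "over_100k"
    if high <= 10_000:
        return "under_10k"
    return "uncertain"
```

Under this rule a $120K midpoint stays "uncertain", because its low end is $60K.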
Cloudflare blocks happen. Even with residential proxies and Chrome-131 TLS impersonation, 2-5% of stores will return Cloudflare challenges on a given run. Re-run failed domains 6-12 hours later and most resolve.
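A retry queue for those blocked domains can be very small. A sketch, assuming each record carries a hypothetical boolean blocked flag (the real output may signal failures differently):

```python
from datetime import datetime, timedelta, timezone


def split_for_retry(records: list[dict], now: datetime, delay_hours: int = 6):
    """Separate usable records from blocked ones and schedule the retries."""
    retry_at = now + timedelta(hours=delay_hours)
    ok, retry = [], []
    for record in records:
        if record.get("blocked"):  # hypothetical failure flag
            retry.append({"domain": record["domain"], "retry_at": retry_at})
        else:
            ok.append(record)
    return ok, retry
```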
/products.json is not always on. Shopify Plus merchants can disable the endpoint. About 3-7% of Plus stores have it locked down. The actor falls back to Liquid HTML parsing but the catalog data is sparser.
There is no real-time install feed. No scraper can tell you "this brand installed Klaviyo at 3:14 PM today." You can get close with daily scheduled runs and a dedupe-on-apps diff, but the latency floor is 24 hours.
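The dedupe-on-apps diff is a set difference between two daily snapshots. A minimal sketch, assuming you have stored each day's result as a mapping from domain to the set of detected app IDs (that storage format is an assumption):

```python
def new_installs(
    yesterday: dict[str, set[str]],
    today: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return apps that appear today but were absent yesterday, per domain."""
    changes: dict[str, set[str]] = {}
    for domain, apps in today.items():
        if domain not in yesterday:
            continue  # first sighting: no baseline to diff against
        added = apps - yesterday[domain]
        if added:
            changes[domain] = added
    return changes
```

Skipping first-seen domains avoids flagging a store's entire existing stack as fresh installs.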
Free tier limits. Apify's free plan gives you ~$5/mo of platform credit. That's ~600 verified stores. Hobbyist work fits; production prospecting needs the $39/mo Starter or higher.
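The arithmetic behind that figure, as a sketch. It uses only the $0.008 base price per verified store and ignores any additional per-event charges, so treat the result as an upper bound on store count:

```python
from decimal import Decimal


def stores_per_budget(budget_usd, price_per_store_usd="0.008") -> int:
    """How many verified stores a budget covers at the base per-store price."""
    return int(Decimal(str(budget_usd)) // Decimal(str(price_per_store_usd)))


def cost_for_stores(count: int, price_per_store_usd="0.008") -> Decimal:
    """Base cost in USD for a given number of verified stores."""
    return Decimal(str(price_per_store_usd)) * count
```

At the base price, $5 of credit covers 625 stores and 100 stores cost $0.80, which is where the "~600" figure comes from.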
Q: How do I find Shopify stores in a specific country?
A: Pass a domain list and filter on the output primary_country field, or seed your input domain list from a country-specific source (.co.uk zone file for UK, BuiltWith country exports). Shopify exposes currency reliably; country is inferred from currency + hreflang + shipping policy parsing.
Q: Can I find Shopify stores by SIC code or NAICS code?
A: Not directly - Shopify storefronts don't expose industry codes. Use category keyword matching on top collections and product titles as a proxy (top_categories field in the output).
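Keyword matching on product titles can be sketched in a few lines. The keyword lists and the min_hits threshold below are illustrative assumptions; this is not how the top_categories field is actually computed.

```python
import re
from collections import Counter

# Hypothetical keyword map; extend per niche.
CATEGORY_KEYWORDS = {
    "candles": ["candle", "wax melt"],
    "supplements": ["supplement", "vitamin", "protein"],
    "footwear": ["shoe", "sneaker", "boot"],
}


def top_categories(titles: list[str], min_hits: int = 2) -> list[str]:
    """Rank categories by how many product titles mention one of their keywords."""
    counts: Counter = Counter()
    for title in titles:
        text = title.lower()
        for category, keywords in CATEGORY_KEYWORDS.items():
            # \b anchors the keyword at a word start, so "candles" still matches
            if any(re.search(rf"\b{re.escape(k)}", text) for k in keywords):
                counts[category] += 1
    return [c for c, n in counts.most_common() if n >= min_hits]
```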
Q: How accurate is the Shopify-vs-not-Shopify detection?
A: Very high - the /products.json endpoint plus Liquid template HTML fingerprint plus X-ShopId header check gives ~99% precision. False positives are limited to a small number of headless Hydrogen stores that happen to mirror the Liquid template.
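The three signals combine naturally into a vote. A rough sketch, assuming the headers, homepage HTML, and parsed /products.json payload have already been collected; the specific HTML markers and the two-of-three threshold are assumptions, not the actor's exact rules:

```python
from typing import Mapping, Optional


def shopify_signals(headers: Mapping[str, str], html: str, products_payload: Optional[dict]) -> dict:
    """Evaluate each detection signal independently."""
    lowered = {k.lower() for k in headers}
    return {
        "products_json": isinstance(products_payload, dict)
        and isinstance(products_payload.get("products"), list),
        "shop_id_header": "x-shopid" in lowered,
        # Assumed Liquid storefront markers
        "liquid_fingerprint": "cdn.shopify.com" in html or "Shopify.theme" in html,
    }


def is_shopify(headers, html, products_payload, min_signals: int = 2) -> bool:
    """Call a store Shopify when at least min_signals checks agree."""
    return sum(shopify_signals(headers, html, products_payload).values()) >= min_signals
```

Requiring agreement between signals is what keeps a single lookalike endpoint from producing a false positive.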
Q: What's the cheapest way to find Shopify stores by category for a one-time market sizing report?
A: Use BuiltWith's free CSV exports for the relevant category (it's freemium, you get a sample), or scrape site:myshopify.com "<category keyword>" from Google for a rough TAM count. For anything beyond a one-time sample, a programmatic scraper pays back in hours.
Q: Will scraping Shopify get me sued?
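A: Public-web scraping of /products.json, homepage HTML, and /contact pages is generally treated as lawful in the US: hiQ Labs v. LinkedIn held that scraping publicly accessible data is unlikely to violate the Computer Fraud and Abuse Act. That ruling does not settle contract or terms-of-service claims, so read Shopify's terms for your specific use case. Don't scrape behind logins, don't violate robots.txt for indexed content, don't store PII you don't have a basis to process.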
Q: Can I get historical Shopify install data?
A: Not from a live scraper - it can only tell you the current state. StoreLeads and BuiltWith have historical install timelines but charge for it. If you need install-date intelligence, run your own scraper on a weekly cron and build your own history.
Q: How do I find Shopify Plus stores specifically?
A: Filter the output on shopify_plan_hint.plan = "shopify-plus". The signal comes from X-ShopId header patterns, shopify-plus script references, and known Plus-only feature footprints (Launchpad, Script Editor, B2B). Coverage is ~60-70% of actual Plus merchants - the rest are silent.
Q: How fresh is the data compared to StoreLeads?
A: Live at crawl time. StoreLeads refreshes app-install data on a 30-90 day cycle depending on tier; a fresh scraper run is current within the last few minutes.
Q: Does this work for BigCommerce or WooCommerce too?
A: This specific actor is Shopify-only. The same approach (platform-detect via known footprints, then enrich) works for BigCommerce (bigcommerce.com/stencil scripts) and WooCommerce (/wp-json/wc/v3 endpoints) but those are separate builds.
Run shopify-store-discovery on Apify - the free plan covers ~600 stores per month, no credit card required. Drop in 5-10 seed domains and you'll see the output schema in about a minute.
While you're there, two related actors that pair with this one:
- b2b-sales-triggers - score intent (funding rounds, exec hires, press mentions) on top of the store list. Run Shopify discovery first, pipe the domain list in, and your SDR gets a ranked outbound queue.
- reddit-topic-watcher - mine niche subreddits for D2C brand mentions to seed your Shopify domain input list.

I'm a solo MSP operator who builds B2B web-scraping actors at apify.com/seibs.co when I'm not running incident calls. The portfolio has 30+ live actors covering lead generation, intent data, SEC/USPTO/court records, and AI agent wrappers - all pay-per-event so you only pay for what's emitted. Find me at seibs.co.