Most TAM lists are bad because companies don't categorise themselves the way you expect them to. Filter by "SaaS" and you miss half your buyers. Filter by "100-500 employees" and you include the wrong tier. Lookalike search fixes this by sourcing companies that look like your best customers - regardless of how they self-tag.
Why firmographic filters fail
Industry tags are noisy. Employee bands are coarse. SIC and NAICS codes are decades behind reality. The result: most lists built from firmographic filters miss 30-50% of real ICP accounts and include 30-50% noise.
Your best customers all share patterns - in their job postings, their tech stack, their content, their language. Lookalike search uses those patterns to find similar companies the database wouldn't surface any other way.
How we run it
1. Define the seed list
Start with your top 20-50 customers. Not just your biggest - your best: highest LTV, fastest cycles, lowest churn, highest NPS. The seed defines the lookalike model.
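As a rough illustration of "best, not biggest", seed selection can be sketched as a simple ranking. Everything here is an assumption for illustration: the `Customer` fields, the equal weighting of LTV, cycle speed, and NPS, and the choice to exclude churned accounts outright. Your CRM export will look different.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields; map these to whatever your CRM exports.
    name: str
    ltv: float             # lifetime value, any consistent currency
    sales_cycle_days: int  # days from first touch to closed-won
    churned: bool
    nps: int               # -100..100


def min_max(values):
    """Scale values to 0..1 so metrics with different units are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pick_seed(customers, size=20):
    """Return the top `size` active customers by a blended score."""
    active = [c for c in customers if not c.churned]
    if not active:
        return []
    ltv = min_max([c.ltv for c in active])
    # Negate cycle length so that shorter cycles score higher.
    speed = min_max([-c.sales_cycle_days for c in active])
    nps = min_max([c.nps for c in active])
    # Equal weights are an assumption; tune them to your business.
    scored = [(l + s + n, c) for l, s, n, c in zip(ltv, speed, nps, active)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:size]]
```

Ranking on a blended score rather than revenue alone keeps a large but slow-closing or unhappy account from steering the lookalike model.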
2. Run lookalike sourcing
- Ocean.io - lookalike search across millions of company websites
- DiscoLike - natural-language ICP search across 60M+ websites
- AI Ark - AI-native B2B data with deep keyword and stack search
- Sumble / TheirStack - technographic and hiring-signal lookalikes
3. Layer in technographic data
If your best customers all run a specific tool (Salesforce, HubSpot, a particular CDP), filter your lookalike list to companies running the same stack. This sharply increases match quality.
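A minimal sketch of that stack filter, assuming the lookalike list has already been exported as a list of dicts with a hypothetical `tech_stack` key. Real exports from technographic tools will use their own field names and formats.

```python
def filter_by_stack(companies, required_tools, min_overlap=1):
    """Keep companies that run at least `min_overlap` of the required tools.

    `companies` is assumed to be a list of dicts with an optional
    "tech_stack" list of tool names (a hypothetical export shape).
    """
    required = {tool.lower() for tool in required_tools}
    kept = []
    for company in companies:
        # Missing stack data counts as no match rather than an error.
        detected = {tool.lower() for tool in company.get("tech_stack", [])}
        if len(required & detected) >= min_overlap:
            kept.append(company)
    return kept
```

Matching is case-insensitive because tool names are rarely spelled consistently across data sources. Raising `min_overlap` tightens the list when your best customers share more than one tool.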
4. AI-driven qualification
Run the lookalike list through an AI fit-check in Clay. The agent reads each company's website, marketing copy, and recent news, and scores the company against your ICP definition. Disqualified companies drop out before they hit any sequencer.
5. Combine signals
Lookalikes get even sharper when combined with timing signals: lookalikes that recently raised, lookalikes that just hired in the relevant department, lookalikes whose CEO is engaging with your content.
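One way to picture the combination is a fit score boosted by timing signals. The sketch below is illustrative only: the signal names, the weights, the 0..1 `fit_score`, and the `min_fit` cut-off are all assumptions, not outputs of any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical signal weights; tune against your own conversion data.
SIGNAL_WEIGHTS = {
    "recent_funding": 0.3,
    "relevant_hire": 0.2,
    "exec_engagement": 0.2,
}


@dataclass
class Account:
    domain: str
    fit_score: float                          # assumed 0..1 ICP fit
    signals: set = field(default_factory=set)  # active timing signals


def priority(account):
    """Boost the fit score by the combined weight of active signals."""
    boost = sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in account.signals)
    return account.fit_score * (1 + boost)


def rank(accounts, min_fit=0.5):
    """Drop poor fits first, then order the rest by boosted priority."""
    qualified = [a for a in accounts if a.fit_score >= min_fit]
    return sorted(qualified, key=priority, reverse=True)
```

Signals multiply the fit score instead of adding to it, so a poorly fitting company that just raised money cannot outrank a strong fit, and anything below `min_fit` is dropped regardless of timing.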
Tools we use
Ocean.io, DiscoLike, AI Ark for lookalike sourcing. Sumble and TheirStack for technographic overlays. Clay for AI qualification and orchestration. Findymail / BetterContact for verified emails.
What to watch for
- Quality of the seed list determines quality of the lookalikes. Clean inputs = clean outputs.
- Lookalikes still need ICP filtering on the back end - the model approximates, it doesn't verify.
- Combine with another signal source for the highest conversion - lookalike alone is good, lookalike + funding or hiring is exceptional.