The “1 → 3 → 10 → 30 → 100” Framework & “Rule of 30” When Testing
I mentioned before that I recommend testing lead gen and enrichment on a small number of leads before rolling it out for everyone.
Below are two mental models I’d recommend for this, both of which have super-duper-creative names…
The “1 → 3 → 10 → 30 → 100” Framework
This mirrors the general process I go through with my enrichments:
- Test it on 1 to make sure it ran correctly
- Test it on 3 to make sure it’s dynamically loading data in correctly
- Test on 10 and 30 to try to catch edge cases
- Run on 100 and review them to look for bugs
- If all good, roll it out for everyone
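The staged rollout above can be sketched as a simple batching helper. This is just an illustration in plain Python, not a LeadTables feature; `rollout_batches` and the stage list are names invented for this sketch:

```python
# Review batch sizes for the staged rollout: 1, 3, 10, 30, 100, then everyone.
STAGES = [1, 3, 10, 30, 100]

def rollout_batches(leads):
    """Yield successive review batches of 1, 3, 10, 30, and 100 leads,
    then the remainder ("everyone else"), without repeating any lead."""
    start = 0
    for size in STAGES:
        batch = leads[start:start + size]
        if not batch:
            return
        yield batch
        start += size
    remainder = leads[start:]
    if remainder:
        yield remainder

# Example: 200 leads -> batches of 1, 3, 10, 30, 100, then the final 56
batches = list(rollout_batches(list(range(200))))
```

The point of yielding batches rather than slicing ad hoc is that each stage only runs after you've reviewed the previous one.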
The “Rule of 30”
When you’re experimenting with something like ICP filtering, personalization lines, etc., I think 30 is usually a pretty good number of leads to manually review to get a general sense of the accuracy of the process.
For example, let’s say you set up the following Data Module enrichment sequence…
- Build a list of raw agency leads with some other tool & import them into LeadTables
- Use the “Webpage Markdown Scraper” Data Module to scrape homepages into markdown
- Use the “AI Prompt” Data Module to extract a testimonial from the scraped homepage markdown
- Use another “AI Prompt” Data Module to write a personalization line using the extracted testimonial

In that sequence above, your biggest “AI prompt vulnerabilities / risks” would probably be…
- Many of the websites simply not having testimonials in the first place
- The AI hallucinating about testimonials, resulting in you quoting a testimonial that doesn’t actually exist
This is where the real work of prompt crafting comes into play.
Typically, if you can get your prompt producing output reliably and accurately for all 30 leads, that’s a pretty good sign it should be at least somewhat safe to scale.
The “rule of 30” dictates that you’d literally pull up the websites of these 30 leads and manually verify that each extracted testimonial can actually be found on the homepage, and that it matches what the AI prompt extracted word-for-word.
(If you want to make your life easier, you could also copy and paste the verified testimonial into a notes column in your LeadTable for quick comparison)
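The word-for-word check itself is easy to automate as a first pass. Below is a hedged sketch: `found_verbatim` is a hypothetical helper that only normalizes whitespace and letter case, so you’d still eyeball anything it flags rather than trusting it blindly:

```python
import re

def found_verbatim(testimonial: str, page_markdown: str) -> bool:
    """Check that the AI-extracted testimonial appears word-for-word in the
    scraped page markdown, ignoring whitespace and letter-case differences."""
    def normalize(s: str) -> str:
        return re.sub(r"\s+", " ", s).strip().lower()
    return normalize(testimonial) in normalize(page_markdown)

# Illustrative scraped markdown (note the irregular spacing):
page = "## What clients say\n\n> Working with Acme   doubled our pipeline.\n"

found_verbatim("Working with Acme doubled our pipeline.", page)  # matches
found_verbatim("Acme tripled our pipeline.", page)               # hallucinated
```

A failed check doesn’t always mean a hallucination (the testimonial might sit on a different page), but a pass means the quote really does exist on the homepage you scraped.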
The more nuanced and risky a prompt is, the more leads you should test it on. (And you’ll also do well to use a smarter + more expensive AI model)
Pro Tip: Use Tags To Create Special Sets Of “Rule of 30 Tester Leads”
As you do your manual reviews, you’ll start to come across edge cases that push your prompt to the limits and/or mess up your output entirely.
Maybe there’s one website without testimonials, one with only case studies, one that fails to load, etc.
I like to tag these leads with something like “XYZ Prompt Tester” and then filter my table down to just those tester leads when developing and testing my AI prompt.
That way, we have as diverse an array of test cases as possible — the rationale being “if it can work flawlessly on all these, it’ll [hopefully] work flawlessly on everyone else too.”
☝️ NB: Remember that AI is NOT deterministic!
If you re-run the same formula multiple times on the same leads, you’ll likely notice that the results vary slightly.
This is due to a setting called “temperature,” which is essentially a dial for how “creative” (i.e., random) the LLM’s output is.
Though it may not initially feel like it, “temperature” is actually a good thing for this “rule of 30” approach to testing.
That’s because even though you could air-quotes “fix the randomness problem” for these 30 specific leads by setting the AI’s temperature to 0, doing so would mask the true root cause: a brittle prompt that won’t actually reliably scale.
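To make “temperature” concrete, here’s a minimal sketch of how it scales the probabilities an LLM samples its next token from. The three-logit example is purely illustrative (real models score tens of thousands of tokens), but the math is the standard temperature-scaled softmax:

```python
import math

def sample_probs(logits, temperature):
    """Softmax over token scores at a given temperature. Lower temperature
    sharpens the distribution toward the top token; as temperature -> 0,
    the model effectively always picks its single most likely token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                      # illustrative token scores
hot = sample_probs(logits, temperature=1.5)   # flatter: more "creative"
cold = sample_probs(logits, temperature=0.1)  # nearly deterministic
```

At `temperature=0.1` the top token gets essentially all of the probability mass, which is why low-temperature runs look deterministic even though sampling is still happening.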
Because of this, if you want to be extra-sure your prompt is bulletproof, it can be smart to re-run your formula on your tester leads a few times in full to ensure it consistently outputs a net result you’re happy with.
If it fails to produce an accurate result half the time, for example, that’s a red flag: it implies there will be frequent inaccuracies once you scale the prompt past your tester leads, and it’s a sign the prompt needs further work before it’s bulletproof.
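That re-run consistency check can be tallied with a tiny helper. A sketch under stated assumptions: `consistency_report` and the 80% threshold are inventions for illustration, and the True/False values come from your own manual review of each re-run:

```python
def consistency_report(run_results, threshold=0.8):
    """Given {lead_id: [accurate? per re-run]} judgments from manual review,
    flag leads whose pass rate falls below the threshold."""
    flagged = {}
    for lead_id, results in run_results.items():
        rate = sum(results) / len(results)
        if rate < threshold:
            flagged[lead_id] = rate
    return flagged

# Hypothetical review of 3 full re-runs on two tester leads:
runs = {
    "a.com": [True, True, True],    # consistent every time
    "b.com": [True, False, False],  # fails 2 of 3 re-runs: red flag
}

flagged = consistency_report(runs)  # flags b.com only
```

Anything that shows up in `flagged` is a prompt problem to fix before scaling, not a lead to quietly drop.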