Over the past year, we've spoken with 100+ data leaders at top investment firms (hedge funds, asset managers, private equity, and investment banks) about their web scraping operations and how they’re navigating LLM adoption.
Here is what we’ve learned and our thoughts on the build vs. buy decision.
AI’s long-standing promise to solve major web scraping problems is now coming to fruition with the current generation of LLMs.
| Problems solved | Business Outcome |
| --- | --- |
| Automated web scraping code generation and maintenance | Cut scraper build time from days to minutes; Limit data loss with self-healing scrapers; Reduce number of engineers working full-time on scraper maintenance |
| Agentic web navigation | Source granular data from thousands of company websites; Scale extraction from data hidden behind complex browser interactions |
| Unstructured data extraction (text blocks, PDFs, images, etc.) | Unlock analysis of 10M+ unstructured documents; 95%+ accuracy in PDF data extraction |
| Advanced data cleaning, mapping, and transformation | 80%+ reduction in manual data cleaning time; Standardized outputs across hundreds of sources |
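
To make the "self-healing scrapers" outcome above concrete, here is a minimal Python sketch of the idea, not a description of any particular product's implementation. It assumes a scraper stores one regex per field and, when that regex stops matching after a site redesign, asks a generator for a replacement and keeps it only if it actually extracts a value. The names `SelfHealingScraper` and `fake_llm_pattern` are hypothetical; `fake_llm_pattern` is a stand-in for a real LLM call.

```python
import re
from typing import Callable, Optional

# Hypothetical signature: (html, field_name) -> new regex with one capture group.
# In a real system this would wrap an LLM call; here it is any callable.
PatternGenerator = Callable[[str, str], str]


class SelfHealingScraper:
    """Extract one field with a stored regex; regenerate it when it breaks."""

    def __init__(self, field: str, pattern: str, regenerate: PatternGenerator):
        self.field = field
        self.pattern = pattern
        self.regenerate = regenerate
        self.heal_count = 0  # how many times the stored pattern was replaced

    @staticmethod
    def _apply(pattern: str, html: str) -> Optional[str]:
        match = re.search(pattern, html, re.DOTALL)
        if match is None or match.group(1) is None:
            return None
        return match.group(1).strip()

    def extract(self, html: str) -> Optional[str]:
        value = self._apply(self.pattern, html)
        if value is not None:
            return value
        # The stored pattern no longer matches: request a candidate and
        # adopt it only if it yields a value on the current page.
        candidate = self.regenerate(html, self.field)
        try:
            value = self._apply(candidate, html)
        except (re.error, IndexError):  # invalid regex or missing capture group
            value = None
        if value is not None:
            self.pattern = candidate
            self.heal_count += 1
        return value


def fake_llm_pattern(html: str, field: str) -> str:
    # Stand-in for an LLM: assumes the redesigned page tags fields
    # with a data-field attribute.
    return r'data-field="' + re.escape(field) + r'"[^>]*>(.*?)<'


old_page = '<span class="price">101.25</span>'
new_page = '<div data-field="price" class="px-2">102.50</div>'

scraper = SelfHealingScraper("price", r'class="price">(.*?)<', fake_llm_pattern)
first = scraper.extract(old_page)   # matched by the original pattern
second = scraper.extract(new_page)  # original fails, regenerated pattern matches
```

A production version would also need to validate the healed value (type, range, consistency with history) before trusting it, since a regenerated pattern can match the wrong element.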
We know this first-hand based on what we’ve shipped to enterprise customers.
So, how are investment firms exploring this new unlock?
Every top investment firm we spoke with has an in-house web scraping team and also purchases web-scraped data. Many are experimenting with LLMs, either for web scraping or elsewhere. Finance is among the industries hungriest for an edge and most ready to invest in new technology to get one; LLMs are no exception.
Examples of how firms are trialing LLMs (excluding Kadoa):
| Company | Implementation | Business Outcome |
| --- | --- | --- |
| Bank | High-volume, zero-context, high-accuracy PDF extraction | 95%+ accuracy or they lose money |
| Asset Manager | In-house GPTs (on-prem, trained on internal and external data) | Real-time access to company and market intelligence |
| Prop Firm | Extract data from unstructured reports and filings | Unlock deeper insights from public documents |
| Hedge Fund | In-house LLM-powered web scraping tool | Reduce # of engineers exclusively working on web scraping |
Examples of how firms are using Kadoa:
| Company | Implementation | Business Outcome |
| --- | --- | --- |
| Hedge Fund | Automate building and maintaining traditional web scraping | Focus web scraping engineers on complex/critical scraping projects |
| Asset Manager | Empower analysts to build web data feeds independently | Enable analysts to bypass data teams to source custom web data, cutting data acquisition from days to minutes |
| Market Maker | Empower analysts to monitor strategic web pages in real time | Enable analysts to act immediately on market-moving updates |
| Trading Firm | Automate browser interactions and extract from unstructured reports (e.g., government, commodity) | Deeper, broader insight into public documents |
| Hedge Fund | Aggregate hundreds of web sources into unified data structures | Save on expensive data provider costs and customize results |
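
As an illustration of what "aggregating web sources into unified data structures" can involve, the following is a small Python sketch under stated assumptions. It maps rows from two sources with different field names and formats into a single record type. The source names (`site_a`, `site_b`), the field names, and the `PriceRecord` schema are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PriceRecord:
    """Unified schema shared by every source (illustrative)."""
    source: str
    ticker: str
    price: float
    as_of: date


# Hypothetical per-source mappings: unified field name -> source field name.
FIELD_MAPS = {
    "site_a": {"ticker": "symbol", "price": "last_price", "as_of": "date"},
    "site_b": {"ticker": "Ticker", "price": "Px", "as_of": "AsOf"},
}


def parse_price(raw) -> float:
    # Strip a dollar sign and thousands separators before converting.
    return float(str(raw).replace("$", "").replace(",", "").strip())


def normalize(source: str, row: dict) -> PriceRecord:
    """Translate one raw row into the unified schema."""
    mapping = FIELD_MAPS[source]
    return PriceRecord(
        source=source,
        ticker=str(row[mapping["ticker"]]).strip().upper(),
        price=parse_price(row[mapping["price"]]),
        as_of=date.fromisoformat(str(row[mapping["as_of"]])),
    )


raw_rows = [
    ("site_a", {"symbol": "acme", "last_price": "1,204.50", "date": "2024-05-01"}),
    ("site_b", {"Ticker": " ACME ", "Px": "$1204.75", "AsOf": "2024-05-02"}),
]
unified = [normalize(source, row) for source, row in raw_rows]
```

Keeping the per-source differences in a mapping table rather than in code means adding a new source is mostly a configuration change, which matters once the number of sources reaches the hundreds.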
Investment firms strongly prefer to build in-house so they can protect proprietary strategies, comply with their privacy policies, and avoid commoditizing their insights. But because LLM innovation is moving so quickly, they need to think strategically about what to build vs. buy.
Building in-house makes sense for firms with:
Buying from vendors is appealing when firms want to:
Large investment firms have the resources to build anything they want. At the same time, LLMs are advancing so quickly that firms building everything in-house risk falling behind. A hybrid approach looks the most advantageous at this point, which looks like:
Whatever you choose to do first, it’s best to start now and stay on top of this technological wave.
Looking to dive deeper? Let's discuss your firm's web scraping strategy and LLM opportunities. Contact us here.