Build vs Buy: LLM Adoption for Web Scraping in Finance
Over the past year, we've spoken with 100+ data leaders at top investment firms (hedge funds, asset managers, private equity, and investment banks) about their web scraping operations and how they’re navigating LLM adoption.
Here is what we’ve learned and our thoughts on the build vs. buy decision.
Why Use LLMs for Web Scraping?
AI’s long-standing promise to solve major web scraping problems is now being realized with the current generation of LLMs.
Problem Solved | Business Outcome |
---|---|
Automated web scraping code generation and maintenance | • Cut scraper build time from days to minutes • Limit data loss with self-healing scrapers • Reduce number of engineers working full-time on scraper maintenance |
Agentic web navigation | • Source granular data from thousands of company websites • Scale extraction from data hidden behind complex browser interactions |
Unstructured data extraction (text blocks, PDFs, images, etc.) | • Unlock analysis of 10M+ unstructured documents • 95%+ accuracy in PDF data extraction |
Advanced data cleaning, mapping, and transformation | • 80%+ reduction in manual data cleaning time • Standardized outputs across hundreds of sources |
We know this first-hand based on what we’ve shipped to enterprise customers.
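As a rough illustration of the self-healing idea from the table above, the sketch below retries extraction with a newly proposed pattern when the stored one stops matching. It is a minimal sketch under stated assumptions, not a description of how any particular product works: `propose_pattern` is a hypothetical stand-in for an LLM call, and the regex-based extractor, the sample HTML, and the validation rule are all invented for the example.

```python
import re
from typing import Callable, Optional, Tuple


def extract(html: str, pattern: str) -> Optional[str]:
    """Return the first captured group, or None if the pattern fails."""
    try:
        match = re.search(pattern, html)
        return match.group(1) if match else None
    except (re.error, IndexError):
        # Invalid pattern, or a pattern without a capture group.
        return None


def self_healing_extract(
    html: str,
    pattern: str,
    propose_pattern: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> Tuple[Optional[str], str]:
    """Try the stored pattern; if it fails, request a new one and validate it.

    Returns (value, pattern_to_store). `propose_pattern` is a hypothetical
    stand-in for an LLM call that reads the page and suggests a new pattern.
    """
    value = extract(html, pattern)
    if value is not None and is_valid(value):
        return value, pattern

    candidate = propose_pattern(html)
    value = extract(html, candidate)
    if value is not None and is_valid(value):
        return value, candidate  # healed: keep the new pattern
    return None, pattern  # could not heal: leave for human review


def looks_like_price(text: str) -> bool:
    """Validation rule for this toy example: a positive number."""
    try:
        return float(text) > 0
    except ValueError:
        return False


def fake_llm(html: str) -> str:
    # Assumption: a real system would call a model here.
    return r'<span class="px">([\d.]+)</span>'


# Toy scenario: the site renamed its CSS class from "price" to "px".
old_pattern = r'<span class="price">([\d.]+)</span>'
new_page = '<div><span class="px">101.25</span></div>'

value, stored = self_healing_extract(new_page, old_pattern, fake_llm, looks_like_price)
print(value, stored)
```

The validation step matters as much as the regeneration step: a proposed pattern is only stored if its output passes the check, so a bad suggestion falls through to human review instead of silently corrupting the feed.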
So, how are investment firms exploring these new capabilities?
Current LLM Implementations
Every top investment firm we spoke with has an in-house web scraping team and also purchases web-scraped data. Many are experimenting with LLMs, either for web scraping or elsewhere. Finance firms are hungry for an edge and ready to invest in new technology to get it; LLMs are no exception.
Examples of how firms are trialing LLMs (excluding Kadoa):
Company | Implementation | Business Outcome |
---|---|---|
Bank | High-volume, zero-context, high-accuracy PDF extraction | 95%+ accuracy or they lose money |
Asset Manager | In-house GPTs (on-prem, trained on internal and external data) | Real-time access to company and market intelligence |
Prop Firm | Extract data from unstructured reports and filings | Unlock deeper insights from public documents |
Hedge Fund | In-house LLM-powered web scraping tool | Reduce the number of engineers working exclusively on web scraping |
Examples of how firms are using Kadoa:
Company | Implementation | Business Outcome |
---|---|---|
Hedge Fund | Automate building and maintaining traditional web scraping | Focus web scraping engineers on complex/critical scraping projects |
Asset Manager | Empower analysts to build web data feeds independently | Enable analysts to bypass data teams to source custom web data, cutting data acquisition from days to minutes |
Market Maker | Empower analysts to monitor strategic web pages in real-time | Enable analysts to act immediately on market-moving updates |
Trading Firm | Automate browser interactions and extract from unstructured reports (e.g., government, commodity) | Deeper, broader insight into public documents |
Hedge Fund | Aggregate hundreds of web sources into unified data structures | Save on expensive data provider costs and customize results |
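To make the last row above more concrete, aggregating sources into a unified data structure can be sketched roughly as follows. This is an illustrative sketch only, not Kadoa's implementation: the `Quote` schema, the source names, the raw field names, and `FIELD_MAPS` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Quote:
    """Unified record shape shared by every source (hypothetical schema)."""
    source: str
    ticker: str
    price: float


# Hypothetical per-source mappings from unified field names to raw field names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "site_a": {"ticker": "symbol", "price": "last"},
    "site_b": {"ticker": "Ticker", "price": "Price (USD)"},
}


def normalize(source: str, raw: Dict[str, Any]) -> Quote:
    """Map one raw scraped record onto the unified schema."""
    fields = FIELD_MAPS[source]
    return Quote(
        source=source,
        # Standardize ticker casing and whitespace.
        ticker=str(raw[fields["ticker"]]).strip().upper(),
        # Accept numbers or strings with thousands separators.
        price=float(str(raw[fields["price"]]).replace(",", "")),
    )


raw_records = [
    ("site_a", {"symbol": "abc", "last": 12.5}),
    ("site_b", {"Ticker": " XYZ ", "Price (USD)": "1,204.00"}),
]
unified: List[Quote] = [normalize(src, rec) for src, rec in raw_records]
print(unified)
```

Keeping the per-source differences in a mapping table rather than in code means that adding a source is mostly a configuration change, which is where automated mapping tends to save the most time.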
Build vs. Buy
Investment firms are obsessed with building things in-house to protect their secrets, comply with their privacy policies, and avoid any commoditization of their insights. But because LLM innovation is moving so quickly, they need to think strategically about what to build vs. buy.
Building in-house makes sense for firms with:
- Ready access to AI talent
- Highly custom requirements that a vendor cannot meet
- A long-term conviction that building in-house will give them an edge
Buying from vendors is appealing when firms want to:
- Rapidly adopt the latest technology
- Address more generalized needs
- Avoid reinventing the wheel without clear long-term benefits
Our Recommendation
Large investment firms have the resources to build anything they want. At the same time, LLMs are advancing so quickly that building everything in-house might leave them in the dust. A hybrid approach looks the most advantageous at this point:
- Find vendors that save time by unlocking bottlenecks in your web scraping operations, for example:
  - Tools for analysts
  - Automating manual operations
  - Better data quality
- Work closely with emerging vendors, guiding their roadmap to fit your specific needs
- Leverage LLMs for highly custom projects
- Gradually build in-house expertise
Whatever you choose to do first, it’s best to start now and stay on top of this technological wave.
Looking to dive deeper? Let's discuss your firm's web scraping strategy and LLM opportunities. Contact us here.