Build vs Buy: LLM Adoption for Web Scraping in Finance

Tavis Lochhead, Co-Founder of Kadoa
28 October 2024

Over the past year, we've spoken with more than 100 data leaders at top investment firms (hedge funds, asset managers, private equity firms, and investment banks) about their web scraping operations and how they're navigating LLM adoption.

Here is what we've learned, along with our thoughts on the build vs. buy decision.

Why Use LLMs for Web Scraping?

AI's long-standing promise to solve the major problems of web scraping is finally being realized as LLMs mature.

| Problem solved | Business outcome |
|---|---|
| Automated web scraping code generation and maintenance | Cut scraper build time from days to minutes; limit data loss with self-healing scrapers; reduce the number of engineers working full-time on scraper maintenance |
| Agentic web navigation | Source granular data from thousands of company websites; scale extraction of data hidden behind complex browser interactions |
| Unstructured data extraction (text blocks, PDFs, images, etc.) | Unlock analysis of 10M+ unstructured documents; 95%+ accuracy in PDF data extraction |
| Advanced data cleaning, mapping, and transformation | 80%+ reduction in manual data cleaning time; standardized outputs across hundreds of sources |

We know this first-hand from what we've shipped to enterprise customers.

So, how are investment firms exploring this new unlock?

Current LLM Implementations

Every top investment firm we spoke with has an in-house web scraping team and also buys web-scraped data. Many are experimenting with LLMs, whether for web scraping or for other use cases. Finance is the industry hungriest for an edge and the quickest to invest in new technology to get one; LLMs are no exception.

Examples of how firms are trialing LLMs on their own (i.e., not through Kadoa):

| Firm type | Implementation | Business outcome |
|---|---|---|
| Bank | High-volume, zero-context, high-accuracy PDF extraction | 95%+ accuracy, or they lose money |
| Asset manager | In-house GPTs (on-prem, trained on internal and external data) | Real-time access to company and market intelligence |
| Prop firm | Extract data from unstructured reports and filings | Unlock deeper insights from public documents |
| Hedge fund | In-house LLM-powered web scraping tool | Reduce the number of engineers working exclusively on web scraping |

Examples of how firms are using Kadoa:

| Firm type | Implementation | Business outcome |
|---|---|---|
| Hedge fund | Automate building and maintaining traditional web scrapers | Free web scraping engineers to focus on complex or critical scraping projects |
| Asset manager | Empower analysts to build web data feeds independently | Analysts source custom web data without going through data teams, cutting data acquisition from days to minutes |
| Market maker | Empower analysts to monitor strategic web pages in real time | Analysts can act immediately on market-moving updates |
| Trading firm | Automate browser interactions and extract data from unstructured reports (e.g., government and commodity reports) | Deeper, broader insight into public documents |
| Hedge fund | Aggregate hundreds of web sources into unified data structures | Save on expensive data provider costs and customize results |

Build vs. Buy

Investment firms are obsessed with building in-house to protect their secrets, comply with their privacy policies, and avoid commoditizing their insights. But because LLM innovation is moving so quickly, they need to think strategically about what to build and what to buy.

Building in-house makes sense for firms with:

  • Ready access to AI talent
  • Highly custom requirements that a vendor cannot meet
  • A long-term conviction that building in-house will give them an edge

Buying from vendors is appealing when firms want to:

  • Rapidly adopt the latest technology
  • Address more generalized needs
  • Avoid reinventing the wheel when there is no clear long-term benefit

Our Recommendation

Large investment firms have the resources to build anything they want. At the same time, LLMs are advancing so quickly that building everything in-house risks leaving them behind. For now, a hybrid approach looks the most advantageous:

  • Find vendors that save time by removing bottlenecks in your web scraping operations, for example:
    • Self-service tools for analysts
    • Automation of manual operations
    • Better data quality
  • Work closely with emerging vendors, guiding their roadmap to fit your specific needs
  • Leverage LLMs for highly custom projects
  • Gradually build in-house expertise

Whatever you choose to do first, it’s best to start now and stay on top of this technological wave.

Looking to dive deeper? Let's discuss your firm's web scraping strategy and LLM opportunities. Contact us here.