AI in production

Building Agents, Tackling RAG, and Finding Early Adopters

Jake Jones and Richard Mabey unpack the challenges of building autonomous legal agents, the limitations of current technologies, and the mindset shifts driving innovation in this masterclass on product thinking.

Lilian Breidenbach

11 Dec 2024 • 7 min read

Photo by Thomas Le / Unsplash

In this second part of the series, Jake Jones (Co-Founder at Flank) and Juro's Richard Mabey continue their deep dive into the technologies and challenges of automating legal work: They discuss legal-specific vs. open-source LLMs, whether RAG works, the critical importance of early adopters, product orthodoxies vs. realities, and the distinguishing features of AI agents.

Part Two: Legal-Specific LLMs – RAG – Early Adopters – Product Thinking – AI Agents

Legal-Specific vs. Open-Source LLMs

Richard Mabey: Let’s get a bit deeper into the tech. There’s a lot of writing about legal-specific LLMs and how they’re better, faster, and more accurate. We’ve never gone down that route; we’re using APIs of open-source LLMs built into our stack in different ways. I’m curious how you’ve approached that subject. Do you think there’s a competitive advantage there?

Jake Jones: For today, there’s no advantage in using a domain-specific LLM over something like GPT-4. Let’s take a task that a lawyer wants to outsource to an agent; how much of that is actually legal-specific work? How much knowledge of the law and legal logic does the lawyer leverage when they’re completing the task? That’s an open question. I would argue, sometimes very little to none, and most of these tasks have multiple steps. So even if part of it requires legal logic, it’s only a part of it, in which case you’d want a general LLM, which is better at understanding what it’s looking at, rather than the specific legal context.

The other thing is that the foundation models are evolving very, very quickly. From what I’ve seen in the market, as soon as there’s a domain-specific LLM, it gets outstripped in performance weeks later by a foundation model. But it’s very expensive to train a specific LLM. You need lots of data, you need data scientists, and you need to curate that data.

Maybe that step will become more commoditized, so training becomes easier. Or maybe someone like Thomson Reuters, who I know are working on their own domain-specific LLMs, will be able to do something with the massive amount of data they have, something that will make it useful for companies like Flank and Juro, who are building applications, to leverage those legal-specific LLMs. But for today, there’s no advantage in using a domain-specific LLM.

Richard Mabey: What are the other failure points you see in the market? Where do you see the common pitfalls from a product perspective?

RAG Does Not Work

Jake Jones: One of the biggest pitfalls is this arguably exciting idea that, by leveraging LLMs, we can build robust software and tools within days; that is not accurate. You can get 80% of the way there in a couple of days, but then it requires the same product work we saw in the previous paradigm, where you need to spend a lot of time with your customer, tweaking it, tinkering with it. This long tail work is the difference between an immature product without product/market fit, and a mature one you can deploy at scale.

The second major pitfall is thinking that Retrieval-Augmented Generation works; it doesn’t. It’s a bit of a party trick, and the problem is not the limitation of the LLMs. The LLMs get smarter and smarter, and all of the downstream steps in a RAG orchestration get better and better, but vector databases and vector retrieval are really poor. Even if you augment this with knowledge graphs, even if you do a lot of other clever stuff, you’re going to get a system that is highly probabilistic and inaccurate, so you have to find another way.

The Critical Importance Of Early Adopters

Richard Mabey: It comes back to focus, right? In many respects, the product paradigm has not really changed; if you build a highly focused solution, you have a much higher chance of building a comprehensive one.

You’ve obviously worked with a lot of lawyers, and I’m always curious about other people’s perceptions of their psyche–painfully detail-oriented, world-class at picking holes in things, but often their mindset is quite comprehensive. Juro, which is an intelligent contract automation platform, gets thousands of feature requests for every possible thing you can think of. And we’re saying: thank you, but we’ve got to focus on what we’re good at. But lawyers have been great at picking out this stuff.

I’m curious about how you found those first customers for the second coming of your company. How did you find these innovators, and what enabled them to take on the risk of an early-stage company doing something amazing?

Jake Jones: That’s a really great question. I think every company is so reliant upon early customers to take a gamble on you and bet on new technology. We were very lucky that TravelPerk approached us in late 2022; they’d seen some of our LinkedIn posts about the problem space we’re solving and wanted to dig deeper. They’re a very innovative legal team, always looking to leverage the latest technology, the latest methodologies, and the latest way of thinking about legal ops to empower commercial teams, so they were pushing us to solve this problem for them. We went really deep with them once ChatGPT was released and we pivoted into AI.

What was key for us was that our first four or five customers agreed that the problem we wanted to solve, the bottleneck between commercial and legal teams, was their biggest problem. The product work becomes a lot more straightforward after that, because you’re aligned on it: this is the problem we’re solving. It’s very constructive from there, because you have a roadmap, driven by trial and error, for solving this problem collaboratively.

Product Orthodoxies vs. Realities

Richard Mabey: A lot of the product orthodoxy would say you shouldn’t build for a handful of customers; you should triangulate the feedback. But in those early days you learn so much more from a handful of individuals and going super deeply into what they’re doing. It was the same for us. In the early days, customers like Deliveroo were super forward-looking and had very specific problems they wanted to solve. They take a leap and, if it works, get rewarded for it.

What I observed in those early adopters is that they don’t spend a lot of time thinking about the efficiency of the legal function or trying to do everything 10 or 20% faster; they’re thinking about the end user of their service, and about the NPS the sales team would give their service delivery. I’ve found that it’s fairly rare, but when you find those gems, the innovation is just so exciting.

Jake Jones: I’ve absolutely had the same experience. The legal teams most willing to adopt software early are always looking to improve the experience of their internal customer, which is terminology I’d never heard before speaking to Andy Cooke from TravelPerk. The first few times it threw me off, because I thought they’re not customers, but they are; they’re the consumers of a service you’re providing as an in-house legal team. That frame changes the in-house legal team from being the provider of a service to being the provider of a product, and that’s where both you and I come in as product partners. The most enjoyable part of developing our product was finding lawyers with a product mindset. You asked earlier about what it’s like working with lawyers, and maybe I don’t really know, because maybe the people adopting AI agents aren’t typical lawyers.

Richard Mabey: It’s funny, we obviously found a lot of these early innovators in fast-scaling tech companies; but as we’ve scaled ourselves, we’ve found more and more of these gems in super traditional businesses, making things like tires, fire retardant, or clothing. They’re also really frustrated and curious about how to solve their problems. I think legal people can learn a lot from product people. It’s the same in other functions, but it’s just starting in legal, which is really exciting.

Please Tell Me What An Agent Is

Richard Mabey: Let’s talk about agents. I recently had lunch with 30 customers and asked for a show of hands of who knows what an AI agent is; it was three out of 30, and our customers are very forward-looking. What is an AI agent and why should people care?

Jake Jones: The distinction between an agent and the AI that people use every day (ChatGPT or the Spellbooks of the world, which are still AI-first solutions but are used by experts to be more efficient) is autonomy. There are many ways of defining an agent, but for now let’s define it through that one dimension of autonomy. A fully autonomous AI tool, in this case a legal agent, can receive a request from a non-expert and complete it from end to end, without any interference from the legal team. Legal agents can do what previously a paralegal had to do.

In the future, we’ll begin to see AI agents with greater autonomy and a wider range of capabilities. That’s the second dimension that’s probably important to include; it can’t just do one very simple, pragmatic thing, but it can do a range of different tasks. It becomes increasingly human-like in its ability and autonomy.

In Part Three, Jake and Richard confront what the automation of legal work means for us as humans: they discuss the changing role of senior lawyers, protecting the human quality of legal services, the challenges of scaling into large enterprises, and how Jake would lead an in-house legal team in 2025.

Find out more about Flank's AI Agents here.

🎧 Listen to the full conversation on Episode 8 of Juro's Brief Encounters here.