A Practitioner's Guide to Retrieval Augmented…

Feb 5, 2024

How basic techniques can be used to build powerful applications with LLMs...

4 Comments

Feb 5, 2024

Such a great write-up. The more accessible RAG is, the more widespread the adoption of LLMs becomes. For example, chunking and preprocessing, which remain manual steps that many aren't experienced with, can be semi-automated based on the task at hand. A superior chunking method can lead to a double-digit increase in performance.

Cameron R. Wolfe, Ph.D.

Feb 5, 2024

Totally agree! We are already seeing some really great analysis on how to do things like chunking and data cleaning much better (for example, see the link to the Databricks RAG article in the further resources). Hopefully this trend continues.
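
For readers new to the chunking step discussed in this thread, here is a minimal sketch of one common baseline: fixed-size word windows with overlap. The function name `chunk_words` and the default sizes are assumptions chosen for illustration. They do not come from the article or the Databricks write-up, and a task-specific chunking method may behave quite differently.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of at most `chunk_size` words.

    Illustrative baseline only: sizes are counted in whitespace-separated
    words, not model tokens, and the defaults are arbitrary assumptions.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so we do not emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.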

Phil

May 31, 2024

Thank you so much for this deep dive, very informative! I'm a UX researcher who's new to conversational AI design (not an ML practitioner, so please bear with me!), and RAG has obviously become immensely important for our LLM-powered voice assistant and chatbot. I was wondering: when it comes to preprocessing data and curating a knowledge base for RAG, do the data for the external knowledge base need to be formatted/"cleaned" in a specific way? For example, is it sufficient to simply use links to a website, with embedded PDFs/documents then automatically included in the knowledge base, or would one need to extract and separately add the contents of the PDFs/documents to the database?

Cameron R. Wolfe, Ph.D.

May 31, 2024

You need to extract the raw textual data and add it into the database!
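
As a rough illustration of what "extract the raw textual data" can look like for a web page, the sketch below strips HTML markup using only Python's standard library and stores the resulting plain text. The names `TextExtractor`, `html_to_text`, and `add_document` are hypothetical, and the plain dict stands in for a real database. PDFs would need a separate PDF parsing library, which is not shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def add_document(store, doc_id, html):
    """Extract raw text and save it under doc_id.

    `store` is a plain dict used as a stand-in for a real knowledge base.
    """
    text = html_to_text(html)
    store[doc_id] = text
    return text
```

In a full pipeline the extracted text would then be chunked and embedded before being written to the knowledge base. The point of the sketch is only that the database receives plain text, not links or raw files.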

Deep (Learning) Focus
