9 Comments
Feb 1 · Liked by Cameron R. Wolfe, Ph.D.

Hey Cameron, love the depth of the article! I have a question for you regarding the retrieval and fine-tuning article: what are your thoughts on OpenAI releasing the ability for users to build custom GPTs? Does that do away with fine-tuning? Have you seen or tested their effectiveness? Thank you!

Feb 1 · Liked by Cameron R. Wolfe, Ph.D.

How do you know there is not BEIR contamination and generally data contamination in the synthetic data generated by GPT-4?

The fine-tuned LLM (in your last/second to last paper) that used a mixture of synthetic data and other data that ended up beating some BEIR benchmark was surprising, and I’m wondering if that is a fair benchmark.

Jan 22 · edited Jan 22 · Liked by Cameron R. Wolfe, Ph.D.

> The first step in applying DocLLM is to pass a document through an optical character recognition (ORC) system.

Do you have any suggestions for a reliable open-source OCR system?

btw there is a typo in your article

Jan 22 · Liked by Cameron R. Wolfe, Ph.D.

Another notable article to keep me abreast of what's going on in the field. Thanks a lot Cameron!
