Hey Cameron, love the depth of the article! I have a question for you regarding the retrieval and fine-tuning article: what are your thoughts on OpenAI releasing the ability for users to build custom GPTs? Does that do away with fine-tuning? Have you seen or tested their effectiveness? Thank you!
How do you know there is no BEIR contamination, or data contamination more generally, in the synthetic data generated by GPT-4?
The fine-tuned LLM (in the last/second-to-last paper you covered) that was trained on a mixture of synthetic and other data and ended up beating some BEIR benchmarks was surprising, and I’m wondering whether that is a fair benchmark.
> The first step in applying DocLLM is to pass a document through an optical character recognition (ORC) system.
Do you have any suggestions for a reliable open-source OCR system?
By the way, there is a typo in the article: "ORC" should be "OCR".
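For reference, here is a minimal sketch of what that first OCR step might look like with an open-source engine such as Tesseract (via the pytesseract wrapper). The choice of engine and the `page.png` file name are just placeholders on my end, not something the article prescribes:

```python
from PIL import Image
import pytesseract  # assumes the Tesseract binary is installed locally

# Hypothetical input: a single document page rendered as an image.
image = Image.open("page.png")

# Plain text extraction.
text = pytesseract.image_to_string(image)

# Word-level tokens with bounding boxes, i.e. the kind of layout
# information a DocLLM-style model consumes alongside the text.
data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
for word, left, top, width, height in zip(
    data["text"], data["left"], data["top"], data["width"], data["height"]
):
    if word.strip():
        print(word, (left, top, left + width, top + height))
```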
Another notable article to keep me abreast of what's going on in the field. Thanks a lot, Cameron!