Entirely off topic, Cameron, please excuse me. If you have covered this elsewhere, perhaps you could send me the link: thank you. I was surprised when ChatGPT fabricated both fully written-out references (authors, title, journal, pages, year) and links to journal articles (such as a link to a Nature Communications article that did not resolve, because the article did not exist). My nephew explained that folks who build or study LLMs apparently call this "hallucinating." Can you comment on this? Are future versions going to eliminate it? It was frankly amazing to me how quickly it could fabricate references (in my particular case, about studies on the triple point of water). It repeated the same pattern each time I responded that the two references it had provided were not real: it would apologize, generate another two false references, and politely explain that these were correct. After about 10 cycles of this, and thus 20 false references, I realized that it could do this literally ad infinitum. Welcoming feedback on this, thank you.
I really enjoyed this edition and look forward to the next part of the series, Cameron. I've been meaning to learn more about this, so your newsletter's timing could not be better :)
Awesome! I'm glad that it's helpful.
And...there was no way that I could see to prompt it (now I do come close to your topic here) so that it would not fabricate references...
I linked to some info about this from a prior post, but it's probably not clear. Here's the link: https://cameronrwolfe.substack.com/i/93578656/training-language-models-to-follow-instructions-with-human-feedback
Basically, hallucination is usually not something that you solve via prompting. Instead, it is addressed during the instruction tuning (or RLHF) phase of training the LLM, the refinement period that comes after pre-training. We can have humans provide negative feedback when the model hallucinates so that the model learns not to do this. Additionally, we can use techniques like information retrieval, which pull data/resources from something like a database or the internet so that the model can use them as context and base the facts it provides on those resources.
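To make the retrieval idea a bit more concrete, here is a minimal sketch of that kind of retrieval-augmented setup. The `search_documents` and `complete` functions are hypothetical placeholders for whatever retrieval backend and LLM API you actually use; the point is just the pattern of fetching real passages first and instructing the model to answer only from them.

```python
# A minimal sketch of retrieval-augmented prompting (hypothetical helpers).
# search_documents() and complete() are placeholders for whatever retrieval
# backend and LLM API you actually use; only the overall pattern matters.

def search_documents(query: str, k: int = 3) -> list[str]:
    """Placeholder: return the top-k relevant passages from a database or search index."""
    raise NotImplementedError("plug in your own retrieval backend")


def complete(prompt: str) -> str:
    """Placeholder: send the prompt to whatever LLM endpoint you are using."""
    raise NotImplementedError("plug in your own LLM API call")


def grounded_answer(question: str) -> str:
    # 1) Retrieve real passages so the model has actual sources to work from.
    passages = search_documents(question)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))

    # 2) Ask the model to answer only from the retrieved context and to cite
    #    passage numbers, rather than relying on whatever it memorized in pre-training.
    prompt = (
        "Answer the question using only the passages below. "
        "Cite passage numbers, and say you don't know if the answer "
        "is not contained in them.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return complete(prompt)
```

Grounding the prompt this way does not eliminate hallucination, but it gives the model real sources to quote (and cite) instead of leaving it to invent them.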
In general, hallucination is a big problem for LLMs that is still actively being worked on. We should be aware of this and not assume correctness when we are using something like ChatGPT!