AI alignment
How to train your large language model
A new technique is speeding up the process
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
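The training loop described above can be made concrete with a minimal sketch, assuming PyTorch: a toy model (a single embedding layer plus a linear layer, far simpler than a real transformer) is shown scores for every possible next token, is penalised when its guess is wrong, and nudges its weights slightly after each batch. The vocabulary size, model dimensions and random "corpus" are illustrative assumptions, not any real system's setup.

```python
# Minimal sketch of next-word prediction, the objective used in LLM pretraining.
# Toy sizes and random data are placeholders; a real LLM uses a transformer
# trained on billions of real tokens.
import torch
import torch.nn as nn

vocab_size, embed_dim = 50, 16            # toy vocabulary and model size
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # map each token id to a vector
    nn.Linear(embed_dim, vocab_size),     # score every candidate for the next token
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Pretend corpus: a batch of token-id sequences.
tokens = torch.randint(0, vocab_size, (8, 21))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # target at each position is the *next* token

for step in range(100):
    logits = model(inputs)                                 # (batch, seq, vocab) scores
    loss = loss_fn(logits.reshape(-1, vocab_size),
                   targets.reshape(-1))                    # penalty for wrong guesses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                       # small adjustment toward better guesses
```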
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.