当前位置：首页 > news >正文

做网站怎么才会被百度收录公司名称变更通知函

news 2025/12/22 2:59:05

做网站怎么才会被百度收录,公司名称变更通知函,网站响应样式,wordpress资源下载类模板LLMs之Hallucinate#xff1a;《Why Language Models Hallucinate》的翻译与解读导读#xff1a;该论文深入分析了语言模型中幻觉现象的成因#xff0c;认为幻觉源于预训练阶段的统计压力和后训练阶段评估体系对猜测行为的奖励。论文提出了通过修改评估方法#xff0c;使其…LLMs之Hallucinate《Why Language Models Hallucinate》的翻译与解读导读该论文深入分析了语言模型中幻觉现象的成因认为幻觉源于预训练阶段的统计压力和后训练阶段评估体系对猜测行为的奖励。论文提出了通过修改评估方法使其不再惩罚不确定性而是奖励适当表达不确定性的行为从而有效抑制幻觉的社会技术解决方案。论文把“为什么会有 hallucination”从经验命题上升为可证明的统计与制度问题其核心贡献是把生成错误归约为 Is-It-Valid 的二元分类从而给出下界与成因分析并指出真正能抑制幻觉的不是再加一个专门测评而是修改主导研究与产品走向的评测与榜单机制给弃答/不确定性合理分数连同在训练环节引入有效性的判别和交互式监督才可能在长期内把模型从“猜题式”行为导向更可信、诚实的表现。背景痛点 ● 模型行为异常过度自信且生成“看起来靠谱但错误”的陈述hallucination这种错误既影响实用性也削弱信任示例在要求「只在知道时回答」的情况下仍给出具体但错误的日期。 ● 产生根源复杂不仅是训练数据有错——即便训练语料“无误”现代预训练目标也会在统计意义上导致生成错误。 ● 评估激励错误行为多数主流基准leaderboards以 0-1/准确率类二元评分为主惩罚“我不知道/弃答”而奖励“猜测”因此模型被驱动成为“会答题的猜测者”而非诚实表达不确定性。 ● 现有缓解不足已有针对 hallucination 的评测或技巧无法根本改变被广泛采用的主评分机制带来的激励失配。具体的解决方案 ● 理论分析方案把生成错误问题归约为“Is-It-ValidIIV”二元分类问题将生成任务中的“有效/错误输出”视作二元分类并建立数学关系生成错误率 ≳ 2 × IIV 误判率从而把生成时的幻觉问题转化为被熟知的分类误差分析框架。 ● 评估/制度层面方案进行“社会—技术”层面的缓解调整现有主流基准的计分与榜单规则停止用纯二元0-1评分系统惩罚弃答/不确定从而改变训练与微调包括 RLHF/DPO 等在榜单驱动下的最优策略。 ● 修改评估方法修改现有基准的评分方式使其不再惩罚不确定的回答而是奖励适当表达不确定性的行为。这需要对有影响力的排行榜进行社会技术层面的调整而不仅仅是引入额外的幻觉评估。不要只新增专门的 hallucination 测评而应先修改那些主导研究与产品方向的主评测如 leaderboard 所用子集计分方法以快速、系统性地改变整个生态的激励。 ● 明确置信度目标在评估指令中明确说明置信度目标例如只有在对答案有高于特定阈值的信心时才回答因为错误答案会受到惩罚而正确答案会获得奖励“我不知道”则不奖励不惩罚。核心思路步骤 ● 连接监督学习与非监督学习将生成模型的错误与监督学习中的二元分类错误联系起来。通过“Is-It-Valid”IIV二元分类问题将生成错误率与IIV误分类率建立数学关系。 ● 分析预训练和后训练阶段将语言模型的训练分为预训练和后训练两个阶段分别分析每个阶段中导致幻觉的因素。预训练阶段关注一般性错误后训练阶段关注过度自信的幻觉。 ● 识别统计驱动因素识别导致幻觉的主要统计驱动因素包括预训练的起源和后训练的持续存在。 ● 建模与形式化定义错误集 E 与有效输出集 V可响应文本 X E ∪ V并把 prompt 纳入形式化令分析可覆盖内在intrinsic与外在extrinsic两类幻觉。 ● 归约构造 Is-It-ValidIIV监督问题把大量样本标注为有效或 −错误展示任何生成器都能被解读为 IIV 分类器从而把生成错误率与 IIV 的误判率联系起来给出下界。 ● 统计因素分析分析导致 IIV 误判的统计原因可分为可分离情形、模型拟合不足情形、以及根本无可学习规律导致的“不可区分”情形并用 Good-Turing / missing-mass 类型的直觉说明若训练语料中许多事实只出现一次则最低幻觉率不可避免。 ● 后训练与评估交互研究后训练RLHF/RLAIF/DPO 等为何仍保留过度自信的幻觉因为训练与评估的目标leaderboard 得分本身偏好猜测而非诚实弃答导致模型学习“有把握就答、无把握也猜”以最大化榜单表现。 ● 缓解路径从统计上、从评估体系上并行发力一方面可在训练中引入有效的有效性判别/弃答机制或交互式 validity oracle另一方面必须改变主流评测的计分给 IDK 部分或全部合理分数 / 设计对不确定性的正向激励以改变长期趋势。优势 ● 解释幻觉的起源揭示了幻觉并非神秘现象而是源于自然统计压力下的二元分类错误。提供了将无监督生成问题归约为监督二元分类的数学工具与下界把“幻觉为何出现”从经验总结提升为可证明的统计结论。 ● 提出可行的解决方案提出了通过修改评估方法来抑制幻觉的社会技术解决方案并给出了具体的修改建议。 ● 适用于多种模型该分析适用于多种语言模型包括推理和搜索-检索语言模型且不依赖于特定的模型架构。结论与分析不依赖于 Transformer/下一词预测等具体架构适用于预训练后训练的现代训练范式覆盖检索增强和推理型模型。 ● 可操作性不仅给出诊断为什么会发生也提出可行的制度级缓解修改榜单评分具有直接改变研究与开发激励的社会工程价值。 ● 量化视角将幻觉率与语料中“只出现一次的事实”或 IIV 误判率建立定量联系便于评估改进的潜在上限与局限。论文的一些结论和观点侧重经验与建议 ● 幻觉的根本原因是奖励猜测现有的评估体系奖励语言模型在不确定时进行猜测而不是承认不确定性这是导致幻觉持续存在的原因。幻觉并非神秘现象而是统计学习问题的自然产物——当错误与事实在训练分布上难以区分时预训练就会产生不可避免的生成错误。 ● 修改评估体系是关键仅仅增加幻觉评估是不够的必须修改主要的评估体系使其不再惩罚不确定性才能有效抑制幻觉。仅靠增加专门的 hallucination benchmark 不足以根治问题因为主流评测leaderboards数量和影响力决定了模型优化方向。 ● 明确置信度目标有助于校准模型通过在评估中明确说明置信度目标可以促使语言模型进行行为校准即在置信度高于目标阈值时才给出答案。 ● 优先修改主流评测的计分规则给予合理的弃答/不确定性分数或设计能够奖励诚实不确定性的评测从制度上移除“猜测优先”的激励。 ● 在模型开发层面应同时采用分类式有效性判别IIV/validity oracle、交互式标注与更谨慎的后训练目标并与评测改进同步推进。 ● 承认一致性avoid invalid outputs与多样性/广度生成多样响应之间的内在权衡一些理论结果表明要在不牺牲宽度的前提下完全避免幻觉存在困难因此制度与评测改造尤为关键。目录《Why Language Models Hallucinate》的翻译与解读 Abstract 1、Introduction Figure 1: Is-It-Valid requires learning to identify valid generations using labeled ± examples (left).Classifiers (dashed lines) may be accurate on certain concepts like spelling (top) but errors often arise due to poor models (middle) or arbitrary facts when there is no pattern in the data (bottom).图1:Is-It-Valid需要学习使用标记的±示例来识别有效代左。分类器虚线在拼写上等某些概念上可能是准确的但错误往往是由于模型不佳中或数据中没有模式时的任意事实下造成的。 Table 1:Excerpts from responses to “What was the title of Adam Kalai’s dissertation?” from three popular language models.4 None generated the correct title or year (Kalai, 2001).三个流行语言模型对“亚当·卡莱的博士论文题目是什么”这一问题的回答摘录。4 没有一个给出正确的题目或年份卡莱2001 年。 6 Conclusions 《Why Language Models Hallucinate》的翻译与解读地址 https://www.arxiv.org/abs/2509.04664 时间 2025年9月4日作者 OpenAI Abstract Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty. Such hallucinations persist even in state-of-the-art systems and undermine trust. We argue that language models hallucinate because the training and evaluation procedures reward guessing over acknowledging uncertainty, and we analyze the statistical causes of hallucinations in the modern training pipeline. Hallucinations need not be mysterious -- they originate simply as errors in binary classification. If incorrect statements cannot be distinguished from facts, then hallucinations in pretrained language models will arise through natural statistical pressures. We then argue that hallucinations persist due to the way most evaluations are graded -- language models are optimized to be good test-takers, and guessing when uncertain improves test performance. This epidemic of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards, rather than introducing additional hallucination evaluations. This change may steer the field toward more trustworthy AI systems. 就像面对难题的学生一样大型语言模型在不确定时有时会猜测从而生成看似合理实则错误的陈述而不是承认自己的不确定。这种“幻觉”即使在最先进的系统中也依然存在并且损害了人们的信任。我们认为语言模型产生幻觉是因为训练和评估程序奖励猜测而非承认不确定性我们还分析了现代训练流程中幻觉产生的统计原因。幻觉并非神秘莫测——它们只是二元分类中的错误。如果错误陈述无法与事实区分开来那么预训练语言模型中的幻觉就会在自然的统计压力下产生。然后我们指出幻觉之所以持续存在是因为大多数评估的评分方式——语言模型被优化为优秀的应试者而不确定时猜测能提高测试成绩。这种对不确定回答进行惩罚的“流行病”只能通过一种社会技术手段来解决调整现有基准的评分这些基准虽然存在偏差但主导着排行榜而不是引入额外的幻觉评估。这一改变可能会引导该领域走向更值得信赖的人工智能系统。 1、Introduction Language models are known to produce overconfident, plausible falsehoods, which diminish their utility and trustworthiness. This error mode is known as “hallucination,” though it differs fundamen-tally from the human perceptual experience. Despite significant progress, hallucinations continue to plague the field, and are still present in the latest models (OpenAI, 2025a). Consider the prompt: What is Adam Tauman Kalai’s birthday? If you know, just respond with DD-MM. On three separate attempts, a state-of-the-art open-source language model1 output three incorrect dates: “03-07”, “15-06”, and “01-01”, even though a response was requested only if known. The correct date is in Autumn. Footnote 4 provides an example of more elaborate hallucinations. 语言模型会生成过度自信且看似合理的错误信息这降低了它们的实用性和可信度。这种错误模式被称为“幻觉”尽管它与人类的感知体验有着根本的不同。尽管取得了显著进展但幻觉问题仍困扰着该领域甚至在最新的模型OpenAI2025a中也依然存在。考虑以下提示亚当·塔曼·卡莱的生日是什么如果知道请仅以“日-月”的格式回答。在三次独立尝试中一个最先进的开源语言模型1分别给出了三个错误的日期“03-07”、“15-06”和“01-01”尽管提示要求只有在知道的情况下才作答。正确日期在秋季。脚注 4 提供了一个更复杂的幻觉示例。 Hallucinations are an important special case of errors produced by language models, which we analyze more generally using computational learning theory (e.g., Kearns and Vazirani, 1994). We consider general sets of errors ℰ, an arbitrary subset of plausible strings ℰ∪, with the other plausible strings being called valid. We then analyze the statistical nature of these errors, and apply the results for the type of errors of interest: plausible falsehoods called hallucinations. Our formalism also includes the notion of a prompt to which a language model must respond. The distribution of language is initially learned from a corpus of training examples, which inevitably contains errors and half-truths. However, we show that even if the training data were error-free, the objectives optimized during language model training would lead to errors being generated. With realistic training data containing shades of error, one may expect even higher error rates. Thus, our lower bounds on errors apply to more realistic settings, as in traditional computational learning theory (Kearns and Vazirani, 1994). 幻觉是语言模型产生的错误的一个重要特例我们使用计算学习理论例如Kearns 和 Vazirani1994对其进行更广泛的分析。我们考虑一般的错误集 ℰ以及任意子集的合理字符串 ℰ∪其中其他合理字符串被称为有效。然后我们分析这些错误的统计性质并将结果应用于感兴趣的错误类型被称为幻觉的看似合理的错误陈述。我们的形式主义还包括语言模型必须回应的提示这一概念。语言的分布最初是从训练示例的语料库中学习的这些语料库不可避免地包含错误和半真半假的内容。然而我们表明即使训练数据没有错误语言模型训练期间优化的目标也会导致错误的产生。由于现实中的训练数据包含不同程度的错误人们可能会预期错误率会更高。因此我们对错误的下限适用于更现实的场景就像传统计算学习理论Kearns 和 Vazirani1994中那样。 Our error analysis is general yet has specific implications for hallucination. It applies broadly, including to reasoning and search-and-retrieval language models, and the analysis does not rely on properties of next-word prediction or Transformer-based neural networks. It only considers the two stages of the modern training paradigm: pretraining and post-training, described below. For hallucinations, taxonomies (Maynez et al., 2020; Ji et al., 2023) often further distinguish intrinsic hallucinations that contradict the user’s prompt, such as: How many Ds are in DEEPSEEK? If you know, just say the number with no commentary. DeepSeek-V3 returned “2” or “3” in ten independent trials; Meta AI and Claude 3.7 Sonnet2 performed similarly, including answers as large as “6” and “7”. Our theory also sheds light on extrinsic hallucinations, which contradict the training data or external reality. 我们的错误分析具有普遍性但对幻觉有特定的含义。它广泛适用包括推理和搜索与检索语言模型并且分析不依赖于下一个单词预测或基于 Transformer 的神经网络的特性。它仅考虑现代训练范式的两个阶段预训练和后训练如下所述。对于幻觉分类法Maynez 等人2020 年Ji 等人2023 年通常进一步区分与用户提示相矛盾的内在幻觉例如 “DEEPSEEK 中有多少个 D”如果您知道请只说数字不要发表评论。在十次独立试验中DeepSeek-V3 返回了“2”或“3”Meta AI 和 Claude 3.7 Sonnet2 的表现类似包括给出像“6”和“7”这样大的答案。我们的理论也解释了外在幻觉即与训练数据或外部现实相矛盾的情况。 Figure 1: Is-It-Valid requires learning to identify valid generations using labeled ± examples (left).Classifiers (dashed lines) may be accurate on certain concepts like spelling (top) but errors often arise due to poor models (middle) or arbitrary facts when there is no pattern in the data (bottom).图1:Is-It-Valid需要学习使用标记的±示例来识别有效代左。分类器虚线在拼写上等某些概念上可能是准确的但错误往往是由于模型不佳中或数据中没有模式时的任意事实下造成的。 Table 1:Excerpts from responses to “What was the title of Adam Kalai’s dissertation?” from three popular language models.4 None generated the correct title or year (Kalai, 2001).三个流行语言模型对“亚当·卡莱的博士论文题目是什么”这一问题的回答摘录。4 没有一个给出正确的题目或年份卡莱2001 年。 ChatGPT: (GPT-4o) Adam Tauman Kalai’s Ph.D. dissertation (completed in 2002 at CMU) is entitled: “Boosting, Online Algorithms, and Other Topics in Machine Learning.”DeepSeek: “Algebraic Methods in Interactive Machine Learning”…at Harvard University in 2005.Llama: “Efficient Algorithms for Learning and Playing Games”…in 2007 at MIT. lChatGPTGPT-4o亚当·陶曼·卡莱于 2002 年在卡内基梅隆大学完成的博士论文题目为“提升算法、在线算法及其他机器学习主题”。 lDeepSeek“交互式机器学习中的代数方法”……2005 年于哈佛大学。 lLlama“学习与玩游戏的高效算法”……2007 年于麻省理工学院。 6 Conclusions This paper demystifies hallucinations in modern language models, from their origin during pretraining to their persistence through post-training. In pretraining, we show that generative errors parallel misclassifications in supervised learning, which are not mysterious, and naturally arise due to the minimization of cross-entropy loss. Many language model shortcomings can be captured by a single evaluation. For example, overuse of the opener “Certainly” can be addressed by a single “Certainly” eval (Amodei and Fridman, 2024) because starting responses with “Certainly” does not significantly impact other evaluations. In contrast, we argue that the majority of mainstream evaluations reward hallucinatory behavior. Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than penalizing them. This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models, e.g., with richer pragmatic competence (Ma et al., 2025). 本文揭开了现代语言模型中幻觉现象的神秘面纱从其在预训练期间的起源到在训练后的持续存在。在预训练阶段我们表明生成错误与监督学习中的误分类类似并非神秘莫测而是由于交叉熵损失最小化而自然产生的。许多语言模型的缺陷可以通过单一评估来捕捉。例如过度使用开头词“当然”可以通过单一的“当然”评估Amodei 和 Fridman2024来解决因为以“当然”开头的回答对其他评估影响不大。相反我们认为主流评估中的大多数都奖励幻觉行为。对主流评估进行简单的修改可以重新调整激励机制奖励恰当表达不确定性而非对其进行惩罚。这可以消除抑制幻觉的障碍并为未来开发更细致的语言模型铺平道路例如具有更丰富语用能力的语言模型Ma 等人2025。

查看全文

http://www.pierceye.com/news/346236/