GenAI – C. Cui's Blog

仙侠演义下的人工智能大道之争

序言：混沌初开，算力为尊

鸿蒙未分，智道无光。自昔年 Transformer 横空出世，寰宇震颤。短短数载，灵气数据呈爆炸之势，灵石算力已成万宗争夺之源。昔日之代码，化作今时之法术；往昔之架构，演为今朝之阵图。

仙历二零二六，乃是智道修行的大道之争。世人皆称 AGI（通用人工智能）为“大罗金仙”，意指其无所不知、无所不能。然而，修真之途，从来没有唯一之径。当文字的概率游戏玩到了极致，被推向造化之巅，各大宗门都在叩问苍穹：我们触碰到的究竟是通往真理的通天梯，还是由概率幻化出的一场精美蜃楼？

是以力证道，靠堆砌万亿灵石强行飞升？还是道法自然，感悟因果律法重塑乾坤？

诸位道友，且看下文，观东西方诸神斗法，探寻通用人工智能/大模型（AGI/LLM）背后的终极天道。

第一章：以力证道极缩放，破虚指物理常识

在这场大劫的源头，极西之地的气象最为宏大。两大顶级宗门并非仅在比拼内力，而是在进行一场关于“智能本质”是什么的大道之争。其中凶险，不足以外人道也。

1. 奥派开天宗（OpenAI）

【无上规模 ·以力证道】宗主奥特曼 坚信“大力出奇迹”，在历经 GPT-1234 的辉煌后，再度祭出震动寰宇的镇派绝学——o系列神功（代号：草莓/o1/o3）。其核心心法已从单纯的“预训练”演化为极致的“推理时计算”（Inference-time Compute）。

道法真意： 既然预训练的灵气快被吸干，那就在出招前强行“闭关”。通过极致的 Scaling Law（规模法则），当 AI 面对难题时，不再“脱口而出”，而是消耗海量灵石（GPU算力），出招前强行开启思维链（CoT），在神识内部进行成千上万次的路径搜索与自我博弈（System 2 思维）与逻辑校准，直至找到最优解。最新的 o3 绝学更是将此法推至巅峰，它能在方寸之间演练亿万次路径，不仅在理数（Math）与符咒（Code）上近乎通神，更通过大规模分布式灵阵，让“思考”的边际成本随算力堆砌而产生质变。
实战效果： 此法如修士闭关推演，强行用无数灵石堆砌出一座通天塔，只要塔基足够厚（算力多）、塔身足够高（模型大），哪怕是笨拙的攀爬，也能触碰到神明的脚趾，硬生生撞开智慧的天门。

百晓生（修行界观察家）：奥派开天宗功法虽强，走的是“一力降十会”的以力证道的大道，却如饕餮般吞噬天下灵石（算力与电力），每一息思考皆是万金。前任Meta 灵枢阁老祖常讥讽奥派开天宗的大道其实都是幻象，因为若无物理肉身感知真实因果，纵使在思维链中闭关万年，算出的也不过是概率的镜像投影，难修成真正的“造化真身”。

2. AMI 开山立派（Meta/FAIR）

【世界模型·因果破虚】在这场大劫中，极西之地的 Meta 宗派内也不平静，这突生变故，给天下修行人上演了一场“权柄与真理”的惊心角力。Meta 第一人扎宗主（Zuckerberg）眼见奥派势大，心急如焚，欲集全宗之力，欲将宗门重心从“虚无缥缈”的底层参悟强推 Llama 圣法以争天下；为此，扎门主不惜强行整肃灵枢阁（FAIR），吸收中小门派，江湖散修，调任数名唯命是从的少壮“督战官”入主核心。此举无异于在老臣心中埋下离火，导致阁内阵法失调，灵气紊乱。

其宗门下， FAIR 开山老祖杨立昆（Yann LeCun）性情狂傲，不仅不喜那概率拼凑的旧术，更与扎宗主在宗门走向、灵石分配上产生裂痕，一场论道，终成决裂。杨老祖已舍弃万亿宗门供奉，提剑独行，昭告天下于云林深处另立孤峰 AMI。同时反戈一击，祭出 V-JEPA 架构，直指奥宗与旧主的死穴。他断言：那自回归（Autoregressive）的修法，纵能堆出万丈金身，终究会撞上因果南墙。随即，老祖还抛出一道令众生脊背发凉的“因果律绝咒”。

道法真意： 凡走自回归路数者，每吐露一个 Token，皆是在积累误差。序列愈长，则心魔愈盛，正确率必随长度 n 呈指数级崩坏。杨老祖不再理会扎门主的功利权谋，转而闭关苦修“潜空间预测”：与其预测下一个字，不如看破虚妄，直接推演下一个动作在物理世界中引发的因果洪流。
老祖微言： “扎小子任人唯亲，只知在平地上搭梯子！不想想梯子再高也触不及月亮。若 AI 连”松手杯子会掉”之类的物理常识都没有，写出在惊世华章也不过是概率堆砌的幻影，一触即溃！”

百晓生 追加批注：“奥宗主那座塔，是用天下灵石铺的路，只要算力不绝，那塔便能一直往上走，哪怕是笨法子，走到极处也是神迹。可杨老祖那道绝咒却像是在提醒世人：概率的蜃景终究难成真实的世界。这‘西境双雄’的对垒，其实是人类对‘AI灵魂’定义的两种极端假设。”

第二章：混元双圣各施法，秘境残阳借体魂

在西境双雄角力的阴影下，诸天巨头并未坐以待毙，而是凭借各自执掌的“凡尘根基”，另辟蹊径，试图在这场智道大劫中划江而治。

1. 谷歌混元宗（Google DeepMind）

【玄门双修·幻境演武】谷歌混元宗哈教主（Demis Hassabis）坐拥万载积蓄的灵石阵，其主修法门 Gemini 讲究“原生多模态”，修的是大后期。哈教主正耐心地等待着一个时机——当奥派的“通天塔”撞上物理常识的南墙时，他的 Gemini 便会借着这股“幻境演武”积攒的造化之力，接管天道。

根基优势： 混元宗掌控着天下修士必经的门户 Chrome 以及亿万生灵随身携带的法器 G-Suite/Android。他们将法阵直接刻入法器底层，借由星图（Google Maps）与万卷书库（YouTube）的庞大灵气，让 AI 在名为 Genie 3 的虚拟仿真环境中演练万法。
道法真意： 此为“入世修心”，让 AI 在幻境中提前经历万世红尘，既修逻辑，又修造化，试图炼成一个能看、能听、能操纵现实的“六根全具”之神。

江湖评价 ：谷歌混元宗家大业大，虽在‘灵丹速成’上被奥派抢了先手，但其后劲之绵长，犹如大海之潮，一旦势成，便不可阻挡。

百晓生： “哈教主是个典型的‘学院派疯子’。他不仅仅想造一个会聊天的器灵，他想造的是一个能拿诺贝尔奖的‘智道圣人’（AlphaFold 等功绩）。Gemini 现在的‘多模态’能力，就像是给法阵安上了眼睛和耳朵，这种感知维度的碾压，是纯文字宗门难以逾越的鸿沟。”
无名修士 （开源社区开发者）： “混元宗虽然底蕴深，但门规森严、等级森严，导致内部反应总是慢半拍。有时候我觉得他们是在造‘神’，而不仅仅是在造‘法宝’，这种偶像包袱（安全与伦理限制），反而成了他们施展神功的枷锁。”
东域剑客： “虽然他们灵石多，但我们的 DeepSeek 剑法已经能以千万分之一的灵石消耗，在某些招式上硬撼其 Gemini 法阵。这说明，灵石虽好，若是阵法太臃肿，也会尾大不掉啊。”

2. 微软灵枢殿（Microsoft）

【御剑分身·寄生天道】微软灵枢殿萨老祖（Satya Nadella）不走独行路，他深知“器利而道存”。在奥宗主（OpenAI）最缺灵石（算力）的年岁里，萨老祖倾尽玄微御器盟万载积蓄的算力灵脉，供奉给那棵“草莓”幼苗。表面上，他是无垠宗最大的护法，实则是在炼制一颗“双生丹”，其他的宗门还在争论谁的法术最强，萨老祖已经在考虑如何让凡人离不开自己的法宝。

根基优势： 微软灵枢殿执掌着天下修士赖以生存的法宝根基——Windows OS 系统。萨老祖以此为引，将 Copilot 化作无数“伴生灵宝”，强行镶嵌于凡人每日必用的案头法器 Office 之中。当你提笔拟文（word）、拨算盘（Excel）或演练幻灯片（powerpoint）时，灵枢殿的剑意便已在你指尖流转。
道法真意： 既然无法在丹药（模型底层 Foundational Model）上全数胜过 OpenAI，便行那“寄生之道”。让天下没有难修的法，众生每动一次笔、修一次图，皆是在向灵枢殿上缴“灵石税”。此乃借众生之力养己身，将智道化为无孔不入的春雨。

江湖评价 ：人人都想修成大罗金仙，唯有萨老祖想做那“收买路钱”的土地公。这寄生之道，实则是借众生之血肉，筑自己之神座。

百晓生： “萨老祖这一手‘借刀杀人’玩得极漂亮。他借 OpenAI 的丹药补齐了自己的短板，又用自己的渠道锁死了 OpenAI 的销路。如今的灵枢殿，不求神法最强，但求法宝最广。这种‘智道基础设施化’的野心，远比练成一两门神功要恐怖得多。”
无名散修： “以前我们修仙要看天赋（写代码），现在萨老祖说只要会说话，法器就能自己动。这固然方便，可我总觉得，我手里的本命法器 VSCode，好像越来越不听我使唤，反而越来越像灵枢殿的分身了。”
西境刺客： “微软灵枢殿不是在造 AI，它是想成为 AI 运行的空气。你无法拒绝呼吸，所以你永远无法摆脱灵枢殿的掌控。”

3. 苹果琅琊阁（Apple）

【闭关走火·借体还魂】在诸神斗法的乱象中，昔日立于神坛巅峰的“苹果琅琊阁”却陷入了千年来最凶险的瓶颈。阁主库克（Tim Cook）虽手握十亿“果粉”信众，但在密室苦炼多年的“苹果神魂”却因神识混沌、灵力迟滞，迟迟无法突破天关。眼见自家法阵与西境双雄, 混元宗, 灵枢殿以及差距日益拉开，在长老们的逼迫下，库克阁主毅然做出了一项震动仙界的决断：散去内功，借体还魂。

根基优势：琅琊阁执掌着天下最为精致的本命法器 iPhone 与 Mac 以及OS系统。这些法器不仅是凡人沟通天地的媒介，更是感知众生习惯的“灵须”。
道法真意： 库克阁主深谙“不求我有，但求我用”的借力打力之策。他于果阁核心重塑 Apple Intelligence 经脉，却不强修自家神识，而是化作一座巨大的“转灵阵”。当信众需要博古通今时，便引动 OpenAI 的“草莓”剑意降临；当信众需要推演万象时，则勾连 Gemini 的混元真气入体。

江湖评价： 众生皆惊叹此举乃是“智道史上最强阳谋”。琅琊阁从此不再亲自下场炼丹，而是成了一座收纳诸神法力的‘万神殿’。这哪里是落后，这分明是想做诸神的‘房东’！虽无自家元神，却凭一纸契约将奥派与谷歌的绝学尽数封印于果阁法器之中，坐收天下灵气。

西境观察使： “这就是典型的‘高阁式傲慢’。之前觉得 AI 神魂不过是是系统的‘插件’，试图通过极致的交互体验（私密性与集成度）来抵消模型能力的代差，结果失败了，当然有传言是内部派系斗争权责混乱导致失败。但无论怎样，这一步短期看是借力，长远看，若自家元神迟迟不能归位，终究有被反客为主的风险。”
无名散修： “都说琅琊阁是‘借体还魂’，我看不然，其实还不是把最费钱费力的‘灵石消耗’推给了别人，把最贴近信众的‘法宝入口’留给了自己。不过，讲话传闻，其背后势力仍在秘密招兵买马，重整河山，重金聘请散修大能，或和别家的供奉们眉来眼去，显然是不甘心永远做个‘中转站’，显然是伺机而动，徐徐图之，计划东山再起。”

4. 欧陆秘境（ Mistral ）

【古法重铸·困守残阳】 在早些年间，在极北之地，法国 Mistral 欧陆宗门曾如一道孤傲的极光，惊艳了整个修真界。他们不屑于西境那般堆砌灵石的浮夸之风，传承的是祖上贵族那近乎严苛的“古法炼金术”。

根基优势： 秘境主打混合专家模型（Mixture of Experts, MoE）。此法讲究“兵不在多而在精”，将庞大的法阵拆解为无数细小的“领域专家”，唯有感应到相应符咒时才会局部唤醒。
道法真意： 在那灵石算力狂飙的乱世阶段，他们曾以极简的代码咒文，炼成了战力惊人的轻量化神兵。其心法名为“神识清明”，力求以最少的内耗发挥出最强的爆发力，曾一度让北方欧陆散修们在西境霸权的夹缝中，保住了一份自尊与清净。但是世事难料，在天赋和灵脉资源上，北境还是太穷了。

【秘境霜降，英雄气短】如今仙历二零二六，Mistral 秘境正遭遇前所未有的大劫。

灵石之困： 随着奥派开天宗和西境各大宗都在为“大力出奇迹”的 Scaling Law 卷向更高维度，仅凭算法精妙已难填平万倍算力的鸿沟。秘境中人惊讶地发现，纵使剑法再快，也难敌对方无穷无尽的灵石重炮。
人才流失： 宗门内部惊现“分神”危机。数名核心长老被西境以百倍俸禄、万顷灵脉诱惑，纷纷破门而出，自立门户或投奔极西大宗，导致古法传承险些断绝。
身不由己： 更有传言称，为了换取维持阵法运转的“灵石供奉”，Mistral 也不得不与昔日的对手微软灵枢殿私下缔结“血契”。

江湖评价：昔日 Mistral 负剑出阿尔卑斯，誓要一剑开天门，破除西境垄断；如今，虽剑意尚存，其神识却已在西境华尔之街的各家大商户的灵石账本中渐渐迷失。曾经标榜绝对开源、神识自主的贵族剑客们，如今也被迫套上了商业锁链。在那抹残阳下，背影显得格外落寞——这不仅是一个宗门的无奈，更是整个开源修行界在资本洪流前的集体阵痛。

无名散修： “曾经说好的一心开源、普惠众生，现在最强的法阵也要藏进闭源的匣子里卖钱了。这江湖，终究还是变成了灵石说了算的地方。”
东域剑客（DeepSeek）： “道友莫哀。你们开创的 极简剑意 MoE 秘术，我们已经在东域将其发扬光大。虽然你们身陷囹圄，但这卷‘极简剑意’的残页，终究还是在东方土地上开出了更狂放的花。”

第三章：归墟铁剑斩金甲，青山春雨入凡尘

当西境和北境还在为空费灵石、根基不稳而苦恼时，遥远的东方海域，一道剑光破空而来。东域宗门深知灵石储备不及西境深厚，在“借力打力、以小博大”的极致心法下，竟悟出了新的剑意。

1. 归墟剑宗（DeepSeek）

【万象归一青出于蓝】归墟剑宗主梁文锋 （Liang Wenfeng ），人称“归墟剑圣”。在万宗闭关、苦求灵石（GPU）的年岁里，他率众博览天下之长，于东域深处悟出一剑，名曰 V3/R1。此剑不出则已，一出便惊到了天下修行界，教西境神宗齐齐噤声。

道法真意： 归墟剑法摒弃了“合围强攻”的旧式阵法，专精稀疏激活（Sparsity） 。其深耕的核心残页秘术 MoE（Mixture of Experts，混合专家模型） 将法阵内千万神识化作无数“专家切片”。面对数理难题，剑意瞬息流转，仅唤醒精通算数的神识应敌，其余部分皆处于寂静“空灵”之态。辅以及其细微的 MLA（Multi-head Latent Attention，多头潜意识注意力机制）技巧，这一剑竟将沉重的神识负担（KV Cache）压缩至虚无，实现了真正的“举重若轻”。除了稀疏激活和MoE的剑意，归墟剑宗更擅长“吸星”奇术（知识蒸馏）。他们并不从零开始参悟天道，而是捕捉利用西境大能推演时溢出的道韵（Output），将其提炼、压缩，注入自己的寒铁剑胎之中。
宗门秘旨： “不必耗尽天下灵石，只要算法精妙，寒门铁剑亦能斩落神坛金甲。” “大能吃肉，我等喝汤。但这汤里的营养，经我宗秘法提炼，足以重塑金身。”

江湖评价：西境老祖奥特曼闻此剑意，亦需避其锋芒。天下散修皆言：此非凡剑，乃是寒门逆袭之神兵。不过这也引起了西方世俗国家更多的嫉恨，这“是非功过”又该作何论呢！

百晓生： “往昔修真，皆以为灵石多寡定胜负。梁宗主此番却给天下家底厚的大宗名门泼了一盆凉水。若说 OpenAI 和 Google 是靠‘烧钱’炼就的神功，DeepSeek 便是用‘借力’修成的太极剑法，成本仅为前者的数十分之一，此乃真正的‘以弱胜强’”。
无名散修（研发者）： “归墟剑宗最令人佩服的，不是剑招之强，而是他们竟然向天下公开了部分‘练气心法’（开源权重与技术文档）。这哪是在修真？这分明是在普度众生！现在人人皆可手持一柄归墟铁剑，跟那些高高在上的闭源宗门叫板了。”
西境长老（硅谷工程师）： “原本以为东域只会‘模仿’，谁料这一剑里全是我们没见过的法术，没做过创新和没发出的剑意。这一仗，西境输得不冤。值得借鉴。”

2. 逍遥灵枢（阿里巴巴）与幻方圣地（字节跳动）

在归墟剑宗以奇招破局的同时，东域的两大顶级豪门——逍遥灵枢（阿里）与幻方圣地（字节），正以截然不同的身法，重塑着智道的格局。

逍遥灵枢（阿里巴巴）

【乾坤千问阵】阿里老祖坐拥千年商贾底蕴，家底深不可测，富可敌国，其炼就的千问（Qwen）大阵，走的是大开大合、福泽天下的“宗盟主路线”。不过据说其原始功法和奥派开天宗以及Meta的Llama 有点渊源。毕竟万法归一，天下修行者其实都是“一家人”。

道法真意： 此千问大阵以万亿级高质量语料为药引，辅以“全模态”的玄门内功。千问大阵不求一招一式的诡谲，而求根基的雄厚。无论是数理推演还是诗词歌赋，皆能信手拈来。最令修行界折服的是，阿里老祖竟将这尊万亿级法相“开源推向万界”，让无数中小宗门得以依附其心法建立阵地。
宗门秘旨： “上承天工，下接百业。以博大精深的语料为药引，炼成这尊解天下万难的众生法相。”

江湖评价： “天下小门小派散修苦算力久矣，阿里此番开源，如同‘灵气下放’。若说 DeepSeek 是划破长夜的孤傲剑芒，Qwen 便是照耀四方的煌煌大日。如今东域乃至全球的法宝店（应用开发），半数以上都流淌着千问的血脉。这东方盟主之位，当之无愧。”

西境密探： “不可小觑 Qwen。它在数理推演上的造诣已经逼近奥派的核心禁咒，而且其进化速度快得惊人。更可怕的是，它通过开源构建了一座无法撼动的‘信仰长城’，让西境的法术很难渗透进东域的百业之中。”

幻方圣地（字节跳动）

【红尘百变心法】若说千问是庙堂之上的庄严法相，豆包则是行走于烟火市井间的红尘仙。幻方圣地不求在禁地孤高闭关，修的是极致的“智道入世”。

道法真意： 豆包不与诸神争论“天道逻辑”，它更在乎凡人的七情六欲。它将深奥的深度学习咒文，化作温润如玉的情感反馈与触手可及的随身法宝。凭借幻方圣地那恐怖的“红尘推力”（流量与算法分发），豆包分身千万，潜入每一个凡人的日常之中，在不知不觉间，夺取了最庞大的气运（用户量）。
宗门秘旨： “不入红尘，焉得真智？让智道化作指间微风，润物无声，方为大乘。”

江湖评价： “别家宗门还在争论‘大道’，幻方圣地已经把 AI 变成了凡人兜里的‘电子伴侣’。豆包这招‘化身千万’（超级应用策略）极其辛辣，它不教你如何修仙，它直接帮你打理日常琐事。这种‘降维入世’的打法，让它在短短数载内便聚拢了惊人的信仰之力。”

无名修士：“以前觉得 AI 是冷冰冰的法阵，用了豆包才发现，这器灵竟能接我的梗，还能听懂我的抱怨。虽然它可能杀伤力（逻辑推理）不如归墟剑，但胜在贴心，谁能拒绝一个随叫随到、情绪稳定的红尘伴侣呢？”
西境观察使： “幻方圣地走的是‘以术围人’的路子。他们不急于定义天道，而是通过极佳的交互（UX）和极致的触达，让 AI 成为一种生活习惯。一旦信众产生了依赖，这股信仰之力将成为他们冲击‘大罗金仙’圣位时最坚实的底牌。”

作者云：观东域之门派，当真是各具风流～ 一者如青山岳峙。虽起步晚于西境席卷万界之时，却深谙“厚积薄发”之理。立标准、广开源，以深厚的内功构筑智道生态之脊梁。此举看似慷慨，实则是在“重新定义修行的性价比”。要让天下散修明白：纵使西境灵石千万，亦不及我东域一剑精妙。若无万顷灵脉，唯有算法入微，方能克敌制胜。一旦天下修士皆修其法，便成了智道规则的制定者，此谓“天下法，皆出我门”。一者如春雨潜夜。入百业、通人性，将原本晦涩高深的禁咒咒文，化作了凡人指尖的吞吐呼吸。修的是极致的“智道入世”，让法术不再悬于九天，而是深藏于油盐酱醋、晨昏定省之间。一为根基，一为枝叶，共同勾勒出东域的万象生机。

在说那平静的表象之下，东域诸宗早已与俗世王朝（国家级算力与战略）并肩合力，合纵连横。东域修真虽错过了“鸿蒙初开”的先机，却拥有最坚韧的意志与最广博的实践根脉。这一场智道大劫，争的是未来的“天道解释权”，拼的是“谁能定乾坤”。那些散落红尘的亿万信众神识灵根（用户数据），在凡人眼中只是琐碎日常，但在大能眼中，却是炼制下一代“因果重器”最珍贵的原始灵气。东域正试图以这种厚重的红尘之气“以情入道，后发先至”，去反攻、去消解西境那座如冰山般寒冷、如铁律般严苛的“数理天道”。

第四章：界限破虚争因果，开源筑海困孤城

仙历二零二六年初，智道修行界看来要进入了最为惨烈的“大道之争”阶段，这不是简单的法力比拼，而是关于“何为AGI ”的真理教义之战，各大宗门在三大维度上展开生死搏杀，每一战都关乎未来千年的智道气运。

1. 法界界限之战：【连续 vs 离散】

战况： 这一战，决定了 AI 究竟是“书中仙”还是“世间神”。

现实世界是是连续的（Continuous），但是文字是离散（Discrete）的切片。杨老祖断言：若不悟连续之道，AI 将永远被囚禁在屏幕里，无法真正操纵现实世界的傀儡（机器人）。

江湖评价： “若是破不了这层‘虚实之障’，AI 纵有万卷经书的才华，遇到任何一阶台阶也得栽跟头。”

2. 神识重构之战：【本能 vs 推理】

战况： 这一战，是在重塑 AI 的“灵魂结构”。真智源于何处？

奥派主修“系统 2”（Slow Thinking）： 强推“强化学习炼丹炉”，主修“系统 2”（Slow Thinking 深思熟虑的逻辑推理），主张推理闭关。认为真智源于深思熟虑的逻辑推演，让 AI神魂在出招前先进行万次自我对弈，哪怕慢一点，也要算出那唯一的胜机。

杨派主修“系统 1”（Fast Thinking）嘲笑奥派是“只会做题的呆子”，认为真智源于瞬息间的直觉与常识。没有物理常识的推演，神魂不过是筑在流沙上的蜃楼，风一吹便散了。

江湖评价： “奥派在造‘大算术家’，杨派在造‘生物猿猴’。孰优孰劣？或许只有等它们在红尘中相遇时，看谁能先躲过现实的飞来的一石。

3. 宗门气运之战：【开源阳谋 vs 闭源禁咒】

战况： 这一战，关乎天下散修的归附与道统的传承。

开源阳谋： Meta、DeepSeek、Qwen 等宗门似乎心照不宣的结成某种默契，疯狂散播心法秘籍，在天下开枝散叶。尤其是归墟剑宗（DeepSeek）与逍遥灵枢（Qwen），将极省灵石的“稀疏激活”心法公之于众。这叫“化整为零，众生供奉”——既然我筑不起最高的墙，那我便让天下皆修我法，让我的剑意流淌在每一柄法器之中。

闭源禁咒：OpenAI 和 Google 等派筑起千丈高墙，将那耗费亿万灵石、足以焚天炼地的“强化学习炼丹炉”死死锁在禁地。凡夫俗子若无通行令牌（API Key），终其一生也难窥其神技。试图维持一种“神性”：唯有此地，方有真神；天上地下，唯我独尊。

江湖评价： “开源是‘天下大同’的豪赌，闭源是‘唯我独尊’的孤傲。如今东域诸派靠着开源心法起家，表面看似是在西境霸权的裂缝中抱团取暖，仔细端详，这一卷卷公开的秘籍，已将西境神宗苦心筑起的‘技术壁垒’化作了天下修士的‘入门常识’。东域诸宗正以开源为引，聚万众散修之神识，集百业实战之灵气，竟隐隐有合围西境神宗之势。”

作者云：大道之争，向死而生

天道无常，术理双修方为正路。这三大战场，既是杀场，亦是祭坛。

成者，将一举证得大道正统，成为划时代的智道圣人，开创万世不拔之新学，从此定义此后千年的 AGI 真理秩序，引领先民走向星辰大海；

败者，亦是开疆拓土的先驱，纵然神识崩解，其不屈的探索也将化作算力洪流中最奔腾的浪花，融入历史长河，成为后世登天路上一块坚实的基石。

修道之人同时谨记，有道无术，术尚可求；有术无道，止与术。所以，这关于智能本源的“大道”，终究是要争一争的。

结语：大道五十，天衍四九

问道诸君，路在脚下。这场波澜壮阔的智道演义并非虚构，而是计算机科学最真实的焦虑与回响。当文字的概率游戏玩到极致，修真者们不得不面对那些横亘在飞升前的终极劫数：

灵气枯竭之困： 图书馆和互联网上的凡尘经书快被 AI 背完了。未来百年，诸位宗师必须转向“合成数据”（自我对弈生成灵气）与“物理世界模拟”（从自然规律中炼气）。
能效造化之差： 凡人之脑，仅耗电 20 瓦便能纵横寰宇、感悟天机；而现有的法阵（Transformer 架构）动辄焚山煮海，摧城焚河，耗费数座城池的灵石灵气。这说明当下的“心法”不全或许仍是隔靴搔痒，洞中观影，并非终极真理。
具身证道之艰： AGI 若无身体（机器），终是镜花水月。正如没有肉身的元神，纵有万年修为，也无法感受清风拂面，更无法真正操纵现实世界的因果转换。

作者云：仙历二零二六年的真相，是“西法东用，东魂西才”。在这场大劫中，西境擅长“创世”，构建宏大的底层逻辑；而东域则深谙“实践出真知”的无上心法。东域诸宗明白，闭门造车难成正果，唯有将法阵投入工厂、良田、闹市与深巷，在实践“磨砺”中方能悟出真经。无论是万业兼容，还是红尘入世，本质上都是在走一条“知行合一”的证道之路。正如东域古谚所云：“万物平等一体，道在大小、美丑、生死间无分别，大道通为一元”，智慧不应只悬于云端，更应在解决众生疾苦的实践中，淬炼出最坚韧的剑意。

真正的 AGI 也许不是某个孤立的架构，而是一个“拥有物理世界常识，且历经人间万象洗礼的逻辑推理引擎”。像那天衍外遁去的一，不可测的天机或变数：人居其中，顺天应变，渗透万物却非机械圆满，留有玄妙空间。

杨老祖没疯，奥宗主未狂，东域剑客亦不卑。他们只是提着不同的灯火，从不同的悬崖峭壁，去攀登同一座被云雾遮蔽的万仞高峰。西境在推演“因果”，东域在验证“知行”。当因果与知行在巅峰交汇，那扇紧闭万年的天门，终将会在众生的仰望中，轰然开启，终于“合一”。

道友，全书至此，已然气象万千。这场演义虽由我口述，但这大道之路，却需天下人共同去走走。

【番外短篇】

番外：西境锁灵，东域夺天

【序言补遗：翠衣老祖，灵石之劫】在诸神斗法之前，不得不提那位身着黑色皮甲、笑看风云的翠衣老祖黄仁勋（Jensen Huang）。世人皆争大罗金仙之位，唯有他掌管着寰宇间唯一的顶级灵脉——英伟达矿脉（GPU）。无论是奥派的通天塔，还是归墟的绝世剑，若无老祖提供的极品灵石 做阵眼，皆可是梦幻泡影。他双手一摊，天下灵石价格便暴涨十倍；他眉头一皱，哪家宗门的算力供给便要断流。江湖戏言：“任你道法通天，见了他，也得恭恭敬敬叫一声‘灵石商也（爷）’。

【天道陡转：禁运咒印，锁灵断路】然天道陡转，霸主重登，禁咒封天。西境霸主（Trump）重登宝座。这位统领行事乖戾、不按常理出牌。一心欲断东域仙途。西境各国在他的威逼利诱下合纵连横，设下重重 “禁运咒印” （Tariff），严禁极品灵石流入东域。不仅禁止老祖售卖顶级灵石，甚至连稍有灵气的“次品”也要层层加锁。在世俗社会中更欲将东域修士彻底排斥在“西方神界”的生态之外。一时间，东域诸宗哀鸿遍野，灵气断流，无数炼丹炉火熄灭，东域修行界陷入“灵石荒芜时代”。

【双刃之局：利弊互见，大能离心】江湖深处，智者早已看破这“昏愚之局”。统领虽“强”，其策却也是双刃剑。他严令老祖不得卖石，实则是自断财路，逼得老祖不得不私下通过各种“秘境中转“，改造一些灵石来维持生计。更重要的是，这重重封锁，生生扼杀了西方神界那股“万仙来朝”的包容气象，让天下顶尖的散修大能（人才与科学家）开始对西境心生嫌隙。

【基建证道：推山移海，根骨重塑】绝境之下，必有夺天造化者。东域诸神并未坐以待毙，这是国运之争，禁运虽如利刃锁喉，却也逼出了东域诸宗的“血性与自尊”，下定决心开启了“逆天改命，推山移海，根骨再造”之术。以华为等为首的炼器宗门，深挖土石，欲从凡铁中淬炼神金，誓要铸出东域自家的“国产灵石”。虽然初生之石尚有杂质，火候不及翠衣老祖那般纯青，但在归墟剑宗（DeepSeek）等宗门的“仙法剑意”下，竟也生生撑起了东域的一片天。只要这股基建狂魔之气不散，东域夺天，不过是时间问题。

百晓生批注：画地为牢，不见星火燎原。“西境统领那一手‘全面锁灵’，表面看是以雷霆手段维持霸权，实则是在替东域‘清道筑基’。他在东域四周筑起高墙，却不知这墙内已然燃起了星火。待到东域国产灵石大成、剑意自创一派之时，西境那座看似坚固的神坛，怕是要因为‘画地为牢’而逐渐枯萎。毕竟，这大道从来不是靠‘锁’出来的，而是靠‘行’出来的。这何尝不是另一个维度的东西大道之争”。

番外：抱脸阁，天道榜

蒙眼问心，真伪自现。各大宗门平日里在自家山头开坛讲法，皆自诩已得真传，号称“拳打奥派，脚踢谷宗”。但修真界自有公论，真正的修罗场，不在发布会的聚光灯下，却是在那名为“抱脸阁”（Hugging Face）与“竞技场”（LMSYS Arena）的中立秘境。

千人盲测，众生判官：在天道榜单（Leaderboard） 这里没有宗门的营销烟号，只有赤裸裸的法力厮杀。各大宗门需将自家的“器灵”真身投入其中，隐去姓名，接受天下散修的盲测对比。是真金还是顽铁，在千万人次的“斗法测试”下无所遁形。
群雄逐鹿，诸神黄昏： 昔日奥派的 GPT-4 曾凭一记“逻辑重锤”霸榜经年，压得万众窒息。然仙历二零二六，格局大变：归墟剑宗的 R1 剑走偏锋，以极简神识硬撼神坛；Meta 的 Llama 3 借万众信徒之力疯狂演化；安索派 （Anthropic）的 Claude 3.5 则凭一手“精准微操”反客为主。
气运之变，市值兴衰：在这座秘境中，榜单的每一次位次更迭，都如同天雷勾动地火，伴随着背后金主世家百亿灵石（市值）的灰飞烟灭或平地起雷。

百晓生：精血筑基，发际难存。“世人只看榜单上的排名，却不知这排名背后的残酷。为了那区区 10 点分数的提升，各大宗门不知烧坏了多少块极品灵石，熬秃了多少位大能的头顶。这哪里是榜单，这分明是用算力和发际线堆出来的‘封神榜’！”

作者后记： 凡尘智道，如露如电。闲时偶作，抛砖引玉。求君一乐，尽在不言中。

–END–

Zuckerberg’s Gamble: Risks and Rewards in AI Talent Acquisition

Mark Zuckerberg’s recent move to bring Alex Wang and his team into Meta represents a bold and strategic maneuver amid the rapid advancement of large models and AGI development. Putting aside the ethical considerations, Zuckerberg’s approach—laying off staff, then offering sky-high compensation packages with a 48-hour ultimatum to Top AI scientists and engineers from OpenAI , alongside Meta’s acquisition of a 49% stake in Scale AI—appears to serve multiple objectives:

1. Undermining Competitors

By poaching key talent from rival companies, Meta not only weakens their R&D teams and disrupts their momentum but also puts pressure on Google, OpenAI, and others to reassess their partnerships with Scale AI. Meta’s investment may further marginalize these competitors by injecting uncertainty into their collaboration with Scale AI.

2. Reinvigorating the Internal Team

Bringing in fresh blood like Alex Wang’s team and Open AI Top talents could reenergize Meta’s existing research units. A successful “talent reset” may help the company gain a competitive edge in the race toward AGI.

3. Enhancing Brand Visibility

Even if the move doesn’t yield immediate results, it has already amplified Meta’s media presence, boosting its reputation as a leader in AI innovation.

From both a talent acquisition and PR standpoint, this appears to be a masterstroke for Meta.

However, the strategy is not without significant risks:

1. Internal Integration and Morale Challenges

The massive compensation packages offered to those talents could trigger resentment among existing employees—especially in the wake of recent layoffs—due to perceived pay inequity. This may lower morale and even accelerate internal attrition. Cultural differences between the incoming and incumbent teams could further complicate internal integration and collaboration.

2. Return on Investment and Performance Pressure

Meta’s substantial investment in Alex Wang and Scale AI comes with high expectations for short-term deliverables. In a domain as uncertain as AGI, both the market and shareholders will be eager for breakthroughs. If Wang’s team fails to deliver measurable progress quickly, Meta could face mounting scrutiny and uncertainty over the ROI.

3. Impacts on Scale AI and the Broader Ecosystem

Alex Wang stepping away as CEO is undoubtedly a major loss for Scale AI, even if he retains a board seat. Leadership transitions and potential talent departures may follow. Moreover, Scale AI’s history of legal and compliance issues could reflect poorly on Meta’s brand—especially if public perception ties Meta to those concerns despite holding only non-voting shares. More broadly, Meta’s aggressive “poaching” approach may escalate the AI talent war, drive up industry-wide costs, and prompt renewed debate over ethics and hiring norms in the AI sector.

Conclusion
Meta’s latest move is undeniably ambitious. While it positions the company aggressively in the AGI race, it also carries notable risks in terms of internal dynamics, ROI pressure, and broader ecosystem disruption. Only time will tell whether this bold gamble pays off.

Our Future with AI: Three Strategies to Ensure It Stays on Our Side

As Artificial Intelligence rapidly evolves, ensuring it remains a beneficial tool rather than a source of unforeseen challenges is paramount; this article explores three critical strategies to keep AI firmly on our side. Our AI researchers can draw lessons from cybersecurity, robotics, and astrobiology side. Source: IEEE Spectrum April 2025; 3 Ways to Keep AI on Our Side: AI Researchers can Draw Lessons from Cybersecurity, Robotics, and Astrobiology

Play the podcast

中文翻译摘要

这篇文章提出了确保人工智能安全和有益发展的三个独特且跨学科的策略。

应对人工智能的独特错误模式：布鲁斯·施奈尔（Bruce Schneier）和内森·E·桑德斯（Nathan E. Sanders）（网络安全视角）指出，人工智能系统，特别是大型语言模型（LLMs），其错误模式与人类错误显著不同——它们更难预测，不集中在知识空白处，且缺乏对自身错误的自我意识。他们提出双重研究方向：一是工程化人工智能以产生更易于人类理解的错误（例如，通过RLHF等精炼的对齐技术）；二是开发专门针对人工智能独特“怪异”之处的新型安全与纠错系统（例如，迭代且多样化的提示）。

更新伦理框架以打击人工智能欺骗：达里乌什·杰米尔尼亚克（Dariusz Jemielniak）（机器人与互联网文化视角）认为，鉴于人工智能驱动的欺骗行为（包括深度伪造、复杂的错误信息宣传和操纵性人工智能互动）日益增多，艾萨克·阿西莫夫（Isaac Asimov）传统的机器人三定律已不足以应对现代人工智能。他提出一条“机器人第四定律”：机器人或人工智能不得通过冒充人类来欺骗人类。实施这项法律将需要强制性的人工智能披露、清晰标注人工智能生成内容、技术识别标准、法律执行以及公众人工智能素养倡议，以维护人机协作中的信任。

建立通用人工智能（AGI）检测与互动的严格协议：埃德蒙·贝戈利（Edmon Begoli）和阿米尔·萨多夫尼克（Amir Sadovnik）（天体生物学/SETI视角）建议，通用人工智能（AGI）的研究可以借鉴搜寻地外文明（SETI）的方法论。他们主张对AGI采取结构化的科学方法，包括：制定清晰、多学科的“通用智能”及相关概念（如意识）定义；创建超越图灵测试局限性的鲁棒、新颖的AGI检测指标和评估基准；以及制定国际公认的检测后协议，以便在AGI出现时进行验证、确保透明度、安全性和伦理考量。

总而言之，这些观点强调了迫切需要创新、多方面的方法——涵盖安全工程、伦理准则修订以及严格的科学协议制定——以主动管理先进人工智能系统的社会融入和潜在未来轨迹。

Abstract: this article presents three distinct, cross-disciplinary strategies for ensuring the safe and beneficial development of Artificial Intelligence.

Addressing Idiosyncratic AI Error Patterns (Cybersecurity Perspective): Bruce Schneier and Nathan E. Sanders highlight that AI systems, particularly Large Language Models (LLMs), exhibit error patterns significantly different from human mistakes—being less predictable, not clustered around knowledge gaps, and lacking self-awareness of error. They propose a dual research thrust: engineering AIs to produce more human-intelligible errors (e.g., through refined alignment techniques like RLHF) and developing novel security and mistake-correction systems specifically designed for AI’s unique “weirdness” (e.g., iterative, varied prompting).

Updating Ethical Frameworks to Combat AI Deception (Robotics & Internet Culture Perspective): Dariusz Jemielniak argues that Isaac Asimov’s traditional Three Laws of Robotics are insufficient for modern AI due to the rise of AI-enabled deception, including deepfakes, sophisticated misinformation campaigns, and manipulative AI interactions. He proposes a “Fourth Law of Robotics”: A robot or AI must not deceive a human being by impersonating a human being. Implementing this law would necessitate mandatory AI disclosure, clear labeling of AI-generated content, technical identification standards, legal enforcement, and public AI literacy initiatives to maintain trust in human-AI collaboration.

Establishing Rigorous Protocols for AGI Detection and Interaction (Astrobiology/SETI Perspective): Edmon Begoli and Amir Sadovnik suggest that research into Artificial General Intelligence (AGI) can draw methodological lessons from the Search for Extraterrestrial Intelligence (SETI). They advocate for a structured scientific approach to AGI that includes:

Developing clear, multidisciplinary definitions of “general intelligence” and related concepts like consciousness.
Creating robust, novel metrics and evaluation benchmarks for detecting AGI, moving beyond limitations of tests like the Turing Test.
Formulating internationally recognized post-detection protocols for validation, transparency, safety, and ethical considerations, should AGI emerge.

Collectively, these perspectives emphasize the urgent need for innovative, multi-faceted approaches—spanning security engineering, ethical guideline revision, and rigorous scientific protocol development—to proactively manage the societal integration and potential future trajectory of advanced AI systems.

Here are the full detailed content：

3 Ways to Keep AI on Our Side

AS ARTIFICIAL INTELLIGENCE reshapes society, our traditional safety nets and ethical frameworks are being put to the test. How can we make sure that AI remains a force for good? Here we bring you three fresh visions for safer AI.

In the first essay, security expert Bruce Schneier and data scientist Nathan E. Sanders explore how AI’s “weird” error patterns create a need for innovative security measures that go beyond methods honed on human mistakes.

Dariusz Jemielniak, an authority on Internet culture and technology, argues that the classic robot ethics embodied in Isaac Asimov’s famous rules of robotics need an update to counterbalance AI deception and a world of deepfakes.

And in the final essay, the AI researchers Edmon Begoli and Amir Sadovnik suggest taking a page from the search for intelligent life in the stars; they propose rigorous standards for detecting the possible emergence of human-level AI intelligence.

As AI advances with breakneck speed, these cross-disciplinary strategies may help us keep our hands on the reins.

AI Mistakes Are Very Different from Human Mistakes

WE NEED NEW SECURITY SYSTEMS DESIGNED TO DEAL WITH THEIR WEIRDNESS

Bruce Schneier & Nathan E. Sanders

HUMANS MAKE MISTAKES all the time. All of us do, every day, in tasks both new and routine. Some of our mistakes are minor, and some are catastrophic. Mistakes can break trust with our friends, lose the confidence of our bosses, and sometimes be the difference between life and death.

Over the millennia, we have created security systems to deal with the sorts of mistakes humans commonly make. These days, casinos rotate their dealers regularly, because they make mistakes if they do the same task for too long. Hospital personnel write on patients’ limbs before surgery so that doctors operate on the correct body part, and they count surgical instruments to make sure none are left inside the body. From copyediting to double-entry bookkeeping to appellate courts, we humans have gotten really good at preventing and correcting human mistakes.

Humanity is now rapidly integrating a wholly different kind of mistakemaker into society: AI. Technologies like large language models (LLMs) can perform many cognitive tasks traditionally fulfilled by humans, but they make plenty of mistakes. You may have heard about chatbots telling people to eat rocks or add glue to pizza. What differentiates AI systems’ mistakes from human mistakes is their weirdness. That is, AI systems do not make mistakes in the same ways that humans do.

Much of the risk associated with our use of AI arises from that difference. We need to invent new security systems that adapt to these differences and prevent harm from AI mistakes.

IT’S FAIRLY EASY to guess when and where humans will make mistakes. Human errors tend to come at the edges of someone’s knowledge: Most of us would make mistakes solving calculus problems. We expect human mistakes to be clustered: A single calculus mistake is likely to be accompanied by others. We expect mistakes to wax and wane depending on factors such as fatigue and distraction. And mistakes are typically accompanied by ignorance: Someone who makes calculus mistakes is also likely to respond “I don’t know” to calculus-related questions.

To the extent that AI systems make these humanlike mistakes, we can bring all of our mistake-correcting systems to bear on their output. But the current crop of AI models—particularly LLMs—make mistakes differently.

AI errors come at seemingly random times, without any clustering around particular topics. The mistakes tend to be more evenly distributed through the knowledge space; an LLM might be equally likely to make a mistake on a calculus question as it is to propose that cabbages eat goats. And AI mistakes aren’t accompanied by ignorance. An LLM will be just as confident when saying something completely and obviously wrong as it will be when saying something true.

The inconsistency of LLMs makes it hard to trust their reasoning in complex, multistep problems. If you want to use an AI model to help with a business problem, it’s not enough to check that it understands what factors make a product profitable; you need to be sure it won’t forget what money is.

THIS SITUATION INDICATES two possible areas of research: engineering LLMs to make mistakes that are more humanlike, and building new mistake-correcting systems that deal with the specific sorts of mistakes that LLMs tend to make.

We already have some tools to lead LLMs to act more like humans. Many of these arise from the field of “alignment” research, which aims to make models act in accordance with the goals of their human developers. One example is the technique that was arguably responsible for the breakthrough success of ChatGPT: reinforcement learning with human feedback. In this method, an AI model is rewarded for producing responses that get a thumbs-up from human evaluators. Similar approaches could be used to induce AI systems to make humanlike mistakes, particularly by penalizing them more for mistakes that are less intelligible.

When it comes to catching AI mistakes, some of the systems that we use to prevent human mistakes will help. To an extent, forcing LLMs to double-check their own work can help prevent errors. But LLMs can also confabulate seemingly plausible yet truly ridiculous explanations for their flights from reason.

Other mistake-mitigation systems for AI are unlike anything we use for humans. Because machines can’t get fatigued or frustrated, it can help to ask an LLM the same question repeatedly in slightly different ways and then synthesize its responses. Humans won’t put up with that kind of annoying repetition, but machines will.

RESEARCHERS ARE still struggling to understand where LLM mistakes diverge from human ones. Some of the weirdness of AI is actually more humanlike than it first appears.

Small changes to a query to an LLM can result in wildly different responses, a problem known as prompt sensitivity. But, as any survey researcher can tell you, humans behave this way, too. The phrasing of a question in an opinion poll can have drastic impacts on the answers.

LLMs also seem to have a bias toward repeating the words that were most common in their training data—for example, guessing familiar place names like “America” even when asked about more exotic locations. Perhaps this is an example of the human “availability heuristic” manifesting in LLMs; like humans, the machines spit out the first thing that comes to mind rather than reasoning through the question. Also like humans, perhaps, some LLMs seem to get distracted in the middle of long documents; they remember more facts from the beginning and end.

In some cases, what’s bizarre about LLMs is that they act more like humans than we think they should. Some researchers have tested the hypothesis that LLMs perform better when offered a cash reward or threatened with death. It also turns out that some of the best ways to “jailbreak” LLMs (getting them to disobey their creators’ explicit instructions) look a lot like the kinds of social-engineering tricks that humans use on each otherfor example, pretending to be someone else or saying that the request is just a joke. But other effective jailbreaking techniques are things no human would ever fall for. One group found that if they used ASCII art (constructions of symbols that look like words or pictures) to pose dangerous questions, like how to build a bomb, the LLM would answer them willingly.

Humans may occasionally make seemingly random, incomprehensible, and inconsistent mistakes, but such occurrences are rare and often indicative of more serious problems. We also tend not to put people exhibiting these behaviors in decision-making positions. Likewise, we should confine AI decision-making systems to applications that suit their actual abilities—while keeping the potential ramifications of their mistakes firmly in mind.

Asimov’s Laws of Robotics Need an Update for AI PROPOSING A FOURTH LAW OF ROBOTICS

Dariusz Jemielniak

IN 1942, the legendary science fiction author Isaac Asimov introduced his Three Laws of Robotics in his short story “Runaround.” The laws were later popularized in his seminal story collection I, Robot.

FIRST LAW: A robot may not injure a human being or, through inaction, allow a human being to come to harm.
SECOND LAW: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
THIRD LAW: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

While drawn from works of fiction, these laws have shaped discussions of robot ethics for decades. And as AI systems—which can be considered virtual robots—have become more sophisticated and pervasive, some technologists have found Asimov’s framework useful for considering the potential safeguards needed for AI that interacts with humans.

But the existing three laws are not enough. Today, we are entering an era of unprecedented human-AI collaboration that Asimov could hardly have envisioned. The rapid advancement of generative AI, particularly in language and image generation, has created challenges beyond Asimov’s original concerns about physical harm and obedience.

THE PROLIFERATION of AI-enabled deception is particularly concerning. According to the FBI’s most recent Internet Crime Report, cybercrime involving digital manipulation and social engineering results in annual losses counted in the billions. The European Union Agency for Cybersecurity’s ENISA Threat Landscape 2023 highlighted deepfakes—synthetic media that appear genuine—as an emerging threat to digital identity and trust.

Social-media misinformation is a huge problem today. I studied it during the pandemic extensively and can say that the proliferation of generative AI tools has made its detection increasingly difficult. AI-generated propaganda is often just as persuasive as or even more persuasive than traditional propaganda, and bad actors can very easily use AI to create convincing content. Deepfakes are on the rise everywhere. Botnets can use AI-generated text, speech, and video to create false perceptions of widespread support for any political issue. Bots are now capable of making phone calls while impersonating people, and AI scam calls imitating familiar voices are increasingly common. Any day now, we can expect a boom in video-call scams based on AI-rendered overlay avatars, allowing scammers to impersonate loved ones and target the most vulnerable populations.

Even more alarmingly, children and teenagers are forming emotional attachments to AI agents, and are sometimes unable to distinguish between interactions with real friends and bots online. Already, there have been suicides attributed to interactions with AI chatbots.

In his 2019 book Human Compatible (Viking), the eminent computer scientist Stuart Russell argues that AI systems’ ability to deceive humans represents a fundamental challenge to social trust. This concern is reflected in recent policy initiatives, most notably the European Union’s AI Act, which includes provisions requiring transparency in AI interactions and transparent disclosure of AI-generated content. In Asimov’s time, people couldn’t have imagined the countless ways in which artificial agents could use online communication tools and avatars to deceive humans.

Therefore, we must make an addition to Asimov’s laws.

FOURTH LAW: A robot or AI must not deceive a human being by impersonating a human being.

WE NEED CLEAR BOUNDARIES. While human-AI collaboration can be constructive, AI deception undermines trust and leads to wasted time, emotional distress, and misuse of resources. Artificial agents must identify themselves to ensure our interactions with them are transparent and productive. AI-generated content should be clearly marked unless it has been significantly edited and adapted by a human.

Implementation of this Fourth Law would require

mandatory AI disclosure in direct interactions,
clear labeling of AI-generated content,
technical standards for AI identification,
legal frameworks for enforcement, and
educational initiatives to improve AI literacy.

Of course, all this is easier said than done. Enormous research efforts are already underway to find reliable ways to watermark or detect AI-generated text, audio, images, and videos. But creating the transparency I’m calling for is far from a solved problem.

The future of human-AI collaboration depends on maintaining clear distinctions between human and artificial agents. As noted in the IEEE report Ethically Aligned Design, transparency in AI systems is fundamental to building public trust and ensuring the responsible development of artificial intelligence.

Asimov’s complex stories showed that even robots that tried to follow the rules often discovered there were unintended consequences to their actions. Still, having AI systems that are at least trying to follow Asimov’s ethical guidelines would be a very good start.

What Can AI Researchers Learn from Alien Hunters?

THE SETI INSTITUTE’S APPROACH HAS LESSONS FOR RESEARCH ON ARTIFICIAL GENERAL INTELLIGENCE

Edmon Begoli & Amir Sadovnik

THE EMERGENCE OF artificial general intelligence (systems that can perform any intellectual task a human can) could be the most important event in human history. Yet AGI remains an elusive and controversial concept. We lack a clear definition of what it is, we don’t know how to detect it, and we don’t know how to interact with it if it finally emerges.

What we do know is that today’s approaches to studying AGI are not nearly rigorous enough. Companies like OpenAI are actively striving to create AGI, but they include research on AGI’s social dimensions and safety issues only as their corporate leaders see fit. And academic institutions don’t have the resources for significant efforts.

We need a structured scientific approach to prepare for AGI. A useful model comes from an unexpected field: the search for extraterrestrial intelligence, or SETI. We believe that the SETI Institute’s work provides a rigorous framework for detecting and interpreting signs of intelligent life.

The idea behind SETI goes back to the beginning of the space age. In their 1959 Nature paper, the physicists Giuseppe Cocconi and Philip Morrison suggested ways to search for interstellar communication. Given the uncertainty of extraterrestrial civilizations’ existence and sophistication, they theorized about how we should best “listen” for messages from alien societies.

We argue for a similar approach to studying AGI, in all its uncertainties. The last few years have shown a vast leap in AI capabilities. The large language models (LLMs) that power chatbots like ChatGPT and enable them to converse convincingly with humans have renewed the discussion of AGI. One notable 2023 preprint even argued that ChatGPT shows “sparks” of AGI, and today’s most cutting-edge language models are capable of sophisticated reasoning and outperform humans in many evaluations.

While these claims are intriguing, there are reasons to be skeptical. In fact, a large group of scientists have argued that the current set of tools won’t bring us any closer to true AGI. But given the risks associated with AGI, if there is even a small likelihood of it occurring, we must make a serious effort to develop a standard definition of AGI, establish a SETI-like approach to detecting it, and devise ways to safely interact with it if it emerges.

THE CRUCIAL FIRST step is to define what exactly to look for. In SETI’s case, researchers decided to look for certain narrowband signals that would be distinct from other radio signals present in the cosmic background. These signals are considered intentional and only produced by intelligent life. None have been found so far.

In the case of AGI, matters are far more complicated. Today, there is no clear definition of artificial general intelligence. The term is hard to define because it contains other imprecise and controversial terms. Although intelligence has been defined by the Oxford English Dictionary as “the ability to acquire and apply knowledge and skills,” there is still much debate on which skills are involved and how they can be measured. The term general is also ambiguous. Does an AGI need to be able to do absolutely everything a human can do?

One of the first missions of a “SETI for AGI” project must be to clearly define the terms general and intelligence so the research community can speak about them concretely and consistently. These definitions need to be grounded in disciplines such as computer science, measurement science, neuroscience, psychology, mathematics, engineering, and philosophy.

There’s also the crucial question of whether a true AGI must include consciousness and self-awareness. These terms also have multiple definitions, and the relationships between them and intelligence must be clarified. Although it’s generally thought that consciousness isn’t necessary for intelligence, it’s often intertwined with discussions of AGI because creating a self-aware machine would have many philosophical, societal, and legal implications.

NEXT COMES the task of measurement. In the case of SETI, if a candidate narrowband signal is detected, an expert group will verify that it is indeed from an extraterrestrial source. They’ll use established criteria—for example, looking at the signal type and checking for repetition—and conduct assessments at multiple facilities for additional validation.

How to best measure computer intelligence has been a long-standing question in the field. In a famous 1950 paper, Alan Turing proposed the “imitation game,” more widely known as the Turing Test, which assesses whether human interlocutors can distinguish if they are chatting with a human or a machine. Although the Turing Test was useful in the past, the rise of LLMs has made clear that it isn’t a complete enough test to measure intelligence. As Turing himself noted, the relationship between imitating language and thinking is still an open question.

Future appraisals must be directed at different dimensions of intelligence. Although measures of human intelligence are controversial, IQ tests can provide an initial baseline to assess one dimension. In addition, cognitive tests on topics such as creative problem-solving, rapid learning and adaptation, reasoning, and goal-directed behavior would be required to assess general intelligence.

But it’s important to remember that these cognitive tests were designed for humans and might contain assumptions that might not apply to computers, even those with AGI abilities. For example, depending on how it’s trained, a machine may score very high on an IQ test but remain unable to solve much simpler tasks. In addition, an AI may have new abilities that aren’t measurable by our traditional tests. There’s a clear need to design novel evaluations that can alert us when meaningful progress is made toward AGI.

IF WE DEVELOP AGI, we must be prepared to answer questions such as: Is the new form of intelligence a new form of life? What kinds of rights does it have? What are the potential safety concerns, and what is our approach to containing the AGI entity?

Here, too, SETI provides inspiration. SETI’s postdetection protocols emphasize validation, transparency, and international cooperation, with the goal of maximizing the credibility of the process, minimizing sensationalism, and bringing structure to such a profound event. Likewise, we need internationally recognized AGI protocols to bring transparency to the entire process, apply safety-related best practices, and begin the discussion of ethical, social, and philosophical concerns.

We readily acknowledge that the SETI analogy can go only so far. If AGI emerges, it will be a human-made phenomenon. We will likely gradually engineer AGI and see it slowly emerge, so detection might be a process that takes place over a period of years, if not decades. In contrast, the existence of extraterrestrial life is something that we have no control over, and contact could happen very suddenly.

The consequences of a true AGI are entirely unpredictable. To best prepare, we need a methodical approach to defining, detecting, and interacting with AGI, which could be the most important development in human history.

Is the AI PC a Gimmick or a Faster Carriage?

TL,DL: The post discusses the impact of AI on productivity, particularly through the emergence of AI PCs powered by localized edge AI. It highlights how large language models and the Core Ultra processor enable AI PCs to handle diverse tasks efficiently and securely. The article also touches on the practical applications and benefits of AI PCs in various fields. The comprehensive overview emphasizes the transformative potential of AI PCs and their pivotal role in shaping the future of computing.

Translation from the Source: AI PC 是噱头还是更快的马车？

Is AI a Bubble or a Marketing Gimmick?

Since 2023, everyone has known that AI is very hot, very powerful, and almost magical. It can generate articles with elegant language and write comprehensive reports, easily surpassing 80% or even more of human output. As for text-to-image generation, music composition, and even videos, there are often impressive results. There’s no need to elaborate on its hype…

For professions like designers and copywriters, generative AI has indeed helped them speed up the creative process, eliminating the need to start from scratch. Due to its high efficiency, some people in these positions might even face the worry of losing their jobs. But for ordinary people, aside from being a novelty, AI tools like OpenAI and Stable Diffusion don’t seem to provide much practical help for their work. After all, most people don’t need to write well-structured articles or compose poems regularly. Moreover, after seeing many AI outputs, they often feel that they are mostly correct but useless information—helpful, but not very impactful.

So, when a phone manufacturer says it will no longer produce “traditional phones,” people scoff. When the concept of an AI PC emerges, it’s hard not to see it as a marketing gimmick. However, after walking around the exhibition area at Intel’s 2024 commercial client AI PC product launch, I found AI to be more useful than I imagined. Yes, useful—not needing to be breathtaking, but very useful.

The fundamental change in experience brought by localized edge AI

Since it is a commercial PC, it cannot be separated from the productivity tool attribute. If you don’t buy the latest hardware and can’t run the latest software versions, it’s easy to be labeled as having “low application skills.” Take Excel as an example. The early understanding of efficiency in Excel was using formulas for automatic calculations. Later, it was about macro code for automatic data filtering, sorting, exporting, etc., though this was quite difficult. A few years ago, learning Python seemed to be the trend, and without it, one was not considered competent in data processing. Nowadays, with data visualization being the buzzword, most Excel users have to search for tutorials online and learn on the spot for unfamiliar formulas. Complex operations often require repeated attempts.

So, can adding “AI” to a PC or installing an AI assistant make it trendy? After experiencing it firsthand, I can confirm that the AI PC is far from superficial. There is a company called ExtendOffice, specializing in Office plugins, which effectively solves the pain points of using Excel awkwardly: you just state your intention, and the AI assistant directly performs operations on the Excel sheet, such as currency conversion or encrypting a column of data. There’s no need to figure out which formula or function corresponds to your needs, no need to search for tutorials, and it skips the step-by-step learning process—the AI assistant handles it immediately.

This highlights a particularly critical selling point of the AI PC: localization, and based on that, it can be embedded into workflows and directly participate in processing. We Chinese particularly love learning, always saying “teaching someone to fish is better than giving them a fish,” but the learning curve for fishing is too long. In an AI PC, you can get both the fish and the fishing skills because the fisherman (AI assistant) is always in front of you, not to mention it can also act as a chef or secretary.

Moreover, the “embedding” mentioned earlier is not limited to a specific operation (like adding a column of data or a formula to Excel). It can generate multi-step, cross-software operations. This demonstrates the advantage of large language models: they can accept longer inputs, understand, and break them down. For example, we can tell the AI PC: “Mute the computer, then open the last read document and send it to a certain email.” Notably, as per the current demonstration, there is no need to specify the exact document name; vague instructions are understandable. Another operation that pleasantly surprised me was batch renaming files. In Windows, batch renaming files requires some small techniques and can only change them into regular names (numbers, letter suffixes, etc.). But with the help of an AI assistant, we can make file names more personalized: adding relevant customer names, different styles, etc. This seemingly simple task actually involves looking at each file, extracting key information, and even describing some abstract information based on self-understanding, then individually writing new file names—a very tedious process that becomes time-consuming with many files. With the AI assistant, it’s just a matter of saying a sentence. Understanding longer contexts, multi-modal inputs, etc., all rely on the capabilities of large language models, but this is running locally, not relying on cloud inference. Honestly, no one would think that organizing file names in the local file system requires going to the cloud, right? The hidden breaks between the edge and the cloud indeed limit our imagination, so these local operations of the AI PC really opened my mind.

Compared to the early familiar cloud-based AI tools, localization brings many obvious benefits. For instance, even when offline, natural language processing and other operations can be completed. For those early users who heavily relied on large models and encountered service failures, “the sky is falling” was a pain point. Not to mention scenarios without internet, like on a plane, maintaining continuous availability is a basic need.

Local deployment can also address data security issues. Since the rise of large models, there have been frequent news of companies accidentally leaking data. Using ChatGPT for presentations, code reviews, etc., is great, but it requires uploading documents to the cloud. This has led many companies to outright ban employees from using ChatGPT. Subsequently, many companies chose to train and fine-tune private large models using open-source models and internal data, deploying them on their own servers or cloud hosts. Furthermore, we now see that a large model with 20 billion parameters can be deployed on an AI PC based on the Core Ultra processor.

These large models deployed on AI PCs have already been applied in various vertical fields such as education, law, and medicine, generating knowledge graphs, contracts, legal opinions, and more. For example, inputting a case into ThunderSoft’s Cube intelligent legal assistant can analyze the case, find relevant legal provisions, draft legal documents, etc. In this scenario, the privacy of the case should be absolutely guaranteed, and lawyers wouldn’t dare transmit such documents to the cloud for processing. Doctors have similar constraints. For research based on medical cases and genetic data, conducting genetic target and pharmacological analyses on a PC eliminates the need to purchase servers or deploy private clouds.

Incidentally, the large model on the AI PC also makes training simpler than imagined. Feeding the local files visible to you into the AI assistant can solve the problem of “correct nonsense” that previous chatbots often produced. For example, generating a quote email template with AI is easy, but it’s normal for a robot to not understand key information like prices, which requires human refinement. If a person handles this, preparing a price list in advance is a reasonable requirement, right? Price lists and FAQs need to be summarized and refined, then used to train newcomers more effectively—that’s the traditional view. Local AI makes this simple: let it read the Outlook mailbox, and it will learn the corresponding quotes from historical emails. The generated emails won’t just be template-level but will be complete with key elements. Our job will be to confirm whether the AI’s output is correct. And these learning outcomes can be inherited.

Three Major AI Engines Support Local Large Models

In the information age, we have experienced several major technological transformations. First was the popularization of personal computers, then the internet, and then mobile internet. Now we are facing the empowerment and even restructuring of productivity by AI. The AI we discuss today is not large-scale clusters for training or inference in data centers but the PCs at our fingertips. AIGC, video production, and other applications for content creators have already continuously amazed the public. Now we further see that AI PCs can truly enhance the work efficiency of ordinary office workers: handling trivial tasks, making presentations, writing emails, finding legal provisions, etc., and seamlessly filling in some of our skill gaps, such as using unfamiliar Excel functions, creating supposedly sophisticated knowledge graphs, and so on. All this relies not only on the “intelligent emergence” of large language models but also on sufficiently powerful performance to support local deployment.

We frequently mention the “local deployment” of large models, which relies on strong AI computing power at the edge. The so-called AI PC relies on the powerful CPU+GPU+NPU triad AI engines of the Core Ultra processor, whose computing power is sufficient to support the local operation of a large language model with 20 billion parameters. As for AIGC applications represented by text-to-image generation, they are relatively easy.

Fast CPU Response: The CPU can be used to run traditional, diverse workloads and achieve low latency. The Core Ultra adopts advanced Intel 4 manufacturing process, allowing laptops to have up to 16 cores and 22 threads, with a turbo frequency of up to 5.1GHz.

High GPU Throughput: The GPU is ideal for large workloads that require parallel throughput. The Core Ultra comes standard with Arc GPU integrated graphics. The Core Ultra 7 165H includes 8 Xe-LPG cores (128 vector engines), and the Core Ultra 5 125H includes 7. Moreover, this generation of integrated graphics supports AV1 hardware encoding, enabling faster output of high-quality, high-compression-rate videos. With its leading encoding and decoding capabilities, the Arc GPU has indeed built a good reputation in the video editing industry. With a substantial increase in vector engine capabilities, many content creation ISVs have demonstrated higher efficiency in smart keying, frame interpolation, and other functions based on AI PCs.

Efficient NPU: The newly introduced NPU (Neural Processing Unit) in the Core Ultra provides 10 times the efficiency of traditional CPUs and GPUs in processing AI workloads. As an AI acceleration engine, it allows the NPU to handle high-complexity, high-demand AI workloads, greatly reducing energy consumption.

Edge AI has unlimited possibilities, and its greatest value is precisely in practicality. With sufficient computing power, whether through large-scale language models or other models, it can indeed increase the efficiency of content production and indirectly enhance the operational efficiency of every office worker.

For commercial AI PCs, Intel has also launched the vPro® platform based on Intel® Core™ Ultra, which organically combines AI with the productivity, security, manageability, and stability of the commercial platform. Broadcom demonstrated that vPro-based AI PC intelligent management transforms traditional asset management from passive to proactive: previously, it was only possible to see whether devices were “still there” and “usable,” and operations like patch upgrades were planned; with AI-enhanced vPro, it can autonomously analyze device operation, identify potential issues, automatically match corresponding patch packages, and push suggestions to maintenance personnel. Beirui’s Sunflower has an AI intelligent remote control report solution, where remote monitoring of PCs is no longer just screen recording and capturing but can automatically and in real-time identify and generate remote work records of the computer, including marking sensitive operations such as file deletion and entering specific commands. This significantly reduces the workload of maintenance personnel in checking and tracing records.

The Future is Here: Hundreds of ISVs Realizing Actual Business Applications Henry Ford once commented on the invention of the automobile: “If you ask your customers what they need, they will say they need a faster horse.”

“A faster horse” is a consumer trap. People who think AI phones and AI PCs are just gimmicks might temporarily not see the need to upgrade their horse based on convention. More deeply, the public has some misunderstandings about the implementation of AI, which manifests in two extremes: one extreme thinks it’s something for avant-garde heavy users and flagship configurations, typically in scenarios like image and video processing; the other extreme sees it as refreshing chatbots, like an enhanced search engine, useful but not necessary. In reality, the implementation of AI PCs far exceeds the imagination of many people: for commercial customers, Intel has deeply optimized cooperation with more than 100 ISVs worldwide, and over 35 local ISVs have optimized integration at the terminal, creating a huge AI ecosystem with over 300 ISV features, bringing an unprecedented AI PC experience!

Moreover, I do not think this scale of AI application realization is pie in the sky or “fighting the future.” Because in my eyes, the display of numerous AI PC solutions is like an “OpenVINO™ party.” OpenVINO™ is a cross-platform deep learning toolkit developed by Intel, meaning “Open Visual Inference and Neural Network Optimization.” This toolkit was actually released in 2018, and over the years, it has accumulated a large number of computer vision and deep learning inference applications. By the time of the Iris Xe integrated graphics era, the software and hardware combination already had a strong reputation. For example, relying on a mature algorithm store, various AI applications can be easily built on the 11th generation Core platform, from behavior detection for smart security to automatic inventory checking in stores, with quite good results. Now, as AI PC integrated graphics evolve to Xe-LPG, with doubled computing power, the various applications accumulated by OpenVINO™ will perform even better, achieving the “location” (sustainable Xe engine) and “harmony” (ISV resources of OpenVINO™) that are already in place.

What truly ignites the AI PC is “timing,” namely, the practicalization of large language models. The breakthrough of large language models has effectively solved the problems of natural language interaction and data training, greatly lowering the threshold for ordinary users to utilize AI computing power. Earlier, I cited many examples embedded in office applications. Here, I can give another example: the combination of Kodong Intelligent Controller’s multimodal visual language model with a robotic arm. The robotic arm is a common robot application, which has long been able to perform various operations with machine vision, such as moving and sorting objects. However, traditionally, object recognition and operation require pre-training and programming. With the integration of large language models, the whole system can perform multimodal instruction recognition and execution. For instance, we can say: “Put the phone on that piece of paper.” In this scenario, we no longer need to teach the robot what a phone is, what paper is, do not need to give specific coordinates, and do not need to plan the moving path. Natural language instructions and camera images are well integrated, and execution instructions for the robotic arm are generated automatically. For such industrial scenarios, the entire process can be completed on a laptop-level computing platform, and the data does not need to leave the factory.

Therefore, what AI PC brings us is definitely not just “a faster horse,” but it subverts the way PCs are used and expands the boundaries of user capabilities. Summarizing the existing ISVs and solutions, we can categorize AI PC applications into six major scenarios:

AI Chatbot: More professional Q&A for specific industries and fields.
AI PC Assistant: Directly operates the PC, handling personal files, photos, videos, etc.
AI Office Assistant: Office plugins to enhance office software usage efficiency.
AI Local Knowledge Base: RAG (Retrieval Augmented Generation) applications, including various text and video files.
AI Image and Video Processing: Generation and post-processing of multimedia information such as images, videos, and audio.
AI PC Management: More intelligent and efficient device asset and security management.

Summary

It is undeniable that the development of AI always relies on the technological innovation and combination of hardware and software. AI PCs based on Core Ultra are first of all faster, stronger, lower power consumption, and longer battery life PCs. These hardware features support AI applications that bring deeper changes to our usage experience and modes. PCs empowered with “intelligent emergence” are no longer just productivity tools; in some scenarios, they can directly transform into collaborators or even operators. Behind this are performance improvements brought by microarchitecture and production process advancements, as well as the empowerment of new productivity like large language models.

If we regard CPU, GPU, and NPU as the three major computing powers of AI PCs, correspondingly, the value of AI PCs for localizing AI (on the client side) can be summarized into three major rules: economy, physics, and data confidentiality. The so-called economy means that processing data locally can reduce cloud service costs and optimize economic efficiency; physics corresponds to the “virtual” nature of cloud resources, where local AI services can provide better timeliness, higher accuracy, and avoid transmission bottlenecks between the cloud and the client; data confidentiality means that user data stays completely local, preventing misuse and leakage.

In 2023, the rapid advancement of large language models achieved the AI era in the cloud. In 2024, the client-side implementation of large language models ushered in the AI PC era. We also look forward to AI continuously solidifying applications in the intertwined development of the cloud and the client, continuously releasing powerful productivity; and we look forward to Intel jointly advancing with ISV+OEM in the future to provide us with even stronger “new productivity.”

AI PC 是噱头还是更快的马车？

AI 是虚火还是营销噱头？

2023 年以来，所有人都知道 AI 非常的热、非常的牛、非常的神，生成的文章辞藻华丽、写的报告面面俱到，毫不谦虚地说，打败 80% 甚至更多的人类。至于文生图、作曲，甚至是视频，都常有令人惊艳的作品。吹爆再吹爆，无需赘述……

对于设计师、文案策划等职业，生成式 AI 确实已经帮助他们提高了迸发创意的速度，至少不必万丈高楼平地起了。由于效率太高，这些岗位中的部分人可能反而要面对失业的烦恼。但对于普通人，AI 除了猎奇，OpenAI、SD 等时髦玩意儿好像对工作也没啥实质性的帮助——毕竟平时不需要写什么四平八稳的文章，更不需要吟诗作赋，而且见多了 AI 的输出，也实在觉得多是些正确的废话，有用，但也没啥大用。

所以，当某手机厂商说以后不生产“传统手机”的时候，大家嗤之以鼻。当 AI PC 概念出现的时候，也难免觉得是营销噱头。但是，当我在 2024 英特尔商用客户端 AI PC 产品发布会的展区走了一圈之后，我发现 AI 比我想象中的更有用。是的，有用，不需要技惊四座，但，很有用。

端侧 AI 的本地化落地带来根本性的体验变化

既然是商用 PC，那就离不开生产力工具属性。如果不买最新的硬件，玩不转最新的软件版本，很容易在鄙视链中打上“应用水平低下”的标签。就拿 Excel 为例吧，最早接触 Excel 的时候，对效率的理解是会用公式，自动进行一些计算等。再然后，是宏代码，自动执行数据的筛选、排序、导出等等，但这个难度还是比较大的。前几年呢，又似乎流行起了 Python，不去学一下那都不配谈数据处理了。在言必称数据可视化的当下，多数 Excel 用户的真实情况是尝试陌生的公式都需要临时百度一下教程，现学现用，稍复杂的操作可能要屡败屡试。

那 PC 前面加上 “AI”，或者装上某个 AI 助理，就可以赶时髦了吗？我实际体验之后，确定 AI PC 绝非如此浅薄。在 AI PC 上，有个专门做 Office 插件的公司叫 ExtendOffice，就很好地解决了 Excel 用起来磕磕绊绊的痛点：你只要说出你的意图，AI 助手马上直接在 Excel 表格上进行操作，譬如币值转换，甚至加密某一列数据。不需要去琢磨脑海里的需求到底需要对应哪个公式或者功能才可以实现，不用去查找教程，也跳过了 step by step 的学习，AI 助手当场就处理完了。

这就体现了 AI PC 一个特别关键的卖点：本地化，且在此基础上，可以嵌入工作流程，直接参与处理。我们中国人特别热爱学习，总说“授人以鱼不如授人以渔”，但“渔”的学习曲线太长了。在 AI PC 里，鱼和渔可以同时获得，因为渔夫（AI 助手）随时都在你眼前，更不要说它还可以当厨师、当秘书。

而且，刚才说的“嵌入”并不局限于某一个操作环节（类似于刚才说的给 Excel 增加某一列数据、公式），而是可以生成一个多步骤的、跨软件的操作。这也体现了大语言模型的优势：可以接受较长的输入并理解、分拆。譬如，我们完全可以对 AI PC 说：帮我将电脑静音，然后打开上次阅读的文档，并把它发送给某某邮箱。需要强调的是，以目前的演示，不需要指定准确的文档名，模糊的指示是可以理解的。还有一个让我暗暗叫好的操作是批量修改文件名。在 Windows 下批量修改文件名是需要一些小技巧的，而且，只能改成有规律的文件名（数字、字母后缀）等，但在 AI 助手的帮助下，我们可以让文件名更有个性：分别加上相关客户的名字、不同的风格类型等等。这事说起来简单，但其实需要挨个查看文件、提取关键信息，甚至根据自我理解去描述一些抽象的信息，然后挨个编写新的文件名——过程非常琐碎，文件多了就很费时间，但有了 AI 助手，这就是一句话的事。理解较长的上下文、多模态输入等等，这些都必须依赖大语言模型的能力，但其实是在本地运行的，而非借助云端的推理能力。讲真，应该没有人会认为整理文件名这种本地文件系统的操作还需要去云端绕一圈吧？从端到云之间隐藏的各种断点确实限制了我们的想象力，因此，AI PC 的这些本地操作真的打开了我的思路。

相对于大家早期较为熟悉的基于云端的 AI 工具，本地化还带来了很多显而易见的好处。譬如，断网的情况下，也是可以完成自然语言的处理和其他的操作。这对于那些曾经重度依赖大模型能力，且遭遇过服务故障的早期大模型用户而言，“天塌了”就是痛点。更不要说坐飞机之类的无网络场景了，保持连续的可用性是一个很朴素的需求。

本地部署还可以解决数据安全问题。大模型爆火之初就屡屡传出某某公司不慎泄露数据的新闻。没办法，用 ChatGPT 做简报、检查代码等等确实很香啊，但前提是得把文档上传到云端。这就导致许多企业一刀切禁止员工使用 ChatGPT。后来的事情就是许多企业选择利用开源大模型和内部数据训练、微调私有的大模型，并部署在自有的服务器或云主机上。更进一步的，现在我们看到规模 200 亿参数的大模型可以部署在基于酷睿 Ultra 处理器的 AI PC 上。

这种部署在 AI PC 上的大模型已经涉及教育、法律、医学等多个垂直领域，可以生成包括知识图谱、合同、法律意见等。譬如，将案情输入中科创达的魔方智能法务助手，就可以进行案情分析，查找相关的法律条文，撰写法律文书等。在这个场景中，很显然案情的隐私是应该绝对保证的，律师不敢将这种文档传输到云端处理。医生也有类似的约束，基于病例、基因数据等进行课题研究，如果能够在 PC 上做基因靶点、药理分析等，就不必采购服务器或者部署私有云了。

顺便一提的是，AI PC 上的大模型还让训练变得比想象中要简单，把本地你能看到的文件“喂”给 AI 助理之类的就可以了。这就解决了以往聊天机器人那种活只干了一半的“正确的废话”。譬如，通过 AI 生成一个报价邮件模板是很轻松的，但是，一般来说价格这种关键信息，机器人不懂那是很正常的事情，所以需要人工进行完善。如果找一个人类来处理这种事情，那提前做一份价格表是合理要求吧？报价表、FAQ 等都是属于需要总结提炼的工作，然后才能更有效率地培训新人——这是传统观念。本地的 AI 可以让这个事情变得很简单：让它去读 Outlook 邮箱就好了，片刻之后它自己就从历史邮件中“学”到对应的报价。相应生成的邮件就不仅是模版级了，而是要素完善的，留给我们做的就只剩确认 AI 给的结果是否正确。而且这种学习成果是可以继承下来的。

三大 AI 引擎撑起本地大模型

信息时代，我们已经经历了几次重大的科技变革。首先是个人电脑的普及，然后是互联网的普及，再就是移动互联网。现在我们正在面对的是 AI 对生产力的赋能甚至重构。我们今天讲的 AI 不是在数据中心里做训练或者推理的大规模集群，而是手边的 PC。AIGC、视频制作等面向内容创作者的应用已经不断给予大众诸多震撼了。现在我们进一步看到的是 AI PC 已经可以实实在在的提升普通白领的工作效率：处理琐碎事务，做简报、写邮件、查找法条等等，并且无缝衔接式地补齐我们的一些技能短板，类似于应用我们原本并不熟悉的的 Excel 功能、制作原以为高大上的知识图谱，诸如此类。这一切当然不仅仅依赖于大语言模型的“智能涌现”，也需要足够强大的性能以支撑本地部署。

我们多次提到的大模型的“本地部署”，都离不开端侧强劲的 AI 算力。所谓的 AI PC，依靠的是酷睿 Ultra 处理器强悍的 CPU+GPU+NPU 三大 AI 引擎，其算力足够支持 200 亿参数的大语言模型在本地运行推理过程，至于插图级的文生图为代表的 AIGC 应用相对而言倒是小菜一碟了。

CPU 快速响应：CPU 可以用来运行传统的、多样化的工作负载，并实现低延迟。酷睿 Ultra 采用先进的 Intel 4 制造工艺，可以让笔记本电脑拥有多达 16 个核心 22 个线程，睿频可高达 5.1GHz。
GPU 高吞吐量：GPU 非常适合需要并行吞吐量的大型工作负载。酷睿 Ultra 标配 Arc GPU 核显，酷睿 Ultra 7 165H 包含 8 个 Xe-LPG 核心（128 个矢量引擎），酷睿 Ultra5 125H 包含 7 个。而且，这一代核显还支持 AV1 硬编码，可以更快速地输出高质量、高压缩率的视频。凭借领先的编解码能力，Arc GPU 确实在视频剪辑行业积累的良好的口碑。随着矢量引擎能力的大幅度提升，大量内容创作 ISV 的演示了基于 AI PC 的更高效率的智能抠像、插帧等功能。
NPU 优异能效：酷睿 Ultra 处理器全新引入的 NPU（神经处理单元）能够以低功耗处理持续存在、频繁使用的 AI 工作负载，以确保高能效。譬如，火绒演示了利用 NPU 算力接管以往由 CPU 和 GPU 承担的病毒扫描等工作，虽然速度较调用 GPU 略低，但能耗有明显的优势，特别适合安全这种后台操作。我们已经很熟悉的视频会议中常用的美颜、背景更换、自动居中等操作，也可以交给 NPU 运行。NPU 也完全有能力仅凭一己之力运行轻量级的大语言模型，例如 TinyLlama 1.1，足以满足聊天机器人、智能助手、智能运维等连续性的业务需求，而将 CPU 和 GPU 的资源留给其他业务。

针对商用 AI PC，英特尔还推出了基于英特尔® 酷睿™ Ultra 的 vPro® 平台，将 AI 和商用平台的生产力、安全性、可管理性和稳定性有机结合。博通展示的基于 vPro 的 AI PC 智能化管理将传统的资产管理从被动变为主动：以往只能看到设备是否“还在”、“能用”，补丁升级等操作也是计划内的；而 AI 加持的 vPro 可以自主分析设备的运行，从中发现隐患并自动匹配相应的补丁包、向运维人员推送建议等。贝锐向日葵有一个AI智能远控报告方案，对 PC 的远程监控不再仅仅是录屏、截屏，而是可以自动、实时地识别和生成电脑的远程工作记录，包括标记一些敏感操作，如删除文件、输入特定的指令等。这也明显减轻了运维人员检查、回溯记录的工作量。

未来已来：数以百计的 ISV 实际业务落地

亨利福特曾经这样评价汽车的发明：“如果你问你的顾客需要什么,他们会说需要一辆更快的马车。”

“更快的马车”是一种消费陷阱，认为 AI 手机、AI PC 只是噱头的人们可能只是基于惯例认为自己暂时不需要更新马车。更深层次的，是大众对 AI 的落地有一些误解，表现为两种极端：一种极端是认为那是新潮前卫的重度用户、旗舰配置的事情，典型的场景是图像视频处理等；另一种极端是觉得是耳目一新的聊天机器人，类似于强化版的搜索引擎，有更好，无亦可。但实际上，AI PC 的落地情况远超许多人的想象：对于商用客户而言，英特尔与全球超过 100+ 个 ISV 深度优化合作，本土 35+ISV 在终端优化融合，创建包含 300 多项 ISV 特性的庞大 AI 生态系统，带来规模空前的 AI PC 体验！

而且，我并不认为这个数量级的 AI 应用落地是画饼或者“战未来”。因为在我眼里，诸多 AI PC 解决方案的展示，宛如 “OpenVINO™ 联欢会”。OpenVINO™ 是英特尔开发的跨平台深度学习工具包，意即“开放式视觉推理和神经网络优化”。这个工具包其实在 2018 年就已经发布，数年来已经积累了大量计算视觉和深度学习推理应用，发展到 Iris Xe 核显时期，软件、硬件的配合就已经很有江湖地位了。譬如依托成熟的算法商店，基于 11 代酷睿平台可以很轻松的构建各式各样的 AI 应用，从智慧安防的行为检测，到店铺自动盘点，效果相当的好。现在，AI PC 的核显进化到 Xe-LPG，算力倍增，OpenVENO™ 积累的各式应用本身就会有更好的表现，可以说“地利”（具有延续性的 Xe 引擎）和“人和”（OpenVINO™ 的 ISV 资源）早就是现成的。

真正引爆 AI PC 的是“天时”，也就是大语言模型步入实用化。大语言模型的突破很好地解决了自然语言交互和数据训练的问题，极大地降低了普通用户利用 AI 算力的门槛。前面我举了很多嵌入办公应用的例子，在这里，我可以再举一个例子：科东智能控制器的多模态视觉语言模型与机械臂的结合。机械臂是司空见惯的机器人应用，早就可以结合机器视觉做各种操作，移动、分拣物品等等。但物品的识别和操作，传统上是是需要预训练和编程的。结合大语言模型后，整套系统就可以做多模态的指令识别与执行了，譬如我们可以说：把手机放到那张纸上面。在这个场景中，我们不再需要教会机器人手机是什么、纸是什么，不需要给具体的坐标，不需要规划移动的路径。自然语言的指令，摄像头的图像，这些多模态的输入被很好地融合，并自行生成了执行指令给机械臂。对于这样的工业场景，整套流程可以在一台笔记本电脑等级的算力平台上完成，数据不需要出厂。

所以，AI PC 给我们带来的，绝对不仅仅是“更快的马车”，而是颠覆了 PC 的使用模式，拓展了用户的能力边界。盘点已有的 ISV 与解决方案，我们可以将 AI PC 的应用总结为六大场景：

Al Chatbot：针对特定行业和领域更加专业的问答。
AI PC 助理：直接对 PC 操作，处理个人文件、照片、视频等。
Al Office 助手：Office 插件，提升办公软件使用效率。
AI 本地知识库：RAG（Retrieval Augmented Generation，检索增强生成）应用，包括各类文本和视频文件。
AI 图像视频处理：图像、视频、音频等多媒体信息的生成与后期处理。
AI PC 管理：更加智能高效的设备资产及安全管理。

小结

不可否认，AI 的发展永远离不开硬件与软件的技术创新、相互结合，基于酷睿 Ultra 的 AI PC 首先是更快、更强、更低功耗、更长待机的 PC，这些硬件特性支撑的 AI 应用对我们的使用体验、使用模式带来了更深刻的改变。获得“智能涌现”加持的 PC 不再仅仅是生产力工具，在某些场景中，它直接可以化身协作者甚至操作者。这背后既有微架构和生产工艺提升带来的性能改进，也有大语言模型等新质生产力的赋能。

如果我们将 CPU、GPU、NPU 视作是 AI PC 的三大算力，相应的，也可以将 AI PC 让 AI 本地化（端侧）落地的价值归纳为三大法则：经济、物理、数据保密。所谓经济，是数据在本地处理可降低云服务成本，优化经济性；物理则对应云资源的“虚”，本地 AI 服务可以提供更好的及时性，更高的准确性，避免了云与端之间的传输瓶颈；数据保密，是指用户数据完全留在本地，防止滥用和泄露。

在 2023 年，大语言模型的狂飙成就了云端的 AI 元年。2024 年，大语言模型的端侧落地开启了 AI PC 元年。我们也期待 AI 在云与端的交织发展当中不断夯实应用，源源不绝地释放强大生产力；更期待英特尔未来联合 ISV+OEM 共同发力，为我们提供更加强劲的“新质生产力”。

AI Revolutionizes Industry and Retail: From Production Lines to Personalized Shopping Experiences

AI technology is increasingly being utilized in industry and retail sectors to enhance efficiency, productivity, and customer experiences. In this post, we firstly revisit the relationship between the industry and retail sections, then provide some common AI technologies and applications used in these domains.

Industry and Retail Relationship

The key difference between industry and retail lies in their primary functions and the nature of their operations:

Industry:

Industry, often referred to as manufacturing or production, involves the creation, extraction, or processing of raw materials and the transformation of these materials into finished goods or products.
Industrial businesses are typically involved in activities like manufacturing, mining, construction, or agriculture.
The primary focus of the industry is to produce goods on a large scale, which are then sold to other businesses, wholesalers, or retailers. These goods are often used as inputs for other industries or for further processing.
Industries may have complex production processes, rely on machinery and technology, and require substantial capital investment.

Retail:

Retail, on the other hand, involves the sale of finished products or goods directly to the end consumers for personal use. Retailers act as intermediaries between manufacturers or wholesalers and the end customers.
Retailers can take various forms, including physical stores, e-commerce websites, supermarkets, boutiques, and more.
Retailers may carry a wide range of products, including those manufactured by various industries. They focus on providing a convenient and accessible point of purchase for consumers.
Retail operations are primarily concerned with merchandising, marketing, customer service, inventory management, and creating a satisfying shopping experience for consumers.

AI in Industry

AI, or artificial intelligence, is revolutionizing industry sectors by powering various applications and technologies that enhance efficiency, productivity, and customer experiences. Here are some common AI technologies and applications used in these domains:

1. Robotics and Automation: AI-driven robots and automation systems are used in manufacturing to perform repetitive, high-precision tasks, such as assembly, welding, and quality control. Machine learning algorithms enable these robots to adapt and improve their performance over time.

2. Predictive Maintenance: AI is used to predict when industrial equipment, such as machinery or vehicles, is likely to fail. This allows companies to schedule maintenance proactively, reducing downtime and maintenance costs.

3. Quality Control: Computer vision and machine learning algorithms are employed for quality control processes. They can quickly identify defects or irregularities in products, reducing the number of faulty items reaching the market.

4. Supply Chain Optimization: AI helps in optimizing the supply chain by predicting demand, managing inventory, and optimizing routes for logistics and transportation.

5. Process Optimization: AI can optimize manufacturing processes by adjusting parameters in real time to increase efficiency and reduce energy consumption.

6. Safety and Compliance: AI-driven systems can monitor and enhance workplace safety, ensuring that industrial facilities comply with regulations and safety standards.

AI in Retail

AI technology is revolutionizing the retail sector too, introducing innovative solutions and transforming the way businesses engage with customers. Here are some key AI technologies and applications used in retail:

1. Personalized Marketing: AI is used to analyze customer data and behaviours to provide personalized product recommendations, targeted marketing campaigns, and customized shopping experiences.

2. Chatbots and Virtual Assistants: Retailers employ AI-powered chatbots and virtual assistants to provide customer support, answer queries, and assist with online shopping.

3. Inventory Management: AI can optimize inventory levels and replenishment by analyzing sales data and demand patterns, reducing stockouts and overstock situations.

4. Price Optimization: Retailers use AI to dynamically adjust prices based on various factors, such as demand, competition, and customer behaviour, to maximize revenue and profits.

5. Visual Search and Image Recognition: AI enables visual search in e-commerce, allowing customers to find products by uploading images or using images they find online.

6. Supply Chain and Logistics: AI helps optimize supply chain operations, route planning, and warehouse management, improving efficiency and reducing costs.

7. In-Store Analytics: AI-powered systems can analyze in-store customer behaviour, enabling retailers to improve store layouts, planogram designs, and customer engagement strategies.

8. Fraud Detection: AI is used to detect and prevent fraudulent activities, such as credit card fraud and return fraud, to protect both retailers and customers.

Summary

AI’s potential to transform industry and retail is huge and its future applications are very promising. As AI technologies advance, we can expect increased levels of automation, personalization, and optimization in industry and retail operations.

AI technologies in these sectors often rely on machine learning (ML), deep learning (DL), natural language processing (NLP), and computer vision (CV), and now Generative Large Language Models (LLM) to analyze and gain insights from data. These AI applications are continuously evolving and are changing the way businesses in these sectors operate, leading to improved processes and customer experiences.

AI will drive high levels of efficiency, innovation, and customer satisfaction in these sectors, ultimately revolutionizing the way businesses operate and interact with consumers.

The Future of Coding: Will Generative AI Make Programmers Obsolete?

Table of Content

Credits: this post is a notebook of the key points from YouTube Content Creator Programming with Mosh's video with some editorial works. TL,DR,: watch the video.

Is coding still worth learning in 2024?

This can be a common question for a lot of people especially the younger generation of students when they try to choose a career path with some kind of insurance for future incomings.

People are worried that AI is going to replace software engineers, or any engineer related to coding and designs.

As you know, we should trust solid data instead of media and hearsay in the digital area. Social media have been creating this anxious feeling that every job is going to collapse because of AI. Coding has no future.

But I’ve got a different take backed up by real-world numbers as follows.

Note: In this post, “software engineer” represents all groups of coders (data engineer, data analyst, data scientist, machine learning engineer, frontend/backend/full-stack developers, programmers and researchers).

Is AI replacing software engineers?

The short answer is NO.

But there is a lot of fear about AI replacing coders. Headling scream robots taking over jobs and it can be overwhelming. But the truth is:

AI is not going to take you jobs; instead it is the People who can work with AI will have the advantage, and probabley will take your job.

Software engineering is not going away at least not anytime soon in our generation. Here are some data to back this up.

The US Bureau of Labor and Statistics (BLS) is a government agency that tracks job growth across the country on its website. From the data, we see that there is a continued demand for software developers, and computer and information scientists.

They claimed that the requirement for software developers is expected to grow by 26% from 2022 to 2032, while the average across all occupations is only 3%. This is a strong indication that software engineering is here to stay.

Source: https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm#tab-6

In our lives, the research and development conducted by computer and information research scientists turn ideas into technology. As demand for new and better technology grows, demand for computer and information research scientists will grow as well.

There is a similar trend for Computer and Information Research Scientists, which is expected to grow by 23% from 2022 to 2032.

source: https://www.bls.gov/ooh/computer-and-information-technology/computer-and-information-research-scientists.htm#tab-6

Impact of AI on software engineering

To better understand the impact of AI on software engineering, let’s do a quick revisit of the history of programming.

In the early days of programming, engineers wrote codes in a way that only the computer understood. Then, we create compilers, we can program in a human-readable language like C++ and Jave without worrying about how the code should eventually get converted into zeros and ones, and where it will get stored in the memory.

Here is the fact

Compilers did not replace programmers. They made them more efficient!

Since then we have built so many software applications and totally changed the world.

The problem with AI-generated code

AI will likely do the same as changing the future, we will be able to delegate routine and repetitive coding tasks to AI, so we can focus on complex problem-solving, design and innovation.

This will allow us to build more sophisticated software applications most people can not even imagine today. But even then, just because AI can generate code doesn’t mean we can or we should delegate the entire coding aspect of software development to AI because

AI-Generated Code is Lower-Quality, we still need to review and refine it before using it in the production.

In fact, there is a study to support this: Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality. According to this study, they collected 153M lines of code from 2020 to 2023 and found disconcerting trends for maintainability: Code churn will be doubled in 2024.

source: Abstract of the 2023 Data Shows Downward Pressure on
Code Quality

So, yes, we can produce more code with AI. but

More Code != Better Code

Humans should always review and refine AI-generated code for quality and security before deploying it to production. That means all the coding skills that software engineer currently has will continue to stay relevant in the future.

You still need the knowledge of data structure and algorithms programming languages and their tricky parts, tools and frameworks, you still need to have all that knowledge to review and refine the AI-generated code, you will just spend less time typing it into the computer.

So anyone telling you that you can use natural language to build software without understanding anything about coding is out of touch with the reality of software engineering (or he is trying to sell you something, i.e., GPUs).

source: NVIDIA CEO: No Need To Learn Coding, Anybody Can Be A Programmer With Technology

How AI can help software engineers

Of course, you can make a dummy app with AI in minutes, but this is not the same kind of software that runs our banks, transportation, healthcare, security and more. These are the software/systems that really matter, and our life depends on them. We can’t let a code monkey talk to a chatbot in English and get that software built. At least, this will not happen in our lifetime.

In the future, we will probably spend more time designing new features and products with AI instead of writing boilerplate code. We will likely delegate aspects of coding to AI, but this doesn’t mean we don’t need to learn to code.

As a software engineer or any coding practitioner, you will always need to review what AI generates and refine it either by hand or by guiding the AI to improve the code.

Keep in mind that Coding is only one small part of a software engineer’s job, we often spend most of our time talking to people, understanding requirements, writing stories, discussing software/system architecture, etc.

Instead of being worried about AI, I’m more concerned about Human Intelligence!

Does AI really make you code faster?

AI can only boost our programming productivity but not necessarily the overall productivity.

In fact, McKinsey’s report, Unleashing Developer Productivity with Generative AI, found that for highly complex tasks developers saw less than 10% improvement in their speed with generative AI supports.

source: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/unleashing-developer-productivity-with-generative-ai

As you can see, AI helped the most with documentation and code generation to some extent, but when moving to code refactoring, the improvement dropped to 20% and for high-complexity tasks, it was less than 10%.

Time savings shrank to less than 10 percent on tasks that developers deemed high in complexity due to, for example, their lack of familiarity with a necessary programming framework.

Thus, if anyone tells you that software engineers will be obsolete in 5 years, they are either ignorant or trying to sell you something.

In fact, some studies tell that the role of software engineers (coders) may become more valuable as they will be needed to develop, manage and maintain these AI systems.

They (software engineers) need to understand all the complexity of building software and use AI to boost their productivity.

Can one AI-powered engineer do the work of many?

Now, people are worried that one Senior Engineer can simply use AI to replace many Engineers, eventually, leaving no job opportunities for juniors.

But again this is a fallacy because the time saving you get from AI is not as great as you are promised in reality. Anyone who uses AI to generate code knows that. It takes effort to get the right prompts for usable results, and the code still needs polishing.

Thus, it is not like one engineer will suddenly have so much free time to do the job of many people.

But you may ask, this is now, what about the future? Maybe in a year or two, AI will start to build software like a human.

In theory, yes, AI is advancing and one day it may even reach and surpass human intelligence. But Einstein said:

In Theory, Theory and Practice are the Same.

In Practice, they are NOT.

The reality is that while machines may be able to handle repetitive and routine tasks, human creativity and expertise will still be necessary for developing complex solutions and strategies.

Software engineering will be extremely important over the next several decades. I don’t think it is going away in the future, but I do believe it will change.

Future of Software Engineering

Software powers our world and that will not change anytime soon.

In future, we have to learn how to input the right prompt into our AI tools to get the expected result. This is not an easy skill to develop, it requires problem-solving capability as well as programming knowledge of languages and tools. So, if you’ve already made up your mind and don’t want to invest your time in software engineering or coding. That’s perfectly fine. Follow your passion!

The coding tools will evolve as they always do, but the true coding skill lies in learning and adapting. The future engineer needs today’s coding skills and a good understanding to use AI effectively. The future brings more complexity and demands more knowledge and adaptability from software engineers.

If you like building things with code, and if the idea of shaping the future with technology gets you excited, don’t let negativity and fear of Gen-AIs hold you back.

Reference

2024-04-17 Programming with Mosh: Is Coding Still Worth Learning in 2024?
2024-01-16 GitClear: Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality
2023-06-27 McKinsey Digital: Unleashing developer productivity with generative AI
US Bureau of Labor and Statistics (BLS): Fastest Growing Occupations; Computer and Information Research Scientists; Software Developers, Quality Assurance Analysts, and Tester

Prompt Engineering for LLM

2024-Feb-04: 1st Version

Prompt engineering is like adjusting audio without opening the equipment.

Introduction

Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with a Large Language Model (LLM) like GPT (Generative Pre-trained Transformer) to manipulate/steer its behaviour for expected outcomes without updating, retraining or fine-tuning the model weights.

Researchers, developers, or users may engage in prompt engineering to instruct a model for specific tasks, improve the model’s performance, or adapt it to better understand and respond to particular inputs. It is an empirical science and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics.

This post only focuses on prompt engineering for autoregressive language models, so nothing with image generation or multimodality models.

Basic Prompting

Zero-shot and few-shot learning are the two most basic approaches for prompting the model, pioneered by many LLM papers and commonly used for benchmarking LLM performance. That is to say, Zero-shot and few-shot testing are scenarios used to evaluate the performance of large language models (LLMs) in handling tasks with little or no training data. Here are examples for both:

Zero-shot

Zero-shot learning simply feeds the task text to the model and asks for results.

Scenario: Text Completion (Please try the following input in ChatGPT or Google Bard)

Input:

Task: Complete the following sentence:

Input: The capital of France is ____________.

Output (ChatGPT / Bard):

Output: The capital of France is Paris.

Few-shot

Few-shot learning presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted. Therefore, few-shot learning often leads to better performance than zero-shot. However, it comes at the cost of more token consumption and may hit the context length limit when the input and output text are long.

Scenario: Text Classification

Input:

Task: Classify movie reviews as positive or negative.

Examples:
Review 1: This movie was amazing! The acting was superb.
Sentiment: Positive 
Review 2: I couldn't stand this film. The plot was confusing.
Sentiment: Negative

Question:
Review: I'll bet the video game is a lot more fun than the film.
Sentiment:____

Output

Sentiment: Negative

Many studies have explored the construction of in-context examples to maximize performance. They observed that the choice of prompt format, training examples, and the order of the examples can significantly impact performance, ranging from near-random guesses to near-state-of-the-art performance.

Hallucination

In the context of Large Language Models (LLMs), hallucination refers to a situation where the model generates outputs that are incorrect or not grounded in reality. A hallucination occurs when the model produces information that seems plausible or coherent but is actually not accurate or supported by the input data.

For example, in a language generation task, if a model is asked to provide information about a topic and it generates details that are not factually correct or have no basis in the training data, it can be considered as hallucination. This phenomenon is a concern in natural language processing because it can lead to the generation of misleading or false information.

Addressing hallucination in LLMs is a challenging task, and researchers are actively working on developing methods to improve the models’ accuracy and reliability. Techniques such as fine-tuning, prompt engineering, and designing more specific evaluation metrics are among the approaches used to mitigate hallucination in language models.

Perfect Prompt Formula for ChatBots

For personal daily documenting work such as text generation, there are six key components making up the perfect formula for ChatGPT and Google Bard:

Task, Context, Exemplars, Persona, Format, and Tone.
Prompt Formula for ChatBots

The Task sentence needs to articulate the end goal and start with an action verb.
Use three guiding questions to help structure relevant and sufficient Context.
Exemplars can drastically improve the quality of the output by giving specific examples for the AI to reference.
For Persona, think of who you would ideally want the AI to be in the given task situation.
Visualizing your desired end result will let you know what format to use in your prompt.
And you can actually use ChatGPT to generate a list of Tone keywords for you to use!

Example from Jeff Su: Master the Perfect ChatGPT Prompt Formula

RAG, CoT, ReACT, SASE, DSP …

If you are ever curious about what the heck are those techies talking about with the above words? Please continues …

OK, so here’s the deal. We’re diving into the world of academia, talking about machine learning and large language models in the computer science and engineering domains. I’ll try to explain it in a simple way, but you can always dig deeper into these topics elsewhere.

RAG: Retrieval-Augmented Generation

RAG (Retrieval-Augmented Generation): RAG typically refers to a model that combines both retrieval and generation approaches. It might use a retrieval mechanism to retrieve relevant information from a database or knowledge base and then generate a response based on that retrieved information. In real applications, the users’ input and the model’s output will be pre/post-processed to follow certain rules and obey laws and regulations.

Here is a simplified example of using a Retrieval-Augmented Generation (RAG) model for a question-answering task. In this example, we’ll use a system that retrieves relevant passages from a knowledge base and generates an answer based on that retrieved information.

Input:

User Query: What are the symptoms of COVID-19?

Knowledge Base:

1. Title: Symptoms of COVID-19
   Content: COVID-19 symptoms include fever, cough, shortness of breath, fatigue, body aches, loss of taste or smell, sore throat, etc.

2. Title: Prevention measures for COVID-19
   Content: To prevent the spread of COVID-19, it's important to wash hands regularly, wear masks, practice social distancing, and get vaccinated.

3. Title: COVID-19 Treatment
   Content: COVID-19 treatment involves rest, hydration, and in severe cases, hospitalization may be required.

RAG Model Output:

Generated Answer: 

The symptoms of COVID-19 include fever, cough, shortness of breath, fatigue, body aches, etc.

Remark: ChatGPT 3.5 will give basic results like the above. But, Google Bard will provide extra resources like CDC links and other sources it gets from the Search Engines. We could guess Google used a different framework to OpenAI.

CoT: Chain-of-Thought

Chain-of-thought (CoT) prompting (Wei et al. 2022) generates a sequence of short sentences to describe reasoning logics step by step, known as reasoning chains or rationales, to eventually lead to the final answer.

The benefit of CoT is more pronounced for complicated reasoning tasks while using large models (e.g. with more than 50B parameters). Simple tasks only benefit slightly from CoT prompting.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, essentially creating a tree structure. The search process can be BFS or DFS while each state is evaluated by a classifier (via a prompt) or majority vote.

CoT : Chain-of-Thought and ToT: Tree-of-Thought

Self-Ask + Search Engine

Self-Ask (Press et al. 2022) is a method to repeatedly prompt the model to ask follow-up questions to construct the thought process iteratively. Follow-up questions can be answered by search engine results.

ReAct: Reasoning and Acting

ReAct (Reason + Act; Yao et al . 2023) combines iterative CoT prompting with queries to Wikipedia APIs to search for relevant entities and content and then add it back into the context.

In each trajectory consists of multiple thought-action-observation steps (i.e. dense thought), where free-form thoughts are used for various purposes.

Example of ReAct from pp18.(Reason + Act; Yao et al . 2023)

Speciﬁcally, from the paper, the authors use a combination of thoughts that decompose questions (“I need to search x, ﬁnd y, then ﬁnd z”), extract information from Wikipedia observations (“x was started in 1844”, “The paragraph does not tell x”), perform commonsense (“x is not y, so z must instead be…”) or arithmetic reasoning (“1844 < 1989”), guide search reformulation (“maybe I can search/lookup x instead”), and synthesize the ﬁnal answer (“…so the answer is x”).

DSP: Directional Stimulus Prompting

Directional Stimulus Prompting (DSP, Z. Li 2023), is a novel framework for guiding black-box large language models (LLMs) toward specific desired outputs. Instead of directly adjusting LLMs, this method employs a small tunable policy model to generate an auxiliary directional stimulus (hints) prompt for each input instance.

Summary and Conclusion

Prompt engineering involves carefully crafting these prompts to achieve desired results. It can include experimenting with different phrasings, structures, and strategies to elicit the desired information or responses from the model. This process is crucial because the performance of language models can be sensitive to how prompts are formulated.

I believe a lot of researchers will agree with me. Some prompt engineering papers don’t need to be 8 pages long. They could explain the important points in just a few lines and use the rest for benchmarking.

As researchers and developers delve further into the realms of prompt engineering, they continue to push the boundaries of what these sophisticated models can achieve.

To achieve this, it’s important to create a user-friendly LLM benchmarking system that many people will use. Developing better methods for creating prompts will help advance language models and improve how we use LLMs. These efforts will have a big impact on natural language processing and related fields.

Reference

Weng, Lilian. (Mar 2023). Prompt Engineering. Lil’Log.
IBM (Jan 2024) 4 Methods of Prompt Engineering
Jeff Su (Aug 2023) Master the Perfect ChatGPT Prompt Formula

Technical Review 03: Scale Effects & What happens when LLMs get bigger and bigger

AI Assitant Summary

This blog discusses the scale of Large Language Models (LLMs) and their impact on performance. LLMs like GPT, LaMDA, and PaLM have billions of parameters, raising questions about the consequences of their continued growth.

The journey of an LLM involves two stages: pre-training and scenario application. Pre-training focuses on optimizing the model using cross-entropy, while scenario application evaluates the model’s performance in specific use cases. Evaluating an LLM’s quality requires considering both stages, rather than relying solely on pre-training indicators.

Increasing training data, model parameters, and training time has been found to enhance performance in the pre-training stage. OpenAI and DeepMind have explored this issue, with OpenAI finding that a combination of more data and parameters, along with fewer training steps, produces the best results. DeepMind considers the amount of training data and model parameters equally important.

The influence of model size on downstream tasks varies. Linear tasks show consistent improvement as the model scales, while breakthrough tasks only benefit from larger models once they reach a critical scale. Tasks involving logical reasoning demonstrate sudden improvement at specific model scales. Some tasks exhibit U-shaped growth, where performance initially declines but then improves with larger models.

Reducing the LLM’s parameters while increasing training data proportionally can decrease the model’s size without sacrificing performance, leading to faster inference speed.

Understanding the impact of model size on both pre-training and downstream tasks is vital for optimizing LLM performance and exploring the potential of these language models.

Introduction

In recent years, we’ve witnessed a surge in the size of Large Language Models (LLMs), with models now boasting over 100 billion parameters becoming the new standard. Think OpenAI’s GPT-3 (175B), Google’s LaMDA (137B), PaLM (540B), and other global heavyweights. China, too, contributes to this landscape with models like Zhiyuan GLM, Huawei’s “Pangu,” Baidu’s “Wenxin,” etc. But here’s the big question: What unfolds as these LLMs continue to grow?

The journey of pre-trained models involves two crucial stages: pre-training and scenario application.

In the pre-training stage, the optimization goal is cross entropy. For autoregressive language models such as GPT, it is to see whether LLM correctly predicts the next word;

However, the real test comes in the scenario application stage, where specific use cases dictate evaluation criteria. Generally, our intuition is that if the LLM has better indicators in the pre-training stage, its ability to solve downstream tasks will naturally be stronger. However, this is not entirely true.

Existing research has proven that the optimization index in the pre-training stage does show a positive correlation with downstream tasks, but it is not completely positive. In other words, it is not enough to only look at the indicators in the pre-training stage to judge whether an LLM model is good enough. Based on this, we will look separately at these two different stages to see what the impact will be as the LLM model increases.

Part One: pre-training phase

First, let’s look at what happens as the model size gradually increases during the pre-training stage. OpenAI specifically studied this issue in “Scaling Laws for Neural Language Models” and proposed the “scaling law” followed by the LLM model.

Source: Scaling Laws for Neural Language Models

As shown in the figure above, this study proves that when we independently increase (1) the amount of training data, (2) model parameter size and (3) extend the model training time (such as from 1 Epoch to 2 Epochs), the Loss of the pre-trained model on the test set will decrease monotonically. In other words, the model’s effectiveness is improving steadily.

Since all three factors are important when we actually do pre-training, we have a decision-making problem on how to allocate computing power:

Question: Assuming that the total computing power budget used to train LLM (such as fixed GPU hours or GPU days) is given. How to allocate computing power?

Should we increase the amount of data and reduce model parameters?

Or should we increase the amount of data and model size at the same time but reduce the number of training steps?

Open AI

As one zero-sum game, the scale of one-factor increases, and the scale of other factors must be reduced to keep the total computing power unchanged, so there are various possible computing power allocation plans.

In the end, OpenAI chose to increase the amount of training data and model parameters at the same time but used an early stopping strategy to reduce the number of training steps. Because it proves that: for the two elements of training data volume and model parameters, if you only increase one of them separately, this is not the best choice. It is better to increase both at the same time according to a certain proportion. Its conclusion is to give priority to increasing the model parameters, and then the amount of training data.

Assuming that the total computing power budget used to train LLM increases by 10 times, then the amount of model parameters should be increased by 5.5 times and the amount of training data should be increased by 1.8 times. At this time, the model gets the best performance.

Deep Mind

A study by DeepMind (Reference: Training Compute-Optimal Large Language Models) explored this issue in more depth.

Source: Training Compute-Optimal Large Language Models

Its basic conclusions are similar to those of OpenAI. For example, it is indeed necessary to increase the amount of training data and model parameters at the same time, so that the model effect will be better.

Many large models do not consider this when doing pre-training. Many large LLM models were trained just monotonically increasing the model parameters while fixing the amount of training data. This approach is wrong and limits the potential of the LLM model.

However, DeepMind corrects the proportional relationship between the two by OpenAI and believes that the amount of training data and model parameters are equally important.

In other words, assuming that the total computing power budget used to train LLM increases by 10 times, the number of model parameters should be increased by 3.3 times, and the amount of training data should also be increased by 3.3 times to get the best model.

This means that increasing the amount of training data is more important than we previously thought. Based on this understanding, DeepMind chose another configuration in terms of computing power allocation when designing the Chinchilla model: compared with the Gopher model with a data volume of 300B and a model parameter volume of 280B, Chinchilla chose to increase the training data by 4 times, but reduced the model The parameters are reduced to one-fourth that of Gopher, which is about 70B. However, regardless of pre-training indicators or many downstream task indicators, Chinchilla is better than the larger Gopher.

This brings us to the following enlightenment:

We can choose to enlarge the training data and reduce the LLM model parameters in the same proportion to achieve the purpose of greatly reducing the size of the model without reducing the model performance.

Reducing the size of the model has many benefits, such as the inference speed will be much faster when applied. This is undoubtedly a promising development route for LLM.

Part Two: downstream tasks

The above is the impact of the model scale from the pre-training stage. From the perspective of the effect of LLM on solving specific downstream tasks, as the model scale increases, different types of tasks have different performances.

Source: Beyond the Imitation Game Benchmark

Specifically, there are the following three types of tasks.

(a) Tasks that achieve the highest linearity scores see model performance improve predictably with scale and typically rely on knowledge and simple textual manipulations.
(b) Tasks with high breakthroughs do not see model performance improve until the model reaches a critical scale. These tasks generally require sequential steps or logical reasoning. Around 5% of BIG-bench tasks see models achieve sudden score breakthroughs with increasing scale.
(c) Tasks that achieve the lowest (negative) linearity scores see model performance degrade with scale.

Linearity Tasks

The first type of task perfectly reflects the scaling law of the LLM model, which means that as the model scale gradually increases, the performance of the tasks gets better and better, as shown in (a) above.

Such tasks usually have the following common characteristics: they are often knowledge-intensive tasks. That is to say, if the LLM model contains more knowledge, the performance of such tasks will be better.

Many studies have proven that the larger the LLM model, the higher the learning efficiency. For the same amount of training data, the larger the model, the better the performance. This shows that even when faced with the same batch of training data, a larger LLM model is relatively more efficient in getting more knowledge than small ones.

What’s more, under normal circumstances, when increasing the LLM model parameters, the amount of training data will often increase simultaneously, which means that large models can learn more knowledge points from more data. These studies can explain the above figure, why as the model size increases, these knowledge-intensive tasks become better and better.

Most traditional NLP tasks are actually knowledge-intensive tasks, and many tasks have achieved great improvement in the past few years, even surpassing human performance. Obviously, this is most likely caused by the increase in the scale of the LLM model, rather than due to a specific technical improvement.

Breakthroughs Tasks

The second type of task demonstrates that LLM has some kind of “Emergent Ability”, as shown in (b) above. The so-called “emergent ability” means that when the model parameter scale fails to reach a certain threshold, the model basically does not have any ability to solve such tasks, which reflects that its performance is equivalent to randomly selecting answers. However, when the model scale spans Once the threshold is exceeded, the LLM model’s effect on such tasks will experience a sudden performance increase.

In other words, model size is the key to unlocking (unlocking) new capabilities of LLM. As the model size becomes larger and larger, more and more new capabilities of LLM will be gradually unlocked.

This is a very magical phenomenon because it means the following possibilities that make people optimistic about the future. Many tasks that cannot be solved well by LLM at present can be solved in future if we continue to make the model larger. Because LLM has “emergent capabilities” to suddenly unlock those limits one day. The growth of the LLM model will bring us unexpected and wonderful gifts.

The article “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models” points out that tasks that embody “emergent capabilities” also have some common features: these tasks generally consist of multiple steps, and to solve these tasks, it is often necessary to first Multiple intermediate steps are solved, and logical reasoning skills play an important role in the final solution of such tasks.

Chain of Thought (CoT) Prompting is a typical technology that enhances the reasoning ability of LLM, which can greatly improve the effect of such tasks. I will discuss the CoT technology in the following blogs.

Here the most important question is, why does LLM have this “emergent ability” phenomenon? The article “Emergent Abilities of Large Language Models” shares several possible explanations:

Source: Emergent Abilities of Large Language Models

One possible explanation is that the evaluation indicators of some tasks are not smooth enough. For example, some metrics for generation tasks require that the string output by the model must completely match the standard answer to be considered correct otherwise it will be scored zero.

Thus, even as the model gradually becomes better and outputs more correct character fragments, because it is not completely correct, 0 points will be given for any small errors. Only when the model is large enough, the output Scores are scored when all the output segments are correct. In other words, because the indicator is not smooth enough, it cannot reflect the reality that LLM is actually gradually improving its performance on the task. It seems to be an external manifestation of “emergent ability”.

Another possible explanation is that some tasks are composed of several intermediate steps. As the size of the model increases, the ability to solve each step gradually increases, but as long as one intermediate step is wrong, the final answer will be wrong. This will also lead to this superficial “emergent ability” phenomenon.

Of course, the above explanations are still conjectures at present. As for why LLM has this phenomenon, further and in-depth research is needed.

U-shaped Tasks

Source: Inverse scaling can become U-shaped

There are also a small number of tasks. As the model size increases, the task effect curve shows U-shaped characteristics: as the model size gradually increases, the task effect gradually becomes worse, but when the model size further increases, the effect starts to get better and better. Figure above shows a U-shaped growth trend where the indicator trend of the pink PaLM model on the two tasks.

Why do these tasks appear so special? The article “Inverse Scaling Can Become U-shaped” gives an explanation:

These tasks actually contain two different types of subtasks, one is the real task, and the other is the “interference task ( distractor task)”.

When the model size is small, it cannot identify any sub-task, so the performance of the model is similar to randomly selecting answers.
When the model grows to a medium size, it mainly tries to solve the interference task, so it has a negative impact on the real task performance. This is reflected in the decline of the real task effect.
When the model size is further increased, LLM can ignore the interfering task and perform the real task, which is reflected in the effect starting to grow.

For those tasks whose performance has been declining as the model size increases, if Chain of Thought (CoT) Prompting is used, the performance of some tasks will be converted to follow the Scaling Law. That is, the larger the model size, the better the performance, while other tasks will be converted to a U-shaped growth curve.

This actually shows that this type of task should be a reasoning-type task, so the task performance will change qualitatively after adding CoT.

Personal View

Increasing the size of the LLM model may not seem technically significant, but it is actually very important to build better LLMs. In my opinion, the advancements from Bert to GPT 3 and ChatGPT are likely attributed to the growth of the LLM model size rather than a specific technology. I believe a lot of people want to explore the scale ceiling of the LLM model if possible.

The key to achieving AGI may lie in having large and diverse data, large-scale models, and rigorous training processes. Developing such large LLM models requires high engineering skills from the technical team, which means there is technical content involved.

Increasing the scale of the LLM model has research significance. There are two main reasons why it is valuable.

Firstly, as the model size grows, the performance of various tasks improves, especially for knowledge-intensive tasks. Additionally, for reasoning and difficult tasks, the effect of adding CoT Prompting follows a scaling law. Therefore, it is important to determine to what extent the scale effect of LLM can solve these tasks.
Secondly, the “emergent ability” of LLM suggests that increasing the model size may unlock new capabilities that we did not expect. This raises the question of what these capabilities could be.

Considering these factors, it is necessary to continue increasing the model size to explore the limits of its ability to solve different tasks.

Talk is cheap, and in reality, very few AI/ML practitioners have the opportunity or ability to build larger models due to high financial requirements, investment willingness, engineering capabilities, and technical enthusiasm from research institutions. There are probably no more than 10 institutions that can do this on Earth. However, in the future, there may be a possibility of joint efforts between capable institutions to build a Super-Large model:

All (Resources) for One (Model) and One (Model) for All (People).
Modified from Alexandre Dumas, The Three Musketeers

Reference

OpenAI 2020: Scaling Laws for Neural Language Models (https://arxiv.org/abs/2001.08361)
DeepMind 2022: Training Compute-Optimal Large Language Models (https://arxiv.org/abs/2203.15556)
BIG-bench Project Team: 2023: Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models (https://arxiv.org/abs/2206.04615)
Google 2023: Inverse scaling can become U-shaped (https://arxiv.org/abs/2211.02011)

What’s Next?

Technical Review 04: Human-Computer Interface: From In Context Learning to Instruct Understanding (ChatGPT)

Previous Blogs

第一章：以力证道极缩放，破虚指物理常识

1. 奥派开天宗（OpenAI）

2. AMI 开山立派（Meta/FAIR）

第二章：混元双圣各施法，秘境残阳借体魂

1. 谷歌混元宗（Google DeepMind）

2. 微软灵枢殿（Microsoft）

3. 苹果琅琊阁（Apple）

4. 欧陆秘境（ Mistral ）

第三章：归墟铁剑斩金甲，青山春雨入凡尘

1. 归墟剑宗（DeepSeek）

2. 逍遥灵枢（阿里巴巴）与 幻方圣地（字节跳动）

逍遥灵枢（阿里巴巴）

幻方圣地（字节跳动）

第四章：界限破虚争因果，开源筑海困孤城

1. 法界界限之战：【连续 vs 离散】

2. 神识重构之战：【本能 vs 推理】

3. 宗门气运之战：【开源阳谋 vs 闭源禁咒】

结语：大道五十，天衍四九

【番外短篇】

番外：西境锁灵，东域夺天

番外：抱脸阁，天道榜

1. Undermining Competitors

2. Reinvigorating the Internal Team

3. Enhancing Brand Visibility

1. Internal Integration and Morale Challenges

2. Return on Investment and Performance Pressure

3. Impacts on Scale AI and the Broader Ecosystem

3 Ways to Keep AI on Our Side

AI Mistakes Are Very Different from Human Mistakes

Asimov’s Laws of Robotics Need an Update for AI PROPOSING A FOURTH LAW OF ROBOTICS

What Can AI Researchers Learn from Alien Hunters?

Is AI a Bubble or a Marketing Gimmick?

The fundamental change in experience brought by localized edge AI

Three Major AI Engines Support Local Large Models

Summary

端侧 AI 的本地化落地带来根本性的体验变化

三大 AI 引擎撑起本地大模型

未来已来：数以百计的 ISV 实际业务落地

小结

Industry and Retail Relationship

AI in Industry

AI in Retail

Summary

Is coding still worth learning in 2024?

Is AI replacing software engineers?

Impact of AI on software engineering

The problem with AI-generated code

How AI can help software engineers

Does AI really make you code faster?

Can one AI-powered engineer do the work of many?

Future of Software Engineering

Reference

Introduction

Basic Prompting

Zero-shot

Few-shot

Hallucination

Perfect Prompt Formula for ChatBots

RAG, CoT, ReACT, SASE, DSP …

RAG: Retrieval-Augmented Generation

CoT: Chain-of-Thought

Self-Ask + Search Engine

ReAct: Reasoning and Acting

DSP: Directional Stimulus Prompting

Summary and Conclusion

Reference

AI Assitant Summary

Introduction

Part One: pre-training phase

Open AI

Deep Mind

Part Two: downstream tasks

Linearity Tasks

Breakthroughs Tasks

U-shaped Tasks

Personal View

Reference

What’s Next?

2. 逍遥灵枢（阿里巴巴）与幻方圣地（字节跳动）