Natural language boosts LLM performance in coding, planning, and robotics

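Large language models (LLMs) are becoming increasingly useful for programming and robotics tasks, but on more complicated reasoning problems the gap between these systems and humans looms large. Without the ability to learn new concepts the way humans do, these systems fail to form good abstractions (essentially, high-level representations of complex concepts that skip less-important details), and so they sputter when asked to do more sophisticated tasks.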

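Luckily, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have found a treasure trove of abstractions within natural language. In three papers to be presented at the International Conference on Learning Representations this month, the group shows how our everyday words are a rich source of context for language models. That context helps the models build better overarching representations for code synthesis, AI planning, and robotic navigation and manipulation. All three papers are also available on the arXiv preprint server.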

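The three separate frameworks each build libraries of abstractions for their given task:

- LILO (library induction from language observations) can synthesize, compress, and document code.
- Ada (action domain acquisition) explores sequential decision-making for artificial intelligence agents.
- LGA (language-guided abstraction) helps robots better understand their environments so they can develop more feasible plans.

Each system is a neurosymbolic method, a type of AI that blends human-like neural networks with program-like logical components.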

LILO: A neurosymbolic framework that codes

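Large language models can quickly write solutions to small-scale coding tasks, but they cannot yet architect entire software libraries like the ones human software engineers write. To take their software development capabilities further, AI models need to refactor code (cut it down and combine it) into libraries of succinct, readable, and reusable programs.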

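Refactoring tools such as Stitch, a previously developed MIT-led algorithm, can automatically identify abstractions. In a nod to the Disney film "Lilo & Stitch," the CSAIL researchers combined these algorithmic refactoring approaches with LLMs. Their neurosymbolic method, LILO, uses a standard LLM to write code and then pairs it with Stitch to find abstractions, which are comprehensively documented in a library.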

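LILO's unique emphasis on natural language lets the system do tasks that require human-like commonsense knowledge. Examples include identifying and removing all vowels from a string of code, and drawing a snowflake. In both cases the CSAIL system outperformed standalone LLMs. It also outperformed DreamCoder, an earlier library-learning algorithm from MIT. This indicates that LILO can build a deeper understanding of the words within prompts.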

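These encouraging results suggest that LILO could help write programs that manipulate documents such as Excel spreadsheets, help AI answer questions about visuals, and draw 2D graphics.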

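"Language models prefer to work with functions that are named in natural language," says Gabe Grand, an MIT PhD student in electrical engineering and computer science, CSAIL affiliate, and lead author on the research. "Our work creates more straightforward abstractions for language models and assigns natural language names and documentation to each one, leading to more interpretable code for programmers and improved system performance."

In LILO's pipeline, Stitch efficiently identifies common structures within the code and pulls out useful abstractions. LILO then automatically names and documents these abstractions. The result is simplified programs that the system can reuse to solve more complex tasks.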
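To make the compression step more concrete, here is a toy Python sketch. It is not Stitch. Stitch searches for abstractions that can contain holes (lambda parameters), whereas this simplified stand-in only looks for exactly repeated subtrees. It works on a small corpus of programs written as nested tuples. It picks the repeated subtree that saves the most nodes and rewrites every occurrence as a call to a new library function.

The program encoding and the Logo-style example programs are hypothetical. So is the name `draw_square_side`, which stands in for the natural-language name LILO would obtain from an LLM.

```python
from collections import Counter
from typing import Optional, Union

# A program is a leaf (str) or a tuple (operator, *children). Hypothetical encoding.
Expr = Union[str, tuple]


def size(e: Expr) -> int:
    """Number of nodes in an expression tree."""
    if isinstance(e, str):
        return 1
    return 1 + sum(size(c) for c in e[1:])


def subtrees(e: Expr):
    """Yield every non-leaf subtree of e, including e itself."""
    if isinstance(e, tuple):
        yield e
        for c in e[1:]:
            yield from subtrees(c)


def best_abstraction(corpus: list) -> Optional[Expr]:
    """Pick the repeated subtree whose extraction saves the most nodes overall."""
    counts = Counter(t for prog in corpus for t in subtrees(prog))
    best, best_saving = None, 0
    for tree, n in counts.items():
        # Each occurrence shrinks to a 1-node call; the definition is stored once.
        saving = n * (size(tree) - 1) - size(tree)
        if n > 1 and saving > best_saving:
            best, best_saving = tree, saving
    return best


def rewrite(e: Expr, target: Expr, name: str) -> Expr:
    """Replace every occurrence of target in e with a call to `name`."""
    if e == target:
        return (name,)
    if isinstance(e, str):
        return e
    return (e[0], *(rewrite(c, target, name) for c in e[1:]))


# Hypothetical Logo-style corpus: a square and a partial square.
side = ("seq", ("forward", "10"), ("left", "90"))
corpus = [
    ("seq", side, side, side, side),
    ("seq", side, side, ("forward", "5")),
]

abstraction = best_abstraction(corpus)
name = "draw_square_side"  # hypothetical; LILO would ask an LLM to name and document it
library = {name: abstraction}
compressed = [rewrite(p, abstraction, name) for p in corpus]
print(compressed[0])
```

Before compression the corpus has 34 nodes. Afterward it has 10 nodes, plus 5 nodes for the stored definition. Scoring candidates by total description length, rather than by frequency alone, keeps the sketch from extracting tiny fragments such as a single `forward` command.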

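The MIT framework writes programs in domain-specific programming languages such as Logo, a language developed at MIT in the 1970s to teach children about programming. Scaling up automated refactoring algorithms to handle more general programming languages like Python will be a focus of future research. Still, the work represents a step forward in how language models can facilitate increasingly elaborate coding activities.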

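Ada: Natural language guides AI task planning

Just as in programming, AI models that automate multi-step tasks in households and command-based video games lack abstractions. Imagine you're cooking breakfast and ask your roommate to bring a hot egg to the table. They'll intuitively abstract their background knowledge about cooking in your kitchen into a sequence of actions. In contrast, an LLM trained on similar information will still struggle to reason about what it needs to build a flexible plan.

Named after the famed mathematician Ada Lovelace, whom many consider the world's first programmer, the CSAIL-led "Ada" framework makes headway on this problem. It develops libraries of useful plans for virtual kitchen chores and gaming. The method trains on potential tasks and their natural language descriptions, and a language model then proposes action abstractions from this dataset. A human operator scores the best plans and filters them into a library. The best possible actions can then be implemented into hierarchical plans for different tasks.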
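The library-building and planning loop described above can be sketched roughly as follows. This is a minimal illustration, not Ada's implementation. All of its ingredients are hypothetical:

- the candidate abstractions (as a language model might propose them);
- the numeric scores (standing in for the human operator's ratings);
- the acceptance threshold;
- the kitchen actions.

A candidate enters the library only if it scores well enough and every step it uses is either a primitive action or another accepted abstraction. A hierarchical plan is then expanded recursively into primitive steps.

```python
# Hypothetical primitive actions the simulated agent can execute directly.
PRIMITIVES = {"goto", "open", "pick", "place", "close", "cool"}

# Hypothetical candidate abstractions: name -> list of (action, argument) steps.
# The placeholder "item" is filled in with the object the plan is about.
candidates = {
    "chill": [("goto", "fridge"), ("open", "fridge"), ("cool", "item"), ("close", "fridge")],
    "store_in_cabinet": [("goto", "cabinet"), ("open", "cabinet"), ("place", "item"), ("close", "cabinet")],
    "put_chilled_in_cabinet": [("pick", "item"), ("chill", "item"), ("store_in_cabinet", "item")],
    "teleport_item": [("levitate", "item")],  # uses an action that does not exist
}

# Hypothetical ratings standing in for the human operator's scores.
scores = {"chill": 0.9, "put_chilled_in_cabinet": 0.85, "store_in_cabinet": 0.8, "teleport_item": 0.6}


def build_library(candidates, scores, threshold=0.5):
    """Keep well-scored abstractions whose steps are all grounded."""
    kept = {n: steps for n, steps in candidates.items() if scores[n] >= threshold}
    changed = True
    while changed:  # drop ungrounded abstractions until nothing else changes
        changed = False
        for name, steps in list(kept.items()):
            if not all(a in PRIMITIVES or a in kept for a, _ in steps):
                del kept[name]
                changed = True
    return kept


def expand(action, arg, library):
    """Recursively expand a (possibly high-level) action into primitive steps.

    Assumes the library contains no cyclic definitions.
    """
    if action in PRIMITIVES:
        return [(action, arg)]
    steps = []
    for a, x in library[action]:
        steps.extend(expand(a, arg if x == "item" else x, library))
    return steps


library = build_library(candidates, scores)
plan = expand("put_chilled_in_cabinet", "wine", library)
for step in plan:
    print(step)
```

The fixed-point loop lets an abstraction depend on other abstractions regardless of the order in which they were proposed. In this sketch, `teleport_item` passes the score threshold but is rejected because `levitate` is not an action the agent can perform.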

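"Traditionally, large language models have struggled with more complex tasks because of problems like reasoning about abstractions," says Ada lead researcher Lio Wong, an MIT graduate student in brain and cognitive sciences, CSAIL affiliate, and LILO co-author. "But we can combine the tools that software engineers and roboticists use with LLMs to solve hard problems, such as decision-making in virtual environments."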

When the researchers incorporated the widely used large language model GPT-4 into Ada, the system completed more tasks in a kitchen simulator and in Mini Minecraft than the AI decision-making baseline "Code as Policies." Ada used the background information hidden within natural language to understand how to place chilled wine in a cabinet and how to craft a bed. The results showed a staggering 59% and 89% improvement in task accuracy, respectively.

With this success, the researchers hope to generalize their work to real-world homes, in the hope that Ada could assist with other household tasks and aid multiple robots in a kitchen. For now, its key limitation is that it uses a generic LLM. The CSAIL team therefore wants to apply a more powerful, fine-tuned language model that could assist with more extensive planning. Wong and her colleagues are also considering combining Ada with a robotic manipulation framework fresh out of CSAIL: LGA (language-guided abstraction).

Language-guided abstraction: Representations for robotic tasks

Andi Peng, an MIT graduate student in electrical engineering and computer science and CSAIL affiliate, and her co-authors designed a method to help machines interpret their surroundings more like humans, cutting out unnecessary details in a complex environment like a factory or kitchen. Just like LILO and Ada, LGA has a novel focus on how natural language leads us to those better abstractions.

In these more unstructured environments, a robot will need some common sense about what it's tasked with, even with basic training beforehand. Ask a robot to hand you a bowl, for instance, and the machine will need a general understanding of which features are important within its surroundings. From there, it can reason about how to give you the item you want.

In LGA's case, humans first provide a pre-trained language model with a general task description using natural language, like "Bring me my hat." Then, the model translates this information into abstractions about the essential elements needed to perform this task. Finally, an imitation policy trained on a few demonstrations can implement these abstractions to guide a robot to grab the desired item.
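The three steps above can be illustrated with a deliberately small Python sketch. It assumes a scene is a list of objects with a few features. The pre-trained language model is replaced by a hypothetical keyword table, `relevant_categories`, that maps a task description to the object categories that matter for it. The learned imitation policy is replaced by a lookup table built from a single demonstration.

Every name, feature, and object in the example is illustrative, not LGA's actual interface. The point is only that projecting scenes onto task-relevant features lets a demonstration transfer to a new scene that differs in irrelevant details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    category: str
    color: str
    position: tuple


def relevant_categories(task: str) -> set:
    """Hypothetical stand-in for the language model: task text -> relevant categories."""
    table = {"hat": {"hat"}, "bowl": {"bowl"}, "recycl": {"can", "bottle", "recycling_bin"}}
    return {c for key, cats in table.items() if key in task.lower() for c in cats}


def abstract_state(scene, task):
    """Keep only task-relevant objects, and only the feature the policy needs (position)."""
    keep = relevant_categories(task)
    return tuple(sorted((o.category, o.position) for o in scene if o.category in keep))


task = "Bring me my hat"

# One demonstration, recorded in a scene with a distractor mug.
scene_demo = [Obj("hat", "red", (2, 3)), Obj("mug", "blue", (0, 1))]
demos = {abstract_state(scene_demo, task): ("grasp", (2, 3))}  # lookup "policy"

# A new scene: the hat's color changed and different distractors appeared.
scene_new = [Obj("hat", "green", (2, 3)), Obj("plant", "green", (5, 5)), Obj("book", "black", (1, 1))]
action = demos.get(abstract_state(scene_new, task))
print(action)
```

Because color and the distractor objects are abstracted away, both scenes map to the same abstract state, so the demonstrated action is reused. A real system would use a learned policy that generalizes across positions too. The keyword table is far cruder than a language model; for instance, it would wrongly match "hat" inside "that."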

Previous work required a person to take extensive notes on different manipulation tasks to pre-train a robot, which can be expensive. Remarkably, LGA guides language models to produce abstractions similar to those of a human annotator, but in less time.

To illustrate this, the researchers used LGA to develop robotic policies that helped Boston Dynamics' Spot quadruped pick up fruits and throw drinks into a recycling bin. These experiments show how the MIT-developed method can scan the world and develop effective plans in unstructured environments. The approach could potentially guide autonomous vehicles on the road and robots working in factories and kitchens.

"In robotics, a truth we often disregard is how much we need to refine our data to make a robot useful in the real world," says Peng. "Beyond simply memorizing what's in an image for training robots to perform tasks, we wanted to leverage computer vision and captioning models in conjunction with language. By producing text captions from what a robot sees, we show that language models can essentially build important world knowledge for a robot."

The challenge for LGA is that some behaviors can't be explained in language, making certain tasks underspecified. To expand how they represent features in an environment, Peng and her colleagues are considering incorporating multimodal visualization interfaces into their work. In the meantime, LGA provides a way for robots to gain a better feel for their surroundings when giving humans a helping hand.

An 'exciting frontier' in AI

"Library learning represents one of the most exciting frontiers in artificial intelligence, offering a path towards discovering and reasoning over compositional abstractions," says Robert Hawkins, an assistant professor at the University of Wisconsin-Madison who was not involved with the papers. Hawkins notes that previous techniques exploring this subject have been "too computationally expensive to use at scale." He adds that they have trouble with the lambdas they generate; "lambda" is the keyword many programming languages use to define new functions.

"They tend to produce opaque 'lambda salads,' big piles of hard-to-interpret functions. These recent papers demonstrate a compelling way forward by placing large language models in an interactive loop with symbolic search, compression, and planning algorithms. This work enables the rapid acquisition of more interpretable and adaptive libraries for the task at hand."

By building libraries of high-quality code abstractions using natural language, the three neurosymbolic methods make it easier for language models to tackle more elaborate problems and environments in the future. This deeper understanding of the precise keywords within a prompt presents a path forward in developing more human-like AI models.

想要了解更多关于脑机接口技术的内容，请关注脑机网，我们将定期发布最新的研究成果和应用案例，让您第一时间了解脑机接口技术的最新进展。