Open-ended requests can confuse machines and produce outcomes that users didn’t intend. Generally speaking, machines respond best to highly specific commands.
People are therefore taught to communicate with robots in a rigid, structured way, such as by phrasing a question in a specific format to get the desired answer.
PaLM-SayCan, Google’s newest robotics system, is billed as smarter than that. The physical robot is built by Everyday Robots, a company spun out of Google X. Mounted on wheels, it has cameras for eyes in its head and a pincer-equipped arm tucked behind its long, straight torso.
A question like “I just worked out, can you get me a healthy snack?” will prompt the robot to fetch an apple. “PaLM-SayCan [is] an interpretable and general approach to leveraging knowledge from language models that enables a robot to follow high-level textual instructions to perform physically-grounded tasks,” Google Brain researchers wrote.
PaLM, Google’s largest language model, was unveiled in April of this year. It was trained on text scraped from the internet, but for this system it was adapted to output a list of instructions for the robot to follow rather than open-ended text responses.
When you say, “I spilled my Coke on the table, how would you throw it away and bring me something to help clean?”, PaLM parses the request and generates a list of steps the robot can take to complete the task, such as going over to pick up the can, dropping it in a bin, and fetching a sponge.
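In code, the adaptation looks roughly like the sketch below: a few-shot prompt biases the model toward continuing a numbered list of robot steps instead of producing free-form prose. The prompt wording and the `complete_text` stub are illustrative assumptions, not PaLM’s actual prompt or API.

```python
def complete_text(prompt: str) -> str:
    """Stub standing in for a real LLM completion call; canned output here."""
    return (" pick up the coke can, 2. put the coke can in the trash, "
            "3. find a sponge, 4. bring the sponge to you, 5. done.")

# A few-shot example teaches the model the "numbered steps" format.
FEW_SHOT_PROMPT = """\
Human: Bring me a bag of chips.
Robot: 1. find the chips, 2. pick up the chips, 3. bring them to you, 4. done.
Human: I spilled my Coke on the table, how would you throw it away and bring me something to help clean?
Robot: 1."""

def plan_steps(prompt: str) -> list[str]:
    """Continue the numbered list and split it into individual steps."""
    completion = complete_text(prompt)
    return [step.strip() for step in ("1." + completion).split(",")]

print(plan_steps(FEW_SHOT_PROMPT))
# ['1. pick up the coke can', '2. put the coke can in the trash', ...]
```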
However, large language models (LLMs) like PaLM have no grounded understanding of what their words actually mean. To address this, the researchers trained a separate model, using reinforcement learning, to tie abstract language to physical actions and visual observations. In this way, the robot learns to connect the word “Coke” with the image of an actual soda can.
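What such a grounding model might look like, as a rough sketch in PyTorch rather than the paper’s actual architecture: a small network that fuses image features with a text embedding of a skill and outputs a success score. The dimensions and layers are made-up assumptions for illustration.

```python
import torch
import torch.nn as nn

class LanguageConditionedValue(nn.Module):
    """Toy value network: given image features and a text embedding of a
    skill (e.g. "pick up the coke can"), output a score in [0, 1].
    Architecture and dimensions are illustrative assumptions."""

    def __init__(self, img_dim: int = 512, text_dim: int = 256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # squash to [0, 1], read as a success probability
        )

    def forward(self, img_feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Fuse what the robot sees with what it is asked to do.
        return self.head(torch.cat([img_feats, text_emb], dim=-1))

value_fn = LanguageConditionedValue()
score = value_fn(torch.randn(1, 512), torch.randn(1, 256))  # e.g. tensor([[0.48]])
```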
PaLM-SayCan also learns so-called “affordance functions,” which score how likely a given action is to succeed given the objects in the robot’s environment. If the robot sees a sponge but no vacuum nearby, it rates picking up the sponge as more likely to succeed than picking up the vacuum.
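In spirit, an affordance function asks “could this action succeed from here?” The toy function below is a crude stand-in for the learned value functions, capturing the sponge-versus-vacuum behavior by scoring a skill near zero when its target object is out of view.

```python
# Crude stand-in for a learned affordance function: a manipulation skill
# is rated feasible only if its target object is currently visible.

def affordance(skill: str, visible_objects: set[str]) -> float:
    target = skill.removeprefix("pick up the ")
    return 0.9 if target in visible_objects else 0.05

visible = {"sponge", "coke can"}
print(affordance("pick up the sponge", visible))  # 0.9  -> likely to succeed
print(affordance("pick up the vacuum", visible))  # 0.05 -> not worth trying
```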
“Our method, SayCan, extracts and leverages the knowledge within LLMs in physically-grounded tasks. The LLM (Say) provides a task-grounding to determine useful actions for a high-level goal and the learned affordance functions (Can) provide a world-grounding to determine what is possible to execute upon the plan. We use reinforcement learning (RL) as a way to learn language conditioned value functions that provide affordances of what is possible in the world,” the researchers explained.
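Read as pseudocode, the quoted method reduces to one decision rule: execute the skill that maximizes the LLM’s “Say” score times the affordance’s “Can” score. A minimal sketch, with stubs standing in for PaLM and the learned value functions:

```python
# SayCan's core decision rule, per the description above: combine the
# "Say" score (is this step useful for the instruction?) with the
# "Can" score (is it feasible right now?). Both scorers are stubs.

SKILLS = ["pick up the coke can", "pick up the sponge", "find a vacuum"]

def say_score(skill: str, instruction: str) -> float:
    """Stub for the LLM's likelihood of `skill` as the next step."""
    canned = {"pick up the coke can": 0.6, "pick up the sponge": 0.3}
    return canned.get(skill, 0.01)

def can_score(skill: str, visible_objects: set[str]) -> float:
    """Stub for the learned affordance (value) function."""
    return 0.9 if any(obj in skill for obj in visible_objects) else 0.05

def select_skill(instruction: str, visible_objects: set[str]) -> str:
    return max(SKILLS, key=lambda s: say_score(s, instruction)
                                     * can_score(s, visible_objects))

print(select_skill("clean up the spilled coke", {"coke can", "sponge"}))
# -> "pick up the coke can"
```

Multiplying the two scores means a step must be both useful and feasible; either factor near zero vetoes it.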
To keep it on track, the robot is only allowed to choose actions from a fixed set of 101 possible instructions. PaLM-SayCan, which Google trained to operate in a kitchen, can fetch snacks and beverages and carry out basic cleaning duties. According to the researchers, LLMs are the first step toward enabling robots to safely carry out more complex tasks from abstract instructions.
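Putting the pieces together, the full loop plausibly looks something like the sketch below: repeatedly pick the best-scoring skill from the fixed library, execute it, and record it so the next step is chosen in context, stopping at a terminal “done” skill or a step budget. Perception, execution, and the skill-selection policy are all stubs here, not Google’s implementation.

```python
# Hypothetical closed-loop planner over a fixed skill library (101 skills
# in Google's setup; a handful here). All three helpers are stubs, and
# pick_next() stands in for the Say*Can scoring shown earlier.

SKILL_LIBRARY = ["pick up the coke can", "put the coke can in the trash",
                 "pick up the sponge", "done"]

def observe() -> set[str]:
    return {"coke can", "sponge"}       # stub perception

def execute(skill: str) -> None:
    print("executing:", skill)          # stub robot dispatch

def pick_next(instruction: str, history: list[str], visible: set[str]) -> str:
    # Stub policy: take each untried skill in order, then finish.
    for skill in SKILL_LIBRARY:
        if skill not in history:
            return skill
    return "done"

def run(instruction: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        skill = pick_next(instruction, history, observe())
        if skill == "done":
            break
        execute(skill)
        history.append(skill)           # context for scoring the next step
    return history

run("I spilled my coke, throw it away and bring me something to clean")
```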
“Our experiments on a number of real-world robotic tasks demonstrate the ability to plan and complete long-horizon, abstract, natural language instructions at a high success rate. We believe that PaLM-SayCan’s interpretability allows for safe real-world user interaction with robots,” they concluded.