| Papers | Paper URL | Abstract |
| --- | --- | --- |
| A Generalist Agent | URL | Gato works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. |
| When Should We Prefer Offline Reinforcement Learning over Behavioral Cloning? (ICLR 2022) | URL | It is natural to ask: when can an offline RL method outperform behavioral cloning (BC) given an equal amount of expert data, even when BC is a natural choice? |
| Uni[MASK]: Unified Inference in Sequential Decision Problems | URL | Shows how sequential decision-making tasks can be viewed as corresponding input maskings, enabling a single model to be trained to perform all tasks at once. This view applies naturally to sequential decision making, where many well-studied tasks such as behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different maskings over a sequence of states, actions, and returns. |
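The Uni[MASK] entry frames each task as a masking pattern over per-timestep return, state, and action tokens. The sketch below illustrates that idea under stated assumptions: the (return, state, action) token order, the helper names `make_mask` and `render`, and the exact visibility rule for each task are hypothetical choices for illustration, not the paper's implementation. In the rendered output, `1` marks a token the model sees and `0` marks a hidden token; in every task here the model is asked to predict the action `a_t`.

```python
from typing import Dict, List, Optional

# Hypothetical token order per timestep: return-to-go, state, action.
TOKENS = ("return", "state", "action")


def make_mask(task: str, horizon: int, t: int,
              waypoint: Optional[int] = None) -> List[Dict[str, bool]]:
    """Return one dict per timestep mapping token type -> visible (True) or hidden (False).

    Illustrative sketch only: the visibility rules below are assumptions,
    chosen so that each task predicts the hidden action a_t.
    """
    if not 0 <= t < horizon:
        raise ValueError("t must lie in [0, horizon)")
    mask = [{k: False for k in TOKENS} for _ in range(horizon)]

    if task in ("behavior_cloning", "return_conditioned", "waypoint"):
        for i in range(t + 1):
            mask[i]["state"] = True          # past and current states are observed
        for i in range(t):
            mask[i]["action"] = True         # past actions are observed; a_t stays hidden
        if task == "return_conditioned":     # assumed stand-in for offline RL
            for i in range(t + 1):
                mask[i]["return"] = True     # condition on return-to-go up to step t
        if task == "waypoint":
            if waypoint is None or not t < waypoint < horizon:
                raise ValueError("waypoint must be a future timestep")
            mask[waypoint]["state"] = True   # reveal a future goal state
    elif task == "inverse_dynamics":
        if t + 1 >= horizon:
            raise ValueError("inverse dynamics needs s_{t+1}")
        mask[t]["state"] = True
        mask[t + 1]["state"] = True          # infer a_t from (s_t, s_{t+1})
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def render(mask: List[Dict[str, bool]]) -> str:
    """Encode each timestep as three digits (return, state, action): 1 = visible."""
    return " ".join("".join("1" if step[k] else "0" for k in TOKENS) for step in mask)


if __name__ == "__main__":
    print(render(make_mask("behavior_cloning", 4, 2)))        # 011 011 010 000
    print(render(make_mask("return_conditioned", 4, 2)))      # 111 111 110 000
    print(render(make_mask("inverse_dynamics", 4, 1)))        # 000 010 010 000
    print(render(make_mask("waypoint", 4, 1, waypoint=3)))    # 011 010 000 010
```

Because every task reduces to a different boolean pattern over the same token grid, one sequence model that accepts the mask as input can, in principle, be trained on randomly sampled masks and queried with whichever pattern a downstream task needs.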