We regularly publish our research results, tutorials, and opinion pieces.

A simple blockstacking task

Thinking, Fast and Slow, with LLMs and PDDL

June 10, 2024

ChatGPT is never shy about pretending to perform deep thought, but, like our brain, it might need additional tools to reason accurately. “ChatGPT can make mistakes. Check important info.” is now written right underneath the prompt, and we have all gotten used to the fact that ChatGPT stoically makes up...

CLIP Overview

Building CLIP From Scratch

May 16, 2024

by Matt Nguyen. Open World Object Recognition on the Clothing MNIST Dataset. Computer vision systems were historically limited to a fixed set of classes; CLIP has been a revolution, allowing open-world object recognition by “predicting which image and text pairings go together.” CLIP is able to predict this by...
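The core idea — scoring image-text pairs by the similarity of their embeddings so that matched pairs score highest — can be sketched in a few lines. The random-projection "encoder" below is a stand-in for illustration only; real CLIP uses separate image and text transformers trained contrastively.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, dim=8):
    # Stand-in encoder: a fixed random projection followed by L2 normalization.
    # (Illustrative only; not the real CLIP encoders.)
    proj = np.random.default_rng(42).standard_normal((x.shape[-1], dim))
    z = x @ proj
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Toy "images" and "texts" as raw feature vectors, three matched pairs:
# each text vector is a slightly noisy copy of its image vector.
images = rng.standard_normal((3, 16))
texts = images + 0.05 * rng.standard_normal((3, 16))

# CLIP-style similarity matrix: entry (i, j) scores image i against text j.
sim = embed(images) @ embed(texts).T
pred = sim.argmax(axis=1)  # best-matching text for each image
print(pred)
```

Because each toy text vector is a near-copy of its image vector, the highest similarities land on the diagonal and `pred` recovers the matched pairing.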

Example of pictures fitting into the Confusion Matrix

Is Open World Vision in Robotic Manipulation Useful?

May 14, 2024

by Uri Soltz. Google’s Open World Localization Visual Transformer (OWL-ViT), in combination with Meta’s “Segment Anything,” has emerged as the go-to pipeline for zero-shot object recognition (none of the objects were used in training the classifier) in robotic manipulation. Yet, OWL-ViT has been trained on static images...

MAGPIE gripper and its dependencies

MAGPIE: An Open-Source Force Control Gripper With 3D Perception

May 14, 2024

by Streck Salmon. There are a myriad of robotic arms, but very few choices when it comes to robotic grippers, particularly those with built-in force control and perception. This article explores the outer and inner workings of the MAGPIE gripper, an intelligent robotic object manipulator developed at the Correll...

Padding step in the ViT (Matt Nguyen)

Building a Vision Transformer Model From Scratch

April 4, 2024

by Matt Nguyen. The self-attention-based transformer model was first introduced by Vaswani et al. in their 2017 paper Attention Is All You Need and has been widely used in natural language processing. A transformer model is what is used by OpenAI to...
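The scaled dot-product self-attention at the heart of that architecture can be sketched in a few lines. This is a minimal illustration of the operation from Attention Is All You Need, not the article's implementation; the token count and dimensions are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))   # 4 tokens (or image patches), 8 features each
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # one updated 8-dim vector per token
```

In a Vision Transformer the same operation is applied to a sequence of flattened image patches instead of word tokens.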

A humanoid performing assembly. Image by the author via miramuseai.net.

The Future of Robotic Assembly

March 28, 2024

Despite the introduction of mass production in 1913, assembly lines are still mostly human; humanoids might change this. Henry Ford is known as the father of mass production, streamlining the production of his “Model T” and making cars widely affordable. One of the key innovations at the time...

Deligrasp overview

Grasping With Common Sense using VLMs and LLMs

March 10, 2024

How to leverage large language models for robotic grasping and code generation. Grasping and manipulation remain hard, unsolved problems in robotics. Grasping is not just about identifying points where to place your fingers on an object to create sufficient constraints. Grasping is also about applying just enough force to...

A humanoid cleaning up (its own?) mess while preparing a meal. The humanoid form factor holds tremendous promise for seamless integration into existing value creation processes. Image: author via miramuseai.net

Are the Humanoids Here to Stay?

March 1, 2024

Humanoids might finally solve the “brownfield” problem that plagues robot adoption, and recent breakthroughs in multi-modal transformers and diffusion models might actually make it happen. Not a week goes by without a flurry of humanoid companies releasing a new update. Optimus can walk? Digit has just moved an empty tote?...

Left: Performance of the “CLIP” model on accurately providing labels for images, dramatically outperforming previous work. Image from https://arxiv.org/pdf/2103.00020.pdf. Right: Summarizing a model’s performance by a single number is only one piece of information. Once this information is actually used to make a decision, we will also need to understand the different ways the model can fail. Image: own work.

Reasoning About Uncertainty using Markov Chains

February 26, 2024

Formal methods to tackle “trial-and-error” problems. The ability to deal with unseen objects in a zero-shot manner makes machine learning models very attractive for applications in robotics, allowing robots to enter previously unseen environments and manipulate unknown objects therein. While their accuracy in doing so is incredible compared with...
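A trial-and-error task such as repeated grasp attempts can be modeled as a Markov chain with absorbing states. The transition probabilities below are illustrative numbers, not values from the article; the point is the mechanics of propagating uncertainty through the chain.

```python
import numpy as np

# States: 0 = "trying", 1 = "succeeded" (absorbing), 2 = "dropped" (absorbing).
# Per-attempt outcome probabilities (made up for illustration):
p_success, p_drop = 0.6, 0.1
p_retry = 1.0 - p_success - p_drop

T = np.array([
    [p_retry, p_success, p_drop],  # from "trying"
    [0.0,     1.0,       0.0   ],  # success is absorbing
    [0.0,     0.0,       1.0   ],  # drop is absorbing
])

# Distribution over states after 5 attempts, starting in "trying".
state = np.array([1.0, 0.0, 0.0])
for _ in range(5):
    state = state @ T
print(state)  # [P(still trying), P(succeeded), P(dropped)]

# With a single transient state, the long-run absorption probability
# of success follows from the geometric series over retries:
p_abs_success = p_success / (1.0 - p_retry)
print(p_abs_success)
```

Here the chain predicts not just whether the policy eventually succeeds, but how that probability splits between success and irrecoverable failure after any fixed number of attempts.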

Principal Component Analysis of a random 2D point cloud using PyTorch’s built-in function. Image by the author.

Understanding Principal Component Analysis in PyTorch

February 18, 2024

Built-in function vs. numerical methods. PCA is an important tool for dimensionality reduction in data science and for computing grasp poses for robotic manipulation from point cloud data. PCA can also be used directly within a larger machine learning framework, as it is differentiable. Using the two principal components of a...
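A minimal sketch of the built-in function on a synthetic 2D point cloud (the data and dimensions here are illustrative, not the article's figure): `torch.pca_lowrank` returns the principal directions as the columns of `V`.

```python
import torch

torch.manual_seed(0)

# A random 2D point cloud, stretched along the x-axis so the first
# principal component has a known direction.
pts = torch.randn(200, 2) * torch.tensor([3.0, 0.5])

# PyTorch's built-in PCA: centers the data and returns U, S, V,
# with principal directions in the columns of V.
U, S, V = torch.pca_lowrank(pts, q=2, center=True)
pc1 = V[:, 0]  # first principal component (unit vector, up to sign)
print(pc1)

# Numerical cross-check: PCA via SVD of the manually centered data.
centered = pts - pts.mean(dim=0)
_, _, Vh = torch.linalg.svd(centered, full_matrices=False)
print(torch.allclose(pc1.abs(), Vh[0].abs(), atol=1e-3))
```

Because the cloud is stretched along x, the first component aligns (up to sign) with the x-axis, and the randomized built-in agrees with the exact SVD on this tiny problem.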