Blog

DALL-E 2 from Scratch
/lab/correll/2024/09/16/dall-e-2-scratch
Posted September 16, 2024 by Nicolaus J Correll
[Image: DALL-E from scratch generating clothing-MNIST-like data]

Text-Conditioned Image Generation on FashionMNIST using CLIP Latents

by Matthew Nguyen

Denoising diffusion probabilistic models (DDPMs) are a popular type of generative AI model, introduced by Ho et al. in 2020 (https://arxiv.org/pdf/2006.11239) and improved upon by Nichol et al. in 2021 (https://arxiv.org/pdf/2102.09672). The basic idea behind these models is that noise is added to images in the forward diffusion process in order to train the model to predict the noise that should be removed at a given timestep in the reverse diffusion process. When sampling images, you start with an image containing pure noise and iteratively remove the model's predicted noise at each timestep until you get the final image.

In order to have a DDPM generate multiple types of images while still letting the user choose which type of image they want, the model needs to be conditioned on some input. Ramesh et al. (https://cdn.openai.com/papers/dall-e-2.pdf) introduced one such conditioning method, called unCLIP, which is used in OpenAI's DALL-E 2 model. In the method described by Ramesh et al., the input caption is first passed to a prior network, which uses a trained CLIP model to get the CLIP text embeddings. These text embeddings are then used by a decoder-only transformer to generate possible CLIP image embeddings. The CLIP image embeddings generated by the prior network are then used by a decoder network, which consists of a UNet model, to condition the images that are created.
In this article, we are going to build a simple diffusion model using this process.

Keep reading on Correll Lab on Medium for free: https://medium.com/correll-lab/dall-e-2-from-scratch-c055bf881b9a
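To make the forward and reverse processes described above concrete, here is a minimal, self-contained PyTorch sketch of the two core operations. It is not the article's implementation: the noise schedule follows Ho et al., and model stands in for a conditional UNet whose call signature (noisy image, timestep, conditioning such as a CLIP image embedding) is an assumption made for illustration.

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule from Ho et al.
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative products used for closed-form noising

def q_sample(x0, t, noise):
    """Forward diffusion: jump from a clean image x0 directly to the noisy image x_t."""
    a = alpha_bar[t].sqrt().view(-1, 1, 1, 1)
    b = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + b * noise               # during training, the model learns to predict `noise`

@torch.no_grad()
def p_sample_step(model, x_t, t, cond):
    """One reverse-diffusion step: remove the noise the model predicts at timestep t."""
    eps = model(x_t, torch.full((x_t.size(0),), t), cond)   # hypothetical conditional UNet call
    mean = (x_t - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                          # final step returns the denoised image
    return mean + betas[t].sqrt() * torch.randn_like(x_t)   # otherwise add fresh sampling noise

Sampling then simply loops p_sample_step from t = T-1 down to 0, starting from pure Gaussian noise.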
Testing the Field Capabilities of the Unitree Go-1
/lab/correll/2024/07/05/testing-field-capabilities-unitree-go-1
Posted July 5, 2024
[Image: Unitree robotic dog at Rocky Mountain Biological Lab]

Promotional videos are great, but what is the real deal when taking a robotic dog to the field?

Our goal is to find out what a commodity robotic dog can add to a field scientist's set of tools, what it can actually accomplish in the field, and what fundamental research in robotics is needed to enable such robots.

Here are our key findings from a first deployment:

1. The Unitree Go-1 is able to navigate surprisingly rugged terrain.
2. The robot does fail. Its legs can entangle with the stems of forbs and shrubs, and the robot can easily slip even on flat (!) terrain.
3. If the robot fails, it often cannot recover by itself, but needs to be manually disentangled and rebooted.
4. The robot itself is absolutely not rugged and is susceptible to dust and morning dew, requiring additional engineering for field applications.

Continue reading on Towards Data Science: https://medium.com/towards-data-science/testing-the-field-capabilities-of-the-unitree-go-1-de665ae6ef05

Thinking, Fast and Slow, with LLMs and PDDL
/lab/correll/2024/06/10/thinking-fast-and-slow-llms-and-pddl
Posted June 10, 2024
[Image: A simple block-stacking task]

ChatGPT is never shy about pretending to perform deep thought, but — like our brain — it might need additional tools to reason accurately

"ChatGPT can make mistakes. Check important info." is now written right underneath the prompt, and we have all gotten used to the fact that ChatGPT stoically makes up anything from dates to entire references. But what about basic reasoning? Looking at a simple tower-rearranging task from the early days of Artificial Intelligence (AI) research, we will show how large language models (LLMs) reach their limitations, and introduce the Planning Domain Definition Language (PDDL) and symbolic solvers to make up for it. Given that LLMs are fundamentally probabilistic, it is likely that such tools will be built into future versions of AI agents, combining common-sense knowledge with razor-sharp reasoning.

To get the most out of this article, set up your own PDDL environment using VS Code's PDDL extension (https://marketplace.visualstudio.com/items?itemName=jan-dolejsi.pddl) and the planutils planner interface (https://github.com/AI-Planning/planutils), and work along with the examples.

Continue reading on Towards Data Science: https://medium.com/towards-data-science/thinking-fast-and-slow-with-llms-and-pddl-111699f9907e?sk=8792c884cc6498579bdd1cca6c5e00cb
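If you want a file pair to start from while working along, the sketch below writes out a textbook STRIPS blocksworld domain and a small tower-rearranging problem in PDDL. These are generic teaching files of my own, not the exact domain and problem used in the article.

from pathlib import Path

DOMAIN = """
(define (domain blocksworld)
  (:requirements :strips)
  (:predicates (on ?x ?y) (ontable ?x) (clear ?x) (handempty) (holding ?x))
  (:action pick-up
    :parameters (?x)
    :precondition (and (clear ?x) (ontable ?x) (handempty))
    :effect (and (holding ?x) (not (ontable ?x)) (not (clear ?x)) (not (handempty))))
  (:action put-down
    :parameters (?x)
    :precondition (holding ?x)
    :effect (and (ontable ?x) (clear ?x) (handempty) (not (holding ?x))))
  (:action stack
    :parameters (?x ?y)
    :precondition (and (holding ?x) (clear ?y))
    :effect (and (on ?x ?y) (clear ?x) (handempty) (not (holding ?x)) (not (clear ?y))))
  (:action unstack
    :parameters (?x ?y)
    :precondition (and (on ?x ?y) (clear ?x) (handempty))
    :effect (and (holding ?x) (clear ?y) (not (on ?x ?y)) (not (clear ?x)) (not (handempty)))))
"""

# Start: block B sits on A, C is on the table.  Goal: the tower A-on-B-on-C.
PROBLEM = """
(define (problem rearrange-tower)
  (:domain blocksworld)
  (:objects a b c)
  (:init (ontable a) (on b a) (clear b) (ontable c) (clear c) (handempty))
  (:goal (and (on a b) (on b c))))
"""

Path("domain.pddl").write_text(DOMAIN)
Path("problem.pddl").write_text(PROBLEM)
# Solve domain.pddl / problem.pddl with any classical planner, for instance one
# installed through planutils, or open them with the VS Code PDDL extension linked above.
print("wrote domain.pddl and problem.pddl")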
Building CLIP From Scratch
/lab/correll/2024/05/16/building-clip-scratch
Posted May 16, 2024
[Image: CLIP overview, from the original CLIP paper]

by Matt Nguyen

Open World Object Recognition on the Clothing MNIST Dataset

Computer vision systems were historically limited to a fixed set of classes. CLIP has been a revolution, allowing open-world object recognition by "predicting which image and text pairings go together". CLIP learns this by maximizing the cosine similarity between matching image and text features over batches of training data. This is shown in the contrastive pre-training portion of Figure 1, where the dot product between the image features {I_1 ... I_N} and the text features {T_1 ... T_N} is taken.

In this tutorial, we are going to build CLIP from scratch and test it on the FashionMNIST dataset. Some of the sections in this article are taken from my vision transformers article (https://medium.com/correll-lab/building-a-vision-transformer-model-from-scratch-a3054f707cc6).

A notebook with the code from this tutorial can be found here: https://colab.research.google.com/drive/1E4sEg7RM8HBv4PkIhjWZuwCXXbB_MinS?usp=sharing

Continue reading on Correll Lab: https://medium.com/correll-lab/building-clip-from-scratch-68f6e42d35f4
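To make "contrastive pre-training" concrete, here is a minimal PyTorch sketch of the symmetric CLIP loss. It is not the tutorial's code: the random tensors merely stand in for encoder outputs, and the temperature value is just a common default.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, temperature=0.07):
    # Normalizing makes the dot product equal to the cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # N x N similarity matrix between every image and every caption in the batch.
    logits = image_features @ text_features.t() / temperature
    # Matching pairs sit on the diagonal, so the "correct class" for row i is i.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_images = F.cross_entropy(logits, targets)      # image -> text direction
    loss_texts = F.cross_entropy(logits.t(), targets)   # text -> image direction
    return (loss_images + loss_texts) / 2

# Random stand-ins for a batch of 8 image and 8 text embeddings of size 512.
print(clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)))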
Is Open World Vision in Robotic Manipulation Useful?
/lab/correll/2024/05/14/open-world-vision-robotic-manipulation-useful
Posted May 14, 2024
[Image: Example pictures sorted into the confusion matrix]

by Uri Soltz

Google's Open World Localization Vision Transformer (OWL-ViT) in combination with Meta's "Segment Anything" has emerged as the go-to pipeline for zero-shot object recognition — none of the objects have been used in training the classifier — in robotic manipulation. Yet OWL-ViT has been trained on static images from the internet and has limited fidelity in a manipulation context. OWL-ViT returns a non-negligible confusion matrix, and we show that processing the same view from different distances significantly increases performance. Still, OWL-ViT works better for some objects than for others and is thus inconsistent.

Our experimental setup is described in "Exploring MAGPIE: A Force Control Gripper w/ 3D Perception" by Streck Salmon: https://medium.com/@streck0101/exploring-magpie-the-next-generation-of-low-cost-robotic-grippers-dd21e4e4f3b2

Read the full article on Correll Lab: https://medium.com/correll-lab/is-open-world-vision-in-robotic-manipulation-useful-6b7389499dc9
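For readers who want to try this kind of zero-shot query themselves, the sketch below uses the Hugging Face transformers port of OWL-ViT. This is my assumption of a reasonable setup, not the exact pipeline, checkpoint, or thresholds used in the post, and the image file name and label list are hypothetical.

import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("tabletop_scene.jpg")          # hypothetical photo of a manipulation scene
queries = [["a screwdriver", "a coffee mug", "a wooden block"]]  # free-text labels, no retraining

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits and boxes into detections above a confidence threshold.
target_sizes = torch.tensor([image.size[::-1]])   # (height, width) of the original image
detections = processor.post_process_object_detection(
    outputs, threshold=0.2, target_sizes=target_sizes)[0]

for box, score, label in zip(detections["boxes"], detections["scores"], detections["labels"]):
    print(f"{queries[0][label.item()]}: {score:.2f} at {[round(v, 1) for v in box.tolist()]}")

Running the same query on crops taken at different virtual distances and keeping the most confident detection is one simple way to exploit the distance effect reported above.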
MAGPIE: An Open-Source Force Control Gripper With 3D Perception
/lab/correll/2024/05/14/magpie-open-source-force-control-gripper-3d-perception
Posted May 14, 2024
[Image: The MAGPIE gripper and its dependencies]

by Streck Salmon

There are a myriad of robotic arms, but very few choices when it comes to robotic grippers, particularly those with built-in force control and perception. This article explores the outer and inner workings of the MAGPIE gripper (/lab/correll/2023/07/10/versatile-robotic-hand-3d-perception-force-sensing-autonomous-manipulation), an intelligent robotic object manipulator developed at the Correll Lab (/lab/correll/) at the University of Colorado, Boulder. The gripper's hardware design was created by Stephen Otto during his Master's thesis, and the software for planning, perception (utilizing the RealSense with Open3D), and interfacing with the UR5 was developed by Dylan Kriegman as part of his senior thesis. Alongside this, James Watson also made significant contributions to the perception and planning software.

The original paper, published by Correll, Otto, Kriegman, and Watson, can be found here: https://arxiv.org/abs/2402.06018

Read the full article on Correll Lab: https://medium.com/correll-lab/magpie-an-open-source-force-control-gripper-with-3d-perception-dd21e4e4f3b2
Building a Vision Transformer Model From Scratch
/lab/correll/2024/04/04/building-vision-transformer-model-scratch
Posted April 4, 2024
[Image: Padding step in the ViT (Matt Nguyen)]

by Matt Nguyen

The self-attention-based transformer model was first introduced by Vaswani et al. in their 2017 paper "Attention Is All You Need" (https://arxiv.org/pdf/1706.03762.pdf) and has been widely used in natural language processing. A transformer model is what OpenAI uses to create ChatGPT. Transformers work not only on text, but also on images and essentially any sequential data. In 2021, Dosovitskiy et al. introduced the idea of using transformers for computer vision tasks such as image classification in their paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (https://arxiv.org/pdf/2010.11929.pdf). They were able to achieve excellent results with their vision transformer model compared to convolutional networks, while requiring far fewer resources to train.

In this tutorial, we are going to build a vision transformer model from scratch and test it on the MNIST dataset, a collection of handwritten digits that has become a standard benchmark in machine learning.

A notebook with the code from this tutorial can be found here: https://colab.research.google.com/drive/1rabTm93y39FNbu-21tDhlvYh2gp8edVR?usp=sharing

Read the full article on Correll Lab: https://medium.com/correll-lab/building-a-vision-transformer-model-from-scratch-a3054f707cc6
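As a taste of the model's first step, here is a minimal patch-embedding sketch in PyTorch. It is a generic version of my own, sized for 28x28 MNIST images with 7x7 patches, not the exact module from the tutorial (which, for instance, also includes a padding step).

import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each one to a d_model-dim token."""
    def __init__(self, img_size=28, patch_size=7, in_channels=1, d_model=64):
        super().__init__()
        self.n_patches = (img_size // patch_size) ** 2
        # A strided convolution is equivalent to flattening each patch and applying
        # one shared linear projection ("16x16 words" in the original ViT).
        self.proj = nn.Conv2d(in_channels, d_model, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.n_patches + 1, d_model))

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, d_model, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)       # (B, n_patches, d_model)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)         # prepend the classification token
        return x + self.pos_embed              # add learned positional embeddings

tokens = PatchEmbedding()(torch.randn(4, 1, 28, 28))
print(tokens.shape)                            # torch.Size([4, 17, 64])

The resulting token sequence is what the transformer encoder blocks then attend over.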
The Future of Robotic Assembly
/lab/correll/2024/03/28/future-robotic-assembly
Posted March 28, 2024
[Image: A humanoid performing assembly. Image by the author via miramuseai.net]

Since the introduction of mass production in 1913, assembly lines are still mostly human — humanoids might change this

Henry Ford is known as the father of mass production, streamlining the production of his "Model T" and enabling cars to become widely affordable. One of the key innovations at the time was to use a conveyor belt in the assembly line that paced the production process. Yet actual labor was mostly manual and still is today, as can be seen, for example, in engine assembly at BMW in 2024.

[Image: Mass production at Henry Ford's Model T factory in 1913 (left, public domain) and engine assembly at BMW in 2024 (right, picture from https://www.bmwgroup-werke.com/steyr/en/highlight/engine-assembly.html)]

Pacing an assembly line by what is known by the German word "Takt", or cycle time, is indeed a key idea for making an assembly process predictable. The throughput of a factory is directly related to its Takt, which in turn is driven by the slowest contributor, and directly relates to the sojourn time of an order in the assembly line.

In a human-driven environment, people might eventually adapt to the cycle time of the processes around them, which is beautifully captured in the figure below, showing the acquisition of speed skill in a cigar manufacturing factory.

Continue reading on Towards Data Science: https://towardsdatascience.com/the-future-of-robotic-assembly-ce3446703de8?source=friends_link&sk=e5d2b10f877383d1146a1d03689b3928
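A back-of-the-envelope sketch of that relationship, using made-up station times: on a fully paced line the slowest station sets the cycle time, the cycle time sets the throughput, and the sojourn time of an order grows with the number of stations.

# Hypothetical processing times per station, in seconds.
station_times_s = [52, 61, 48, 57]
cycle_time_s = max(station_times_s)            # the slowest station paces the whole line
throughput_per_hour = 3600 / cycle_time_s      # one finished unit leaves the line every cycle
sojourn_time_s = cycle_time_s * len(station_times_s)   # paced line, no buffers: each station waits for the Takt

print(f"cycle time: {cycle_time_s} s, throughput: {throughput_per_hour:.1f} units/h, "
      f"sojourn time: {sojourn_time_s / 60:.1f} min")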
Grasping With Common Sense using VLMs and LLMs
/lab/correll/2024/03/10/grasping-common-sense-using-vlms-and-llms
Posted March 10, 2024
[Image: DeliGrasp overview]

How to leverage large language models for robotic grasping and code generation

Grasping and manipulation remain a hard, unsolved problem in robotics. Grasping is not just about identifying points where to put your fingers on an object to create sufficient constraints. Grasping is also about applying just enough force to pick up the object without breaking it, while making sure it can be put to its intended use. At the same time, grasping provides critical sensor input to detect what an object is and what its properties are.

With mobility essentially solved, grasping and manipulation remain the final frontier in unlocking truly autonomous labor replacements.

Continue reading on Towards Data Science: https://towardsdatascience.com/grasping-with-common-sense-bfe21743c02d?source=friends_link&sk=460bc3168cc4f5f2395b19b9a1d1c6bc
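As a toy illustration of "just enough force" (a first-order friction model of my own, not the method from the article): a two-finger grasp has to squeeze hard enough that friction carries the object's weight, and an LLM's common-sense guesses for mass and friction coefficient are exactly the numbers such a calculation needs.

def required_grip_force(mass_kg, friction_coeff, safety_factor=1.5, g=9.81):
    """Normal force per finger for a two-finger antipodal grasp to hold an object against gravity,
    including a safety margin."""
    # Each of the two contacts contributes friction_coeff * F_normal of supporting tangential force.
    return safety_factor * mass_kg * g / (2 * friction_coeff)

# e.g. a 150 g paper cup with an estimated friction coefficient of 0.4
print(f"squeeze with about {required_grip_force(0.150, 0.4):.2f} N per finger")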
Are the Humanoids Here to Stay?
/lab/correll/2024/03/01/are-humanoids-here-stay
Posted March 1, 2024
[Image: A humanoid cleaning up (its own?) mess while preparing a meal. The humanoid form factor holds tremendous promise for seamless integration into existing value creation processes. Image: author via miramuseai.net]

Humanoids might finally solve the "brownfield" problem that plagues robotic adoption, and recent breakthroughs in multi-modal transformers and diffusion models might actually make it happen.

Not a week goes by without a flurry of humanoid companies releasing a new update. Optimus can walk? Digit has just moved an empty tote? So has Figure! It also seems that real companies are finally getting interested. Starting with Tesla, humanoids are now "working" at Amazon and BMW, from which it is only a short way to our households and gardens. But are they really working? The demos we get to see are neither as exciting as Boston Dynamics' Atlas doing parkour, nor do humanoids seem to be very productive. So is the market rightfully excited, and are humanoids up to something?
I'm excited about humanoids for two reasons:

1) Humanoids might finally solve the "brownfield" problem, the main reason so many robot solutions burn in pilot purgatory.

2) Machine learning has made a huge leap in 2023, with computers exhibiting reasoning skills that — for the first time — allow them to operate in open-world settings and perform contact-rich manipulation.

Continue reading on Towards Data Science: https://towardsdatascience.com/are-the-humanoids-here-to-stay-050da171530b?source=friends_link&sk=abfc9adb87dfd585431b280b8beabbd5