Mixed-precision Block QR Decomposition with CUDA

University of Washington ECE Master's Project with Amazon Lab126

This was my final project for my Master’s in Electrical Engineering at the University of Washington, completed in 2024. I worked with a group of six graduate students to implement a parallel QR decomposition algorithm using CUDA. The project was a collaboration with Amazon Lab126 on Simultaneous Localization and Mapping (SLAM) for robotics. QR decomposition can be used to solve the least-squares optimization problems that estimate robot pose, a key part of SLAM algorithms.
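To make the least-squares connection concrete, here is a minimal CPU sketch in plain Python. It is not the project's CUDA code, and it is neither blocked nor mixed-precision. It factors A = QR with modified Gram-Schmidt, then solves R x = Qᵀb by back substitution, which minimizes ‖Ax − b‖. The function names and the toy line-fitting data are assumptions made up for illustration.

```python
import math

def qr_gram_schmidt(A):
    """Thin QR of an m x n matrix (list of rows) via modified Gram-Schmidt.

    Returns Q as a list of n orthonormal columns and R as an n x n
    upper-triangular matrix. Assumes A has full column rank.
    """
    m, n = len(A), len(A[0])
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from column j of A and remove its components along
        # the orthonormal columns found so far.
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R

def solve_least_squares(A, b):
    """Minimize ||A x - b|| by solving R x = Q^T b."""
    Q, R = qr_gram_schmidt(A)
    n = len(R)
    c = [sum(q[i] * b[i] for i in range(len(b))) for q in Q]  # c = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution on upper-triangular R
        s = c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x

# Toy problem (hypothetical data): fit y = x0 + x1 * t to points on y = 1 + 2t.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
x = solve_least_squares(A, b)
print(x)  # close to [1.0, 2.0], up to rounding
```

In a SLAM setting the unknowns would be pose parameters rather than line coefficients, and the matrices would be far larger, which is where a parallel GPU factorization becomes worthwhile.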

We developed our algorithm targeting a PC running Ubuntu with a GeForce RTX 2070 GPU. The algorithm was written “from scratch,” without relying on BLAS-style linear-algebra libraries (e.g. CUTLASS). Here’s a clip from one of our presentations, where I explain the mathematics.
