Fiducial Markers for Physical Storage Tracking
A while ago, I saw a vlog (I can no longer find the source) about a maker who stored all of his projects and stock of parts in bins over his long workbench. All of the bins were labeled with some sort of code, and a security camera in his office would periodically scan for codes in the image and keep track of the location of each bin. With this system, he could put any bin in any empty spot on the shelf and easily find it later, or find that it was not on the shelf, from the comfort of his desk computer. I set out to recreate this for my own makerspace.
I tried a few different approaches involving QR codes and fiducial markers, and at this point I’ve settled on AprilTags based on testing I’ve performed (see this blog post for the testing). This project page will catalogue all of the work I’ve done to make this a reality.
Initial Missteps with QR and 2D Barcodes
My naive approach was to try to encode the name of the box in a QR code, then have the camera read the code and publish the location of the box to MQTT. From there, I could centrally process all of the MQTT topics from all of the cameras (gotta think about scalability), and make a simple inventory of each shelf and a reverse inventory of each box, listing the shelf and position. If I could encode the name of the box (or something short enough for me to encode and long enough to recognize), I wouldn’t need any sort of database to correlate box IDs to metadata.
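To make the inventory and reverse inventory idea concrete, here is a minimal sketch of the central processing step. The topic layout `shelves/<shelf>/<slot>`, the empty payload meaning "slot is empty", and the box names are all assumptions made up for illustration; the MQTT client itself is left out, so the function just takes `(topic, payload)` pairs.

```python
from collections import defaultdict


def build_inventories(messages):
    """Build shelf -> slot -> box and box -> (shelf, slot) from (topic, payload) pairs."""
    inventory = defaultdict(dict)
    for topic, payload in messages:
        parts = topic.split("/")
        if len(parts) != 3 or parts[0] != "shelves":
            continue  # ignore topics outside the assumed layout
        _, shelf, slot = parts
        if payload:
            inventory[shelf][slot] = payload  # later messages overwrite earlier ones
        else:
            inventory[shelf].pop(slot, None)  # assumed: empty payload clears the slot
    reverse = {
        box: (shelf, slot)
        for shelf, slots in inventory.items()
        for slot, box in slots.items()
    }
    return dict(inventory), reverse


# Hypothetical messages, in the order they arrived.
messages = [
    ("shelves/bench/1", "RESISTORS"),
    ("shelves/bench/2", "ESP32-KITS"),
    ("shelves/bench/1", ""),           # the box was taken off the bench shelf
    ("shelves/wall/4", "RESISTORS"),   # and later seen on the wall shelf
]
inventory, reverse = build_inventories(messages)
print(inventory)
print(reverse)
```

Because the payload is the box name itself, no lookup table is needed, which is the whole appeal of this approach.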
I started by researching the wider space of 2D matrix barcodes. The best known is the QR code (‘Quick Response’), which is widely used to encode all kinds of information to be read primarily by mobile phones. Other codes I investigated include the MicroQR, Aztec, and Data Matrix codes. I used a generator to create codes of each type for a 10 character string, and printed them out in two sizes - 0.9" square and 2.5" square. 0.9" square happens to be the largest size I can print on my label printer, and 2.5" seemed about right for premade address label stickers, which are easy to print.
I taped this paper on the wall above my desk, and tried a few different cameras before analyzing the images using OpenCV. This particular image was taken by the same model of camera as my 3D Printer Wide Angle Camera, which was a candidate to become the makerspace inventory camera.
This approach did not go well. With all of the codes printed at the same dimensions, the number of modules (the individual black or white cells) in each code has a strong impact on readability from a given distance. The MicroQR is only 13x13, while the Aztec is 15x15 and the regular QR is 21x21. That is small as far as QR codes go (and a 10 character string is really very little data), so it only gets worse from here. I was never able to decode any of the 0.9" codes with any camera at any reasonable distance, so I gave up on printing a barcode as part of the 24mm label maker label.
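A rough way to see why the code size matters so much is to estimate how many camera pixels land on each module (one cell of the code). The sketch below uses a simple pinhole approximation and ignores lens distortion, which is significant on a wide angle camera. The camera numbers (1920 pixels wide, 120 degree horizontal field of view, 36 inches away) are assumptions for illustration, not measurements of my camera.

```python
import math


def pixels_per_module(code_size_in, modules, distance_in, hfov_deg, image_width_px):
    """Estimate camera pixels per code module for a code facing the camera."""
    # Width of the scene covered by the image at this distance (pinhole model).
    scene_width_in = 2 * distance_in * math.tan(math.radians(hfov_deg) / 2)
    px_per_inch = image_width_px / scene_width_in
    return (code_size_in / modules) * px_per_inch


# Assumed camera: 1920 px wide, 120 degree horizontal FOV, 36 inches from the wall.
for name, modules in [("MicroQR", 13), ("Aztec", 15), ("QR", 21)]:
    for size_in in (0.9, 2.5):
        ppm = pixels_per_module(size_in, modules, 36, 120, 1920)
        print(f"{name:8s} {size_in:.1f} in: {ppm:.2f} px per module")
```

Under these assumed numbers the 0.9" QR code works out to less than one pixel per module, which no decoder can recover, while the 13x13 MicroQR gets 21/13 times as many pixels per module as the QR code at the same printed size.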
With the larger images, the MicroQR seemed promising visually, but I couldn’t find any open source library to decode it, so we will never know how well it would have performed. I found websites able to decode the Aztec code, but again no open source libraries were able to decode it. The websites seemed to be able to adequately decode the ‘mid’ depth image (shown above), which is actually only about 3 feet away since the camera has such a wide angle, but without being able to integrate it into my own code the Aztec approach was another dead end.
That left the QR code that we started with. There are plenty of QR decoders available with OpenCV bindings (including OpenCV’s own built-in decoder, ZBar, and WeChat’s CNN-based decoder). Of these, ZBar was the most difficult to install (partially because the project has been abandoned for around a decade and never compiled on macOS) but very notably outperformed the other two, and it was also the only one to successfully return an array of codes when multiple codes were in view (the rest just picked the ‘best’ one). Since I really need all of the codes in view, that was very important.
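For reference, reading every code in a frame with ZBar can be sketched roughly as follows. This assumes the `pyzbar` Python wrapper and `opencv-python` are installed, which is not necessarily the binding I used in my tests; the image path and the stand-in results are hypothetical.

```python
def decode_all_codes(image_path):
    """Return (text, left_edge_px) for every code ZBar finds in one image."""
    # Third-party imports: opencv-python and pyzbar (which needs the zbar
    # shared library). Imported here so the sorting helper works without them.
    import cv2
    from pyzbar.pyzbar import decode

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(image_path)
    # decode() returns one result per code found, not just the best one.
    return [(d.data.decode("utf-8"), d.rect.left) for d in decode(image)]


def left_to_right(codes):
    """Order decoded box names by horizontal position in the frame."""
    return [text for text, left in sorted(codes, key=lambda c: c[1])]


# Stand-in for decode_all_codes("shelf.jpg"), a hypothetical image path.
codes = [("ESP32-KITS", 840), ("RESISTORS", 120), ("SERVOS", 460)]
print(left_to_right(codes))
```

Getting a list back is what makes a shelf inventory possible at all: with a decoder that only returns the best code, one frame would yield one box.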
Despite all of this success with ZBar, I was very unimpressed with how close the camera had to be to get working detection. So, I needed a new system with lower resolution tags that the camera can resolve from farther away.
Wikipedia has a good introduction on fiducial markers for those unfamiliar. Essentially, these are specifically designed patterns which encode a very small amount of information in a way that is designed for machine vision to unambiguously identify the marker. Normally they are used to identify the position and orientation of the marker to locate the image or the marker in space (given a known position of either the marker or the camera), and any coding within the marker is just to identify the marker out of a small set in case there are multiple fiducials in view at once. The advantage is that the amount of data encoded is extremely low, so the patterns are extremely low resolution, and when properly scaled up they are easily identified by cameras at much greater distances. Wikipedia has a comparison between a few fiducial marker packages, and based on the recent development and open-source nature I selected AprilTags as my next path of approach. I wrote a blog post on my initial experimentation with this.
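As a rough illustration of what detection looks like in code, here is a minimal sketch. It assumes the `pupil_apriltags` Python binding and the `tag36h11` family, which is just one of several available bindings and families and not necessarily what this project ends up using. Since a tag only carries a numeric ID, a small lookup table (the hypothetical `registry` below) is needed to get back to box names.

```python
def detect_tags(image_path, family="tag36h11"):
    """Return {tag_id: (x, y) center in pixels} for every tag found in one image."""
    # Third-party imports: opencv-python and pupil-apriltags. Imported here
    # so the naming helper below works without them installed.
    import cv2
    from pupil_apriltags import Detector

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    detector = Detector(families=family)
    return {
        d.tag_id: (float(d.center[0]), float(d.center[1]))
        for d in detector.detect(gray)
    }


def name_tags(centers, registry):
    """Map detected tag IDs to box names, flagging tags that are not registered."""
    return {
        registry.get(tag_id, f"unregistered tag {tag_id}"): center
        for tag_id, center in centers.items()
    }


# Stand-in for detect_tags("shelf.jpg"), a hypothetical image path.
centers = {3: (412.0, 230.5), 7: (988.0, 241.0)}
registry = {3: "RESISTORS"}  # hypothetical tag ID -> box name table
print(name_tags(centers, registry))
```

This is the trade-off against the QR approach: the tags can be read from much farther away, but the box name now has to live in a database instead of in the tag.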
I decided to create a proper software solution for this. I thought through the architecture and settled on some design goals:
- Software should run on a single application server (instead of on the camera / Raspberry Pi)
- Cameras should use standard video protocols such as RTSP or HTTP JPEG requests, and converting webcams, PiCams, or capture cards to RTSP/HTTP is not within the scope of this project
- Application server should retrieve and process images from all cameras
- Application server should maintain state of entire system
- The workload is not expected to exceed the capacity of a single server (as framerates are low and video processing scales well across threads), so separating the processing from state management is not a requirement at this time
- Software should have a purely web-based UI (mobile friendly preferred) for users to view the status, box locations, manage boxes, and add/reassign boxes
- Software should have REST and MQTT API to retrieve/publish state for use by other applications. Beyond associating tag IDs with boxes, we will not deal with the contents or inventory within the box
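To show how the state and API goals above could fit together, here is a minimal sketch of a state store that associates tag IDs with boxes, records which camera last saw each box, and serializes to JSON for a REST response or an MQTT payload. The class names, field names, and JSON shape are all hypothetical and are not the actual StorageTags data model.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Box:
    tag_id: int
    name: str
    camera: Optional[str] = None  # None means not currently seen by any camera


class StorageState:
    """In-memory state for the whole system, kept on the application server."""

    def __init__(self):
        self.boxes = {}

    def assign(self, tag_id, name):
        """Add a box, or reassign an existing tag to a new box name."""
        self.boxes[tag_id] = Box(tag_id, name)

    def update_camera(self, camera, seen_tag_ids):
        """Record which registered tags one camera currently sees."""
        for box in self.boxes.values():
            if box.tag_id in seen_tag_ids:
                box.camera = camera
            elif box.camera == camera:
                box.camera = None  # was in view of this camera before, now gone

    def to_json(self):
        """Serialize state for a REST response or an MQTT payload."""
        return json.dumps([asdict(b) for b in self.boxes.values()], sort_keys=True)


state = StorageState()
state.assign(3, "RESISTORS")
state.assign(7, "ESP32-KITS")
state.update_camera("bench", {3, 7})
state.update_camera("bench", {7})  # tag 3 has been taken off the shelf
print(state.to_json())
```

Keeping the state in one object like this is what makes the single-server design simple: every camera's detections feed the same store, and both APIs read from it.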
I chose to name the project ‘StorageTags’, although I’m not sure if it’s the most unique name, and I reserve the right to change it in the future.
I explain a bit more about the creation of the project in this blog post.
I’ll continue to update this page as I continue to develop the software. I’ve started to move out of the experimentation phase, but the software is far from finished.