The summer after my sophomore year in college, I did an internship at OSRAM OptoSemiconductors GmbH, a leading manufacturer of LEDs.
Problem The company had the following problem in its manufacturing process: thousands of wafers go through the process each day, and they are tracked by manually writing down wafer numbers. However, at various stages of the process, high-quality images are taken of each wafer for quality control of the chips being made. The goal of my project was to apply image processing techniques to perform optical character recognition (OCR) on the numbers and letters that uniquely identify a wafer, so that wafers could be tracked automatically through the production process.
Challenges of the OCR approach The images were taken through a microscope, which produced very low-contrast images in which the difference between foreground and background was within 10 gray-scale values. This made it extremely difficult for a program to distinguish the unique identifier of a wafer from the background. Furthermore, the images were taken at different brightness levels, and the location of the unique identifier was not consistent from wafer to wafer, which made the problem even more challenging.
Attempted Solutions I implemented various standard image processing algorithms to perform OCR, but they produced inconsistent results due to the problems with the images mentioned above. I also tried a machine learning approach, training the system on 200 images of each letter and digit, taken from wafers in production. The system achieved a 75%-90% success rate on the test set, which did not meet the production requirements: above all, the solution had to be robust and reliable.
The Solution Once I discovered that none of the standard OCR techniques would produce satisfactory results, I realized that a different approach was needed. Luckily for me, there was another way. When a wafer is labeled with a unique identifier, a data matrix encoding that identifier is engraved onto the wafer. A Data Matrix is a two-dimensional square or rectangular barcode consisting of black and white dots, which represent binary 0s and 1s. ECC200 data matrices, the standard I worked with, are Reed-Solomon codes: they contain redundant symbols in addition to the symbols of the label, and can detect and correct multiple random errors, up to ⌊r/2⌋ symbols, where r is the number of redundant symbols added to the code.
I noticed that although the location of the label varied from wafer to wafer, all wafers had one feature in common: the label was located on the flat edge of the wafer. So the solution was to search the flat edge for the data matrix and decode it directly. Depending on the number of redundant symbols in the matrix, an irreducible polynomial of a particular Galois field is used to encode the symbols in the data matrix; and once we know what field we are in, we know how to decode.
Each symbol in the label maps to an 8-bit binary representation (usually, its ASCII code plus a magic number). In addition, the data matrix contains redundant symbols that enable error correction, so the symbols of the label can be recovered even if part of the wafer's label is chipped off during production. To further improve the reliability of erasure recovery in the case of mechanical damage to the edges of wafers, the binary representation is mapped onto the data matrix in an L-shaped configuration, and some of the data and redundant symbols are distributed across the edges of the matrix to further protect it from losing too much information. See the diagram below for more information.
The data matrix consists of the frame on its borders and the data region in the middle. For larger data matrices like this one, the data region is split into two segments. There are 10 symbols in the label, which are mapped onto the wafer. In addition, 11 redundant symbols are distributed across the data matrix, for a total of 21 symbols.
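To make the symbol mapping concrete, here is a small Python sketch. It is hedged: "WAFER12345" is a made-up example label, and only ECC200's plain ASCII encodation (which stores a character as its ASCII value plus one, the "magic number" above) is modeled; digit-pair and other encodation modes are ignored.

```python
# Hedged sketch: map a 10-character wafer label to ECC200-style data
# codewords and compute the Reed-Solomon error-correction capacity.
# "WAFER12345" is a hypothetical label, not a real wafer identifier.

def label_to_codewords(label: str) -> list[int]:
    """ECC200 ASCII encodation stores a character as its ASCII value + 1."""
    return [ord(c) + 1 for c in label]

def correctable_errors(num_redundant: int) -> int:
    """A Reed-Solomon code with r redundant symbols corrects floor(r/2) errors."""
    return num_redundant // 2

data = label_to_codewords("WAFER12345")   # 10 data codewords, one per character
capacity = correctable_errors(11)         # 11 redundant symbols as described above
```

With the 10 data and 11 redundant symbols described above, up to ⌊11/2⌋ = 5 damaged symbols can be corrected.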
Once the symbols are correctly located, the Peterson-Gorenstein-Zierler (PGZ) decoder is used to decode the label. I used the theoretical description of the decoder from the Wikipedia article to implement it in MATLAB. The steps of the algorithm are: compute the syndromes of the received word; solve the syndrome equations for the coefficients of the error-locator polynomial; find the error positions as the roots of that polynomial; and finally solve for the error magnitudes and correct the received symbols.
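The PGZ steps can be sketched in Python (a hedged toy, not my original MATLAB code): to keep the field arithmetic readable it uses a tiny Reed-Solomon code over the prime field GF(7) with n=6, t=2, rather than ECC200's GF(2^8), but the decoding steps are the same.

```python
# Hedged toy Peterson-Gorenstein-Zierler (PGZ) decoder over GF(7).
P = 7              # field modulus: all arithmetic is over GF(7)
ALPHA = 3          # a primitive element of GF(7)
N, T = 6, 2        # codeword length and number of correctable errors

def inv(a: int) -> int:
    return pow(a, P - 2, P)                 # Fermat inverse in GF(P)

def poly_eval(poly, x):
    """Evaluate a polynomial (lowest-degree coefficient first) at x."""
    r = 0
    for c in reversed(poly):
        r = (r * x + c) % P
    return r

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def rs_encode(msg):
    """Encode as C(x) = m(x) g(x), so that C(alpha^j) = 0 for j = 1..2T."""
    g = [1]
    for j in range(1, 2 * T + 1):
        g = poly_mul(g, [-pow(ALPHA, j, P) % P, 1])   # factor (x - alpha^j)
    return poly_mul(msg, g)

def pgz_decode(received):
    """Correct up to T errors; returns the corrected codeword."""
    # Step 1: syndromes S_j = R(alpha^j) depend only on the error pattern.
    S1, S2, S3, S4 = (poly_eval(received, pow(ALPHA, j, P))
                      for j in range(1, 2 * T + 1))
    if S1 == S2 == S3 == S4 == 0:
        return list(received)               # no errors
    # Step 2: error-locator polynomial Lambda(x) = 1 + l1*x + l2*x^2,
    # found by solving the syndrome equations (Cramer's rule for t=2).
    det = (S2 * S2 - S1 * S3) % P
    if det:                                 # assume two errors
        l1 = ((S1 * S4 - S2 * S3) * inv(det)) % P
        l2 = ((S3 * S3 - S2 * S4) * inv(det)) % P
        locator = [1, l1, l2]
    else:                                   # singular system: one error
        locator = [1, (-S2 * inv(S1)) % P]
    # Step 3: error positions i are the roots where Lambda(alpha^-i) = 0.
    positions = [i for i in range(N)
                 if poly_eval(locator, inv(pow(ALPHA, i, P))) == 0]
    # Step 4: solve S_j = sum_i e_i * X_i^j for the error magnitudes e_i.
    X = [pow(ALPHA, i, P) for i in positions]
    if len(X) == 1:
        mags = [(S1 * inv(X[0])) % P]
    else:
        d = (X[0] * X[1] * (X[1] - X[0])) % P
        mags = [((S1 * X[1] * X[1] - S2 * X[1]) * inv(d)) % P,
                ((S2 * X[0] - S1 * X[0] * X[0]) * inv(d)) % P]
    fixed = list(received)
    for i, e in zip(positions, mags):
        fixed[i] = (fixed[i] - e) % P       # subtract the error values
    return fixed
```

With r = 4 redundant symbols this toy code corrects up to ⌊4/2⌋ = 2 symbol errors, mirroring the ⌊r/2⌋ bound above.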
Memory management is a very interesting problem, one we have learned to solve well enough that most programmers these days don't have to worry about it, thanks to advanced garbage collectors. When designing a heap allocator, the following concerns should be addressed: keeping fragmentation (internal and external) low so that memory stays compact, keeping the bookkeeping overhead of allocator metadata small, and making allocation and deallocation fast.
Parallel allocators pose additional challenges, such as lock contention when multiple threads allocate at once, false sharing of cache lines between threads, and blowup of memory consumption.
So, we want a memory allocator that keeps memory compact, uses as little additional memory as possible for bookkeeping, allocates and deallocates quickly, and serves multiple requests at the same time, taking advantage of thread-local caches whenever possible.
Memory Allocators Let us take a look at three common memory allocation designs. The simplest kind to implement is a buddy memory allocator, which has little external fragmentation and fast compaction, since coalescing freed blocks is easy. It can suffer from internal fragmentation, depending on the size of the lowest-order memory block and the granularity of the block sizes. Slab allocation is commonly used to reduce the effects of internal fragmentation by allowing reuse of fixed-size blocks. Since the majority of applications allocate and deallocate many objects of similar sizes, this strategy is very effective. The improved buddy allocator provides better time guarantees while keeping the distribution of fragmentation the same as the standard approach: worst-case constant time for allocation and worst-case/amortized constant time for deallocation, instead of O(lg n). The Linux kernel uses the buddy system with several modifications to further minimize external and internal fragmentation.
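The split-and-coalesce mechanics of a buddy allocator can be sketched in Python (a hedged toy model; a real buddy allocator works on raw memory and uses intrusive free lists rather than Python lists and dicts). A freed block's "buddy" is found by XOR-ing its offset with its size, which is what makes coalescing cheap.

```python
# Hedged toy buddy allocator: power-of-two block sizes, one free list
# per order, buddy found by offset XOR size. Not production code.

class BuddyAllocator:
    def __init__(self, total_size: int, min_block: int = 16):
        assert total_size & (total_size - 1) == 0, "size must be a power of two"
        self.min_block = min_block
        self.max_order = (total_size // min_block).bit_length() - 1
        # free_lists[k] holds offsets of free blocks of size min_block << k
        self.free_lists = [[] for _ in range(self.max_order + 1)]
        self.free_lists[self.max_order].append(0)
        self.allocated = {}                    # offset -> order

    def _order_for(self, size: int) -> int:
        order = 0
        while (self.min_block << order) < size:
            order += 1
        return order

    def alloc(self, size: int) -> int:
        order = self._order_for(size)
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1                             # find a big-enough free block
        if k > self.max_order:
            raise MemoryError("out of memory")
        offset = self.free_lists[k].pop()
        while k > order:                       # split down, freeing each upper half
            k -= 1
            self.free_lists[k].append(offset + (self.min_block << k))
        self.allocated[offset] = order
        return offset

    def free(self, offset: int) -> None:
        order = self.allocated.pop(offset)
        while order < self.max_order:
            buddy = offset ^ (self.min_block << order)
            if buddy not in self.free_lists[order]:
                break                          # buddy busy or split: stop coalescing
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)        # merged block starts at the lower half
            order += 1
        self.free_lists[order].append(offset)
```

Allocating two 16-byte blocks and freeing both coalesces the heap back into a single top-order block, illustrating the fast compaction mentioned above.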
Other allocation techniques include bidirectional coalescing with boundary tags, invented by Donald Knuth and published in The Art of Computer Programming: Fundamental Algorithms (Addison-Wesley, 1973). Tags are placed at both ends of each block, recording the block's size and whether it is allocated (an allocation bit of 0 or 1). This makes coalescing easy: we can check both adjacent blocks, merge with any free neighbor, and discard the inner tags.
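As a hedged sketch (a toy cell-array model, not an actual allocator), the boundary-tag layout and the O(1) neighbor checks in free() look like this: the left neighbor's footer sits just below the block, and the right neighbor's header just above it.

```python
# Hedged toy model of Knuth-style boundary tags on a flat cell array.
# Each block stores (size, allocated-bit) in both its first and last
# cell, so free() can inspect both neighbors in O(1).

HEAP_SIZE = 64
heap = [None] * HEAP_SIZE

def write_block(start: int, size: int, allocated: bool) -> None:
    heap[start] = heap[start + size - 1] = (size, allocated)   # header + footer

def free(start: int) -> None:
    size, _ = heap[start]
    # Coalesce right: the next block's header is at start + size.
    right = start + size
    if right < HEAP_SIZE and heap[right] and not heap[right][1]:
        size += heap[right][0]
    # Coalesce left: the previous block's footer is at start - 1.
    if start > 0 and heap[start - 1] and not heap[start - 1][1]:
        left_size = heap[start - 1][0]
        start -= left_size
        size += left_size
    # Write the merged block's outer tags; the inner tags become payload.
    write_block(start, size, False)
```

Freeing a block between two free neighbors produces a single merged block whose new header and footer overwrite the outer tags, while the now-meaningless inner tags are simply abandoned.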
Hoard Memory Allocator is a fast, scalable memory allocator that addresses the three problems multi-threaded allocators face: lock contention, false sharing, and blowup. Hoard uses one global heap and one local heap per thread. Each local heap owns a number of superblocks. Hoard bounds blowup by putting an emptiness threshold on each superblock and moving a superblock from the local to the global heap once the threshold is crossed, so that the superblock can be reused.
The lock contention problem on a local heap is solved by ensuring that a superblock is owned by only one local heap at any time, and that only one thread is allowed to allocate from a given superblock. When multiple threads allocate memory, their requests are directed to different superblocks, avoiding most of the per-process lock contention scenarios. In the worst case, when one thread in a pair allocates and the other deallocates from the same superblock, there is a 2x slowdown due to serialization of malloc()s and free()s; this is scalable, as it does not grow with the number of threads. The lock on the global heap is needed only in a few cases, such as the initial creation of superblocks, moving superblocks in and out of the global heap, and freeing blocks in superblocks held by the global heap. The paper shows that in a steady state the allocator incurs no lock contention on the global heap, and low contention during gradual growth.
Active false sharing is avoided by directing allocation requests from multiple threads to different superblocks. When a deallocation request comes in, the freed block of memory is returned to its superblock. This strategy prevents multiple threads from sharing a cache line. There is, of course, the case where a superblock's emptiness factor goes below the threshold, the superblock is transferred to the global heap, and it is later assigned to another local heap, causing the two heaps to share cache lines; in practice, however, the authors of the Hoard allocator observed that this occurs very infrequently, and most of the time a superblock is completely empty upon transfer to the global heap.
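The emptiness-threshold rule can be shown as a hedged toy model (the fraction 0.25, the names, and the class structure here are illustrative assumptions, not Hoard's actual constants or implementation):

```python
# Hedged toy model of Hoard's emptiness-threshold rule: when a
# superblock becomes empty enough, it moves to the global heap so
# another thread's local heap can reuse it.

EMPTY_FRACTION = 0.25      # assumed threshold: not Hoard's real constant

class Superblock:
    def __init__(self, capacity: int):
        self.capacity = capacity   # number of blocks in the superblock
        self.in_use = 0            # number of currently allocated blocks

    def fullness(self) -> float:
        return self.in_use / self.capacity

class Heap:
    def __init__(self):
        self.superblocks = []

global_heap = Heap()

def on_free(local_heap: Heap, sb: Superblock) -> None:
    """Free one block from sb; if the superblock crosses the emptiness
    threshold, transfer it from the local heap to the global heap."""
    sb.in_use -= 1
    if sb.fullness() < EMPTY_FRACTION and sb in local_heap.superblocks:
        local_heap.superblocks.remove(sb)
        global_heap.superblocks.append(sb)
```

A superblock that drains below the threshold leaves its local heap, which is exactly the mechanism that bounds blowup while keeping such transfers, per the paper's observations, infrequent.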