Decision-making through artificial neural networks with minimal latency is critical for numerous applications such as navigation, tracking, and real-time machine action systems. This requires machine learning hardware to process multidimensional data at high throughput. Unfortunately, convolution operations, the primary computational tool for data classification tasks, obey challenging runtime complexity scaling laws. However, homomorphically implementing the convolution theorem in a Fourier optics display light processor can achieve a non-iterative O(1) runtime complexity for input matrices larger than 1,000 × 1,000. Following this approach, here we demonstrate data-streaming, multi-kernel image batching using a Fourier Convolutional Neural Network (FCNN) accelerator. We show batch processing of large-scale image matrices, performed as 2 million dot-product multiplications by a digital light processing module in the Fourier domain. Furthermore, we parallelize this optical FCNN system further by exploiting multiple spatially parallel diffraction orders, achieving a 98× throughput improvement over state-of-the-art FCNN accelerators. A comprehensive discussion of the practical challenges of working at the edge of system capabilities highlights the problems of crosstalk and of resolution scaling laws in the Fourier domain. Accelerating convolution by exploiting the massive parallelism of display technology enables non-von Neumann machine learning acceleration.
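The convolution theorem that the Fourier-optics processor exploits can be illustrated digitally. The sketch below is a minimal numerical emulation, not the optical implementation: it assumes an ideal 4F arrangement in which the first lens computes the 2D Fourier transform, the Fourier-plane modulator multiplies that spectrum elementwise by the kernel spectrum, and the second lens transforms back, which yields a circular convolution. The helper names (`dft`, `dft2`, `fourier_conv2`, `conv2_circular`) are hypothetical, and a naive DFT stands in for the lenses, so this code runs in polynomial rather than O(1) time. Reusing one input spectrum for several kernels mirrors the multi-kernel batching idea.

```python
import cmath


def dft(x, inverse=False):
    """Naive 1D DFT; the inverse includes the 1/n normalization."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def dft2(a, inverse=False):
    """2D DFT by row-column decomposition (a digital stand-in for a Fourier lens)."""
    rows = [dft(list(r), inverse) for r in a]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def fourier_conv2(img, kernels):
    """Hypothetical 4F emulation: transform the input once, multiply its
    spectrum elementwise by each kernel spectrum (the Fourier-plane mask),
    and inverse-transform each product. Kernels must be zero-padded to the
    image size; the result is a circular convolution."""
    spectrum = dft2(img)
    outputs = []
    for ker in kernels:
        mask = dft2(ker)
        prod = [[s * k for s, k in zip(srow, krow)]
                for srow, krow in zip(spectrum, mask)]
        outputs.append([[v.real for v in row] for row in dft2(prod, inverse=True)])
    return outputs


def conv2_circular(img, ker):
    """Direct circular 2D convolution, used only as a reference check."""
    m, n = len(img), len(img[0])
    return [[sum(img[p][q] * ker[(i - p) % m][(j - q) % n]
                 for p in range(m) for q in range(n))
             for j in range(n)]
            for i in range(m)]
```

For example, `fourier_conv2(img, [delta, edge])` returns one output map per kernel; a delta kernel at the origin reproduces the input, and every output matches `conv2_circular` for the same kernel. In the optical system the two transforms happen as light propagates through the lenses, which is the basis of the non-iterative runtime claimed for the processor.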
Here we show two prototypes of a photonic convolution layer based on photonic integrated circuits (PICs). System 1 is a Fourier-optics-based 4F system integrated into a PIC. Unlike our earlier demonstration of a massively parallel, DMD-based optical CNN layer (Miscuglio, Sorger et al., Optica 2020), which processes 1000 × 1000 pixel matrices in a single time step at 20 kHz update rates (8× faster than state-of-the-art GPUs), this first-ever PIC-based 4F processor handles only tens of pixels, but at GHz rates (10^6 times faster than the DMD and 10^8 times faster than an SLM). System 2 is a PIC-based joint-transform correlator in which both the data and the convolution kernel are fed in at the front end and auto-convolve in the Fourier domain (autocorrelation). Note that the rapid 10 GHz update rate of the kernel, achieved with foundry PIC components, also allows online training to be performed on the system. Fast, low-SWaP (size, weight, and power) ASICs are powerful tools for network-edge processing and enable nanosecond-scale latency, for example for rapid target tracking.
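The joint-transform correlator of System 2 can likewise be sketched numerically. This 1D version assumes the textbook JTC arrangement: reference and signal placed side by side in one input plane, square-law detection of the joint power spectrum |U|^2, and a second Fourier transform. The function name `joint_transform_correlate`, its parameters, and the 1D geometry are illustrative assumptions, not the PIC design. The output contains the autocorrelation terms around index 0 and the cross-correlation terms at ±`sep`.

```python
import cmath


def dft(x, inverse=False):
    """Naive 1D DFT; the inverse includes the 1/n normalization."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def joint_transform_correlate(ref, sig, n, sep):
    """Hypothetical 1D JTC: place `ref` at index 0 and `sig` at index `sep`
    in an n-sample input plane, detect the joint power spectrum, and
    inverse-transform it. Output index k holds sum_t u[t + k] * u[t] (mod n)."""
    if sep < len(ref) or sep + len(sig) > n:
        raise ValueError("ref and sig must not overlap and must fit in the plane")
    u = [0.0] * n
    for i, v in enumerate(ref):
        u[i] = v
    for i, v in enumerate(sig):
        u[sep + i] = v
    # Square-law detection keeps only intensity; the phase of U is discarded.
    jps = [abs(z) ** 2 for z in dft(u)]
    return [z.real for z in dft(jps, inverse=True)]
```

With `ref = sig = [1.0, 2.0, 1.0]`, `n = 32`, and `sep = 8`, the central term at index 0 equals the sum of both autocorrelation peaks (12), while a matching signal produces cross-correlation peaks of 6 at indices 8 and 24 (that is, ±8). The separation has to be large enough that these terms do not overlap, which is one reason a JTC needs a wider input plane than the data alone.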