The Rigorous Coupled Wave Analysis (RCWA) method is highly efficient for simulating diffraction efficiency and field distribution patterns in periodic structures and textured optoelectronic devices. GPUs are increasingly used for complex scientific problems such as climate simulation and recent COVID-19 spread models. In this paper, we break the RCWA simulation problem down into its key computational steps (eigensystem solution, matrix inversion, and matrix multiplication) and compare the speed of optimized GPU linear algebra libraries with that of the multithreaded Intel MKL CPU library, both running on the IRIDIS 5 supercomputer (1 NVIDIA V100 GPU and 40 Intel Xeon Gold 6138 CPU cores). Our work shows that the GPU significantly outperforms the CPU for all required steps. The eigensystem solution becomes 60% faster, and the matrix inversion speedup grows with matrix size, reaching 8x for large matrices. Most significantly, matrix multiplication becomes 40x faster for small matrices and 5x faster for large ones.
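The three computational steps named above can be timed in isolation with a few lines of code. The following is a minimal CPU-only sketch, assuming NumPy is available and using random dense complex matrices as stand-ins for real RCWA matrices; the function name `time_rcwa_kernels` and the matrix sizes are hypothetical, and this is not the benchmark code used in the paper.

```python
import time
import numpy as np


def time_rcwa_kernels(n, seed=0):
    """Time eig, inverse and matmul on random n x n complex matrices.

    The matrices are random placeholders (an assumption of this sketch),
    not matrices assembled from an actual grating.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    timings = {}

    # Step 1: eigensystem solution of a general (non-Hermitian) matrix.
    t0 = time.perf_counter()
    eigvals, eigvecs = np.linalg.eig(a)
    timings["eig"] = time.perf_counter() - t0

    # Step 2: matrix inversion.
    t0 = time.perf_counter()
    a_inv = np.linalg.inv(a)
    timings["inv"] = time.perf_counter() - t0

    # Step 3: matrix multiplication.
    t0 = time.perf_counter()
    product = a @ b
    timings["matmul"] = time.perf_counter() - t0

    # Residuals confirm that each step produced a valid result.
    residuals = {
        # A V = V diag(w): column j of eigvecs is scaled by eigvals[j].
        "eig": np.linalg.norm(a @ eigvecs - eigvecs * eigvals) / np.linalg.norm(a),
        "inv": np.linalg.norm(a @ a_inv - np.eye(n)),
    }
    return timings, residuals, product.shape


if __name__ == "__main__":
    for size in (128, 256, 512):
        timings, residuals, _ = time_rcwa_kernels(size)
        line = ", ".join(f"{name}: {secs * 1e3:.1f} ms" for name, secs in timings.items())
        print(f"n = {size}: {line}")
```

Running the same three calls through a GPU library and comparing the timings per matrix size would give a per-step speedup table of the kind summarized above; data transfer between host and device would also need to be accounted for in such a comparison.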
The Rigorous Coupled Wave Analysis (RCWA) method is highly efficient for simulating diffraction efficiency and field distribution patterns in periodic structures and textured optoelectronic devices such as VCSELs, LEDs, and DOEs. RCWA yields exact solutions only when the Fourier expansion has infinite order. In practice, the Fourier expansion must be truncated because of computer memory limitations. Researchers use fast-convergence algorithms such as the ‘normal vector method’ and ‘Li’s rule’, which can obtain accurate TM-mode results with fewer Fourier orders. However, thoroughly investigating the behavior of a structure usually requires thousands or even millions of RCWA simulations, which may take hours or days. GPUs are well suited to solving complex systems because they allow large-scale multithreaded parallel programming (< 1000 on a low-end GPU, < 5k on a high-end GPU), which speeds up matrix computations significantly. In this paper, we present a high-speed RCWA program utilizing optimized CUDA-GPU code and the MAGMA libraries. It achieves a 2-6x speedup compared to conventional multithreaded CPU-based code utilizing the Intel MKL library, running on the IRIDIS 5 supercomputer (1 NVIDIA V100 GPU and 40 Intel Xeon Gold 6138 2.0 GHz CPU cores).
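To illustrate why truncation is dictated by memory, the sketch below estimates the size of one dense matrix as a function of the truncation order. It rests on assumptions that are not stated in the abstract: harmonics from -N to +N are retained along each periodic direction, the eigenproblem for a 2D-periodic structure has dimension twice the number of harmonics, and entries are double-precision complex (16 bytes). The function names are hypothetical.

```python
def harmonics(order_x, order_y=0):
    """Number of retained Fourier harmonics for orders -N..+N per axis."""
    return (2 * order_x + 1) * (2 * order_y + 1)


def dense_matrix_bytes(dim, bytes_per_entry=16):
    """Memory of one dense dim x dim matrix (16 bytes = complex double)."""
    return dim * dim * bytes_per_entry


def report(order):
    """Summarize matrix dimension and memory for a 2D-periodic structure."""
    m = harmonics(order, order)
    # Assumption: two coupled transverse field components double the dimension.
    dim = 2 * m
    mib = dense_matrix_bytes(dim) / 2**20
    return f"order {order}: {m} harmonics, {dim} x {dim} matrix, {mib:.1f} MiB"


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(report(n))
```

Under these assumptions the matrix dimension grows quadratically with the truncation order and the memory per matrix grows with its fourth power, while the dense eigensolve, inversion, and multiplication costs grow roughly with the cube of the matrix dimension. This is why methods that converge with fewer Fourier orders, and faster matrix kernels, both matter.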