Neuromorphic computing has become a popular approach for implementing brain-inspired machine learning tasks. As a paradigm for both hardware and algorithm design, neuromorphic computing aims to emulate aspects of the structure and function of the biological nervous system to achieve artificial intelligence with efficiencies that are orders of magnitude better than those of general-purpose computing hardware. We provide a holistic treatment of spike-based neuromorphic computing (i.e., based on spiking neural networks), detailing the biological motivation, key aspects of neuromorphic algorithms, and a survey of state-of-the-art neuromorphic hardware. In particular, we focus on these aspects within the context of brain-inspired vision applications. Our aim is to complement several of the existing reviews on neuromorphic computing while also providing a unique perspective.
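Spike-based computation in such systems is typically described with simple spiking neuron models. As an illustration only (not material from the survey), the following is a minimal leaky integrate-and-fire simulation; all parameter values are assumed for the example.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) neuron, simulated with forward-Euler
# steps. All parameters (tau_m, v_thresh, v_reset, dt) are illustrative
# choices, not values taken from the survey.
def simulate_lif(input_current, dt=1e-3, tau_m=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0, r_m=1.0):
    v = v_rest
    spikes = np.zeros(len(input_current), dtype=bool)
    trace = np.empty(len(input_current))
    for t, i_in in enumerate(input_current):
        # Leaky integration of the membrane potential.
        v += (-(v - v_rest) + r_m * i_in) * (dt / tau_m)
        if v >= v_thresh:          # threshold crossing emits a spike
            spikes[t] = True
            v = v_reset            # hard reset after the spike
        trace[t] = v
    return spikes, trace

# Example: a constant supra-threshold current produces a regular spike train.
spk, _ = simulate_lif(np.full(1000, 1.5))
print("spike count:", spk.sum())
```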
Smart assistant usage has increased significantly with the AI boom and the growth of IoT. Speech as an input modality brings a level of personalization to the various smart voice assistant products and applications; however, many smart assistants underperform when tasked with interpreting atypical speech input. Dysarthria, heavy accents, and deaf and hard-of-hearing speech characteristics prove difficult for smart assistants to interpret despite the large amounts of diverse data used to train automatic speech recognition models. In this study, we explore the Transformer architecture as an automatic speech recognition model for speech with medium to low intelligibility scores. We use a Transformer model pre-trained on the Librispeech dataset and fine-tuned on the Torgo dataset of atypical speech, as well as a subset of the University of Memphis Speech Perception Assessment Laboratory’s (UMemphis SPAL) Deaf speech dataset. We also develop a methodology for performing automatic speech recognition with a Node.js application running on a Raspberry Pi 4, which serves as a pipeline between the user and a Google Home smart assistant device. The highest-performing Transformer model achieves a 20.2% character error rate with a corresponding 29.0% word error rate on a subset of medium-intelligibility audio samples from the UMemphis SPAL dataset. This study highlights the importance of a large, transcribed dataset, motivating a large atypical-speech data-gathering effort through a newly developed web application, My-Voice.
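The reported character and word error rates are standard edit-distance metrics. As an illustration (not the study's evaluation code), the sketch below computes WER and CER with a plain Levenshtein distance, assuming whitespace-delimited words.

```python
# Word error rate (WER) and character error rate (CER) via Levenshtein
# distance. This is a generic illustration of the reported metrics, not the
# evaluation code used in the study.
def _levenshtein(ref, hyp):
    # Dynamic-programming edit distance between two sequences.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, start=1):
            cur = d[j]
            d[j] = min(d[j] + 1,            # deletion
                       d[j - 1] + 1,        # insertion
                       prev + (r != h))     # substitution / match
            prev = cur
    return d[-1]

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    return _levenshtein(ref, hyp) / max(len(ref), 1)

def cer(reference, hypothesis):
    return _levenshtein(list(reference), list(hypothesis)) / max(len(reference), 1)

print(wer("turn on the lights", "turn on lights"))   # 0.25
print(cer("turn on the lights", "turn on lights"))   # ~0.22
```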
Recent studies in the field of adversarial machine learning have highlighted the poor robustness of convolutional neural networks (CNNs) to small, carefully crafted variations of their inputs. Previous work in this area has largely focused on very small image perturbations that completely throw off the classifier output, causing CNNs to make high-confidence misclassifications while leaving the image visually unchanged to a human observer. These attacks modify individual pixels of each image and are unlikely to occur in a natural environment. More recent work has demonstrated that CNNs are also vulnerable to simple transformations of the input image, such as rotations and translations. These ‘natural’ transformations are much more likely to occur, either accidentally or intentionally, in a real-world scenario; indeed, humans experience and successfully recognize countless objects under these types of transformations every day. In this paper, we study the effect of such transformations on CNN accuracy when classifying 3D face-like objects (Greebles). Furthermore, we visualize the feature representations learned by CNNs, analyze how robust these representations are, and compare them to the human visual system. This work serves as a basis for future research into understanding the differences between CNN and human object recognition, particularly in the context of adversarial examples.
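A rotation-and-translation robustness check of the kind described can be sketched as follows. This is an assumed illustration using torchvision's RandomAffine; `model` and `dataset` are hypothetical placeholders rather than the paper's actual classifier and Greeble data.

```python
import torch
from torchvision import transforms

# Hedged sketch: measure how a trained CNN's accuracy degrades under small
# rotations and translations of its inputs. `model` and `dataset` are
# placeholders for the classifier and the (Greeble-like) image dataset;
# neither is defined here.
def accuracy_under_transform(model, dataset, max_degrees=30.0, max_shift=0.1,
                             batch_size=64, device="cpu"):
    # Random rotation up to +/- max_degrees and translation up to
    # max_shift * image size, applied to already-tensorized images.
    perturb = transforms.RandomAffine(degrees=max_degrees,
                                      translate=(max_shift, max_shift))
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            images = perturb(images).to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.numel()
    return correct / max(total, 1)
```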
Producing a better segmentation mask is crucial for scene understanding. Semantic segmentation is a vital task for applications such as autonomous driving, robotics, and medical image understanding. Efficient manipulation of high- and low-level context is key to competent pixel-level classification. An image's high-level feature map helps establish the spatial configuration of objects for segmentation, while the low-level features help discern object boundaries in the segmentation map. In our implementation, we use a two-bridged network. The first bridge manipulates the subtle differences between images and produces a vector that captures the low-level features of the input images. The second bridge aggregates global context from the image while building a better understanding of the image's high-level features. The backbone is a dilated residual network, which helps avoid the attrition of image size during feature extraction. We train our network on the Cityscapes and ADE20K datasets and compare our results with state-of-the-art models. Initial experiments have yielded a mean IoU of 70.1% and pixel accuracy of 94.4% on the Cityscapes dataset, and 34.6% on the ADE20K dataset.
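The dilated residual backbone preserves spatial resolution by enlarging the receptive field through dilation rather than downsampling. The block below is a generic PyTorch sketch of that idea, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Illustrative dilated residual block: dilation enlarges the receptive field
# while stride stays 1, so spatial resolution is preserved. This is a generic
# sketch, not the exact block used in the paper's backbone.
class DilatedResidualBlock(nn.Module):
    def __init__(self, channels, dilation=2):
        super().__init__()
        # padding == dilation keeps the output the same spatial size as the input
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3,
                               padding=dilation, dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut

# A 256x256 feature map passes through unchanged in spatial size.
x = torch.randn(1, 64, 256, 256)
print(DilatedResidualBlock(64)(x).shape)  # torch.Size([1, 64, 256, 256])
```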
Multilayer perceptron networks with random hidden layers are very efficient at automatic feature extraction and offer significant performance improvements in the training process. They essentially employ a large collection of fixed, random features and are expedient for form-factor-constrained embedded platforms. In this work, a reconfigurable and scalable architecture is proposed for MLPs with random hidden layers, built around a customized building block based on the CORDIC algorithm. The proposed architecture also exploits fixed-point operations for area efficiency. The design is validated for classification on two different datasets: an accuracy of ~90% was observed on the MNIST dataset and 75% for gender classification on the LFW dataset. The hardware achieves a 299× speed-up over the corresponding software realization.
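As a software reference point for the random-hidden-layer idea (our illustration, not the proposed CORDIC-based hardware), the sketch below fixes the hidden weights at random and fits only the output layer by regularized least squares.

```python
import numpy as np

# Minimal software sketch of an MLP with a fixed random hidden layer:
# the hidden weights are never trained; only the linear output layer is fit,
# here by regularized least squares. Illustrative only, not the proposed
# CORDIC-based fixed-point hardware implementation.
rng = np.random.default_rng(0)

def fit_random_hidden_mlp(x_train, y_onehot, n_hidden=512, reg=1e-3):
    w_in = rng.standard_normal((x_train.shape[1], n_hidden))
    b_in = rng.standard_normal(n_hidden)
    h = np.tanh(x_train @ w_in + b_in)              # fixed random features
    # Ridge-regression solve for the output weights.
    w_out = np.linalg.solve(h.T @ h + reg * np.eye(n_hidden), h.T @ y_onehot)
    return w_in, b_in, w_out

def predict(x, w_in, b_in, w_out):
    return np.argmax(np.tanh(x @ w_in + b_in) @ w_out, axis=1)
```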
The advent of nanoscale metal-insulator-metal (MIM) structures with memristive properties has given birth to a new generation of hardware neural networks based on CMOS/memristor integration (CMHNNs). The advantage of the CMHNN paradigm compared to a pure CMOS approach lies in the multi-faceted functionality of memristive devices: they can efficiently store neural network configurations (weights and activation function parameters) via non-volatile, quasi-analog resistance states. They also provide high-density interconnects between neurons when integrated into 2-D and 3-D crossbar architectures. In this work, we explore the combination of CMHNN classifiers with manifold learning to reduce the dimensionality of CMHNN inputs. This allows the size of the CMHNN to be reduced significantly (by ≈ 97%). We tested the proposed system using the Caltech101 database and were able to achieve classification accuracies within ≈ 1.5% of those produced by a traditional support vector machine.
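As a rough software analogue of the dimensionality-reduction stage, the sketch below uses scikit-learn's Isomap as the manifold learner and a small linear classifier as a stand-in for the CMHNN; the digits dataset replaces Caltech101 purely to keep the example self-contained.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative pipeline: manifold learning shrinks the input dimensionality,
# so the downstream classifier (a stand-in here for the CMHNN) needs far
# fewer inputs. The digits dataset replaces Caltech101 purely for brevity.
X, y = load_digits(return_X_y=True)                  # 64-dimensional inputs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

embed = Isomap(n_components=8).fit(X_tr)             # 64 -> 8 dimensions
clf = LogisticRegression(max_iter=1000).fit(embed.transform(X_tr), y_tr)

print("test accuracy:", clf.score(embed.transform(X_te), y_te))
```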