Semantic segmentation is of crucial importance in various domains because it recognizes and categorizes objects within an image at the pixel level. This task enables a wide range of applications, such as autonomous vehicles, environmental monitoring, and remote sensing (RS). In RS, semantic segmentation acts as the basis for applications including land cover classification. Following the success of deep learning (DL) methods in computer vision, our paper addresses the intersection between DL and RS imagery. We focus on improving the efficiency of some baseline and backbone models to ensure their adaptability to the challenges posed by RS imagery. To this end, we evaluate state-of-the-art models on two datasets and investigate their ability to accurately segment objects in RS imagery. Our research aims to open the way for more accurate and reliable semantic segmentation methods in geospatial analysis.
KEYWORDS: Genetics, Diseases and disorders, Education and training, Diagnostics, Data modeling, Leukemia, Image segmentation, Image processing, Image analysis, Cancer detection
Chromosomal translocations involve the exchange of segments between non-homologous chromosomes. The Philadelphia chromosome, known as the t(9;22) abnormality, is an example of a translocation linked with chronic myeloid leukemia. This study leverages a modified Siamese architecture for the automated detection of this translocation. As an alternative to conventional Convolutional Neural Networks (CNNs), this modified Siamese architecture processes images by capturing both local and global image details without the inherent biases found in traditional image analysis methods. This work underscores the specific capabilities and advantages of the proposed Siamese architecture, emphasizing its role in overcoming the limitations of traditional diagnostic methods in identifying the t(9;22) translocation, and its potential to significantly enhance genetic diagnostics.
While RGB imaging is reaching its limits, Hyperspectral Imaging (HSI) is being widely used, especially for medical applications. This study highlights the ability of HSI to help plan orthopedic surgical procedures by automatically identifying anatomical structures and surgical instruments from their spectral signatures. Four segmentation methods have been explored: (i) an average spectra method that uses the Euclidean distance between the spectrum of each pixel and the average spectrum of each specific structure, (ii) segmentation using k-means, (iii) index-based segmentation, in which we identify reflectance ratios at specific wavelengths that allow materials to be correctly classified, and (iv) a pixel-based classification method based on neural networks. Experiments have been carried out on anatomical objects whose physical characteristics are known. Selecting specific wavelengths to reduce the cost of the final device is also discussed.
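The average spectra method in (i) amounts to a minimum-distance classifier, which can be sketched as follows. This is a minimal illustration, not the study's implementation: the four-band spectra, the class labels, and the function names are all hypothetical.

```python
import math

def mean_spectrum(spectra):
    """Average a list of equally long reflectance spectra, band by band."""
    n = len(spectra)
    return [sum(band) / n for band in zip(*spectra)]

def classify_pixel(pixel, references):
    """Return the label whose mean spectrum is closest (Euclidean) to the pixel."""
    return min(references, key=lambda label: math.dist(pixel, references[label]))

# Hypothetical 4-band training spectra; labels and values are illustrative only.
training = {
    "bone": [[0.80, 0.75, 0.70, 0.65], [0.82, 0.77, 0.68, 0.63]],
    "instrument": [[0.30, 0.32, 0.31, 0.30], [0.28, 0.30, 0.33, 0.32]],
}
references = {label: mean_spectrum(samples) for label, samples in training.items()}
label = classify_pixel([0.79, 0.74, 0.69, 0.66], references)
```

Applying `classify_pixel` to every pixel of a hyperspectral cube would yield a label map; a real pipeline would presumably also need a rejection rule for pixels that are far from every reference.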
Over the past decade, several approaches have been proposed to learn disentangled representations for video prediction. However, reported experiments are mostly based on standard benchmark datasets such as Moving MNIST and Bouncing Balls. In this work, we address the problem of learning disentangled representations for video prediction in an industrial environment. To this end, we use a decompositional disentangled variational auto-encoder, a deep generative model that aims to decompose and recognize overlapped boxes on a pallet. Specifically, this approach disentangles each frame into a time-invariant component (box appearance) and a temporally variant component (box location). We evaluate this approach on a new dataset, which contains 40,000 video sequences. The experimental results demonstrate the ability to learn both the decomposition of the bounding boxes and their reconstruction without explicit supervision.
KEYWORDS: Education and training, Genetics, Diseases and disorders, Computer vision technology, Visual process modeling, Image analysis, Data modeling, Cancer detection, Cancer, Performance modeling
Cancer, hematological malignancies, and inherited genetic diseases can be diagnosed by detecting chromosome abnormalities. This detection is crucial for the management and follow-up of these diseases. Biologically, chromosome abnormalities fall into two categories: abnormalities in number and abnormalities in structure. The process of karyotyping involves creating an ordered representation of the 23 pairs of chromosomes. Each pair presents a specific band pattern, and in normal cases both chromosomes of a pair are identical. Karyotype images are manually analyzed by qualified cytogeneticists to detect any changes in the chromosomes. Computer vision methods make it possible to automate the detection of chromosome abnormalities, which can assist cytogeneticists in the diagnosis process. In the literature, little research has been done to automate the detection of structural abnormalities based on computer vision techniques. In this study, we are interested in the detection of a specific abnormality: the deletion of the long arm of chromosome 5, known as del(5q). We focused our work on the convolutional neural network (CNN) approach, which has shown its ability to provide reliable solutions to computer vision problems. On a collected database, we trained three CNN models to test their ability to differentiate between a healthy and a deleted chromosome 5. The highest performance was provided by VGG19, achieving an accuracy of 98.66%, a sensitivity of 89.33%, and a specificity of 100%.
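For readers less familiar with the three reported metrics, the sketch below shows how they follow from confusion-matrix counts. The counts are illustrative only and are not the study's data; a specificity of 100% simply means that no healthy chromosome was flagged as deleted, and accuracy can exceed sensitivity when healthy samples outnumber abnormal ones.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # share of abnormal chromosomes detected
    specificity = tn / (tn + fp)   # share of healthy chromosomes kept as healthy
    return accuracy, sensitivity, specificity

# Illustrative counts only, not taken from the study.
acc, sens, spec = diagnostic_metrics(tp=9, fn=1, tn=90, fp=0)
```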
The polarization-resolved extension of Second Harmonic Generation microscopy (PSHG) has proven efficient in cancer diagnosis. Unlike white light microscopy, PSHG can reveal small structural collagen changes during tumorigenesis for a broad range of organs such as breast, thyroid, lung, pancreas, and ovary. However, despite its effectiveness for cancer diagnosis, PSHG is not yet fully exploited. One route to improvement is to take better advantage of polarization-resolved measurements, which are performed by acquiring multiple images (usually between three and 20) of the same sample under different input beam polarization conditions. Each image of the resulting set of stacked raw images can contain relevant information not found in the other images of the set. In the literature, information extraction from stacked raw images is performed using methods such as averaging of all images, modeling of collagen structural parameters, or extraction of PSHG polarimetric parameters. Although the latter two methods provide richer information than the first, they may still lose information contained in the stacked raw images. To examine this potential loss of information, AI methods can be used to extract information from the stacked raw images. Using recently available images from the public SHG-TIFF database, which covers breast and thyroid PSHG measurements of both normal and tumor tissues, we test available AI methods for information extraction and benchmark them against the state of the art in terms of automatic cancer diagnosis efficiency.
Behavioral analysis in an urban environment is a complex task that requires material and human resources, due to the difficulty of interpreting the situations. This paper presents a method to improve the detection of dangerous behaviors by assisting surveillance stations. Our objective is to raise an alert when one of these behaviors is captured by a surveillance camera. To do this, we analyze the positions and paths of the persons in a global way, through a group of parameters. These parameters are determined by an automatic image analysis algorithm such as DBSCAN, running on an NVIDIA Jetson TX2. This analysis makes it possible to detect, through the evolution and clustering of points in each cloud, phenomena qualified as abnormal, such as dispersion and rapid clustering, as well as poaching. The data used to feed our algorithm come from simulations that allow testing new and different scenarios. The performance of our proposed method is evaluated on videos representing real case situations.
In several security-sensitive applications, such as the entrance to a school, it is important to know whether the person entering is an adult or a child. In this article, we propose a human body morphology detector that makes this distinction. This detector could be included in a smart portal so that a different treatment can be applied depending on the morphology of the person entering. A person detector module is deployed to detect the presence of a person within a predefined radius. Once the person is located, our system measures their height and determines whether the person is an adult or a child based on that height.
Urban development is accelerating with rapid urbanization across the world [1,2]. Cities must face significant threats linked to risks of human origin, such as terrorism. In this paper, we present our approach for intrusion detection, composed of three phases. The first consists in selecting, via a GUI, the zones of an image that are to be treated as prohibited. The second, based on a neural network method, performs person detection. The third verifies whether the detected person is present in one of the prohibited zones. If so, an alarm goes off automatically. Real tests were performed to secure an elementary school in the city of Nice, France. The obtained results showed the efficiency of our method in terms of good detection. Other work is in progress with the aim of analyzing the intrusion more deeply to detect abnormal behavior.
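The third phase, checking whether a detected person falls inside a prohibited zone, can be sketched with a standard ray-casting point-in-polygon test. This is a minimal sketch under stated assumptions: the abstract does not say how zones are stored or which point of the detection is tested, so the polygon representation, the use of the bottom-centre of the bounding box, and all names below are hypothetical.

```python
def point_in_zone(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def intrusion_alarm(box, zones):
    """Test the bottom-centre of a person box (x_min, y_min, x_max, y_max) against each zone."""
    x_min, y_min, x_max, y_max = box
    feet = ((x_min + x_max) / 2, y_max)  # assumption: image y grows downward
    return any(point_in_zone(feet, zone) for zone in zones)

# Hypothetical rectangular zone drawn by an operator, in pixel coordinates.
zones = [[(100, 200), (300, 200), (300, 400), (100, 400)]]
alarm = intrusion_alarm((180, 150, 220, 350), zones)  # feet at (200, 350)
```

Testing the feet rather than the box centre is one plausible design choice, since zones drawn on an image usually describe ground areas.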
This paper presents a new approach to improve the identification of underwater fiducial markers for camera pose estimation. The use of marker detection is new in the underwater field. Hence, it requires new image preprocessing to reach the same performance as in onshore environments. This is a challenging task due to the poor quality of underwater images. Images captured in highly turbid environments are strongly degraded by light attenuation and scattering. In this context, dehazing methods are increasingly used. However, they are less effective because the scattering of light in water differs from that in the atmosphere. Therefore, estimating dehazing parameters on the target image can lead to poor image restoration. For this reason, an object-oriented dehazing method is proposed to optimize the contrast of markers. The proposed system exploits the texture features derived by multi-channel filtering for image segmentation. To achieve this, saliency detection is applied to estimate the visually salient objects in the image. The generated saliency map is passed through a Gabor filter bank, and the significant texture features are clustered by the K-means algorithm to produce the segmented image. Once the different objects of the image are separated, an optimized Dark Channel Prior (DCP) dehazing method is applied to optimize the contrast of each individual object. The implemented system has been tested on a large image dataset taken during night offshore experiments in turbid waters at a depth of 15 meters. Results showed that object-oriented dehazing improves the success rate of marker identification in underwater environments.
Optical correlation is a well-known pattern recognition method for recognizing an image from a database. It is simple to implement and use, and it achieves good performance. However, its decision is global, based on the location, height, and shape of the correlation peak within the correlation plane, which considerably reduces its robustness. Moreover, correlation is sensitive to rotation and scale, which deform the correlation plane and degrade the performance of the method. In this paper, to overcome these problems, we propose and validate a new method of nonparametric modelling of the correlation plane. This method is based on a kernel estimation of the regression function used to classify the individuals according to the correlation plane. The idea is to enhance the decision by taking into consideration the shape and the distribution of energy in the correlation plane. This relies on calculations of the Hausdorff distance between the target correlation plane and the correlation planes coming from the database. The results showed the very good performance of our method compared to others in the literature, especially in terms of a high rate of good detection and a very low false alarm rate.
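The Hausdorff distance mentioned above compares two sets of points, so a correlation plane must first be turned into such a set. The abstract does not say how this is done; the sketch below assumes, purely for illustration, that the cells above a threshold are kept. The planes, the threshold, and the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite, non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def above_threshold(plane, threshold):
    """Assumed conversion: (row, col) cells of a 2-D plane whose value exceeds the threshold."""
    return [(r, c) for r, row in enumerate(plane)
            for c, v in enumerate(row) if v > threshold]

# Toy 3x3 correlation planes: one sharp peak versus energy spread along a diagonal.
target = [[0, 0, 0], [0, 1.0, 0], [0, 0, 0]]
ref_match = [[0, 0, 0], [0, 0.9, 0], [0, 0, 0]]
ref_other = [[0.6, 0, 0], [0, 0.7, 0], [0, 0, 0.6]]

pts = above_threshold(target, 0.5)
d_match = hausdorff(pts, above_threshold(ref_match, 0.5))
d_other = hausdorff(pts, above_threshold(ref_other, 0.5))
```

A smaller distance indicates a reference plane whose energy layout resembles the target's; here the plane with a single central peak is closer than the one with spread energy.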
Road maintenance management presents a complex task for road authorities. The first prerequisite for evaluation, analysis, and correct road rehabilitation is precise and up-to-date information about pavement condition and degradation level. Different road crack types have been proposed in the state of the art to provide useful information for pavement maintenance strategies. For this reason, we present in this paper a novel method to automatically detect and classify road cracks in two-dimensional digital images. Our proposed package is composed of two methods: crack detection and crack classification. The first method detects cracks in images acquired by the VIAPIX® system developed by our company ACTRIS. To do so, we rely on our previously proposed unsupervised approach for road crack detection on two-dimensional pavement images. Then, to categorize each of the detected cracks, the second method of our package is applied. Based on principal component analysis (PCA), our method classifies all the detected cracks into three types: vertical, horizontal, and oblique. The obtained results demonstrate the efficiency of our approaches in terms of good detection and classification on a variety of pavement images.
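One way PCA can separate vertical, horizontal, and oblique cracks is through the direction of the first principal axis of the crack pixels. The sketch below illustrates that idea only; the abstract does not give the actual decision rule, so the 20-degree tolerance and the function names are assumptions.

```python
import math

def crack_orientation(pixels):
    """Angle in degrees, in [0, 180), of the first principal axis of (x, y) crack pixels."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Closed-form direction of the leading eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return angle % 180

def classify_crack(pixels, tolerance=20.0):
    """Hypothetical rule: compare the principal axis angle with 0 and 90 degrees."""
    angle = crack_orientation(pixels)
    if angle <= tolerance or angle >= 180 - tolerance:
        return "horizontal"
    if abs(angle - 90) <= tolerance:
        return "vertical"
    return "oblique"

horizontal = classify_crack([(x, 5) for x in range(10)])
vertical = classify_crack([(5, y) for y in range(10)])
oblique = classify_crack([(i, i) for i in range(10)])
```

For two-dimensional point sets the leading eigenvector has this closed form, so no linear algebra library is needed; a general PCA routine would give the same axis.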
In this paper, we present a novel approach for road sign identification and geolocation based on the Joint Transform Correlator “JTC” and the VIAPIX® module. The proposed method is divided into three parts: identification, gathering, and geolocation. The first part detects and identifies road signs in images acquired by the VIAPIX® module [1] developed by our company ACTRIS [2]. To do so, we rely on our own method, cited in [3], for road sign identification. The second part of our proposed approach consists in gathering the identified road signs using the JTC technique [4]. Since the VIAPIX® module provides images at an interval of one image per meter, we identify each road sign by finding the images in which it has been recognized and computing its pixel coordinates in each of them. Finally, each road sign is geolocated using its pixel coordinates in several images. At this stage, we rely on the axial stereovision method [5]. Using the pixel coordinates and the distance between the different images, we compute the 3D coordinates of each road sign. The GPS coordinates can then be found from the GPS position of the vehicle, based on Vincenty's formulae [6].
Today, communications security, i.e. the discipline of preventing unauthorized interceptors from accessing telecommunications in an intelligible form while still delivering content to the intended recipients, is a major issue in modern society. In this paper, attention is drawn to the importance and relevance of optical correlation techniques for detecting and tracking people. To be efficient, these techniques need pre- or post-processing steps that take the environmental conditions into account. The aim of this work is to improve the performance of the optical correlation method through a new decision process that reduces the false detection rate. To achieve this, we propose a method using a VanderLugt correlator with a phase-only filter for face recognition, with two decision criteria based on the values of the peak-to-correlation energy and the energy distribution in different parts of the correlation plane. In the three-step algorithm, the first stage consists of dividing the correlation plane into nine equal sub-planes. In the second stage, the energy of each sub-plane is computed, while in the last stage the classification criterion is applied and the recognition rate is calculated. Numerous tests were performed using the Pointing Head Pose Image Database. They show the effectiveness of the method in terms of face recognition rate, without a pre-processing phase and with 0% false detection.
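The two decision criteria can be illustrated on a correlation plane that has already been computed. The sketch below uses one common definition of the peak-to-correlation energy (peak energy over total plane energy; other variants exist) and a 3x3 split into nine sub-planes. The acceptance rule, its thresholds, and the assumption that a true match concentrates energy in the central sub-plane are hypothetical and are not taken from the paper.

```python
def peak_to_correlation_energy(plane):
    """PCE: energy of the highest peak divided by the total energy of the plane."""
    energies = [v * v for row in plane for v in row]
    return max(energies) / sum(energies)

def subplane_energies(plane):
    """Split the plane into a 3x3 grid and return the energy of each of the nine blocks."""
    rows, cols = len(plane), len(plane[0])
    h, w = rows // 3, cols // 3  # assumes both dimensions are multiples of 3
    return [
        [
            sum(plane[r][c] ** 2
                for r in range(i * h, (i + 1) * h)
                for c in range(j * w, (j + 1) * w))
            for j in range(3)
        ]
        for i in range(3)
    ]

def accept(plane, pce_min=0.5, centre_share_min=0.8):
    """Hypothetical rule: a sharp peak and most of the energy in the central block."""
    grid = subplane_energies(plane)
    total = sum(sum(row) for row in grid)
    return (peak_to_correlation_energy(plane) >= pce_min
            and grid[1][1] / total >= centre_share_min)

# Toy 6x6 planes: a sharp central peak on a weak background, and a flat plane.
sharp = [[0.1] * 6 for _ in range(6)]
sharp[3][3] = 3.0
flat = [[1.0] * 6 for _ in range(6)]

grid = subplane_energies(sharp)
decisions = (accept(sharp), accept(flat))
```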
In this study, we propose and validate an all-numerical implementation of a VanderLugt correlator which is optimized for face recognition applications. The main goal of this implementation is to take advantage of the benefits of correlation methods (detection, localization, and identification of a target object within a scene) and exploit the reconfigurability of numerical approaches. This technique requires a numerical implementation of the optical Fourier transform. We pay special attention to adapting the correlation filter to this numerical implementation. One main goal of this work is to reduce the size of the filter in order to decrease the memory space required for real-time applications. To fulfil this requirement, we code the reference images with 8 bits and study the effect of this coding on the performance of several composite filters (phase-only filter, binary phase-only filter). The saturation effect degrades the decision performance of the correlator when filters contain up to nine references. Further, an optimization based on a segmented composite filter is proposed. Based on this approach, we present tests with different faces demonstrating that the above-mentioned saturation effect is significantly reduced while minimizing the size of the learning database.
For face recognition applications, it is necessary to have a robust discrimination system. In this paper, a new method for denoising the correlation plane and removing the zero-order term associated with alternate joint transform correlator (JTC) architectures such as the nonlinear JTC (NJTC) and nonlinear non-zero-order JTC (NNJTC) is proposed. The proposed technique, called nonlinear denoised JTC (NDJTC), incorporates the attractive features of NJTC and NNJTC to ensure a good compromise between discrimination and robustness while denoising the correlation plane. To investigate the performance of the proposed technique, various tests are performed using the PHPID face database, and the results demonstrate excellent behavior of the proposed NDJTC. In this technique, the zero-order term and the impact of noise in the correlation plane were removed, leading to increased robustness and discrimination ability of the NJTC. To confirm these results, different comparisons were performed using an NJTC and an NNJTC. Finally, a study using different levels of nonlinearity was conducted to find the best compromise between robustness and discrimination ability of the proposed method.
Correlation-based pattern recognition relies primarily on matching contours between an unknown target image and a known reference image. However, it does not usually include color information in the decision-making process. To make the correlation method sensitive to color changes, we propose a generalized method based on decomposing the target image into its three color components using either the normalized RGB (red, green, blue) color space or the normalized HSV (hue, saturation, value) space. The correlation operation is then carried out for each color component, and the results are merged to make a final decision. These steps can alleviate the majority of the problems associated with illumination changes in the target image by exploiting its color information. To overcome these problems, we propose to convert the color-based contour information into a signature corresponding to the color information of the target image. Test results are presented to validate the effectiveness of the proposed technique.
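The reason a normalized color space helps with illumination changes can be seen in a short sketch: dividing each channel by the sum of the three removes a uniform brightness factor. The pixel values and the convention chosen for black pixels are illustrative assumptions, and the sketch covers only the decomposition step, not the per-channel correlation or the fusion rule.

```python
def normalized_rgb(pixel):
    """Map (R, G, B) to chromaticity (r, g, b) with r + g + b = 1."""
    total = sum(pixel)
    if total == 0:
        # Black pixel: chromaticity is undefined, zeros returned by convention here.
        return (0.0, 0.0, 0.0)
    return tuple(channel / total for channel in pixel)

bright = normalized_rgb((200, 100, 50))
dim = normalized_rgb((100, 50, 25))  # same surface colour at half the brightness
```

Both pixels map to the same chromaticity, so a correlation carried out on the normalized components is, under this simple model, unaffected by a uniform change in brightness.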
Home automation is being implemented in more and more homes of the elderly and disabled in order to maintain their independence and safety. For that purpose, we propose and validate a video surveillance system that detects various posture-based events. One of the novel points of this system is the use of adapted Vander-Lugt correlator (VLC) and joint transform correlator (JTC) techniques to make decisions on the identity of a patient and their three-dimensional (3-D) position in order to overcome the problem of crowded environments. We propose a fuzzy logic technique to reach decisions on the subject's behavior. Our system is focused on the goals of accuracy, convenience, and cost, and in addition does not require any devices attached to the subject. The system permits one to study and model subject responses to behavioral change intervention because several levels of alarm can be incorporated according to the different situations considered. Our algorithm performs a fast 3-D recovery of the subject's head position by locating the eyes within the face image and involves model-based prediction and optical correlation techniques to guide the tracking procedure. The object detection is based on the HSV (hue, saturation, value) color space. The system also involves an adapted fuzzy logic control algorithm to make a decision based on information given to the system. Furthermore, the principles described here are applicable to a very wide range of situations and robust enough to be implementable in ongoing experiments.
In some pattern recognition applications that require multiple images (such as facial identification or sign language), many images must be transmitted or stored. This requires communication systems with a good security level (encryption) and an acceptable transmission rate (compression rate). In the literature, several encryption and compression techniques can be found. In order to use optical correlation, encryption and compression techniques cannot be deployed independently and in a cascade manner. Otherwise, our system will suffer from two major problems. First, we cannot simply use these techniques in a cascade manner without considering the impact of one technique on the other. Second, a standard compression can affect the correlation decision, because the correlation is sensitive to the loss of information. To solve both problems, we developed a new technique to simultaneously compress and encrypt multiple images using an optimized BPOF filter. The main idea of our approach consists in multiplexing the spectra of different images transformed by a Discrete Cosine Transform (DCT). To this end, the spectral plane is divided into several areas, each of which corresponds to the spectrum of one image. Encryption, in turn, is achieved using the multiplexing, specific rotation functions, biometric encryption keys, and random phase keys. A random phase key is widely used in optical encryption approaches. Finally, many simulations have been conducted. The results obtained corroborate the good performance of our approach. We should also mention that the recording of the multiplexed and encrypted spectra is optimized using an adapted quantization technique to improve the overall compression rate.