In the area of Joint ISR (Joint Intelligence, Surveillance and Reconnaissance), robust (semi-)automatic support for identifying and processing text-based information products such as formal reports is important. Representing reports as text unifies contributions from heterogeneous information sources (e.g. those delivered by various intelligence disciplines). Such text-based information products also often encapsulate dense, high-quality information. The capability to integrate pieces of information from different sources by machine and to display them to the user in a coherent and comprehensible manner is therefore essential for maximizing the utility and accessibility of intelligence data and report information. Current AI models and methods from the field of Natural Language Processing (NLP) can make valuable contributions to the processing of text-based information in general, e.g. text summarization and the extraction of named entities or other important pieces of information. They are widely used in social media applications. To be adopted in the military domain, however, they have to be adapted to the specific vocabulary and grammatical structures of the Joint ISR domain. The limited grammatical variance found within these text products is especially challenging, because it further reduces the already scarce sample data suitable for training. This publication examines the variations in training data for NLP methodologies that emerge when dealing with the Joint ISR domain and its reporting procedures. An approach is presented that captures entities within formalized texts using Named Entity Recognition (NER), and it is illustrated how this approach can support the processing of textual information, especially formal reports, in the field of Joint ISR. The value of formal reporting for achieving syntactic and semantic interoperability within Joint ISR networks is also emphasized.