25 May 2023 Design and research of Chinese word segmentation method in architecture field
Peng Li, Fan Hang, Junyan Cao, Ziwei Guan
Proceedings Volume 12636, Third International Conference on Machine Learning and Computer Application (ICMLCA 2022); 126363V (2023) https://doi.org/10.1117/12.2675253
Event: Third International Conference on Machine Learning and Computer Application (ICMLCA 2022), 2022, Shenyang, China
Abstract
At present, one of the problems of Chinese word segmentation is the low efficiency of Out-Of-Vocabulary (OOV) word detection in specialized domains. Because of the characteristics of the domain's own vocabulary, word segmentation of architectural texts is not very effective at identifying OOV words. This paper proposes a new method for recognizing OOV words: an unsupervised method based on an improved algorithm and entropy. The method uses the algorithm to identify strings in the text with relatively strong interdependencies, filters them against a stop-word list and a corpus to obtain a candidate dictionary, calculates the entropy of the candidate entries, and determines the final OOV words by setting an appropriate threshold. The recognized OOV words are then added as a professional dictionary for word segmentation. Experiments show that the proposed algorithm significantly improves OOV recognition in architectural text. Compared with the original algorithm, precision (P) increased by 15.92% and recall (R) increased by 7.61%. As a result, the final word segmentation precision reaches 82.15% and recall reaches 80.45%.
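The pipeline described in the abstract (candidate strings, stop-word and dictionary filtering, entropy scoring, thresholding) can be illustrated with a minimal sketch. The abstract does not name the interdependency algorithm or define the entropy it uses, so everything here is an assumption: raw n-gram frequency stands in for the interdependency measure, entropy is taken to be left/right boundary (branching) entropy, and the function names, parameters, and threshold values are hypothetical rather than the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(neighbors):
    """Shannon entropy (in nats) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def candidate_oov(text, max_len=4, min_count=2, min_entropy=0.5,
                  stop_words=frozenset(), known_words=frozenset()):
    """Return {string: score} for candidate OOV strings in `text`.

    ASSUMPTIONS (not from the paper): frequency >= min_count replaces the
    unnamed interdependency measure, and the score is the smaller of the
    left and right boundary entropies.
    """
    counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each string
    right = defaultdict(Counter)  # characters seen immediately after each string
    n = len(text)
    for i in range(n):
        for k in range(2, max_len + 1):
            if i + k > n:
                break
            s = text[i:i + k]
            counts[s] += 1
            if i > 0:
                left[s][text[i - 1]] += 1
            if i + k < n:
                right[s][text[i + k]] += 1

    result = {}
    for s, c in counts.items():
        # Drop rare strings and strings already in the existing dictionary.
        if c < min_count or s in known_words:
            continue
        # Drop strings containing any stop-word character.
        if any(ch in stop_words for ch in s):
            continue
        # A string whose neighbors vary on both sides is likely a free-standing word.
        score = min(boundary_entropy(left[s]), boundary_entropy(right[s]))
        if score >= min_entropy:
            result[s] = score
    return result
```

In this sketch, strings that pass the threshold would be appended to the segmenter's user dictionary before segmenting the architectural text; the threshold itself would have to be tuned on held-out data.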
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Peng Li, Fan Hang, Junyan Cao, and Ziwei Guan "Design and research of Chinese word segmentation method in architecture field", Proc. SPIE 12636, Third International Conference on Machine Learning and Computer Application (ICMLCA 2022), 126363V (25 May 2023); https://doi.org/10.1117/12.2675253
KEYWORDS
Detection and tracking algorithms
Associative arrays
Tunable filters
Design and modelling
Statistical methods
Information theory
Systems modeling