11 October 2023
Research on a watermarking recognition method for PDF documents based on natural language processing
Rui Liu, Shu Li, Zizun Li, Yizhen Sun, Zheru Cai, Dawei Dai
Proceedings Volume 12918, Fourth International Conference on Computer Science and Communication Technology (ICCSCT 2023); 129180M (2023) https://doi.org/10.1117/12.3009389
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2023), 2023, Wuhan, China
Abstract
Digital watermarking is a technology for copyright protection, integrity protection, copy prevention, and direction tracking of digital products, and it is commonly used to protect confidential documents and files within enterprises. Because the PDF format is widely used for enterprise documents, this paper presents a method for recognizing PDF document watermarks based on natural language processing. A large number of PDF documents are collected, and their text content is segmented with an improved N-gram language model based on forward and reverse matching algorithms. A KenLM language model, built on language model probability and conditional probability calculation rules, is then established to identify PDF document watermarks, which effectively improves the accuracy of watermark recognition. The method is validated by selecting PDF documents of an enterprise, training the language model, and calculating the prediction accuracy.
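As a rough illustration of the two ingredients named in the abstract, the sketch below pairs forward and reverse maximum matching segmentation with a toy bigram model that scores token sequences by conditional probability. It is not the authors' implementation: the function names, the tie-breaking rule, the add-one smoothing, and the small in-memory `BigramModel` (a stand-in for a trained KenLM model) are all assumptions made for this example, and the abstract does not state how scores are turned into a watermark decision.

```python
import math
from collections import Counter


def forward_max_match(text, vocab, max_len=5):
    """Greedy left-to-right segmentation; falls back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=5):
    """Greedy right-to-left segmentation; falls back to single characters."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]


def bidirectional_segment(text, vocab, max_len=5):
    """Assumed heuristic: prefer fewer tokens; on a tie, keep the reverse result."""
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    return fwd if len(fwd) < len(bwd) else bwd


class BigramModel:
    """Toy add-one-smoothed bigram model (hypothetical stand-in for KenLM)."""

    def __init__(self, sentences):
        self.contexts = Counter()
        self.bigrams = Counter()
        words = {"</s>"}
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            self.contexts.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
            words.update(tokens)
        self.vocab_size = len(words)

    def log_prob(self, tokens):
        """Sum of log10 P(word | previous word) over the padded sequence."""
        padded = ["<s>"] + list(tokens) + ["</s>"]
        total = 0.0
        for prev, word in zip(padded, padded[1:]):
            num = self.bigrams[(prev, word)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log10(num / den)
        return total


vocab = {"the", "theta", "table", "bled", "own", "down", "there"}
segmented = bidirectional_segment("thetabledownthere", vocab)
model = BigramModel([["the", "table"], ["the", "table", "down", "there"]])
in_domain = model.log_prob(["the", "table"])
unusual = model.log_prob(["table", "the"])
```

In this toy setup, forward matching yields `theta / bled / own / there` while reverse matching yields `the / table / down / there`, and the sequence seen in training receives a higher log probability than the reordered one. Comparing such scores against a threshold is one plausible way to separate ordinary body text from other strings, but the threshold and decision rule would have to come from the paper itself.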
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Rui Liu, Shu Li, Zizun Li, Yizhen Sun, Zheru Cai, and Dawei Dai "Research on a watermarking recognition method for PDF documents based on natural language processing", Proc. SPIE 12918, Fourth International Conference on Computer Science and Communication Technology (ICCSCT 2023), 129180M (11 October 2023); https://doi.org/10.1117/12.3009389
KEYWORDS
Digital watermarking
Education and training
Detection and tracking algorithms
Reverse modeling
Information security
Statistical modeling
Associative arrays
