Paper
21 December 2023 Focused crawler based on concept context graph
Xiaolei Li, Yajun Du, Xiaoping Huang, Yufeng Hai
Proceedings Volume 12970, Fourth International Conference on Signal Processing and Computer Science (SPCS 2023); 1297034 (2023) https://doi.org/10.1117/12.3012107
Event: Fourth International Conference on Signal Processing and Computer Science (SPCS 2023), 2023, Guilin, China
Abstract
To improve the performance of focused crawlers, a topic crawling strategy based on a Concept Context Graph (CCG) is proposed. The user's knowledge background is constructed from the initial topic text returned by the zhishi.me knowledge graph and is then transformed into a CCG that guides the focused crawler. During crawling, the crawler treats each hyperlink it encounters as a Virtual Concept (VC) and calculates, based on HowNet, the Sememe Similarity (SS) between the VC and the core concept in the CCG. The SS value is compared with a given threshold to decide whether to keep the link, and the CCG is updated with each retained VC. The hyperlinks corresponding to the retained VCs are then stored in the crawl queue in descending order of SS value. Experiments show that this topic crawling strategy can improve the performance of a focused crawler.
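The link-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example URLs, the use of a binary heap for the crawl queue, and the "keep if SS is at or above the threshold" rule are all assumptions made for illustration. In particular, `toy_similarity` is a simple word-overlap placeholder and does not compute HowNet sememe similarity, and the CCG update is omitted because the abstract does not specify it.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def toy_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets. Placeholder only, NOT HowNet sememe similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def enqueue_links(
    frontier: List[Tuple[float, str]],
    links: Iterable[Tuple[str, str]],  # (url, link text treated as a virtual concept)
    core_concept: str,
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Score each link against the core concept and queue those that pass.

    Returns the retained link texts, which a caller could use to update
    its concept graph (that update is outside this sketch).
    """
    kept = []
    for url, text in links:
        score = similarity(text, core_concept)
        if score >= threshold:  # assumed comparison rule
            # heapq is a min-heap, so store the negated score to pop
            # the highest-similarity link first.
            heapq.heappush(frontier, (-score, url))
            kept.append(text)
    return kept


def pop_next(frontier: List[Tuple[float, str]]) -> Tuple[str, float]:
    """Return the queued URL with the highest similarity and its score."""
    neg_score, url = heapq.heappop(frontier)
    return url, -neg_score
```

Calling `enqueue_links` once per fetched page and `pop_next` to choose the next page gives a crawl order from high to low similarity, which is the ordering the abstract describes.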
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Xiaolei Li, Yajun Du, Xiaoping Huang, and Yufeng Hai "Focused crawler based on concept context graph", Proc. SPIE 12970, Fourth International Conference on Signal Processing and Computer Science (SPCS 2023), 1297034 (21 December 2023); https://doi.org/10.1117/12.3012107
KEYWORDS
Semantics
Internet
Mining
Software engineering
Statistical methods
Strategic intelligence
