Volume 2022 (30),
Article ID 4058722,
Methods and Applications in Artificial Intelligence and Machine Learning: 40AI 2022
Abstract
This paper presents architectural concepts for building crawling clusters that support data-driven on-page optimization. The aim of the study is to develop a base architecture capable of continuously monitoring SEO-relevant on-page parameters at a very large scale. Building an efficient data crawling mechanism has been addressed in the literature since the Internet became widespread; however, the problem has not been thoroughly described for tools designed for SEO. This paper proposes a highly scalable environment for analyzing key on-page metrics, whose analysis provides the knowledge needed to optimize them.
Keywords: crawling, spider cluster, content extraction, SEO, crawling cluster architecture