Performance Comparison of graph-tool and NetworkX on Web Graph Domain Subgraphs: A Common Crawl Analysis

Krystian MAGDZIARZ and Stanisław SKRZYPECKI

Military University of Technology, Warsaw, Poland

https://doi.org/10.5171/2025.4642125

Abstract

Efficient graph processing is critical for web-scale analysis, yet practitioners lack empirical guidance for selecting a library on real-world data. We present a comprehensive performance comparison of graph-tool and NetworkX on Common Crawl web graph data, focusing on domain-level subgraph analysis. By systematically benchmarking seven core operations across thousands of domain subgraphs, we challenge the assumption that C++ libraries with Python bindings always outperform pure Python implementations. Our results reveal operation-dependent performance patterns: NetworkX is faster in graph traversal operations (2.5-4.6× for connected components, shortest path, and degree distribution) and in community detection (7.5×), while graph-tool is faster in computationally intensive algorithms (35× for betweenness centrality, 4× for clustering coefficient). Memory usage also differs significantly: NetworkX maintains a consistent baseline of 900-970 MB, whereas graph-tool’s overhead is operation-dependent and reaches 2.5 GB. The domain-based decomposition methodology enables statistical analysis across diverse website structures and shows that the optimal library choice depends on the specific operation, graph size, and available system resources rather than on blanket performance assumptions.

Keywords: Graph processing, performance benchmarking, web graphs, Common Crawl, domain subgraphs, domain decomposition, NetworkX, graph-tool