Katz Centrality, Perron-Frobenius, and PageRank

Katz Centrality, Perron-Frobenius, and PageRank#

Centrality measures the importance of nodes in a graph. This notebook develops two spectral centrality measures — Katz centrality and PageRank — and shows how the Perron-Frobenius theorem underpins both their existence and the convergence of power-iteration algorithms.

Katz Centrality#

Motivation. Eigenvector centrality assigns each node a score proportional to the sum of its neighbors’ scores: \(\mathbf{x} = \lambda^{-1} A \mathbf{x}\). This fails for nodes with degree zero (no walks reach them). Katz centrality fixes this by adding a baseline score to every node.

Definition. Given a graph with adjacency matrix \(A \in \mathbb{R}^{n \times n}\), the Katz centrality vector \(\mathbf{x} \in \mathbb{R}^n\) satisfies: $\(x_i = \alpha \sum_{j=1}^n A_{ij}\, x_j + \beta, \qquad i = 1, \ldots, n\)$

In matrix form: $\(\mathbf{x} = \alpha A \mathbf{x} + \beta \mathbf{1}\)$

Solving for \(\mathbf{x}\): $\(\boxed{\mathbf{x} = \beta\,(I - \alpha A)^{-1}\mathbf{1}}\)$

The series expansion \((I - \alpha A)^{-1} = \sum_{k=0}^\infty (\alpha A)^k\) converges when \(\alpha < 1/\rho(A)\), where \(\rho(A)\) is the spectral radius of \(A\). Entry \((\alpha A)^k_{ij}\) counts the number of walks of length \(k\) from \(j\) to \(i\), weighted by \(\alpha^k\) — so Katz centrality sums up all walks, with exponentially decaying weights.

Perron-Frobenius Theorem#

The Perron-Frobenius theorem guarantees that spectral centrality measures are well-defined and unique.

Theorem (Perron 1907, Frobenius 1912). Let \(A\) be an \(n \times n\) real matrix.

  1. (Perron) If \(A > 0\) (all entries strictly positive), then:

    • There exists a unique eigenvalue \(\rho(A) > 0\) equal to the spectral radius.

    • \(\rho(A)\) is a simple root of the characteristic polynomial.

    • The corresponding eigenvector \(\mathbf{v} > 0\) is entrywise positive and unique up to scaling.

    • All other eigenvalues \(\lambda\) satisfy \(|\lambda| < \rho(A)\).

  2. (Frobenius) The conclusions of (1) hold for any irreducible non-negative matrix \(A \geq 0\).

Corollary (Katz centrality). For any connected graph, the adjacency matrix \(A\) is irreducible and non-negative. Perron-Frobenius guarantees \(\rho(A) > 0\) exists, so \((I - \alpha A)^{-1}\) has positive entries for any \(0 < \alpha < 1/\rho(A)\), and the Katz centrality vector \(\mathbf{x}\) is well-defined and strictly positive.

Corollary (power iteration). The iteration \(\mathbf{x}^{(t+1)} = A\mathbf{x}^{(t)} / \|A\mathbf{x}^{(t)}\|\) converges to the Perron eigenvector at rate \((|\lambda_2|/\rho(A))^t\), where \(\lambda_2\) is the second largest eigenvalue in magnitude.

PageRank#

Setup. Let \(G = (V, E)\) be a directed graph on \(n\) nodes (the web graph). Define:

  • \(A \in \{0,1\}^{n \times n}\): adjacency matrix with \(A_{ij} = 1\) iff edge \(j \to i\) exists.

  • \(d_j^{\mathrm{out}} = \sum_i A_{ij}\): out-degree of node \(j\).

  • \(M = A D^{-1}\): column-stochastic transition matrix, where \(D = \mathrm{diag}(d_1^{\mathrm{out}}, \ldots, d_n^{\mathrm{out}})\). Entry \(M_{ij}\) is the probability of moving from \(j\) to \(i\) in one step of a random walk.

Definition (PageRank). With damping factor \(d \in (0,1)\) (typically \(d = 0.85\)), the PageRank vector \(\boldsymbol{\pi} \in \mathbb{R}^n_+\) with \(\|\boldsymbol{\pi}\|_1 = 1\) is the unique solution to: $\(\boldsymbol{\pi} = d\, M \boldsymbol{\pi} + \frac{1-d}{n}\mathbf{1}\)$

Equivalently, \(\boldsymbol{\pi}\) is the stationary distribution of the Google matrix: $\(G_P = d\, M + \frac{1-d}{n}\mathbf{1}\mathbf{1}^T\)$

Since \((1-d)/n > 0\), the matrix \(G_P\) is strictly positive and column-stochastic. By Perron-Frobenius, \(G_P\) has a unique stationary distribution \(\boldsymbol{\pi}\), guaranteeing PageRank is well-defined regardless of the graph structure.

Closed form. Rearranging: $\(\boldsymbol{\pi} = \frac{1-d}{n}\left(I - d\,M\right)^{-1}\mathbf{1}\)$

Power iteration. The fixed-point iteration $\(\boldsymbol{\pi}^{(t+1)} = d\, M \boldsymbol{\pi}^{(t)} + \frac{1-d}{n}\mathbf{1}\)\( converges to \)\boldsymbol{\pi}\( at rate \)d^t\( (the subdominant eigenvalue of \)G_P\( has magnitude \)\leq d\(). With \)d = 0.85\(, roughly 50 iterations suffice for \)10^{-6}$ accuracy.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

np.random.seed(42)
# ---- Katz Centrality ----

def katz_centrality(G, alpha=None, beta=1.0):
    """
    Compute Katz centrality via the closed form x = beta * (I - alpha*A)^{-1} * 1.
    If alpha is None, use 0.9 / rho(A) (90% of the convergence radius).
    """
    A = nx.to_numpy_array(G)
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if alpha is None:
        alpha = 0.9 / rho
    x = beta * np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))
    return x / x.sum()   # normalize to sum 1 for comparability


# Build a small undirected test graph (Barabasi-Albert, n=20)
G_test = nx.barabasi_albert_graph(20, 2, seed=42)

katz_scores = katz_centrality(G_test)
nx_katz    = nx.katz_centrality_numpy(G_test)
nx_katz_arr = np.array([nx_katz[v] for v in range(20)])
nx_katz_arr /= nx_katz_arr.sum()

print('Max deviation from networkx katz_centrality_numpy:',
      np.max(np.abs(katz_scores - nx_katz_arr)))
# ---- PageRank via power iteration ----

def pagerank_power(G, d=0.85, tol=1e-8, max_iter=200):
    """
    Compute PageRank by iterating pi = d*M*pi + (1-d)/n * 1.
    Handles dangling nodes (zero out-degree) by redistributing
    their mass uniformly.
    """
    n = G.number_of_nodes()
    nodes = list(G.nodes())
    idx = {v: i for i, v in enumerate(nodes)}

    # Column-stochastic transition matrix
    A = nx.to_numpy_array(G, nodelist=nodes)
    out_deg = A.sum(axis=0)                      # out-degree of each column
    dangling = out_deg == 0
    out_deg[dangling] = 1                         # avoid division by zero
    M = A / out_deg[np.newaxis, :]               # column-normalise
    M[:, dangling] = 1.0 / n                     # dangling -> uniform teleport

    pi = np.ones(n) / n
    for _ in range(max_iter):
        pi_new = d * M @ pi + (1 - d) / n
        if np.linalg.norm(pi_new - pi, 1) < tol:
            break
        pi = pi_new

    return {nodes[i]: pi_new[i] for i in range(n)}


# Test on a directed graph
DG = nx.scale_free_graph(30, seed=42)          # directed scale-free graph
DG = nx.DiGraph(DG)                            # remove multi-edges

pr_custom = pagerank_power(DG, d=0.85)
pr_nx     = nx.pagerank(DG, alpha=0.85)

err = max(abs(pr_custom[v] - pr_nx[v]) for v in DG.nodes())
print(f'Max deviation from nx.pagerank: {err:.2e}')
# Convergence speed: track L1 error vs iteration count
def pagerank_convergence(G, d=0.85, max_iter=80):
    n = G.number_of_nodes()
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes)
    out_deg = A.sum(axis=0)
    dangling = out_deg == 0
    out_deg[dangling] = 1
    M = A / out_deg[np.newaxis, :]
    M[:, dangling] = 1.0 / n

    pi = np.ones(n) / n
    pr_true = np.array([pr_nx[v] for v in nodes])
    errors = []
    for _ in range(max_iter):
        pi_new = d * M @ pi + (1 - d) / n
        errors.append(np.linalg.norm(pi_new - pr_true, 1))
        pi = pi_new
    return errors

errors = pagerank_convergence(DG)
iters = np.arange(1, len(errors) + 1)

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Left: convergence plot with theoretical d^t bound
axes[0].semilogy(iters, errors, color='steelblue', label='L1 error')
axes[0].semilogy(iters, errors[0] * 0.85**iters, 'r--', label='d^t bound (d=0.85)')
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('L1 error')
axes[0].set_title('PageRank power iteration convergence')
axes[0].legend()

# Right: graph with node size ~ PageRank score
pr_arr = np.array([pr_nx[v] for v in DG.nodes()])
pos = nx.spring_layout(DG, seed=42)
nx.draw_networkx(
    DG, pos=pos, ax=axes[1],
    node_size=3000 * pr_arr,
    node_color=pr_arr, cmap='viridis',
    edge_color='lightgray', with_labels=False,
    arrows=True, arrowsize=8
)
axes[1].set_title('Directed graph: node size and color ~ PageRank')
axes[1].axis('off')

plt.tight_layout()
plt.show()