Katz Centrality, Perron-Frobenius, and PageRank

Katz Centrality, Perron-Frobenius, and PageRank#

Centrality measures the importance of nodes in a graph. This notebook develops two spectral centrality measures — Katz centrality and PageRank — and shows how the Perron-Frobenius theorem underpins both their existence and the convergence of power-iteration algorithms.

Katz Centrality#

Motivation. Eigenvector centrality assigns each node a score proportional to the sum of its neighbors’ scores: $\mathbf{x} = \lambda^{-1} A \mathbf{x}$. This fails for nodes with degree zero (no walks reach them). Katz centrality fixes this by adding a baseline score to every node.

Definition. Given a graph with adjacency matrix $A \in \mathbb{R}^{n \times n}$, the Katz centrality vector $\mathbf{x} \in \mathbb{R}^n$ satisfies: $$x_i = \alpha \sum_{j=1}^n A_{ij}\, x_j + \beta, \qquad i = 1, \ldots, n$$

In matrix form: $$\mathbf{x} = \alpha A \mathbf{x} + \beta \mathbf{1}$$

Solving for $\mathbf{x}$: $$\boxed{\mathbf{x} = \beta\,(I - \alpha A)^{-1}\mathbf{1}}$$

The series expansion $(I - \alpha A)^{-1} = \sum_{k=0}^\infty (\alpha A)^k$ converges when $\alpha < 1/\rho(A)$, where $\rho(A)$ is the spectral radius of $A$. Entry $(\alpha A)^k_{ij}$ counts the number of walks of length $k$ from $j$ to $i$, weighted by $\alpha^k$ — so Katz centrality sums up all walks, with exponentially decaying weights.

Perron-Frobenius Theorem#

The Perron-Frobenius theorem guarantees that spectral centrality measures are well-defined and unique.

Theorem (Perron 1907, Frobenius 1912). Let $A$ be an $n \times n$ real matrix.

(Perron) If $A > 0$ (all entries strictly positive), then:
- There exists a unique eigenvalue $\rho(A) > 0$ equal to the spectral radius.
- $\rho(A)$ is a simple root of the characteristic polynomial.
- The corresponding eigenvector $\mathbf{v} > 0$ is entrywise positive and unique up to scaling.
- All other eigenvalues $\lambda$ satisfy $|\lambda| < \rho(A)$.
(Frobenius) The conclusions of (1) hold for any irreducible non-negative matrix $A \geq 0$.

Corollary (Katz centrality). For any connected graph, the adjacency matrix $A$ is irreducible and non-negative. Perron-Frobenius guarantees $\rho(A) > 0$ exists, so $(I - \alpha A)^{-1}$ has positive entries for any $0 < \alpha < 1/\rho(A)$, and the Katz centrality vector $\mathbf{x}$ is well-defined and strictly positive.

Corollary (power iteration). The iteration $\mathbf{x}^{(t+1)} = A\mathbf{x}^{(t)} / \|A\mathbf{x}^{(t)}\|$ converges to the Perron eigenvector at rate $(|\lambda_2|/\rho(A))^t$, where $\lambda_2$ is the second largest eigenvalue in magnitude.

PageRank#

Setup. Let $G = (V, E)$ be a directed graph on $n$ nodes (the web graph). Define:

$A \in \{0,1\}^{n \times n}$: adjacency matrix with $A_{ij} = 1$ iff edge $j \to i$ exists.
$d_j^{\mathrm{out}} = \sum_i A_{ij}$: out-degree of node $j$.
$M = A D^{-1}$: column-stochastic transition matrix, where $D = \mathrm{diag}(d_1^{\mathrm{out}}, \ldots, d_n^{\mathrm{out}})$. Entry $M_{ij}$ is the probability of moving from $j$ to $i$ in one step of a random walk.

Definition (PageRank). With damping factor $d \in (0,1)$ (typically $d = 0.85$), the PageRank vector $\boldsymbol{\pi} \in \mathbb{R}^n_+$ with $\|\boldsymbol{\pi}\|_1 = 1$ is the unique solution to: $$\boldsymbol{\pi} = d\, M \boldsymbol{\pi} + \frac{1-d}{n}\mathbf{1}$$

Equivalently, $\boldsymbol{\pi}$ is the stationary distribution of the Google matrix: $$G_P = d\, M + \frac{1-d}{n}\mathbf{1}\mathbf{1}^T$$

Since $(1-d)/n > 0$, the matrix $G_P$ is strictly positive and column-stochastic. By Perron-Frobenius, $G_P$ has a unique stationary distribution $\boldsymbol{\pi}$, guaranteeing PageRank is well-defined regardless of the graph structure.

Closed form. Rearranging: $$\boldsymbol{\pi} = \frac{1-d}{n}\left(I - d\,M\right)^{-1}\mathbf{1}$$

Power iteration. The fixed-point iteration $$\boldsymbol{\pi}^{(t+1)} = d\, M \boldsymbol{\pi}^{(t)} + \frac{1-d}{n}\mathbf{1}$$ converges to $\boldsymbol{\pi}$ at rate $d^t$ (the subdominant eigenvalue of $G_P$ has magnitude $\leq d$). With $d = 0.85$, roughly 50 iterations suffice for $10^{-6}$ accuracy.

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

np.random.seed(42)

# ---- Katz Centrality ----

def katz_centrality(G, alpha=None, beta=1.0):
    """
    Compute Katz centrality via the closed form x = beta * (I - alpha*A)^{-1} * 1.
    If alpha is None, use 0.9 / rho(A) (90% of the convergence radius).
    """
    A = nx.to_numpy_array(G)
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if alpha is None:
        alpha = 0.9 / rho
    x = beta * np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))
    return x / x.sum()   # normalize to sum 1 for comparability


# Build a small undirected test graph (Barabasi-Albert, n=20)
G_test = nx.barabasi_albert_graph(20, 2, seed=42)

katz_scores = katz_centrality(G_test)
nx_katz    = nx.katz_centrality_numpy(G_test)
nx_katz_arr = np.array([nx_katz[v] for v in range(20)])
nx_katz_arr /= nx_katz_arr.sum()

print('Max deviation from networkx katz_centrality_numpy:',
      np.max(np.abs(katz_scores - nx_katz_arr)))

# ---- PageRank via power iteration ----

def pagerank_power(G, d=0.85, tol=1e-8, max_iter=200):
    """
    Compute PageRank by iterating pi = d*M*pi + (1-d)/n * 1.
    Handles dangling nodes (zero out-degree) by redistributing
    their mass uniformly.
    """
    n = G.number_of_nodes()
    nodes = list(G.nodes())
    idx = {v: i for i, v in enumerate(nodes)}

    # Column-stochastic transition matrix
    A = nx.to_numpy_array(G, nodelist=nodes)
    out_deg = A.sum(axis=0)                      # out-degree of each column
    dangling = out_deg == 0
    out_deg[dangling] = 1                         # avoid division by zero
    M = A / out_deg[np.newaxis, :]               # column-normalise
    M[:, dangling] = 1.0 / n                     # dangling -> uniform teleport

    pi = np.ones(n) / n
    for _ in range(max_iter):
        pi_new = d * M @ pi + (1 - d) / n
        if np.linalg.norm(pi_new - pi, 1) < tol:
            break
        pi = pi_new

    return {nodes[i]: pi_new[i] for i in range(n)}


# Test on a directed graph
DG = nx.scale_free_graph(30, seed=42)          # directed scale-free graph
DG = nx.DiGraph(DG)                            # remove multi-edges

pr_custom = pagerank_power(DG, d=0.85)
pr_nx     = nx.pagerank(DG, alpha=0.85)

err = max(abs(pr_custom[v] - pr_nx[v]) for v in DG.nodes())
print(f'Max deviation from nx.pagerank: {err:.2e}')

# Convergence speed: track L1 error vs iteration count
def pagerank_convergence(G, d=0.85, max_iter=80):
    n = G.number_of_nodes()
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes)
    out_deg = A.sum(axis=0)
    dangling = out_deg == 0
    out_deg[dangling] = 1
    M = A / out_deg[np.newaxis, :]
    M[:, dangling] = 1.0 / n

    pi = np.ones(n) / n
    pr_true = np.array([pr_nx[v] for v in nodes])
    errors = []
    for _ in range(max_iter):
        pi_new = d * M @ pi + (1 - d) / n
        errors.append(np.linalg.norm(pi_new - pr_true, 1))
        pi = pi_new
    return errors

errors = pagerank_convergence(DG)
iters = np.arange(1, len(errors) + 1)

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Left: convergence plot with theoretical d^t bound
axes[0].semilogy(iters, errors, color='steelblue', label='L1 error')
axes[0].semilogy(iters, errors[0] * 0.85**iters, 'r--', label='d^t bound (d=0.85)')
axes[0].set_xlabel('Iteration')
axes[0].set_ylabel('L1 error')
axes[0].set_title('PageRank power iteration convergence')
axes[0].legend()

# Right: graph with node size ~ PageRank score
pr_arr = np.array([pr_nx[v] for v in DG.nodes()])
pos = nx.spring_layout(DG, seed=42)
nx.draw_networkx(
    DG, pos=pos, ax=axes[1],
    node_size=3000 * pr_arr,
    node_color=pr_arr, cmap='viridis',
    edge_color='lightgray', with_labels=False,
    arrows=True, arrowsize=8
)
axes[1].set_title('Directed graph: node size and color ~ PageRank')
axes[1].axis('off')

plt.tight_layout()
plt.show()

Katz Centrality, Perron-Frobenius, and PageRank

Contents

Katz Centrality, Perron-Frobenius, and PageRank#

Katz Centrality#

Perron-Frobenius Theorem#

PageRank#