Co-Activation Spectra as a Map of General vs. Task-Specific Structure in Pretrained Vision Transformers

SVD analysis of attention head co-activation matrices reveals layer-dependent rank structure that characterizes general vs. task-specific representations.

Vision Transformers · Attention · SVD · Continual Learning

Pretrained Vision Transformers contain structured patterns of attention head co-activation that vary systematically by depth. This brief proposes SVD analysis of cross-covariance matrices between input embeddings and attention outputs as a diagnostic for characterizing where general structure lives versus where task-specific specialization emerges.
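
A minimal sketch of the diagnostic, assuming the co-activation matrix is the centered cross-covariance between per-token input embeddings and attention outputs collected at one layer (the brief does not fix these details, and the name coactivation_spectrum is illustrative):

```python
import numpy as np

def coactivation_spectrum(X, Y):
    """Singular values of the cross-covariance between input embeddings X
    and attention outputs Y, both shaped [n_tokens, d]."""
    Xc = X - X.mean(axis=0, keepdims=True)
    Yc = Y - Y.mean(axis=0, keepdims=True)
    C = Xc.T @ Yc / (X.shape[0] - 1)           # cross-covariance, [d, d]
    return np.linalg.svd(C, compute_uv=False)  # descending singular values

# Stand-in data; in practice X and Y would be collected with forward hooks
# on one attention block of a pretrained ViT.
rng = np.random.default_rng(0)
X = rng.standard_normal((4096, 64))
Y = 0.5 * (X @ rng.standard_normal((64, 64))) + rng.standard_normal((4096, 64))
sigma = coactivation_spectrum(X, Y)
```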

Key finding: layer-dependent rank structure. Early layers (0-4) exhibit an effective rank near 1.0, a single dominant co-activation direction, consistent with low-level feature extraction that generalizes across tasks. Late layers (9-11) show rich multi-dimensional structure, with effective rank up to 3.0 and 10-12 components needed to capture 90% of variance, consistent with task-specific semantic representations.
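
The brief does not state which effective-rank definition it uses; a common choice is the entropy-based effective rank of Roy and Vetterli (2007), sketched below together with the 90%-variance component count (computed here from squared singular values, which is an assumption):

```python
import numpy as np

def effective_rank(sigma, eps=1e-12):
    """Entropy-based effective rank: the exponential of the Shannon
    entropy of the normalized singular-value distribution."""
    p = sigma / (sigma.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

def components_for_variance(sigma, threshold=0.90):
    """Smallest k whose top-k singular values capture `threshold` of the
    total spectral energy (sum of squared singular values)."""
    energy = np.cumsum(sigma**2) / np.sum(sigma**2)
    return int(np.searchsorted(energy, threshold) + 1)
```

A spectrum dominated by one direction gives effective_rank close to 1.0 and components_for_variance close to 1; a flatter late-layer spectrum pushes both numbers up.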

Cross-task consistency confirms that this is architectural structure rather than a task-specific artifact: the spectral profiles remain stable across different task distributions on the same pretrained backbone.
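
One way to quantify that stability (an illustrative metric choice, not necessarily the paper's) is the cosine similarity between normalized per-layer spectra computed on two different task distributions:

```python
import numpy as np

def spectral_similarity(sigma_a, sigma_b):
    """Cosine similarity between two co-activation spectra after L2
    normalization; values near 1.0 indicate the same rank profile."""
    k = min(len(sigma_a), len(sigma_b))
    a = sigma_a[:k] / np.linalg.norm(sigma_a[:k])
    b = sigma_b[:k] / np.linalg.norm(sigma_b[:k])
    return float(a @ b)
```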

The implications extend to continual learning: seeding mechanisms for structural commitment (as in Latent Topology Networks) should be calibrated per layer, with scalar seeds for early layers and multi-dimensional seeds for late layers, rather than set by a single global hyperparameter. A sketch of such a calibration rule follows.
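
The brief does not describe how Latent Topology Network seeds are parameterized, so the following is only a hypothetical calibration rule: derive each layer's seed dimensionality from its measured spectrum instead of a global constant.

```python
import numpy as np

def seed_dims_per_layer(layer_spectra, threshold=0.90):
    """Hypothetical calibration: give each layer a seed whose
    dimensionality equals the number of co-activation components
    needed to reach `threshold` variance at that layer."""
    dims = []
    for sigma in layer_spectra:
        energy = np.cumsum(sigma**2) / np.sum(sigma**2)
        dims.append(int(np.searchsorted(energy, threshold) + 1))
    return dims  # e.g. ~1 for layers 0-4, ~10-12 for layers 9-11
```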
