A Comparative Analysis of Representation Flow in State-Space and Transformer Architectures
State-space models (SSMs) have emerged as promising alternatives to Transformers, particularly for long-context tasks, owing to their efficiency in modeling long-range dependencies through structured state transitions. Whereas prior work has focused on interpreting final-layer outputs, this study investigates how feature representations flow across layers in SSMs. By comparing these dynamics to those of Transformers, we identify fundamental differences in how the two architectures encode and propagate contextual information. Our analysis exposes trade-offs between efficiency and expressivity, offering a deeper understanding of the learning dynamics of both architectures. This work not only advances our understanding of SSMs but also lays a foundation for designing hybrid models that combine the strengths of both paradigms.