The Neural Matthew Effect: Low Effective Degrees of Freedom in Training

I use the term Neural Matthew Effect to describe a recurring pattern in deep network training: a small number of directions, modules, or connections carry most of the learning signal. In parameter space, this appears as low-rank updates; in gradient space, as concentration along dominant directions; at the functional level, as the strengthening of existing circuits; and at the structural level, as increasingly uneven interactions among neurons. Throughout this article, low rank does not mean strictly low algebraic rank. It means low effective rank. For neural network weights, gradients, and the Hessian, strict rank is often unstable: an arbitrarily small perturbation can make a matrix full rank, but that does not mean all directions are equally important. The more useful question is not “how many singular values are nonzero?”, but “how many directions contain most of the spectral mass?” ...
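To make the distinction between strict and effective rank concrete, here is a minimal NumPy sketch. It is an illustration only, not the article's own definition: the function name `effective_rank`, the 90% threshold, and the choice of squared singular values as "spectral mass" are all assumptions made for this example. It builds a matrix with two dominant directions, adds a tiny perturbation, and compares the algebraic rank with the number of directions needed to cover most of the spectral mass.

```python
import numpy as np


def effective_rank(matrix, mass=0.9):
    """Smallest k such that the top-k singular values hold at least
    `mass` of the total squared spectral mass (an assumed definition)."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    energy = singular_values ** 2
    total = energy.sum()
    if total == 0:
        return 0
    cumulative = np.cumsum(energy) / total
    # First index where the cumulative share reaches `mass`, as a count.
    k = int(np.searchsorted(cumulative, mass)) + 1
    return min(k, len(singular_values))


rng = np.random.default_rng(0)
n = 50

# Random orthonormal bases for the left and right singular vectors.
left, _ = np.linalg.qr(rng.standard_normal((n, n)))
right, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Only two directions carry signal.
spectrum = np.zeros(n)
spectrum[:2] = [10.0, 8.0]
low_rank = left @ np.diag(spectrum) @ right.T

# A tiny perturbation changes the algebraic rank but not the dominant directions.
noisy = low_rank + 1e-6 * rng.standard_normal((n, n))

strict = int(np.linalg.matrix_rank(noisy))
effective = effective_rank(noisy)
print(f"strict rank: {strict}, effective rank: {effective}")
```

The perturbation raises the algebraic rank well above two, while the effective rank stays at two because the two dominant singular values still hold almost all of the squared spectral mass. Other measures, such as an entropy-based effective rank, would serve the same purpose.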

May 10, 2026 · 12 min

Why Is LLM Output Detectable?

Prerequisites

- Basic knowledge of the structure of Transformers and RNNs
- An understanding of how AI models are trained in NLP

Notation:

| Symbol | Meaning |
| --- | --- |
| $x_t$ | token at time $t$ |
| $x_{:t}$ | tokens before time $t$ |
| $x_{a:b}$ | tokens from time $a$ (inclusive) up to time $b$ (exclusive) |
| $p$ | ground-truth distribution |
| $q$ | model’s predicted distribution |
| $v$ | vocabulary size (number of distinct tokens) |
| $d$ | embedding dimension (dimension of hidden states) |

Main

This blog post explores some potential factors that distinguish text generated by LLMs from human-written text. ...

December 10, 2025 · 9 min

Working Memory in Neural Networks

Classified by duration, there are at least three kinds of memory in humans: Working Memory (WM), Short-Term Memory (STM), and Long-Term Memory (LTM). WM can be seen as the internal state of the system, which varies throughout the entire process. STM can be considered the memory for a milestone or key objects in a multi-stage task. Finally, LTM can be conceived of as the neural network itself: persistent knowledge is embedded in the architecture. ...

December 7, 2024 · 9 min