A Succinct Proof of the Decoupled Parallel Backpropagation Convergence Lemma

The notation below follows the original paper. $$ \begin{align*} \mathbb{E} [f (w^{t + 1})] - f (w^t) & \leqslant \nabla f (w^t)^{\top} \mathbb{E} [(w^{t + 1} - w^t)] + \frac{L}{2} \mathbb{E} [\| w^{t + 1} - w^t \|^2]\\ & = - \gamma_t \nabla f (w^t)^{\top} \mathbb{E} \left[ \sum^K_{k = 1} \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \right] + \frac{L \gamma_t^2}{2} \mathbb{E} \left[ \left\| \sum^K_{k = 1} \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \right\|^2 \right]\\ & = - \gamma_t \| \nabla f (w^t) \|^2 - \gamma_t \nabla f (w^t)^{\top} \left( \sum_{k = 1}^K \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f (w^t) \right) + \frac{K L \gamma_t^2}{2} \sum_{k = 1}^K \mathbb{E} [\| \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \|^2]\\ & \leqslant - \gamma_t \| \nabla f (w^t) \|^2 + \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{\gamma_t}{2} \left\| \sum_{k = 1}^K \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f (w^t) \right\|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K \gamma_t}{2} \sum_{k = 1}^K \| \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f_{\mathcal{G} (k)} (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K \gamma_t}{2} \sum_{k = 1}^K \| \nabla f (w^{t - K + k}) - \nabla f (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2} + \frac{K L^2 \gamma_t}{2} \sum_{k = 1}^K \| w^{t - K + k} - w^t \|^2\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2} + \frac{K^4 L^2 M^2 \sigma \gamma_t^2}{2}\\ & = - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \gamma_t^2 \frac{K^2 L M}{2} (1 + K^2 L M \sigma) \end{align*} $$
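For reference while checking the chain above, the steps rely on the paper's assumptions, restated here as a sketch (the precise definition of $\sigma$, which ties the step sizes inside the delay window to $\gamma_t$, follows the original paper): $L$-Lipschitz gradients and bounded stochastic gradients, $$\| \nabla f (u) - \nabla f (v) \| \leqslant L \| u - v \|, \qquad \mathbb{E} [\| \nabla f_{\mathcal{G} (k), x_i} (w) \|^2] \leqslant M,$$ together with the elementary inequalities $a^{\top} b \leqslant \frac{1}{2} \| a \|^2 + \frac{1}{2} \| b \|^2$ and $\left\| \sum_{k = 1}^K a_k \right\|^2 \leqslant K \sum_{k = 1}^K \| a_k \|^2$.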

April 21, 2024 · 2 min

Intuition of Universal Approximation Theorem

The universal approximation theorem states that a single-hidden-layer neural network with a squashing activation function can, at sufficient (finite but arbitrarily large) width, uniformly approximate an arbitrary continuous function. The theorem also has stronger forms covering approximation of Borel measurable functions, but continuous functions are enough in our case. Intuitively, we may expect the space of all continuous functions to approximate the space of Borel measurable functions in an almost-everywhere, probabilistic sense, but we will not delve into the details here. For more information, see the original paper (Multilayer Feedforward Networks are Universal Approximators). ...
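To make the theorem concrete, here is a minimal illustrative sketch (my own, not from the post): a single-hidden-layer network with random sigmoid units, where only the output weights are fit by least squares. The target function, width, and weight scales are all illustrative assumptions; widening the hidden layer should drive the error down, as the theorem predicts.

```python
# A random-feature instance of a single-hidden-layer sigmoid network.
# Only the output layer is trained (by least squares); increasing
# `width` improves the fit on the grid, illustrating the theorem.
import numpy as np

rng = np.random.default_rng(0)
width = 200                                   # hidden units; try 20 vs 200
x = np.linspace(0.0, 1.0, 500)[:, None]       # inputs on [0, 1], shape (500, 1)
target = np.sin(4 * np.pi * x).ravel()        # a continuous target function

W = rng.normal(scale=20.0, size=(1, width))   # random hidden weights
b = rng.uniform(-10.0, 10.0, size=width)      # random hidden biases
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))        # sigmoid (squashing) features

coef, *_ = np.linalg.lstsq(H, target, rcond=None)  # fit output layer only
print("max |error| on grid:", np.abs(H @ coef - target).max())
```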

April 4, 2024 · 8 min

Optimal Code Existence for Countably Infinite Sources

Huffman coding demonstrates both the existence of an optimal code and a concrete construction for sources with a finite alphabet. However, the construction breaks down for infinite alphabets. We will prove the existence of an optimal code for sources with a countably infinite alphabet. Notations We use only 0 and 1 to construct codewords, without loss of generality, and the base of $\log$ is 2 by default. $X$ is the random variable from the source, whose probability distribution is $p_1, p_2, p_3, \cdots$, and $l_1, l_2, l_3, \cdots$ denote the lengths of the corresponding codewords. Without loss of generality, we assume $p_1 \geq p_2 \geq p_3 \geq \cdots$. By the rearrangement inequality, we may set $l_1 \leq l_2 \leq l_3 \leq \cdots$. The problem we are considering here can be modeled as an integer optimization problem: ...
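The truncated formulation is presumably the standard Kraft-constrained program (a sketch under that assumption): minimize the expected codeword length subject to the Kraft inequality, $$\min_{l_1, l_2, l_3, \cdots} \sum_{i=1}^{\infty} p_i l_i \quad \text{subject to} \quad \sum_{i=1}^{\infty} 2^{-l_i} \leq 1, \quad l_i \in \mathbb{Z}^{+},$$ since the Kraft inequality characterizes exactly the length profiles achievable by binary prefix codes.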

March 30, 2024 · 3 min

A Problem on `va_list` in C Language

What’s the output of the following code, and why?

```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("%#018llx\n", (char)0x80);
    printf("%#018llx\n", (unsigned char)0x80);
    return 0;
}
```

(You might encounter warnings informing you of the inconsistency between the specified format and the given arguments. Let's neglect them.) The answer is

```
0x00000000ffffff80
0x0000000000000080
```

Questions We have two questions: Is it overloading that contributes to the different behaviors when different types of arguments are passed? Why is the first output 0x00000000ffffff80 instead of 0xffffffffffffff80? Answers The analysis is based on the printf implementation in x86 linux boot ...

November 11, 2023 · 5 min

My Paper-Reading Workflow in 2023

Main The conventional approach to storing a file involves fitting it into a hierarchical structure, which necessitates a comprehensive overview of the corresponding field before the very first paper is read. You may instead place papers flat in an inbox folder, postponing the tedious task of reindexing and categorizing hundreds of them hierarchically; otherwise, the overwhelming folder becomes your first obstacle to retrieving information. Either way, both methods impose an additional mental burden. Furthermore, the fact that papers can be organized in multiple ways compounds the disaster in a hierarchical system, where each paper sits in exactly one place. Therefore, an ideal note-taking structure for papers should be bottom-up: papers are connected like a web, and a single paper can be attached to several distinct categories. ...

September 1, 2023 · 3 min

Strictness of Markov Properties

A stochastic process $\{X_i\}_{i=0}^\infty$ is $n$-Markov if $$P(X_{t+n}|X_{t+n-1}, X_{t+n-2}, \cdots, X_{t}) = P(X_{t+n}|X_{t+n-1})$$ for any $t \ge 0$. We will prove that an $n$-Markov stochastic process must be $m$-Markov but is not necessarily $l$-Markov, where $l > n > m$. N+1 to N First, we prove that an $(n+1)$-Markov stochastic process must be $n$-Markov. Proof: Suppose $\{X_i\}_{i=0}^\infty$ is an $(n+1)$-Markov stochastic process. We have $$P(X_{t+n+1}|X_{t+n}, X_{t+n-1}, \cdots, X_t) = P(X_{t+n+1} | X_{t+n})$$ for any $t \ge 0$, deriving ...
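The truncated derivation presumably proceeds by marginalizing out one more past state; a sketch of that step for $t \ge 1$ (my reconstruction, with the boundary case handled separately): $$P(X_{t+n}|X_{t+n-1}, \cdots, X_t) = \sum_{x} P(X_{t+n}|X_{t+n-1}, \cdots, X_t, X_{t-1}=x)\, P(X_{t-1}=x|X_{t+n-1}, \cdots, X_t) = P(X_{t+n}|X_{t+n-1}),$$ where the last equality applies the $(n+1)$-Markov property starting at time $t-1$.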

August 28, 2023 · 2 min

Some QR Codes Generated by ControlNet

I have tried HARD to generate a photo of a brain or a neuron that can be scanned. But unfortunately, either it cannot be recognized as a QR code or it is irrelevant to my prompts.😭 It is much easier to generate images of girls or natural scenery, though. Perhaps it would be better to choose another Stable Diffusion checkpoint. (By the way, thanks to Stable Diffusion WebUI, it is quite easy to deploy famous diffusion models.) ...

August 4, 2023 · 1 min

Build Singularity/Docker Image on a Singularity Server without `sudo` Privileges

In a Docker container, you have full privileges and can build a Singularity or Docker image inside it. But if only Singularity is installed on the server, the root user has set up neither `--fakeroot` nor `proot`, and you have exhausted your remote build minutes, what trick can you play to work around those restrictions? Software Selection To solve the problem, we need a virtual machine under our control on the server, granting enough privileges to execute `singularity build` (or `docker build`), which requires `sudo` under the tough conditions described above. Then an OS runs inside the virtual machine. Finally, we can build the image. ...

July 31, 2023 · 4 min

Brainstorm

Here are some immature ideas that I come up with while taking a walk or having a meal. Feel free to draw inspiration from this post and write some papers; I would be glad to see studies on these topics. This post will be updated from time to time. The research field underestimates the importance of optimization. You cannot expect one to have perfect memory; what we do every day is just adjust our cognition. That's what Mass Editing Memory in a Transformer does! This method remodels knowledge, namely the relations between objects, but it does not involve behavioral remodeling, for example the procedure for solving a tedious math problem. Information about the relations between texts and images lives in cross-attention layers (e.g. Prompt-to-Prompt), while knowledge-related relations are stored in MLP layers (e.g. https://rome.baulab.info/). Are procedural relations stored in cross-attention layers? ...

March 16, 2023 · 5 min

Existence of Optimal Policy in Markov Decision Process

In this blog, we will prove the following theorem: Optimal Policy Existence Theorem: For any Markov decision process, (1) there exists an optimal policy $\pi_\star$ that is better than or equal to all other policies, $\pi_\star \geq \pi, \forall \pi$; (2) all optimal policies achieve the optimal value function, $V_{\pi_\star}(s) = V_\star(s)$; (3) all optimal policies achieve the optimal action-value function, $Q_{\pi_\star}(s, a) = Q_\star(s, a)$. Definition To simplify the exposition, we first define some basic concepts. ...
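The ordering $\pi_\star \geq \pi$ in the statement is presumably the standard pointwise order on value functions; as a sketch of the definitions assumed here: $$\pi \geq \pi' \iff V_{\pi}(s) \geq V_{\pi'}(s) \quad \forall s, \qquad V_\star(s) = \max_{\pi} V_{\pi}(s), \qquad Q_\star(s, a) = \max_{\pi} Q_{\pi}(s, a).$$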

September 12, 2022 · 8 min