Don’t click in; this post is a placeholder and an update is pending.
Status: being updated.
The original title was ‘Neuronal Dynamics, the Basics’, but now that I think of it, that’s too big a topic to cover in one article, so I should probably start a series on neuronal dynamics issues pertaining to SNNs.
This series will only cover issues about SNNs that I deem relevant, so it’s by no means comprehensive (something you really shouldn’t expect from an amateur blogger), nor even guaranteed correct/applicable.
The main ideas are extracted from a great textbook on neuronal dynamics[1], with some supplementary information from several papers. Personal hunches and wild guesses will be marked as such.
A more formal description of IF neuron models is included in my previous post about SNNs; to avoid redundancy, let’s get down to business real quick.
IF models are very simplified compared with the complexity of actual neurons; basically they treat the neuron as a Shishiodoshi (the tipping bamboo fountain that empties once it fills):
We can already see some problems with this model:
Let’s examine some neuroscientific facts to appreciate the oversimplification of IF models.
[1] 2014 – Book – Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition – Wulfram Gerstner, et al. – A really good textbook for neuronal dynamics; highly recommended if you’re interested in the neuroscientific principles behind the thinking brain but don’t really care about the anatomy jargon, subsidiary systems, and peripheral nervous system material that takes up most of the space in typical neuroscience textbooks. But if you still insist on getting a better neuroscience background, please check the reference below, because I don’t want to be the only one who has done this and regretted it later.
[2] 2016 – Book – Neuroscience: Exploring the Brain – Barry W. Connors, et al. – A good neuroscience textbook.
TL;DR: A more neuroscientifically realistic model of artificial neural networks.
The current generation of widely applied ANN models is much more artificial than intelligent. I don’t want to elaborate too much on this, but most applied ANNs can be seen as universal function approximators, and since I don’t believe in a functionalist theory of mind, ANNs are not much different from traditional statistical learning methods to me. They are not dynamic, by and large, which limits their applicability even to temporal sequential data.
One great (and almost the only) example of a neuroscience-inspired ANN model is the fabled convolutional neural network, purported to be a derivative of how columns in the primary visual cortex work. But it stretches the point to say we are anywhere near making useful neuromorphic models. Sensory systems are mere utilities and largely feedforward, hence the ease of studying and modeling them. They are on the very low tier of peripheral systems supporting the brain. Things get funky really quickly if we try to climb up the ladder to even the lower levels of sub-symbolic processing. And beyond that we have to bridge the gap between the sub-symbolic systems and the huge unexplored ??? zones of neural processing, for which we don’t even have adequate philosophical tools to grasp the issue.
There will always be another unsatisfied lunatic. I’m not quite sure if I’m on board yet, but let’s take a look; it’s not going to hurt.
One cool thing about SNNs is that there’s no well-established model/framework/training method at all, because the field is still in its infancy, so there’s a plethora of wildly different models and methods trying to tackle the challenge, and I doubt there will ever be one grand unified methodology like gradient backpropagation is for currently applied ANNs. The sky is the limit!
To be real is to be realistic. We are no gods, so we mimic god’s(-sic) work by looking at the brain. The broad history of ANN development is a chase after higher biological realism:
First-generation ANNs: synchronous linear activation with binary output.
Second-generation ANNs: synchronous non-linear activation with real-valued output. (We are here.)
Third(?)-generation ANNs: asynchronous, non-linear threshold-firing models with binary spike-train output.
For a slightly more detailed description, check out the referenced article. The spirit is that we push toward higher biological fidelity whenever we can.
Do note that the real-valued output of 2nd-gen ANNs is not a deviation from biological realism, and the binary output of purported 3rd-gen ANNs is not a regression from it. Current synchronous networks with decidedly non-temporal, real-valued outputs can be viewed as a smoothed approximation of the firing rates of binary spike trains, incorporating limited temporal information without introducing too much additional complexity. They can be seen as a special case of SNNs in which the precise timing of spikes is discarded.
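As a toy illustration of this smoothing (assuming, purely for the sketch, Bernoulli-style random spiking; the rate and length are made-up numbers), averaging a binary spike train recovers a real-valued activation while discarding the timing of individual spikes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: read a real-valued activation of 0.3 as a
# normalized firing rate. Simulate a binary spike train at that rate and
# check that averaging (rate coding) recovers the real value while
# discarding the precise timing of individual spikes.
rate, steps = 0.3, 100_000

spikes = rng.random(steps) < rate     # binary spike train (True = spike)
estimated_rate = spikes.mean()        # smoothed, real-valued read-out

print(round(float(estimated_rate), 3))
```

Any two spike trains with the same mean give the same read-out here, which is exactly the information a rate-coded, real-valued neuron keeps.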
The brain is a crazy and chaotic place. Feedforward neural networks are beautiful mathematical entities with desirable properties that give you a sense of controllability and peace of mind (people doing deep learning, like me, might disagree; read any book about neural dynamics and computational neuroscience and you’ll see what I mean). That’s the reason almost nobody is doing SNNs. (NeurIPS 2019 accepted only 2 of 1,430 papers that have anything to do with spiking neurons, and one of them is about using autoencoders to model sequential spiking neural data.)
(This could be totally wrong.) By getting out of the comfort zone of feedforward networks, we can allow much bigger variation in network topology, neural coding schemes, and dynamics. By thinking outside the box of back-prop trainability, we can get a step closer to understanding how the brain actually thinks and learns, being a giant self-organized holistic system that largely relies on local interactions.
P.S. The biggest financial incentive behind SNN research is its locality and event-driven nature, assumed to cost far less energy than current networks when instantiated on a dedicated chip with neuromorphic sensors. It’s not like many people care about knowing how the brain works, lol.
Spiking neural networks are a paradigm shift in neuronal information processing; it goes without saying that they are not drop-in replacements for current neuronal models (remember the good days when you stacked neuronal models like Lego and your software handled everything for you?). SNNs in general are utterly hard to simulate and train without introducing a considerable amount of reduction and simplification.
The only way to do meaningful work on SNN-based computing is to divide and conquer, in an incremental way. And there are different starting points.
If it’s too hard to make SNNs work from scratch, then it’s sensible to modify current working neural network models, add in some SNN flavor, and hope they gain some desirable features of SNNs.
If SNNs are so hard to train, how about we don’t train them at all? How is that going to be useful? Actually, that’s exactly what reservoir computing does. I’m not going to talk much about liquid state machines or echo state networks specifically, so I’ll just give their general idea here.
Basically, you throw a rock into a bucket of water (hence ‘reservoir’), let the waves evolve and bounce for a fixed period of time, then freeze them. Now you try to infer the size and shape of the rock by looking at the frozen wave pattern.
For reservoir computing, you have a pool of randomly initialized spiking neurons (both connectivity and other parameters are random) that you can feed information into. After the information propagates through the network for a predetermined time period, you use the resulting activation pattern of the spiking-neuron soup as input to a traditional trainable feedforward network, the ‘read-out’ neurons, to carry out classification or regression tasks.
We can treat the pool of interacting neurons as performing a high-dimensional embedding of the input. Maybe the reservoir has a certain clustering effect on the input, but it’s not very clear how we benefit from an untrained spiking neural network. (Further investigation pending.)
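The scheme above can be sketched in a few lines of NumPy. As a hedge: this toy uses a rate-based echo-state reservoir rather than actual spiking neurons, and every size, constant, and the sine-vs-square task are arbitrary choices for illustration, not from the source:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy reservoir-computing sketch. Hedge: a rate-based echo state reservoir
# stands in for a spiking one; all sizes and constants are illustrative.
n_res, steps = 100, 50

W_in = rng.uniform(-1.0, 1.0, n_res)             # fixed random input weights
W = rng.normal(0.0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 (stability)

def final_state(signal):
    """Run the frozen (untrained) reservoir on a 1-D signal; return last state."""
    x = np.zeros(n_res)
    for u in signal:
        x = np.tanh(W @ x + W_in * u)
    return x

# Toy task: tell sine waves from square waves using only a linear read-out.
t = np.linspace(0.0, 4.0 * np.pi, steps)
X, y = [], []
for freq in np.linspace(0.5, 2.0, 20):
    for square in (False, True):
        sig = np.sign(np.sin(freq * t)) if square else np.sin(freq * t)
        X.append(final_state(sig))
        y.append(float(square))
X, y = np.array(X), np.array(y)

w, *_ = np.linalg.lstsq(X, y, rcond=None)        # train ONLY the read-out
preds = (X @ w) > 0.5
print("train accuracy:", (preds == (y > 0.5)).mean())
```

The point of the sketch is the division of labor: the recurrent pool is random and never trained, and only the linear read-out is fitted, exactly as the reservoir idea prescribes.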
Reservoir computing methods are applied to many real-life problems such as spoken-word recognition, spatio-temporal spike-pattern classification, motion prediction, and motor control. They are purported to generate smoother movement in motor-control problems.
Augmentations of vanilla reservoir computing methods include:
Some facts that may contribute to thinking about SNNs.
Edit: I was compiling a list of neuroscientific facts on neural information processing that are not respected by current neural network models, but it soon got out of control; there are too many crazy intricacies in our brain, too hard to prioritize and categorize in an informative manner, so I’ll leave this part for another post.
Finally we arrive at the description of SNNs themselves.
One fun fact about neuronal modeling is that it’s almost fractal: when you dig deeper, there’s always more detail to be modeled. We can never reach absolute biological realism, and it would be stupid to try. The important thing is to do your best to understand how the brain works and make reasonable abstractions, preserving the most principal mechanisms behind the thinking brain and utilizing them as best you can. We are not copying the brain, but learning from it.
For a more comprehensive description of neuronal dynamics please refer to the great textbook[4] where all sorts of neuronal models of different granularity are described.
There’s no established practice on how to make an SNN, so you are free^ to choose different combinations of various neuronal models, network topologies, and learning rules. I’ll describe some simple, commonly used or explored examples.
^ Please bear in mind that by ‘free’ I mean you can do whatever you wish, but there’s no guarantee it will work. Because natural neural networks are temporal systems evolving continuously, we have to choose a model granularity and time resolution suitable for running the network. Relevant abstraction is of utmost importance: a more sophisticated neuronal model is not necessarily better, unless you have unlimited computational power.
To a first and rough approximation, neuronal dynamics can be conceived as a summation process (sometimes also called an ‘integration’ process) combined with a mechanism that triggers action potentials above some critical voltage.
This is the simplest and historically most common biologically realistic model, but it’s powerful enough to capture many aspects of how natural neurons work.
An intuitive understanding of the IF model (leaky, if there’s a hole leaking water) is very well summarized in this picture:
It simply integrates the time-dependent input current and fires when a threshold is reached (is the threshold fixed, or does it even exist at all? That’s an interesting topic deserving another post). Let’s derive the formula for a simplified LIF model from biological neurons.
IF models are defined by two sub-components:
By ‘simplified’ I mean:
First we define some variables for our model:
Because action potentials fired by the same neuron always have the same form, no information is conveyed by the shape of a spike. Thus we can reduce neuronal firing activity to a train of spiking events, fully specified by their timing.
The cellular membrane is pretty good at insulating, so we can treat it as a capacitor with capacitance C. Because it’s not a perfect insulator, we denote the leak resistance by R (you can intuitively see that the capacitor does the integrating while the resistor does the leaking, hence ‘leaky integrate’):
The capacitor and resistor run in parallel, connecting the extracellular and intracellular fluid, possibly driven by a current I(t).
From the law of current conservation we split the driving current into two parts:
I(t)=I_R+I_C

in which the resistor current is found by:

I_R=\frac{u_R}{R};\quad u_R=u(t)-u_{rest}

and the capacitor current is:

I_C=\frac{dq}{dt}=C\frac{du}{dt}

Thus:

I(t)=\frac{u(t)-u_{rest}}{R}+C\frac{du}{dt}

That’s pretty much it. Simple, isn’t it? But we need some adaptations for use in neuronal computation models. First, we don’t really care about the membrane potential’s offset from the resting potential, so we substitute u(t) for u(t)-u_{rest}. Second, we need to further divide the driving current into an externally injected current and the currents from different input neurons:

I(t)=i_o(t)+\sum_{j}{w_ji_j(t)}
By substituting and reorganizing we get:
C\frac{du}{dt}=-\frac{u(t)}{R}+i_o(t)+\sum_{j}{w_ji_j(t)}

where i_o(t) is the external driving current and i_j(t) is the injected current from the j-th synaptic input, with w_j being its synaptic strength. (For R\rightarrow \infty the formula describes a non-leaky IF model.)
For the firing-mechanism part, in this simplified setting we simply reset the membrane potential u(t) to 0 or a fixed value u_{rest} when it reaches \mathcal{V}, and then send a spike down the axon (to other neurons). To respect the refractoriness of biological neurons, we can clamp the membrane potential for a fixed period of time after firing.
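The integrate-and-reset loop described above can be sketched with a simple forward-Euler simulation. All constants (capacitance, leak resistance, threshold, driving current, step size) are arbitrary illustrative values, not from the source:

```python
# Minimal LIF neuron via forward Euler, matching C du/dt = -u/R + I(t) plus a
# reset-at-threshold firing mechanism. All constants are illustrative guesses.
C, R, theta = 1.0, 10.0, 1.0   # capacitance, leak resistance, threshold
dt, T = 0.1, 100.0             # integration step and total time
I = 0.15                       # constant driving current (I*R > theta, so it fires)

u, spikes = 0.0, []
for k in range(int(T / dt)):
    u += dt * (-u / R + I) / C      # leaky integration step
    if u >= theta:                  # firing mechanism: threshold crossed
        spikes.append((k + 1) * dt) # record the spike time ...
        u = 0.0                     # ... and reset the membrane potential

print("spike times:", spikes)
```

With a constant supra-threshold input the neuron settles into regular firing; clamping u for a few steps after each spike would add the refractory period mentioned above.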
Notice that in this representation the driving current (i_o(t)+\sum_{j}{w_ji_j(t)}) includes undefined time-dependent functions; you are free to choose input functions at your convenience.
One simple option: treat each input current as an infinitely short pulse that delivers a constant charge q; then we can substitute the entire input-current component with:

q\sum_{f}\delta(t-t^{(f)})

where \delta is the Dirac \delta-function and each t^{(f)} is the time of one spike.
Another common practice is to use:
i_j(t)=\int_{0}^{\infty}S_j(t-s)\exp(-\frac{s}{\tau_s})ds

where S_j(t) is an arbitrarily complex presynaptic spike-train function from input j, and \tau_s is the synaptic time constant that controls how fast the input decays exponentially.
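The exponentially filtered input above can be sketched numerically by convolving a discretized spike train with the kernel exp(-s/\tau_s). The spike times and constants below are made up for illustration:

```python
import numpy as np

# Numerical sketch of the exponentially filtered synaptic input: a
# discretized presynaptic spike train is convolved with the decaying
# kernel exp(-s/tau_s). Spike times and constants are illustrative.
dt, T, tau_s = 0.1, 50.0, 5.0
t = np.arange(0.0, T, dt)

S = np.zeros_like(t)
for spike_time in (5.0, 20.0, 22.0):      # hypothetical presynaptic spikes
    S[int(spike_time / dt)] = 1.0 / dt    # unit-area pulse ~ Dirac delta

kernel = np.exp(-t / tau_s)               # kernel sampled for s >= 0
i_j = np.convolve(S, kernel)[: len(t)] * dt   # discrete version of the integral

print(round(float(i_j[int(5.0 / dt)]), 3))    # jumps to 1.0 at the first spike
```

Each spike injects a unit of current that then decays with time constant \tau_s, and the two closely spaced spikes at 20 and 22 sum, which is exactly the linear-summation assumption discussed later.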
By now we can see that the power of LIF models largely depends on how you choose the specific representation of the input functions. Simple as it is, a generalized LIF model can even accurately predict the spike train of a biological neuron (with constraints)!
I hate to say this, but we really don’t know enough about neural wiring in the brain. Simple feedforward networks rarely appear in biological networks. Even the peripheral system and the lower levels of the somatosensory system, despite having a grossly feedforward structure, contain a lot of feedback loops that control how the network behaves. As we get closer to the central nervous system, the picture gets murkier and murkier.
Also, neurons are usually modeled as extentless points. This biological inaccuracy may have some impact on modeling neuronal information processing. Thousands of input synapses at different dendritic locations interact with each other non-linearly, which is in itself a very complex computational process, much like logic gates in integrated circuits. The assumption of linearly summable input is only for convenience and is not biologically warranted.
Besides its spatial complexity, the brain also evolves temporally. Everybody has a different brain scheme, controlled not only by their genetic code but also by their experience. The genetic code decides the overall structure and wiring of different modules, but the maturation of the neural network and its local topology is intrinsically experience-driven. This topologically dynamic nature of the brain is even harder to capture. How the brain forms and matures from the prenatal stage to adolescence is a fascinating topic in itself.
So we’re back at square two, trying to take baby steps from currently well-understood networks. The choice of topology also affects how hard the network is to train: the deeper and more freely recurrent it is, the harder the training.
In feedforward networks the information flow is one-directional. FF networks are the backbone of many biological peripheral systems, where a lot of information is passed to the central networks from sensory systems. Thus FF networks are usually applied to model low-level sensory systems.
FF networks are also generally easier to train than networks with recurrent connections; many supervised learning methods are only applicable to strictly FF spiking neural networks. But the lack of feedback loops can seriously limit a network’s capability.
Recurrent networks form a big spectrum, spanning from simpler ones augmented from FF networks to stochastically connected networks where the sense of direction is completely lost. In recurrent networks, neurons interact with each other through reciprocal connections. This enables the networks to have temporal internal states, resulting in richer dynamics and potentially higher computational capability than FF networks. But they are way harder to train, or even to control and stabilize.
Because of the coexistence of positive and negative feedback, recurrent neural networks can have very rich internal dynamics, where theories of complex dynamical systems come into consideration. (One good example of the instability of biological neural networks is epilepsy.)
The difficulty of training a recurrent SNN limits its application to real-life problems. Currently, RSNNs are generally used in brain-dynamics modeling, to investigate biological neuronal information processing.
See 3.2 Reservoir Computing for an example.
The brain is a fascinating information-processing machine, but how is information actually encoded in it?
First we need to determine the required temporal resolution of signaling in the neural network.
Rate-coding theory, i.e., the idea that information is encoded in the firing frequency of neurons, has been the dominant paradigm in both theoretical neural information processing and artificial neural networks for decades. Describing neural coding with firing rates smooths out the neuronal output by discarding the precise timing of each spike. This enables us to use real-valued outputs on artificial neurons, respecting some temporal aspects of biological networks with non-temporal networks. But newer evidence shows that, while important for sensorimotor systems, rate coding cannot account for many aspects of higher-level information processing in biological networks.
many behavioral responses are completed too quickly for the underlying sensory processes to rely on the estimation of neural firing rates over extended time windows.
Recent neurophysiological results suggest that both information processing (pulse coding) and learning (spike-timing-dependent plasticity) in biological networks are heavily dependent on the precise timing of individual spikes rather than on their firing rate.
(pause)
[1] 2003 – Article – Spiking Neural Networks, an Introduction – Jilles Vreeken – A legacy introduction to SNNs, more about spiking neurons and less about networks/training, etc.
[2] 2009 – Article – Third Generation Neural Networks: Spiking Neural Networks – Samanwoy Ghosh-Dastidar, et al. – Another introduction with a feedforward focus; not worth reading if you’re already familiar with spiking neurons. Listed for the sake of reference.
[3] 2011 – Article – Introduction to Spiking Neural Networks: Information Processing, Learning and Applications – Filip Ponulak, et al. – A more extensive and recent introduction to this topic; recommended.
[4] 2014 – Book – Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition – Wulfram Gerstner, et al. – Great textbook on the basics and modeling of neuronal dynamics, from detailed single-neuron modeling to network modeling with reasonably simplified neuron models.
\begin{align}
(w^*,b^*)&=\mathop{\arg\min}_{(w,b)}\sum^{m}_{i=1}(f(x_i)-y_i)^2\\
&=\mathop{\arg\min}_{(w,b)}\sum^{m}_{i=1}(y_i-wx_i-b)^2
\end{align}
\begin{align}
\ln y&=\boldsymbol{w}^T\boldsymbol{x}+b\\
y&=e^{\boldsymbol{w}^T\boldsymbol{x}+b}
\end{align}
\begin{align}
y=g^{-1}(\boldsymbol{w}^T\boldsymbol{x}+b)
\end{align}
This is a brief note on the simplified No Free Lunch (NFL) theorem as described in Machine Learning.
General Idea:
“states that any two optimization algorithms are equivalent when their performance is averaged across all possible problems”
Example:
Suppose the sample space $\mathcal{X}$ and hypothesis space $\mathcal{H}$ are both discrete, and the distribution over problems is uniform. Given training data $X$ and learning algorithm $\mathcal{L}_a$, denote the unknown truth function to be learned as $f$. Then the summed error on samples outside the training set (i.e., in $\mathcal{X}-X$) of learning algorithm $\mathcal{L}_a$ is:
\[
E_{ote}(\mathcal{L}_a|X,f)=\sum_{h}\sum_{x\in \mathcal{X}-X}P(x)I(h(x)\neq f(x))P(h|X,\mathcal{L}_a)
\]
$I(\textit{stmt})$ is the indicator function: it returns 1 if the enclosed statement is true and 0 otherwise.
This equation is pretty straightforward: we sum the product of (1) the probability of each hypothesis, (2) the probability of each example in $\mathcal{X}-X$, and (3) the indicator function.
In the case of binary classification, if we assume (for the sake of simplicity) that the true objective function can be any function in $\mathcal{X} \mapsto\{0,1\}$, then the function space is $\{0,1\}^{|\mathcal{X}|}$. Assuming $f$ is uniformly distributed, we sum $E_{ote}$ over all possible $f$ (note the implication of the highlighted assumption):
\begin{align}
\sum_{f}E_{ote}(\mathcal{L}_{a}|X,f)&=\sum_{f}\sum_{h}\sum_{x\in \mathcal{X}-X}P(x)I(h(x)\neq f(x))P(h|X,\mathcal{L}_a)\\
&=\sum_{x\in \mathcal{X}-X}P(x)\sum_{h}P(h|X,\mathcal{L}_{a})\sum_{f}I(h(x)\neq f(x))\\
&=\sum_{x\in \mathcal{X}-X}P(x)\sum_{h}P(h|X,\mathcal{L}_{a})\frac{1}{2} 2^{|\mathcal{X} |}\\
&=\frac{1}{2} 2^{|\mathcal{X} |}\sum_{x\in \mathcal{X}-X}P(x)\sum_{h}P(h|X,\mathcal{L}_{a})\\
&=2^{|\mathcal{X}|-1}\sum_{x\in \mathcal{X}-X} P(x)\cdot 1
\end{align}
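The derivation above can be verified by brute force on a tiny domain. The domain size, training split, and the two fixed hypotheses below (standing in for the outputs of two deterministic learners) are arbitrary choices for illustration:

```python
import itertools

# Brute-force check of the NFL sum: for ANY fixed hypothesis h (i.e., any
# deterministic learner's output), summing the off-training-set error over
# all 2^|domain| truth functions f gives the same value,
# 2^(|domain|-1) * sum of P(x) over x outside the training set.
domain = [0, 1, 2, 3]                  # sample space, |domain| = 4
train = [0, 1]                         # training examples X
off_train = [x for x in domain if x not in train]
P = {x: 0.25 for x in domain}          # uniform P(x)

def summed_error(h):
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(domain)):
        f = dict(zip(domain, bits))    # one possible truth function
        total += sum(P[x] * (h[x] != f[x]) for x in off_train)
    return total

h_zeros = {x: 0 for x in domain}       # a "learner" that always predicts 0
h_other = {0: 0, 1: 1, 2: 1, 3: 0}     # an arbitrary different hypothesis

print(summed_error(h_zeros), summed_error(h_other))  # both 2^3 * 0.5 = 4.0
```

For every off-training point, exactly half of all truth functions disagree with any given prediction, which is where the $\frac{1}{2}2^{|\mathcal{X}|}$ factor comes from.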
TL;DR
As a result, under these assumptions, $E_{ote}$ does not depend on the learning algorithm $\mathcal{L}_{a}$.
This is not to say that the learning algorithm is unimportant. The takeaway is that comparing the efficiency of learning algorithms without the context of a problem is meaningless. The selection of a learning algorithm is highly correlated with a priori information about the problem we are trying to solve.
A list of dishes I want to try making next.
Status: In progress
While a ‘picture’ of the Kantian system is common to all who have commented on it, there is no agreement whatsoever as to the strength, or even as to the content, of his arguments. A commentator who presents clear premises and clear conclusions will invariably be accused of missing Kant’s argument, …(to escape academic censure) is to fall into the verbal mannerisms of the original.
Cogito ergo sum
Objectivity
Summary
Kant’s aims in the first Critique
The Basic:
Papers to test:
Uncoated paper should be used; counterintuitively, using acid-free paper may actually cause problems. Cyanotypes should be kept away from alkaline environments.
The difference between brown ferric ammonium citrate and green ferric ammonium citrate.
http://www.mikeware.co.uk/mikeware/New_Cyanotype_Process.html (drawbacks of the traditional cyanotype process and improvements)
Featured image: The Fortress
The featured image was taken around the abandoned park near the red building of HIT (the spot with a stone bird statue in the middle). It’s a small, very run-down place that can hardly be used as a proper park: a few scattered, never-cleaned benches and some stone tables and stools.
The cats’ fortress sits right there. The tarps, blankets, and bamboo mats on the benches, the wooden boards, cardboard boxes, and foam panels alongside, plus the bottles, dishes, and cat feeders someone left behind make this the spot where cats appear most often on campus.
According to informed sources, there used to be far more cats here, and they were far friendlier to people. After some unidentified person slaughtered part of the cat population, it became what it is today.
(The photo above is last year’s candid shot of Big Black and Bunny courting; according to intel, this pair (presumably a couple) are now together.)
The remaining cats have formed a smaller, more tightly knit core group, with very close contact and communication. I often observe vocal interactions between the cats, but I don’t speak cat, so I can’t understand them. A pity.
Anyway, I want to befriend every cat on campus (friendly reputation with all, revered by some) and become the true Cat King (read: cat servant). To accomplish this I need to plan carefully…
I’ll keep updating with cat-watching notes, plus observations and biographies of each cat-gang member.
No time to write now; will update later.
Possible future posts:
to be continued…