Splines
When a Kolmogorov-Arnold network
$$ M(x_1, x_2, x_3, \ldots , x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right), $$
where $\Phi_q, \phi_{q,p}$ are univariate functions, is trained on physical objects, a piecewise linear
model is usually accurate enough. Data recorded from physical systems are typically approximate, due to recording errors, unobserved
factors (epistemic uncertainty), and drift in the system during recording when the object is not absolutely
stationary. However, there are cases where the dataset is exact and the features and targets
obey a certain but unknown analytical formula, for example when a KAN is used for the numerical solution of
partial differential equations. In such cases, the accuracy can be increased by approximating the modeled functions with splines or other
smooth functions.
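The nested sum in the representation above can be sketched as follows. This is a minimal illustration, not trained code: the inner and outer functions are identity placeholders chosen so the result can be checked by hand, and the function name `kan_eval` is ours.

```python
def kan_eval(x, inner, outer):
    """Evaluate M(x) = sum_q Phi_q( sum_p phi_{q,p}(x_p) ).

    inner[q][p] and outer[q] are univariate callables; len(outer) = 2n + 1.
    """
    return sum(Phi(sum(inner[q][p](x[p]) for p in range(len(x))))
               for q, Phi in enumerate(outer))

n = 2                                   # number of features
ident = lambda t: t                     # placeholder univariate function
inner = [[ident] * n for _ in range(2 * n + 1)]
outer = [ident] * (2 * n + 1)

# With identity functions, M(x) = (2n + 1) * (x_1 + ... + x_n).
value = kan_eval([0.3, 0.7], inner, outer)   # 5 * 1.0 = 5.0
```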
Splines can be applied in several different ways, and the topic is wide, so
this article documents the particular approach used in our code.
Each function $f(x)$ in the model is expressed as a linear combination of basis functions $B_i(x)$,
and every basis function is modeled by cubic splines. The tricky part in the program is reducing the
number of splines. For example, for 5 features the model needs 66 functions: $(2n+1) \cdot n = 55$ inner functions plus $2n+1 = 11$ outer functions.
With a 10-point grid that means 660 basis functions, 5940 cubic splines (9 segments each), and 23760 parameters (4 per spline) to express them.
They would all have to be organized in data structures, and navigating through them repeatedly in long loops would take significant time.
The reduction is achieved by using dimensionless basis functions whose shapes depend only on the number of points.
Some critical choices
- Each basis function is created for a unit vector $e^T = [0\: 0\: 1\: ... \: 0 \:0]$, which has only one
non-zero element.
- All unit vectors are pairwise orthogonal, and their number equals the number
of points in the modeled function.
- The intervals between points are equal.
The actual code for building the splines for the unit functions resides in the class AlgebraHelper;
it is written following the example provided in the article of James Keesling, and it has a unit test.
Two samples of such basis functions are shown in the image below.
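A sketch of the construction may help (this is not the repository's AlgebraHelper, just an illustration of the standard natural cubic spline recipe applied to a unit vector on an equally spaced grid with step $h = 1$; the function names are ours):

```python
import numpy as np

def unit_basis_splines(n_points, k):
    """Per-segment coefficients (a, b, c, d) of the natural cubic spline
    interpolating the unit vector e_k over the grid 0, 1, ..., n_points-1.
    Segment i covers [i, i+1]; its value is a + b*t + c*t^2 + d*t^3, t = x - i."""
    y = np.zeros(n_points)
    y[k] = 1.0                              # unit vector e_k
    n_seg = n_points - 1                    # number of spline segments
    # Natural spline: second derivatives M solve a tridiagonal system
    # M[i-1] + 4*M[i] + M[i+1] = 6*(y[i-1] - 2*y[i] + y[i+1]),
    # with boundary conditions M[0] = M[n] = 0 (h = 1 everywhere).
    M = np.zeros(n_points)
    if n_points > 2:
        m = n_points - 2
        A = (np.diag(np.full(m, 4.0))
             + np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1))
        rhs = 6.0 * (y[:-2] - 2.0 * y[1:-1] + y[2:])
        M[1:-1] = np.linalg.solve(A, rhs)
    coeffs = []
    for i in range(n_seg):
        a = y[i]
        c = M[i] / 2.0
        d = (M[i + 1] - M[i]) / 6.0
        b = y[i + 1] - y[i] - M[i + 1] / 6.0 - M[i] / 3.0
        coeffs.append((a, b, c, d))
    return coeffs

def eval_basis(coeffs, x):
    """Evaluate the spline at x inside [0, number of segments]."""
    i = min(int(x), len(coeffs) - 1)
    t = x - i
    a, b, c, d = coeffs[i]
    return a + b * t + c * t ** 2 + d * t ** 3
```

By construction the basis function equals 1 at its own grid point and 0 at all others, which is exactly the interpolation property the unit vectors encode.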
Since the entire set of basis functions depends only on the number of points, the object holding them can be shared between
the modeled functions. This shared object is instantiated from the class Basis.
The parameters $c^T = [c_1 \: c_2 ...]$ of the linear combination of basis functions belong to each individual function (an object of class Univariate).
To make this concrete: one particular function $f_k$ is expressed as a linear combination of basis functions in the following way
$$f_k(x) = \sum_{j=0}^n c_{j}^{(k)} B_j(x).$$
Another function $f_m$ may be expressed as a different linear combination of the same basis functions, provided both functions have the same number of points
in the unit vectors used for instantiation:
$$f_m(x) = \sum_{j=0}^n c_{j}^{(m)} B_j(x).$$
The domains of definition of $f_k$ and $f_m$ may differ, but we use a relative argument $x$ that points to the segment associated with a particular cubic spline.
All basis functions are generated once at the start of the training process and only coefficients $c$ are updated at each training step.
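The sharing pattern can be sketched as follows. The class names Basis and Univariate follow the article, but the bodies are illustrative: the spline details are replaced by a piecewise-linear (hat) placeholder basis so the example stays self-contained.

```python
class Basis:
    """Holds the basis functions B_0..B_n for a given number of grid points;
    one instance is shared by all Univariate objects with that point count."""
    def __init__(self, n_points):
        self.n_points = n_points

    def evaluate(self, j, x):
        # Placeholder: a hat basis stands in for the cubic-spline basis
        # of the real code. It is 1 at grid point j and 0 at other points.
        return max(0.0, 1.0 - abs(x - j))

class Univariate:
    """One modeled function f_k: its own coefficients c, a shared Basis."""
    def __init__(self, basis, coefficients):
        self.basis = basis
        self.c = list(coefficients)

    def __call__(self, x):
        # f(x) = sum_j c_j * B_j(x)
        return sum(cj * self.basis.evaluate(j, x)
                   for j, cj in enumerate(self.c))

shared = Basis(4)
f_k = Univariate(shared, [0.0, 1.0, 2.0, 3.0])
f_m = Univariate(shared, [1.0, 1.0, 1.0, 1.0])   # same basis, different c
```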
Projection descent for a single function
Let us drop the indexes for convenience. The function value is computed via the basis functions
$$\hat f(x) = \sum_{j=0}^n c_{j} B_j(x).$$
The computed value $\hat f$ differs from the target value $f$. According to the projection descent algorithm, all coefficients $c_j$ are
updated in the following way
$$c_j = c_j + \alpha [f(x) - \hat f(x)] B_j(x), $$
where $\alpha$ is the learning rate. Applying this update to each coefficient of every function, for all records, over multiple epochs trains the model
to output the targets. The target function values are computed by an algorithm similar to backpropagation, explained in
another article.
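For a single function, the update loop above can be sketched as follows. This is an illustration, not the repository's training code: a hat basis again stands in for the cubic-spline basis, and the training samples sit exactly on the grid so the fit is exact.

```python
def B(j, x):
    """Hat basis on the integer grid: 1 at point j, 0 at other points."""
    return max(0.0, 1.0 - abs(x - j))

def f_hat(c, x):
    """Model value: linear combination of basis functions."""
    return sum(cj * B(j, x) for j, cj in enumerate(c))

def train(samples, n_basis, alpha=0.5, epochs=200):
    """Projection descent: c_j += alpha * (f - f_hat) * B_j(x) per record."""
    c = [0.0] * n_basis
    for _ in range(epochs):
        for x, target in samples:
            err = target - f_hat(c, x)          # residual at this record
            for j in range(n_basis):
                c[j] += alpha * err * B(j, x)   # update every coefficient
    return c

# fit f(x) = x^2 sampled at the grid points 0..4
samples = [(float(x), float(x * x)) for x in range(5)]
c = train(samples, 5)
```

Between grid points the fitted model interpolates linearly here, so `f_hat(c, 1.5)` lands midway between the targets at 1 and 2; with the cubic-spline basis the interpolation is smooth instead.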