Probabilistic Kolmogorov-Arnold Network

Reference to code PKAN



I start with a description of the data. I generate 5 random features $X_j$ on the interval [0,1] and compute the target $y$ by the formula



The values $X_j^*$ in the formula are unobserved. They are derived from the observed values $X_j$ by adding noise, as shown below the formula. The noise values $C_j$ are uniformly distributed on [0,1], and $\delta = 0.4$ controls the noise level. With the data set up this way, we have aleatoric uncertainty with a very complex, input-dependent distribution. The picture below shows cumulative distributions and probability densities (solid blue areas) for the following 4 inputs.



The dataset contains only scalar targets and no identical inputs, but the true distribution for any input can easily be generated by Monte Carlo simulation: the observed features $X_j$ are kept fixed and only the noise is redrawn.
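The Monte Carlo procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `target` and the noise rule in `perturb` are hypothetical placeholders, not the formulas used to build the dataset. Only the value $\delta = 0.4$ and the uniform noise $C_j$ on [0,1] are taken from the text.

```python
import random

DELTA = 0.4  # noise level from the text


def target(x_star):
    # Hypothetical stand-in for the real target formula (not reproduced here).
    return sum(x_star) / len(x_star)


def perturb(x, rng, delta=DELTA):
    # Assumed noise rule: shift each feature by delta * (C_j - 0.5),
    # with C_j uniform on [0, 1]. The actual rule may differ.
    return [xj + delta * (rng.random() - 0.5) for xj in x]


def monte_carlo_sample(x, n_draws, rng):
    # Keep the observed features fixed and redraw only the noise.
    return [target(perturb(x, rng)) for _ in range(n_draws)]


rng = random.Random(0)
x_observed = [rng.random() for _ in range(5)]
sample = monte_carlo_sample(x_observed, 1000, rng)
```

The list `sample` then plays the role of the conventionally true distribution of $y$ for the single input `x_observed`.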




The probabilistic model returns a sample, that is, multiple values of $y$, for each input. Since the true distribution is available, we can compare the two and estimate the accuracy of the sample returned by the model.

The algorithm is called Divisive Data Resorting. I explain it in detail in a video and in a preprint. Below I comment on the test results, assuming the reader has watched the video and learned the concept before reading the conclusions.

Test results

Here is the result of code execution:
Time for training 0.249 sec.
Mean relative error for mean ensemble of 32 and mean MonteCarlo 0.0333
Mean relative error for STD  ensemble of 32 and STD  MonteCarlo 0.1225
Passed goodness of fit tests 63 from 100
In the test I generate 100 new inputs that were not used in training. For each of these inputs I obtain a sample of targets from the trained ensemble and a Monte Carlo sample representing the conventionally true distribution. Then for each pair of samples I calculate the expectation and the standard deviation. Below are 10 examples of expectations for comparison:
1.4530  1.4807
1.4055  1.4325
1.8577  1.9599
0.1273  0.0620
0.3711  0.3390
1.9615  1.9735
0.2727  0.1571
1.8358  1.8170
0.2665  0.1820
0.5011  0.4151
and 10 standard deviations
0.2340  0.1063
0.2054  0.3704
0.4548  0.4213
0.4671  0.4083
0.2610  0.3026
0.1456  0.1161
0.5619  0.6823
0.0893  0.1091
0.2217  0.3302
0.3950  0.4557
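The comparison of moments can be sketched as follows. The exact definition of "mean relative error" is not stated on this page, so the code assumes one plausible choice: the mean absolute difference divided by the range of the reference values. The column order of the listed pairs (ensemble first, Monte Carlo second) is also an assumption, and the helper names are hypothetical.

```python
import statistics


def moments(sample):
    # Expectation and (population) standard deviation of one sample of targets.
    return statistics.fmean(sample), statistics.pstdev(sample)


def mean_relative_error(estimated, reference):
    # Assumed definition: mean absolute difference divided by the range
    # of the reference values. The original code may normalize differently.
    span = max(reference) - min(reference)
    diffs = [abs(e - r) for e, r in zip(estimated, reference)]
    return statistics.fmean(diffs) / span


# The 10 pairs of expectations listed above
# (assumed order: ensemble first, Monte Carlo second).
ensemble_means = [1.4530, 1.4055, 1.8577, 0.1273, 0.3711,
                  1.9615, 0.2727, 1.8358, 0.2665, 0.5011]
monte_carlo_means = [1.4807, 1.4325, 1.9599, 0.0620, 0.3390,
                     1.9735, 0.1571, 1.8170, 0.1820, 0.4151]

error = mean_relative_error(ensemble_means, monte_carlo_means)
print(f"{error:.4f}")
```

In a full test, `moments` would be applied to the ensemble sample and the Monte Carlo sample of each of the 100 inputs, and the error would be computed over all 100 pairs rather than the 10 shown here.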
The differences between the ensemble and the conventionally true statistical moments are summarized by the relative errors: 0.03 for expectations and 0.12 for standard deviations. In addition, I ran a goodness-of-fit test on each pair of samples, and 63% of the pairs passed. At this point I refer the reader to the second image, which shows how complex the true distributions are and how strongly they depend on the features. The training dataset has 10 000 records. Note also how fast the training is: about 0.25 sec. A Bayesian Neural Network needs at least several minutes for such a problem.
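The goodness-of-fit test used for the two samples is not specified here. One common choice for comparing two samples is the two-sample Kolmogorov-Smirnov test, and the sketch below shows that test only as an assumed example, not as the method used in the code. The function names are hypothetical, and the threshold is the standard asymptotic critical value.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    # Largest gap between the two empirical CDFs, checked at every data point.
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_passes(a, b, alpha=0.05):
    # Asymptotic critical value of the two-sample test at significance alpha.
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


rng = random.Random(1)
same_1 = [rng.random() for _ in range(500)]
same_2 = [rng.random() for _ in range(500)]
shifted = [rng.random() + 0.5 for _ in range(500)]
print(ks_passes(same_1, same_2), ks_passes(same_1, shifted))
```

For small samples, such as the 32 values returned by the ensemble, the asymptotic threshold is only approximate, so the pass rate depends on the chosen test and significance level.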

There are many details that I can't explain on one page. Those who are interested can contact me personally.