Probabilistic Kolmogorov-Arnold Network

Reference to the code: PKAN



I start with an explanation of the data. I generate 5 random features $X_j$ (drawn from the interval [0, 1]) and compute the target $y$ by the formula below.



The values $X_j^*$ in the formula are unobserved. They are derived from the observed values $X_j$ by adding noise, as shown below the formula. The noise values $C_j$ are uniformly distributed on [0, 1], and $\delta = 0.4$ controls the noise level. By constructing the data in this way, we obtain aleatoric uncertainty with a very complex, input-dependent distribution. In the picture below I show cumulative distributions and probability densities (solid blue areas) for the 4 following inputs.



The dataset contains only scalars and no identical inputs, but the true distributions can easily be generated by Monte Carlo simulation, by re-drawing the noise for the same observed features $X_j$.
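For readers who prefer code, here is a minimal C++ sketch of that Monte Carlo step. The function trueTarget and the additive noise form $X_j^* = X_j + \delta\,C_j$ are assumptions made only for illustration; the actual expressions are the ones given by the formula above.

#include <array>
#include <random>
#include <vector>

// Implemented elsewhere with the actual formula from the text:
// computes y from the unobserved values X_j^*.
double trueTarget(const std::array<double, 5>& xStar);

// Generate the "conventionally true" Monte Carlo sample for one observed
// input x by re-drawing the noise C_j ~ U[0,1]; the additive form with
// delta = 0.4 is an assumption made for this sketch.
std::vector<double> monteCarloSample(const std::array<double, 5>& x,
                                     int nMC, std::mt19937& gen) {
    const double delta = 0.4;
    std::uniform_real_distribution<double> noise(0.0, 1.0);
    std::vector<double> sample(nMC);
    for (int k = 0; k < nMC; ++k) {
        std::array<double, 5> xStar;
        for (int j = 0; j < 5; ++j) {
            xStar[j] = x[j] + delta * noise(gen);  // assumed noise form
        }
        sample[k] = trueTarget(xStar);
    }
    return sample;
}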




The probabilistic model returns a sample, that is, multiple values of $y$. Since the true distribution is available, we can compare the two and estimate the accuracy of the sample returned by the model.

The algorithm is called Divisive Data Resorting. I explained it in detail in the video and in the preprint. Below I comment on the test results, assuming the reader has watched the video and learned the concept before reading the conclusions.

Test results

Here is the result of the code execution:
Time for training 0.249 sec.
Mean relative error for mean ensemble of 32 and mean MonteCarlo 0.0333
Mean relative error for STD  ensemble of 32 and STD  MonteCarlo 0.1225
Passed goodness of fit tests 63 from 100
In the test I generate 100 new inputs not used in training. For each of these inputs I obtain a sample of targets from the trained ensemble and a Monte Carlo sample representing the conventionally true distribution. Then for each of these pairs I calculate expectations and standard deviations. Below is an example of 10 expectations for comparison
1.4530  1.4807
1.4055  1.4325
1.8577  1.9599
0.1273  0.0620
0.3711  0.3390
1.9615  1.9735
0.2727  0.1571
1.8358  1.8170
0.2665  0.1820
0.5011  0.4151
and 10 standard deviations
0.2340  0.1063
0.2054  0.3704
0.4548  0.4213
0.4671  0.4083
0.2610  0.3026
0.1456  0.1161
0.5619  0.6823
0.0893  0.1091
0.2217  0.3302
0.3950  0.4557
The differences between the ensemble and the conventionally true statistical moments are shown by the relative errors 0.03 and 0.12. Besides that, I implemented a goodness-of-fit test for each pair of samples; 63% of them passed. I refer the reader at this point to the second image, which shows the complexity of the true distributions and their dependence on the features. The training dataset is 10 000 records. Please also note how fast the training is: near 0.25 sec. A Bayesian neural network for such a problem needs at least several minutes.
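The goodness-of-fit test itself is one of the details not spelled out on this page. As a hedged illustration only, a two-sample Kolmogorov-Smirnov comparison between the ensemble sample and the Monte Carlo sample could be coded as follows; the names and the critical-value approximation are my assumptions, not necessarily the exact test used in the published code.

#include <algorithm>
#include <cmath>
#include <vector>

// Two-sample Kolmogorov-Smirnov distance: the largest gap between the
// empirical CDFs of the two samples.
double ksDistance(std::vector<double> a, std::vector<double> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j]) ++i; else ++j;
        double fa = static_cast<double>(i) / a.size();
        double fb = static_cast<double>(j) / b.size();
        d = std::max(d, std::fabs(fa - fb));
    }
    return d;
}

// A pair of samples "passes" when the distance stays below the usual
// asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)), c(0.05) ~ 1.36.
bool passesFit(const std::vector<double>& ensemble,
               const std::vector<double>& monteCarlo, double cAlpha = 1.36) {
    double n = static_cast<double>(ensemble.size());
    double m = static_cast<double>(monteCarlo.size());
    return ksDistance(ensemble, monteCarlo) < cAlpha * std::sqrt((n + m) / (n * m));
}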

There are many details that I cannot explain on one page. Readers can find this concept explained in the format of a regular publication.

Designing risk management systems

Predicting the distribution of the target is a highly desirable but challenging task. In practical business it may not be necessary and may be relaxed to predicting the probability of the target falling into a certain range. For example, suppose the target is the future market price of a traded object and the goal of modeling is a decision to buy for the purpose of speculation. For effective risk management the model user only wants to know whether the predicted price will be greater than a certain value, lower than it, or within a range; the full distribution is not needed.

For each predicted sample we can define the interval that represents the risk under study. For example, if our predicted sample looks as follows:
1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 20, 28
and we want to estimate the probability of the target being in the range [2.5, 4.5] based on this specific prediction, then it is $P = 4/12$. Now we want to assess the accuracy, because our profit depends on it, but there is no true distribution; we only have historical data divided into training and validation sets.

This accuracy can be assessed for the entire model, not for each individual prediction. We simply select, for the other predictions, intervals with the same probability. For example, if another sample is different:
3, 3, 3, 5, 5, 5, 6, 7, 8, 9, 9, 12
we can still choose an interval associated with the same probability; in this case it is [4.5, 6.5]. The remaining part is to run a validation test on unseen data and check how many observed targets fall into the preselected individual intervals, assuming these intervals correspond to the same probability. In this case we do not need to know the true distribution, so the approach is applicable to observed physical systems without identical inputs.
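A minimal sketch of this interval selection (with hypothetical names): given the position and the number of order statistics that defined the probability in the first sample, the same count of consecutive order statistics in any other sorted sample yields an interval with the same empirical probability. Here the borders are placed halfway between neighboring elements, in the same way as in the test code further below.

#include <algorithm>
#include <utility>
#include <vector>

// Return an interval covering `count` consecutive order statistics of the
// predicted sample, starting at 0-based index `first` (requires first >= 1
// and first + count < sample.size()). Its empirical probability is
// count / sample.size() regardless of the shape of the sample.
std::pair<double, double> intervalWithSameProbability(
        std::vector<double> sample, std::size_t first, std::size_t count) {
    std::sort(sample.begin(), sample.end());
    double left  = (sample[first - 1] + sample[first]) / 2.0;
    double right = (sample[first + count - 1] + sample[first + count]) / 2.0;
    return {left, right};
}

For the 32-element samples used below, first = 11 and count = 11 give the borders (sorted[10] + sorted[11]) / 2 and (sorted[21] + sorted[22]) / 2 and a probability of 11/32, i.e. near 0.344.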

Here we show how it works for this formula, but without using the true distribution. The sample size is 32. For each record in the validation set we compute the left border as the average of elements 10 and 11 and the right border as the average of elements 21 and 22 of the sorted predicted sample; 11 of the 32 sample values lie strictly between these borders, so the probability is near 0.344. When we run the experiment on the validation sample and count how many individual targets fall into the intervals selected by their predicted samples, we obtain the following results for 1000 records:
325, 352, 331, 324, 357, 350, 355, 333
Each figure is obtained in a different run, which includes a new generation of training and validation data, a new model, and a new test on 1000 unseen records. We can see that the model is statistically stable throughout the entire validation data. We did not use the conventionally true distribution here.

The code for this test is not provided, but it is elementary; I add the critical fragment here. This block goes into the testing loop.
        // Sort the predicted sample of size nU (here 32) returned for this record.
        sorted.clear();
        for (int i = 0; i < nU; ++i) {
            sorted.push_back(sample[i]);
        }
        sort(sorted.begin(), sorted.end());
        // Interval borders from the order statistics: halfway between the
        // 11th/12th and the 22nd/23rd smallest values (0-based indices 10, 11, 21, 22).
        double left = (sorted[10] + sorted[11]) / 2.0;
        double right = (sorted[21] + sorted[22]) / 2.0;
        // Count how often the target (represented here by the Monte Carlo median)
        // falls into the interval selected by the predicted sample.
        if (monteCarlo[nMCSize / 2] > left && monteCarlo[nMCSize / 2] < right) {
            ++right_stat;
        }