Probabilistic Kolmogorov-Arnold Network
I start with an explanation of the data. I generate 5 random features $X_j$ (uniformly from the interval [0,1]) and compute the target $y$ by the formula below.
The values $X_j^*$ in the formula are unobserved. They are derived from the observed values $X_j$ by adding noise, as
shown below the formula. The noise values $C_j$ are uniformly distributed on [0,1], and $\delta = 0.4$ controls the noise level.
By constructing the data in this way, we obtain aleatoric uncertainty with a very complex, input-dependent distribution. In
the picture below I show cumulative distributions and probability densities (solid blue areas) for the four following inputs.
The dataset contains only scalars and no identical inputs, but the true conditional distributions can easily be generated by
Monte Carlo simulation: keep the observed features $X_j$ fixed and redraw the noise many times.
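To make this Monte Carlo construction concrete, here is a purely illustrative sketch. It is not the code used in the experiments: the target function f() and the exact way the noise enters $X_j^*$ are placeholders of my own, since the real ones are defined by the formula shown above.

#include <cstddef>
#include <random>
#include <vector>

// Placeholder target formula, used only to make this sketch compile;
// the real formula is the one shown in the text above.
double f(const std::vector<double>& xStar) {
    double y = 0.0;
    for (double v : xStar) y += v * v;
    return y;
}

// For fixed observed features X_j, redraw the noise C_j many times and evaluate
// the formula; the resulting sample approximates the true conditional
// distribution of y for that input (the "Monte Carlo" sample used below).
std::vector<double> trueDistributionSample(const std::vector<double>& x,
                                           int nDraws, double delta,
                                           std::mt19937& rng) {
    std::uniform_real_distribution<double> noise(0.0, 1.0);  // C_j ~ U[0,1]
    std::vector<double> ys;
    ys.reserve(nDraws);
    for (int k = 0; k < nDraws; ++k) {
        std::vector<double> xStar(x.size());
        for (std::size_t j = 0; j < x.size(); ++j) {
            // Assumed form of the perturbation; the actual one is defined below the formula.
            xStar[j] = x[j] + delta * (noise(rng) - 0.5);
        }
        ys.push_back(f(xStar));
    }
    return ys;
}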
The probabilistic model returns a sample, that is, multiple values of $y$. Since the true distribution is available, we can compare
the two and estimate the accuracy of the sample returned by the model.
The algorithm is called Divisive Data Resorting. I explained it in detail in the video and
in the preprint. Below I comment on the test results, assuming the reader
has watched the video and learned the concept before reading the conclusions.
Test results
Here is the result of the code execution:
Time for training 0.249 sec.
Mean relative error for mean ensemble of 32 and mean MonteCarlo 0.0333
Mean relative error for STD ensemble of 32 and STD MonteCarlo 0.1225
Passed goodness of fit tests 63 from 100
In the test I generate 100 new inputs not used in training. For each of these inputs
I obtain a sample of targets from the trained ensemble and a Monte Carlo sample representing the conventionally true
distribution. Then for each of these pairs
I calculate expectations and standard deviations.
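As an illustration only, the per-input comparison of moments can be done along the following lines. These helpers are hypothetical, not the code that produced the numbers above, and the exact normalization of the relative error used in the report may differ.

#include <cmath>
#include <vector>

// Sample mean of one set of predicted (or Monte Carlo) targets.
double mean(const std::vector<double>& s) {
    double sum = 0.0;
    for (double v : s) sum += v;
    return sum / s.size();
}

// Population standard deviation of the same sample.
double stdev(const std::vector<double>& s) {
    const double m = mean(s);
    double sq = 0.0;
    for (double v : s) sq += (v - m) * (v - m);
    return std::sqrt(sq / s.size());
}

// One plausible definition of the relative error between an ensemble statistic
// and the conventionally true (Monte Carlo) statistic; averaged over the 100
// test inputs it gives figures comparable to the mean relative errors above.
double relativeError(double ensembleStat, double mcStat) {
    return std::fabs(ensembleStat - mcStat) / std::fabs(mcStat);
}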
Below is an example of 10 pairs of expectations for comparison:
1.4530 1.4807
1.4055 1.4325
1.8577 1.9599
0.1273 0.0620
0.3711 0.3390
1.9615 1.9735
0.2727 0.1571
1.8358 1.8170
0.2665 0.1820
0.5011 0.4151
and 10 pairs of standard deviations:
0.2340 0.1063
0.2054 0.3704
0.4548 0.4213
0.4671 0.4083
0.2610 0.3026
0.1456 0.1161
0.5619 0.6823
0.0893 0.1091
0.2217 0.3302
0.3950 0.4557
The differences between the ensemble and the conventionally true statistical moments are summarized by the relative errors
0.03 and 0.12. Besides that, I implemented a goodness-of-fit test comparing the two samples, and 63% of
the pairs passed. At this point I can refer the reader to the second image, which shows
the complexity of the true distributions and their dependence on the features.
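The particular goodness-of-fit test is not specified on this page. Purely as an illustration, and not necessarily the test behind the 63% figure, here is a minimal two-sample Kolmogorov-Smirnov check that could be applied to each ensemble / Monte Carlo pair.

#include <algorithm>
#include <cmath>
#include <vector>

// Two-sample Kolmogorov-Smirnov check (illustrative only). Returns true if the
// two samples are not rejected as coming from different distributions at
// roughly the 5% level.
bool ksTwoSamplePass(std::vector<double> a, std::vector<double> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    const std::size_t n = a.size(), m = b.size();
    std::size_t i = 0, j = 0;
    double d = 0.0;  // maximum distance between the two empirical CDFs
    while (i < n && j < m) {
        const double x = std::min(a[i], b[j]);
        while (i < n && a[i] <= x) ++i;
        while (j < m && b[j] <= x) ++j;
        d = std::max(d, std::fabs(double(i) / n - double(j) / m));
    }
    // Asymptotic two-sample critical value at alpha = 0.05:
    // D_crit = c(alpha) * sqrt((n + m) / (n * m)), with c(0.05) about 1.358.
    const double dCrit = 1.358 * std::sqrt(double(n + m) / (double(n) * double(m)));
    return d < dCrit;
}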
The training dataset contains 10 000 records. Please also note how fast the training is:
about 0.25 sec. A Bayesian neural network for such a problem needs at least several minutes.
There are many details that I cannot explain on one page. Readers can find this concept explained in
the format of a regular publication.
Designing risk management systems
Predicting the distribution of the target is a very desirable but challenging task. In practical business it may not be necessary and can be relaxed to
predicting the probability of the target falling into a certain range. For example, suppose the target is the future market price of a traded object and
the goal of modeling is a decision to buy for the purpose of speculation. For effective risk management the model user is interested in whether
the predicted price will be greater than a certain value, lower than it, or within a range; the full distribution is not needed.
For each predicted sample we can define the interval that represents the risk under study. For example, if our predicted sample looks as follows:
1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 20, 28
and we want to estimate the probability of the target being in the range [2.5, 4.5] based on this specific prediction, then it is $P = 4/12$,
since 4 of the 12 sampled values fall inside the range. Now we want to assess the accuracy of such estimates, because our profit depends on it,
but there is no true distribution; we only have historical data divided into training and validation sets.
This accuracy can be assessed for the entire model, not for each individual prediction. We simply select arbitrary intervals for the other predictions,
but with the same probability. For example, if our other sample is different:
3, 3, 3, 5, 5, 5, 6, 7, 8, 9, 9, 12
we can still choose an interval associated with the same probability; in this case it is [4.5, 6.5], which again covers 4 of the 12 values.
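As a minimal sketch of both steps (the helper names are mine and do not come from the original code), here is how the range probability of one sample and a matching interval for another sample could be computed:

#include <algorithm>
#include <utility>
#include <vector>

// Empirical probability that the target lies in [left, right],
// estimated from one predicted sample.
double rangeProbability(const std::vector<double>& sample, double left, double right) {
    int inside = 0;
    for (double y : sample) {
        if (y >= left && y <= right) ++inside;
    }
    return double(inside) / double(sample.size());
}

// For another predicted sample, build an interval that carries the same empirical
// probability: its borders are midpoints between the order statistics at positions
// loIdx/loIdx+1 and hiIdx/hiIdx+1 (0-based), so it always covers hiIdx - loIdx values.
std::pair<double, double> matchingInterval(std::vector<double> sample, int loIdx, int hiIdx) {
    std::sort(sample.begin(), sample.end());
    const double left  = (sample[loIdx] + sample[loIdx + 1]) / 2.0;
    const double right = (sample[hiIdx] + sample[hiIdx + 1]) / 2.0;
    return {left, right};
}

Any borders lying strictly between the neighboring order statistics give the same empirical probability, which is why the choice [4.5, 6.5] above and the midpoint choice here are equivalent. With loIdx = 10 and hiIdx = 21 on a sorted sample of 32 values, this matches the interval construction used in the fragment at the end of this section.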
The remaining part is to run a validation test on unseen data and check how many observed targets fall into the preselected individual intervals, given that
these intervals all correspond to the same probability. In this case we do not need to know the true distribution, so the approach is applicable to observed physical systems
without identical inputs.
Here we show how it works for the same formula-generated data, but without using the true distribution. The sample size is 32. For each record in the validation set we compute
the left border as the average of elements 10 and 11 of the sorted sample and the right border as the average of elements 21 and 22. The interval between these borders covers 11 of the 32 sampled values, so the associated probability is near $11/32 \approx 0.344$.
When we run the experiment on the validation sample and count how many individual targets fall into the intervals selected from the predicted samples, we
obtain the following results for 1000 records:
325, 352, 331, 324, 357, 350, 355, 333
Each figure is obtained in a different run, which includes new generation of training and validation data, a new model, and a new test on
1000 unseen records. We can see that the model is statistically stable across the entire validation data. We did not use the
conventionally true distribution here.
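As a rough consistency check (my own back-of-the-envelope estimate, not part of the original test): treating each record as an independent trial with success probability $p = 11/32 \approx 0.344$, the expected count among $n = 1000$ records is about $344$ with a binomial standard deviation of $\sqrt{1000 \cdot 0.344 \cdot 0.656} \approx 15$, and all eight counts listed above fall within roughly $1.4$ standard deviations of that expectation.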
The code for this test is not provided in full, but it is elementary; the critical fragment is shown here, and this block goes into the
testing loop.
// Collect and sort the predicted sample (nU = 32 values) for this validation record.
sorted.clear();
for (int i = 0; i < nU; ++i) {
    sorted.push_back(sample[i]);
}
sort(sorted.begin(), sorted.end());
// Interval borders from order statistics: midpoints between elements 10-11 and 21-22,
// so the interval covers 11 of the 32 values (probability near 0.344).
double left = (sorted[10] + sorted[11]) / 2.0;
double right = (sorted[21] + sorted[22]) / 2.0;
// Count a hit when the target value for this record (here taken as the median of
// its Monte Carlo sample) falls inside the selected interval.
if (monteCarlo[nMCSize / 2] > left && monteCarlo[nMCSize / 2] < right) {
    ++right_stat;
}