At the time of publication, this code is recommended as the most user-friendly version. The list of hyperparameters that the end user
must assign is very short, and it can be found at the entry point:
//constants that the end user must assign, with some understanding of the logic
const int nAddends = 11;
const int innerMax = 8;
const int outerMax = 16;
const double mu = 0.01;
The number of addends is denoted in the formula as $nAddends$:
$$ M(x_1, x_2, \ldots, x_n) = \sum_{q=1}^{nAddends} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right). $$
The functions $\phi_{q,p}$ form the $n$ inner blocks, and the functions $\Phi_q$ form the outer block. The inner blocks take the observed inputs $x_p$; the outer block
returns the targets. All functions are piecewise linear, with linear segments of equal size. The maximum numbers of linear segments in the inner
and outer functions are defined by the hyperparameters $innerMax$ and $outerMax$. The number of linear segments grows during the training process;
this growth is part of the training algorithm, which starts from one linear segment per function and ends with the specified maximums.
//add linear segments, when needed, up to the configured maximums
for (int j = 0; j < nAddends; ++j) {
    if (addends[j]->HowManyInner() < innerMax) {
        addends[j]->IncrementInner();
    }
    if (addends[j]->HowManyOuter() < outerMax) {
        addends[j]->IncrementOuter();
    }
}
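Since every function is piecewise linear with equally sized segments, evaluating one amounts to locating the segment that contains the argument and interpolating between its endpoints. Here is a minimal sketch of that idea; the function and variable names are illustrative and not taken from the project.
#include <algorithm>
#include <vector>
//sketch: evaluate a piecewise linear function given by equally spaced
//knot values on [xMin, xMax]; all names here are illustrative
double EvaluatePWL(const std::vector<double>& knots, double xMin, double xMax, double x)
{
    int nSegments = static_cast<int>(knots.size()) - 1;
    double h = (xMax - xMin) / nSegments;            //uniform segment width
    double t = (x - xMin) / h;                       //position in segment units
    int k = std::clamp(static_cast<int>(t), 0, nSegments - 1);
    double w = t - k;                                //fraction within segment k
    return knots[k] * (1.0 - w) + knots[k + 1] * w;  //linear interpolation
}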
The last hyperparameter is the regularization coefficient, here $0.01$. Making it smaller improves the accuracy but extends the training time, and
vice versa. Values above $0.2$ are risky: the training process may diverge when the regularization parameter is too high.
Neither the result nor the training time depends significantly on the hyperparameters; end users can verify this themselves.
The code runs two concurrent training threads and chooses the best model.
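The pattern is roughly the following (a minimal sketch; $TrainModel$ is a hypothetical stand-in for the project's training routine, not its actual API):
#include <cstdio>
#include <thread>
//hypothetical stand-in for the training routine: trains one model
//and returns its final validation error
double TrainModel(unsigned seed)
{
    return 0.01 * seed; //stub result; the project runs real training here
}
int main()
{
    double err1 = 0.0, err2 = 0.0;
    std::thread t1([&] { err1 = TrainModel(1); });
    std::thread t2([&] { err2 = TrainModel(2); });
    t1.join();
    t2.join();
    //keep whichever thread achieved the lower validation error
    std::printf("best validation error %.4f\n", err1 < err2 ? err1 : err2);
    return 0;
}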
The dataset is held in a $Data$ object:
#include <memory>

class Data
{
public:
    virtual std::unique_ptr<double[]> GetInput() = 0;            //produce one input record
    virtual double GetTarget(std::unique_ptr<double[]>& x) = 0;  //target for record x
    virtual bool Populate() = 0;                                 //fill the arrays below

    std::unique_ptr<std::unique_ptr<double[]>[]> inputs;             //training inputs
    std::unique_ptr<std::unique_ptr<double[]>[]> validationinputs;   //validation inputs
    std::unique_ptr<double[]> target;                                //training targets
    std::unique_ptr<double[]> validationtarget;                      //validation targets
    int nRecords = -1;
    int nValidationRecords = -1;
    int nFeatures = -1;
};
The pure virtual functions must be overridden in a derived class.
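For illustration, a derived class for a synthetic dataset might look like the following sketch. The class name, the target formula, and the way $Populate$ fills the arrays are assumptions made for the example, not the project's actual code.
#include <memory>
#include <random>
//illustrative derived class generating a synthetic dataset
class SyntheticData : public Data
{
public:
    std::unique_ptr<double[]> GetInput() override
    {
        auto x = std::make_unique<double[]>(nFeatures);
        for (int i = 0; i < nFeatures; ++i) x[i] = dist(rng);
        return x;
    }
    double GetTarget(std::unique_ptr<double[]>& x) override
    {
        double s = 0.0; //example target: sum of squares of the features
        for (int i = 0; i < nFeatures; ++i) s += x[i] * x[i];
        return s;
    }
    bool Populate() override
    {
        nFeatures = 5; nRecords = 5000; nValidationRecords = 1000;
        inputs = std::make_unique<std::unique_ptr<double[]>[]>(nRecords);
        target = std::make_unique<double[]>(nRecords);
        for (int i = 0; i < nRecords; ++i) {
            inputs[i] = GetInput();
            target[i] = GetTarget(inputs[i]);
        }
        validationinputs = std::make_unique<std::unique_ptr<double[]>[]>(nValidationRecords);
        validationtarget = std::make_unique<double[]>(nValidationRecords);
        for (int i = 0; i < nValidationRecords; ++i) {
            validationinputs[i] = GetInput();
            validationtarget[i] = GetTarget(validationinputs[i]);
        }
        return true;
    }
private:
    std::mt19937 rng{42};
    std::uniform_real_distribution<double> dist{0.0, 1.0};
};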
Here is the output of one execution:
Training epoch 0, thread 2, RRMSE for validation data 0.1030
Training epoch 0, thread 1, RRMSE for validation data 0.1028
Training epoch 10, thread 2, RRMSE for validation data 0.0148
Training epoch 10, thread 1, RRMSE for validation data 0.0143
Training epoch 20, thread 2, RRMSE for validation data 0.0087
Training epoch 20, thread 1, RRMSE for validation data 0.0091
Training epoch 30, thread 2, RRMSE for validation data 0.0077
Training epoch 30, thread 1, RRMSE for validation data 0.0084
Training epoch 40, thread 2, RRMSE for validation data 0.0071
Training epoch 50, thread 2, RRMSE for validation data 0.0066
Training epoch 60, thread 2, RRMSE for validation data 0.0062
Training time 0.31 sec, validation errors in threads 0.0084, 0.0061
-------------------------------------------------------------------
RRMSE for test data, not used in training and validation 0.0079
The errors reported as 'RRMSE for validation data' are computed on the validation sample, which serves as the stopping condition.
The last error is reported for a separate test sample that is not involved in training or validation. RRMSE stands for relative root mean square error;
relative means that the root mean square error is divided by the range of variation of the target.
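Written out, with the range of the target values assumed as the normalizer, this is
$$ \mathrm{RRMSE} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}}{\max_i y_i - \min_i y_i}, $$
where $y_i$ are the targets, $\hat{y}_i$ the model predictions, and $N$ the sample size.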
For datasets with 5,000 to 10,000 records and 5 to 15 features, the execution time is under one second.