
Demo of Hyper Parameter Optimization

Code Elementary HPO



The identification starts concurrently in independent threads. Each thread tests a single model with a different number of addends chosen from the Fibonacci sequence $1, 2, 3, 5, 8, 13, 21, 34$. Here is an example which initializes 5 addends and starts a thread.
    nModels = 5;
    auto addends5 = std::make_unique<std::vector<KANAddendPL>>();
    for (int i = 0; i < nModels; ++i) {
        addends5->push_back(KANAddendPL(xmin, xmax, targetMin / nModels, targetMax / nModels, 2, 2, data->nFeatures));
    }
	
    double lastError5 = 0.0;
    std::thread t5(Training, std::ref(data->inputs), std::ref(data->target),
        std::ref(data->validationinputs), std::ref(data->validationtarget),
        std::ref(addends5),
        data->nRecords, data->nValidationRecords, mu, targetMin, targetMax, innerMax, outerMax, std::ref(lastError5));
Each thread uses the training and validation datasets. It stops early when the accuracy on the validation set stops improving, and returns the validation error when completed.
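The early-stopping rule can be sketched as follows. This is a hypothetical illustration, not the actual Training code: the function name, the patience value, and the idea of feeding it a sequence of per-epoch validation errors are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the early-stopping rule: stop once the validation
// error has not improved for `patience` consecutive epochs and report the
// best validation error seen so far. Names and the patience value are
// assumptions, not taken from the actual Training code.
double BestValidationError(const std::vector<double>& validationErrors,
                           int patience = 3) {
    if (validationErrors.empty()) return 0.0;
    double bestError = validationErrors[0];
    int epochsWithoutImprovement = 0;
    for (size_t epoch = 1; epoch < validationErrors.size(); ++epoch) {
        if (validationErrors[epoch] < bestError) {
            bestError = validationErrors[epoch];   // still improving
            epochsWithoutImprovement = 0;
        } else if (++epochsWithoutImprovement >= patience) {
            break;                                 // accuracy stopped improving
        }
    }
    return bestError;
}
```

With errors $0.5, 0.4, 0.45, 0.44, 0.46$ and patience 3, training stops after the third epoch with no improvement and the returned error is $0.4$.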

The regularization parameter is the constant $0.01$. It has been found to be the best statistical choice, and even varying it in the range $[0.001, 0.1]$ does not make a significant change in the result.

The number of linear blocks starts from one and grows gradually during training. This provides a move from coarse to fine tuning.
    for (size_t j = 0; j < addends->size(); ++j) {
        if (addends->at(j).HowManyInner() < innerMax) {
            addends->at(j).IncrementInner();
        }
    }
    for (size_t j = 0; j < addends->size(); ++j) {
        if (addends->at(j).HowManyOuter() < outerMax) {
            addends->at(j).IncrementOuter();
        }
    }
The maximum numbers of linear blocks for the inner and outer operators are limited; the limits are hard coded. Increasing these limits does not make a significant difference in the model.
    const int innerMax = 8;
    const int outerMax = 16;
    const double mu = 0.01;
This is an elementary example, where end users do not have to make network configuration choices. It is not as quick as a fixed configuration in a single thread, but it is not that bad: the execution time for different datasets of 10 000 records is between 1 and 2 seconds.
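The timing figure above can be reproduced with a simple wall-clock measurement. The helper below is a generic sketch (the name `ElapsedMilliseconds` is hypothetical, not part of the demo code); the HPO run would be passed in as the callable.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper measuring wall-clock time of a callable in
// milliseconds; the whole HPO run would be wrapped and passed as `work`.
long long ElapsedMilliseconds(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

`steady_clock` is preferred over `system_clock` here because it is monotonic and unaffected by clock adjustments during the run.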

It can be tested on three different datasets.
    //std::unique_ptr data = std::make_unique();
    //std::unique_ptr data = std::make_unique();
    std::unique_ptr data = std::make_unique();
The 'Formula' dataset is an algebraic expression. In 'Circles', the target is the distance between two circles, each given by three random points. In 'Triangles', the target is the area of a triangle given by random vertices.

Interestingly, 'Triangles' is a challenging dataset. It has been tried with different MLPs, including MATLAB's, and the modeling error is 4 to 6 percent. For KAN, the best result is obtained with $34$ addends and 6 features.
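Under the scheme above, choosing the winning configuration amounts to taking the addend count whose thread reported the lowest validation error. A minimal sketch, assuming the per-thread errors are gathered into an array in the same order as the Fibonacci counts:

```cpp
#include <algorithm>
#include <array>
#include <iterator>

// Selects the addend count with the lowest validation error. The eight
// candidate counts are the Fibonacci values from the text; gathering the
// per-thread errors into an array is an assumption for illustration.
int BestAddendCount(const std::array<double, 8>& validationErrors) {
    static const std::array<int, 8> addendCounts = {1, 2, 3, 5, 8, 13, 21, 34};
    auto best = std::min_element(validationErrors.begin(), validationErrors.end());
    return addendCounts[std::distance(validationErrors.begin(), best)];
}
```

For the 'Triangles' dataset the reported errors would be smallest in the last slot, so this selection yields $34$ addends.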