Mushroom classification
This example shows the application of the quantized Urysohn model to the Mushroom Data Set.
The data has 8124 records with 22 observed mushroom features. The output labels are either Edible or Poisonous. Below is
an example of the data format (first few lines):
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
The output labels are in the first column. Symbolic features were converted to sequential integers
1, 2, 3, ..., and the Urysohn model became quantized
$$y=\sum\limits_{j=1}^{n}f_j(x_j),$$
which means that the arguments $x_j$ take only integer values, the functions $f_j$ are defined only at a few points,
and a single two-dimensional array $U$ is used in the code instead of multiple quantized functions
$$y=\sum\limits_{j=1}^{n}U[j, x_j].$$
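The encoding and the table lookup can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `encode_rows` and `predict`, the numeric target (1.0 for edible, 0.0 for poisonous), and the zero-based integer codes (Python indexing, instead of 1, 2, 3, ...) are assumptions made for illustration only.

```python
# Illustrative sketch only; names and target encoding are assumptions,
# not taken from the original implementation.

def encode_rows(rows):
    """Convert symbolic features to sequential integers per column.

    Each distinct symbol in a column gets the next free integer
    (0, 1, 2, ...) in order of first appearance.
    """
    n_features = len(rows[0]) - 1           # first column is the label
    maps = [{} for _ in range(n_features)]  # one symbol->int map per feature
    labels, encoded = [], []
    for row in rows:
        labels.append(1.0 if row[0] == "e" else 0.0)  # assumed target values
        encoded.append(
            [maps[j].setdefault(sym, len(maps[j])) for j, sym in enumerate(row[1:])]
        )
    return labels, encoded, maps


def predict(U, x):
    """Quantized model: y = sum over j of U[j][x_j]."""
    return sum(U[j][x[j]] for j in range(len(x)))


# The four sample records shown above.
lines = [
    "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u",
    "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g",
    "e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m",
    "p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u",
]
rows = [line.split(",") for line in lines]
labels, encoded, maps = encode_rows(rows)

# One row of U per feature, one entry per distinct symbol of that feature.
U = [[0.0] * len(m) for m in maps]
```

Here `U` is a ragged table: row `j` has as many entries as feature `j` has distinct symbols, so evaluating the model is just 22 lookups and additions per record.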
The test is conducted as 10-fold cross-validation, which means we use 90% of the records for training and 10% for validation
and switch them 10 times, so that each record serves as a validation record exactly once. Total execution time for the
10 training runs is about 0.6 seconds, that is, about 0.06 seconds per training run and 0.0006 seconds per epoch (roughly 100 epochs per run). The number of
validation errors is typically 0, but it depends on random initialization, and sometimes a few
validation errors are reported, such as 3 wrong predictions out of 8124.
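The fold rotation described above can be sketched as follows. This is a hedged illustration, not the authors' procedure: the function name `k_fold_indices`, the shuffling step, and the fixed seed are assumptions; the source only states that 90% of the records are used for training, 10% for validation, and that the parts are switched 10 times.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Shuffling with a fixed seed is an assumption of this sketch.
    """
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [r for m, fold in enumerate(folds) if m != i for r in fold]
        yield train, validation


# Count how often each of the 8124 records is used for validation.
seen = [0] * 8124
for train, validation in k_fold_indices(8124):
    for r in validation:
        seen[r] += 1
```

With 8124 records the folds cannot all be the same size, so each validation part holds 812 or 813 records and the matching training part holds the rest.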
The error probability is so small that the authors of the code agree to eat every mushroom
that the program classifies as edible.
The data set was published in 1987 and is frequently used by students. The reported accuracy is always near 100%,
so this result is not a surprise, but the performance of the quantized Urysohn model is significantly higher than that of all
other examples that we managed to find online.
Other implementations for comparison
C#Corner. They report an accuracy of 0.992, which means about 64 errors.