Machine Learning Progress? (non-recurrent MLP)

Posted: October 21, 2013 at 4:59 pm

I’ve put the memory-leak problems in my code aside for now to continue working on the project; the next phase is the ML stuff. I’m now using FANN because, while OpenNN was nicer, more complete, and more actively developed, it does not provide the online / sequential learning functions needed for this project.

This is my second attempt to train an MLP on plausible data produced by the system. The input is a set of 41,887 state vectors (representing the presence of clusters at each moment in time) produced by a previous run of the segmentation and clustering system. Each element in a vector is a Boolean value corresponding to one perceptual cluster: 0 when the cluster is not present in the frame and 1 when it is. For training, the 0 to 1 values are scaled to -1 to +1. The previous attempt appeared to work because the output resembled the input, but after running the prototype feedback (dreaming) code I realized the network had been trained merely to reproduce the current input pattern, not to predict the next one.
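For reference, here is a minimal sketch of how such a training file could be constructed so that the target for the state at time t is the state at time t+1 rather than the state itself. This is not the actual preprocessing code; the container, the function name, and the path (matching the one read by the training code below) are assumptions:

#include <fstream>
#include <vector>

// states[t][k] is 1.0f when cluster k is present at time t, 0.0f otherwise.
// Writes a FANN training file in which each input row's target is the *next*
// row, so the network learns to predict the upcoming state instead of
// copying the current one.
void write_next_state_training(const std::vector< std::vector<float> > &states,
                               const char *path = "../data/backgroundState_FANN.data")
{
    const unsigned int num_pairs = states.size() - 1; // the last state has no successor
    const unsigned int dims = states[0].size();       // 1026 clusters per state vector

    std::ofstream out(path);
    out << num_pairs << " " << dims << " " << dims << "\n"; // FANN header: pairs, inputs, outputs
    for (unsigned int t = 0; t < num_pairs; ++t)
    {
        for (unsigned int k = 0; k < dims; ++k) out << states[t][k] << " ";     // input: state at t
        out << "\n";
        for (unsigned int k = 0; k < dims; ++k) out << states[t + 1][k] << " "; // target: state at t+1
        out << "\n";
    }
}
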

The MLP here is meant as a canonical case to compare against future sequential learning. It has three layers (1026 input, 103 hidden, and 1026 output units) and was presented the whole input set over 50 epochs, with a single state at each iteration rather than a window of states over time. The code is a modified version of FANN’s xor_example.cpp and uses the RPROP learning algorithm, with weights initialized to random values between -1 and +1.

Following is the mean-squared error reported after each epoch of training. Note the total lack of learning (no decrease in error) over the first three epochs, which suggests that sequential learning with single presentations of inputs could have a hard time. On the other hand, because the system is long-term and online, there will be no shortage of input patterns.

[image: error_canonical]

Following are the results. Each column is one moment in time and each row is a particular cluster; white means the cluster was present at that time, black means it was not. Below is a placeholder; please click on it to download the full image. Note that this is a very large image (41,887 × 3,088 pixels), but because it is an indexed PNG the file size is quite small (<7 MB). Also, Firefox does not seem to display these PNGs, so if you get an error about them being damaged, try “save link as” and view them externally. The image is made up of three groups of rows: the top group is the raw input data, the middle is the raw output of the network, and the bottom is a thresholded version of the raw output where values <0 are shown as 0 and values >0 as 1.

[image: backgroundState_canonical.results-thumb]

The complexity of the data makes analysis difficult, but the network output appears less dense and less likely to predict that a cluster persists over time (horizontal lines are less dominant). Otherwise the structure of the output is quite similar to that of the input, including the trend toward increasing sparsity. There are also vertical bands in the output that do not appear related to the input; these correspond to the spikes in the histogram below, which shows the number of clusters present in each column (time-slice):

[image: backgroundState_canonical.results.hist]

The histogram above shows, for each column, the number of outputs >0, i.e. the number of percepts predicted to be present at each time-slice. The distribution of clusters in the output indeed resembles that of the input: compare the min / mean / max of the input (3 / 52.76 / 106) to those of the output (0 / 39.19 / 131). Both input and output have a similar standard deviation (~25).
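These per-column counts amount to thresholding the raw outputs at zero and summing; a minimal sketch, assuming the raw outputs for the whole run are held in memory as one vector per time-slice:

#include <vector>

// Counts, for each time-slice (column), how many of the 1026 output units
// are above zero, i.e. how many clusters the network predicts to be present.
std::vector<unsigned int> active_counts(const std::vector< std::vector<float> > &outputs)
{
    std::vector<unsigned int> counts(outputs.size(), 0);
    for (unsigned int t = 0; t < outputs.size(); ++t)
        for (unsigned int k = 0; k < outputs[t].size(); ++k)
            if (outputs[t][k] > 0) ++counts[t];
    return counts;
}
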

Considering that learning here happens without a recurrent network, with each column of data presented on its own (not a window of states over time), I’m quite surprised by the quality of these results. Then again, I was also surprised by the results when the system was merely learning to reproduce the input, and only realized that error when I fed the network’s outputs back as inputs and got a static prediction. The next step is to run a dream simulation of the trained network, where each set of outputs becomes the next set of inputs and the first input is the last data-point in the training set. I hope this does not result in a static output again!
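As a rough sketch of what that dream loop might look like (assuming the network saved as learn_sequence_canonical.net below, seeding with the last training state, and feeding the raw outputs straight back in without thresholding):

#include "floatfann.h"
#include "fann_cpp.h"
#include <iostream>
#include <vector>

// Seed the loop with the last training state, then repeatedly feed the
// network's own prediction back in as the next input ("dreaming").
void dream(const std::vector<fann_type> &last_state, unsigned int steps)
{
    FANN::neural_net net;
    if (!net.create_from_file("learn_sequence_canonical.net"))
        return;

    std::vector<fann_type> state(last_state);
    for (unsigned int t = 0; t < steps; ++t)
    {
        fann_type *out = net.run(&state[0]);            // predict the next state
        state.assign(out, out + state.size());          // copy: run() returns an internal buffer
        for (unsigned int k = 0; k < state.size(); ++k) // print the thresholded prediction
            std::cout << (state[k] > 0 ? 1 : 0) << " ";
        std::cout << std::endl;
    }
}
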

Following is the code used to train the MLP in this canonical learning case:

/*
*
* Fast Artificial Neural Network (fann) C++ Wrapper Sample
*
* C++ wrapper XOR sample with functionality similar to xor_train.c
*
* Copyright (C) 2004-2006 created by freegoldbar (at) yahoo dot com
*
* This wrapper is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* This wrapper is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
*/


#include "floatfann.h"
#include "fann_cpp.h"

#include <ios>
#include <iostream>
#include <iomanip>
using std::cout;
using std::cerr;
using std::endl;
using std::setw;
using std::left;
using std::right;
using std::showpos;
using std::noshowpos;

// Callback function that simply prints the information to cout
int print_callback(FANN::neural_net &net, FANN::training_data &train,
    unsigned int max_epochs, unsigned int epochs_between_reports,
    float desired_error, unsigned int epochs, void *user_data)
{
    cout << "Epochs " << setw(8) << epochs << ". "
         << "Current Error: " << left << net.get_MSE() << right << endl;
    return 0;
}

// Test function that demonstrates usage of the fann C++ wrapper
void xor_test()
{
    cout << endl << "XOR test started." << endl;

    const float learning_rate = 0.7f;
    const unsigned int num_layers = 3;
    const unsigned int num_input = 1026;
    const unsigned int num_hidden = 103;
    const unsigned int num_output = 1026;
    const float desired_error = 0.01f;
    const unsigned int max_iterations = 50;
    const unsigned int iterations_between_reports = 1;

    FANN::neural_net net;
    net.create_standard(num_layers, num_input, num_hidden, num_output);

    net.set_learning_rate(learning_rate);

    net.set_activation_steepness_hidden(1.0);
    net.set_activation_steepness_output(1.0);

    net.set_activation_function_hidden(FANN::SIGMOID_SYMMETRIC_STEPWISE);
    net.set_activation_function_output(FANN::SIGMOID_SYMMETRIC_STEPWISE);

    net.print_parameters();

    cout << endl << "Training network." << endl;

    FANN::training_data data;
    if (data.read_train_from_file("../data/backgroundState_FANN.data"))
    {
        // scale data from 0-1 to -1 to 1
        data.scale_train_data(-1, 1);

        // Initialize and train the network with the data
        net.randomize_weights(-1, 1);

        cout << "Max Epochs " << setw(8) << max_iterations << ". "
             << "Desired Error: " << left << desired_error << right << endl;
        net.set_callback(print_callback, NULL);
        net.train_on_data(data, max_iterations,
            iterations_between_reports, desired_error);

        cout << endl << "Testing network." << endl;

        // for each data point (row)
        for (unsigned int i = 0; i < data.length_train_data(); i++)
        {
            fann_type *calc_out = net.run(data.get_input()[i]);
            //float *output = net.run(data.get_input()[i]);
            //cout << "input (" << data.get_input()[i][0] << "," << data.get_input()[i][1] << "): " << calc_out[0] << "," << calc_out[1] << endl;

            // for each input dimension (col)
            for (unsigned int j = 0; j < data.num_input_train_data(); j++) {
                cout << "RESULT " << i << " " << j << " " << data.get_input()[i][j] << " " << calc_out[j] << endl;
            }
        }

        // Save the network in floating point and fixed point
        net.save("learn_sequence_canonical.net");
        //unsigned int decimal_point = net.save_to_fixed("xor_fixed.net");
        //data.save_train_to_fixed("xor_fixed.data", decimal_point);

        //cout << endl << "XOR test completed." << endl;
    }
}

/* Startup function. Synchronizes C and C++ output, calls the test function
   and reports any exceptions */

int main(int argc, char **argv)
{
    try
    {
        std::ios::sync_with_stdio(); // Synchronize cout and printf output
        xor_test();
    }
    catch (...)
    {
        cerr << endl << "Abnormal exception." << endl;
    }
    return 0;
}

/******************************************************************************/