Neural Networks

Section author: Malcolm Smith <msmith.malcolmsmith@gmail.com>

Motivation, Prerequisites, Plan

Motivation

In a world where A.I. is an ever-increasing part of our lives, it’s important to have a feel for the basics (and these truly are the basics) of the mechanism by which it operates - namely neural networks, or nets. This mini-course will also introduce the residual sum of squares as a way to measure error in data fitting. The advantage of using a neural net to fit curves to data (as opposed to a polynomial, as seen in Section 7) is that a neural net is almost guaranteed not to overfit; the genetic algorithm will not be able to stumble upon such a comparatively unlikely solution.

Prerequisites

  • The 10-hour “serious programming” course.

  • The “Introduction to NetworkX Graphs” mini-course in Section 22.

Plan

The plan for this mini-course is to first understand the concept of a neural net, then work towards implementing a simple one. We will then use a genetic algorithm to fit a neural net to noisy data. By the end, we will have a versatile neural net program which can improve a neural net using either a genetic algorithm or backpropagation.

What is a Neural Net?

At a high level, a neural net attempts to emulate a brain. It has “neurons” and “synapses” which connect the neurons. An important difference is that whereas an actual neuron can only output a 1 or a 0, and the synapses carry that 1 or 0 to the next neuron, our model brain can have neurons with varying outputs. In addition, our “synapses” can change the value of that output, even changing its sign. This allows for greater complexity with fewer neurons, something valued very highly by programmers trying to optimize for efficiency. Now, let us look at how to actually implement such a net.

A neural net is a directed network whose nodes are called “neurons”. It is also weighted, which is a very important part of the way it functions. A neural net has “layers” of nodes: an input layer where your input goes (we will get to how you can input something later), some number of intermediate layers called “hidden” layers, and finally an output layer where you get a result from the net. Every neuron in one layer is connected to every neuron in the next, with one exception. Every layer except the output layer has what’s known as a bias neuron, the purpose of which we will get to later.

Now, the mechanics of how it actually works: each edge has a weight associated with it, and each neuron has something called an activation. To start off, all the weights are set; it doesn’t matter if you’ve input something or not, the weights are constant (for now). However, the activations are all currently blank (except for the bias neurons; they have a 1). When you input something, the activations of the input layer neurons are set to the input. Then, along each edge, the activation of the origin neuron is multiplied by the weight of the edge. Then, at every neuron in the next layer, those products are summed for all incoming edges. This sum becomes the activation of the neuron, and thus the next layer can be calculated. Now, for the bias neurons. They don’t have any incoming edges, and their activation is always 1. They do connect to every neuron in the next layer, though, and this allows the net to have “default” activations for each of the neurons.
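
To make that concrete, here is a minimal sketch of how one neuron’s activation is computed from the previous layer. The neuron names, activations, and weights below are made up purely for illustration; this is not part of our final program.

# hypothetical activations and weights for one neuron with two incoming edges plus a bias
previous_activations = {'l0_0': 0.5, 'l0_1': -0.2, 'l0_b': 1}    # the bias neuron's activation is always 1
incoming_weights     = {'l0_0': 0.8, 'l0_1': 0.3, 'l0_b': -0.1}  # one weight per incoming edge

activation = 0
for neuron in previous_activations:
    activation += previous_activations[neuron] * incoming_weights[neuron]

print(round(activation, 2))  # 0.5*0.8 + (-0.2)*0.3 + 1*(-0.1) = 0.24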

If that seemed like a lot, don’t worry. It was. For the programming, we will take it nice and slow, building from the ground up.

NetworkX Refresher

If you recall from the NetworkX chapter, the library is a powerful tool for manipulating and visualizing mathematical networks. If you don’t remember the basic commands, I would recommend going back and looking at the “first steps” sections, as they contain many useful commands that we will be using throughout the course.

With that being said, there are some new commands that we need to be aware of as well:

  • nx.multipartite_layout(G, layer_dict) will create the proper layout for displaying a neural net. It returns a dictionary that can be plugged into the pos kwarg in nx.draw().

  • G.add_node(node, attribute=value) will add a node with an attribute. This will be used to record activations.

  • G.nodes[node][attribute] = new_value can be used to change the attribute of a node.
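
If you want a quick feel for the attribute commands before we use them, you can try something like this in an interactive session (the node name 'l0_0' here is just an example):

$ python3
>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_node('l0_0', activation=1)
>>> G.nodes['l0_0']['activation']
1
>>> G.nodes['l0_0']['activation'] = 0.5
>>> G.nodes['l0_0']['activation']
0.5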

If you ever feel like something with the NetworkX library doesn’t make sense, go check Section 22. It’ll probably be in there.

First Neural Net

To start, we will create a neural net with 3 layers. The input and output layers will have 1 neuron each, and the hidden layer will have 3. To create it, you can run

$ python3
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> Net = nx.DiGraph()
>>> Net.add_nodes_from(['l0_0','l0_b','l1_0','l1_1','l1_2','l1_b','l2_0'])

The naming scheme here is fairly simple: the first number is the layer and the second is the position within the layer. A b in the second position means the neuron is a bias neuron. Now let us add the edges.

>>> Net.add_edges_from([['l0_0', 'l1_0'],['l0_0', 'l1_1'],['l0_0', 'l1_2'],['l0_b', 'l1_0'],['l0_b', 'l1_1'],['l0_b', 'l1_2'],['l1_0', 'l2_0'],['l1_1', 'l2_0'],['l1_2', 'l2_0'],['l1_b', 'l2_0']])

That’s a lot of edges! In our tiny net, we already have ten edges. Clearly, we’re going to have to make a function to do this for us. First, though, let us visualize the net with:

>>> layer_dict = {0:['l0_0','l0_b'], 1:['l1_0','l1_1','l1_2','l1_b'], 2:['l2_0']}
>>> pos = nx.multipartite_layout(Net, layer_dict)
>>> nx.draw(Net, pos=pos)
>>> plt.show()

When run, this code should produce something like figure 23.4.1:

Figure 23.4.1: Basic neural network

Note that this net doesn’t have any weights or activations yet. That’s OK; this was just a way to help understand the structure of the nets, not the functionality. While it may look like we have two input neurons, keep in mind that one of those is a bias neuron. The reason you can’t tell the difference is that input neurons don’t have incoming edges anyway, so there’s no visual way to tell them apart. Our next step is to make a function that can generate these neural net shells for us, and save us all the typing that goes along with making even small nets. Something like this should work:

make_blank_net.py - creates and displays a neural net with specified layer sizes.
import networkx as nx
import matplotlib.pyplot as plt
import random # for weights

def main():
    simple_net = generate_blank_net([1,3,1]) # create our simple example net
    layer_dict = generate_layer_dict(simple_net)
    pos = nx.multipartite_layout(simple_net, layer_dict)
    nx.draw(simple_net, pos=pos)
    plt.savefig('simple-net.png')
    print('New net saved to \'simple-net.png\'.')

def generate_blank_net(layer_sizes): # creates a network with a specified number of neurons in each layer and random weights.
    blank_net = nx.DiGraph()
    for i in range(len(layer_sizes)):
        for j in range(layer_sizes[i]):
            neuron = f'l{i}_{j}' # properly formats the neuron names
            blank_net.add_node(neuron)
        blank_net.add_node(f'l{i}_b', activation=1)
    blank_net.remove_node(f'l{len(layer_sizes) - 1}_b') # remove bias neuron in output layer

    for n_1 in blank_net.nodes:
        for n_2 in blank_net.nodes:
            # we need to check if the neurons are in adjacent layers, and make sure that the second neuron isn't a bias.
            if int(n_1[1]) + 1 == int(n_2[1]) and n_2[3] != 'b':
                blank_net.add_edge(n_1, n_2, weight=random.random()*2 - 1)

    return blank_net

def generate_layer_dict(net): # make a dictionary of a given neural net keyed by layer.
    layer_dict = {}
    for neuron in net.nodes:
        if int(neuron[1]) not in layer_dict.keys():
            layer_dict[int(neuron[1])] = []
        layer_dict[int(neuron[1])].append(neuron)
    return layer_dict


main()

If you open the file simple-net.png, it should contain essentially the same image as the one displayed above. We now have a method for easily generating neural nets of any size, which will be useful when we move on to larger nets.
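
As a quick example of that flexibility (assuming you temporarily comment out the main() call at the bottom of make_blank_net.py so that importing it doesn’t re-run the whole script), you could build and draw a wider net interactively, something like:

>>> from make_blank_net import *
>>> bigger_net = generate_blank_net([2, 4, 4, 1])
>>> len(bigger_net.nodes)  # 3 + 5 + 5 + 1, counting one bias neuron per non-output layer
14
>>> nx.draw(bigger_net, pos=nx.multipartite_layout(bigger_net, generate_layer_dict(bigger_net)))
>>> plt.show()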

Inputting and Outputting with Neural Nets

Now let us add a function that can evaluate a neural net for a given input. But before we do that, we need to look at an important feature of neural nets called an activation function. An activation function is basically a simple nonlinear function that you apply to the sum of products of weights and activations from the previous layer. You could use any nonlinear function, such as x squared or the square root of x, but for this example we will use the hyperbolic tangent of x. The reason for this is that it always outputs a value between -1 and 1, which makes keeping track of the activations convenient.
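
As a quick aside (this little snippet is just for intuition and is not part of our program), you can see in an interactive session how tanh squashes any input, no matter how large, into the range (-1, 1):

$ python3
>>> import numpy as np
>>> for x in [-100, -3, 0.5, 100]:
...     print(round(float(np.tanh(x)), 3))
...
-1.0
-0.995
0.462
1.0

With that in mind, let us implement the activation function as part of a full forward pass: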

helper_functions.py - these are the basic functions necessary to start doing more complex things with neural nets.
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import random # for weights

def main():
    simple_net = generate_blank_net([1,3,1]) # create our simple example net
    layer_dict = generate_layer_dict(simple_net)
    input_layer = [2]
    output_layer = calc_all_layers(input_layer, simple_net, layer_dict)
    input_layer.pop(-1) # to get rid of the 1 from the bias neuron
    print(input_layer, output_layer)

def generate_blank_net(layer_sizes): # creates a network with a specified number of neurons in each layer and random weights.
    blank_net = nx.DiGraph()
    for i in range(len(layer_sizes)):
        for j in range(layer_sizes[i]):
            neuron = f'l{i}_{j}' # properly formats the neuron names
            blank_net.add_node(neuron)
        blank_net.add_node(f'l{i}_b', activation=1)
    blank_net.remove_node(f'l{len(layer_sizes) - 1}_b') # remove bias neuron in output layer

    for n_1 in blank_net.nodes:
        for n_2 in blank_net.nodes:
            # we need to check if the neurons are in adjacent layers, and make sure that the second neuron isn't a bias.
            if int(n_1[1]) + 1 == int(n_2[1]) and n_2[3] != 'b':
                blank_net.add_edge(n_1, n_2, weight=random.random()*2 - 1)

    return blank_net

def generate_layer_dict(net): # make a dictionary of a given neural net keyed by layer.
    layer_dict = {}
    for neuron in net.nodes:
        if int(neuron[1]) not in layer_dict.keys():
            layer_dict[int(neuron[1])] = []
        layer_dict[int(neuron[1])].append(neuron)
    return layer_dict

def calc_next_layer(previous_layer_index, net, layer_dict): # given a neural net and the index of the last calculated layer, iterate the calculation one layer further.
    for n_1 in layer_dict[previous_layer_index + 1]: # n_1 is the neuron that gets a new activation
        activation = 0
        if n_1[3] != 'b':
            for n_2 in layer_dict[previous_layer_index]:
                activation += net.nodes[n_2]['activation'] * net[n_2][n_1]['weight']
            net.nodes[n_1]['activation'] = np.tanh(activation)
        else:
            net.nodes[n_1]['activation'] = 1

    return net

def calc_all_layers(input_layer, net, layer_dict): # returns the output layer given a neural net and an input layer.
    if input_layer[-1] == 1:
        input_layer.pop(-1) # double check that bias neuron hasn't been included in the input layer
    if len(input_layer) != len(layer_dict[0]) - 1: # make sure the input layer is valid
        raise Exception(f'Input layer must be the same length as the first layer of nodes! (input is {input_layer} and first layer is {len(layer_dict[0])} long)')
    input_layer.append(1) # add bias neuron value
    for i in range(len(input_layer)):
        net.nodes[layer_dict[0][i]]['activation'] = input_layer[i]

    for i in range(len(layer_dict) - 1):
        net = calc_next_layer(i, net, layer_dict)

    output_layer_neurons = layer_dict[np.max(list(layer_dict.keys()))]
    output_layer_neurons.sort(key=lambda x: x.split('_')[1])
    output_layer = []
    for neuron in output_layer_neurons:
        output_layer.append(float(np.arctanh(net.nodes[neuron]['activation'])))
    
    return output_layer
        

main()

When run, this should output two lists: one with a two (or whatever you put for the input) and the other with a random number. This number would normally be between -1 and 1, but it isn’t because of the np.arctanh() in the calc_all_layers() function. The reason we have that is to make it possible for the net to fit functions that have y values outside of [-1, 1]. The randomness is to be expected; we populated the weights with random numbers, so the output will be random. To get meaningful results, there are a couple of options. The most common and most powerful method is known as backpropagation, which involves calculating how much each weight is affecting the final output and adjusting it. However, the math behind it is well beyond the scope of this course (we will still implement it at the end, skipping the theory). The method we will be using, known as the genetic algorithm, is much simpler: it makes a bunch of copies of the net, randomly alters them, and then checks if they did better than the original. The definition of “better” is something we will get to when defining the cost function.

Note: for all future programs, we will import all four of the helper functions with from helper_functions import *. After running the last program, you should delete the main() function from helper_functions.py so it doesn’t interfere with the main() functions of the files we will be importing it into.
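
A slightly tidier alternative, if you would rather not delete main() outright, is Python’s standard import guard. Wrapping the call at the bottom of helper_functions.py like this makes main() run only when the file is executed directly, not when it is imported:

if __name__ == '__main__':
    main()

Either approach works; the important thing is that importing helper_functions should not run its main().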

For simplicity’s sake, we will use the net to fit a curve to data. In practice, this involves only a single input neuron (the x coordinate) and a single output neuron (the y coordinate). This makes it pretty much a trivial example, but it’s an easy one to understand and a good place to start.

The next step is to define a cost function. To do this, we will use a residual sum of squares (RSS) approach. This sounds complicated, but all it means is that you take the value that you want the net to output (i.e. the height of a data point) and subtract it from what the net actually output. Then you square that to account for negatives, and sum it up across all data points. For our purposes, we will also divide by the number of data points, as we don’t want the cost to depend on the size of the data set.
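
For example, if the net outputs 0.5 and 1.2 for two data points whose true heights are both 1.0, the cost is ((0.5 - 1.0)**2 + (1.2 - 1.0)**2) / 2 = (0.25 + 0.04) / 2 = 0.145. In code, the cost function looks like this: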

def eval_cost(net_outputs, expected_outputs):
    RSS = 0
    for i in range(len(net_outputs)):
        for j in range(len(net_outputs[i])):
            RSS += ((net_outputs[i][j] - expected_outputs[i][j]) ** 2) / len(net_outputs[i]) # make error independent of output layer size
    RSS /= len(net_outputs) # make error independent of data set size
    return RSS

The next step will be to add a function that can make and tweak copies of our neural net. Something like this will work:

genetic_neural.py - first steps in coding the genetic algorithm
from helper_functions import *

def eval_RSS(net_outputs, expected_outputs): # calculate the RSS for a given set of outputs
    RSS = 0
    for i in range(len(net_outputs)):
        for j in range(len(net_outputs[i])):
            RSS += ((net_outputs[i][j] - expected_outputs[i][j]) ** 2) / len(net_outputs[i]) # make error independent of output layer size
    RSS /= len(net_outputs) # make error independent of data set size
    return RSS
            
def make_children(net, variance, num_children): # creates a certain number of copies of a net, then tweaks them all slightly.
    children = []
    for i in range(num_children):
        child = nx.DiGraph(net)
        for edge in child.edges:
            child.edges[edge]['weight'] += random.random() * variance - variance / 2
        children.append(child)
    return children

We don’t have a main() function here, which means that this currently won’t do anything. However, it is a good idea to take a look through the functions and try to understand what they do, as they will be used in the next step.
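
If you want to poke at these functions before the full program, a quick interactive experiment might look something like this (the net’s weights are random, but the two checks shown here are deterministic):

>>> from genetic_neural import *
>>> parent = generate_blank_net([1, 3, 1])
>>> children = make_children(parent, 0.1, 3)
>>> len(children)
3
>>> round(eval_RSS([[0.5], [1.2]], [[1.0], [1.0]]), 3)
0.145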

Fitting Curves with Neural Nets

To fit a curve, we have to work iteratively, meaning over many repeated generations. It would require far too much processing power to create enough copies of a net that one would happen to fit the data properly; what we will do instead is choose the best nets of a previous generation to be copied and tweaked for the next one. For the data, it’s not super important what it actually is, so for now we will use Python to generate noisy data for us. This gives us precise control over the shape and noise level of that data, something we would not have if we were using a data set off the internet. With all that being said, here’s the program:

genetic_neural.py - a complete data fitting neural network algorithm.
from helper_functions import *

def main():
    poly_coeffs = [0.4, -1, 1, 1]
    domain = [-2,1.5]
    num_points = 100
    noise = 0.6
    num_children_per_net = 50
    num_survive = 5
    variance = 0.1
    generations = 150
    net_dimensions = [1,5,5,5,1]
    
    first_parent = generate_blank_net(net_dimensions)
    xs, ys = generate_polynomial_data(poly_coeffs, domain, num_points, noise)
    next_gen = make_next_gen([first_parent], xs, ys, num_children_per_net, num_survive, variance) # create first generation
    
    inputs = []
    expected_outputs = []
    for i in range(len(xs)):
        inputs.append(xs[i][0])
        expected_outputs.append(ys[i][0])  # reformat xs and ys for matplotlib
    
    for i in range(generations):
        next_gen = make_next_gen(next_gen, xs, ys, num_children_per_net, num_survive, variance)
        print(f'The RSS is {eval_total_cost(next_gen[0], xs, ys) * 1000} after generation {i+1}.') # the times 1000 makes the RSSs a reasonable value
        
        outputs = []
        for j in range(len(xs)):
            outputs.append(calc_all_layers(xs[j], next_gen[0], generate_layer_dict(next_gen[0])))
        plt.clf()
        plt.plot(inputs, expected_outputs)
        plt.plot(inputs, outputs)
        plt.savefig('fitted-curve.png')
        print('Figure saved to fitted-curve.png')

                    

def eval_RSS(net_outputs, expected_outputs): # gets RSS for a given set of outputs
    RSS = 0
    for i in range(len(net_outputs)):
        for j in range(len(net_outputs[i])):
            RSS += ((net_outputs[i][j] - expected_outputs[i][j]) ** 2) / len(net_outputs[i]) # make error independent of output layer size
    RSS /= len(net_outputs) # make error independent of data set size
    return RSS

def eval_total_cost(net, input_layers, expected_output_layers): # properly formats inputs to eval_RSS
    net_outputs = []
    layer_dict = generate_layer_dict(net)
    for i in range(len(input_layers)):
        net_outputs.append(calc_all_layers(input_layers[i], net, layer_dict))
    cost = eval_RSS(net_outputs, expected_output_layers)
    return cost / len(input_layers)
                       
def make_children(net, variance, num_children): # creates a specified number of tweaked copies of a given net
    children = []
    for i in range(num_children):
        child = nx.DiGraph(net)
        for edge in child.edges:
            child.edges[edge]['weight'] += random.random() * variance - variance / 2
        children.append(child)
    return children

def make_next_gen(parents, input_layers, expected_output_layers, num_children_per_net=20, num_survive=1, variance=0.1): # heart of the program. it will create a bunch of copies of a net, tweak them slightly, then pick out the best ones to create the next generation.
    children = parents.copy() # make sure RSS doesn't go up
    best_children = []
    costs = []
    for parent in parents:
        loop_children = make_children(parent, variance, num_children_per_net)
        for child in loop_children:
            children.append(child)
    for child in children:
        costs.append(eval_total_cost(child, input_layers, expected_output_layers))
        
    for i in range(num_survive):
        best_child_index = costs.index(np.min(costs))
        best_children.append(children[best_child_index])
        children.pop(best_child_index)
        costs.pop(best_child_index)

    return best_children
                                 
def generate_polynomial_data(coefficients, domain, num_points, noise=0): # returns two formatted lists: one for inputs, the other for expected outputs
    xs = np.linspace(domain[0], domain[1], num_points)
    ys = []
    xs_lists = [] # because of numpy syntax, we have to make two xs arrays
    for i in range(len(xs)):
        # add variation to the data to give fitting a thorough test
        xs[i] += random.random() * (domain[1] - domain[0]) * noise / (0.1 * num_points) - (domain[1] - domain[0]) * noise / (0.2 * num_points)
        xs_lists.append([xs[i]])
        # input_layers is a list of lists, so this formats the inputs properly
        y = 0
        for j in range(len(coefficients)):
            y += (coefficients[j] * (xs[i] ** j)) + (random.random() * noise) - (noise / 2)
        ys.append([y]) # same as above but for outputs
    return xs_lists, ys

main()

There’s a lot of code here! A lot of it is just formatting various arrays for functions, which isn’t that interesting. However, make_next_gen() is densely packed with meaning. The first block of code is simply defining variables. The only thing of note is the first line, which starts children off as a copy of the parents. This is so that if, by some fluke, all the children have RSSs higher than the parents’, the RSS still won’t go up. The program then makes all the children and adds their RSSs to the costs array. Then, because the indexing is the same, we look through costs for the lowest costs and find the matching children to return.

The main() function is also very long, and it carries most of the meaning. All of the variables at the top can be changed, and you should feel free to change them and see what happens. A word of caution, though: if you increase num_points, num_children_per_net, num_survive, the inner values of net_dimensions, any value in poly_coeffs, or generations too much, it’ll take an incredibly long time to run. If it takes over 5 minutes, I would recommend hitting Ctrl-C and reducing one of those variables.

You may have noticed in the list of variables not to increase too much “the inner values of net_dimensions.” That’s because the outer values, the sizes of the input and output layers, must be one in this case. We only have 1 input and 1 output, namely the x value and the y value. If you change either of those, the program will throw an error.
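
If you just want quick, rough experiments, one possible set of smaller values to swap into main() is shown below. These are only suggestions, not the values used to produce the figures in this section; the fits will be cruder, but each run will finish much faster.

num_points = 50
num_children_per_net = 20
num_survive = 3
generations = 50
net_dimensions = [1, 4, 4, 1]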

With that out of the way, let us look at what this program produces when run with the values from the listing above. (Fig 23.6.1)

Figure 23.6.1: A noisy polynomial being fitted by a neural network.

That’s a pretty nice fit. It totally avoids overfitting, and it doesn’t require any input about the degree of the polynomial (while it is true that the information does exist within the program, the neural net training never accesses it). It is possible that, if you let it run for an absurd amount of time, it would overfit. But because we’re limiting it to a reasonable amount of time, it should produce nice, smooth fits.

Another nice thing about this program is that it updates the saved figure after every generation. You can see this in the output, which should show the RSS and print that it saved a figure. If you open the file it’s saving to (fitted-curve.png), you’ll be able to watch the neural net’s output improve in near real time, which gives an interesting perspective on the way it fits to data.

Backpropagation

So far we have been using a genetic algorithm to update the net. This works and is easy to understand, but it is not the most efficient method. What large-scale neural nets use is known as backpropagation, which is somewhat tricky to understand. Basically, it goes through the network backward after a given iteration and figures out how each individual activation affected the final RSS. Then it uses that information to figure out which way, and by how much, each weight should be changed to move all those activations in the proper directions. Understanding all the theory and math behind this requires knowledge equivalent to at least Calculus III, so we will skip that and go from a high-level description straight to the implementation.

The backpropagation algorithm has two basic parts: calculating neuron errors and adjusting the weights. The first part can be accomplished by manipulating various values, such as the weights, activations, and errors of other neurons; the general formula can be seen in the backpropagate() function of listing 23.7.1. The second part involves the errors, as well as something called the learning rate. The learning rate is a number that controls how fast the net’s weights can change. For our purposes, it will likely be between 0.1 and 0.001. The way it calculates the amount to change a weight by is fairly simple: it multiplies the error of the neuron at the end of the edge by the activation of the neuron at the start, then multiplies all of that by the learning rate. This can be understood as “what is wrong with the node we affect” * “the amount we can affect that node” * “some constant”. This algorithm is much faster than the genetic algorithm, though it has some trouble fitting close to the endpoints.
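
In code-like shorthand, the weight update for a single edge looks like the sketch below. The numbers and variable names here are made up purely to illustrate the rule; the real version appears in update_weights() in the listing.

# hypothetical numbers for a single edge, just to illustrate the update rule
learning_rate = 0.05
start_activation = 0.8   # activation of the neuron at the start of the edge
end_error = -0.3         # error of the neuron at the end of the edge
weight = 0.4             # current weight of the edge

weight -= learning_rate * start_activation * end_error
print(round(weight, 3))  # 0.4 - 0.05 * 0.8 * (-0.3) = 0.412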

backpropagation.py - the fastest and best method for training neural nets
from helper_functions import *

def main():
    poly_coeffs = [0.4, -1, 1, 1]
    domain = [-2, 1.5]
    num_points = 100
    noise = 0.75
    iterations = 1000
    learning_rate = 0.05

    
    nnet = generate_blank_net([1,5,5,5,1])
    inputs, expected_outputs = generate_polynomial_data(poly_coeffs, domain, num_points, noise)
    for iteration in range(iterations):
        outputs = []
        for i in range(len(inputs)):
            output = calc_all_layers(inputs[i], nnet, generate_layer_dict(nnet))
            outputs.append(output)
            backpropagate(nnet, inputs[i], output, expected_outputs[i])
            update_weights(nnet, learning_rate)
            
        if (iteration + 1) % 20 == 0:
            inputs_without_bias = []
            for input in inputs:
                inputs_without_bias.append(input[0])
            plot_current_fit(inputs_without_bias, outputs, expected_outputs)
        
def backpropagate(net, input, output, expected_output):
    layer_dict = generate_layer_dict(net)
    for layer in reversed(sorted(layer_dict.keys())):
        if layer == len(layer_dict.keys()) - 1:
            for i in range(len(layer_dict[layer])):
                net.nodes[layer_dict[layer][i]]['error'] = (output[i] - expected_output[i]) * (1 - np.tanh(output[i]) ** 2)
        else:
            for current_neuron in layer_dict[layer]:
                net.nodes[current_neuron]['error'] = 0
                for previous_neuron in layer_dict[layer + 1]:
                    if previous_neuron[3] == 'b':
                        continue
                    current_error = net.edges[(current_neuron, previous_neuron)]['weight'] * net.nodes[previous_neuron]['error'] * (1 - np.tanh(net.nodes[current_neuron]['activation']) ** 2)
                    net.nodes[current_neuron]['error'] += current_error

def update_weights(net, l_rate):
    ## update weights based on error of concluding neuron and activation of beginning neuron
    for edge in net.edges:
        net.edges[edge]['weight'] -= l_rate * net.nodes[edge[0]]['activation'] * net.nodes[edge[1]]['error'] 
                     
def generate_polynomial_data(coefficients, domain, num_points, noise=0): # returns two formatted lists: one for inputs, the other for expected outputs
    xs = np.linspace(domain[0], domain[1], num_points)
    ys = []
    xs_lists = [] # because of numpy syntax, we have to make two xs arrays
    for i in range(len(xs)):
        # shift all the x values slightly; unlike before, the random noise here is only added to the y values
        xs[i] += 0.1
        xs_lists.append([xs[i]])
        # input_layers is a list of lists, so this formats the inputs properly
        y = 0
        for j in range(len(coefficients)):
            y += (coefficients[j] * (xs[i] ** j)) + (random.random() * noise) - (noise / 2)
        ys.append([y]) # same as above but for outputs
    return xs_lists, ys

def plot_current_fit(inputs, outputs, expected_outputs):
    plt.clf()
    flat_outputs = sum(outputs, [])
    flat_expected_outputs = sum(expected_outputs, [])
    plt.plot(inputs, flat_expected_outputs)
    plt.plot(inputs, flat_outputs)
    plt.savefig('fitted-curve.png')
    print('Figure saved to fitted-curve.png')


main()

This is also a lot of code. Some of it is directly copied from the genetic algorithm code, but most of it is new. The backpropagate() function contains all the horrible math, so be aware of that when you look through it. With that being said, here’s what this program can produce:

Figure: An OK fit.

Something that should strike you right away is that the fit is not as good as the one from the genetic algorithm. There are a couple of reasons for this, the main one being that neural nets, and especially backpropagation, aren’t really designed for this kind of curve fitting. Fitting data is still a great way to learn about them, since a simple input/output structure helps give an intuition for what they are and what they can do. Using this exact code, and merely tweaking the main() function to feed in data for hand-drawn digits, this setup could tell you what a new digit is, even one it has never seen before.