Improving AI Neural Net Weights

Discussion in 'PA: TITANS: General Discussion' started by Quitch, May 26, 2018.

  1. Quitch

    So I find myself with the ability to run neural net training for the AI, and my long-term goal is to release some improved neural network weights as a mod.

    I have access to configure the following:
    • Armies
    • System
    • Learning rate
    • Momentum
    • Number of passes
    By default, neural net training is performed on a 650 radius moon between two armies. I haven't touched this, as a moon seems like the ideal place to run such training: it avoids the noise that terrain features would introduce into the outcomes. Neural network error values are generated for every output after every game. The AI has neural networks for land, fighters and bombers, which it calls as the tasks land_attack, fighter_attack and bomber_attack. Training can be done for one or more of these networks.

    The land network has 28 outputs:
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "Fabber - Air - Orbital",
    • "Fabber - Air - Orbital",
    • "Fabber - Air - Orbital",
    • "Structure & (MetalProduction | EnergyProduction)",
    • "Structure & (MetalProduction | EnergyProduction)",
    • "Structure & (MetalProduction | EnergyProduction)",
    • "Structure & SurfaceDefense",
    • "Structure & SurfaceDefense",
    • "Structure & SurfaceDefense",
    • "(Structure & Defense) - Wall - SurfaceDefense - Orbital",
    • "(Structure & Defense) - Wall - SurfaceDefense - Orbital",
    • "(Structure & Defense) - Wall - SurfaceDefense - Orbital",
    • "Commander",
    • "Commander",
    • "Commander",
    • ""
    The fighter network has 6 outputs:
    • "Mobile & (Air | Transport)",
    • "Mobile & (Air | Transport)",
    • "Mobile & (Air | Transport)",
    • "Mobile & (Air | Transport)",
    • "Fabber & Air",
    • ""
    The bomber network has 11 outputs:
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))",
    • "Mobile & AirDefense",
    • "Fabber - Air - Orbital",
    • "Structure & (MetalProduction | EnergyProduction)",
    • "Structure & AirDefense",
    • "(Structure & Defense) - Wall - AirDefense - Orbital",
    • "Commander",
    • ""
    I know the AI operates by selecting between different orders, such as targeting by DPS, so my assumption is that where there are multiple identical outputs, each represents the success of one of these orders against those targets. This is a total guess though.

    Now, I don't know a lot about neural networks. I may use the wrong terms, and it wouldn't surprise me if I'm making a number of bad assumptions. Let me say right now that I'll be using "iteration" to mean a single game, while I've used "epoch" to mean all iterations run with the same learning rate and momentum settings, though I'm not sure that's the correct use of the terminology. I'm also assuming that the neural network error values represent how far the output was from the expected result, where the desired value is zero, as that would indicate the expected and actual outcomes were perfectly aligned.

    The neural networks shipping with the game seem to have each been trained over 6 epochs totalling between 675 and 1200 iterations, with an increasing number of iterations per epoch. The learning rate starts at 0.6 and momentum at 0.7, reducing to 0.1 and 0 respectively. The momentum change surprises me, because everything I've read says that if you change momentum at all, it should increase as the learning rate decreases. This discrepancy could indicate I don't understand what momentum is being used for in the PA neural network.
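
    For reference, this is the textbook SGD-with-momentum update I'm assuming these settings map onto (an assumption on my part, not something pulled from the engine):

    Code:
    # Textbook gradient descent with momentum (my assumption of what the
    # learning rate and momentum settings feed into, not PA's actual code).
    # The velocity term carries over a fraction of the previous step, which
    # is why momentum and learning rate are usually tuned together.
    def momentum_step(weights, grads, velocity, learning_rate, momentum):
        for i in range(len(weights)):
            velocity[i] = momentum * velocity[i] - learning_rate * grads[i]
            weights[i] += velocity[i]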

    I decided to stick with a pretty simple exponential learning rate decline and a fixed momentum. In what I'd read there were a lot of references to 0.9 being considered a "good enough" momentum figure, with momentum mattering less to output quality than using the right learning rates and number of iterations. That's assuming the momentum value the game accepts means what I think it does, and that I've understood the concept. The number of iterations was to be constant across epochs.

    I was planning to try to determine the optimal starting learning rate, but after some initial runs I decided that a meaningful number of iterations would take too long, and I wasn't really confident that I even understood how to interpret the results. 0.2 and 0.1 seemed to be common starting points, with even 0.1 being referred to as a large learning rate. To that end I decided to carry out 3 epochs of 400 iterations each with momentum set to 0.9, and learning rates of 0.1, 0.01 and 0.001 respectively.
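
    Written out, that schedule looks like this (the layout is descriptive only, not the game's actual config format):

    Code:
    # The schedule I settled on: a tenfold learning rate drop per epoch,
    # momentum fixed at 0.9, iterations constant across epochs.
    TRAINING_SCHEDULE = [
        {"epoch": 1, "iterations": 400, "learning_rate": 0.1,   "momentum": 0.9},
        {"epoch": 2, "iterations": 400, "learning_rate": 0.01,  "momentum": 0.9},
        {"epoch": 3, "iterations": 400, "learning_rate": 0.001, "momentum": 0.9},
    ]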

    I've run both the vanilla neural net training and my own version; 2398 of the 2400 games completed successfully. Below are the recorded network error values for every output of every network. The left column for each network is the result after epoch 1, iteration 1, while the right column is the result after the final iteration of the final epoch.
  2. Quitch

    Code:
           |                                             Quitch                                            |                                            Vanilla 
    Output |             Land              |            Fighter            |            Bomber             |             Land              |            Fighter            |            Bomber     
           |     First     |     Final     |     First     |     Final     |     First     |     Final     |     First     |     Final     |     First     |     Final     |     First     |     Final
    1      |  0.0329028000 |  0.0061851800 |  0.1310050000 |  0.0011532000 |  0.1292060000 |  0.0001792040 |  0.0083646800 |  0.0054935300 |  0.0396398000 |  0.0288579000 |  0.0935859000 |  0.0016901000
    2      |  0.1136700000 |  0.0005845220 |  0.1207870000 |  0.0009936700 |  0.1557470000 |  0.0000019718 |  0.0446668000 |  0.0009178920 |  0.0934296000 |  0.0000308521 |  0.0933920000 |  0.0015054100
    3      |  0.1533510000 |  0.0005627950 |  0.1031290000 |  0.0004535160 |  0.1389280000 |  0.0003568620 |  0.0191228000 |  0.0054687600 |  0.0475935000 |  0.0001246740 | -1.0000000000 | -1.0000000000
    4      |  0.0210732000 |  0.0046596100 |  0.0729719000 |  0.0004079090 |  0.1145210000 |  0.0000408328 |  0.1036990000 |  0.0002537300 |  0.0206073000 |  0.0001367830 |  0.1804120000 |  0.0029044400
    5      |  0.0007282930 |  0.0004158000 |  0.0750440000 |  0.0004295870 | -1.0000000000 |  0.0000197559 |  0.0091388400 |  0.0000724154 | -1.0000000000 |  0.0084099600 | -1.0000000000 |  0.0000838604
    6      |  0.0820305000 |  0.0009783770 | -1.0000000000 |  0.0109219000 | -1.0000000000 |  0.0000150855 |  0.0625671000 |  0.0025960900 | -1.0000000000 |  0.0000077729 | -1.0000000000 | -1.0000000000
    7      |  0.0434460000 |  0.0003428610 |               |               | -1.0000000000 |  0.0000113764 |  0.0322804000 |  0.0008326760 |               |               | -1.0000000000 |  0.0015059200
    8      |  0.0243504000 |  0.0017061600 |               |               | -1.0000000000 |  0.0000053687 |  0.0303185000 |  0.0112834000 |               |               |  0.1519790000 | -1.0000000000
    9      |  0.0102110000 |  0.0011777200 |               |               | -1.0000000000 |  0.0000108432 |  0.0317598000 |  0.0033870700 |               |               | -1.0000000000 | -1.0000000000
    10     |  0.0008867920 |  0.0055312000 |               |               |  0.1131250000 |  0.0001559130 |  0.0474005000 |  0.0006706770 |               |               |  0.1802780000 |  0.0010331900
    11     |  0.0563538000 |  0.0007294210 |               |               |  0.1720040000 |  0.0024939900 |  0.0001752400 |  0.0004316090 |               |               | -1.0000000000 |  0.0740266000
    12     |  0.0018041500 |  0.0010910800 |               |               |               |               |  0.0800815000 |  0.0008677780 |               |               |               | 
    13     | -1.0000000000 | -1.0000000000 |               |               |               |               | -1.0000000000 |  0.0000377801 |               |               |               | 
    14     | -1.0000000000 | -1.0000000000 |               |               |               |               | -1.0000000000 | -1.0000000000 |               |               |               | 
    15     | -1.0000000000 |  0.0083719100 |               |               |               |               |  0.0010171800 | -1.0000000000 |               |               |               | 
    16     |  0.0053388900 |  0.0081492200 |               |               |               |               |  0.0614809000 |  0.0000149914 |               |               |               | 
    17     | -1.0000000000 |  0.0004005590 |               |               |               |               |  0.0534463000 |  0.0029624100 |               |               |               | 
    18     |  0.0793586000 |  0.0005034900 |               |               |               |               |  0.0065075500 |  0.0004140490 |               |               |               | 
    19     | -1.0000000000 |  0.0000284054 |               |               |               |               | -1.0000000000 | -1.0000000000 |               |               |               | 
    20     | -1.0000000000 |  0.0049566600 |               |               |               |               |  0.0019805700 |  0.0000041682 |               |               |               | 
    21     | -1.0000000000 |  0.0000000207 |               |               |               |               | -1.0000000000 |  0.0000061345 |               |               |               | 
    22     | -1.0000000000 | -1.0000000000 |               |               |               |               | -1.0000000000 |  0.0000138864 |               |               |               | 
    23     |  0.0131093000 |  0.0002837700 |               |               |               |               | -1.0000000000 |  0.0000044396 |               |               |               | 
    24     | -1.0000000000 |  0.0000074383 |               |               |               |               |  0.0008408900 |  0.0000883835 |               |               |               | 
    25     |  0.0023453300 |  0.0001811050 |               |               |               |               |  0.0651575000 |  0.0002410570 |               |               |               | 
    26     |  0.0877678000 |  0.0000379507 |               |               |               |               |  0.0583101000 |  0.0000727479 |               |               |               | 
    27     |  0.0672948000 |  0.0001967170 |               |               |               |               |  0.0838285000 |  0.0006334450 |               |               |               | 
    28     | -1.0000000000 |  0.0004349620 |               |               |               |               |  0.0642724000 |  0.0000741555 |               |               |               |
    -------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------
    AVG    | -0.3287134766 | -0.1054458238 | -0.0828438500 |  0.0023932970 | -0.3796790000 |  0.0002992003 | -0.1833422482 | -0.1058270259 | -0.2997883000 |  0.0062613237 | -0.4818502818 | -0.3561136800
    
    Quitch final network AVG: -0.0342511088
    Vanilla final network AVG: -0.1518931274
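
    As a sanity check on my arithmetic, the final network averages are simply the mean of the three per-network final-epoch columns, -1 values included:

    Code:
    # Final network AVG = mean of the three per-network final-epoch averages.
    quitch_final = (-0.1054458238 + 0.0023932970 + 0.0002992003) / 3
    vanilla_final = (-0.1058270259 + 0.0062613237 - 0.3561136800) / 3
    print(quitch_final)   # -0.0342511088...
    print(vanilla_final)  # -0.1518931274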

    Now, the first thing to note is that I might be generating nonsense results here. I stated earlier my assumption about what the network error value represents, but I could be dead wrong, so bear that in mind when reading these concluding thoughts.

    While the Quitch network scores slightly more accurately, there's not a lot in it. I think. I'm not entirely sure what would count as a big difference versus a small one. It also needs to be weighed against the fact that the neural network is only one part of the AI in Planetary Annihilation.

    The one very interesting area is the bomber neural network, where a significant difference can be seen in the final result. I can only assume that the larger learning rate the vanilla network runs with is causing the bomber network to overadjust, and for some reason this is proving a bigger factor here than in the other networks.

    The other item of interest is that in both land networks there are three outputs which remain at -1, a seemingly terrible error. I have no idea why this is happening. At first I thought it represented a possible error in the network, but the affected outputs don't mirror each other between vanilla and mine, so that doesn't seem to be the case. Certainly, with errors that big I would take it to mean more training is likely required on the land network to improve the quality of results for these outputs. I'm not sure whether I should go with a higher opening learning rate, more iterations, or an even lower learning rate in case this represents the error bouncing around the target. Understanding how to graph this stuff might help.

    Owing to flaws in the AI's understanding of threat, there are likely some inherent flaws baked into the training. Firstly, the AI sees the Leveler as a 300 DPS threat when the true threat is 600 DPS. It suffers a similar problem with the Bumblebee, where it only sees the DPS of one bomb, not the entire load. Land training would likely be improved by substituting in a Leveler with either a single 600 DPS shell or two 300 DPS weapons operating from the same turret. Likewise, the Bumblebee's weapon should probably be changed to something which doesn't carpet bomb, so the AI gets a realistic outcome based on the numbers in play.
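
    To illustrate with made-up numbers chosen only to reproduce the 300 vs 600 figure (the real unit stats may differ):

    Code:
    # The AI's threat calculation appears to ignore projectiles_per_fire,
    # so a weapon firing two shells per volley is counted at half its DPS.
    damage_per_projectile = 100.0  # assumed value, for illustration only
    projectiles_per_fire = 2       # the Leveler fires two shells per volley
    volleys_per_second = 3.0       # assumed value, for illustration only

    ai_threat = damage_per_projectile * volleys_per_second                        # 300.0
    true_dps = damage_per_projectile * projectiles_per_fire * volleys_per_second  # 600.0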

    I could insert the Legion Expansion mod into training, but I don't see any benefits to doing so and I think units like the Earthshaker would throw the results off.
  3. NikolaMX

    Can you crowdsource the simulations? Would love to help by lending a couple of cores.
  4. Quitch

    I wouldn't have thought so, no. Each iteration needs to feed back into the network.
  5. Quitch

    To eliminate the known issue with the Leveler threat, I changed projectiles_per_fire from 2 to 1 and re-ran my training. This improved the overall results, with an average error value of -0.0686652820. However, two outputs still had an error value of -1, and both were different from last time.
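
    For anyone wanting to reproduce this, the change is a small unit file shadow. A sketch of how I'd generate it (the weapon file path is my assumption of where the Leveler's spec lives, so check your install):

    Code:
    import json

    # Shadow the Leveler's weapon spec with projectiles_per_fire dropped to 1.
    # PA unit specs are JSON; "base_spec" inherits everything else unchanged.
    # The path assumes the Leveler is tank_heavy_armor; verify before using.
    override = {
        "base_spec": "/pa/units/land/tank_heavy_armor/tank_heavy_armor_tool_weapon.json",
        "projectiles_per_fire": 1,
    }

    with open("tank_heavy_armor_tool_weapon.json", "w") as f:
        json.dump(override, f, indent=4)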

    I will need to run a series of tests with different learning rates and iteration counts to see if that helps. I will likely also disable Locusts, as their AntiSurface threat value is approximate at best and could throw off results: they damage metal, not health, and I don't think the neural network is set up to understand how that difference matters.

    It's also possible that the problem is caused by shots missing due to known issues with targeting in PA. Enough instances of that could throw the numbers off, as the AI wouldn't realise it was missing the target. This would explain why these outliers are only seen in the land neural net, though it doesn't appear to correlate perfectly with the affected outputs.
  6. exterminans

    If I'm not mistaken, this is only to get "more opinions" on especially complex outputs. So e.g. for "(Mobile - Air - Orbital) | (Structure - (Orbital - Land))", the maximum of all matching outputs counts.

    This is partially to account for outputs which, as you have noticed, will simply fail to produce meaningful output, or only do so very situationally. Keep in mind that PA's neural net is rather shallow, with only a single hidden layer, and it's also optimized for horizontal size, so the training may be unable to make correct use of all output nodes.

    A modern approach to this problem would be to choose a much broader network and then eliminate redundant or dead neurons before shipping to production. But that wasn't quite state of the art yet when PA's training system was implemented.
  7. Quitch

    Sorry, but what does "the maximum" mean in this instance?
  8. exterminans

    Output node with highest value wins, and determines the target selection.
    If the node without a name wins, it means retreat.
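
    In pseudo-code terms, a minimal sketch of what I mean (my description, not the actual PA implementation):

    Code:
    # Duplicate outputs scoring the same target filter are collapsed by
    # taking their maximum; the best-scoring filter wins, and the unnamed
    # output ("") means retreat.
    def pick_target(filters, outputs):
        best = {}
        for name, value in zip(filters, outputs):
            best[name] = max(best.get(name, float("-inf")), value)
        winner = max(best, key=best.get)
        return "retreat" if winner == "" else winner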

    Don't bother too much with re-training, though. The same faults which screw with training also screw with execution of the network. The bugs with the Leveler and the like would need to be fixed in the DPS calculation; just fixing them for training still leaves the AI unable to use these units properly.

    The networks also have neither the input percepts nor suitable filters on the output nodes to allow for synergies (e.g. land platoons can't even prioritize taking down Anti-Air, nor can they detect that a nearby air force would make that necessary).

    Without access to the code, not much can be improved.
  9. Quitch

    Since the AI doesn't train per unit, there's no reason to train it with bad Leveler data. All that does is cause it to cock up its calculations for all units in all situations, since the Leveler causes it to draw bad conclusions. Correcting for that won't fix situations where the Leveler is around, but will where it's not.

    I understand there are rather limited gains to be had, but it's interesting to try regardless.
