Sunday livestream

Discussion in 'Planetary Annihilation General Discussion' started by Sorian, February 14, 2015.

  1. Sorian

    Sorian Official PA

    Messages:
    998
    Likes Received:
    3,844
    Ok, there are two reasons why I think that is an incorrect assumption.

    1) During training the platoons only care about the units in their own platoon. While this can cause some fuzziness when there are allied units around, this should be an edge case given the sheer amount of training.
    2) Attacking anti-surface defenses and attacking other (non anti-surface) defenses are two separate outputs, which means they should learn differently.

    If after more intense training I still see weird behaviors like you mention I will re-evaluate the inputs and adjust them.
  2. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    That's IMHO the biggest weakness of the AI. We both know that it does a great job at micro-managing platoons and taking the action which makes the best use of an individual platoon during encounters in the open field, when each side occupies only a single movement layer. But it completely fails in all these "edge cases" during an attack on a base, where a simple joint attack could have caused far more damage for less metal investment.
    Ok, that even means it didn't just happen by chance. During the last match, the ground force in the final assault even prioritized the AA tower over the laser tower (and all factories) with no allied air around.

    And I don't think that the net will learn that relation. It has probably seen the tower as a target without risk and a cheap kill, and maybe as a generic threat. But the way the training is set up, it can't possibly have grasped the relation between not killing AA and a FUTURE loss of an air platoon.

    EDIT:
    Unless...
    What would happen if the training also applied retroactively to platoons which have previously encountered the enemy?
    E.g. if platoon A retreats against platoon C, and platoon B later on (within a range of 10-30 seconds) gets killed by platoon C, then A gets rewarded for not dying, but also punished for the death of B.

    That way, the net could be trained to respect the consequences of disregarding allies.
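
    Roughly, in Python and with every name, hook and constant made up purely to illustrate the idea:
    Code:
        # Sketch: when platoon B dies to platoon C, also punish platoon A
        # if A encountered C shortly before and walked away.
        import time

        RETRO_WINDOW = 30.0  # seconds; the "10-30 seconds" range from above

        recent_encounters = []  # (timestamp, our_platoon, enemy_platoon, training_sample)

        def record_encounter(our_platoon, enemy_platoon, sample):
            recent_encounters.append((time.time(), our_platoon, enemy_platoon, sample))

        def on_platoon_killed(victim_platoon, killer_platoon, penalty):
            """Platoon B was just killed by platoon C: propagate the penalty
            to earlier platoons that met C within the window."""
            now = time.time()
            for t, our_platoon, enemy, sample in recent_encounters:
                if enemy is killer_platoon and our_platoon is not victim_platoon \
                        and now - t <= RETRO_WINDOW:
                    apply_training(sample, penalty)  # hypothetical trainer hook

        def apply_training(sample, reward):
            # Placeholder: feed the stored inputs plus the adjusted reward
            # back into whatever owns the neural net.
            pass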
    Last edited: February 19, 2015
  3. Sorian

    Sorian Official PA

    Messages:
    998
    Likes Received:
    3,844
    That is true, it won't learn about future losses. That isn't the intended use of the neural net. The intended use is the tactical decision making of a single platoon, not as an overall strategic decision making tool. Making a neural network that controlled the strategic decision making processes of the AI would be a huge undertaking.
  4. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    Long term strategy: Yes, that's out of scope.

    Short term however should be possible with surprisingly little effort.

    During training, just remember for 5-20 seconds where a platoon was, and who it encountered (not necessarily who it fought, but who was in proximity). With a lower weight, scaled by the time passed, also apply training to it when a different platoon has an encounter at either the same position or with the same opponents.

    That way, a platoon is actually rewarded for saving another platoon, or punished for causing its loss, which should cause the network to make assertions about upcoming encounters.
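
    As a tiny sketch of the weighting I have in mind (window length, linear decay and the matching rule are all just placeholders):
    Code:
        ENCOUNTER_WINDOW = 20.0  # seconds; somewhere in the 5-20 second range above

        def retro_weight(seconds_since_encounter):
            """Scale the retroactive training down the longer ago the encounter was."""
            if seconds_since_encounter >= ENCOUNTER_WINDOW:
                return 0.0
            return 1.0 - seconds_since_encounter / ENCOUNTER_WINDOW

        def encounters_match(old, new, radius=100.0):
            """Credit/blame an old encounter if it happened at roughly the same
            position or involved the same enemy platoon (attributes made up)."""
            dx = old.position[0] - new.position[0]
            dy = old.position[1] - new.position[1]
            same_place = (dx * dx + dy * dy) ** 0.5 <= radius
            return same_place or old.enemy_id == new.enemy_id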
  5. Sorian

    Sorian Official PA

    Messages:
    998
    Likes Received:
    3,844
    That would take a surprising amount of work, actually.

    [Edit] You did give me an idea on how I can condense the inputs, however. Well, maybe.[/Edit]

    [Edit 2] By the way, in case it isn't clear from my replies, I am enjoying this conversation @exterminans. I don't get to talk neural networks often because I am the only person I know really using them in games. [/Edit 2]
    Last edited: February 19, 2015
  6. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    Yes, it wouldn't be free and I didn't mean to say it was easy, but it appears cheaper than all the alternatives. Well, that is assuming there are any other options for making platoons aware of short-term strategic considerations, short of writing an entirely new AI for that task.

    Well, one more point on the ever growing "things to try when I have nothing better to do" list.

    EDIT: The Chronocam does offer that log if I'm not mistaken. It's basically a look backwards whenever an event occurs which causes training. Emitting additional events during the simulation could aid that, at least for encounters, so no additional parsing of the game state would be required.
    Last edited: February 19, 2015
  7. crizmess

    crizmess Well-Known Member

    Messages:
    434
    Likes Received:
    317
    Sounds a bit like temporal difference learning.

    But keep in mind that, since a platoon doesn't have any information about what other platoons are around, this will converge to a static set of probabilities for what kinds of platoons may encounter an enemy within the next few seconds.
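
    For reference, the plain TD(0) value update looks roughly like this (just the textbook form with made-up state names, nothing PA-specific):
    Code:
        # Textbook TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
            td_error = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
            V[state] = V.get(state, 0.0) + alpha * td_error

        # e.g. a future air loss "leaking" backwards to the decision not to kill the AA:
        V = {"air_platoon_lost": -1.0}
        td0_update(V, state="near_intact_aa_tower", reward=0.0, next_state="air_platoon_lost")
        # V["near_intact_aa_tower"] is now -0.09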
    exterminans likes this.
  8. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    @crizmess Thank you so much! I had no idea what it is called, as I just made up what seemed logical to me, based on what I saw in PA. So inputs which evaluate other allied platoons are actually necessary; good to have that backed up as a fact as well.

    PS: Whoa, the research on that topic is about 30 years old by now :eek:

    @sorian That 2010 paper about applying that approach to board games looks very promising: http://www.cs.bris.ac.uk/Publications/Papers/2000100.pdf
    stuart98 likes this.
  9. someonewhoisnobody

    someonewhoisnobody Well-Known Member

    Messages:
    657
    Likes Received:
    361
    Just wondering, what is the "zu" namespace you keep using? My best guess is that it is a custom math library that Uber has made.

    Also what build system are you guys using? You said it was some Python system so I assume it isn't CMake. Maybe GYP? Or some in house magic?
  10. Sorian

    Sorian Official PA

    Messages:
    998
    Likes Received:
    3,844
    zu is our low level engine stuff. crom is on top of that, and the engine sits on top of that. They are just namespaces to differentiate the different layers.

    Our python build stuff is some other set of magic.
    Remy561 and someonewhoisnobody like this.
  11. stevenrs11

    stevenrs11 Active Member

    Messages:
    240
    Likes Received:
    218
    I just played a game that has a very good example of this, in combination with another problem I think.

    There was a choke point between a lava lake and a mountain that the AI was constantly sending units through, so I built a single line of walls with two double laser turrets behind them. Intelligently (or by chance, not sure), the AI more or less just avoided the choke.

    Then I built a pelter behind them. Now I'm not sure if this was just chance or the AI said to itself, "Oh crap pelter KILL IT NOW", but the AI started attacking those walls like nuts. This is where the problem started.

    It would send its mixed army of bolos and infernos at the walls, just into range of its tanks. They would fire a shot or two, then retreat, taking losses in the process. Over and over again. If at any point it had actually committed and gotten the infernos into range of the walls, it would have wrecked the defensive line. With the AI's two combat fabbers, it would probably have lost only two of its five infernos, and none of its tanks. If it had kept the combat fabbers in range as well, it might not have lost any units at all.

    It got worse when the AI sent a second platoon into the choke. The two platoons were moving in opposite directions, at one point trapping the majority of the infernos (moving away from the wall) in range of the turrets while preventing the tanks (moving towards the wall) from entering their firing range.

    In the end, these two double-barreled turrets behind walls were firing continuously for 56 seconds against a vastly superior force that should have totally wiped them out.

    Here is a video of the relevant bits- link (when it uploads)

    Also, I wanted to give you the lobby ID/replay ID, but the game didn't show up in my recent games.
    Last edited: February 20, 2015
    thelordofthenoobs likes this.
  12. Sorian

    Sorian Official PA

    Messages:
    998
    Likes Received:
    3,844
    I think I have a fix for the back and forth thing. Should be fixed further once the new neural networks are done.
  13. stevenrs11

    stevenrs11 Active Member

    Messages:
    240
    Likes Received:
    218


    I intended to post the lobby ID as well, so sorry for the uninformative (but cinematic) camera angle.
  14. crizmess

    crizmess Well-Known Member

    Messages:
    434
    Likes Received:
    317
    Yes, TD learning is really old. I never seriously worked on machine learning, so take all my talk for what it is, mostly ramblings. ;)
    TD learning is the dual form of Q learning (meaning both solve the same problem, but via perpendicular approaches) - by the way, Q learning is from 1989. If you look at their update rules it is obvious that both are direct "consequences" of the Markov decision process (plus some magic to actually prove the convergence, which is really the important part).
    That's the reason why they popped up so early. Once most of the math about Markov chains was in place, it was just a matter of time.
    And they are still being used, because they are really fundamental and universal: once you have a (hidden) Markov model - and those are almost everywhere - you can use them to approximate it. The math behind it is really simple, and the update function is really easy to understand.
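
    For comparison, the textbook Q learning update works on state-action values; the max over the next actions is what distinguishes it from the plain TD(0) value update sketched a few posts up (again purely generic, nothing PA-specific):
    Code:
        # Q learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        def q_update(Q, state, action, reward, next_state, actions,
                     alpha=0.1, gamma=0.9):
            best_next = max(Q.get((next_state, a), 0.0) for a in actions)
            td_error = reward + gamma * best_next - Q.get((state, action), 0.0)
            Q[(state, action)] = Q.get((state, action), 0.0) + alpha * td_error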

    BTW: @sorian the AI could use a Markov model to approximate the damage graph to learn about unit compositions. It is really easy (more or less ;) ). I'll write something up if you're interested.
