Extremely slow flocking algorithm, move to GPU?

Discussion in 'Support!' started by Hexamfelonious, August 10, 2013.

  1. Hexamfelonious

    Hexamfelonious New Member

    Messages:
    3
    Likes Received:
    0
    The recent patch (51853) seems to have slowed games down to a crawl. The only thing in the patch notes that seems likely to explain this is the new flocking algorithm. Is the boids algorithm implemented server-side on CPU? If so, are you considering eventually moving it to the GPU (still server side)? There are some implementations out there that work pretty well in real-time with several thousand units and still have room to scale.

    Thanks so much for the fantastic game so far. I've already gotten more than $90 worth of value out of it, and I count myself lucky that there's still plenty of excellent gameplay to be had!
  2. KNight

    KNight Post Master General

    Messages:
    7,681
    Likes Received:
    3,268
    The real question is how many server farm servers actually have GPUs......

    Mike
  3. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    Not many, but the issue is not that you can't buy servers with GPUs; that's entirely possible if you have the need for it. It's just that the regular server is designed for different tasks. If you take a look at modern compute clusters, you will see that many of them use GPUs, and the very same hardware is available for single servers too.

    Also, don't forget that most private servers are likely to be regular gaming rigs, which do have powerful GPUs at hand.

    However, moving to the GPU isn't necessarily required. There is still a lot of room for improvement on the algorithmic level, which means that even though GPUs are without any doubt more powerful, the CPU can still be sufficient.
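    As a rough illustration of the kind of algorithmic improvement meant here (a minimal sketch, not PA's actual code): a uniform grid lets each unit look only at nearby cells instead of comparing against every other unit. The function names, the 2D tuple layout and the choice of cell size equal to the search radius are all assumptions made for this example.

```python
from collections import defaultdict


def neighbors_brute(points, radius):
    """O(n^2) reference: compare every unit against every other unit."""
    r2 = radius * radius
    result = {i: set() for i in range(len(points))}
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                result[i].add(j)
    return result


def neighbors_grid(points, radius):
    """Same result, but each unit only inspects the 3x3 block of cells around it.

    Assumption: cell size == radius, so any unit within `radius` must sit in
    the same cell or one of the 8 adjacent cells.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // radius), int(y // radius))].append(idx)

    r2 = radius * radius
    result = {i: set() for i in range(len(points))}
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // radius), int(y // radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() avoids creating empty cells in the defaultdict
                for j in cells.get((cx + dx, cy + dy), ()):
                    if i == j:
                        continue
                    xj, yj = points[j]
                    if (x - xj) ** 2 + (y - yj) ** 2 <= r2:
                        result[i].add(j)
    return result
```

    With units spread fairly evenly, the grid version does work roughly proportional to the number of units times the local density, rather than to the square of the unit count. Heavily clumped units would still degrade toward the brute-force case.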
  4. Hexamfelonious

    Hexamfelonious New Member

    Messages:
    3
    Likes Received:
    0
    I hope so. The goal of a million units is daunting, however. Even a few divisions and multiplications on that many objects are painful when done on the CPU, let alone when each unit is also affected by the units surrounding it. It looks to be simultaneously a terrifying and exciting challenge.
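    To put a number on that (a back-of-the-envelope sketch; the figure of 20 neighbors per unit is purely an assumption for illustration):

```python
def pair_checks_brute(n):
    """Number of distinct unit pairs if every unit is tested against every other."""
    return n * (n - 1) // 2


def pair_checks_local(n, avg_neighbors):
    """Number of checks if each unit only looks at its nearby units."""
    return n * avg_neighbors


n = 1_000_000
print(pair_checks_brute(n))      # 499999500000 pair checks per tick
print(pair_checks_local(n, 20))  # 20000000 checks per tick (assumed 20 neighbors)
```

    The gap between the two numbers is why the neighbor search matters far more than the per-unit arithmetic itself.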
  5. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    I doubt PA will ever use any kind of GPU for this. It's much more likely that we'll see better multithreading on CPUs in the future.
  6. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    Actually they can use OpenCL which can run on both CPU and GPU.

    Anyway, it would be completely useless, because Uber runs its game servers on cloud services and none of them have GPUs. :)
  7. Raevn

    Raevn Moderator Alumni

    Messages:
    4,226
    Likes Received:
    4,324
    GPUs aren't suited for this in any case. Give them a simple thing to repeat a massive number of times, that doesn't require data fetching or outside interaction, and they'll do well. Pathfinding has too many branches, conditionals and information fetching, all of which perform vastly better on CPUs.
  8. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    I think Hexamfelonious isn't talking about using the GPU for all pathfinding. He is talking specifically about the flocking algorithm, which can be calculated on a GPU extremely fast.

    Check this demo as an example:
    http://www.youtube.com/watch?v=E67jVgcBZS0

    It would be really interesting to hear what Elitron thinks about OpenCL usage in PA. I understand that it won't help much for now (because the servers don't have GPUs), but if we want super large battles with millions of units, it could be extremely helpful.
  9. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    My guess would be that it is way too expensive to move all the required data to the GPU, do calculations on it and move it back.
  10. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    That's surely not the case. You have ridiculous bandwidths available in every participating component, so copying the data isn't so much of a problem as you are imagining.
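    A quick estimate makes this concrete. This is only a sketch under stated assumptions: 32 bytes per unit (say, position and velocity as floats) and 8 GB/s of usable bus bandwidth are illustrative numbers, not measurements of PA or of any particular server.

```python
def transfer_ms(units, bytes_per_unit, bandwidth_gb_s, round_trip=True):
    """Estimated time in milliseconds to copy unit state to the GPU (and back).

    Ignores per-transfer latency and driver overhead; bandwidth is in GB/s
    with 1 GB = 1e9 bytes.
    """
    total_bytes = units * bytes_per_unit * (2 if round_trip else 1)
    return total_bytes / (bandwidth_gb_s * 1e9) * 1000.0


# Assumed figures: one million units, 32 bytes each, 8 GB/s, there and back.
print(transfer_ms(1_000_000, 32, 8.0))
```

    Under those assumptions the round trip comes out at roughly 8 ms per simulation step. Whether that is cheap or expensive depends on the tick rate and on how much of the state actually has to move each step.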

    A GPU is about 14 times as efficient as a CPU when it comes to computing power per watt (excluding special cases where the GPU has no native instructions, e.g. encryption or ultra-wide data words). The limitations when moving work to the GPU are rarely of a technical nature. The real limits are the choice of algorithm, since some can barely run in parallel, and the additional effort required to port the algorithm to OpenCL.

    You also can't say in general that branching is bad. It does decrease performance, because each cluster of cores in the GPU is forced to follow the same code path: if the code path diverges, the cores that chose the other path are forced to pause, so the execution time CAN equal the sum of all code fragments, even fragments that a given thread never uses. But you should not forget about the optimizing compiler, which can resolve many of these issues for you, or at least reduce the penalty to a minimum by shrinking the diverging sections.
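    The divergence penalty described above can be modeled in a few lines. This is a toy cost model, not a simulation of real hardware: the function name, the group of 32 threads and the cycle counts are assumptions chosen for illustration.

```python
def group_cost(conditions, cost_then, cost_else):
    """Cost of one if/else for a group of threads that must execute in lockstep.

    `conditions` holds one boolean per thread. A path costs time as soon as
    at least one thread in the group takes it; the other threads just wait.
    """
    takes_then = any(conditions)
    takes_else = not all(conditions)
    return (cost_then if takes_then else 0) + (cost_else if takes_else else 0)


uniform = [True] * 32             # every thread takes the same branch
diverged = [True] * 31 + [False]  # a single thread takes the other branch

print(group_cost(uniform, 10, 30))   # only the "then" path is paid for
print(group_cost(diverged, 10, 30))  # both paths are paid for
```

    In this model a single diverging thread makes the whole group pay for both paths, which is the "sum of all code fragments" case. Shrinking the diverging section corresponds to lowering `cost_then` and `cost_else` by moving shared work outside the branch.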

    Historic limitations, where each thread would take at least the time of the longest POSSIBLE path to execute, no longer exist on modern hardware.

    There is only one thing you can't do with OpenCL or any other GPGPU language:
    Locking. Your algorithm must run stably without any locks. There are extensions which provide atomic sections, but these usually come at a HUGE cost.
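    One common way to get by without locks is double buffering: every unit reads only the previous state and writes only its own slot in the next state. The sketch below shows the idea on the CPU with a single simplified flocking rule (alignment only) and a brute-force neighbor scan. The function name, the `order` parameter and the tuple layout are assumptions for illustration, not anything from PA.

```python
def flock_step(pos, vel, radius, dt, order=None):
    """One alignment-only flocking step using double buffering.

    Each unit's new velocity is the average velocity of all units within
    `radius` (itself included). Reads go to the old buffers, writes go to
    slot i of the new buffers, so iterations never conflict and could run
    in any order or in parallel without locks.
    """
    n = len(pos)
    new_pos = [None] * n
    new_vel = [None] * n
    r2 = radius * radius
    for i in (order if order is not None else range(n)):
        sum_x = sum_y = 0.0
        count = 0
        for j in range(n):  # brute-force neighbor scan, kept simple on purpose
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            if dx * dx + dy * dy <= r2:
                sum_x += vel[j][0]
                sum_y += vel[j][1]
                count += 1
        # count >= 1 because unit i is always within range of itself
        vx, vy = sum_x / count, sum_y / count
        new_vel[i] = (vx, vy)
        new_pos[i] = (pos[i][0] + vx * dt, pos[i][1] + vy * dt)
    return new_pos, new_vel


pos = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
vel = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
forward = flock_step(pos, vel, radius=2.0, dt=1.0)
backward = flock_step(pos, vel, radius=2.0, dt=1.0, order=[2, 1, 0])
print(forward == backward)  # processing order does not change the result
```

    Because no unit ever reads a value that another unit is writing during the same step, the order of processing is irrelevant. That independence is what a lock-free GPU kernel would rely on.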
  11. neutrino

    neutrino low mass particle Uber Employee

    Messages:
    3,123
    Likes Received:
    2,687
    I've addressed this in other threads. Go find them, but I'm not a fan of using the GPU for this for many reasons.

    Also, you guys are assuming way too much about what causes performance differences in a build.
