Optimization wonderland extravaganza

Discussion in 'Planetary Annihilation General Discussion' started by varrak, June 27, 2014.

  1. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    I think I can provide a solution to that specific problem:

    Use a map for visibility instead of testing every unit against every other.
    A 32-bit integer map, at the same resolution as the flow-field maps. Update the map ONLY on unit movement or activation/deactivation of a radar channel, and perform the visibility test for a unit towards a certain player by sampling inside that map. A value > 0 means that at least one unit of that player sees this location.

    This way you can do it in linear time, which is definitely better than the n² worst-case complexity you have right now. It even works with more complex recon models like gradual visibility, different unit-dependent thresholds and the like!

    If you are going to do it on the CPU only: make sure to only update the pixels that actually change on unit movement; don't redraw the entire vision radius. That way even mobile radar units remain quite inexpensive.
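    A minimal sketch of what I mean, assuming a flat 2D grid (the names and the circular stamp are made up, this isn't PA code): each player gets one int32 counter per cell, units stamp +1/-1 over their vision footprint as they gain or lose sight of an area, and a visibility query becomes a single array read.

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical per-player visibility counter map, same resolution as the
    // flow-field grid. counts[cell] > 0 means at least one of the player's
    // units currently sees that cell.
    struct VisionMap {
        int width, height;
        std::vector<int32_t> counts;

        VisionMap(int w, int h) : width(w), height(h), counts(w * h, 0) {}

        // Stamp a circular vision footprint with +1 (gained sight) or -1
        // (lost sight). On movement, stamp -1 at the old position and +1 at
        // the new one; a smarter delta update would touch only the crescent
        // of cells entering/leaving the circle instead of both full discs.
        void stamp(int cx, int cy, int r, int32_t delta) {
            for (int y = std::max(0, cy - r); y <= std::min(height - 1, cy + r); ++y)
                for (int x = std::max(0, cx - r); x <= std::min(width - 1, cx + r); ++x)
                    if ((x - cx) * (x - cx) + (y - cy) * (y - cy) <= r * r)
                        counts[y * width + x] += delta;
        }

        // O(1) query, no per-unit pair tests.
        bool visible(int x, int y) const { return counts[y * width + x] > 0; }
    };
    ```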

    And if you choose to offload that specific task to the GPU (told you GPU support on the server would come in handy ;) ): even better. Projecting onto a cube-mapped sphere is as easy as it could get. All you need to do is perform a perspective transformation to render the delta into the recon map: first apply the transformation to the unit position to get the location inside the texture for each side of the cube, and then apply it to the stencil.
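    To make the projection step concrete, here's a toy helper (my own sketch, not anyone's engine code) that maps a direction from the planet centre onto a cube face and its texel coordinates; the divide by the dominant axis is exactly that 90° perspective transformation. Real cube-map conventions flip some axes per face, but any consistent convention works as long as reads and writes share it.

    ```cpp
    #include <cmath>

    struct CubeCoord { int face; float u, v; };

    // Map a direction (x, y, z) from the sphere centre to a face index
    // (0..5) and [0,1]^2 coordinates on that face.
    CubeCoord toCubeFace(float x, float y, float z) {
        float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
        CubeCoord c;
        if (ax >= ay && ax >= az) {            // X-dominant: +X / -X face
            c.face = (x > 0.f) ? 0 : 1;
            c.u = 0.5f * (y / ax + 1.f);       // perspective divide by the
            c.v = 0.5f * (z / ax + 1.f);       // dominant axis
        } else if (ay >= az) {                 // Y-dominant: +Y / -Y face
            c.face = (y > 0.f) ? 2 : 3;
            c.u = 0.5f * (x / ay + 1.f);
            c.v = 0.5f * (z / ay + 1.f);
        } else {                               // Z-dominant: +Z / -Z face
            c.face = (z > 0.f) ? 4 : 5;
            c.u = 0.5f * (x / az + 1.f);
            c.v = 0.5f * (y / az + 1.f);
        }
        return c;
    }
    ```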

    But wait: There is more!
    You can even use non-circular vision that way; everything you can express as a greyscale texture works. Well, you DO lose the option to calculate recon in 3D space, but it's not like that would be required ;)
  2. bgolus

    bgolus Uber Alumni

    Messages:
    1,481
    Likes Received:
    2,299
    Okay, there's a bit of confusion here. All modern GPUs are multithreaded internally; the difference in multithreading between OpenGL, DirectX, and Mantle is in how the CPU interacts with the GPU's CPU-side drivers.

    For OpenGL, only one CPU thread can issue API calls to the OpenGL driver for a particular "context". A "context" can be thought of as what's being rendered in a single window; it's a bit more subtle than that, but that's an easy way to think about it. API calls are things like: how big is the window we're rendering to, what textures does it need to know about, what shaders, here's what and how to render something, and what image to render it to.

    NVidia's driver itself is actually multithreaded, though. It takes those commands on a single CPU thread and hands them off to multiple threads to upload to the GPU. AMD's and Intel's drivers are single threaded. The GPU itself, though, is always rendering "multithreaded"; everything about modern graphics is designed around the idea of being done in parallel.

    DirectX 12 and Mantle, however, require that their CPU-side GPU drivers be multithreaded and able to accept API calls for the same context from multiple CPU threads. DirectX 11, I believe, allows multiple threads to issue calls to a single context like this as well, but the design is such that it's not any faster (and might be slower?) than issuing commands from a single thread.
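    (To illustrate what that one-thread-per-context rule forces engines to do, here's a generic sketch, not our actual renderer: worker threads never touch GL directly, they enqueue closures, and the single render thread that owns the context is the only place GL calls ever run.)

    ```cpp
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <utility>

    class GLCommandQueue {
    public:
        // Safe to call from any worker thread.
        void enqueue(std::function<void()> cmd) {
            std::lock_guard<std::mutex> lock(mutex_);
            commands_.push(std::move(cmd));
        }

        // Must only be called from the render thread that owns the GL context,
        // typically once per frame.
        void drainOnRenderThread() {
            std::queue<std::function<void()>> local;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                std::swap(local, commands_);   // grab all pending work at once
            }
            while (!local.empty()) {
                local.front()();               // the only place GL calls happen
                local.pop();
            }
        }

    private:
        std::mutex mutex_;
        std::queue<std::function<void()>> commands_;
    };
    ```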


    Now, where this matters to us is the number of API calls issued to the drivers. Each call has a cost, and if every individual object on screen has a handful of calls associated with it (render to this image, with these textures, with this shader, with this vertex data, with this state, now, etc.) it gets expensive fast, especially when you have hundreds of thousands of these and you've only just rendered one type of rock. Early optimizations @varrak worked on were around instancing, i.e. bundling multiple calls into a single one. Our limiting ourselves to OpenGL 3.2 (now OpenGL 3.1) removes some of the more attractive options for this kind of optimization that versions of OpenGL 4 offer. The problem with OpenGL 4 is that it is not well supported, and some of the features we would want to use are still vendor-specific extensions, which we've been trying to avoid thus far.
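    As a rough illustration of the instancing idea (placeholder names, assuming a GL 3.1 context and a loader like GLEW; this is not our engine code), the per-rock loop of "set uniform, draw" collapses into one buffer upload and a single instanced draw:

    ```cpp
    #include <GL/glew.h>   // assumes a GL 3.1+ context created elsewhere
    #include <vector>

    // Before (API-call bound): one uniform update + one draw per rock.
    //   for (size_t i = 0; i < rockTransforms.size(); ++i) {
    //       glUniformMatrix4fv(modelLoc, 1, GL_FALSE, &rockTransforms[i * 16]);
    //       glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
    //   }

    // After: upload every transform in one buffer update, then issue a single
    // draw. The vertex shader picks its transform with gl_InstanceID (both
    // glDrawElementsInstanced and gl_InstanceID are core since OpenGL 3.1).
    void drawRocksInstanced(GLuint transformUbo,
                            const std::vector<float>& transforms, // 16 floats per rock
                            GLsizei indexCount)
    {
        glBindBuffer(GL_UNIFORM_BUFFER, transformUbo);
        glBufferData(GL_UNIFORM_BUFFER,
                     transforms.size() * sizeof(float),
                     transforms.data(), GL_STREAM_DRAW);

        // UBO size limits cap the batch, so a real renderer splits into chunks.
        GLsizei instanceCount = GLsizei(transforms.size() / 16);
        glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                                nullptr, instanceCount);
    }
    ```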
  3. bgolus

    bgolus Uber Alumni

    Messages:
    1,481
    Likes Received:
    2,299
    The way recon is done on the client (which is just a basic visual approximation in low resolution screen space) and the server are different. The servers we use have no guaranteed GPU.

    The problem with your technique is that it's optimizing for a limited number of changes. When you have >1000 units, most likely only a small percentage of those are static structures or idle units that don't move.

    And we already do a couple of extra layers of optimization over what I described to bring the complexity down from checking every recon object against every other recon object. For example, we have fast "is there anything nearby" tests that reject most units. Finding faster ways to do that has been an ongoing research project inside the company for several months, as it's needed for just about everything; physics, AI, pathing, attack, and recon all need to know if there are things they care about nearby. There's no magic bullet we've found that's fast both for adding and updating data and for checking against; one part is almost always slow, so they've been trying to find the options that strike the best balance.
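    (One common shape for those "anything nearby" tests is a uniform grid or spatial hash; this generic sketch, not our actual implementation, shows the trade-off: inserts and updates are cheap hash pushes, but queries have to scan every cell overlapping the search radius.)

    ```cpp
    #include <cmath>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct SpatialGrid {
        float cellSize;
        std::unordered_map<int64_t, std::vector<int>> cells;  // cell key -> unit ids

        explicit SpatialGrid(float cs) : cellSize(cs) {}

        int cellOf(float v) const { return (int)std::floor(v / cellSize); }
        static int64_t key(int cx, int cy) {
            return (int64_t(cx) << 32) ^ uint32_t(cy);        // pack both coords
        }

        // Cheap update path: one hash push per unit.
        void insert(int id, float x, float y) {
            cells[key(cellOf(x), cellOf(y))].push_back(id);
        }

        // Fast "is there anything nearby?" -- early-out on the first hit.
        bool anythingNear(float x, float y, float radius) const {
            int x0 = cellOf(x - radius), x1 = cellOf(x + radius);
            int y0 = cellOf(y - radius), y1 = cellOf(y + radius);
            for (int cy = y0; cy <= y1; ++cy)
                for (int cx = x0; cx <= x1; ++cx) {
                    auto it = cells.find(key(cx, cy));
                    if (it != cells.end() && !it->second.empty())
                        return true;  // conservative: cell overlap, not exact distance
                }
            return false;
        }
    };
    ```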
  4. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    Yes and no.

    It doesn't quite fit the current recon mechanics of PA, since it was tailored towards a different recon model, one which removes the need for timely updates by introducing fuzziness and a timed (delayed) component to recon, thereby reducing the required update frequency, especially once the map gets crowded.
    It also introduces a different requirement: stacking effects for recon. That would be plain impossible with your current system, since you couldn't just search for the nearest enemy unit and then abort; you would have to sample all units within range.

    Hence the map-reduce inspired approach, because I know for sure that it will scale, even though it comes at a loss of precision.
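    (The stacking requirement is exactly why per-cell accumulation beats a nearest-unit early-out: detection depends on the sum of all emitters in range, not just the closest one. A toy sketch of what I mean, with made-up thresholds and decay:)

    ```cpp
    #include <vector>

    // Toy fuzzy-recon accumulator: every emitter adds ("maps") its signature
    // strength into the grid each tick, detection is a threshold test on the
    // summed ("reduced") value, and decay supplies the delayed component.
    struct FuzzyRecon {
        int width, height;
        std::vector<float> intensity;

        FuzzyRecon(int w, int h) : width(w), height(h), intensity(w * h, 0.f) {}

        void addEmitter(int x, int y, float strength) {   // stacking: += not max
            intensity[y * width + x] += strength;
        }
        void decay(float factor) {                        // delayed fade-out per tick
            for (float& v : intensity) v *= factor;
        }
        bool detected(int x, int y, float threshold) const {
            return intensity[y * width + x] >= threshold;
        }
    };
    ```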

    (The system I was talking about? An ambiguous mod concept, since I really don't like the way recon is handled in PA currently; it feels like a huge step back.)
    Last edited: June 30, 2014
  5. bgolus

    bgolus Uber Alumni

    Messages:
    1,481
    Likes Received:
    2,299
    For the purposes of client recon, knowing that there's "something approximately in this area" and "this exact thing is exactly here" isn't that much different ... and the first one is more expensive.
  6. ikickasss

    ikickasss Active Member

    Messages:
    349
    Likes Received:
    114
    It would be cool if PA had an in-game benchmark test. Does anybody know of any good sites where I can do this?
  7. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    Is this something @neutrino asked for specifically? Should we poll to see how many of us run sufficient hardware for OpenGL 4, and how many of us think PA justifies a hardware upgrade?

    I always find having to write HTML and CSS that stays compatible with older browsers really exhausting and constraining, and in the case of a large-scale simulated-projectile RTS (the kind of thing that's bound to be top-of-the-line), compatibility-coding for old hardware feels particularly off.

    This is probably one matter that deserves debate.
    Last edited: July 1, 2014
  8. cdrkf

    cdrkf Post Master General

    Messages:
    5,721
    Likes Received:
    4,793
    The hardware isn't the problem... it's driver support on Mac and Linux that's the killer. Mac OS simply doesn't support beyond OGL 3, Linux kinda does but it's vendor specific. Then of course, if PA was Windows-only they probably would have used DX anyhow lol....
  9. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    You're wrong here.
    There are still many GPUs that only support OpenGL 3 / D3D 10:
    • All AMD GPUs before the HD 5XXX series
    • All Nvidia GeForce cards before the 4XX series
    • Intel Sandy Bridge (HD 2000 / 3000)
    So if Uber want to use D3D11 or OpenGL 4 they would have to drop support for all of them.

    OS X 10.9 does support OpenGL 4.1:
    https://developer.apple.com/graphicsimaging/opengl/capabilities/

    PS: Also, the Intel OpenGL driver on Windows only supports up to OpenGL 4.2.
    And it's questionable which driver (the Linux or the Windows one) gets OpenGL conformance first.
    Last edited: July 1, 2014
  10. fajitas23

    fajitas23 Member

    Messages:
    32
    Likes Received:
    4
    Would it be feasible to have the AI run on its own machine as a client? Or is the pathing computed on the server anyway and hence all those calls from the AI are the bottleneck?

    Thanks for the interesting info and discussion!
  11. Terrasque

    Terrasque Member

    Messages:
    49
    Likes Received:
    29
    Some more info for the curious: http://linustechtips.com/main/topic...s-mantle-from-nvidias-talk-at-steam-dev-days/

    Also, Nvidia apparently has some OpenGL extensions to better allow multithreaded coding, and AMD seems to be working on something in that direction too. So in the future OpenGL should have the same capabilities that Mantle and the next DirectX have (from what I understand, DX11 doesn't have good multithreading).

    The problem is, as bgolus mentions, what's supported by the majority of hardware out there.
  12. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    I say hell yeah!

    I mean, come on, a billion units on an HD 2000? What's going on here?
  13. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    Many players don't have high-end hardware and the devs still want to let them play the game too. It's always possible to make a separate OpenGL 4.4 renderer for AMD/Nvidia and Windows/Linux only. And yeah, the latest PTE already runs pretty well and has good client performance, so at the moment the main bottleneck is on the server side.

    And yeah, from what I see, all the API wars like D3D vs OpenGL vs Mantle only matter for consoles and players with low-end hardware. E.g. AMD has slow CPU cores in their APUs, so they need a CPU-efficient API, because they always have problems with proper OpenGL implementations. At the same time, players with high-end $1000-2000 hardware won't really notice a big difference, because they already get damn good FPS in most games, and nobody is going to create PC-only eye-candy content for the 5% of high-end GPUs.
  14. bobucles

    bobucles Post Master General

    Messages:
    3,388
    Likes Received:
    558
    Awesome read! It's great to see some of the cool stuff happening under the hood of PA.

    Has any research been done on running the server AND the client on a single PC (such as for single player)? Optimizing each executable on its own is obviously super important, but it may be useful to know how they behave when they're fighting each other for the same resources.
    lokiCML likes this.
  15. cdrkf

    cdrkf Post Master General

    Messages:
    5,721
    Likes Received:
    4,793
    I would argue, given the age of the HD 5000 and GTX 400 series, that dropping support for prior generations isn't that unreasonable (I mean, you can pick up a $50 card that will run PA handily).

    The lack of OGL support on Sandy Bridge is more of an issue, as there are plenty of laptops and systems around based on those processors that won't be able to be upgraded to anything better.

    On the other side, I do agree with you that client performance on the PTE is plenty high enough sticking with OGL 3.1, so whilst it isn't causing a problem, why lock anyone out when they don't have to?
  16. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    It's not high enough, guys, be reasonable; we aren't hitting the promised performance.

    Flowfield isn't completely back in; it's still single file. I've probably never hit more than 8K units without crashing the server, and it was said this could run a billion units stable. We can't run a 6+ planet system on this hardware, and planets over 1000 radius are out of the question as well.

    It's not even a question of my PC; the server can't run it.
  17. cptconundrum

    cptconundrum Post Master General

    Messages:
    4,186
    Likes Received:
    4,900
    The goal was a million units, and even that's not quite what they meant. They wanted to get the engine tech to a point where it could theoretically handle a million units spread out over a whole solar system, years from now when hardware catches up. When this game is released I would expect it to be a couple of orders of magnitude away from that goal.
    Quitch and lokiCML like this.
  18. ace63

    ace63 Post Master General

    Messages:
    1,067
    Likes Received:
    826
    Which has nothing to do with GPUs at all....
  19. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    it was part of the conversation which transitioned to optimizations in general, read.
  20. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    You answered me about GPUs so... :rolleyes:

    I can see 5000 Doxes on high graphics settings moving just fine without any lag on my old HD 6950 with slow open-source drivers, and on Windows it's likely a lot better. So client rendering performance is clearly fairly good at the moment.
