Optimization wonderland extravaganza

Discussion in 'Planetary Annihilation General Discussion' started by varrak, June 27, 2014.

  1. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    I think I can provide a solution to that specific problem:

    Use a map for visibility instead of testing every unit against every other.
    A 32-bit integer map, at the same resolution as the flow-field maps. Update the map ONLY on unit movement or activation/deactivation of a radar channel, and perform the visibility test for a unit towards a certain player by sampling inside that map. A value > 0 means that at least one unit of that player sees this location.

    This way you can do it in linear time, which is definitely better than the n² worst-case complexity you have right now. It even works with more complex recon models like gradual visibility, different unit-dependent thresholds and the like!

    If you are going to do it on the CPU only: make sure to only update the pixels that actually change on unit movement; don't redraw the entire vision radius. That way even mobile radar units remain quite inexpensive.
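    A minimal sketch of what I mean, assuming a flat 2D grid (the names and the circular stamp are made up, this isn't PA code): each player gets one int32 counter per cell, units stamp +1/-1 over their vision footprint as they gain or lose sight of an area, and a visibility query becomes a single array read.

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical per-player visibility counter map, same resolution as the
    // flow-field grid. counts[cell] > 0 means at least one of the player's
    // units currently sees that cell.
    struct VisionMap {
        int width, height;
        std::vector<int32_t> counts;

        VisionMap(int w, int h) : width(w), height(h), counts(w * h, 0) {}

        // Stamp a circular vision footprint with +1 (gained sight) or -1
        // (lost sight). On movement, stamp -1 at the old position and +1 at
        // the new one; a smarter delta update would touch only the crescent
        // of cells entering/leaving the circle instead of both full discs.
        void stamp(int cx, int cy, int r, int32_t delta) {
            for (int y = std::max(0, cy - r); y <= std::min(height - 1, cy + r); ++y)
                for (int x = std::max(0, cx - r); x <= std::min(width - 1, cx + r); ++x)
                    if ((x - cx) * (x - cx) + (y - cy) * (y - cy) <= r * r)
                        counts[y * width + x] += delta;
        }

        // O(1) query, no per-unit pair tests.
        bool visible(int x, int y) const { return counts[y * width + x] > 0; }
    };
    ```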

    And if you choose to offload that specific task to the GPU (told you GPU support on the server would come in handy ;) ): even better. Projecting onto a cube-mapped sphere is as easy as it could get. All you need to do is perform a perspective transformation to render the delta into the recon map: first apply the transformation to the unit position to get the location inside the texture for each side of the cube, and then apply it to the stencil.
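    To make the projection step concrete, here's a toy helper (my own sketch, not anyone's engine code) that maps a direction from the planet centre onto a cube face and its texel coordinates; the divide by the dominant axis is exactly that 90° perspective transformation. Real cube-map conventions flip some axes per face, but any consistent convention works as long as reads and writes share it.

    ```cpp
    #include <cmath>

    struct CubeCoord { int face; float u, v; };

    // Map a direction (x, y, z) from the sphere centre to a face index
    // (0..5) and [0,1]^2 coordinates on that face.
    CubeCoord toCubeFace(float x, float y, float z) {
        float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
        CubeCoord c;
        if (ax >= ay && ax >= az) {            // X-dominant: +X / -X face
            c.face = (x > 0.f) ? 0 : 1;
            c.u = 0.5f * (y / ax + 1.f);       // perspective divide by the
            c.v = 0.5f * (z / ax + 1.f);       // dominant axis
        } else if (ay >= az) {                 // Y-dominant: +Y / -Y face
            c.face = (y > 0.f) ? 2 : 3;
            c.u = 0.5f * (x / ay + 1.f);
            c.v = 0.5f * (z / ay + 1.f);
        } else {                               // Z-dominant: +Z / -Z face
            c.face = (z > 0.f) ? 4 : 5;
            c.u = 0.5f * (x / az + 1.f);
            c.v = 0.5f * (y / az + 1.f);
        }
        return c;
    }
    ```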

    But wait: There is more!
    You can even use non-circular vision that way; everything you can express as a greyscale texture works. Well, you DO lose the option to calculate recon in 3D space, but it's not like that would be required ;)
  2. bgolus

    bgolus Uber Alumni

    Messages:
    1,481
    Likes Received:
    2,299
    Okay, there's a bit of confusion here. All modern GPUs are multithreaded internally; the difference in multithreading between OpenGL, DirectX, and Mantle is in how the CPU interacts with the GPU's CPU-side drivers.

    For OpenGL, only one CPU thread can issue API calls to the OpenGL driver for a particular "context". A "context" can be thought of as what's being rendered in a single window; it's a bit more subtle than that, but that's an easy way to think about it. API calls are things like: how big is the window we're rendering to, what textures does it need to know about, what shaders, here's what and how to render something, and what image to render it to.

    NVidia's driver itself is actually multithreaded, though. It takes those commands on a single CPU thread and hands them off to multiple threads to upload to the GPU. AMD's and Intel's drivers are single threaded. The GPU itself, though, is always rendering "multithreaded"; everything about modern graphics is designed around the idea of being done in parallel.

    DirectX 12 and Mantle, however, require that their CPU-side GPU drivers be multithreaded and able to accept API calls for the same context from multiple CPU threads. DirectX 11, I believe, allows multiple threads to issue calls to a single context like this as well, but the design is such that it's not any faster (and might be slower?) than issuing commands from a single thread.
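    (To illustrate what that one-thread-per-context rule forces engines to do, here's a generic sketch, not our actual renderer: worker threads never touch GL directly, they enqueue closures, and the single render thread that owns the context is the only place GL calls ever run.)

    ```cpp
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <utility>

    class GLCommandQueue {
    public:
        // Safe to call from any worker thread.
        void enqueue(std::function<void()> cmd) {
            std::lock_guard<std::mutex> lock(mutex_);
            commands_.push(std::move(cmd));
        }

        // Must only be called from the render thread that owns the GL context,
        // typically once per frame.
        void drainOnRenderThread() {
            std::queue<std::function<void()>> local;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                std::swap(local, commands_);   // grab all pending work at once
            }
            while (!local.empty()) {
                local.front()();               // the only place GL calls happen
                local.pop();
            }
        }

    private:
        std::mutex mutex_;
        std::queue<std::function<void()>> commands_;
    };
    ```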


    Now, where this matters to us is the number of API calls issued to the drivers. Each call has a cost, and if every individual object on screen has a handful of calls associated with it (render to this image, with these textures, with this shader, with this vertex data, with this state, now, etc.) it gets expensive fast, especially when you have hundreds of thousands of these and you've only just rendered one type of rock. Early optimizations @varrak worked on were around instancing, i.e. bundling multiple calls into a single one. Our limiting ourselves to OpenGL 3.2 (now OpenGL 3.1) removes some of the more attractive options for this kind of optimization that versions of OpenGL 4 offer. The problem with OpenGL 4 is that it is not well supported, and some of the features we would want to use are still vendor-specific extensions, which we've been trying to avoid thus far.
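    As a rough illustration of the instancing idea (placeholder names, assuming a GL 3.1 context and a loader like GLEW; this is not our engine code), the per-rock loop of "set uniform, draw" collapses into one buffer upload and a single instanced draw:

    ```cpp
    #include <GL/glew.h>   // assumes a GL 3.1+ context created elsewhere
    #include <vector>

    // Before (API-call bound): one uniform update + one draw per rock.
    //   for (size_t i = 0; i < rockTransforms.size(); ++i) {
    //       glUniformMatrix4fv(modelLoc, 1, GL_FALSE, &rockTransforms[i * 16]);
    //       glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
    //   }

    // After: upload every transform in one buffer update, then issue a single
    // draw. The vertex shader picks its transform with gl_InstanceID (both
    // glDrawElementsInstanced and gl_InstanceID are core since OpenGL 3.1).
    void drawRocksInstanced(GLuint transformUbo,
                            const std::vector<float>& transforms, // 16 floats per rock
                            GLsizei indexCount)
    {
        glBindBuffer(GL_UNIFORM_BUFFER, transformUbo);
        glBufferData(GL_UNIFORM_BUFFER,
                     transforms.size() * sizeof(float),
                     transforms.data(), GL_STREAM_DRAW);

        // UBO size limits cap the batch, so a real renderer splits into chunks.
        GLsizei instanceCount = GLsizei(transforms.size() / 16);
        glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                                nullptr, instanceCount);
    }
    ```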
  3. bgolus

    bgolus Uber Alumni

    Messages:
    1,481
    Likes Received:
    2,299
    The way recon is done on the client (which is just a basic visual approximation in low resolution screen space) and the server are different. The servers we use have no guaranteed GPU.

    The problem with your technique is that it's optimizing for a limited number of changes. When you have >1000 units, most likely only a small percentage of those are static structures or idle units that don't move.

    And we already do a couple of extra layers of optimization over what I described to bring the complexity down from checking every recon object against every other recon object. For example, we have fast "is there anything nearby" tests that reject most units. Finding faster ways to do that has been an ongoing research project inside the company for several months, as it's needed for just about everything; physics, AI, pathing, attack, and recon all need to know if there are things they care about nearby. There's no magic bullet we've found that's fast both for adding and updating data and for checking against; one part is almost always slow, so they've been trying to find the options that strike the best balance.
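    (One common shape for those "anything nearby" tests is a uniform grid or spatial hash; this generic sketch, not our actual implementation, shows the trade-off: inserts and updates are cheap hash pushes, but queries have to scan every cell overlapping the search radius.)

    ```cpp
    #include <cmath>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct SpatialGrid {
        float cellSize;
        std::unordered_map<int64_t, std::vector<int>> cells;  // cell key -> unit ids

        explicit SpatialGrid(float cs) : cellSize(cs) {}

        int cellOf(float v) const { return (int)std::floor(v / cellSize); }
        static int64_t key(int cx, int cy) {
            return (int64_t(cx) << 32) ^ uint32_t(cy);        // pack both coords
        }

        // Cheap update path: one hash push per unit.
        void insert(int id, float x, float y) {
            cells[key(cellOf(x), cellOf(y))].push_back(id);
        }

        // Fast "is there anything nearby?" -- early-out on the first hit.
        bool anythingNear(float x, float y, float radius) const {
            int x0 = cellOf(x - radius), x1 = cellOf(x + radius);
            int y0 = cellOf(y - radius), y1 = cellOf(y + radius);
            for (int cy = y0; cy <= y1; ++cy)
                for (int cx = x0; cx <= x1; ++cx) {
                    auto it = cells.find(key(cx, cy));
                    if (it != cells.end() && !it->second.empty())
                        return true;  // conservative: cell overlap, not exact distance
                }
            return false;
        }
    };
    ```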
  4. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    Yes and no.

    It doesn't quite fit the current recon mechanics of PA, since it was tailored towards a different recon model, one which removes the need for timely updates by introducing fuzziness and a timed (delayed) component to recon, thereby reducing the required update frequency, especially once the map gets crowded.
    It also introduces a different requirement: stacking effects for recon. That would be plain impossible with your current system, since you couldn't just search for the nearest enemy unit and then abort; you would have to sample all units within range.

    Hence the map-reduce inspired approach, because I know for sure that it will scale, even though it comes at a loss of precision.
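    (The stacking requirement is exactly why per-cell accumulation beats a nearest-unit early-out: detection depends on the sum of all emitters in range, not just the closest one. A toy sketch of what I mean, with made-up thresholds and decay:)

    ```cpp
    #include <vector>

    // Toy fuzzy-recon accumulator: every emitter adds ("maps") its signature
    // strength into the grid each tick, detection is a threshold test on the
    // summed ("reduced") value, and decay supplies the delayed component.
    struct FuzzyRecon {
        int width, height;
        std::vector<float> intensity;

        FuzzyRecon(int w, int h) : width(w), height(h), intensity(w * h, 0.f) {}

        void addEmitter(int x, int y, float strength) {   // stacking: += not max
            intensity[y * width + x] += strength;
        }
        void decay(float factor) {                        // delayed fade-out per tick
            for (float& v : intensity) v *= factor;
        }
        bool detected(int x, int y, float threshold) const {
            return intensity[y * width + x] >= threshold;
        }
    };
    ```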

    (The system I was talking about? An ambiguous mod concept, since I really don't like the way recon is handled in PA currently; it feels like a huge step back.)
    Last edited: June 30, 2014
  5. bgolus

    bgolus Uber Alumni

    Messages:
    1,481
    Likes Received:
    2,299
    For the purposes of client recon, knowing that there's "something approximately in this area" and "this exact thing is exactly here" isn't that much different ... and the first one is more expensive.
  6. ikickasss

    ikickasss Active Member

    Messages:
    349
    Likes Received:
    114
    It would be cool if PA had an in-game benchmark test. Does anybody know of any good sites where I can do this?
  7. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    Is this something @neutrino asked for specifically? Should we poll to see how many of us run sufficient hardware for OpenGL 4, and how many of us think PA justifies a hardware upgrade?

    I always find having to write HTML and CSS that stays compatible with older browsers really exhausting and constraining, and in the case of a large-scale simulated-projectile RTS (the kind of thing that's bound to be top-of-the-line), compatibility-coding for old hardware feels particularly off.

    This is probably one matter that deserves debate.
    Last edited: July 1, 2014
  8. cdrkf

    cdrkf Post Master General

    Messages:
    5,721
    Likes Received:
    4,793
    The hardware isn't the problem... it's driver support on Mac and Linux that's the killer. Mac OS simply doesn't support beyond OGL 3, Linux kinda does but it's vendor specific. Then of course, if PA was Windows-only they probably would have used DX anyhow lol....
  9. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    You're wrong here.
    There are still many GPUs that only support OpenGL 3 / D3D 10:
    • All AMD GPUs before the HD 5XXX series
    • All Nvidia GeForce cards before the 4XX series
    • Intel Sandy Bridge (HD 2000 / 3000)
    So if Uber want to use D3D11 or OpenGL 4 they would have to drop support for all of them.

    OS X 10.9 does support OpenGL 4.1:
    https://developer.apple.com/graphicsimaging/opengl/capabilities/

    PS: Also, the Intel OpenGL driver on Windows only supports up to OpenGL 4.2.
    And it's questionable which driver (the Linux or the Windows one) gets OpenGL conformance first.
    Last edited: July 1, 2014
  10. fajitas23

    fajitas23 Member

    Messages:
    32
    Likes Received:
    4
    Would it be feasible to have the AI run on its own machine as a client? Or is the pathing computed on the server anyway and hence all those calls from the AI are the bottleneck?

    Thanks for the interesting info and discussion!
  11. Terrasque

    Terrasque Member

    Messages:
    49
    Likes Received:
    29
    Some more info for the curious: http://linustechtips.com/main/topic...s-mantle-from-nvidias-talk-at-steam-dev-days/

    Also, Nvidia apparently has some OpenGL extensions to better allow multithreaded coding, and AMD seems to be working on something in that direction too. So in the future OpenGL should have the same capabilities that Mantle and the next DirectX have (from what I understand, DX11 doesn't have good multithreading).

    The problem is, as bgolus mentions, what's supported by the majority of hardware out there.
  12. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    I say hell yeah!

    I mean, come on, a billion units on an HD 2000? What's going on here?
  13. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    Many players don't have high-end hardware and the devs still want to let them play the game too. It's always possible to make a separate OpenGL 4.4 renderer for AMD/Nvidia and Windows/Linux only. And yeah, the latest PTE already runs pretty well and has good client performance, so at the moment the main bottleneck is on the server side.

    And yeah, from what I see, all the API wars like D3D vs OpenGL vs Mantle only matter for consoles and players with low-end hardware. E.g. AMD has slow CPU cores in their APUs, so they need a CPU-efficient API, because they always have problems with proper OpenGL implementations. At the same time, players with high-end $1000-2000 hardware won't really notice a big difference, because they already get damn good FPS in most games, and nobody is going to create PC-only eye-candy content for the 5% of high-end GPUs.
  14. bobucles

    bobucles Post Master General

    Messages:
    3,388
    Likes Received:
    558
    Awesome read! It's great to see some of the cool stuff happening under the hood of PA.

    Has any research been done on running the server AND the client on a single PC (such as for single player)? Optimizing each executable on its own is obviously super important, but it may be useful to know how they behave when they're fighting each other for the same resources.
    lokiCML likes this.
  15. cdrkf

    cdrkf Post Master General

    Messages:
    5,721
    Likes Received:
    4,793
    I would argue, given the age of the HD 5000 and GTX 400 series, that dropping support for prior generations isn't that unreasonable (I mean, you can pick up a $50 card that will run PA handily).

    The lack of OGL support on Sandy Bridge is more of an issue, as there are plenty of laptops and systems around based on those processors that won't be able to be upgraded to anything better.

    On the other side, I do agree with you that client performance on the PTE is plenty high enough sticking with OGL 3.1, so whilst it isn't causing a problem, why lock anyone out when they don't have to?
  16. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    It's not high enough, guys, be reasonable; we aren't hitting the promised performance.

    Flowfield isn't completely back in; it's still single file. I've probably never hit more than 8K units without crashing the server, and it was said this could run a billion units stable. We can't run a 6+ planet system on this hardware, and planets over 1000 radius are out of the question as well.

    It's not even a question of my PC; the server can't run it.
  17. cptconundrum

    cptconundrum Post Master General

    Messages:
    4,186
    Likes Received:
    4,900
    The goal was a million units, and even that's not quite what they meant. They wanted to get the engine tech to a point where it could theoretically handle a million units spread out over a whole solar system, years from now when hardware catches up. When this game is released I would expect it to be a couple of orders of magnitude away from that goal.
    Quitch and lokiCML like this.
  18. ace63

    ace63 Post Master General

    Messages:
    1,067
    Likes Received:
    826
    Which has nothing to do with GPUs at all....
  19. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    it was part of the conversation which transitioned to optimizations in general, read.
  20. SXX

    SXX Post Master General

    Messages:
    6,896
    Likes Received:
    1,812
    You answered me about GPUs so... :rolleyes:

    I can see 5000 Doxes on high graphics settings moving just fine without any lag on my old HD 6950 with slow open-source drivers, and on Windows it's likely a lot better. So client rendering performance is clearly fairly good at the moment.
