There is still plenty of work to do on network optimization. 3Mbit is kind of a slow connection btw, although our bandwidth target is more like 1Mbit. Nobody promised extrapolation; at most it's a possible feature, but it sure wasn't promised. It's far better to put work into the overall network model than band-aid it with extrapolation. Bottom line: plenty of work and plenty of headroom to do work on the network side.
Of course a single player won't see all of those units at the same time. They would be spread across a ton of planets. That's what makes the idea even feasible as a goal. Exactly. I'm not sure I agree with your numbers. Care to go into more detail? Part of the plan here is to do motion planning into the future and update individual units' curve data less often. As you've pointed out, a simple extrapolation could also hide some latency, but I would rather do a better job delivering the data.
I assume that you can express any movement (actually offset, quaternion, momentum, angular momentum, with the first two extrapolated either by a cubic spline or by linear interpolation based on the latter two) in at most 12 32-bit integers/floats, plus an additional 32-bit unit identifier and a single timestamp for each message block. Only absolute and complete attributes should be communicated since the connection is not reliable (eviction can occur). This eliminates possible savings from incremental messages, but also allows for the simple use of QoS priorities and eviction in the outgoing message queue. That leaves you with a maximum of about 40-60 bytes of entropy for locations; there is additional information for more complex units, though, like the barrel orientation of tanks and artillery.

Overhead for padded structures can be ignored since it would be eliminated by any modern compression algorithm. Gzip isn't suited in this case since it is designed for human-readable plaintext rather than binary data, despite its speed. I'm assuming the use of bzip2 as the compression algorithm since the savings when compressing binary content are significantly better than with gzip. Compression would occur on a block basis with a block size of 100-200 kB, falling back to weaker compression algorithms like gzip, or even dropping compression altogether, if bandwidth is sufficient, to save computational overhead. So much for the assumption about the "40-100 bytes" per unit and message.

I'm assuming that more than 5 updates per second will not make the simulation feel any more responsive, since the momentum on units is limiting the response time anyway. Even for complex scenarios, no more than 5 individual events per second should be required. For messages of a single message class (e.g. movement), even 2 events per unit per second for active units might be sufficient.

As for a "better job of delivering the data", I would have used priority aging. It looks like you are currently using only QoS based on message classes, but without priority aging, so the same message types can be suppressed indefinitely. This is clearly visible with missile bots: the message which would announce the explosion of the missile gets suppressed if too many position updates are to be sent. While position updates are important, they are not top priority. Priority aging for position updates would ensure that no message type starves, while explosion events (static priority) and other animation events would have a higher base priority - which matters since they are essential for the immersion of the game.

For priority aging, an additional internal message identifier is required which marks two messages as carrying the same attribute. A not-yet-sent message in the server's out queue can be evicted by a newer message with the same identifier, with the age inherited from the unsent message. This message identifier is not sent over the network except for debug purposes, since it would be a waste of traffic; it is also quite hard to compress since it contains too much entropy (which is required to make the id unique).
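Here is a rough Python sketch of what I mean by priority aging plus eviction. It's purely illustrative - the class name, the base priorities and the (unit, attribute) identifiers are made up by me, not taken from the actual netcode:

```python
class OutQueue:
    """Outgoing message queue with per-class base priority plus an age bonus."""

    def __init__(self, aging_rate=1.0):
        self.aging_rate = aging_rate   # priority gained per tick spent waiting
        self.tick = 0                  # advanced by the caller's main loop
        self.pending = {}              # internal message identifier -> entry

    def enqueue(self, msg_id, base_priority, payload):
        # Evict any unsent message with the same identifier; the replacement
        # inherits its age (original enqueue tick) so the attribute cannot starve.
        old = self.pending.get(msg_id)
        enqueued_at = old["enqueued_at"] if old else self.tick
        self.pending[msg_id] = {"base": base_priority,
                                "enqueued_at": enqueued_at,
                                "payload": payload}

    def _effective(self, entry):
        return entry["base"] + self.aging_rate * (self.tick - entry["enqueued_at"])

    def pop(self):
        # Send the message with the highest effective priority. A linear scan is
        # fine for a sketch; a real queue would want a smarter structure.
        if not self.pending:
            return None
        best = max(self.pending, key=lambda k: self._effective(self.pending[k]))
        return self.pending.pop(best)["payload"]


# Repeated position updates collapse via eviction but keep their age, so they
# eventually outrank fresh higher-priority events instead of starving.
q = OutQueue(aging_rate=1.0)
q.enqueue(("unit7", "position"), base_priority=5, payload="pos v1")
q.tick = 10
q.enqueue(("unit7", "position"), base_priority=5, payload="pos v2")  # evicts v1, inherits its age
q.enqueue(("unit9", "explosion"), base_priority=10, payload="boom")
print(q.pop())  # "pos v2": effective 5 + 10 = 15 beats the explosion's 10
print(q.pop())  # "boom"
```

The point of the eviction-by-identifier rule is that position spam collapses to a single pending message per unit, while the inherited age guarantees it still goes out eventually.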
I'm fully aware that 3mbit is quite slow, but it's still a common connection speed in Germany. You can only live 5 miles outside the next "large" city (you could just call it suburb) and you are down from 50Mbit to 2Mbit, and no cable TV either. 15 miles and you are already down to 1Mbit and nobody cares.
I'm slammed for time today so I don't have time to write you as much stuff as this deserves. Anyway, a lot of your assumptions about how the system works are wrong. For example, we don't communicate momentum at all; it's just the derivative of the position curve. The idea is for units to "plan ahead" by up to a couple of seconds so that a data point isn't generated every "tick" for the curves, which can drastically reduce the amount of data sent. This is the system TA used, btw, but with a formalization of the curve infrastructure. Anyway, no time, but there is a lot of room for doing different things with the system to send less data, compress the data, etc. However, I might also adjust my expectations, as I do expect some amount of lag to be acceptable, for example in the case of rapidly switching to another location in the world.
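To give a rough feel for the general idea (this is just a toy Python sketch with made-up keyframes, not our actual curve format): the server only sends sparse (time, position) keyframes, the client evaluates a spline between them, and velocity falls out as the derivative of the curve rather than being sent separately.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Position on the segment between p1 and p2, with t in [0, 1]."""
    return 0.5 * ((2.0 * p1)
                  + (p2 - p0) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t)

def catmull_rom_deriv(p0, p1, p2, p3, t):
    """Derivative of the same segment with respect to t."""
    return 0.5 * ((p2 - p0)
                  + 2.0 * (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t
                  + 3.0 * (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t)

class PositionCurve:
    """Sparse (time, position) keyframes; one scalar axis for brevity."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # at least two (time, position) pairs

    def sample(self, time):
        """Return (position, velocity) at an arbitrary time between keyframes."""
        k = self.keys
        i = 0
        while i + 2 < len(k) and time > k[i + 1][0]:
            i += 1
        t0, p1 = k[i]
        t1, p2 = k[i + 1]
        p0 = k[i - 1][1] if i > 0 else p1              # clamp the endpoints
        p3 = k[i + 2][1] if i + 2 < len(k) else p2
        dt = t1 - t0
        t = (time - t0) / dt
        pos = catmull_rom(p0, p1, p2, p3, t)
        vel = catmull_rom_deriv(p0, p1, p2, p3, t) / dt  # convert to units per second
        return pos, vel

curve = PositionCurve([(0.0, 0.0), (0.5, 2.0), (1.0, 5.0), (1.5, 7.0)])
print(curve.sample(0.75))  # interpolated position plus the derived velocity
```

The win is that a unit which is just following its plan only needs a keyframe every so often instead of an update every tick.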
Exterminans, I've got a *huge* blog post in the works that I plan to release when we launch beta. I go into a lot of detail on a lot of these things. If you can hang on for a little while longer I think you'll enjoy it.
Well.. while networking isn't really my thing, I wouldn't like to see some hard limit built into the game because of today's limitations. I really like a game that scales with the evolution of hardware and bandwidth. Maybe 1M units is a bit too much today; let's talk again 5 years from now! I mean, in 2007 a 75-year-old Swedish grandma got a 40Gbps Internet connection (source: http://www.thelocal.se/7869/ )
I didn't try to guess how the system currently works (I have no time to start sniffing traffic, and I'm sure it is pretty much unoptimized for now), but I made general assumptions about how it could be done most efficiently within the boundaries of the current architecture, as far as I understood the surrounding systems.

Extrapolating a cubic spline from only a single attribute is certainly fine (I assume the formula you are using is basically an approximation of one of the common spline types), but it is highly inaccurate, since the first derivative is already inaccurate at the data point, which leads to a significant drift rate. Communicating the first derivative can therefore actually reduce the required update rate, since it will stay accurate until it is actually changed on the server end. Communicating the momentum means that you only need to send an update when the momentum changes; everything else is a simple and accurate linear extrapolation. Trying to guess the derivative will lead to oversteering on the client's end after each course correction, demanding at least one more data point past the relevant one, and in the case of cubic splines even another two. That sounds like a 1:1 trade in terms of traffic, but only if you ignore the static overhead associated with every single message.

With a suitable extrapolation you can even mask a second-long lag and only very few units will be noticeably affected - to be specific, only units which changed course or speed during that very second. It also means you can suppress movement messages in critical situations for a while; if the screen is crowded, nobody will really notice. Don't worry so much about the overhead either: momentum and angular momentum get quite cheap once you consider that they are often enough zero, and compression algorithms are great at eliminating constant values - thanks to the Huffman coding in bzip2, that zero might actually get reduced to a single bit. Even for moving units, momentum can be rendered into a constant by transmitting it in object space (which eliminates two axes, and the third axis becomes a constant top speed for moving crowds), although I'm not sure about the performance impact of the additional coordinate transformation.
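Here is a tiny toy example (made-up numbers, one axis only) of the drift I mean - a client that has to guess the derivative from past positions versus one that is told the velocity with the last update:

```python
def extrapolate_position_only(samples, t):
    """samples: [(time, pos), ...]; velocity guessed from the last two samples."""
    (t0, p0), (t1, p1) = samples[-2], samples[-1]
    v_est = (p1 - p0) / (t1 - t0)
    return p1 + v_est * (t - t1)

def extrapolate_with_velocity(sample, t):
    """sample: (time, pos, vel) as sent by the server."""
    t1, p1, v1 = sample
    return p1 + v1 * (t - t1)

# A unit moves at +1 unit/s, then reverses to -1 unit/s exactly at t = 2.0.
# Updates arrived at t = 1.0 and t = 2.0; nothing else comes in until t = 3.0.
true_pos_at_3 = 2.0 - 1.0                                       # 1.0
a = extrapolate_position_only([(1.0, 1.0), (2.0, 2.0)], 3.0)    # guesses v = +1 -> 3.0
b = extrapolate_with_velocity((2.0, 2.0, -1.0), 3.0)            # was told v = -1 -> 1.0
print(true_pos_at_3, a, b)  # the velocity-aware client shows no drift at all
```

Only the unit that actually changed course during the gap would need a fresh update; everything moving at constant velocity stays exact for free.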
You are still making too many assumptions. Let's wait until Forrest posts his blog entry and pick this up again with more context.
What about the case where you simply have 1 giant planet the combined size of all the others? I mean, if the game could support, say, fifteen planets with 1,000,000 units, why couldn't you just have 1 giant planet with enough space to fit the units?
Because you have a lot of algorithms with a quadratic or even worse runtime per planet, and there are also a few calculations which are difficult to parallelize on a single planet. Simulating 2 planets with 500,000 units each costs fewer resources than simulating a single planet with 1,000,000 units. In the case of algorithms with quadratic runtime (let's hope there are none, that's just an example! O(n log(n)) might still occur though), this split actually saves you 50% of the computational overhead, multiplied with the speedups from being able to calculate the planets in individual threads with no need for complex interlocks. If you were to try that on a single, monolithic planet, you would need to cut the planet into "zones" which could be treated independently of each other, but that's non-trivial and sometimes actually impossible / only possible by approximation.
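Quick back-of-the-envelope check of that 50% figure, for a purely hypothetical O(n^2) per-planet algorithm:

```python
# Illustrative only: cost model for a hypothetical algorithm that is
# quadratic in the number of units on each planet.
def quadratic_cost(units_per_planet):
    return sum(n * n for n in units_per_planet)

one_planet  = quadratic_cost([1_000_000])          # 1.0e12 "work units"
two_planets = quadratic_cost([500_000, 500_000])   # 0.5e12 "work units"
print(two_planets / one_planet)                    # 0.5, i.e. the 50% saving above
```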
What about an Earth-type planet where, even though it's monolithic, each different biome acts like an independent zone?
Like I said, that's non-trivial. Biomes are not really independent, not once units start moving and firing across borders. Once the whole planet is covered, it becomes a single, monolithic zone which must be calculated as a whole.
Yeah it's pretty tricky to multithread a single planet. There are a bunch of schemes that we've discussed but it seems simpler to break as many things as possible into work queue items instead. Then the planet update itself can be a work queue item and smaller faster planets can share a single thread. Really my goal is to scale perf as closely as possible to surface area. We do have a long way to go before we get there.
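Something in the spirit of this (just a toy Python sketch with made-up planet names and costs, not our actual scheduler): each planet update is submitted as a work item to a shared pool, so a big planet keeps one worker busy while the small, fast planets end up sharing another.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def update_planet(name, surface_area):
    # Stand-in for one simulation step; pretend the cost scales with surface area.
    time.sleep(surface_area * 0.001)
    return f"{name} updated"

planets = [("big_metal", 400), ("lava_moon", 30), ("asteroid", 5)]

# Two workers: the large planet occupies one while the smaller planets
# share the other, which is the load-balancing goal described above.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(update_planet, name, area) for name, area in planets]
    for f in futures:
        print(f.result())
```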
Poor guy. His job didn't sound easy to begin with, and now he has to systematically learn how the whole engine and game were architected just so he knows where to even start looking for things to streamline. Godspeed with that.