Supreme commander all over again

Bhaal · July 30, 2013

neutrino said:

The fundamental core design of PA allows much greater scaling both up and down than a synchronous design like SupCom. We haven't realized the down scaling at all yet, but it's going to be possible to run the game with something like a "thin client" that's been discussed in other threads.

In SupCom you were literally limited to the speed of the slowest peer, which you can't get around. Client performance in PA is strictly about rendering, which can be simplified and/or improved in speed for that client.

Bottom line, we are set up to provide good performance; we just have plenty of optimization and feature work to do before we get to that point.
Click to expand...

It's not the synchronous design that is or was the problem. The problem is the poor multicore support and the resulting simulation slowdown.
I still can't believe that the simulation algorithms have changed or improved that much in the last 10 years.

mandarni · July 30, 2013

Even in longer games, when massive battles rage, I rarely use more than... oh... say... 20% of my CPU. As it stands, bandwidth is the issue, but that can be optimized later.

cola_colin · July 30, 2013

Bhaal said:

neutrino said:

The fundamental core design of PA allows much greater scaling both up and down than a synchronous design like SupCom. We haven't realized the down scaling at all yet, but it's going to be possible to run the game with something like a "thin client" that's been discussed in other threads.

In SupCom you were literally limited to the speed of the slowest peer, which you can't get around. Client performance in PA is strictly about rendering, which can be simplified and/or improved in speed for that client.

Bottom line, we are set up to provide good performance; we just have plenty of optimization and feature work to do before we get to that point.
Click to expand...

It's not the synchronous design that is or was the problem. The problem is the poor multicore support and the resulting simulation slowdown.
I still can't believe that the simulation algorithms have changed or improved that much in the last 10 years.
Click to expand...

I am pretty sure that neutrino already stated somewhere that the main reason why SupCom was unable to use more cores was the synchronous engine design. Keeping a massively multithreaded engine completely in sync across multiple clients is very, very hard.

thepilot · July 30, 2013

Cola_Colin said:

I am pretty sure that neutrino already stated somewhere that the main reason why SupCom was unable to use more cores was the synchronous engine design. Keeping a massively multithreaded engine completely in sync across multiple clients is very, very hard.
Click to expand...

I've heard rumors that current PA multithreading is per planet. I don't know if it's true, but if it is, it's not that difficult to make it synchronous.

monkeyulize · July 30, 2013

Bhaal said:

neutrino said:

The fundamental core design of PA allows much greater scaling both up and down than a synchronous design like SupCom. We haven't realized the down scaling at all yet, but it's going to be possible to run the game with something like a "thin client" that's been discussed in other threads.

In SupCom you were literally limited to the speed of the slowest peer, which you can't get around. Client performance in PA is strictly about rendering, which can be simplified and/or improved in speed for that client.

Bottom line, we are set up to provide good performance; we just have plenty of optimization and feature work to do before we get to that point.
Click to expand...

It's not the synchronous design that is or was the problem. The problem is the poor multicore support and the resulting simulation slowdown.
I still can't believe that the simulation algorithms have changed or improved that much in the last 10 years.
Click to expand...

So you're telling neutrino that not only does he have no idea about the Supreme Commander engine, which he had a big part in, but he also has no idea about the engine his team built from scratch?

SXX · July 30, 2013

Guys, stop please! I don't want neutrino to die laughing if he reads this topic. :lol: :lol: :lol: :lol: :lol:

thepilot said:

I've heard rumors that current PA multithreading is per planet. I don't know if it's true, but if it is, it's not that difficult to make it synchronous.
Click to expand...

There are no calculations done on the client, except planet generation.
What do you mean by "not that difficult to make it synchronous"? :shock:

Implementing synchronous protocols is much harder than implementing client-server protocols, because any error means a complete desync.

thepilot · July 30, 2013

sxx said:

no calculations done on client, except planet generation.
Click to expand...

I'm talking about the server sim. I've heard that it's not multithreaded at all and that multithreading will come by having one thread per planet.

I would be kind of shocked if that were true, as it doesn't make much sense to me. But as I lack sources, I'm not entirely dismissing the idea.

sxx said:

What do you mean by "not that difficult to make it synchronous"? :shock:
Click to expand...

The main problem with multithreading is ordering. You have to be sure that operation X happens before Y. But that's not a problem if Y doesn't depend on the result of X at all.
With X on planet A and Y on planet B, there is no interaction between them, so the order doesn't really matter, meaning that you can make it synchronous (read: deterministic) fairly easily.
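
A minimal Python sketch of that per-planet idea, assuming planets never interact within a tick. The names `step_planet` and `step_system` and the toy movement rule are made up for illustration; nothing here is taken from PA's actual server code.

```python
from concurrent.futures import ThreadPoolExecutor

def step_planet(units):
    """Advance one planet's units by one tick; touches no other planet."""
    return [(x + vx, vx) for x, vx in units]

def step_system(planets, workers):
    """Step every planet in parallel, then collect results in a fixed name order."""
    names = sorted(planets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whichever thread finishes first.
        results = list(pool.map(step_planet, (planets[n] for n in names)))
    return dict(zip(names, results))

# Hypothetical system: each unit is an (x position, x velocity) pair.
system = {"A": [(0, 1), (5, -2)], "B": [(10, 3)]}
one_thread = step_system(system, workers=1)
four_threads = step_system(system, workers=4)
```

Because no planet reads another planet's data, the result should be the same whether the work runs on one thread or several. The hard part starts once units can travel or shoot between planets.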

Actually, it seems there is a fairly simple way to do multithreading while still having interactions, but it requires some memory duplication. I'm not an expert in that domain; I'm just repeating what some devs told me.
So I'm just quoting:

As I have pointed out many times before, it is trivial to maintain determinism in a multithreaded environment...
Step 1: Wrap each sizable block of processing in a tasklet data structure. A unit's main update function is the perfect place for this. Loop through each unit, creating the tasklets. Ensure that no live data is touched during the update function, and that all results are cached in thread-safe, unit-local structures. They will be committed to each unit later in a separate commit function.
Step 2: Split these tasklets amongst N threads. Begin executing them, and process collision detection on the main thread, using the last current unit positions, while waiting for them to finish.
Step 3: When all tasklets have finished, loop through the unit list again, applying the cached update results to the live values.
Click to expand...
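
The three quoted steps can be sketched roughly as follows. This is only one possible reading of them, in Python, with a hypothetical `Unit` class and a made-up movement rule; the collision detection that the quote runs on the main thread is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Unit:
    pos: float
    vel: float
    pending: float = 0.0  # unit-local cache for the next position

def make_tasklet(unit, snapshot):
    """Step 1: wrap one unit's update; it reads only the frozen snapshot."""
    def run():
        # Hypothetical rule: drift toward the average position of all units.
        centre = sum(snapshot) / len(snapshot)
        unit.pending = unit.pos + unit.vel + 0.1 * (centre - unit.pos)
    return run

def tick(units, workers):
    snapshot = tuple(u.pos for u in units)  # positions as of the last commit
    tasklets = [make_tasklet(u, snapshot) for u in units]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 2: run the tasklets on N threads; list() waits for all of them.
        list(pool.map(lambda t: t(), tasklets))
    for u in units:  # Step 3: commit the cached results in list order
        u.pos = u.pending

def simulate(workers, ticks=5):
    units = [Unit(0.0, 1.0), Unit(10.0, -1.0), Unit(4.0, 0.5)]
    for _ in range(ticks):
        tick(units, workers)
    return [u.pos for u in units]
```

Under these assumptions `simulate(1)` and `simulate(4)` should return identical positions, since every tasklet reads the same snapshot and writes only to its own unit. The cost is the memory duplication mentioned above: each unit carries a second copy of its state.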

If you want to discuss it with them :
http://www.faforever.com/forums/viewtop ... 9&start=80 (yes, both are professional developers, they are not talking out of their asses).

sxx said:

Implementing synchronous protocols is much harder than implementing client-server protocols.
Click to expand...

Peer-to-peer and client/server protocols have nothing to do with a synchronous or asynchronous engine.

You can send the whole sim (like PA) over peer-to-peer, or you can send only commands (like FA) through a client/server setup.

Actually, once the engine is synchronous, you can possibly do both and have the best of both worlds by mixing things: small replays, low bandwidth, Chronocam, and games that don't lag because of one slow computer.

And Chronocam is possible because of how the engine keeps the sim in memory, not because it's asynchronous or not. (Put simply, it saves the result of the sim; it doesn't matter how that result was produced.)

Not saying it's easy, but it's possible.

And by the way, the core of the problem is not asynchronous or not, it's deterministic or not (PA, not being synchronous, is not deterministic, but a synchronous model is not necessarily deterministic either, hence desyncs).
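
To make the determinism point concrete, here is a small Python sketch of how peers that only exchange commands might detect a desync by comparing state hashes. The toy `run_sim` rules and the hashing scheme are assumptions for illustration, not how FA or PA actually do it.

```python
import hashlib

def run_sim(commands):
    """Replay a command log on a tiny integer 'sim'; the order of commands matters."""
    pos = 0
    for op, arg in commands:
        if op == "add":
            pos += arg
        elif op == "double":
            pos *= 2
    return {"pos": pos}

def state_hash(state):
    """Hash the state in sorted key order so dict ordering cannot matter."""
    text = ",".join(f"{k}={state[k]}" for k in sorted(state))
    return hashlib.sha256(text.encode()).hexdigest()

log = [("add", 3), ("double", 0), ("add", 1)]
peer_a = run_sim(log)
peer_b = run_sim(log)
# Same commands, but two of them applied in a different (racy) order.
peer_c = run_sim([log[1], log[0], log[2]])

in_sync = state_hash(peer_a) == state_hash(peer_b)
desynced = state_hash(peer_a) != state_hash(peer_c)
```

Peers A and B replay the same log in the same order and agree. Peer C stands in for a machine where a race reordered two operations, and its hash no longer matches, which is the desync case.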

cola_colin · July 30, 2013

thepilot said:

sxx said:

no calculations done on client, except planet generation.
Click to expand...

I'm talking about the server sim. I've heard that it's not multithreaded at all and that multithreading will come by having one thread per planet.

I would be kind of shocked if that were true, as it doesn't make much sense to me. But as I lack sources, I'm not entirely dismissing the idea.
Click to expand...

It is basically confirmed that different planets will be in different threads. I remember mentioning that idea, and neutrino answered me something like: "Yep, but that is only the surface." So it is quite likely that there are more threads than planets involved. It makes sense after all, as different planets are largely independent of each other.
Only one thread per planet would be pretty weak considering that PA wants to be really good on multithreaded servers. It does fit the current (afaik) statement about servers, though: they only run single-threaded for now. This could be related to one thread per planet, but I think it is caused by the fact that Uber is
a) not finished
b) trying not to use servers that are too big for now, which is probably true, as they also limit bandwidth for the servers for now.

thepilot · July 30, 2013

Cola_Colin said:

Only one thread per planet would be pretty weak considering that PA wants to be really good on multithreaded servers.
Click to expand...

Yep, that would mean that if you create a single big planet (or just one big planet in your system), that planet will lag the whole game (like FA on a 20x20 map with 8000 units on the field). Well, it depends on how many units are in battle; FA is fluid most of the time in these situations on current hardware.

I think the guy who said that the only kind of multithreading is per planet was extrapolating from the current situation.

cola_colin · July 30, 2013

As I have pointed out many times before, it is trivial to maintain determinism in a multithreaded environment...
Step 1: Wrap each sizable block of processing in a tasklet data structure. A unit's main update function is the perfect place for this. Loop through each unit, creating the tasklets. Ensure that no live data is touched during the update function, and that all results are cached in thread-safe, unit-local structures. They will be committed to each unit later in a separate commit function.
Step 2: Split these tasklets amongst N threads. Begin executing them, and process collision detection on the main thread, using the last current unit positions, while waiting for them to finish.
Step 3: When all tasklets have finished, loop through the unit list again, applying the cached update results to the live values.
Click to expand...

I admit I am too lazy to go ask in the linked forum, but:
Can somebody explain to me how that is a good solution? There are two things I completely fail to understand:

Step 2: How is this deterministic? The order of the calls to the update functions of the different units will basically be random.
Also, wouldn't doing all collision detection on a single thread pretty much kill the whole performance advantage? Isn't the whole point to split that?

If you were to put collision detection into the N threads, you would have zero determinism, but you might get a reasonable result on a single machine, ending up with what PA seems to plan: one system that simulates without determinism, and others that are only sent the game's state.

I can't follow.
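
On the ordering question, a small Python sketch may help. It assumes, as the quoted Step 1 requires, that updates read only the previous tick's data. The averaging rule and function names are invented for the comparison and say nothing about the collision-detection cost.

```python
def step_in_place(pos, order):
    """Each unit reads live data, so earlier updates leak into later ones."""
    pos = list(pos)
    for i in order:
        left = pos[i - 1]  # wraps to the last unit when i == 0
        pos[i] = (pos[i] + left) // 2
    return pos

def step_buffered(pos, order):
    """Each unit reads the old snapshot; results are committed afterwards."""
    old = tuple(pos)
    pending = {}
    for i in order:
        pending[i] = (old[i] + old[i - 1]) // 2
    return [pending[i] for i in range(len(old))]

start = [0, 8, 16, 32]
# Three different call orders, standing in for arbitrary thread scheduling.
orders = [[0, 1, 2, 3], [3, 2, 1, 0], [2, 0, 3, 1]]
buffered = {tuple(step_buffered(start, o)) for o in orders}
in_place = {tuple(step_in_place(start, o)) for o in orders}
```

With the buffered version every call order produces the same result, so the set `buffered` holds a single state. The in-place version produces different states for different orders, which is the randomness the question is worried about.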

thepilot · July 30, 2013

You should probably ask the author himself. He may show you something interesting.

cola_colin · July 30, 2013

Oh well... so let's find my login for that forum...

SXX · July 30, 2013

thepilot said:

I'm talking about the server sim. I've heard that it's not multithreaded at all and that multithreading will come by having one thread per planet.
Click to expand...

First of all, I thought you were talking about networking. Now, about MT: I really wonder where this idea came from.

It probably started when neutrino said that game servers currently run on only one processor core, which is obviously the case because it's simply cheaper to host a game server on one core.

thepilot said:

I would be kind of shocked if that were true, as it doesn't make much sense to me. But as I lack sources, I'm not entirely dismissing the idea.
Click to expand...

I get your idea, but I don't understand why anybody would think that _all_ of the sim runs in one thread. The PA sim should contain many components (like AI, pathfinding or ballistics simulation) that "know" nothing at all about the "outside world", so there is no reason to keep them in the main thread.

As far as I understand, multithreading gets hard when one thread may not know what another thread is doing. It's a tricky point, but I don't see it as a problem with no solution.

thepilot said:

If you want to discuss it with them :
Click to expand...

Interesting read, thanks for the link.

To be fair, it's really hard for me to discuss such highly technical things in English; I need something like 5x the time to explain what I want to say. And I'm sure that for people reading my posts it's sometimes hard to understand what I mean.

thepilot said:

Peer-to-peer and client/server protocols have nothing to do with a synchronous or asynchronous engine.
Click to expand...

As I understand it: when you have a P2P network architecture you can get the advantages of an async engine. Otherwise you need P2P clients that trust each other, with each client doing its own part of the calculations, which isn't suitable for games for obvious reasons.

thepilot said:

You can send the whole sim (like PA) over peer-to-peer, or you can send only commands (like FA) through a client/server setup.

Actually, once the engine is synchronous, you can possibly do both and have the best of both worlds by mixing things: small replays, low bandwidth, Chronocam, and games that don't lag because of one slow computer.
Click to expand...

Yes, you can do that, but then you lose the advantages of a "thin client" as a simple viewer. Currently the PA server and client are mostly two completely different entities, which makes them easy to support.

E.g. Uber can compile the server and client with different optimization levels, because less-optimized code is usually more stable across all platforms and the client doesn't do any CPU-bound work. When you mix server and client together, it's considerably harder to support.

thepilot said:

And Chronocam is possible because of how the engine keeps the sim in memory, not because it's asynchronous or not. (Put simply, it saves the result of the sim; it doesn't matter how that result was produced.)
Click to expand...

I understand that.

thepilot said:

And by the way, the core of the problem is not asynchronous or not, it's deterministic or not (PA, not being synchronous, is not deterministic, but a synchronous model is not necessarily deterministic either, hence desyncs).
Click to expand...

I'm not really sure about that, but can a non-deterministic model be cheat-free (I don't mean anything like a map hack)? As far as I understand, it's only possible when there is some control by a third party.

BulletMagnet · July 30, 2013

thepilot said:

sxx said:

no calculations done on client, except planet generation.
Click to expand...

I'm talking about the server sim. I've heard that it's not multithreaded at all and that multithreading will come by having one thread per planet.

I would be kind of shocked if that were true, as it doesn't make much sense to me. But as I lack sources, I'm not entirely dismissing the idea.

sxx said:

What do you mean by "not that difficult to make it synchronous"? :shock:
Click to expand...

The main problem with multithreading is ordering. You have to be sure that operation X happens before Y. But that's not a problem if Y doesn't depend on the result of X at all.
With X on planet A and Y on planet B, there is no interaction between them, so the order doesn't really matter, meaning that you can make it synchronous (read: deterministic) fairly easily.

Actually, it seems there is a fairly simple way to do multithreading while still having interactions, but it requires some memory duplication. I'm not an expert in that domain; I'm just repeating what some devs told me.
So I'm just quoting:

As I have pointed out many times before, it is trivial to maintain determinism in a multithreaded environment...
Step 1: Wrap each sizable block of processing in a tasklet data structure. A unit's main update function is the perfect place for this. Loop through each unit, creating the tasklets. Ensure that no live data is touched during the update function, and that all results are cached in thread-safe, unit-local structures. They will be committed to each unit later in a separate commit function.
Step 2: Split these tasklets amongst N threads. Begin executing them, and process collision detection on the main thread, using the last current unit positions, while waiting for them to finish.
Step 3: When all tasklets have finished, loop through the unit list again, applying the cached update results to the live values.
Click to expand...

If you want to discuss it with them :
http://www.faforever.com/forums/viewtop ... 9&start=80 (yes, both are professional developers, they are not talking out of their asses).

sxx said:

Implementing synchronous protocols is much harder than implementing client-server protocols.
Click to expand...

Peer-to-peer and client/server protocols have nothing to do with a synchronous or asynchronous engine.

You can send the whole sim (like PA) over peer-to-peer, or you can send only commands (like FA) through a client/server setup.

Actually, once the engine is synchronous, you can possibly do both and have the best of both worlds by mixing things: small replays, low bandwidth, Chronocam, and games that don't lag because of one slow computer.

And Chronocam is possible because of how the engine keeps the sim in memory, not because it's asynchronous or not. (Put simply, it saves the result of the sim; it doesn't matter how that result was produced.)

Not saying it's easy, but it's possible.

And by the way, the core of the problem is not asynchronous or not, it's deterministic or not (PA, not being synchronous, is not deterministic, but a synchronous model is not necessarily deterministic either, hence desyncs).
Click to expand...

You mean in parallel, not synchronous.

thepilot · July 30, 2013

That's semantics here.
Synchronous means "at the same time"; the difference from "in parallel" is thin.

The difference means even less when neither of us defines exactly what "it" is.

My point is that synchronous or not is not important; deterministic is.

DeadMG · July 31, 2013

Bhaal said:

I still can't believe that the simulation algorithms have changed or improved that much in the last 10 years.
Click to expand...

They haven't, but our understanding of how to use them on multiple cores has. Ten years ago, virtually nobody had any real experience coding for multiple cores. Microsoft shipped a bunch of material on coding for the 360 which was multicore, and then had to issue completely different advice just two years later after the initial games all had massive trouble dealing with it. Hell, even the tools themselves have been completely revolutionized, from "A shitty wrapper on top of WinAPI and POSIX" to "Concurrent algorithms and collections, task-based parallelism, actor model". The compilers, languages, debuggers, all have integrated parallelism support now that they didn't before.

Not to mention general-purpose upgrades. The core language itself, C++, has had a major overhaul with C++11 introducing some new ways to get back a massive chunk of performance, like rvalue references, and shipping quite a few classes people had to write before, more mature compilers, and all that stuff.

The long and short is that absolutely, I expect the same people to do a MUCH better job today than ten years ago. Ten years is a huge time in the software business, and massively so when the intervening time had the parallelism revolution in it.

RainbowDashPwny · July 31, 2013

deadmg said:

Bhaal said:

I still can't believe that the simulation algorithms have changed or improved that much in the last 10 years.
Click to expand...

They haven't, but our understanding of how to use them on multiple cores has. Ten years ago, virtually nobody had any real experience coding for multiple cores. Microsoft shipped a bunch of material on coding for the 360 which was multicore, and then had to issue completely different advice just two years later after the initial games all had massive trouble dealing with it. Hell, even the tools themselves have been completely revolutionized, from "A **** wrapper on top of WinAPI and POSIX" to "Concurrent algorithms and collections, task-based parallelism, actor model". The compilers, languages, debuggers, all have integrated parallelism support now that they didn't before.

Not to mention general-purpose upgrades. The core language itself, C++, has had a major overhaul with C++11 introducing some new ways to get back a massive chunk of performance, like rvalue references, and shipping quite a few classes people had to write before, more mature compilers, and all that stuff.

The long and short is that absolutely, I expect the same people to do a MUCH better job today than ten years ago. Ten years is a huge time in the software business, and massively so when the intervening time had the parallelism revolution in it.
Click to expand...

+1, that is all.

carnilion · July 31, 2013

Making the simulation multicore should be possible in several ways; I think they will do it fine.

For example, there is the particle-in-cell method used in plasma physics, where you divide your area (here, the planet surface(s)) into several parts/cells (number of cells = number of threads = number of CPU cores) and then process the particles (here units, projectiles, etc.) in every area simultaneously.
If a particle leaves the area of one CPU, it gets assigned to the other CPU. (This is normally done with some overlap at the borders, so there is interaction between the threads.) This way each thread only processes the units in its own part/cell of the area/battlefield, which don't interact with units in the other cells (except for the units at the borders).
The only problematic case is when there are too many units in one area (tankball of death), so that thread slows all the others down. But even then you could divide the cell of the lagging thread again to split the work between the waiting CPUs, and so on.

Even though I don't think they use this specific method, since it's mostly used for supercomputing with thousands of CPUs, it is possible to make very good use of multiple CPUs to calculate a simulation.
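
A rough Python sketch of the cell idea, reduced to one dimension. The strip width, the overload limit and the function names are all assumptions for illustration, and the border overlap between neighbouring cells is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor

def assign_cells(units, cell_width, n_cells):
    """Bucket units (x-coordinates here) into strips of equal width."""
    cells = [[] for _ in range(n_cells)]
    for x in units:
        index = min(int(x // cell_width), n_cells - 1)
        cells[index].append(x)
    return cells

def split_overloaded(cells, limit):
    """Halve any cell holding more than `limit` units so idle threads can help."""
    out = []
    for cell in cells:
        if len(cell) > limit:
            ordered = sorted(cell)  # split by position, not by insertion order
            mid = len(ordered) // 2
            out.extend([ordered[:mid], ordered[mid:]])
        else:
            out.append(cell)
    return out

def step_cell(cell):
    return [x + 1.0 for x in cell]  # hypothetical update: move right by 1

units = [0.5, 1.5, 2.5, 2.6, 2.7, 2.8, 9.9]
cells = assign_cells(units, cell_width=2.5, n_cells=4)
work = split_overloaded(cells, limit=3)  # the crowded second cell is halved
with ThreadPoolExecutor(max_workers=4) as pool:
    moved = [x for cell in pool.map(step_cell, work) for x in cell]
# Units that crossed a border are handed to their new cell for the next tick.
regrouped = assign_cells(moved, cell_width=2.5, n_cells=4)
```

Here the crowded cell (the "tankball") is split once before the threads run, and after the step the unit that started at 1.5 has crossed into the next cell and is reassigned.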
