Dual xeon

Discussion in 'Planetary Annihilation General Discussion' started by mcheung63, July 20, 2015.

  1. mcheung63

    mcheung63 New Member

    Messages:
    17
    Likes Received:
    0
    Is this game optimized for dual-Xeon machines with over 20 CPU cores?
    thanks
    from Peter
  2. websterx01

    websterx01 Post Master General

    Messages:
    1,682
    Likes Received:
    1,063
    No. The game is not designed to actively scale across such a massive number of cores for client-side stuff. The server-side stuff can use it more effectively, but operations per second matter more than anything else. @mikeyh @cola_colin have done a HUGE amount of data gathering about server-side stuff, so they should be able to expand on that.
  3. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    Two short facts:
    - the highest CPU load from a PA server I ever saw used 8 CPU cores, on a machine that had a lot more
    - the simulation is basically a single thread, so while more cores can make sure that thread has a full core to itself, they can't do any more than that.
  4. doud

    doud Well-Known Member

    Messages:
    922
    Likes Received:
    568
    As far as I remember, @neutrino's original plan was to have one thread per planet. Hopefully this is something that will come later, though I can imagine it's a lot of work. But PA's scalability really relies on this change. Considering Uber has been providing regular, non-cosmetic improvements for a year now (yes!!!), I don't think it's unreasonable to expect this to happen eventually.
  5. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    That's a legend the forums came up with; it doesn't seem to actually be that way, and I think it is unlikely that anything in this area will change any time soon. Basically I am pretty sure we're talking about multiple months of work to make it work really well and without tons of crashes, and you would not get any other changes in that time. I can imagine a lot of other things that would pay back more, especially since it is only a matter of years until new CPUs are considerably faster per core. The speed at which CPUs get faster may have slowed down, but I am pretty sure they will continue to get faster per core for quite some time.
  6. mcheung63

    mcheung63 New Member

    Messages:
    17
    Likes Received:
    0
    So what is the max number of cores that PA supports?
  7. cdrkf

    cdrkf Post Master General

    Messages:
    5,721
    Likes Received:
    4,793
    8 for the server, 4 for the client, so 12 total if you're hosting and playing on the same machine.
    ArchieBuld likes this.
  8. doud

    doud Well-Known Member

    Messages:
    922
    Likes Received:
    568
    I'm pretty sure it's not a legend; I remember neutrino talking about it during alpha/beta, and otherwise how could scaling be achieved? Scaling up with faster single cores is not the current trend; the current trend is all about scaling out with more cores/threads per socket. Plus it will become more and more difficult to increase CPU frequency from a pure physics perspective. Not spreading the sim across different cores (one thread per planet, and dealing with units switching from one planet to another) is going to seriously limit PA's scalability.
    dom314 likes this.
  9. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    Yep. But writing fully scalable stuff is really hard and was cut due to time constraints. Sad reality.
    And physical limits might be a thing, but I am sure per-core speed will still go up by a considerable margin in the future.

    I'd love for PA to be multithreaded. But it is unlikely Uber ever had, or will have, the resources to write/rewrite PA's simulation to "perfectly" handle many-core systems.
    As much as the trend is to have more cores, the reality is that this just pushes more of the burden onto the software developers. The CPU manufacturers can't go up by an order of magnitude per core that easily anymore, so they make more cores. But writing complex software like PA in a way that fully utilizes that many cores is _hard_.
    dom314 likes this.
  10. doud

    doud Well-Known Member

    Messages:
    922
    Likes Received:
    568
    Yep, I know writing a fully multithreaded simulation is really tough. However, as far as I understood, neutrino's idea was originally to have one simulation thread per planet, which from a very simplified perspective may not change things that much. It's quite different from having N threads randomly handling P units/projectiles without considering the planet they belong to; that approach is extremely hard, though as far as I know not impossible. Coming back to the 1 planet / 1 thread / 1 sim idea, I find it a very good compromise. If you set aside unit transitions from one planet to another, it's pretty much the same engine running N times in parallel (provided there are no interactions between the planets). The difficult stuff comes when a unit has to be handed from one planet (one sim) to another planet (another sim), but I see that as much easier than rewriting the entire sim to make it multithreaded. It would not remove the limitation of one thread per planet, and thus the number of units that can be handled on a single planet. Obviously it would also be necessary to keep the N sims synchronized, and the slowest one would limit the speed of the faster sims, but it's a good trade-off. I'm not an expert, so I can't figure out all the technical challenges this solution would bring. However, I can't imagine @neutrino did not have a plan to mitigate the scalability issue, especially when he was talking about massive systems with dozens of players.
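    Just to illustrate the shape of the idea (a toy sketch only, with made-up names, nothing to do with Uber's actual code): each planet sim ticks on its own thread, and units crossing between planets are handed over through a per-planet mailbox, so the sims only touch shared state at that one point.

    ```cpp
    // Toy sketch: one sim thread per planet, cross-planet transfers via a mailbox.
    #include <array>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct Unit { int id; };

    struct PlanetSim {
        std::vector<Unit> units;          // owned exclusively by this planet's thread
        std::mutex inbox_lock;
        std::vector<Unit> inbox;          // units launched here from other planets

        void receive(Unit u) {            // called by *other* planet threads
            std::lock_guard<std::mutex> g(inbox_lock);
            inbox.push_back(u);
        }

        void step() {
            {   // pull in arrivals queued since the last tick
                std::lock_guard<std::mutex> g(inbox_lock);
                units.insert(units.end(), inbox.begin(), inbox.end());
                inbox.clear();
            }
            // ... simulate one tick for every unit on this planet ...
            // A real version would also need a barrier here so all planets
            // finish tick N before anyone starts N+1 - the slowest planet
            // still gates the rest, as noted above.
        }
    };

    int main() {
        std::array<PlanetSim, 3> planets;
        std::vector<std::thread> threads;
        for (auto& p : planets)
            threads.emplace_back([&p] { for (int tick = 0; tick < 100; ++tick) p.step(); });
        for (auto& t : threads) t.join();
    }
    ```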
  11. cola_colin

    cola_colin Moderator Alumni

    Messages:
    12,074
    Likes Received:
    16,221
    Yep, one thread per planet sounds reasonable from an outside perspective, and I am sure that if we ever see multithreading in the sim it is going to look like that. However, the fact that we don't have it shows there has to be something that spoke against it, even if it was simply "we only have 2 years to get this out of the door... do it quick", and that probably has consequences for how the code looks that make it hard to incorporate now. I am all for it, but based on what Uber has done about it so far, it probably is not as easy as it seems due to the reality of PA's code base.
    cdrkf likes this.
  12. cdrkf

    cdrkf Post Master General

    Messages:
    5,721
    Likes Received:
    4,793
    Yeah, I remember the '1 thread per planet' concept as well, and it's something I'd fully support if / when the opportunity comes up, although I don't see PA in its current form getting it (maybe if Uber decides to raise funds for a sequel / expansion it's something they could revisit when they have the additional funds).

    What I would say in Uber's defence, though, is that they have actually multithreaded quite a lot of the game. The sim gets a lot of attention because they've optimised the rest of it well enough that the sim is the bottleneck. Let's be fair though: PA uses a good 4+ cores for client-side rendering, networking and UI (Coherent can use *a lot* of threads, though they're all light enough not to need more than about 1 core between them). The server is also multithreaded; it's just that the simulation itself is limited to 1 thread (whilst everything else is moved off into other threads to free up cycles for the sim thread), to the point that, as @cola_colin mentioned, the server can use up to about 8 threads.

    That really isn't so bad for multithread support when so many AAA games are pretty much restricted to 2 threads, with the better ones utilising a grand total of 4.
  13. doud

    doud Well-Known Member

    Messages:
    922
    Likes Received:
    568
    You're absolutely right. Actually, since @neutrino had an original vision of massive battles across multiple planets, I really thought one sim thread per planet was considered a core foundation of the sim engine. But as you mention, and considering the iterative process Uber has always used, I can understand that multithreading the sim could only be considered a very big next step (at least not mandatory for the original release), and I understand that working on this is really questionable from a business perspective. You're absolutely right when you say there must be a good reason why thread-per-planet has not been implemented; as far as I remember, one of the touchiest problems was making units transition from one sim to another. Only time will tell, but like any other RTS fan I really would like Uber to take PA to the next dimension, because PA really deserves it. If it were me, I would put down another 250 bucks for a new PA Kickstarter.
    devoh likes this.
  14. devoh

    devoh Well-Known Member

    Messages:
    445
    Likes Received:
    404
    I'd also like to see a second Kickstarter for performance improvements like this. I'm sure it would have VERY mixed popularity, but I wouldn't mind putting a few more $ in to see it happen. I really want to see the development and improvement of PA continue for as long as possible; I still think this game has so much untapped potential.
    Last edited: July 22, 2015
    Zenotheory, FSN1977 and cdrkf like this.
  15. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    I wonder what would happen if the PA server executable were compiled with the flags "-fopenmp" and "-D_GLIBCXX_PARALLEL".

    Either it crashes horribly because of side effects, or it does nothing because stdlib features haven't been used consistently (curse you for reinventing the wheel!), or it leads to some unexpected performance gains, thanks to the compiler now automagically parallelizing many algorithms, at least wherever they go through stdlib features.

    But judging at least by what I've seen of Sorian's part of the code base, chances are pretty good that it would not only work without exposing too many bugs, but also improve scalability in large parts.

    https://gcc.gnu.org/onlinedocs/libs..._using.html#parallel_mode.using.parallel_mode
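    For anyone curious what those flags actually buy you, here is a minimal made-up example (not PA code): the same std::sort call compiles to the ordinary sequential implementation normally, and to the libstdc++ parallel-mode (OpenMP-backed) implementation when built with -fopenmp -D_GLIBCXX_PARALLEL.

    ```cpp
    // Build sequentially:  g++ -O2 demo.cpp
    // Build in parallel:   g++ -O2 -fopenmp -D_GLIBCXX_PARALLEL demo.cpp
    #include <algorithm>
    #include <cstdio>
    #include <functional>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<double> values(20000000);
        std::iota(values.begin(), values.end(), 0.0);

        // With _GLIBCXX_PARALLEL defined, libstdc++ swaps this call for its
        // parallel-mode sort; the source code does not change at all.
        std::sort(values.begin(), values.end(), std::greater<double>());

        // Explicit opt-in per call site is also possible without the global
        // define, via <parallel/algorithm>:
        //   __gnu_parallel::sort(values.begin(), values.end());

        std::printf("largest: %f\n", values.front());
    }
    ```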
    Last edited: July 22, 2015
    ace63 and doud like this.
  16. DeathByDenim

    DeathByDenim Post Master General

    Messages:
    4,328
    Likes Received:
    2,125
    Don't you need #pragma omp statements all over the place to get anything from -fopenmp?
  17. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    In theory - yes. But "-D_GLIBCXX_PARALLEL" enables the use of already existing #pragma omp statements inside the stdlib, which are just inactive by default.

    That's not going to fully parallelize your application at a high level - simply because the control flow isn't managed by the stdlib, and it would also be risky to attempt that - but it does enable parallelization for many of the more performance-hungry corner cases.

    There's only one little catch: if you are using e.g. std::for_each but your functor isn't free of side effects (shouldn't really happen, but who knows), you will still have to spray #pragma omp critical around the offending sections, despite your code not being OpenMP-aware in general. (Just don't forget to name your critical sections when using OpenMP, or it will use a single global lock for all unnamed critical sections!)
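    To make that concrete, a hypothetical example (made-up names, not PA code) of such an offending functor and the named critical section that papers over it:

    ```cpp
    // Build with: g++ -O2 -fopenmp -D_GLIBCXX_PARALLEL collect.cpp
    #include <algorithm>
    #include <utility>
    #include <vector>

    std::vector<int> dead_ids;   // shared state = the side effect

    void collect_dead(const std::vector<std::pair<int, int>>& id_and_hp) {
        // Under parallel mode this for_each may run the lambda on several
        // threads, so the push_back below has to be guarded.
        std::for_each(id_and_hp.begin(), id_and_hp.end(),
                      [](const std::pair<int, int>& u) {
            if (u.second <= 0) {
                // Named critical section: only this block is serialized.
                // An unnamed "#pragma omp critical" would share one global
                // lock with every other unnamed critical section.
                #pragma omp critical(dead_id_list)
                dead_ids.push_back(u.first);
            }
        });
    }
    ```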

    And parallelizing even further at a higher level later on is still an option, since OpenMP is perfectly happy if you attempt parallelization inside a nested function even when the thread pool has already been depleted at a much higher level. Either way, OpenMP will use no more threads than the pool provides, and will simply execute the inner function sequentially instead.
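    The nesting behaviour, again as a made-up sketch: with nested parallelism left at its default (off), the inner "parallel for" just runs on the outer thread when the outer loop already owns the whole pool, so adding it early does no harm.

    ```cpp
    // Build with: g++ -O2 -fopenmp nesting.cpp
    #include <cstdio>
    #include <omp.h>

    int main() {
        #pragma omp parallel for
        for (int planet = 0; planet < 4; ++planet) {
            // Nested region: by default OpenMP does not spawn a new team here,
            // so this inner loop executes sequentially on the outer thread.
            #pragma omp parallel for
            for (int unit = 0; unit < 1000; ++unit) {
                /* per-unit work */
            }
            std::printf("planet %d handled by thread %d of %d\n",
                        planet, omp_get_thread_num(), omp_get_num_threads());
        }
    }
    ```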
    DeathByDenim likes this.
  18. jomiz

    jomiz Active Member

    Messages:
    102
    Likes Received:
    71
    Port the simulation part to Rust and multithreading will be easy, pure win. /s
  19. exterminans

    exterminans Post Master General

    Messages:
    1,881
    Likes Received:
    986
    You wish...

    Rust has the benefit that claiming a variable gives you what amounts to an implicit lock on it, so if you were to fetch the base reference to an index structure or the like, you would be guaranteed exclusive access.

    Unfortunately, that is not what you want. Rust is rather safe in these terms, but it's also restrictive. You can only afford locks at that level when you are not already CPU-bound inside those critical sections.

    Otherwise you are facing a simple problem:
    You are still bound by single-thread performance, as two threads must not enter the critical section at once. More fine-grained locks which e.g. only lock dirty data aren't supported by Rust by default, so optimization actually becomes much, much harder.

    You are essentially back to manually defining named locks to coordinate concurrent access to complex data structures - or even less complex ones, such as simple tree structures and spatial databases. But then you have the additional obstacle of bypassing the limitations Rust just imposed on you, because concurrent and seemingly unsafe access is exactly what you need to scale any further.
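    The trade-off is the same in any language; sketched in C++ terms (made-up names, nothing from PA), the difference between the comfortable coarse lock and the fine-grained version you actually need looks roughly like this:

    ```cpp
    #include <algorithm>
    #include <array>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    struct SpatialGrid {
        struct Cell {
            std::mutex lock;                 // fine-grained: one lock per cell
            std::vector<int> unit_ids;
        };
        std::array<Cell, 256> cells;
        std::mutex whole_grid_lock;          // coarse: one lock for everything

        // Coarse version: easy to reason about, but every thread queues on one
        // lock, so once the work inside dominates, extra cores buy nothing.
        void move_unit_coarse(int unit_id, std::size_t from, std::size_t to) {
            std::lock_guard<std::mutex> g(whole_grid_lock);
            relocate(unit_id, from, to);
        }

        // Fine-grained version: only threads touching these two cells wait on
        // each other; this is the part that needs careful, manual design.
        void move_unit_fine(int unit_id, std::size_t from, std::size_t to) {
            if (from == to) return;
            std::lock(cells[from].lock, cells[to].lock);   // deadlock-free acquire
            std::lock_guard<std::mutex> ga(cells[from].lock, std::adopt_lock);
            std::lock_guard<std::mutex> gb(cells[to].lock, std::adopt_lock);
            relocate(unit_id, from, to);
        }

    private:
        void relocate(int unit_id, std::size_t from, std::size_t to) {
            auto& src = cells[from].unit_ids;
            src.erase(std::remove(src.begin(), src.end(), unit_id), src.end());
            cells[to].unit_ids.push_back(unit_id);
        }
    };
    ```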

    Well, I can think of ways to do it in Rust and still achieve a decent level of parallelism. But that requires a careful redesign of all data structures - so "port the simulation" is actually far more work than it sounds, and even just switching the language already sounded like an awful lot!
    If you were to apply the necessary transformations to the data model, you could just as well achieve the same level of safety in C++ by adhering to the same guidelines.

    Oh, and don't forget the memory overhead of Rust, caused by implicit locks, reference counters, and whatever else Rust stores in addition to the primitive data. It doesn't sound like much, but it could easily double PA's memory consumption.
    Last edited: July 22, 2015
    jomiz likes this.
  20. jomiz

    jomiz Active Member

    Messages:
    102
    Likes Received:
    71
    I think you missed the "/s".
    edit: But still, thanks for the explanation.
