ELO vs. TRUE SKILL

Discussion in 'Planetary Annihilation General Discussion' started by tatsujb, July 8, 2013.


ELO or TRUE SKILL?

  1. ELO

    18 vote(s)
    29.0%
  2. TRUE SKILL

    44 vote(s)
    71.0%
  1. thepilot

    thepilot Well-Known Member

    Messages:
    744
    Likes Received:
    347
    To be clear, it's not a variation of trueskill.

    It's a variation of how trueskill is displayed to the user. It has nothing to do with how it's computed.
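To illustrate the split between how a rating is computed and how it is displayed: TrueSkill tracks a mean (mu) and an uncertainty (sigma) per player, and leaderboards commonly show a conservative number such as mu - 3*sigma instead of mu itself. The sketch below assumes that display convention and the usual default starting values; the thread does not say which display variant is meant here, and the function name is made up for illustration.

```python
# Minimal sketch: same underlying skill estimate, different displayed number.
# Assumption: display = mu - k * sigma with k = 3 (a common convention).

def display_rating(mu: float, sigma: float, k: float = 3.0) -> float:
    """Conservative displayed rating derived from a (mu, sigma) skill estimate."""
    return mu - k * sigma

# A new player with the usual defaults (mu = 25, sigma = 25/3) displays near zero,
# even though the estimated mean skill is 25.
new_player = display_rating(25.0, 25.0 / 3.0)

# A veteran with the same mean but low uncertainty displays much higher.
veteran = display_rating(25.0, 1.0)
```

Changing k or the display formula changes what players see without touching the update rule that produces mu and sigma.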
  2. bmb

    bmb Well-Known Member

    Messages:
    1,497
    Likes Received:
    219
    I'm pretty sure I read on B.net that ranked used a slower variant of the algorithm, which would be consistent with the more accurate and stable matchups it gave. In social playlists, by contrast, you could clearly feel that a win would give you a more difficult matchup next, and a loss an easier one.
    Not much hope of digging that thread back up at this point though.

    It also seemed to be aware that two lower skill players do not make up for a higher skill player, so that while it did mix skill levels in matchups, it did have an even number of high and low skill players in a given match. (my face when I was matched as the high skill player for my team: http://1.bp.blogspot.com/_Ixx6tGz4o...R-5M2-qKk/s400/sweaty-man-cartoon-230x300.jpg )

    I really can't praise it enough, I never had a single bad matchup in those playlists.
  3. thepilot

    thepilot Well-Known Member

    Messages:
    744
    Likes Received:
    347
    It's easy to do when you have a large playerbase, something PA won't have.
  4. bmb

    bmb Well-Known Member

    Messages:
    1,497
    Likes Received:
    219
    I'm not sure why that matters. Even at its peak of millions of players the matches were never perfect, but what it did was that it was able to represent the skill of players accurately and balance teams around that even if the total pool of players was not balanced.
    If you can't represent the skill of a player properly any kind of auto balancing is a nonstarter regardless of how many you have. So I feel you are just arguing for the sake of arguing now. You don't have a point. You are not addressing a point, you are just making a blanket statement with no basis for it for no reason.

    Additionally, the matchmaking in Halo 3 worked fine even in downtimes on unpopular playlists with only a few thousand reported online, and probably fewer actual.
    Last edited: February 9, 2014
  5. thetrophysystem

    thetrophysystem Post Master General

    Messages:
    7,050
    Likes Received:
    2,874
    I would love if there was a way for individuals to create a ladder based on a set of options (mods used included), and people can subscribe to begin playing in that ladder.

    That way, people not caring about ladders wouldn't have to see ladders at all, people playing in ladders can find one that suits them specifically, and people looking for the most "showplace" ladder can look for one with the "most people registered in it".

    Actually, then you could even choose what the ladder "ranks" based off of, including several different options such as ELO or TrueSkill or other simpler point systems.

    Sure, there would be some 5 person ladders called "just bored" and you could lead that, but you don't win anything for having accomplished that. Then again, every clan could have an in-clan ladder that ranks only the clanmates who register to that ladder. Then your clan would have its own tracked ladder.

    Fancy.

    Obviously, there would be some global ladder created where anyone wanting to be laddered against "the most people" in a set of options deemed "most standardized" could participate in it and that would be what most people reference as the "global" ladder.

    Then, there could obviously be "timed" and "lifetime" ladders as well. Ladders that begin and end monthly, and at the end of their time they just remain as a chronicled end score. Perhaps the ability to use the "same ladder" and start over to track new scores but list them beside old scores. That way, you could have a monthly ladder and a lifetime ladder, where the lifetime ladder tracks the whole thing while the monthly one tracks by month, saves that month when starting a new one, and collects all the months for you to scroll through, all in the same ladder.

    Fancy, fancy, fancy...
  6. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    Interesting idea. I say why not, though it would hurt the "official" ladders, and the custom ladders themselves, a lot.
    A ladder creates a currency from nothing: ranking. It only matters to people if they see value in it, and they'll only see value in it if the grand majority of people do too.

    This is why a grand playerbase (and by grand I mean majority) is essential to the ladder.

    And this is why I believe that my idea of a system with validation from either Uber or a jury of top and biggest players is a better approach.
  7. Quitch

    Quitch Post Master General

    Messages:
    5,885
    Likes Received:
    6,045
    Uber have implemented TrueSkill before, so I'd be surprised if they didn't use that for their rating system.
  8. Shalkka

    Shalkka Active Member

    Messages:
    166
    Likes Received:
    51
    I don't understand what is the fear of using a "wrong" method of matchmaking.

    The details of the exact method of pairing people up will only somewhat affect your experience. The goal is the same. Yeah, it might be that in week #3 you are ranked marginally better relative to person B than you would be if another system were in place. But on the grand scale those differences even out. Who cares about the variations?

    To my understanding most of these algorithms look only at who participated and who won. They won't look at who 6 pooled or who went for an orbital laser snipe or who "won legit". None of the actual game details even enter into the equation. So the fitness for a particular game is pretty moot. And the meaning of that number is likewise limited by what it is a function of. You will beat players of lesser skill and be beaten by players of greater skill. Making that statement have a very technical definition will only serve to stray away from its spirit. Yes, using a particular system A of matchmaking ranks you better than person B, but it can be interpreted simply that B got screwed by the choice of A. So fixing a metric A won't really mean anything for your status.

    But then I realise the introduction of the metric shouldn't have mattered in the first place. It will just help you delude that you are playing for skill when you really are collecting unrelated points.
  9. Quitch

    Quitch Post Master General

    Messages:
    5,885
    Likes Received:
    6,045
    It affects the quality of your experience.

    Because they're irrelevant.

    The purpose of a ranking mechanism, and why accuracy matters, is to improve the quality of your experience. A good matchmaking system puts me against players of similar skill levels so I have close and exciting games.
    stormingkiwi and Clopse like this.
  10. Shalkka

    Shalkka Active Member

    Messages:
    166
    Likes Received:
    51
    Because they're irrelevant.

    The purpose of a ranking mechanism, and why accuracy matters, is to improve the quality of your experience. A good matchmaking system puts me against players of similar skill levels so I have close and exciting games.

    Agreed, elaborating on what I was getting at.

    Would the experience quality really drop that much? That is also vague, as the choice of details determines which aspects of the experience are traded for other qualities. One of them is ease of acquiring games versus uncertainty of match outcome. Your experience could be ruined by waiting forever or by the outcome not being surprising. One could, for example, commit days ahead to playing a certain number of matches, scheduling each to be of very high quality. Or you could hold it dear that you can just jump into a game. Given that human beings are pretty variable, the gains of getting the algorithm perfect wouldn't be that big. It is also the case that a good match for you can make it harder to make a good match for someone else. The availability of suitable opponents isn't really dependent on you or the way of pairing you up.
  11. mered4

    mered4 Post Master General

    Messages:
    4,083
    Likes Received:
    3,149
    Certain people like to *game* the system to artificially boost their score.

    Now, the only way to do this in ELO is to play low-skill players in one vs one battles. You quickly gain points, and the only thing anyone knows is that you've been winning more often in the past 10 matches or so.

    A suitable ranking algorithm to help gauge a player's skill and compare it to another player's should take into account the ranking of the opponent for each match. Doing so correctly easily eliminates the issue of gaming the system - to get a higher ranking, you have to beat better people. Since playing noobs is now a worthless endeavor, you have to play people at about your skill level. As most folks can understand, trying to grind on something that is just as good as you are is nigh impossible.
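As a rough illustration of a rating update that weights each result by the opponent's rating, here is a minimal sketch of the textbook Elo formula. The K-factor of 32 and the 400-point scale are conventional defaults and are assumptions here; the thread does not say what values the ladder under discussion uses.

```python
# Minimal sketch of the textbook Elo update (assumed K = 32, scale = 400).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the standard logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (A, B) ratings; score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Beating an equally rated opponent moves both ratings by k / 2.
even_a, even_b = elo_update(1500.0, 1500.0, 1.0)

# Beating a much lower-rated opponent earns very little...
farm_a, _ = elo_update(1800.0, 1200.0, 1.0)

# ...while losing to that same opponent costs nearly the full k.
upset_a, _ = elo_update(1800.0, 1200.0, 0.0)
```

Under these assumptions the reward for beating weaker players shrinks as the rating gap grows, so the payoff from farming depends heavily on how the opponents are rated and on whether those ratings are themselves accurate.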

    As for what you were saying:
    Experience quality for me is based on how much I am enjoying the game. I use the rankings to help gauge my skill level against other players, to see WHO I need to beat next. I've stopped using it lately because folks like Matiz have taken advantage of the ELO system and risen to the top without much effort (this isn't to begrudge him; he's a great player). In other words, the community up top is pretty darn close. I know who I can beat easily, and who I'll struggle with. I don't need the ELO or any other ranking to tell me that.

    When a tourney comes around though, I want a representative skill level. Not something I know is possibly rigged.
  12. Clopse

    Clopse Post Master General

    Messages:
    2,535
    Likes Received:
    2,865
    Well, Matiz is 1st on ELO and 2nd on TrueSkill. I'm 2nd and 1st. I think this is quite accurate as many of the other top players just don't play any more to gauge their level.
    mered4 likes this.
  13. Quitch

    Quitch Post Master General

    Messages:
    5,885
    Likes Received:
    6,045
    Yes, the closer the game the more exciting it is. Stomping someone or being ground into a fine powder is not a fun use of twenty minutes.
  14. nightbasilisk

    nightbasilisk Active Member

    Messages:
    194
    Likes Received:
    103
    Not gonna vote since the poll essentially glorifies one or the other due to not having a "neither" or "other" option. It's like "what's the best color in the world?", option 1: "red", option 2: "another color that's close to red"

    As for ranking systems. I don't like "generalizing" systems, i.e. probability-curve style systems. I prefer the idea of using measurements and creating divisions, as well as having everything easy to understand and open. I mean, in a tournament format you don't have some curve determine the winner now, do you? Why should ranking systems work differently? This is kind of hard to get right with words so I'll just use a more concrete system example, purely for 1v1 mind you:

    Disclaimer: this is just an example to illustrate a non probabilistic based system, not some perfect example of how to do it

    First, find all things in the game that can be accurately measured. Off the top of my head: number of metal extractors at key points in time (5min, 15min, etc), number of factories at key points in time, number of offensive units at key points in time. There would also be "control" measurements to avoid player manipulation of the system (i.e. trying to abuse the divisions).

    Next we need to organize our divisions.

    The game would be organized in ladders. A ladder may represent a certain balance state (i.e. balance patch) or just plain seasons (e.g. a 4-month ladder) or just special periods (e.g. a 2-day weekend ladder, a mini-tournament ladder played only every Saturday and Sunday for 6 months, a special-hour ladder where you can only start games between X and Y hours of the day). Ladders could have other special aspects to them: in a blind ladder you might not see other divisions, how many divisions there are, or which division you're in (except of course for who is rank 1 to 10 in the top division). In other ladders you may only be able to see 5 players behind you and 5 in front of you and have access to replays of games won by players in front of you, etc.

    Your first 3 or so matches will only be judged by measurements of skill and not win/loss, so assuming you do well you'll end up in the correct division even if you lose all your first 3 games. These matches will also not be recorded in ladder records.

    Ranking is as follows: if you are in a "higher" division then you may lose every game in your division and be in constant last place; so long as you win every game against challenging players from lower divisions you maintain your division and your rank is always higher than all players in lower divisions. If you lose a game against a challenger while being in last place of the division, you swap places. Obviously if you're at the top of your division you get to challenge the last place (or last 3 places) of the next division up.

    In-division ranking is based on a multiplier. The multipliers work as follows. Each "ladder cycle" (which may be a day or may not be) your base multiplier starts at 100 and thus you get full "points" for wins. Each time you win against anyone the multiplier goes down by 10; each time you win against the same person you get an additional -20 penalty to your win multiplier for that person. The multiplier cannot go below 25 (by default). Losses are counted as a deduction of points valued at half of what the winner won if his multiplier could not go below 50. If the ranks are more than 25% of the number of players in the division apart and the loser is the lower-ranked player, then the loser doesn't lose points. The multiplier simply ensures that no player can stay on top by (a) feeding on some vulnerability of a single other player he constantly gets matched with or (b) simply playing 20x more than everyone else.
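The diminishing win multiplier described above could be sketched roughly like this. The class and method names are hypothetical, and one detail is an assumption: the penalties are applied to wins after the one that triggers them, so the first win of a cycle is worth the full 100.

```python
# Minimal sketch of the per-cycle win multiplier (hypothetical names).
# Assumption: penalties from a win only affect later wins in the same cycle.
from collections import defaultdict

class CycleMultiplier:
    BASE = 100          # multiplier at the start of each ladder cycle
    PER_WIN = 10        # reduction for every win against anyone
    PER_REPEAT = 20     # extra reduction per earlier win against the same opponent
    FLOOR = 25          # default lower bound on the multiplier

    def __init__(self) -> None:
        self.total_wins = 0
        self.wins_against = defaultdict(int)

    def current(self, opponent: str) -> int:
        """Multiplier that would apply to a win against this opponent right now."""
        raw = (self.BASE
               - self.PER_WIN * self.total_wins
               - self.PER_REPEAT * self.wins_against[opponent])
        return max(self.FLOOR, raw)

    def record_win(self, opponent: str) -> int:
        """Return the multiplier earned for this win, then update the counters."""
        earned = self.current(opponent)
        self.total_wins += 1
        self.wins_against[opponent] += 1
        return earned

m = CycleMultiplier()
history = [m.record_win(name) for name in ["a", "a", "b", "a", "a"]]
```

Beating the same opponent repeatedly drives the multiplier to the floor much faster than spreading wins across the division, which is the stated intent of the rule.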

    Each day all players pay an upkeep in points of half of the average gain in points (not counting lost points) of all players in the division. ie. if you don't participate at all you'll eventually go down in rank based on how active the division is.

    Players may make a call for stalemate. This is not a call that's visible to the other player, you simply select stalemate and if the other player selects it the game is nulled, nobody loses or gains anything.

    In case of disconnects, the game is counted as a loss for the disconnecting player. If the server loses communication with both players within a specified interval then the game is nulled, nobody loses or gains anything. All disconnects are logged; if one player appears too often in disconnects as the victor (i.e. everyone else disconnects against him) then the circumstances may lead to a ban after being inspected by a human, e.g. the case where a hacker causes other people to disconnect.

    No details on the scoring are closed or secret, with the exception of your gains and losses at the end of the match: you only get your gains and losses for the cycle at the end of it (not that you're prevented from doing some of the math yourself, knowing all the variables). Ranks update dynamically after every match but you only see the update after each cycle as well, just to discourage potentially bad behavior such as dropping games with very low point loss.

    When a player jumps a division he gets points valued at half of what the player with the lowest positive points there currently has. All players with negative points may choose to drop a division at any time. Dropping a division works the same way: half the positive points of the player with the lowest positive value in points.

    In matchmaking players have the option to "wait". Basically, waiting tells the matchmaking to be more accurate, and the player has some options of what to wait for. The player may make some very hard waits. When in wait mode the game simply minimizes and makes a call for action when a match is found. Waits are balanced by upkeep, i.e. if you wait too much you may drop in points naturally next cycle (if a player plays no games he is not counted towards the average in the upkeep).
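The upkeep rule from this proposal (everyone pays half the average daily gain, with players who played no games left out of the average) could be sketched as follows. The record type and function name are hypothetical, and treating "gain" as each active player's total points won that day, ignoring losses, is an interpretation of the post.

```python
# Minimal sketch of the daily upkeep rule (hypothetical names and data shape).
# Assumption: "gain" is a player's total points won that day, losses ignored.
from dataclasses import dataclass

@dataclass
class DayRecord:
    games_played: int
    points_gained: float  # points won today, not net of losses

def daily_upkeep(records: dict) -> float:
    """Half the average gain among players who played at least one game."""
    active = [r.points_gained for r in records.values() if r.games_played > 0]
    if not active:
        return 0.0  # nobody played, so nobody pays
    return 0.5 * (sum(active) / len(active))

division = {
    "a": DayRecord(games_played=3, points_gained=200.0),
    "b": DayRecord(games_played=1, points_gained=0.0),   # played, won nothing
    "c": DayRecord(games_played=0, points_gained=0.0),   # idle, excluded from average
}
upkeep = daily_upkeep(division)
```

Every player in the division, including the idle one, would then have `upkeep` deducted, which is what makes inactive players drift down faster in an active division.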

    tl;dr you measure general common sense such as expanding, and once you can't you measure overall performance based on win to go up, lose to go down, in a system that is as close to open as possible, with diminishing returns, organized into finite leagues (there may be more than 1 at once) of varying timeframes; where timeframes are based on whatever is fair for most members of the community
    Last edited: March 13, 2014
  15. thepilot

    thepilot Well-Known Member

    Messages:
    744
    Likes Received:
    347
    Of course you do. That's why you can have sport bets to begin with.

    For the rest of your post, I won't bother explaining why it's a highly biased system (i.e. biased toward economy in your example) and why it's not a good idea, while being an overly complicated system asking for failure (or at best, a highly convoluted way to say "player A won or lost").
    I feel I won't change your mind about it.
    Last edited: March 13, 2014
  16. tatsujb

    tatsujb Post Master General

    Messages:
    12,902
    Likes Received:
    5,385
    Guys, no need to get so heated up. If you read through the thread we kinda determined that, yes, obviously ELO is absolutely not adapted to PA.

    But first off, there are other alternatives than TrueSkill; secondly, Uber already seem to have a tendency toward TrueSkill since they used it in SMNC; lastly, such algorithms can be reinvented and improved upon so that they can call it their "own" ranking system.

    oh and.
    not gonna lie, that statement doesn't look very good on you.

    Just because you and Matiz are good folk and don't spend your days bashing rookies doesn't mean we should encourage this behavior in any way.
