- 1 WN7 Summary
- 2 Key points of the WN Rating are:
- 3 Common Misconceptions
- 4 Limitations and problems
- 5 Implications of these limitations
- 6 Back to the formula
- 7 Winrate
- 8 The Formula
- 9 Scale
Written by Neverwish, Crabeatoff and Praetor77
Work on WN7 is a community effort. I consider everyone who has posted in this forum to have contributed their two cents into making WN what it is today. However, I would like to highlight the contribution of the people who have dedicated the most time and effort into making WN:
Key contributors: Tpapp157, Neatoman, Syndicate, Maokai, Makaze2048, Guerdon, DracoArgentum, Crabeatoff, TheKilltech, etc. etc.
The WN rating was created using statistical analysis tools such as correlation studies and evolutionary algorithms, with Win Rate used as a proxy to determine the weight each stat should have in the formula. The idea was to create a formula that measures player skill as accurately as possible using global account stats. The Efficiency formula served as the starting point: it was analyzed to figure out what was wrong with it, and an improved formula was built with those problems fixed. WN is short for Weighted and Normalized. It implements various ways to deal with statpadding, with the goal of a metric that can only be padded by actually being good at the game.
Key points of the WN Rating are:
- Damage is scaled according to your average tier and is the most important stat in the formula. The points you get for damage are carefully tied to the avg tier played, so that players with avg tier played 6 or 9 can be accurately compared, despite having very different average damage. To do this, average damage per tier was collected from vbaddict.net and the data was carefully fitted to a non-linear curve.
- Players with a considerable number of battles who have an average tier lower than 4 are heavily penalized for sealclubbing. New players with few battles under their belts are not penalized until they achieve a big number of battles but remain with a low average tier played.
- Cap points are not counted towards your rating, since despite HUGE efforts and statistical analysis mainly performed by Syndicate, there was no statistically sound way to include cap points into the formula. The data suggests that for the average player cap points which are actually useful in winning a game for your team are drowned out by the huge amount of useless cap points gathered at the end of already won games.
- Winrate is used as a proxy to measure intangible stats which are not available on the player profile, like spotted damage, cap used to lure the enemy out, stopping scouts from killing your arty, tracking enemies at crucial moments, keeping your teammates alive, map awareness and other crucial decisions not recorded in the stats. This term of the formula counts for 0-10% of the final WN rating.
- Average Defense points is capped at 2.2 to prevent padding. Defense also proved to be highly correlated to winrate, suggesting players who have map awareness and return to base when needed to stop enemy capping win more often.
- One of the most important characteristics of the WN Rating is the open development format, meaning any player can post in this thread and suggest modifications, which will be tested and, if successful, implemented. This open development model effectively eliminates the biases which closed formulas such as Efficiency have.
- After the rating was released, WoTLabs was the first website to implement it, keeping it up to date every time a new version is launched. Although many people dislike the fast evolution of the rating (having gone through several changes and versions in only 4 months), this means that the formula rapidly grows more accurate. It has spread to the point where the WN Rating is now the standard rating used on the XVM mod, although transformed into a 0-99 scale rating.
Common Misconceptions
Despite having been based on advanced algorithms, the WN Rating did not pass without heavy criticism, although most, if not all, of this criticism turned out to be misconceptions.
One common complaint was that, if it was made to correlate with win rate, then we could just use win rate. Unfortunately win rate can be easily padded by platooning with good players. The WN Rating separates those padded players by using their actual stats. A veteran player with a low WN Rating but a high win rate has probably been heavily padded.
Another common criticism is that we should stop caring about statistics and just play the game, since statistics eventually lead to mockery. Unfortunately World of Tanks is a competitive game, and as in all competitive games, there must be a measuring stick in order to know if you are improving and how much you are improving, otherwise we might as well go play Farmville. Shooting tanks for the heck of it gets boring after a while. The idea behind WN is to use it as a tool to make sure you keep improving, and also as a wealth of information when used in XVM to help you make critical decisions based on the skill of your teammates and enemies.
Other players criticize the formula for not taking things like spotting damage into account. This can hardly be pinned on the WN Rating, since Wargaming has not released this information publicly. The WN Rating can only work with what it has available. To account for those invisible stats that help win games, Win Rate has been added to the formula.
Lastly, many players criticized using winrate in the formula, since the very same winrate was used as a proxy to weigh the other stats in the evolutionary formula. However, as posted above, the idea was not to reward winrate, but to use it as a proxy for intangible stats. Also, the reward for these "intangibles" is carefully tailored by Neatoman into an S-curve with diminishing returns for winrates above 60%, where correlation to stats drops significantly, suggesting winrates above this value are purely due to platooning and companies. Also supporting this is the fact that Zakaladas (quite possibly the NA server's best player) almost always plays solo and averages 64% wins.
Limitations and problems
Formulas can only be created from stats that are made available via the official WoT website. Efficiency depends on those same stats. Everything WoT-news computes is off those same stats. More information on YOUR history is available from the cached dossier file, but unless everyone starts mass uploading those (which will never happen), then the official website stats are THE source for data.
What isn't in WoT website stats?
- Normalized Experience (XP) - theoretically WG could keep track of experience based on whether a user had a premium account or not, and then either remove the premium bonus OR give all standard account users the bonus (for stats purposes) to normalize XP across users.
- Damage Upon Detection - Damage done to targets you are spotting, by tanks who are not spotting them themselves. This is the bread and butter of light tanks (LTs) and of front line fighters. The other bonuses are relatively small by comparison. It is the most noticeable omission in all rating calculations, and it particularly screws over LTs.
- Per tank anything - the website cannot tell you damage per tank, spots per tank, etc. This information lives in your dossier and somewhere on the WG servers. If you use a dossier parsing tool (there are several web based and one local), you can obtain this information on your per tank performances. There is also an API which gives you this kind of information, but currently the NA server API does not work correctly. For example, it says I have 15.4 spots per game on my IS-4.
Implications of these limitations
Due to the lack of DUD (damage upon detection) on the WG website, light tanks are unfairly measured by WN7. They normally get lower WN7 scores than heavies, meds or TDs of the same tier. That being said, WN7 is actually the metric that gives one of the NA server's best scouts (Redparadize) the highest rating...
PR: 1842 Eff2.0: 1742 WN7: 1943
SPGs also cause issues, as their tiers are not lined up with the rest of the tanks! They do much more damage than their tier value would indicate for a HT, MT or TD. This is a known limitation of the formulas. Extensive programming (parsing the website stats for SPG counts and adjusting their tier) COULD fix this, but the problem will go away when the SPG tiers are stretched to match (per the latest ASAP with SerB). For now...we deal with it. Who cares about SPG players' stats anyways, amirite?!?!?
Some statistical limitations
When measuring a population, it's not going to be possible to put every single person on the scale and have the scale make sense. Take a notable outlier: Tazilon and his 20k+ VK2801 games. This massive number of games means his average tier played is 5.35, which is lower than is "generally expected" for someone with 28k total games. It takes longer to move through higher tier tanks, and so most players end up with more weight at tier 6+. Because WN7 is designed to measure members of the population relative to each other, some assumptions have to be made about the habits of the general population. Most players don't play 20k games in any single class or tier below 8, let alone 20k in a single tier 5 tank. If someone plays 10k games in the MS-1....outlier! Takeaway: population ratings cannot account for every outlier.
Back to the formula
A detailed explanation of each portion of the WN7 formula by Crabeatoff and Praetor77. Includes the answer to questions like "Why is cap not included in the formula?", "How much does winrate contribute to the equation" and "How does the low tier penalty work?":
Here is your frags factor: (1240-1040/(MIN(TIER,6))^0.164)*FRAGS. Note that tier is accounted for by taking the min of either tier played or 6. Praetor et al found that frags were a very important factor in predicting player skill. However, "although frags is a much more sound statistic, we decided to give them equal weight to avoid kill farming".
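As a rough illustration, the frags term can be sketched in Python like this. The function and parameter names are my own assumptions for the example; only the constants come from the formula in this post.

```python
def frags_term(avg_frags: float, avg_tier: float) -> float:
    """Sketch of the frags term: (1240 - 1040 / MIN(TIER, 6)^0.164) * FRAGS.

    avg_frags and avg_tier are hypothetical names for the account-wide
    frags per game and average tier played.
    """
    capped_tier = min(avg_tier, 6.0)  # tier stops mattering above 6
    points_per_frag = 1240.0 - 1040.0 / capped_tier ** 0.164
    return points_per_frag * avg_frags


if __name__ == "__main__":
    for tier in (1, 3, 6, 9):
        print(tier, round(frags_term(1.0, tier), 1))
```

With these constants one frag per game is worth 200 points at tier 1 and about 465 points at tier 6 and above, where the cap kicks in.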
A Note on Scales: For those not accustomed to reading these types of formulas, numbers like 1240, 1040 and 0.164 may seem arbitrary. And the fact is...they ARE arbitrary. The scale of WN7 is itself arbitrary, as are many other familiar scales: Efficiency, SAT scores, and IQ. Without going into a long discussion of scaling (which I would love to do, because I am a pedant), let's just say that statisticians create these scales so that persons can be compared to each other, and that the scale is simply accepted as convention. The scale could be from 0-100, but isn't because then it would look like percentages. It could be from 0-1600 like the old SAT or 0-2400 like the new one. The scale is bounded only by the amount of damage available in a game. It can run from the negatives into over 3000 points. However, in practice it has the following values based on the population percentages:
So the coefficients, or "random numbers", simply help normalize scores into a familiar bell curve and stick generally to the convention of ranges which the Efficiency guys chose, and which has become familiar to the WoT community.
After frags, we add in damage: DAMAGE*530/(184*e^(0.24*TIER)+130). Damage is again normalized, this time using the mathematical constant e. In layman's terms, e helps us turn the distribution of the scores into a bell curve. As with frags, tier is taken into account, and constants are added, multiplied and divided to weight and normalize damage relative to the other components of the score, and to put the population in the right order.
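To see how the tier normalization plays out, here is a small Python sketch of the damage term. The names are hypothetical; the constants are the ones from the WN7 formula in this post.

```python
import math


def damage_term(avg_damage: float, avg_tier: float) -> float:
    """Sketch of the damage term: DAMAGE * 530 / (184 * e^(0.24 * TIER) + 130)."""
    divisor = 184.0 * math.exp(0.24 * avg_tier) + 130.0
    return avg_damage * 530.0 / divisor


def damage_ratio_for_equal_points(tier_a: float, tier_b: float) -> float:
    """How many times more damage tier_b needs to score the same as tier_a."""
    return damage_term(1.0, tier_a) / damage_term(1.0, tier_b)


if __name__ == "__main__":
    print(round(damage_term(1000.0, 6.0), 1))
    print(round(damage_ratio_for_equal_points(6.0, 9.0), 2))
```

Under this curve a player averaging tier 9 needs roughly 1.9 times the average damage of a tier 6 player to earn the same damage points, which is how players at avg tier 6 and 9 end up comparable.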
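Finally! Something simple. Every game (except those MM bugged out 7 player games) there are 15 opponents, at every tier, so a spot is ostensibly worth the same at tier 1 as at tier 10. The only tier adjustment in this portion of the formula, SPOT*125*MIN(TIER, 3)/3, is the MIN(TIER, 3)/3 factor, which scales spots down for average tiers below 3 and has no effect above that.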
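A minimal Python sketch of the spot term as it appears in the full formula, SPOT*125*MIN(TIER, 3)/3, follows. The function and argument names are assumptions made for this example.

```python
def spot_term(avg_spots: float, avg_tier: float) -> float:
    """Sketch of the spot term: SPOT * 125 * MIN(TIER, 3) / 3."""
    tier_factor = min(avg_tier, 3.0) / 3.0  # equals 1.0 for avg tier 3 and above
    return avg_spots * 125.0 * tier_factor


if __name__ == "__main__":
    print(spot_term(1.0, 8.0))   # avg tier well above 3
    print(spot_term(1.0, 1.5))   # avg tier below 3, spots count for less
```

For any average tier of 3 or more a spot per game is worth a flat 125 points; only very low tier accounts see it scaled down.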
Defense points are capped at 2.2 per game and multiplied by 100: MIN(DEF,2.2)*100. As with spots, the same number of cap and defense points is needed at tier 1 and tier 10, so there is no tier factor.
A Note on Cap and Defense Points: Investigations by the WNx team, especially by Syndicate, could not find a correlation between cap points and player skill. The graph of a linear regression relating win% and cap points shows a clear trend, but the actual data is VERY dispersed and at many points is very far from the predicted winrate. So the correlation between cap points and win% is very, very low. This is probably due to a huge percentage of cap points being obtained in situations where they made no difference to the outcome of the game.
Also, when Syndicate performed a stepwise multiple regression, he found that cap was simply not contributing to improve the fit of the data. Frags was determined to be the best predictor of player skill, with damage very close behind (transformed to adjust to avg tier as per the WN7 formula), then spots, and then defense.
This data suggests that damage and frags are not wholly independent variables, and that frags/game is the best predictor of player skill. However, the WNx team as a whole felt that simply using kills might lead to kill farming, so kills and damage were given equal weights in the formula. As it stands, if you farm kills by holding your shots until enemies are at low health, your damage/game will drop. The least squares method coupled with evolutionary algorithms also suggested assigning such a low value to cap points that we decided to remove it from the formula. Both evolutionary algorithms and stepwise multiple regression suggested that adding cap points did not improve the accuracy of the WN formula, or rather decreased it.
Also, this analysis suggests similar results as the previously performed least sum of squares coupled with evolutionary algorithms, suggesting that 75-80% of total score should be assigned to frags+damage, 10-15% to spots and 5-10% to defense points.
Cap points can vary as much as 1.5 to 4.5 per game! Defense points, however, tend to rise with player skill. Very poor players will have below 0.5 per game, while better players will see 1.5 or more (up to and over 2). This bears out, in that suicide rushers never see the end of the game, while aware players will return to base to reset, or to destroy enemies trying to defend cap on Encounter modes. Capping IS a winning condition however. But because it became so well known that Efficiency was most easily manipulated by capping, the data has become biased towards capping. Capping also takes less skill than eliminating the enemy team. This isn't to say that you shouldn't cap to win, if capping is your best chance. WIN ALWAYS. But making it work across the population wasn't working. Again, this can affect LTs, as oftentimes the best thing they can do is drive real fast and cap out, but....limitations.
Conclusions on cap points are evidence based, so if you want to pick this fight, bring large volumes of data! We would be more than happy if someone comes up with a viable and statistically solid way of adding cap into the equation which actually improves the formula's capacity to predict average player skill.
Winrate
The winrate term is ((185/(0.17+e^((WINRATE-35)*-0.134)))-500)*0.45. A winrate of roughly 48% leads this term to have a value close to 0 (with the constants in the formula, the exact zero crossing is near 47%). This term is an S-curve which rewards winrates above that point, using winrate as a proxy to reward intangible skills like map awareness, knowing when to track tanks so your team gets the kill, protecting arty from scouts, etc. All of these lead to team wins but don't show anywhere else. Nevertheless, this term accounts for only 0-10% of total WN7 score. So as your winrate goes above roughly 48% (the population average), you get bonuses to your WN7, up to a point, after which the contribution levels off. Similarly, players lose WN7 points for being below the population average, but again only to a point.
[Graph: WN7 score increase as winrate increases]
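For anyone who wants to poke at the S-curve themselves, below is a minimal Python sketch of the winrate term from the full formula. The function name is my own; the constants are taken from the WN7 formula in this post. Note that with these constants the term crosses zero at a winrate of about 47%, slightly under the 48% quoted in the text, so treat the 48% figure as approximate.

```python
import math


def winrate_term(winrate_pct: float) -> float:
    """Sketch of the winrate term.

    ((185 / (0.17 + e^((WINRATE - 35) * -0.134))) - 500) * 0.45
    winrate_pct is a percentage, e.g. 52.0 for 52% wins.
    """
    s_curve = 185.0 / (0.17 + math.exp((winrate_pct - 35.0) * -0.134))
    return (s_curve - 500.0) * 0.45


if __name__ == "__main__":
    # Print the contribution across a range of winrates to see it level off.
    for wr in range(40, 75, 5):
        print(wr, round(winrate_term(wr), 1))
```

The term is bounded on both sides: it can never go below -225 points and never above roughly 265 points, which is what keeps very high (possibly platoon-padded) winrates from dominating the rating.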
Low tier penalty
-[(5 - MIN(TIER,5))*125] / [1 + e^( ( TIER - (GAMESPLAYED/220)^(3/TIER) )*1.5 )]
This one is for you UMB! If your average tier is 5 or greater, this value is 0. For an average tier lower than 5 the penalty increases, up to 500 points. However, the penalty also takes into account the number of games played. If you have few games, this penalty also tends to 0. The idea behind this portion of the formula is to lower the score of pedotankers (aka sealclubbers), who by the stats show more skill relative to their peers only because their peers are worse than the general population. Again, this is a limitation of trying to compare portions of the population to each other. The pedotankers do something aberrant and thus we get aberrant results. This knocks them down a few points. Your average player should be showing above tier 5 by around 2k games played though.
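Here is a small Python sketch of the penalty so you can see how the games-played gate behaves. The function and parameter names are assumptions for the example; the expression itself is the one given above.

```python
import math


def low_tier_penalty(avg_tier: float, games_played: float) -> float:
    """Sketch of the low tier penalty (returns a value <= 0).

    -[(5 - MIN(TIER,5)) * 125] / [1 + e^((TIER - (GAMESPLAYED/220)^(3/TIER)) * 1.5)]
    avg_tier must be greater than 0.
    """
    shortfall = 5.0 - min(avg_tier, 5.0)  # 0 for avg tier 5 and above
    gate = 1.0 + math.exp(
        (avg_tier - (games_played / 220.0) ** (3.0 / avg_tier)) * 1.5
    )
    return -(shortfall * 125.0) / gate


if __name__ == "__main__":
    for games in (100, 500, 1000, 5000, 20000):
        print(games, round(low_tier_penalty(3.0, games), 1))
```

For an average tier of 3, a new account with 100 games loses only a handful of points, while an account with many thousands of games gets close to the full 250 point penalty for that tier. The 500 point maximum is only reached at an average tier of 1.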
The Formula
(1240-1040/(MIN(TIER,6))^0.164)*FRAGS
+DAMAGE*530/(184*e^(0.24*TIER)+130)
+SPOT*125*MIN(TIER, 3)/3
+MIN(DEF,2.2)*100
+((185/(0.17+e^((WINRATE-35)*-0.134)))-500)*0.45
-[(5 - MIN(TIER,5))*125] / [1 + e^( ( TIER - (GAMESPLAYED/220)^(3/TIER) )*1.5 )]
Note that MIN() means the number is capped at that value, so MIN(TIER, 5) means avg tier capped at 5 (the player's avg tier is used if it is lower than 5, otherwise 5 is used), and MIN(DEF,2.2) means defense is capped at 2.2.
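Putting it all together, the whole formula can be sketched in Python as follows. This is only an illustration under my own naming assumptions (the function name, argument names and the sample player are all made up), not the code any site actually runs. Winrate is a percentage, the other stats are per-game averages.

```python
import math


def wn7_breakdown(frags, damage, spots, defense, winrate, tier, games_played):
    """Return each WN7 term plus the total, following the formula as written."""
    terms = {
        "frags": (1240.0 - 1040.0 / min(tier, 6.0) ** 0.164) * frags,
        "damage": damage * 530.0 / (184.0 * math.exp(0.24 * tier) + 130.0),
        "spots": spots * 125.0 * min(tier, 3.0) / 3.0,
        "defense": min(defense, 2.2) * 100.0,
        "winrate": (185.0 / (0.17 + math.exp((winrate - 35.0) * -0.134)) - 500.0) * 0.45,
        "low_tier_penalty": -((5.0 - min(tier, 5.0)) * 125.0)
        / (1.0 + math.exp((tier - (games_played / 220.0) ** (3.0 / tier)) * 1.5)),
    }
    terms["total"] = sum(terms.values())
    return terms


if __name__ == "__main__":
    # Hypothetical player, invented purely for the example.
    result = wn7_breakdown(
        frags=1.0, damage=1000.0, spots=1.2, defense=0.8,
        winrate=52.0, tier=7.0, games_played=8000,
    )
    for name, value in result.items():
        print(f"{name:>16}: {value:8.1f}")
```

For this made-up player the frags and damage terms together contribute roughly three quarters of the total, in line with the 75-80% weighting discussed above, and the low tier penalty is zero because the average tier is above 5.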
Scale
Current scale for the Rating.
This scale is different from the one used in XVM, since their analysis of the Russian player database gave different results. The scale is currently scheduled to be readjusted based on an analysis of the WoTLabs player database being performed by Neverwish.
And this is what goes on inside the WN Rating. I hope this clears up some doubts regarding the formula! Feel free to post on this thread if you have any suggestions or questions.