Bar exam scores are scaled. Put simply, a number of questions on the MBE are repeated from past exams. After an exam, how well a particular pool of examinees did on those questions determines the scale. For example, suppose the average July administration answers 30/60 of these "equator items" correctly, but the July 2017 administration answered 32/60 correctly. Based on this, it is presumed that the July 2017 candidates are more able than the average July candidate, so these candidates will receive higher scaled scores. Thus, the exam scale is essentially based on how each administration performs on the MBE equators.

Please note that scaling is not linear. For example, on the July 2017 MBE, an examinee who chose C for all his MBE answers would probably receive a scaled MBE score of 90 based on a raw MBE score of 44/175 (or about 25% correct). This is a difference of 46 points between the raw score and the scaled score. Meanwhile, an examinee with a "passing" scaled MBE score of 133 probably earned a raw score of 106/175 (or about 61% correct). This is a difference of 27 points. In contrast, an examinee with a scaled MBE score of 162 probably earned a raw score of 148/175 (or about 85% correct). This is a difference of only 14 points. In another administration, these scaled scores could change substantially. For example, when I took the exam in 2005, I scored a 162 scaled on the MBE, but my raw score was 157 (a difference of only 5 points).
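The gaps in this paragraph can be checked with a few lines of Python. This is only a sketch: the raw/scaled pairs are the estimates quoted above for July 2017, not an official NCBE conversion table, and the names `RAW_TOTAL`, `pairs`, and `gap` are made up for illustration.

```python
# Estimated (raw, scaled) pairs quoted above for the July 2017 MBE.
# These are the article's estimates, not official NCBE conversion data.
RAW_TOTAL = 175  # scored MBE questions on the July 2017 exam

pairs = {
    "all C answers": (44, 90),
    "passing score": (106, 133),
    "high score": (148, 162),
}

def gap(raw, scaled):
    """Points added by scaling: scaled score minus raw score."""
    return scaled - raw

for label, (raw, scaled) in pairs.items():
    pct = 100 * raw / RAW_TOTAL
    print(f"{label}: raw {raw}/{RAW_TOTAL} ({pct:.0f}% correct), "
          f"scaled {scaled}, gap {gap(raw, scaled)}")
```

If the conversion simply added a fixed number of points, the gap would be identical on every row. Instead it shrinks as the raw score rises.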

Scaling is performed to maintain the same level of difficulty for each exam to stabilize the pass rates. However, in order to determine the scale, you need to know the mean and standard deviation (SD) for the MBE in your jurisdiction for that exam, along with the MEEs/MPTs mean and SD. I don’t know of any jurisdiction that divulges this information anymore (the PA bar exam once reported this information in 2002 and that’s all I am aware of). According to NY BOLE, "The scaled score for each of the six MEE questions and two MPT questions are arrived at by converting the raw score for each question to a scale that generally ranges from approximately 20 to 80, with 50 as the mean. The candidate's MEE scores are totaled and divided by six, the MPT scores are totaled and divided by two and the resultant scores are added together to arrive at the average scale score for the written section. In computing the average scale score the MEE is weighted 60% and the MPT is weighted 40%. That average is then converted to a score distribution that is comparable to that of the MBE. The resulting figure is the candidate's total written score." NCBE explains the MBE scaled score as follows:

*The MBE is a standard examination. This means that a score is produced which will represent the same level of ability from test to test. It is impossible to construct a form of the MBE that has exactly the same difficulty level as other forms of the test. A "raw" score, or the number of correct answers of an applicant, is affected by the difficulty of the particular form of the test. If the test is relatively easy compared to the standard established by earlier tests, the applicant may have a high score. If the test is relatively hard, the applicant will have a lower score. With a predetermined passing raw score, an applicant might pass or fail depending on the difficulty of the particular form of the test. This is obviously an unfair result. To avoid a fluctuating level of ability required for admission to the bar, a measure is needed which will represent the same level of difficulty from test to test. In order to compare the difficulty of each form of the MBE, some questions are repeated from earlier forms of the MBE. When approximately twenty-five hundred answer sheets reach ETS, they are scored and statistical procedures are used to compare the difficulty of this test with the difficulty of the standard test. If this test is harder than the standard established by earlier tests, points are added to raw scores to obtain a "scaled" score which represents the level of difficulty of the same score on the standard test. If this test is easier than the standard test, points are subtracted from the raw score to obtain the "scaled" score.*

The MBE is highly reliable because it re-uses questions. According to NCBE, "*by analyzing performance of a group of examinees on the reused set of questions, we can determine the proficiency of this group of examinees relative to other groups of examinees who took the same questions in the past. Then, by analyzing performance on the new questions in light of performance on the previously used questions, we can determine the relative difficulty of the new questions. Scores are 'scaled' using the equating process to ensure that scores have a stable meaning over time. For example, an examinee with a scaled score of 135 on the July 2004 MBE would demonstrate the same level of proficiency as would examinees with scaled scores of 135 on any other MBE administrations.*"

Because the MBE exam is different with each administration, the level of difficulty of the exam fluctuates. Therefore, equating is performed to neutralize the effects of differences in MBE exam difficulty. The MBE raw score is converted to a "scaled score" using an equating process to ensure that the scores have a stable meaning over time. The equating process requires that a mini-test comprised of 60 questions that have appeared on earlier versions of the test be embedded in the larger exam. The mini-test mirrors the full exam in terms of content and statistical properties of the items. According to NCBE, "[t]wo sets of 'equator' items are carefully selected from two old test forms and inserted in a new test form. In terms of content coverage and statistical properties, each of these sets of equator items is like a miniature version of the previous full length test. Next, the balance of the new test form is developed to meet total test specifications." For example, the July 2017 MBE may draw 30 questions from previous February MBE exams and 30 questions from previous July MBE exams. Following is an excerpt from an August 2003 NCBE article regarding the MBE:

Our ability to generate scale scores requires the use of previously used items (known as “equators”), and scale scores are required in order to compare performance and maintain an equitable pass/fail standard from one test administration to the next. Without the use of equators, examinee scores would vary with the difficulty of the exam questions or with the performance of others who are sitting for the exam at the same time ... Producers of high-stakes tests (such as the tests developed by NCBE) are obligated to eliminate exposed test material from the item pool and generate new items. Test development costs are high, and these costs are inevitably passed along to the examinees.

According to NCBE: *After the exam, statistical analysis is performed on the equating and nonequating items in the new and old forms. These analyses first determine how much differently the new group performed on their test form compared to the old groups on their test forms. Then the analysis breaks these differences down into the amount attributable to differences in difficulty between these forms, and the amount attributable to differences in ability between the groups. Once the determination is made on how much of the difference is attributable to differences in test difficulty, the proper "correction" can be calculated. This correction is made by rescaling the scores using a conversion equation. This means that a multiplier and an additive constant are applied to all the scores. This rescaling places the new candidate group on the same score scale as the old groups. The intended result is to neutralize the effects of differences in form difficulty, as though everyone took the same test form. Ideally, it should be a matter of indifference to a candidate as to which test form he or she gets.*
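The "multiplier and an additive constant" in the passage above describe a conversion of the form scaled = multiplier × score + constant. The sketch below shows only that shape. NCBE does not publish the actual constants, so the values of `MULTIPLIER` and `CONSTANT` here are invented purely for illustration, as is the function name `rescale`.

```python
def rescale(score, multiplier, constant):
    """Apply a conversion equation: multiplier * score + constant."""
    return multiplier * score + constant

# Hypothetical values; the real constants are derived by NCBE from the
# equating analysis and differ for every administration.
MULTIPLIER = 0.75
CONSTANT = 50.0

for raw in (100, 120, 140):
    print(raw, "->", rescale(raw, MULTIPLIER, CONSTANT))
```

Because the same two constants are applied to every examinee on a given form, the conversion shifts and stretches the whole score distribution without reordering anyone.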

This scaling is intended to maintain the same level of difficulty for each MBE exam, in effect stabilizing the pass rate. Accordingly, there is no "national" raw MBE score - only raw scores by jurisdiction. The scaled MBE scores of applicants from a given jurisdiction are aggregated and the mean and standard deviation of those scores are calculated. The mean and standard deviation of the raw written scores earned by the same applicants are then calculated. The scores on the written exam are then converted so that they have the same mean and standard deviation as the scaled MBE scores. Scaling the essays to the MBE is an essential step in ensuring that scores have a consistent meaning over time. When essay scores are not scaled to the MBE, they tend to remain about the same: for example, it is common for the average raw July essay score to be similar to the average February score even if the July examinees are known to be more knowledgeable on average than the February examinees.
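The conversion described above (giving the written scores the same mean and standard deviation as the same applicants' scaled MBE scores) can be sketched in Python as follows. The scores are made up for five imaginary applicants, and the function name `scale_written_to_mbe` is hypothetical. An actual jurisdiction would use its full applicant pool, and its exact procedure may differ in detail.

```python
from statistics import mean, pstdev

def scale_written_to_mbe(raw_written, scaled_mbe):
    """Convert raw written scores so that, as a group, they have the same
    mean and standard deviation as the group's scaled MBE scores."""
    w_mean, w_sd = mean(raw_written), pstdev(raw_written)
    m_mean, m_sd = mean(scaled_mbe), pstdev(scaled_mbe)
    if w_sd == 0:
        raise ValueError("raw written scores have no spread; cannot scale")
    # Standardize each written score, then re-express it on the MBE scale.
    return [m_mean + m_sd * (w - w_mean) / w_sd for w in raw_written]

# Made-up scores for five applicants (illustration only).
raw_written = [40.0, 45.0, 50.0, 55.0, 60.0]
scaled_mbe = [120.0, 130.0, 140.0, 150.0, 160.0]

scaled_written = scale_written_to_mbe(raw_written, scaled_mbe)
print([round(s, 2) for s in scaled_written])
```

Each written score keeps its distance from the group's written mean, measured in standard deviations. Only the units change, which is why a stronger pool (a higher MBE mean) lifts every scaled written score.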

If an individual of average proficiency sits for the bar with a particularly bright candidate pool, this individual’s raw written scores will remain lower than they would have been in previous sittings with less able peers. But the equating of the MBE will take into account that this is a particularly bright candidate pool and that the individual in question is in fact of average ability. The individual’s written test scores will then be scaled to account for the difference in the candidate pool, and his written test scores will be brought into alignment with his demonstrated level of ability. Scaled essay scores lead to total bar examination scores that eliminate contextual issues and that accurately reflect individual proficiency. Therefore, the higher the MBE average for an exam, the higher the scale. Scaling written scores to the MBE does not change the rank-ordering of examinees on either test. A person who had the 83rd best MBE score and the 23rd best essay score will still have the 83rd best MBE score and the 23rd best essay score after scaling.
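The rank-ordering claim follows from the conversion being linear with a positive multiplier. A small Python sketch makes this concrete. The essay scores and the two constants are invented for illustration, and `ranks` and `linear_scale` are hypothetical helper names.

```python
def ranks(scores):
    """Return each examinee's rank (1 = best) for a list of scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def linear_scale(scores, multiplier, constant):
    """Apply the same multiplier and additive constant to every score."""
    return [multiplier * s + constant for s in scores]

# Invented raw essay scores and hypothetical scaling constants.
raw_essays = [4.5, 6.0, 3.0, 7.5, 5.0]
scaled_essays = linear_scale(raw_essays, 12.0, -10.0)

print(ranks(raw_essays))
print(ranks(scaled_essays))
```

Both calls print the same list of ranks. The multiplier here stands in for a ratio of two standard deviations, which is always positive, so scaling cannot reverse the order of any two examinees.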

NCBE explains in detail how essay scores are scaled to the MBE here. Using this article, I created the below calculator (based on the pre-UBE NY bar exam) which estimates your raw essay score based on your scaled essay score and vice versa.
Please note that in order for this conversion to be accurate, you need to know the correct mean and standard deviation (SD) for the MBE in your jurisdiction, along with the essays/MPT.
For New York, I guesstimated both. While I think the MBE figures are fairly accurate (the estimated median MBE score in NY is based on my interpolation of scores submitted to me by failing examinees), I really have no idea about the essay figures (unfortunately, NY BOLE does not release the New York bar exam statistics needed to correctly calculate scaled essay scores). Without knowing the New York mean and standard deviation for the MBE and Essays/MPT, there is no way to properly estimate the raw scores. However, the calculator is useful in illustrating how the MBE mean and SD affect the scaling of scores: *the higher the mean, the higher the scale. The higher the standard deviation, the lower the scale.* Accordingly, a 50 will generally not represent an average of 5/10 on the essays.

To illustrate how the calculator works (and how I tested it), an examinee who wrote one sentence on Essay #4 of the July 2010 exam received a Scaled Essay Score of 21.59. Because only one sentence was written, the raw score for this essay was likely 0.0. The New York MBE mean scaled score for the July 2010 exam (based on 11,557 examinees) was approximately 141.86. A scaled score of 141.86 would be a raw score of 135 (based on the July 2006 NY MBE scale - this is the last exam that NY BOLE released both raw and scaled MBE scores). The NY MBE raw SD, NY Essay mean raw score, and NY Essay raw SD are unknown (only NY BOLE knows this), so I guesstimated values for these items. For example, the average national MBE standard deviation from 2000-2005 was 15.3 - I used 13.5 for New York. The NY Essay mean raw score must be under 5.0 since 50 is the mean scaled score according to NY BOLE. Using the below guesstimated values, a July 2010 raw essay score of 0.0 results in a scaled score of 21.60.

