Researchers have long used federal court data assembled by the Administrative Office of the U.S. Courts (AO) and the Federal Judicial Center (FJC). The data include information about every case filed in federal district court and every appeal filed in the twelve nonspecialized federal appellate courts. Much research using the AO data spans subject matter areas, and includes articles on appeals, caseloads and case-processing times, case outcomes, the relation between demographics and case outcomes, class actions, diversity jurisdiction, and litigation generally. Other research using the AO data covers particular subject matter areas, such as inmate cases, contract cases, corporate litigation, antitrust litigation, patent litigation, employment litigation, constitutional tort litigation, and products liability cases. These varied uses of the AO database have led to it being called "by far the most prominent" database used by legal researchers for statistical analysis of case outcomes. For many years researchers relied on the data as published in the Annual Reports of the AO Director or on specific inquiries answered by the AO staff. In recent years, the FJC has made the data available in electronic form through the Inter-university Consortium for Political and Social Research. This easier access to the data, together with increasing use of computers and sophisticated statistical software programs, forecasts even greater future use of the AO data. Like many large data sets, the AO data are not completely accurate. Some reports exist relating to the AO data's reliability, but no systematic study of the AO's non-bankruptcy data has been published. 
In the course of a substantive study of federal litigation brought by prison and jail inmates, one of us began to investigate the nature and rate of errors, exploiting a technological innovation in federal court records: the availability of docket sheets over the Internet via the federal judiciary's Public Access to Court Electronic Records project (PACER). This Article follows a similar method to begin a more comprehensive assessment of the AO data's reliability. (Relatively little is known about the accuracy of other major law-related data sets, although it is clear that jury verdict reporters, another source of information about thousands of cases, vary in their accuracy.) In the large majority of districts, PACER allows public Internet-based access to docket sheets recorded since 1993; in some districts other case materials are also available. To test the AO data's reliability, we compare the characteristics of cases as coded in the AO data with what we believe to be the more accurate information recorded by clerks on individual case docket sheets, as obtained through the PACER system. Even though the court personnel who update case dockets are frequently the very people responsible for the AO data collection (and indeed, such personnel may often fill in many, though not all, of the AO variables on the basis of the docket sheet itself), the information on the docket sheets is likely to be more reliable for three reasons: it is entered in narrative form and therefore raises no coding issues; it is entered as litigation events occur rather than retrospectively; and maintenance of dockets (unlike data entry for AO statistical purposes) is a core function of court clerks' office personnel. This study looks at two large categories of cases, torts and inmate civil rights, and separates two aspects of case outcomes: which party obtained judgment and the amount of the judgment when plaintiffs prevailed.
With respect to the coding for the party obtaining judgment, we find that the AO data are very accurate when they report a judgment for plaintiff or defendant, except in cases in which judgment is reported for plaintiff but damages are reported as zero. As to this anomalous category (which is far more significant in the inmate sample than in the torts sample), defendants are frequently the actual victors in the inmate cases. In addition, when the data report a judgment for "both" parties (a characterization that is ambiguous even as a matter of theory), the actual victor is nearly always the plaintiff. Because such cases are quite infrequent, this conclusion is premised on relatively few observations and merits further testing.
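The comparison described above, in which each case's AO coding is checked against the winner identified from its docket sheet, can be sketched as a simple cross-tabulation. What follows is only an illustrative sketch, not the procedure actually used in the study: the field names (`ao_judgment_for`, `ao_amount`, `docket_winner`), the category labels, and the sample records are all hypothetical and are not drawn from the AO data or from the study's results.

```python
from collections import Counter, defaultdict


def ao_category(ao_judgment_for, ao_amount):
    """Group a case by its AO coding (labels are hypothetical).

    Plaintiff judgments with zero reported damages are kept apart,
    because the text treats them as an anomalous category.
    """
    if ao_judgment_for == "plaintiff" and ao_amount == 0:
        return "plaintiff, zero damages"
    return ao_judgment_for


def crosstab(cases):
    """Count docket-sheet winners within each AO-coded category."""
    table = defaultdict(Counter)
    for case in cases:
        category = ao_category(case["ao_judgment_for"], case["ao_amount"])
        table[category][case["docket_winner"]] += 1
    return table


def share(table, category, winner):
    """Fraction of cases in an AO category whose docket shows `winner`."""
    row = table.get(category, Counter())
    total = sum(row.values())
    return row[winner] / total if total else None


# Made-up records for illustration only; not data from the study.
sample = [
    {"ao_judgment_for": "plaintiff", "ao_amount": 5000, "docket_winner": "plaintiff"},
    {"ao_judgment_for": "plaintiff", "ao_amount": 0, "docket_winner": "defendant"},
    {"ao_judgment_for": "plaintiff", "ao_amount": 0, "docket_winner": "plaintiff"},
    {"ao_judgment_for": "defendant", "ao_amount": 0, "docket_winner": "defendant"},
    {"ao_judgment_for": "both", "ao_amount": 1200, "docket_winner": "plaintiff"},
]

table = crosstab(sample)
for category in sorted(table):
    print(category, dict(table[category]))
```

Calling `share(table, "plaintiff, zero damages", "plaintiff")` on such a table gives the proportion of zero-damages plaintiff judgments that the docket sheets confirm as plaintiff victories. Keeping that category separate from ordinary plaintiff judgments reflects the distinction drawn in the findings above.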
Schlanger, Margo. "The Reliability of the Administrative Office of the U.S. Courts Database: An Initial Empirical Analysis." T. Eisenberg, co-author. Notre Dame L. Rev. 78, no. 5 (2003): 1455-96.