In this Article, I explore why measuring disparate-treatment discrimination by police is so difficult, and consider the ways that researchers' existing tools can make headway on these challenges and the ways they fall short. Lab experiments have provided useful information about implicit racial bias, but they cannot directly tell us how these biases actually affect real-world behavior. Meanwhile, for observational researchers, there are various hurdles, but the hardest one to overcome is generally the absence of data on the citizen conduct that at least partially shapes policing decisions. Most crime, and certainly most noncriminal "suspicious" or probable-cause-generating behavior, goes unreported and undetected, and is unobservable to researchers. The available measures of observed crime are not necessarily good proxies for total crime, and in any event, such data generally do not exist at an individual level that can be linked to individual outcome data on police interactions. And although we often do have data on the subset of people who are stopped by police, analyses limited to those individuals are often distorted by selection bias and by the absence of exogenous measures of their conduct; researchers have no choice but to rely, circularly, on what police write down. These hurdles are serious. Some headway has been made in particular contexts in which quasi-experimental methods or direct physical observation by researchers is possible, but most policing contexts are not readily amenable to these approaches. It may be possible to do more using survey methods, though these pose their own challenges. And when it comes to assessing discrimination against neighborhoods of color (as opposed to individuals), it is sometimes possible to rely on aggregate-level data to make plausible claims. Often, however, the limits of available data will mean that it is just not possible to determine whether the police are discriminating based on race.
These research challenges are also problems for courts, litigants challenging such discrimination, and police departments themselves as they seek to comply with their constitutional obligations. I suggest that, in some contexts, a new approach would work better. The method I propose is called "auditing," which would employ "testers" (probably undercover officers) of different races to elicit possible interactions with the police. Auditing has not been tried or even discussed in the law enforcement field, which is surprising because for decades it has been a central tool in antidiscrimination research and civil rights enforcement more generally. It presents safety, legality, and efficacy concerns when applied to policing, but I argue that, with careful design, these concerns can be overcome. If so, auditing could provide something observational research usually cannot: causally rigorous analysis of police discrimination in a real-world setting.
Part I begins by examining why it is important to develop good methods for measuring "disparate treatment" discrimination by police. Disparate treatment is certainly not the only source of racial disparity in policing that researchers or policymakers should care about. That said, constitutional doctrine forces us to confront the question, and I outline other moral and policy reasons why we should be concerned about disparate treatment. I also examine the conceptual problems associated with thinking of racial discrimination as a "cause" of disparity. In Part II, I examine existing methods of analyzing disparate treatment: individual- and neighborhood-level regression analyses, quasi-experimental methods exploiting variation in police ability to observe race, and lab experiments on implicit bias. In Part III, I set forth the auditing proposal and explore its advantages, challenges, and limitations.