
From research design to data collection/creation to analysis and presentation of results, we support faculty working with empirical research.

Research Assistance

  • Support for faculty doing research with an empirical component

Publication Support

  • Review empirical papers for sound research design and methodology
  • Create publication-quality tables and figures

Instruction

  • In-class research instruction
  • Research classes on demand (open to students and staff)
    • How to conduct surveys for programs at HLS
    • How to read empirical papers for SJD students

Consultations

Schedule an appointment to discuss your project and learn how we can support you.

  • Full support for finding, acquiring, cleaning, reformatting, merging, and scraping data
  • Statistical analysis of data
    • Identifying testable hypotheses
    • Research design
    • Consulting on validity and reliability
    • Analysis, visualization, and reporting
  • Conducting field experiments, Amazon Mechanical Turk (MTurk) studies, and nationally representative surveys
  • Data visualization
  • Text analysis and geographic information systems (GIS)
  • Review of grant applications for sound methodology; application support for IRB compliance

Resources for Empirical Research

  • Designing an Empirical Legal Study

    This section offers broad advice on what to keep in mind when constructing a research design. Many of the points discussed below are drawn from, and presented more fully in, Epstein & King (2002) and King, Keohane, & Verba (1994), both highly recommended sources for in-depth guidance on research design and execution. The first step in any empirical study is to formulate a research question: what does the study seek to explain? A good research question should generally follow two rules:

    • The question should be relevant to the real world. It is important that the study seeks to provide practical and important implications for society.
    • The question should contribute to an existing body of scholarly literature. By speaking to an established set of related studies, the researcher can avoid significant problems such as duplicating or overlooking previous work. Isaac Newton’s famous remark, “If I have seen a little further it is by standing on the shoulders of Giants,” colorfully illustrates this rule.

    Once the research question is clearly stated, the next step is to propose a theoretically informed answer from which falsifiable hypotheses can be derived. The hypothesis should:

    • Be stated clearly enough to allow for a test which can determine if the proposed answer is wrong.
    • Specify a relationship between an outcome (dependent variable) and one or more explanatory variables (independent variables).

    If there is insufficient evidence to reject a clearly stated, falsifiable hypothesis, then the theory behind it becomes more plausible. A theory that offers many observable implications, and therefore more opportunities to be tested, has the potential to become a very strong theory if the hypotheses derived from it cannot be rejected.
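    To make testing concrete, here is a minimal Python sketch of one standard test, the two-proportion z-test, which checks whether an outcome rate differs between two groups. In this framework the research hypothesis (the rates differ) is set against a null hypothesis of no difference: a small p-value is evidence against the null, while a large one means only that the data have failed to contradict it. The counts are invented for illustration, the function name is hypothetical, and the normal approximation assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided test of H0: p_a == p_b for two independent samples.

    Returns the z statistic, the two-sided p-value, and a 95% (Wald)
    confidence interval for the difference p_a - p_b.
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    # Under H0 both groups share one rate, so pool the samples.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The confidence interval uses the unpooled standard error.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
    diff = p_a - p_b
    return z, p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)

# Hypothetical counts: 180 convictions in 300 cases in period A,
# 150 convictions in 300 cases in period B.
z, p, ci = two_proportion_z_test(180, 300, 150, 300)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

    When a hypothesis predicts a direction (for example, that a rate will be lower in one period), a one-sided test is also defensible. The interval is returned alongside the p-value so that the size and uncertainty of the difference can be reported, not just its significance.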

    Remember that the fundamental objective of empirical research is to make inferences—that is, using known facts to understand unknown facts. Typically we use observable data (known facts) to test certain hypotheses which are guided by theory to uncover these unknown facts.

    The next section works through a simple example.

  • Collecting Data
    Michelle Pearse wrote an introduction to finding data for Research Assistants.

    After the research question and hypothesis are posed, the next step is to collect data so that the hypothesis can be tested. The hypothesis posits a relationship between two or more variables: an outcome (dependent) variable is influenced by one or more explanatory (independent) variables.

    Example: “Homicide conviction rates in Massachusetts (dependent variable) will be lower in the post-Miranda v. Arizona period (independent variable).”

    Having formulated this hypothesis, we next want to operationalize the concepts within it. How can these concepts be observed and measured? The conviction rate in Massachusetts could be measured as the ratio of guilty rulings to total rulings in all homicide cases in Massachusetts over the period 1950-2011. The post-Miranda period includes all years following the Supreme Court’s 1966 decision. We want to make sure that the variables we use to measure the concepts in our hypothesis are both:

    • Valid – the measure reflects the underlying concept accurately.
      • Does the scale accurately report my weight?
    • Reliable – the measure will produce a similar value when the measuring instrument is reapplied. A measure that is reliable need not be valid; indeed, it may consistently produce similar but nevertheless biased estimates.
      • When stepping on the scale multiple times, does it return a consistent weight estimate?
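    One way the operationalization in the Miranda example could be computed from case-level records is sketched below in Python. The `(year, ruling)` record format, the function name, and the sample cases are all hypothetical. Following the wording "all years following the Supreme Court’s 1966 decision", cases from 1966 itself are counted as pre-Miranda, which is a judgment call a real study would need to justify.

```python
from collections import defaultdict

MIRANDA_YEAR = 1966  # year Miranda v. Arizona was decided

def conviction_rates(cases):
    """Share of guilty rulings before and after Miranda.

    `cases` is an iterable of (year, ruling) pairs, where ruling is
    "guilty" or "not guilty". Years after 1966 count as post-Miranda.
    """
    guilty = defaultdict(int)
    total = defaultdict(int)
    for year, ruling in cases:
        period = "post" if year > MIRANDA_YEAR else "pre"
        total[period] += 1
        if ruling == "guilty":
            guilty[period] += 1
    # Conviction rate = guilty rulings / total rulings in each period.
    return {period: guilty[period] / total[period] for period in total}

# Hypothetical case records, invented for illustration.
cases = [
    (1958, "guilty"), (1961, "guilty"), (1964, "not guilty"), (1965, "guilty"),
    (1968, "guilty"), (1972, "not guilty"), (1980, "not guilty"), (1995, "guilty"),
]
rates = conviction_rates(cases)
print(rates)
```

    Writing the measurement rule down as code makes each operational choice (what counts as a conviction, where the period boundary falls) explicit and easy for others to check or change.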

    Data are typically classified into two categories—qualitative and quantitative. The levels of measurement are as follows:

    • Nominal data are one form of qualitative data where objects have no natural order (e.g. gender, race, religion, brand name). It does not make sense to think of Buddhism being “more than” Confucianism.
    • Ordinal data are another form of qualitative data—specifically, groups which can be ranked. An example of an ordinal variable is a survey respondent’s sense of agreement (e.g. strongly agree, agree, disagree, and strongly disagree). These responses do have a natural order and can be ranked, although the distance between each response is difficult to determine.
    • Interval data are one form of quantitative data. They have a definite natural order and, unlike ordinal data, the differences between values can be determined and are meaningful. Interval-level data do not have a natural zero point, however. For example, 0 degrees on the Fahrenheit scale is arbitrary, and therefore 100 degrees Fahrenheit is not twice as warm as 50 degrees.
    • Ratio data are the second form of quantitative data. In contrast to interval data, ratio-level data have a non-arbitrary zero point. For example, 0 yards means no length, so 100 yards is twice as far as 50 yards.
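    The Fahrenheit point can be checked with a little arithmetic. Converting to the Kelvin scale, whose zero is absolute zero, turns the same temperatures into ratio-level data, and the apparent doubling disappears. A short Python sketch (the helper name is our own):

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins, a scale with a true zero."""
    return (f - 32) * 5 / 9 + 273.15

# Dividing Fahrenheit readings suggests 100 degrees is "twice" 50 degrees...
naive_ratio = 100 / 50
# ...but on the ratio-level Kelvin scale the ratio is only about 1.1.
true_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)
print(f"{naive_ratio:.1f} vs {true_ratio:.3f}")
```

    Differences, by contrast, survive the conversion: a 50-degree Fahrenheit gap always corresponds to the same change in temperature, which is exactly what makes Fahrenheit interval-level data.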

    Even though qualitative data are mostly based on unordered groups, they can nevertheless be analyzed quantitatively. This is achieved by coding the qualitative data of interest into numerical values. For example, if we are running a survey, we can transform gender (nominal data) into a dichotomous (dummy) variable with each respondent assigned a 1 if female and 0 if male. Likewise, the attitudinal responses on the survey can be assigned numerical values as well, for example, Strongly Agree = 4; Agree = 3; Disagree = 2; Strongly Disagree = 1. Once qualitative data have been coded into numerical variables, they can be analyzed using both basic and advanced statistical models.
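    A minimal Python sketch of this coding step, using the schemes given in the paragraph above (female = 1, male = 0; Strongly Agree = 4 through Strongly Disagree = 1). The survey responses and the helper name `code_response` are hypothetical.

```python
# Coding schemes from the text.
GENDER_CODES = {"female": 1, "male": 0}
AGREEMENT_CODES = {
    "strongly agree": 4,
    "agree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def code_response(response):
    """Turn one survey response (a dict of text answers) into numbers."""
    return {
        # Normalize case and whitespace before looking up the code.
        "female": GENDER_CODES[response["gender"].strip().lower()],
        "agreement": AGREEMENT_CODES[response["agreement"].strip().lower()],
    }

# Hypothetical survey responses.
responses = [
    {"gender": "Female", "agreement": "Strongly Agree"},
    {"gender": "Male", "agreement": "Disagree"},
    {"gender": "female", "agreement": "Agree"},
]
coded = [code_response(r) for r in responses]
print(coded)

# The mean of a 0/1 dummy is the share of respondents coded 1.
share_female = sum(r["female"] for r in coded) / len(coded)
print(f"Share female: {share_female:.2f}")
```

    Looking answers up in a dictionary means an unexpected answer (a typo or a new category) raises a `KeyError` instead of being silently miscoded. Note also that averaging the 1-4 agreement codes treats the categories as equally spaced, which, as the discussion of ordinal data points out, is an assumption.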

    In empirical legal research, content coding of natural language text is commonly employed (see Hall and Wright, 2008; Evans et al., 2007). Content analysis is a popular methodology that can be used, for example, to summarize characteristics of interest in court decisions. When possible, it is best to have individuals other than the researcher code variables so as to reduce bias.
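    When several people code the same material, a common reliability check is how often they agree beyond what chance alone would produce. Cohen’s kappa is one widely used statistic for two coders; it is offered here as an illustration, not as a method prescribed by the sources cited above. The Python sketch below uses hypothetical labels for ten court decisions.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement if each coder labeled independently at their
    # own observed category rates.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one category")
    return (observed - expected) / (1 - expected)

# Hypothetical codes for ten decisions.
coder_a = ["affirm", "affirm", "reverse", "affirm", "reverse",
           "affirm", "affirm", "reverse", "affirm", "affirm"]
coder_b = ["affirm", "reverse", "reverse", "affirm", "reverse",
           "affirm", "affirm", "affirm", "affirm", "affirm"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

    Here the coders agree on 8 of 10 decisions, but because both label most decisions "affirm", much of that agreement would be expected by chance, so kappa is noticeably lower than the raw 80% agreement.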

    Empirical research on legal issues can rely on primary (original) as well as secondary (obtained from elsewhere) data. Bradley Wright and Robert Christensen, for example, in studying the effects of public service motivation on job sector choice, employ an original survey of law students in one study (Christensen and Wright, 2011) as well as survey data from the American Bar Association in another study (Wright and Christensen, 2010).

    Another data collection technique is web scraping: using software to visit websites and extract specific pieces of information. Here is a tutorial on web scraping in the R language, prepared by Jonathan Whittinghill, the Applied Research Statistician at HLS Empirical Research Services.
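    The tutorial mentioned above uses R. For readers working in Python, the sketch below illustrates the basic idea of scraping with only the standard library’s `html.parser` module: parse a page’s HTML and pull out the text and target of each link. The page content and URLs are hypothetical and inlined so the example runs offline; a real scraper would first fetch pages and would usually need site-specific extraction rules.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the visible text and href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((text, self._current_href))
            self._current_href = None

# In practice the HTML would be downloaded, for example with
# urllib.request.urlopen(url).read().decode("utf-8"). This page is hypothetical.
html = """
<ul>
  <li><a href="/opinions/2011-001.html">Smith v. Jones</a></li>
  <li><a href="/opinions/2011-002.html">Doe v. Roe</a></li>
</ul>
"""
parser = LinkExtractor()
parser.feed(html)
parser.close()
for text, href in parser.links:
    print(text, href)
```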

  • Analyzing Data

    This section provides links to helpful resources on analytical methodologies—both descriptive and statistical. For further assistance, you may also schedule a meeting with the empirical research services unit at the Law Library.
    Where to learn about analyzing data:

    Empirical courses offered at Harvard
    For students interested in taking courses on empirical research, there are numerous classes offered at Harvard Law School, the Kennedy School, and FAS. There are five tracks for students interested in empirical research methods, ranging from courses appropriate for those with no background in empirical research to courses for students with an extensive methodological background. A description of these tracks can be found here and a list of courses offered on campus can be found in this file prepared by Jonathan Whittinghill.

    MIT OpenCourseWare classes on empirical research

    Political Science

    Political Science Scope and Methods (Undergraduate, Berinsky, Fall 2010)

    Quantitative Research in Political Science and Public Policy (Graduate, Ansolabehere, Spring 2004)

    Quantitative Research Methods: Multivariate (Graduate, Ansolabehere, Spring 2004)

    Qualitative Research: Design and Methods (Graduate, Meyer, Spring 2005)

    Qualitative Research: Design and Methods (Graduate, Locke, Fall 2007)

    Economics

    Introduction to Statistical Method in Economics (Undergraduate, Bennett, Spring 2006)

    Introduction to Statistical Methods in Economics (Undergraduate, Menzel, Spring 2009)

    Econometrics (Undergraduate, Angrist, Spring 2007)

    Statistical Method in Economics (Graduate, Chernozhukov, Fall 2006)

    Econometrics I (Graduate, Hausman & Chernozhukov, Spring 2005)

    Nonlinear Econometric Analysis (Graduate, Chernozhukov & Newey, Fall 2007)

    New Econometric Methods (Graduate, Newey, Spring 2007)

    Time Series Analysis (Graduate, Mikusheva, 2013)

    Mathematics

    Introduction to Probability and Statistics (Undergraduate, Panchenko, Spring 2005)

    Probability and Random Variables (Undergraduate, Sheffield, 2014)

    Statistics for Applications (Undergraduate, Kempthorne, 2015)

    Sloan School of Management

    Statistical Thinking and Data Analysis (Undergraduate, Rudin, 2011)

    Data, Models, and Decisions (Graduate, Gamarnik, Freund & Schulz, Fall 2007)

    Communicating with Data (Graduate, Carroll, Summer 2003)

    Doctoral Seminar in Research Methods I (Graduate, Sorensen & Bailyn, Fall 2004)

    Doctoral Seminar in Research Methods II (Graduate, Sorensen, Spring 2004)

    Overview of quantitative methods prepared by Parina Patel
    Statistical software packages:
    Stata is a general-purpose statistical software package which is popular among researchers in economics, sociology, political science, epidemiology and biomedicine among others. The statisticians at Harvard Law School primarily use Stata for data analysis.

    Machines with Stata are located in the computer classroom in Langdell 353. You may also purchase Stata directly from StataCorp.

    • The UCLA Institute for Digital Research & Education Stata site has many excellent step-by-step tutorials on a wide range of statistical estimation procedures using Stata.
    • Germán Rodriguez’s Stata resources (Princeton University) also offer a comprehensive overview of Stata, including data management, graphics, and programming examples.
    • One of the advantages of Stata is its active community of users. The Statalist is an email listserver where more than 3,500 Stata users discuss all aspects of the program. If you have a question, you are likely to find a relevant discussion in the archives of the listserver.
    • Stata Press publishes excellent manuals on best practices for a whole range of statistical estimations. Most titles can be found using Hollis. The Stata Journal is also an invaluable resource for learning to use Stata more effectively.
    • Downloadable material for the upcoming Stata workshop

    R is an open-source programming language and statistical software environment. R offers a wide variety of statistical and graphical techniques. A good description of the software can be found on the official site of R. Compared with Stata, and especially SPSS, R requires considerably more programming proficiency. The program is free and can be downloaded here.

    IBM SPSS is another popular general-purpose statistical software package that can handle almost all econometric estimations. A notable difference between SPSS and Stata or R is that SPSS relies much more on graphical user interface (point-and-click) procedures, making it more user-friendly. While the “vanilla” version of SPSS may be somewhat limited relative to Stata or R, many SPSS add-ons and modules provide additional capabilities. SPSS can be bought directly from IBM SPSS.

    SAS is yet another popular software package used for statistical analysis. It is generally regarded as a powerful program, especially when working with very large datasets. One significant limitation of SAS is its poor graphical capabilities.

  • Presenting Results
    Whether explaining the distribution of a single variable or reviewing the results of an advanced statistical analysis, the researcher should try to convey substantive information clearly and concisely. Lee Epstein, Andrew Martin, Matthew Schneider, and Christina Boyd co-authored an insightful series on best practices in statistical presentation for empirical legal research. These papers are highly recommended for beginners interested in employing quantitative methods in their research. Some key rules derived from the papers include:

    • When discussing the results of a regression analysis, do not focus only on the parameters that are “statistically significant.” The researcher should also convey how substantive an effect each significant variable has on the outcome (dependent) variable. For instance, holding the other variables in the model fixed, what is the predicted value of the dependent variable when the significant independent variable in question is at its minimum, mean, and maximum values?
    • If you are using your results to make inferences about a population, explicitly discuss the level of uncertainty in your estimates. This typically means reporting confidence intervals for your estimates. For a fun discussion of uncertainty, see Ian Ayres’ Super Crunchers (pp. 112-116).
    • Try to avoid presenting data and results in tables; graphs are almost always superior.
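    The first rule above, reporting predicted values at the minimum, mean, and maximum of a significant variable while holding the others fixed, can be sketched in a few lines of Python for a linear model. The coefficients, variable names, and data below are invented for illustration; in practice they would come from your fitted model (in Stata, the `margins` command with its `at()` option performs a similar calculation).

```python
from statistics import mean

def predicted_values(intercept, coefs, data, focal):
    """Predicted outcome at the min, mean, and max of one variable.

    Every other variable is held fixed at its sample mean. `coefs` maps
    variable names to estimated slopes from a linear model; `data` maps
    the same names to lists of observed values.
    """
    baseline = {name: mean(values) for name, values in data.items()}
    focal_values = {
        "min": min(data[focal]),
        "mean": mean(data[focal]),
        "max": max(data[focal]),
    }
    predictions = {}
    for label, value in focal_values.items():
        x = {**baseline, focal: value}  # vary only the focal variable
        predictions[label] = intercept + sum(coefs[n] * x[n] for n in coefs)
    return predictions

# Hypothetical fitted model: sentence length (months) regressed on
# prior convictions and defendant age. All numbers are invented.
intercept = 12.0
coefs = {"prior_convictions": 3.0, "age": -0.1}
data = {
    "prior_convictions": [0, 1, 2, 5, 7],
    "age": [20, 30, 40, 50, 60],
}
effects = predicted_values(intercept, coefs, data, "prior_convictions")
for label, value in effects.items():
    print(f"{label:>4}: {value:.1f} months")
```

    This direct plug-in only works for linear models; for nonlinear models such as logit, the linear prediction must be transformed (for example into a probability), and the effect of one variable then depends on the values at which the others are held.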

    Epstein et al. (2006) “On the Effective Communication of the Results of Empirical Studies, Part I,” Vanderbilt Law Review 59(6): 1811-1871.

    Epstein et al. (2007) “On the Effective Communication of the Results of Empirical Studies, Part II,” Vanderbilt Law Review 60(3): 801-846.

  • Publication Process

    Once you are finished, you might be interested in trying to get your study published in a law school law review or in a peer-reviewed journal from a scholarly society or larger publisher. Washington and Lee’s Law Journal Submission and Ranking website, Ulrich’s Global Serials Directory, and ISI Journal Citation Reports are good resources for identifying both types of journals, both within and outside of the United States. (There is also a peer-reviewed journal devoted entirely to empirical legal studies: the Journal of Empirical Legal Studies.)

    While simultaneous submission of manuscripts to multiple journals is the norm for most law school law reviews (with August-October and February-April being the big submission “seasons”), most peer-reviewed journals require exclusive submissions. (Some student-edited journals, such as the Harvard Law Review and Stanford Law Review, are also starting to experiment with peer or faculty review and may prefer exclusive submissions.) You should always check the journal’s website for specific guidelines about preparing manuscripts for publication. (For example, the NYU Law Review has special guidelines just for empirical work.)

    You might also want to consider depositing your data to make it available for replication and further use by future researchers. Some journals may actually require you to submit your data for manuscript review or for publication. While there are various options for storing and archiving your data, one of the most popular with social scientists is IQSS Dataverse. Among its features is the ability to prepare data visualizations for users.

  • External Resources

Director of ERS


Arevik Avedian

Director of Empirical Research Services

Arevik is a Lecturer on Law at HLS, teaching quantitative methods. She holds a Ph.D. in world politics and methods and an M.A. in economics from Claremont Graduate University, as well as a dual B.A./M.A., summa cum laude, in international relations from David Anhaght University of Armenia. Before joining HLS, she taught courses on statistics and international relations at the University of California, Riverside and California State University, Fullerton.

Her research focuses on armed conflict, inequality and corporate governance. Some of her current methodological interests include geographic information systems (GIS), text mining and location analytics.