- Valid – the measure reflects the underlying concept accurately.
  - Does the scale accurately report my weight?
- Reliable – the measure will produce a similar value when the measuring instrument is reapplied. A measure which is reliable need not be valid; indeed, it may consistently produce similar but nevertheless biased estimates.
  - When stepping on the scale multiple times, does it return a consistent weight estimate?
Data are typically classified into two categories—qualitative and quantitative. The levels of measurement are as follows:
- Nominal data are one form of qualitative data where objects have no natural order (e.g. gender, race, religion, brand name). It does not make sense to think of Buddhism as being “more than” Confucianism.
- Ordinal data are another form of qualitative data—specifically, groups which can be ranked. An example of an ordinal variable is a survey respondent’s sense of agreement (e.g. strongly agree, agree, disagree, and strongly disagree). These responses do have a natural order and can be ranked, although the distance between each response is difficult to determine.
- Interval data are one form of quantitative data which have a definite natural order and, unlike ordinal data, the differences between values can be determined and are meaningful. Interval-level data do not have a natural zero point, however. For example, 0 degrees on the Fahrenheit scale is arbitrary, and therefore 100 degrees Fahrenheit is not twice as warm as 50 degrees.
- Ratio data are the second form of quantitative data. In contrast to interval data, ratio-level data have a non-arbitrary zero point. For example, 0 yards means no length, and 100 yards is twice as far as 50 yards.
Even though qualitative data are mostly based on unordered groups, they can nevertheless be analyzed quantitatively. This is achieved by coding the qualitative data of interest into numerical values. For example, if we are running a survey, we can transform gender (nominal data) into a dichotomous (dummy) variable with each respondent assigned a 1 if female and 0 if male. Likewise, the attitudinal responses on the survey can be assigned numerical values as well, for example, Strongly Agree = 4; Agree = 3; Disagree = 2; Strongly Disagree = 1. Once qualitative data have been coded into numerical variables, they can be analyzed using both basic and advanced statistical models.
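To make this coding step concrete, here is a minimal sketch in R; the data frame, variable names, and response values are hypothetical and simply mirror the gender dummy and attitude scores described above.

```r
# Hypothetical survey data used only to illustrate the recoding described above
survey <- data.frame(
  gender   = c("female", "male", "female", "male"),
  attitude = c("Strongly Agree", "Disagree", "Agree", "Strongly Disagree")
)

# Dummy-code gender: 1 if female, 0 if male
survey$female <- ifelse(survey$gender == "female", 1, 0)

# Assign numeric scores to the ordinal attitude responses
scores <- c("Strongly Agree" = 4, "Agree" = 3, "Disagree" = 2, "Strongly Disagree" = 1)
survey$attitude_score <- unname(scores[survey$attitude])

survey
```

Once coded this way, the variables can enter standard statistical routines, for example as regressors in lm().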
In empirical legal research, content coding of natural language text is commonly employed (see Hall and Wright, 2008; Evans et al., 2007). Content analysis is a popular methodology that can be used, for example, to summarize characteristics of interest in court decisions. Whenever possible, it is best to have individuals other than the researcher code the variables in order to reduce bias.
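One simple, illustrative check on such coding is the share of cases on which two independent coders assign the same value; the coders and outcome codes in the sketch below are invented.

```r
# Hypothetical outcome codes for ten decisions (1 = plaintiff prevails,
# 0 = defendant prevails), assigned independently by two coders
coder_a <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 1)
coder_b <- c(1, 0, 1, 0, 0, 1, 0, 1, 1, 1)

# Simple percent agreement; low values suggest the coding rules are ambiguous
mean(coder_a == coder_b)  # 0.8 in this example
```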
Empirical research on legal issues can rely on primary (original) as well as secondary (obtained from elsewhere) data. Bradley Wright and Robert Christensen, for example, in studying the effects of public service motivation on job sector choice, employ an original survey of law students in one study (Christensen and Wright, 2011) as well as survey data from the American Bar Association in another study (Wright and Christensen, 2010).
Another data collection technique is web scraping: using software to visit websites and extract specific pieces of information. Here is a tutorial on web scraping written in the R language, prepared by Jonathan Whittinghill, the Applied Research Statistician at the HLS Empirical Research Services.
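For a sense of what such a scraper looks like, here is a minimal sketch using the rvest package in R; the URL and the CSS selector are placeholders rather than anything taken from that tutorial.

```r
# Minimal web-scraping sketch; "https://example.com/opinions" and the
# selector "h2.case-title" are hypothetical placeholders
library(rvest)

page <- read_html("https://example.com/opinions")   # download and parse the page
case_titles <- page |>
  html_elements("h2.case-title") |>                 # select the elements of interest
  html_text(trim = TRUE)                            # extract their text

head(case_titles)
```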