AP Statistics
Today's Topic: Categorical Data
Objectives for Today
- Compare and Contrast Categorical and Quantitative variables
- Represent Categorical Data sets with various graphs
- Calculate frequencies and relative frequencies from Categorical Data sets
Warm-Up: Goofy Graphs
The following are actual graphs that have been published by various media outlets. In your table groups, describe the error in each one and create a graph that more effectively (and correctly) represents the data. Be prepared to share your results with the class.
Categorical vs. Quantitative Data
Sort the cards into two stacks - one for categorical variables and one for quantitative variables.
Are there any variables that could possibly be both? Why/how?
Which quantitative variables are discrete? Which are continuous?
.....................................................................................
What are ways we can represent categorical data?
Bar Graph
Just like you learned in Elementary school.
Segmented ('Stacked') Relative Bar Graph
Way harder, sometimes confusing, but required.
Pie Chart
'Nuff said
.....................................................................................
Categorical Data give counts (frequencies) or percents of things in categories. Also called "qualitative" data.
Quantitative Data have units
More Vocabulary
Frequency = Counts (ie. count up how many times that thing happens/appears)
Relative Frequency = Percent of Counts (percent of category to total sample or percent of subgroup to category)
Example of Frequency Table
The column for "tally marks" are usually left out, but sometimes it is necessary when the data set is large.
Example of Relative Frequency Table
Just takes the counts and turns them into percents or proportions out of the whole sample. This table includes the frequencies as well as the relative frequencies for reference.
Two-way frequency table (AKA Contingency Table)
When more than one variable is being analyzed we break up the columns and rows into the various categories for each variable and display the counts for each cell. We can then calculate relative frequencies, which have numerous options depending on which category everything is relative to. You can do relative to the total, the row, or the column. Relative to the row or column is called a conditional relative frequency because it depends on the condition (or category) of a specified variable.
Independence
Based on the two-way table above, do the variables "Vehicle Type" and "Gender" appear to be independent? Another way to think about this question is, does being male or female affect your likelihood to own a particular vehicle type over the other (based on this data)?
Conditional Relative Frequencies
The example below represents a relative frequency table when the calculations are done relative to the row. Notice at the end of each row, the marginal totals equal 1, or 100%. If the column had been the chosen relative variable, then each total at the bottom would be 100%.