3  Results

In this section, we will begin by introducing the variables used for analysis. Subsequently, we will delve into a discussion on youth tobacco exposure and addiction rates, considering variations over time, across states, and differences in gender and education levels.

3.1 Data Processing

We will continue to use variables mentioned previously.

  • Dependent variable:

    • Value : the percentage of students who share the same smoking status among the entire student population with similar features
  • Independent variable:

    • YEAR: Year of the observation

    • State: Location of the observation

    • Topic: Type of tobacco consumption(Cigarette/ Smokeless Tobacco/ Cessation)

    • Measure: Smoking Status (Smoking/ Quit Attemp)

    • Res: Frequency of tobacco usage (Ever/ Current/ Frequent)

    • Sample: sample size

    • Edu: Education background of the observation (Middle School/ High School)

    • Gender: Female/ Male/ Overall

Code
# Impart Packages 
library(readr)
library(tidyverse)
library(ggplot2)
library(choroplethr)
library(dplyr)
library(plotly)
library(cowplot)
library(vcd)
library(choroplethrMaps)
# Import Dataset 
data <- read_csv("https://raw.githubusercontent.com/yutongzhang37/tobacco/main/Data/Youth_Tobacco_Survey__YTS__Data.csv")

# We now remove variables with single observation
data <- Filter(function(x)(n_distinct(x)>1), data)

# use only the independent and dependent variable mentioned previously
data_bf <- subset(data, select = c("YEAR", "LocationDesc", "TopicDesc", "MeasureDesc", "Response","Data_Value","Sample_Size", "Gender", "Education"))

# Rename variables
data_bf <- data_bf %>%
  rename(
    State = LocationDesc,
    Topic = TopicDesc,
    Measure = MeasureDesc,
    Sample = Sample_Size,
    Edu = Education,
    Res = Response,
    Value = Data_Value
)

# order ordinal category
data_bf$Edu <- factor(data_bf$Edu, levels = c("Middle School", "High School"))

3.2 How did patterns of youth tobacco exposure and addiction evolve from 1999 to 2017 ?

To address this question, we will initially examine the overall trends in youth tobacco exposure and addiction from 1999 to 2017. Subsequently, we will explore potential variations in these patterns based on gender and education levels. Our specific focus will be on the percentage of students who had ever smoked (youth tobacco exposure) and the percentage of students who had smoked frequently (youth tobacco addiction).

3.2.4 Considering Both Genders and Education Levels

Building upon our previous analysis, it became evident that both gender and education level influence youth tobacco exposure and addiction rates. In an effort to gain a deeper understanding of the patterns across years, we have introduced controls for both gender and education. The dataset was segmented based on education levels and gender, allowing us to calculate the mean and standard deviation of the observed percentages of youth tobacco exposure and addiction rates.

  • Tobacco Exposure

    Faceting on years, we observe a clear declining trend in youth tobacco exposure from 1999 to 2017, confirming our previous findings. Additionally, a notable distinction emerges between middle school and high school average tobacco exposure rates, with middle school rates consistently around half of those in high school across all years. While gender differences in tobacco exposure rates are relatively minor, the graph indicates that males consistently exhibit higher youth tobacco exposure rates when controlling for year and educational levels.

    Code
    data_avegee <- data_ge %>% group_by(YEAR,Edu, Gender) %>% summarise(ave_p = mean(Value), std = sd(Value))
    
    ggplot(data_avegee, aes(x = Edu, y = ave_p, fill= Gender))+
      geom_col(position = "dodge")+
      geom_errorbar(
        aes(ymin = ave_p - std, ymax = ave_p + std),
        position = position_dodge(.9),
        width = .2
      )+
      facet_wrap(vars(YEAR))+
      xlab("Year")+
      ylab("Percentage")+
      ggtitle("Youth Tobacco Exposure By Education Levels and Gender From 1999 to 2017")

  • Tobacco Addiction

    Youth tobacco addiction rates also exhibit a distinct decreasing pattern across years, controlling for gender and education level. The disparity between middle school and high school addiction rates is striking, with the middle school average consistently remaining below 5% (from around 5% to almost 0%), while high school tobacco addiction rates dropped from around 17% to 2.5% from 1999 to 2017. Additionally, gender differences in tobacco addiction are noticeable in both middle school and high school across the years. However, compared to education level differences, gender differences are less prominent in tobacco addiction rates.

    Code
    data_avegef <- data_gf %>% group_by(YEAR,Edu, Gender) %>% summarise(ave_p = mean(Value), std = sd(Value))
    ggplot(data_avegef, aes(x = Edu, y = ave_p, fill= Gender))+
      geom_col(position = "dodge")+
      geom_errorbar(
        aes(ymin = ave_p - std, ymax = ave_p + std),
        position = position_dodge(.9),
        width = .2
      )+
      facet_wrap(vars(YEAR))+
      xlab("Year")+
      ylab("Percentage")+
      ggtitle("Youth Tobacco Addiction By Education Levels and Gender From 1999 to 2017")

3.2.5 Summary

  • Youth tobacco exposure and addiction rates both exhibited a significant decrease from 1999 to 2017.

  • Male youth tobacco exposure and addiction rate is consistently higher than female youth tobacco exposure and addiction rates across all years

  • High school students consistently maintained markedly higher tobacco exposure and addiction rates than middle school students over the past two decades, although as high school rates experienced a notable decline.

  • Controlling for both gender and education level, education level exerts a stronger impact on youth tobacco exposure and addiction rates than gender.

3.3 How does youth tobacco exposure and addiction differ by states in 2017?

To address this question, we will initially examine youth tobacco exposure and addiction rates across states, controlling for education levels. Subsequently, we will explore gender and education level differences within each state. Only a subset of states has data in 2017 is included in this analysis.

3.3.1 Youth Tobacco Exposure and Addiction Rates Across States By Education

We controlled for education levels to create a choropleth maps since the dataset contains varied observations across different levels of education. States lacking the relevant data are represented by the color black on the choropleth map. The deeper shade of blue indicates that this state has a higher percentage of youth tobacco exposure.

  • Tobacco Exposure
    • Middle School

      West Virginia, Louisiana, and South Carolina exhibit a higher percentage of middle school youth who have previously engaged in smoking, with an approximate rate of 16%. Ohio reports the lowest rate of 8% for tobacco exposure in middle school youth.

      Code
      # filter data
      data2017 <- subset(data_bf, YEAR == 2017)
      data2017 <- data2017 |>
        mutate(State = tolower(State)) |>
        rename(region = State, value = Value)
      exposure <- data2017 |>
        filter(Topic == "Cigarette Use (Youth)" & Res == "Ever") |>
        select(region, value, Gender, Edu)
      exposure_middle <- exposure |>
        filter(Gender == "Overall" & Edu == "Middle School") |>
        select(region, value)
      exposure_high <- exposure |>
        filter(Gender == "Overall" & Edu == "High School") |>
        select(region, value)
      
      # choropleth map
      suppressWarnings({
        state_choropleth(exposure_middle, num_colors = 1, title = "2017 Middle School Youth Tabacco Exposure") +
          annotate("text", x = -104, y = 23, label = "HI", color = "black", size = 3) +
          annotate("text", x = -116, y = 23, label = "AK", color = "black", size = 3)
      })

    • High School

      Consistently, West Virginia, Louisiana, and South Carolina also exhibit higher percentage of tobacco exposure among high school youth, albeit with a notable increase to 35%. Connecticut has a comparatively lower percentage of 15% for tobacco exposure in high school youth.

      Code
      suppressWarnings({
        state_choropleth(exposure_high, num_colors = 1, title = "2017 High School Youth Tabacco Exposure") +
          annotate("text", x = -104, y = 23, label = "HI", color = "black", size = 3) +
          annotate("text", x = -116, y = 23, label = "AK", color = "black", size = 3)
      })

Notably, the proximity of the highest percentage of middle school youth ever engaged in smoking to the lowest percentage among high school youth suggests a substantial number of youths experimenting with smoking upon entering high school.

  • Tobacco Addiction

    The observed pattern of tobacco addiction closely aligns with the corresponding pattern of tobacco exposure.

    • Middle School

      Louisiana conspicuously exhibits the highest prevalence of youth who engage in frequent smoking, recording rates of 1% in middle school. On the other hand, Ohio demonstrate lowest proportion of youth addicted to tobacco in middle school, which is almost 0.1%.

      Code
      # filter data
      frequent <- data2017 |>
        filter(Topic == "Cigarette Use (Youth)" & Res == "Frequent") |>
        select(region, value, Gender, Edu)
      frequent_middle <- frequent |>
        filter(Gender == "Overall" & Edu == "Middle School") |>
        select(region, value)
      frequent_high <- frequent |>
        filter(Gender == "Overall" & Edu == "High School") |>
        select(region, value)
      # choropleth map
      suppressWarnings({
        state_choropleth(frequent_middle, num_colors = 1, title = "2017 Middle School Youth Tobacco Addiction") +
          annotate("text", x = -104, y = 23, label = "HI", color = "black", size = 3) +
          annotate("text", x = -116, y = 23, label = "AK", color = "black", size = 3)
      })

    • High School

      Again, Louisiana has the highest youth tobacco addiction rate among observed states, which is around 4%. In contrast, Connecticut demonstrate lowest proportion of youth addicted to tobacco in high school.

      Code
      suppressWarnings({
        state_choropleth(frequent_high, num_colors = 1, title = "2017 High School  Youth Tobacco Addiction") +
          annotate("text", x = -104, y = 23, label = "HI", color = "black", size = 3) +
          annotate("text", x = -116, y = 23, label = "AK", color = "black", size = 3)
      })

The prevalence of youth tobacco addiction is markedly higher in high school compared to middle school.

3.3.2 Gender and Education Level Differences within Each State

  • Tobacco Exposure

    In the Cleveland dot plot, states are organized in descending order based on the percentage of tobacco exposure among male youth. Consistent with previous findings, Louisiana, West Virginia, and South Carolina demonstrate higher rates of tobacco exposure across both educational levels and genders. Furthermore, across the majority of states, males exhibit a higher prevalence of tobacco exposure than females within both educational levels.

    Code
    # filter data
    exposure_g_middle <- exposure |>
      filter(Gender != "Overall" & Edu == "Middle School")
    exposure_g_high <- exposure |>
      filter(Gender != "Overall" & Edu == "High School")
    # Cleveland dot plots
    plot1 <- ggplot(exposure_g_middle, aes(x = value, y = fct_reorder2(region, Gender=='Male', value, .desc=FALSE), color = Gender)) +
      geom_point() +
      labs(title = "2017 Middle School Youth Tobacco Exposure By Gender and Education",
           x = "Percentage",
           y = "State") +
      theme_minimal()
    plot2 <- ggplot(exposure_g_high, aes(x = value, y = fct_reorder2(region, Gender=='Male', value, .desc=FALSE), color = Gender)) +
      geom_point() +
      labs(title = "2017 High School Youth Tobacco Exposure By Gender and Education",
           x = "Percentage",
           y = "State") +
      theme_minimal()
    # combined two plots
    plot_grid(plot1, plot2, ncol = 1, align = "vh")

  • Tobacco Addiction

    The plot revealed similar pattern with previous choropleth map, where the percentage of high school youth addicted to tobacco is higher. In most states, males have a higher prevalence of tobacco addiction than females within both educational levels. Male in both education levels in Louisiana have the highest rate of tobacco addiction.

    Code
    # filter data
    frequent_g_middle <- frequent |>
      filter(Gender != "Overall" & Edu == "Middle School")
    frequent_g_high <- frequent |>
      filter(Gender != "Overall" & Edu == "High School")
    # cleveland dot plots
    plot3 <- ggplot(frequent_g_middle, aes(x = value, y = fct_reorder2(region, Gender=='Male', value, .desc=FALSE), color = Gender)) +
      geom_point() +
      labs(title = "2017 Middle School Youth Tobacco Addiction By Gender and Education",
           x = "Frequent Smoker Percentage",
           y = "State") +
      theme_minimal()
    plot4 <- ggplot(frequent_g_high, aes(x = value, y = fct_reorder2(region, Gender=='Male', value, .desc=FALSE), color = Gender)) +
      geom_point() +
      labs(title = "2017 High School Youth Tobacco Addiction By Gender and Education",
           x = "Frequent Smoker Percentage",
           y = "State") +
      theme_minimal()
    # combined two plots
    plot_grid(plot3, plot4, ncol = 1, align = "vh")

3.3.3 Summary

  • In 2017, noticeable geographical disparities exist for youth tobacco exposure and addiction rates.

  • High schools consistently demonstrate higher tobacco exposure and addiction rates than middle schools across all states.

  • Male students tend to exhibit higher tobacco exposure and addiction rates in the majority of states.

3.4 How do gender and educational factors impact Youth Tobacco Exposure and Addiction within a specific state and year?

In this instance, we will focus on examining the data from the state of Louisiana in the year 2017, since Louisiana has the most frequent tobacco usage according to previous findings.

  • Tobacco Exposure

    The plot provides a detailed examination of the ratio of tobacco exposure rates by education level and gender in Louisiana. In line with earlier findings, high school, overall, exhibits higher rates of tobacco exposure for both genders compared to middle school. Furthermore, across both education levels, males consistently demonstrate higher tobacco exposure rates than females. The notable difference between middle school and high school tobacco exposure rates suggests that education level has a more pronounced impact on youth tobacco rates compared to gender effects, corroborating our previous findings. Additionally, it’s observed that there are more samples in middle school than high school and more in females than males.

    Code
    # data pre-processing
    exposure_mosaic <- data2017 |>
      filter(region == "louisiana" & Topic == "Cigarette Use (Youth)" & Res == "Ever" & Gender != "Overall")|>
      mutate(Exposure = ifelse(Res == "Ever", "Had Smoked", Res)) |>
      select(Gender, Edu, Exposure, value, Sample) |>
      mutate(Frequency = round(0.01 * value * Sample))
    never_smoked_mosaic <- exposure_mosaic |>
      mutate(Exposure = "Never Smoked",
             Frequency = Sample - Frequency)
    exposure_mosaic_full <- bind_rows(exposure_mosaic, never_smoked_mosaic) |>
      select(-value, -Sample) |>
      rename(Education_Level = Edu)
    # redefined variable order
    exposure_mosaic_full$Education_Level <- factor(exposure_mosaic_full$Education_Level, levels = c("Middle School", "High School"))
    exposure_mosaic_full$Exposure <- factor(exposure_mosaic_full$Exposure, levels = c("Never Smoked", "Had Smoked"))
    exposure_mosaic_cases <-tidyr::uncount(base::as.data.frame(exposure_mosaic_full), Frequency)
    # mosaic plot
    vcd::mosaic(Exposure ~ Education_Level + Gender, 
                data = exposure_mosaic_cases, 
                direction = c("v","v","h"),
                main="Youth Tobacco Exposure of Louisiana in 2017",
                highlighting_fill = c("lightyellow", '#FFA07A'), 
                gp_labels=(gpar(fontsize=9)))

  • Tobacco Addiction

    The ratio of youth tobacco addiction is very low for both genders in both education level, even if Louisiana has highest percentage of frequent smoker. Tobacco addiction is more prevalent among males during both middle school and high school.

    Code
    # data pre-processing
    frequent_mosaic <- data2017 |>
      filter(region == "louisiana" & Topic == "Cigarette Use (Youth)" & Res == "Frequent" & Gender != "Overall")|>
      mutate(Frequent_Smoker = ifelse(Res == "Frequent", "Yes", Res)) |>
      select(Gender, Edu, Frequent_Smoker, value, Sample) |>
      mutate(Frequency = round(0.01 * value * Sample))
    not_frequent_mosaic <- frequent_mosaic |>
      mutate(Frequent_Smoker = "No",
             Frequency = Sample - Frequency)
    frequent_mosaic_full <- bind_rows(frequent_mosaic, not_frequent_mosaic) |>
      select(-value, -Sample) |>
      rename(Education_Level = Edu)
    # redefined variable order
    frequent_mosaic_full$Education_Level <- factor(frequent_mosaic_full$Education_Level, levels = c("Middle School", "High School"))
    frequent_mosaic_full$Frequent_Smoker <- factor(frequent_mosaic_full$Frequent_Smoker, levels = c("No", "Yes"))
    frequent_mosaic_cases <-tidyr::uncount(base::as.data.frame(frequent_mosaic_full), Frequency)
    # mosaic plot
    vcd::mosaic(Frequent_Smoker ~ Education_Level + Gender, 
                data = frequent_mosaic_cases, 
                direction = c("v","v","h"),
                main="Youth Tobacco Addiction of Louisiana in 2017", 
                highlighting_fill = c("lightyellow", '#FFA07A'), 
                gp_labels=(gpar(fontsize=9)))

3.4.1 Summary

  • Our findings in Louisiana for 2017 align with our previous research.

  • High school students in Louisiana exhibit higher tobacco exposure and addiction rates compared to middle school students.

  • Male students in Louisiana tend to have higher tobacco exposure rates than their female counterparts.

  • Education level differences have a more substantial impact than gender differences on observed tobacco exposure and addiction rates.