Measuring Success in Education

Do low-stakes international assessments of student performance accurately describe student ability? According to this recent NBER working paper, a country’s results may depend on its students’ intrinsic motivation to do well.

The authors used an experimental approach that offered treatment students a small financial incentive to do well on a test. Students in the treatment group were told just before the test that an envelope with the equivalent of $25 (US) was theirs, but that $1 would be subtracted for each incorrect answer. The researchers conducted the experiment in three high schools in China and two in the US. They found no impact of the financial incentive in China, but the effect size in the US was 0.20-0.23 standard deviations.

The authors conclude that the massive difference in treatment effect between countries indicates that “success on low stakes assessments does not solely reflect differences in ability between students across countries” and that “low-stakes tests do not measure and compare ability in isolation.”

With this in mind, researchers, policymakers, and the public should be a little more cautious when interpreting the results of low-stakes assessments like PISA. Student motivation clearly plays a significant role in a country’s results - a topic that deserves further investigation. For example, NCES will release new NAEP results in early 2018. Does student motivation vary significantly for NAEP across state lines? If so, how much of a role (if any) does it play in state performance?

Low-stakes assessments can be informative, but they may be measuring more than student ability. This study is a reminder to policymakers and advocates to exercise caution when comparing performance on these tests across jurisdictions.

Moving beyond a craft

Tyler Cowen recently interviewed Atul Gawande on his podcast, Conversations with Tyler. It’s a great episode and worth listening to the whole thing, but one section in particular stood out to me. From the transcript:

COWEN: What’s the number one thing missing in medical education today, for doctors?

GAWANDE: I think the number one thing is an education around the fact that we are no longer a craft. It’s no longer an individual craft of being the smartest, most experienced, and capable individual. It’s a profession that has exceeded the capabilities of any individual to manage the volume of knowledge and skill required. So we are now delivering as groups of people. And knowing how to be an effective group, how to solve problems when your group is not being effective, and to enable that capability—that, I think, is not being taught, it’s not being researched. It is the biggest opportunity to advance human health, and we’re not delivering on it.

Having worked in the education sector for a decade, it’s hard to listen to that and not consider replacing “medical education” with “educator preparation” and “human health” with “student outcomes.”

Teaching is certainly a craft that requires significant training, practice, and experience to master, but it also requires a high degree of collaboration with other teachers, administrators, support staff, parents, and students. As in medicine, that collaboration is an under-taught, under-researched aspect of the profession.

How long is your project?

Two years ago, we sold our house in Branford - a beautiful shoreline town just east of New Haven - and moved to Kentucky. Of all the things that we could miss about our life in Connecticut, like lobster rolls or getting Patriots games on broadcast channels, what we miss most is the tight-knit community in Branford. We couldn’t go to the grocery store or get an ice cream cone without chatting with a neighbor, a family from a lacrosse team I coached, or someone from our gym.

Last Friday, that community experienced a tragic loss: a 10-year-old boy drowned in the Branford River. We woke up to the details of this event on Saturday morning. My wife turned to me and said, “It’s Ben.”

Ben, number two of Dave’s three sons, at only 10 years old was an impressive athlete and just a wonderful person to be around. He has the kind of smile that once you see, you can’t forget it.

I briefly coached Ben in lacrosse. There were a few times my U11 team was short on players, so Ben and his older brother played up an age group so we could field a full team. Even though he was the youngest on the field, Ben played with a courage, confidence, and athleticism that made him stand out.

More than knowing Ben as an athlete, we got to know Ben’s family. His father, Dave, is a coach at the gym we attended and a handyman that helped us remodel our kitchen. As Dave was hanging drywall, Ben and his older brother played in our living room and yard. Anyone that spent a few minutes with the Callahans would walk away knowing that they are a family that exudes joy, energy, and love.

All weekend, the incredible pain and grief of our Branford friends dominated our Facebook feeds. Those who know the Callahans from the gym, youth sports, and Sliney Elementary shared their sorrow over this tragic loss and their support for the Callahan family during this time.

Last night, the Branford Town Green lived up to its role as the heart of the community. Thousands of people showed up to celebrate Ben’s life. Thanks to the reporting of Steve at, I was able to watch a live stream of the ceremony.

The Branford First Selectman, the Branford Public Schools Superintendent, coaches, teachers, and friends all stood up to speak about the impact Ben had on their lives. Two young men shared what they learned from Ben:

  1. Never give up.
  2. Always hug your friends.

They then asked the thousands in attendance to hug their neighbors. Everyone did. It was Branford at its best: loving and united.

One hour, nineteen minutes, and thirty seconds into the live stream, Ben’s father Dave stood up and spoke. He spoke for about 10 minutes - it’s worth watching all of it. He closed with the following:

Ben was given a 10 year project. God knew that before he made the world. He knew exactly what he had for Ben to do. He knew exactly how long it was going to take and man, did he kill it - he was awesome!

But I want you guys to ask yourselves a question: how long is your project? How long is it going to take?

If Ben had postponed his decision for Jesus until he was in high school, or done with college, or when he got married and had kids, then he would have missed it. So I want you guys to search your heart and look for that daily relationship. It’s not religion - it’s a relationship.

The walk into eternity that Ben took on Friday was determined not by his actions, but by his decision. As much as I love him and always will, I know that God loves him more. He loves everybody here in this audience so much that he was willing to watch his son die and I’ll tell you right now, after going through it, it’s not something I would do voluntarily.

If Ben’s earthly life lasted 10 years, 6 months, and 15 days after it started and then he continues in Heaven, that’s brilliant, that’s beautiful. But if one person here, or ten people, or a thousand people here are moved to search their hearts and establish their relationship with Jesus or truly commit to it and they join Ben in Heaven when they go, then this is truly worth it.

Thank you so much guys, I’m so blessed to have you all here.

Dave spoke with so much love, gratitude, and clarity that it’s hard for me to fathom given the pain he’s experiencing. For me, it’s difficult to understand how he could speak so eloquently in the wake of such an emotional tragedy without God’s help. It was one of the greatest displays of strength I’ve ever seen.

I’m taking up Dave’s challenge. I’m searching my heart, and for the first time in a long time, I opened my Bible this morning and prayed. I prayed for Ben, for Dave and the Callahan family, and everyone else touched by Ben’s brief, brilliant life.

Seeing a jersey with the number 2 will always be a reminder of the joy and love Ben Callahan exuded. He did a heck of a job making an impact on his community in the 10 years, 6 months, and 15 days we got to share with him.

How long is your project?

If you’d like to support the Callahans:

Tidytext analysis of podcast transcripts

I listen to podcasts for several hours each day. When I wake up, I launch Overcast and listen to a podcast while I have coffee. They’re playing on my drive to work, sometimes while I work,1 and when I’m running my shutdown routine at night to get ready for the next day.

I love my podcasts.

Earlier this year, _DavidSmith released a side project, Podcast Search. His description of this project:

“I take a few podcasts and run them through automated speech-to-text, which is useless for reading, but works out to be just fine for keyword searching.”

It’s a really cool site - particularly since it produces searchable quasi-transcripts for a few of my favorite podcasts:

They aren’t perfect, but it’s a pretty cool tool, and it got me thinking: wouldn’t it be cool to take that text and run it through some tidytext functions?

So I did.

I started by scraping the raw HTML for each episode of ATP, Cortex, Hypercritical, and The Talk Show. I then did a little data cleaning to get the text from Podcast Search into a tidy format. There are still a few episodes of ATP and Hypercritical processing through _DavidSmith’s speech-to-text algorithm, but there’s enough data from each podcast to do some cursory text analysis using tidytext.

As of 6 June 2017, I have about 1.1 million lines of text from these four podcasts to analyze. Once I loaded the data, it was pretty easy to parse out the words spoken in each podcast using the unnest_tokens() function from tidytext.

# load libraries
library(tidyverse)  # read_rds(), bind_rows(), dplyr verbs, ggplot2
library(tidytext)   # unnest_tokens(), stop_words, bind_tf_idf()

# load transcript data
atp <- read_rds("data/atp.rda")
cortex <- read_rds("data/cortex.rda")
hypercritical <- read_rds("data/hypercritical.rda")
the_talk_show <- read_rds("data/tts.rda")

# join podcast data together
podcasts <- bind_rows(atp, cortex, hypercritical, the_talk_show)

# get count of words by podcast
podcast_words <- podcasts %>% 
  unnest_tokens(word, line) %>%
  anti_join(stop_words, by = "word") %>%
  count(podcast, word)

Next, I wanted to use tf-idf to determine which words were most important to each podcast. Not familiar with this approach? Here’s a description from the authors of the tidytext package:

The idea of tf-idf is to find the important words for the content of each document by decreasing the weight for commonly used words and increasing the weight for words that are not used very much in a collection or corpus of documents…

Calculating tf-idf attempts to find the words that are important (i.e., common) in a text, but not too common.
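
To make that concrete, here is a minimal base R sketch of the arithmetic. The word counts are made up for two imaginary podcasts (they are an assumption for illustration, not numbers from the transcripts). It uses the formula tidytext documents for bind_tf_idf(): term frequency multiplied by the natural log of the number of documents divided by the number of documents that contain the word.

```r
# Toy word counts (hypothetical data): two "podcasts", one row per podcast/word
counts <- data.frame(
  podcast = c("a", "a", "a", "b", "b"),
  word    = c("swift", "apple", "file", "apple", "youtube"),
  n       = c(3, 5, 2, 4, 6),
  stringsAsFactors = FALSE
)

# term frequency: each word's share of all words in its podcast
counts$tf <- counts$n / ave(counts$n, counts$podcast, FUN = sum)

# inverse document frequency: ln(number of podcasts / podcasts using the word)
n_docs <- length(unique(counts$podcast))
docs_with_word <- ave(counts$n, counts$word, FUN = length)
counts$idf <- log(n_docs / docs_with_word)

# tf-idf is the product of the two
counts$tf_idf <- counts$tf * counts$idf

# highest tf-idf first
counts[order(-counts$tf_idf), ]
```

A word that shows up in every podcast (here, "apple") gets an idf of zero, so it drops out no matter how often it is said.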

In other words, if one of these podcasts uses an uncommon word more than others, this approach will help identify it. Here is the code I used to do this:

# count total words per podcast
total_words <- podcast_words %>%
  group_by(podcast) %>%
  summarise(total = sum(n))

# join individual word count and total word count
# get tf_idf by podcast
podcast_words <- left_join(podcast_words, total_words, by = "podcast") %>% 
  bind_tf_idf(word, podcast, n)

# plot top 10 tf_idf words by podcast
podcast_words %>%  
  group_by(podcast) %>% 
  top_n(10, tf_idf) %>% 
  ungroup() %>% 
  ggplot(aes(reorder(word, tf_idf), tf_idf, fill = podcast)) +
  geom_col(show.legend=FALSE) +
  facet_wrap(~podcast, scales = "free") +
  coord_flip() +
  labs(x = "tf-idf",
       title = "Most Important Words by Podcast",
       subtitle = "Measured by tf-idf") +
  scale_fill_manual(values = c("blue4", "grey18",
                               "palegreen4", "lightsteelblue4"))+
  theme(legend.position = "none",
        axis.title.y = element_blank())

podcast plot

This is a pretty interesting plot, but there’s definitely some noise here. It’s clear that there are some words that slipped through the stop_words filter (like “dont”) and some words that got mixed up during the speech-to-text process (like “fd”).

There’s also a lot of noise coming from each podcast’s ad reads. I’m sure this makes the folks at Casper, Lynda, Betterment, etc. happy, but I’d like to know a little more about the actual content of these podcasts. Let’s filter those words and plot again using this code:

# words to drop: ad reads, speech-to-text noise, and stray tokens
noise_words <- c("mattress", "casper", "betterment", "lynda", "5x5",
                 "rackspace", "", "dont", "tts", "afm", "fd",
                 "wealthfront", "apron", "cgpgrey")

podcast_plot_clean <- podcast_words %>% 
  arrange(desc(tf_idf)) %>% 
  filter(!word %in% noise_words) %>%
  mutate(word = factor(word, levels = rev(unique(word))))

podcast_plot_clean %>%  
  group_by(podcast) %>% 
  top_n(10) %>% 
  ungroup() %>% 
  ggplot(aes(word, tf_idf, fill = podcast)) +
  geom_col(show.legend=FALSE) +
  facet_wrap(~podcast, scales = "free") +
  coord_flip() +
  labs(x = "tf-idf",
       title = "Most Important Words by Podcast",
       subtitle = "Measured by tf-idf") +
  scale_fill_manual(values = c("blue4", "grey18",
                               "palegreen4", "lightsteelblue4"))+
  theme(legend.position = "none",
        axis.title.y = element_blank())

clean podcast plot

This plot provides a much better understanding of what makes each of these podcasts unique:

  • ATP is clearly a very technical show. From APIs to PHP, Casey, Marco, and John spend a lot of time in the weeds of technical issues. If it were one word, “file system” (🛎) would have been on this plot.
  • Cortex, a show about the working habits of Myke Hurley and CGP Grey, is well-represented on this plot. Productivity topics come through clearly (coworking, Trello, Todoist, Amsterdam workcations), as well as their involvement in the YouTube community (Pewdiepie, vlog(s), vidcon, patreon). Grey’s frequent “mhm”-ing also came through clearly via speech-to-text.
  • Hypercritical’s most important words are the most complex, a reflection of the preparation and intentionality John Siracusa brought to this show. “Rumination” is the perfect word to emerge as the most important for Hypercritical.2
  • The Talk Show’s words are too perfect. John Gruber’s son Jonas takes the top spot, followed by ⚾️ (his beloved Yankees take the 10 spot). Two of the words (Vesper and bourbon) can be represented by cocktail 🍸🥃 emoji, which I’m sure would make him proud. I’m glad “dingus” and Han Solo made the list too. “Iowa” and “bomber” made me scratch my head, but once I looked at the transcripts, it was obvious this is how the speech-to-text interpreted how Gruber pronounces “iOS” and (Steve) “Ballmer”.

This was a pretty fun side project and there’s still a ton of data to analyze. I’ll likely write some additional posts on this front, but until then, follow this repo to see what I’m working on.

  1. Never while I code. I can’t code and listen to vocal music. 

  2. 28 of Hypercritical’s 100 episodes are still processing through the speech-to-text algorithm, so they were unavailable for this analysis. 

Analyzing the analysts with tidytext

The publication of a Capstone Report is the culmination of the Strategic Data Project’s Fellowship. These reports highlight the impact Fellows have on their respective education agencies. There are more than thirty Capstone reports on the SDP website, sorted into four broad categories:

  • Data Capacity, Quality, & Culture
  • Teacher Effectiveness
  • Post-secondary Access & Success
  • School Improvement & Redesign

This gives us a general idea of the topics Fellows were covering in their reports, but it would be nice to get a little more detail without having to read more than thirty reports.

Enter tidytext.

I’ve been meaning to play around with the tidytext R package for several months. After seeing a really fun blog post using it to analyze Seinfeld scripts, I decided to try and apply the tidytext tools to the SDP Capstone reports, so I read through a few chapters of Text Mining with R and got to work.

Getting the reports

Getting the text of each Capstone Report wasn’t too challenging. I started by parsing the lines of the webpage that you’d use to manually download the PDFs of each capstone report, then I filtered it for all the direct links to each individual PDF. After running a simple for loop, I had all 38 PDFs on my MacBook. Here’s the code I used:


# download report pdf's ####
library(RCurl)      # getURL()
library(tidyverse)  # tibble(), dplyr verbs, stringr

# get content of sdp capstone report webpage
sdp_url <- getURL("")
sdp_webpage <- readLines(tc <- textConnection(sdp_url)); close(tc)

# create df of webpage content
sdp_df <- tibble(line = seq_along(sdp_webpage), content = sdp_webpage)

# find and clean links to report pdf's
capstone_links <- sdp_df %>%
  mutate(link_present = str_detect(content, "")) %>%
  filter(link_present == TRUE) %>%
  mutate(clean_link = str_extract(content, "\\.pdf"))

# download report pdfs and save them in '/reports'
for(i in seq_along(capstone_links$clean_link)){
  report_url <- capstone_links$clean_link[i]
  download.file(report_url, str_c("reports/capstone", i, ".pdf"))
}
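
The link-finding step can be sketched in base R on a few made-up lines of HTML. Everything here is an assumption for illustration: the example.org URLs, the page_lines vector, and the regular expression are hypothetical stand-ins, not the actual page or pattern used above.

```r
# Hypothetical page lines; example.org stands in for the real site
page_lines <- c(
  '<li><a href="https://example.org/files/report-one.pdf">Report One</a></li>',
  '<p>No link on this line.</p>',
  '<li><a href="https://example.org/files/report-two.pdf">Report Two</a></li>'
)

# Match an http(s) URL ending in .pdf that contains no quotes, spaces, or ">"
pdf_pattern <- "https?://[^\"' >]+\\.pdf"

# regexpr() finds the first match on each line;
# regmatches() keeps only the lines that actually matched
pdf_links <- regmatches(page_lines, regexpr(pdf_pattern, page_lines))

# Local file names a download loop could write to
dest_files <- file.path("reports", paste0("capstone", seq_along(pdf_links), ".pdf"))
```

With the links and destinations in two parallel vectors, the download loop only has to walk them by index.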

Getting the text

After downloading the Capstone Reports, I needed to extract the text in each report, so I relied on a function from the pdftools package. My code could probably be more efficient and there were some font-related errors, but I was able to extract the text from each report as a series of bigrams and then plot the result. Again, here’s the code I used:


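# load libraries
library(tidyverse)  # tibble, dplyr, tidyr, stringr, ggplot2
library(tidytext)   # unnest_tokens(), stop_words
library(pdftools)   # pdf_text()
library(ggraph)     # ggraph(), geom_edge_link(), geom_node_*()

# create empty df
bigram_df <- tibble(word1 = character(), word2 = character(), 
                    n = numeric(), report_num = numeric())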

# loop through each capstone pdf
for(k in 1:38){
  # parse text from capstone pdf
  report_text <- pdf_text(str_c("reports/capstone", k,".pdf"))
  # unnest text into bigrams
  report_bigrams <- tibble(chapter = report_text) %>%
    unnest_tokens(bigram, chapter,token = "ngrams", n = 2)
  # split bigrams into separate columns
  bigrams_sep <- report_bigrams %>% 
    separate(bigram, c("word1", "word2"), sep = " ")
  # filter out stopwords, numbers, dupes; add count and report number
  bigrams_filtered <- bigrams_sep %>% 
    filter(!word1 %in% stop_words$word) %>% 
    filter(!word2 %in% stop_words$word) %>% 
    filter(!str_detect(word1, "[0-9]")) %>% 
    filter(!str_detect(word2, "[0-9]")) %>% 
    filter(word1 != word2) %>%
    count(word1, word2, sort = TRUE) %>%
    mutate(report_num = k)
  # add to df
  bigram_df <- bind_rows(bigram_df, bigrams_filtered)
}

# prep data for graphing; only count bigrams w/ n > 20
bigram_graph <- bigram_df %>% 
  filter(n > 20) %>% 
  igraph::graph_from_data_frame()

# set arrow type
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))


# plot bigram graph
ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.03, 'inches')) +
  geom_node_point(color = "lightblue", size = 3) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1)

# save plot
ggsave("figures/capstone_bigrams.png", width = 12, height = 8, units = "in")

capstone bigrams

The resulting plot paints a nice picture of what SDP Fellows wrote about in their Capstone Reports. There’s a lot of focus on college readiness and teacher effectiveness, which were two of the “buckets” on the SDP website, but it’s helpful to see what other words were frequently tied to those phrases. It also helps to see some of the less popular topics the reports covered.
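
If bigrams are new to you, here is a rough base R sketch of what the tokenize, filter, and count steps do to a single sentence. The sentence and the three-word stop list are made up for illustration; the real analysis used unnest_tokens() and the much longer stop_words list from tidytext.

```r
# Hypothetical sentence standing in for a page of report text
text <- "College readiness and teacher effectiveness drive college readiness goals"

# lowercase, then split on anything that is not a letter
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]

# pair each word with the word that follows it
bigrams <- data.frame(
  word1 = head(words, -1),
  word2 = tail(words, -1),
  stringsAsFactors = FALSE
)

# drop any bigram that contains a stop word (tiny hypothetical stop list)
stop_list <- c("and", "the", "of")
kept <- bigrams[!bigrams$word1 %in% stop_list & !bigrams$word2 %in% stop_list, ]

# count how often each remaining bigram appears, most frequent first
counts <- as.data.frame(table(bigram = paste(kept$word1, kept$word2)),
                        stringsAsFactors = FALSE)
counts <- counts[order(-counts$Freq), ]
counts
```

Each surviving pair becomes an edge in the graph, which is why phrases that recur across reports show up as the darkest links.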

The tidytext package is a ton of fun to work with - I’m looking forward to using it more and finding new sources of text to analyze!