Management Quality in Public Education

What impact does a high-quality superintendent have on student performance? A new NBER paper by Victor Lavy and Adi Boiko investigates this question using data from schools in Israel, finding a significant effect on student achievement (.04 SD) for top-quality superintendents. From the abstract:

We exploit a quasi-random matching of superintendent and schools, and estimate that superintendent value added has positive and significant effects on primary and middle school students’ test scores in math, Hebrew, and English. One standard deviation improvement in superintendent value added increases test scores by about 0.04 of a standard deviation in the test score distribution. The effect doesn’t vary with students’ socio-economic background, is highly non-linear, increases sharply for superintendents in the highest-quartile of the value added distribution, and is larger for female superintendents.

How did top-quality superintendents achieve these results? By improving the focus and clarity of school priorities and procedures, with an emphasis on improving school climate. The authors note the similarity of this management approach to Fryer’s 2014 study of Houston schools adopting a charter-style “no excuses” approach.

The results of this study, along with Fryer’s 2014 study, should inform the training and practice of superintendents in the United States. While successful superintendents will also likely need to develop and demonstrate competency in instructional leadership, finance, HR, and legal matters, the value-add of strength in these domains isn’t as clear. Superintendents have the most impact on student outcomes when they help bring focus and clarity to schools while also emphasizing strong school culture. The leaders and programs that embrace this approach are likely to deliver better outcomes for their students in the long run.

Measuring Success in Education

Do low-stakes international assessments of student performance accurately describe student ability? According to this recent NBER working paper, a country’s results may depend on its students’ intrinsic motivation to do well.

The authors used an experimental approach that offered treatment students a small financial incentive to do well on a test. Students in the treatment group were told just before the test that an envelope with the equivalent of $25 (US) was theirs, but that $1 would be subtracted for each incorrect answer. The researchers conducted the experiment in three high schools in China and two in the US. They found no impact of the financial incentive in China, but the effect size in the US was 0.20-0.23 standard deviations.

The authors conclude that the massive difference in treatment effect between countries indicates that “success on low stakes assessments does not solely reflect differences in ability between students across countries” and that “low-stakes tests do not measure and compare ability in isolation.”

With this in mind, researchers, policymakers, and the public should be a little more cautious when interpreting the results of low-stakes assessments like PISA. Student motivation clearly plays a significant role in a country’s performance, a topic that deserves further investigation. For example, NCES will release new NAEP results in early 2018. Does student motivation vary significantly for NAEP across state lines? If so, how much of a role (if any) does it play in state performance?

Low-stakes assessments can be informative, but they may be measuring more than student ability. This study is a reminder to policymakers and advocates to exercise caution when comparing performance on these tests across jurisdictions.

Moving beyond a craft

Tyler Cowen recently interviewed Atul Gawande on his podcast, Conversations with Tyler. It’s a great episode and worth listening to in full, but one section in particular stood out to me. From the transcript:

COWEN: What’s the number one thing missing in medical education today, for doctors?

GAWANDE: I think the number one thing is an education around the fact that we are no longer a craft. It’s no longer an individual craft of being the smartest, most experienced, and capable individual. It’s a profession that has exceeded the capabilities of any individual to manage the volume of knowledge and skill required. So we are now delivering as groups of people. And knowing how to be an effective group, how to solve problems when your group is not being effective, and to enable that capability—that, I think, is not being taught, it’s not being researched. It is the biggest opportunity to advance human health, and we’re not delivering on it.

Having worked in the education sector for a decade, it’s hard to listen to that and not consider replacing “medical education” with “educator preparation” and “human health” with “student outcomes.”

Teaching is certainly a craft that requires significant training, practice, and experience to master, but it also requires a high degree of collaboration with other teachers, administrators, support staff, parents, and students. As in medicine, that collaboration is an under-taught, under-researched aspect of the profession.

How long is your project?

Two years ago, we sold our house in Branford - a beautiful shoreline town just east of New Haven - and moved to Kentucky. Of all the things that we could miss about our life in Connecticut, like lobster rolls or getting Patriots games on broadcast channels, what we miss most is the tight-knit community in Branford. We couldn’t go to the grocery store or get an ice cream cone without chatting with a neighbor, a family from a lacrosse team I coached, or someone from our gym.

Last Friday, that community experienced a tragic loss: a 10-year-old boy drowned in the Branford River. We woke up to the details of this event on Saturday morning. My wife turned to me and said, “It’s Ben.”

Ben, number two of Dave’s three sons, at only 10 years old was an impressive athlete and just a wonderful person to be around. He has the kind of smile that once you see, you can’t forget it.

I briefly coached Ben in lacrosse. There were a few times my U11 team was short on players, so Ben and his older brother played up an age group so we could field a full team. Even though he was the youngest on the field, Ben played with a courage, confidence, and athleticism that made him stand out.

More than knowing Ben as an athlete, we got to know Ben’s family. His father, Dave, is a coach at the gym we attended and a handyman who helped us remodel our kitchen. As Dave was hanging drywall, Ben and his older brother played in our living room and yard. Anyone who spent a few minutes with the Callahans would walk away knowing that they are a family that exudes joy, energy, and love.

All weekend, the incredible pain and grief of our Branford friends dominated our Facebook feeds. Those who know the Callahans from the gym, youth sports, and Sliney Elementary shared their sorrow over this tragic loss and their support for the Callahan family during this time.

Last night, the Branford Town Green lived up to its role as the heart of the community. Thousands of people showed up to celebrate Ben’s life. Thanks to the reporting of Steve at, I was able to watch a live stream of the ceremony.

The Branford First Selectman, the Branford Public Schools Superintendent, coaches, teachers, and friends all stood up to speak about the impact Ben had on their lives. Two young men shared what they learned from Ben:

  1. Never give up.
  2. Always hug your friends.

They then asked the thousands in attendance to hug their neighbors. Everyone did. It was Branford at its best: loving and united.

One hour, nineteen minutes, and thirty seconds into the live stream, Ben’s father Dave stood up and spoke. He spoke for about 10 minutes - it’s worth watching all of it. He closed with the following:

Ben was given a 10 year project. God knew that before he made the world. He knew exactly what he had for Ben to do. He knew exactly how long it was going to take and man, did he kill it - he was awesome!

But I want you guys to ask yourselves a question: how long is your project? How long is it going to take?

If Ben had postponed his decision for Jesus until he was in high school, or done with college, or when he got married and had kids, then he would have missed it. So I want you guys to search your heart and look for that daily relationship. It’s not religion - it’s a relationship.

The walk into eternity that Ben took on Friday was determined not by his actions, but by his decision. As much as I love him and always will, I know that God loves him more. He loves everybody here in this audience so much that he was willing to watch his son die and I’ll tell you right now, after going through it, it’s not something I would do voluntarily.

If Ben’s earthly life lasted 10 years, 6 months, and 15 days after it started and then he continues in Heaven, that’s brilliant, that’s beautiful. But if one person here, or ten people, or a thousand people here are moved to search their hearts and establish their relationship with Jesus or truly commit to it and they join Ben in Heaven when they go, then this is truly worth it.

Thank you so much guys, I’m so blessed to have you all here.

Dave spoke with so much love, gratitude, and clarity that it’s hard for me to fathom given the pain he’s experiencing. For me, it’s difficult to understand how he could speak so eloquently in the wake of such an emotional tragedy without God’s help. It was one of the greatest displays of strength I’ve ever seen.

I’m taking up Dave’s challenge. I’m searching my heart, and for the first time in a long time, I opened my Bible this morning and prayed. I prayed for Ben, for Dave and the Callahan family, and everyone else touched by Ben’s brief, brilliant life.

Seeing a jersey with the number 2 will always be a reminder of the joy and love Ben Callahan exuded. He did a heck of a job making an impact on his community in the 10 years, 6 months, and 15 days we got to share with him.

How long is your project?

If you’d like to support the Callahans:

Tidytext analysis of podcast transcripts

I listen to podcasts for several hours each day. When I wake up, I launch Overcast and listen to a podcast while I have coffee. They’re playing on my drive to work, sometimes while I work,1 and when I’m running my shutdown routine at night to get ready for the next day.

I love my podcasts.

Earlier this year, _DavidSmith released a side project, Podcast Search. His description of this project:

“I take a few podcasts and run them through automated speech-to-text, which is useless for reading, but works out to be just fine for keyword searching.”

It’s a really cool site - particularly since it produces searchable quasi-transcripts for a few of my favorite podcasts:

They aren’t perfect, but it’s a pretty cool tool, and it got me thinking: wouldn’t it be cool to take that text and run it through some tidytext functions?

So I did.

I started by scraping the raw HTML for each episode of ATP, Cortex, Hypercritical, and The Talk Show. I then did a little data cleaning to get the text from Podcast Search into a tidy format. There are still a few episodes of ATP and Hypercritical processing through _DavidSmith’s speech-to-text algorithm, but there’s enough data from each podcast to do some cursory text analysis using tidytext.

As of 6 June 2017, I have about 1.1 million lines of text from these four podcasts to analyze. Once I loaded the data, it was pretty easy to parse out the words spoken in each podcast using the unnest_tokens() function from tidytext.

# load libraries
library(tidyverse)
library(tidytext)

# load transcript data
atp <- read_rds("data/atp.rda")
cortex <- read_rds("data/cortex.rda")
hypercritical <- read_rds("data/hypercritical.rda")
the_talk_show <- read_rds("data/tts.rda")

# join podcast data together
podcasts <- bind_rows(atp, cortex, hypercritical, the_talk_show)

# get count of words by podcast
podcast_words <- podcasts %>% 
  unnest_tokens(word, line) %>%
  anti_join(stop_words, by = "word") %>%
  count(podcast, word)
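
For anyone who wants to see the tokenize, filter, and count steps on something small, here is a rough sketch in base R. It only approximates the idea: the two transcript lines and the three-word stop list are made up for illustration, and the regular-expression split is a simple stand-in, not what unnest_tokens() does internally.

```r
# Two made-up transcript lines (not from the podcast data)
toy_lines <- c("The file system is great.", "File system, again!")

# lowercase, then split on anything that is not a letter or apostrophe
words <- unlist(strsplit(tolower(toy_lines), "[^a-z']+"))
words <- words[words != ""]

# a tiny, hypothetical stop-word list standing in for tidytext's stop_words
toy_stop_words <- c("the", "is", "again")
kept <- words[!words %in% toy_stop_words]

# count the words that remain
word_counts <- table(kept)
word_counts
```

The real pipeline does the same three things at scale: one row per word, common words removed, and a count per podcast and word.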

Next, I wanted to use tf-idf to determine which words were most important to each podcast. Not familiar with this approach? Here’s a description from the authors of the tidytext package:

The idea of tf-idf is to find the important words for the content of each document by decreasing the weight for commonly used words and increasing the weight for words that are not used very much in a collection or corpus of documents…

Calculating tf-idf attempts to find the words that are important (i.e., common) in a text, but not too common.
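
To make that description concrete, here is a minimal sketch that works through the arithmetic by hand in base R. The podcast names, words, and counts are invented for illustration. The idf is assumed to be the natural log of (number of podcasts / number of podcasts containing the word), which is how the tidytext documentation describes it.

```r
# Toy word counts (made-up numbers, not from the podcast data)
toy <- data.frame(
  podcast = c("A", "A", "B", "B"),
  word    = c("swift", "apple", "trello", "apple"),
  n       = c(3, 1, 2, 2),
  stringsAsFactors = FALSE
)

# term frequency: each word's share of all words in its podcast
toy$tf <- toy$n / ave(toy$n, toy$podcast, FUN = sum)

# inverse document frequency: ln(podcasts / podcasts containing the word)
n_docs <- length(unique(toy$podcast))
docs_with_word <- ave(rep(1, nrow(toy)), toy$word, FUN = sum)
toy$idf <- log(n_docs / docs_with_word)

# tf-idf is the product of the two
toy$tf_idf <- toy$tf * toy$idf
toy
```

In this toy data, "apple" shows up in both podcasts, so its idf (and therefore its tf-idf) is zero. "swift" and "trello" each appear in only one podcast, so they score above zero.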

In other words, if one of these podcasts uses an uncommon word more than others, this approach will help identify it. Here is the code I used to do this:

# count total words per podcast
total_words <- podcast_words %>%
  group_by(podcast) %>%
  summarise(total = sum(n))

# join individual word count and total word count
# get tf_idf by podcast
podcast_words <- left_join(podcast_words, total_words, by = "podcast") %>% 
  bind_tf_idf(word, podcast, n)

# plot top 10 tf_idf words by podcast
podcast_words %>%  
  group_by(podcast) %>% 
  top_n(10, tf_idf) %>% 
  ungroup() %>% 
  ggplot(aes(reorder(word, tf_idf), tf_idf, fill = podcast)) +
  geom_col(show.legend=FALSE) +
  facet_wrap(~podcast, scales = "free") +
  coord_flip() +
  labs(y = "tf-idf",  # after coord_flip(), y (tf_idf) is the horizontal axis
       title = "Most Important Words by Podcast",
       subtitle = "Measured by tf-idf") +
  scale_fill_manual(values = c("blue4", "grey18",
                               "palegreen4", "lightsteelblue4"))+
  theme(legend.position = "none",
        axis.title.y = element_blank())

[Plot: Most Important Words by Podcast, measured by tf-idf]

This is a pretty interesting plot, but there’s definitely some noise here. It’s clear that there are some words that slipped through the stop_words filter (like “dont”) and some words that got mixed up during the speech-to-text process (like “fd”).

There’s also a lot of noise coming from each podcast’s ad reads. I’m sure this makes the folks at Casper, Lynda, Betterment, etc. happy, but I’d like to know a little more about the actual content of these podcasts. Let’s filter those words and plot again using this code:

# words to drop: ad-read terms and other speech-to-text noise
noise_words <- c("mattress", "casper", "betterment", "lynda", "5x5",
                 "rackspace", "", "dont", "tts", "afm", "fd",
                 "wealthfront", "apron", "cgpgrey")

podcast_plot_clean <- podcast_words %>% 
  arrange(desc(tf_idf)) %>% 
  filter(!word %in% noise_words) %>%
  mutate(word = factor(word, levels = rev(unique(word))))

podcast_plot_clean %>%  
  group_by(podcast) %>% 
  top_n(10, tf_idf) %>% 
  ungroup() %>% 
  ggplot(aes(word, tf_idf, fill = podcast)) +
  geom_col(show.legend=FALSE) +
  facet_wrap(~podcast, scales = "free") +
  coord_flip() +
  labs(y = "tf-idf",  # after coord_flip(), y (tf_idf) is the horizontal axis
       title = "Most Important Words by Podcast",
       subtitle = "Measured by tf-idf") +
  scale_fill_manual(values = c("blue4", "grey18",
                               "palegreen4", "lightsteelblue4"))+
  theme(legend.position = "none",
        axis.title.y = element_blank())

[Plot: Most Important Words by Podcast, measured by tf-idf, after filtering]

This plot provides a much better understanding of what makes each of these podcasts unique:

  • ATP is clearly a very technical show. From APIs to PHP, Casey, Marco, and John spend a lot of time in the weeds of technical issues. If it were one word, “file system” (🛎) would have been on this plot.
  • Cortex, a show about the working habits of Myke Hurley and CGP Grey, is well-represented on this plot. Productivity topics come through clearly (coworking, Trello, Todoist, Amsterdam workcations), as well as their involvement in the YouTube community (Pewdiepie, vlog(s), vidcon, patreon). Grey’s frequent “mhm”-ing also came through clearly via speech-to-text.
  • Hypercritical’s most important words are the most complex, a reflection of the preparation and intentionality John Siracusa brought to this show. “Rumination” is the perfect word to emerge as the most important for Hypercritical.2
  • The Talk Show’s words are too perfect. John Gruber’s son Jonas takes the top spot, followed by ⚾️ (his beloved Yankees take the 10 spot). Two of the words (Vesper and bourbon) can be represented by cocktail 🍸🥃 emoji, which I’m sure would make him proud. I’m glad “dingus” and Han Solo made the list too. “Iowa” and “bomber” made me scratch my head, but once I looked at the transcripts, it was obvious this is how the speech-to-text interpreted how Gruber pronounces “iOS” and (Steve) “Ballmer”.

This was a pretty fun side project and there’s still a ton of data to analyze. I’ll likely write some additional posts on this front, but until then, follow this repo to see what I’m working on.

  1. Never while I code. I can’t code and listen to vocal music. 

  2. 28 of Hypercritical’s 100 episodes are still processing through the speech-to-text algorithm, so they were unavailable for this analysis.