Words of wisdom: The linguistic content and popularity of TED talks

Since 1990, the TED conference has featured speakers on Technology, Education, and Design (TED) and, as the years have gone by, on many other subjects.

More than 2500 talks by the world’s experts, addressing problems and solutions to the critical issues of our times, have been published online. Browsing and viewing this huge repository of knowledge, wisdom, and insight has become a healthy addiction of mine, and since March 14 I have committed to watching one talk a day.

TED Talks posts a new talk every weekday, so I realized that watching one a day would keep me up to date with new postings but wouldn’t put a dent in the archive of published talks. I can read faster than I can watch a video, so I set out to get transcripts of the TED talks to read.
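As a rough check on that reasoning, assume exactly one new talk per weekday (five a week), no holiday gaps, and an archive of about 2500 talks. Watching one talk a day then shrinks the backlog by only two talks a week:

```latex
\underbrace{7}_{\text{watched/week}} - \underbrace{5}_{\text{posted/week}} = 2\ \text{talks/week},
\qquad
\frac{2500\ \text{talks}}{2\ \text{talks/week}} = 1250\ \text{weeks} \approx 24\ \text{years}
```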

To know what talks I had seen and to measure my progress through the archive, I needed a definitive list of all the TED Talks.

I found out that each TED talk has an ID that is used internally on the website. With a little sleuthing, I determined that appending a number between 1 and 2500 to the URL “http://www.ted.com/talks/view/id/” led to a TED talk.
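A minimal Python sketch of that URL pattern follows. The base URL and the 1 to 2500 range come from the post. The function name `talk_urls` is hypothetical, and the old `/talks/view/id/` route may no longer resolve on the current site.

```python
# Base URL described above; appending a numeric talk ID yields a talk page.
BASE_URL = "http://www.ted.com/talks/view/id/"


def talk_urls(first_id: int = 1, last_id: int = 2500) -> list[str]:
    """Return one candidate talk URL per numeric ID, including both endpoints.

    talk_urls is a hypothetical helper for illustration only.
    """
    return [f"{BASE_URL}{talk_id}" for talk_id in range(first_id, last_id + 1)]
```

For example, `talk_urls()[0]` is `http://www.ted.com/talks/view/id/1`, and the full list has 2500 entries, one per ID to try.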

I wrote a Python script that used the Beautiful Soup module to scrape each talk page on the TED website and download the speaker name, talk title, duration, English transcript, number of views, and tags describing the content.
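The original script isn't shown, so here is a sketch of what such a Beautiful Soup scraper might look like. The post does not describe TED’s page markup, so every CSS class in `SELECTORS` and `TAG_SELECTOR` is a hypothetical placeholder to adapt after inspecting a real page. The names `Talk`, `parse_talk`, `fetch_talk`, and `scrape` are likewise illustrative, and the one-second pause between requests is an assumed courtesy, not something the post specifies.

```python
import time
from dataclasses import dataclass, field
from urllib.error import HTTPError
from urllib.request import urlopen

from bs4 import BeautifulSoup

# HYPOTHETICAL selectors: placeholders, not TED's actual markup.
SELECTORS = {
    "speaker": ".talk-speaker",
    "title": ".talk-title",
    "duration": ".talk-duration",
    "views": ".talk-views",
    "transcript": ".talk-transcript",
}
TAG_SELECTOR = ".talk-tag"


@dataclass
class Talk:
    """One row of the dataset: the fields listed in the post."""
    talk_id: int
    speaker: str = ""
    title: str = ""
    duration: str = ""
    views: int = 0
    transcript: str = ""
    tags: list[str] = field(default_factory=list)


def text_of(soup: BeautifulSoup, selector: str) -> str:
    """Return the whitespace-normalized text of the first match, or '' if none."""
    node = soup.select_one(selector)
    return node.get_text(" ", strip=True) if node else ""


def parse_talk(talk_id: int, html: str) -> Talk:
    """Extract the talk fields from one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    views_text = text_of(soup, SELECTORS["views"]).replace(",", "")
    return Talk(
        talk_id=talk_id,
        speaker=text_of(soup, SELECTORS["speaker"]),
        title=text_of(soup, SELECTORS["title"]),
        duration=text_of(soup, SELECTORS["duration"]),
        views=int(views_text) if views_text.isdigit() else 0,
        transcript=text_of(soup, SELECTORS["transcript"]),
        tags=[tag.get_text(strip=True) for tag in soup.select(TAG_SELECTOR)],
    )


def fetch_talk(talk_id: int) -> Talk:
    """Download one talk page by ID and parse it."""
    url = f"http://www.ted.com/talks/view/id/{talk_id}"
    with urlopen(url, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_talk(talk_id, html)


def scrape(first_id: int, last_id: int, delay: float = 1.0) -> list[Talk]:
    """Fetch every ID in the range, skipping IDs that return an HTTP error."""
    talks = []
    for talk_id in range(first_id, last_id + 1):
        try:
            talks.append(fetch_talk(talk_id))
        except HTTPError:
            pass  # no talk page at this ID
        time.sleep(delay)  # pause between requests to avoid hammering the site
    return talks
```

Keeping `parse_talk` separate from the network code means the parsing can be checked offline against a saved page before running something like `scrape(1, 2500)` over the whole ID range.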

The resulting dataset is posted here, and a visualization for navigating it is here and below:

Then I began to analyze the data, guided by a few research questions.



Which talks were the most viewed?




