A pointless study for a pontificating party
One of the major themes of the most recent Republican debate was repetition: which candidate repeats himself the most? Who dodges questions by reciting the same spiel over and over, speaking words but not saying much? I was getting sick of hearing them all accuse each other (though it does make for great prime-time television) and decided to find the true answer the only way I know how.
First I found a transcript of the New Hampshire Republican debate from February 6th, courtesy of the Washington Post. I copied and pasted the entire transcript into a Word document, then used the "find" function to highlight every occurrence of "TRUMP: ", "RUBIO: ", or "CRUZ: ", since these marked every time each candidate spoke.
Now came the annoying part: isolating each candidate's text by running through the entire transcript and looking for those highlighted names. By holding down CTRL while selecting, I could grab multiple separate passages at once, then paste each candidate's text into its own Word document.
Then I had to purge each candidate's speech of periods, commas, quotation marks, question marks, hyphens, and even paragraph breaks and double spaces, leaving one long string of the words they spoke with a single space between them. I did this with the "find and replace" function again, only this time replacing each non-word character with nothing so that it was simply eliminated. The transcript also included parenthetical crowd actions (e.g. APPLAUSE or BOOING), which I removed using a method I found at Writer's Technology; it's the same principle, but with a workaround for parentheses.
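For anyone who'd rather script this than fight Word's find-and-replace, the same cleanup can be sketched in a few lines of Python. The function name and sample sentence here are my own inventions, not text from the actual transcript:

```python
import re

def clean_speech(text):
    """Strip stage directions and punctuation, leaving one long
    space-separated string of words."""
    # Remove parenthetical crowd actions like (APPLAUSE) or (BOOING)
    text = re.sub(r"\([A-Z]+\)", " ", text)
    # Remove punctuation: periods, commas, quotes, question marks, hyphens
    text = re.sub(r"[.,\"'?\-]", " ", text)
    # Collapse paragraph breaks and repeated spaces into single spaces
    return " ".join(text.split())

sample = 'We will win. (APPLAUSE) Believe me -- we will win.'
print(clean_speech(sample))  # -> We will win Believe me we will win
```

One pass of regular expressions replaces the several rounds of find-and-replace described above.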
Once the text was purged, I pasted each candidate's text into the first cell of one of three worksheets, then used the "text to columns" function with a space as the delimiter, which separated almost every word into its own cell, more than 3,000 adjacent columns in all. I say "almost" because some weird space-like characters caused errors, though I'm not sure what produced them. To get rid of those, I copied the entire row and used a transpose paste to turn it into a single column, then ran "text to columns" again, this time choosing the "other" delimiter option and pasting one of the weird space-like characters into the field. That separated out the stragglers, and since there were relatively few of them, I copied and pasted them into new rows to get every word into a single column.
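My guess is that those weird space-like characters were non-breaking spaces, which a web page transcript often contains and which Excel's plain-space delimiter doesn't match. In Python this whole step disappears, because a bare `str.split()` splits on any Unicode whitespace, non-breaking spaces included (the sample string is made up):

```python
# \u00a0 is a non-breaking space, a likely culprit for the "weird"
# space-like characters that Excel's space delimiter missed.
text = "make america\u00a0great again"

words = text.split()  # splits on ALL whitespace, \u00a0 included
print(words)  # -> ['make', 'america', 'great', 'again']
```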
At this point, the entire calculation rested on one formula I found at Microsoft Support that counts the number of unique entries (in this case, unique words) in a data set.
Then it was just simple division. The number of unique words divided by the total number of words tells you how much "new material" each candidate gives you every time he speaks. As a bonus, I also calculated each candidate's words per second, based on speaking times courtesy of Politico.
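Once the words are in a list, both calculations are one-liners. The word list and the speaking time below are hypothetical stand-ins, not the real transcript or the Politico figures:

```python
# Hypothetical word list standing in for a candidate's cleaned speech
words = "we will win we will win big".split()

# "New material" fraction: unique words / total words
unique_ratio = len(set(words)) / len(words)
print(f"{unique_ratio:.2f}")  # 4 unique / 7 total -> 0.57

# Words per second, given a (made-up) speaking time in seconds
speaking_seconds = 600
words_per_second = len(words) / speaking_seconds
print(f"{words_per_second:.3f} words/sec")
```

`len(set(words))` does in one expression what the Microsoft Support formula does over a column of cells.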
Don't be so cocky, Rubio; you're a close second.