In the last post, I aggregated the tweet counts in Python and generated a table that I then used in R. This time I wanted to go from the raw tweets and the sentiment analysis output to aggregated hourly counts entirely in R, for movies that came out on Jan 11, 2013 (Gangster Squad, Zero Dark Thirty, and A Haunted House).
Doing this in R turned out to be harder than I expected, and I had to install some libraries, namely “zoo” (which stands for “Z’s ordered observations”) and “chron”, which can be seen in lines 1-5 of the gist below*. I also had to massage the data a little more than I was expecting. After reading it in (line 8), I had to muck around with converting the tweet column from factors to character strings and parsing the date info into a zoo object. Line 20 actually does the aggregation. After plotting it, I noticed a spike on Sunday evening. It turned out there were a lot of (re)tweets about Zero Dark Thirty and one about a ticket give-away for Gangster Squad:
“RT @goldenglobes: Best Actress in a Motion Picture – Drama – Jessica Chastain – Zero Dark Thirty – #GoldenGlobes”
“RT @vuecinemas: Help stop the mob with #GangsterSquad on Jan 10th! To win one of 5 movie packs, follow us and retweet this message by 5p …”
“Woot! RT @Bad_Wobot1013: Awesome Jessica Chastain!! ”
*Note that if you’re using Linux, you’ll need to have the R-devel packages installed to build these libraries during installation.
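For anyone who wants the gist of the steps described above without opening the gist itself, here is a minimal sketch, not the actual code. The data frame, its column names (`created`, `tweet`, `sentiment`) and the timestamp format are all made up for illustration, and it uses base R's `POSIXct` for the timestamps rather than chron to keep the example short.

```r
library(zoo)

# Hypothetical stand-in for the real input: one row per tweet.
# stringsAsFactors = TRUE mimics what read.csv did by default in older R.
tweets <- data.frame(
  created   = c("2013-01-13 20:05:11", "2013-01-13 20:41:37",
                "2013-01-13 21:02:03", "2013-01-14 09:15:00"),
  tweet     = c("tweet a", "tweet b", "tweet c", "tweet d"),
  sentiment = c("positive", "positive", "neutral", "negative"),
  stringsAsFactors = TRUE
)

# Factors -> character strings for the tweet text
tweets$tweet <- as.character(tweets$tweet)

# Parse the date info (the format string is an assumption about the input)
when <- as.POSIXct(as.character(tweets$created),
                   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# One observation of value 1 per tweet, indexed by time.
# Note: zoo() warns if several tweets share the exact same timestamp.
z <- zoo(rep(1, nrow(tweets)), order.by = when)

# Aggregate to hourly counts: truncate each timestamp to the hour and sum
hourly <- aggregate(z,
                    by  = function(t) as.POSIXct(trunc(t, "hours")),
                    FUN = sum)

print(hourly)
```

Calling `plot(hourly)` on the result gives the kind of time series plot where a spike like the Sunday evening one would show up.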