Deep Dreaming at Vixlet

Last weekend I started playing around with Deep Dreams, an artistic, surreal application of Google’s deep belief neural networks and applied it to a picture from my colleague Steve Eldridge of Vixlet HQ’s inspiring view of downtown LA:


Here’s what it looks like at the deepest layer of the deep dream neural network (I’ll explain what that means shortly).


Although there is a tutorial already, I thought I’d make a blog post about it because I had to make some modifications to the code there and people were asking about how I got it working.

First of all, the part that I had to modify was the image loading commands.  The tutorial used ipython notebooks, which I’m not very famliar with. Also they loaded the image into a numpy array via PIL, python image library.  I’m not sure if it was due to PIL or ipython notebooks, but this didn’t work for me. Instead, I loaded the image to a numpy array via opencv (called cv2 in python).  That’s an obvious thing to try if you’re familiar with computer vision in python, but otherwise it would be a place to get stuck.  So basically, I used the code from the tutorial but added it to a single script and changed a couple of lines:

#!/usr/bin/env python
# imports and basic notebook setup
from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
import sys
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import caffe
def showarray(a, fmt='jpeg'):
a = np.uint8(np.clip(a, 0, 255))
f = StringIO()
PIL.Image.fromarray(a).save(f, fmt)
model_path = '../caffe/models/bvlc_googlenet/' # substitute your path here
net_fn = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'
# Patching model to be able to compute gradients.
# Note that you can also manually add "force_backward: true" line to "deploy.prototxt".
model =
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('tmp.prototxt', 'w').write(str(model))
net = caffe.Classifier('tmp.prototxt', param_fn,
mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
channel_swap = (2,1,0)) # the reference model has channels in BGR order instead oF\
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
return np.float32(np.rollaxis(img, 2)[::1]) net.transformer.mean['data']
def deprocess(net, img):
return np.dstack((img + net.transformer.mean['data'])[::1])
def make_step(net, step_size=1.5, end='inception_4c/output', jitter=32, clip=True):
'''Basic gradient ascent step.'''
src = net.blobs['data'] # input image is stored in Net's 'data' blob
dst = net.blobs[end]
ox, oy = np.random.randint(jitter, jitter+1, 2)[0] = np.roll(np.roll([0], ox, 1), oy, 2) # apply jitter shift
dst.diff[:] = # specify the optimization objective
g = src.diff[0]
# apply normalized ascent step to the input image[:] += step_size/np.abs(g).mean() * g[0] = np.roll(np.roll([0], ox, 1), oy, 2) # unshift image
if clip:
bias = net.transformer.mean['data'][:] = np.clip(, bias, 255bias)
def deepdream(net, base_img, iter_n=10, octave_n=4, octave_scale=1.4, end='inception_4c/output', clip=True, **step_params):
# prepare base images for all octaves
octaves = [preprocess(net, base_img)]
for i in xrange(octave_n1):
octaves.append(nd.zoom(octaves[1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))
src = net.blobs['data']
detail = np.zeros_like(octaves[1]) # allocate image for network-produced details
for octave, octave_base in enumerate(octaves[::1]):
h, w = octave_base.shape[2:]
if octave > 0:
# upscale details from the previous octave
h1, w1 = detail.shape[2:]
detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)
src.reshape(1,3,h,w) # resize the network's input image size[0] = octave_base+detail
for i in xrange(iter_n):
make_step(net, end=end, clip=clip, **step_params)
# visualization
vis = deprocess(net,[0])
if not clip: # adjust image contrast if clipping is disabled
vis = vis*(255.0/np.percentile(vis, 99.98))
print octave, i, end, vis.shape
# extract details produced on the current octave
detail =[0]octave_base
# returning the resulting image
return deprocess(net,[0])
# this is my humble addition/modification
import cv2
if not len(sys.argv) > 2: # check arguments to script
print "Usage: ./ input_file output_file [level]"
layer = "inception_4c/output"
if len(sys.argv) == 4: # set default end/layer
layer = sys.argv[3]
img = cv2.imread(sys.argv[1])
dd = deepdream(net, img, end=layer)
cv2.imwrite(sys.argv[2], dd)

view raw
hosted with ❤ by GitHub

The core python library that performs the neural network operations is Caffe. If the code above doesn’t work, it could be because I used a version of Caffe that I compiled to make use of the NVidia Cuda libraries.  I don’t think that is necessary, but I had been using that to speed up unrelated computer vision work I’ve been doing.

Now I’ll explain a little bit about the inception neural network that deep dreams is based on.  The neural network is visualized below.


You may want to open the pdf (

Click to access tmp.pdf

) to see it in more detail.  The left is the lower level of the neural network that processes the image input and the right is the deeper level of the neural network.  The intermediate levels range from the shallow end which identifies edges and shapes, to patterns, and then to patches and objects in the deep end.  The “end” named argument to the deepdream function specifies what layer to output, given the input image.  Below are a selection of various layers from  shallow to deep:
































































Here’s one where I ran the deep dream several times:



Fun with FFT at Vixlet

Social signals generated from users tends to follow a diurnal pattern where there’s activity at certain hours of the day and lulls at others.  This pattern is interesting but often it is other patterns that are more salient, like spikes  in traffic.  When the spikes are combined with the diurnal fluctuations, we have a signal separation problem.  In this framing of the problem, the observed web traffic is modeled as an underlying trend signal multiplied by the harmonic noise of daily fluctuations.  Multiplication in the time domain is equivalent to addition in the frequency domain.  So to recover the underlying trend signal we subtract the average frequency spectrum from the daily observed frequency spectrum.

It turned out that this was an easy problem to solve in R, so I wanted to share my experience.  First, to visualize the process below is an example of a snippet of the observed signal over about 3 weeks.



you can see that there are several big spikes as well as smaller daily bumps.  The goal is to remove the  daily bumps so that we can see underlying trends.  To do that, we use the FFT to convert the signal to the frequency spectral domain, average the the daily spectra, then subtract the average from the observed daily spectra.  One of the cool things about R is that the FFT is a built in function.  However, we had to import a library for the Hamming window.  The daily spectra correspond to sliding windows of one day length.  The edges of the window need to be rounded to prevent artifacts, which is what the Hamming window is needed for. library(‘e1071’) (line 5) imports the Hamming window and library(zoo) imports ‘rollapply’ (line 2), the function that applies the window across the signal (line 14).

# needs zoo for "rollapply"
# needs e1071 for Hamming window
# load data
th <- read.csv2('app_status_report_hourly.csv', header=F, sep=',', col.names=c('time','count'))
# normalize the data by the maximum
X <- th$count[1:length(th$count)]/max(th$count)
Xw <- rollapply(X,24,by=1,function(x){x*hamming.window(24)})
Fw <- apply(Xw,1,fft)
Fw_avg <- colMeans(t(abs(Fw)))
Fw_rem <- Fw – Fw_avg
Xw_rem <- apply(Xw,1,fft,inverse=T)
X_rem <- vector(mode=typeof(Xw_rem),length=length(X))
for(i in seq(1, dim(Xw_rem)[2])){
X_rem[i:(i+23)] <- X_rem[i:(i+23)] + Xw_rem[,i]
# plot(abs(X_rem),type='l') # the recovered with high frequency artifacts
plot(runmed(abs(X_rem), 25), type='l')
plot(runmed(abs(X), 25), type='l') #compare with simple median smoothing

view raw
hosted with ❤ by GitHub

The output looks like this:



One question that comes up is how this compares to just plain old median smoothing.  Median smoothing is already a part of the code to generate the output because it acts as a low frequency filter to remove high frequency artifacts from cutting up and reassembling the signal.  But what if we just apply plain old median smoothing to the original smoothing without the FFT.  The result of just using median smoothing looks like this.



Subjectively, it seems to me that the FFT frequency subtracted signal shows a bit more resolution and does a better job of removing the DC (constant) component of the signal.

My New Job at Vixlet

My old job analyzing social media (mainly Twitter) at Adly disappeared after it was acquired and I started working at Vixlet.  Vixlet’s aim is to connect people with what they are passionate about, which to date has been mainly music (Slipknot) and sports (baseball, football, and tennis).  My role there is to do R&D and analytics.  It’s been a productive 2.5 months so far and I’ve had a lot of fun (i got to go to London to the Barclay’s ATP tennis tournament for work!).  We started a site, to show demos from our work. One demo is the “Passion Bot”, a dialog agent to talk about a user’s passions.  Another was a visualization of Twitter traffic about tennis during the Barclay’s ATP tournament.  Another demo was a chatbot for tennis trivia.

Chinese allegory of the cave

L.A. is a very stimulating city.  One way that L.A. is stimulating is that it has a lot of art galleries and exhibitions. Last weekend I went back to the Mike Kelley exhibit at MOCA and I noticed some things that I didn’t see the first time.  One of the pictures was about spelunking and a friend, Richard, pointed out that it was about Plato’s allegory of the cave.

Exploring from Plato's Cave, Rothko's Chapel, Lincoln's Profile

Exploring from Plato’s Cave, Rothko’s Chapel, Lincoln’s Profile

So Plato’s cave was in the back of my mind recently.

Another way L.A. is stimulating is the mix of cultures, languages, and food.  I was at the Hong Kong Market mall in West Covina and I saw a Korean restaurant with the Chinese characters  明洞, so that sparked my curiosity.  It turns out that it’s a town in Korea, Myeong-dong, written in Chinese characters. When I looked it up in the Chinese dictionary it appeared to be the character for  “Bright”  and the character for “Cave”

In the last link, you can see that the character for cave is used in words like “see clearly” and “insight”.  This seemed somewhat related to Plato’s allegory of the cave.  However,  in Plato’s allegory of the cave  the insight (seeing how things are rather than just shadows) came from leaving the darkness of the cave, but in Chinese, it seems like insight comes from the meaning of piercing, like a how a cave pierces the mountain (the same character is used in the word for pierced ear, tunnel, etc.).

A lot of times it doesn’t make sense to take Chinese characters too literally (this applies to words and expressions in other languages in general), so I’m probably reading into it to much. But it can make the characters more interesting and aid in remembering them.

Vacation Metrics

When I stepped down from my former position as chief technology officer at Annenberg Innovation Lab, I had many plans for how I would use my free time before I found a new job.  It’s not completely a vacation free from cares.  I still need to find a new job, learn some new skills, follow up on some publications, and sail to Catalina Island, all of which require some work.  Since I’m not getting paid for it, I’m considering it vacation. Still, if I get too laid back I won’t do all that I had planned, so I decided to set some objectives to measure my vacation.  Here are my objectives and how I can measure their successful completion.

  • Get a job.  This is my primary goal for my vacation and if I don’t find a job in about two months I would consider this vacation a failure.  Metrics:  right now I’m looking for something that is primarily interesting and fun work, that is in either LA, the Bay Area, or California in general, working with good people, that ideally has something to do with what I studied (natural language processing), and that pays well (in that order). Outcome:  in progress.
  • Get in touch with friends. Luckily one of the first things I did on my vacation was to go to my high school friend, Scott Hagen’s, wedding so I reconnected with a lot of friends then and while I was home in Wisconsin and Minnesota.  Metrics: the main metric is to get in touch with as many friends as possible and a secondary consideration is being able to meet in person versus just by email or phone.  I guess a bonus point would be if getting in touch with a friend helped land a job. Outcome: in progress.
  • Sail to Santa Catalina Island.  Since I’ve had a boat (about a year and a half), Elly, my fiancee, and I been studying to made the 30 nautical mile trip from Marina Del Rey.  Sailing on the ocean is a lot more involved than on lakes so I’ve been cautious but now that I have more time to read up and practice I’ve been itching to make the trip.  Metrics: safely going and returning from Catalina, time taken, and amount of fun had.  Outcome:  Done!  Last weekend Elly and I successfully made it to Isthmus/Two Harbors.  The outward-bound trip took 10 hours, which was a bit long because we had to divert to Redondo/King Harbor to refuel due to calm winds in the morning.  In the afternoon the wind and swell picked up a lot and we got a bit wet from spray.  On the way back, it took 7 hours because we were more experience and we were in a hurry to beat the mid afternoon turbulence if it came back.   It was calmer on the way back and we had the company of many boats and dolphins.  Overall, it was fun despite some white knuckles and white cap waves.
  • Publications.  While I was working, it was hard to keep up with publishing my dissertation research, so I’m hoping to have time to do that over the vacation. Metrics: finish the one paper I have under revisions and start at least one more. Outcome: in progress.
  • Learning new skills.  One of the things I’ve been wanting to learn is Ruby and Ruby on Rails.  I also want to refresh my knowledge of web technologies about the new additions to html and css and new frameworks for javascript.  Metric: working demos of some projects used to learn these new skills.  Outcome: in progress.
  • Getting into shape.  When things get busy, for me it’s often exercise that gets put on the back burner.  While I’m on vacation I want to exercise more and loose some weight. Metrics:  frequency of exercise, amount of weight lost. Outcome: in progress.
  • Reading. I have a bunch of books that I’ve been reading that I need to finish and others that I want to start. Metrics: how many of the books I want to read that I actually read. Outcome: in progress.

Analysis of Social Media for NBA Finals

Here is some preliminary results of some analysis of Twitter data about the NBA Finals between the Miami Heat and San Antonio Spurs.  First we tracked tweets containing #spurs, @spurs, and #gospursgo for the Spurs and #miamiheat and @miamiheat for the Heat, based on .


In this the Spurs won in a blow out, pretty much starting in the second quarter.  So far the interesting things I noted were that people tend to tweet during the half time and after the game (7:10 and 8:31).  Comparing this game with the previous game, it seems that the winners tended to get more tweets.  But that’s not to say that there is causation or even direct correlation b/c we could also say that the home team got more tweets in both cases.

Here are some salient timestamps that Jon and I identified during the game.

6:15-Duncan dunk -Spurs up by 7
6:21-good LeBron James play -Miami coming back
6:49-Danny Green 3 pointer-Spurs go up by 8
6:57-spurs go up by 10
7:01-Neal 3 pointer,Spurs by 11
7:09-heat tie it up
7:10-Neal hits at buzzer spurs up at halftime
7:43-largest spurs lead of the night

7:58: End of third, 758


8:09-Spurs slam dunk
8:10-Spurs 3 point

8:16-Spurs 3 point

8:16-Spurs dunk
8:18-Spurs 3 point
8:22-Heat 3 point
8:22-Time out
8:29-Spurs dunk
8:31-Game over

Hourly Counts of Movie Tweets Using R


In the last post, I aggregated the counts of tweets in python and generated a table that I used in R.  This time I wanted to go from the raw input of tweets and the output of sentiment analysis to aggregated hourly counts using R, and this time for movies that came out on Jan 11, 2013 (Gangster Squad, Zero Dark Thirty, and A Haunted House.

Do this in R turned out to be harder than I expected and I had to install some libraries, namely “zoo” (which stands for “z ordered observations”)and “chron”, which can be seen in lines 1-5 in the gist below*.  I also had to massage the data a little more than I was expecting.  After reading it in (line 8),  I had to muck around with making the tweet column be character strings instead of factors, and parsing the date info into a zoo object.  Line 20 actually does the aggregation.  After plotting it, I noticed a spike on Sunday evening.  It turned out there was a lot of (re)tweets about Zero Dark Thirty and one about a ticket give-away for Gangster Squad:

“RT @goldenglobes: Best Actress in a Motion Picture – Drama – Jessica Chastain – Zero Dark Thirty – #GoldenGlobes”
“RT @vuecinemas: Help stop the mob with #GangsterSquad on Jan 10th! To win one of 5 movie packs, follow us and retweet this message by 5p …”
“Woot! RT @Bad_Wobot1013: Awesome Jessica Chastain!! ”

# install and load additional packages
# read in data
dt <- read.delim("movieReports/20130109_to_20120113.tsv",header=TRUE,sep="\t",comment.char="",quote="",na.strings="",fileEncoding="UTF-8")
# convert the tweet column from a factor (default) to a character string
# add a dummy variable for counts
dt<-transform(dt, dummy=1)
# create a zoo object (after many unsuccessful tries)
z <- zoo(dt$dummy,$postedTime),format="%Y-%m-%dT%H:%M:%OSZ",tz="GMT"))
# aggregate by passing a function that uses another R time functionality
aggregate(z,function(x) as.POSIXct(trunc(x,"hour")) )
# plot the time series (by default this will open a popup window
plot(aggregate(z,function(x) as.POSIXct(trunc(x,"hour")) ))
# this will change the times to Pacific time, from GMT (perhaps there is a better way)
plot(aggregate(z,function(x) as.POSIXct(trunc(x(8/24),"hour")) ))
#this will save the
# I was interested in a spike that I saw at 2013-01-13 19:00:00:
spike<-dt[as.POSIXct(trunc(time(z),"hour"))==as.POSIXct("2013-01-13 19:00:00"),]
# this sorts the tweets from the spike by retweet count
# output of top retweetedTweets:
# [23340] "RT @goldenglobes: Best Actress in a Motion Picture – Drama – Jessica Chastain – Zero Dark Thirty – #GoldenGlobes"
#[23341] "RT @vuecinemas: Help stop the mob with #GangsterSquad on Jan 10th! To win one of 5 movie packs, follow us and retweet this message by 5p …"
# [23342] "Woot! RT @Bad_Wobot1013: Awesome Jessica Chastain!! "

view raw
hosted with ❤ by GitHub

*Note that if you’re using Linux, you’ll need to have the R-devel packages to build this libraries in the installation process

2012 US Presidential Election Timeline in Twitter

I was looking at the spikes in twitter traffic about Obama and Romney during the 2012 US presidential election. So far this is just volumes of tweets containing “obama” and/or (logical or) “romney” (case insensitive), but in the future I would like to do similar plots for sentiment.


The data this is based on 315,000,000 tweets, which was more than 1TB when including all the metadata.  Only the subset of these matching “obama” or “romney” were used for the chart, but still it took 11 hours to tabulate the results on my desktop computer (the one that I mentioned in the previous post).

This plot was made using R and Inkscape.  Here’s the input data table, and here’s a gist of the R code :

# Read in data
timeSeriesElectionData <- read.table("twitterElectionTimeline_tsvWithHeader.txt",header=TRUE)
# Attach columns of data as variables in R environment
# open a pdf file for writing
# plot the sum of tweets about obama and romney, as a line, with no x-axis ticks, bla bla…
plot(obama+romney, type='l',xaxt='n',xlab='Date',ylab="Number of Obama and Romney Tweets")
# create a sequence of dates, from Mar. 1 to Dec. 1, by month
d <- seq.Date(from=as.Date('2012-3-01'), to=as.Date('2012-12-01'),by='month')
# find the number of data points:
# output: [1] 1068
# write the dates as axis ticks
#close the pdf file

view raw
hosted with ❤ by GitHub

raid setup

I recently got a new computer for work, a 32 GB, 8 core, RAID enabled machine (woohoo!).  I wanted to test exactly how fast the RAID-0 setup actually was.  Transferring about 820 GB from a USB-3 external HD:

time cp -rp /media/ts-backup-2/data_gnip/ /raid/
real    176m14.507s
user    0m23.533s
sys     44m27.038s

or about 3 hours, 273GB per hour, 4GB per min., 75MB per sec.  I did the same
copy procedure to a regular hdd (7200RPM, bla, bla), and the result was pretty
much the same, which leads me to believe (hope) that the speed was limited by
the usb hdd.

one thing that was taking a lot of time was simply doing wc on all the twitter
log files, so now I’m going to compare running wc on the raid and the regular

time wc /raid/data_gnip/tweets_2*txt
348649707  21225017895 877863706316 total

real    219m18.297s
user    213m20.851s
sys     5m25.995s

which is about 3 hrs 40 minutes, or about 1.5 million lines (rather big json
records) per minute or 26000 lines per second.

comparing that with the regular hdd:

time wc /data/data_gnip/tweets_2*txt

348649707  21225017895 877863706316 total

real    223m9.239s
user    216m21.881s
sys     6m3.174s

So I’m not really seeing the benefit of RAID at this point.  I’m still hoping
that I can find the benefit of RAID.  I was thinking that since it is a
sequential read, one file after another, that I might see the benefit with parallel reading:

time find /raid -name “tweets_2*txt” | parallel wc

real    67m58.278s
user    425m54.987s
sys     7m9.929s

So that was a lot faster (3x), and you can see the multiple
core effect in that the user CPU time is higher than the real time.

time find /data -name “tweets_2*txt” | parallel wc

real    305m27.366s
user    265m43.306s
sys     6m49.296s

aha, it finally found a way to see the speedup from RAID. itseems like the
cpu’s in the system monitor are less saturated, and more jagged/noisy.

Still, in the process of waiting for these programs to terminate, I found an
approximate line counter on a forum.  It uses a partial line count from
portion of a big file and then extrapolates the approximate number based on
the file size.
Thanks to Erik Aronesty for providing this perl script!

time /raid/data_gnip/tweets_2*txt

359757377 total
real    0m32.885s
user    0m0.496s
sys     0m0.800s

which is off by about 10 million out of 350 millon, roughly a factor of less than 10, but faster by about an hour (roughly faster than a factor of 100).
To be complete, here’s the performance of on the regular disk:

time /data/data_gnip/tweets_2*txt

359757377 total

real    0m36.171s
user    0m0.639s
sys     0m0.684s

VERDICT: Raid is definitely faster if you use it wisely, but if you just want an approximate line count, you can save a lot by just using a clever perl script. Of course I’ll be doing more than counting lines, so the raid should come in handy in the future, but it’s definitely possible to over engineer things if I’m not careful.

Interspeech 2012, Day 4

Today, the keynote speaker was Garet Lahvis, who presented on analyzing prosody in ultrasonic rat vocalizations.  I found this pretty interesting.  Even though the signal processing aspect of the study was very basic, the novel part is looking at animals instead of humans, and (non-human) lab rats that have been removed from their natural environment for several generations.  Also, an interesting aspect of this study is the implication for emotion research.  One of the interesting findings is that for a certain strain of rats, b6, positive valence emotion vocalization recordings had similar effects on other rats as food and morphine, and similar for negative valence events like foot shocks.  Another strain of rats, BALB, had less response to emotional vocalizations from other rats.  This study had implications for both substance abuse studies and autism research.

One of the interesting posters I saw was on building speech recognition language models from twitter.