Contest webpage:
Authors and Affiliations:
- Heike Hofmann, Iowa State University, hofmann@iastate.edu
- Dianne Cook, Iowa State University, dicook@iastate.edu
- Ulrike Genschel, Iowa State University, ulrike@iastate.edu
- Hadley Wickham, Iowa State University, hadley@iastate.edu
- Michael Lawrence, Iowa State University, lawremi@iastate.edu
- Barret Schloerke, Iowa State University, schloerke@gmail.com
- Spencer Barret, Iowa State University, spencer@iastate.edu
Tool(s):
- GGobi, developed by Deborah Swayne, Dianne Cook, Duncan Temple Lang,
Andreas Buja, Michael Lawrence, and Hadley Wickham
- Manet, developed by Heike Hofmann
- Mondrian, developed by Martin Theus
- R, especially the ggplot package by Hadley Wickham
- Many Eyes, developed by the CUE Visual Communication Lab of IBM Research
- Ruby, for data management.
Data Specific Tasks
1.1 Oscar Hopes
- Process:
We matched the original data provided to us on movies released between 2000
and 2006 with information on all nominated and Oscar-winning movies as provided
by the Official Academy Awards Database.
- Image 1.1:
- Insight:
- These two plots show a barchart and a spineplot of the number of movies
released by month. Oscar-winning movies are highlighted in red, Oscar-nominated
movies in blue.
- Movie releases peak twice throughout the year; once in Spring and then
in Fall (left).
- Oscar-nominated movies are typically released very late in the year
or just before the Academy Awards ceremony in February or March (with pre-release
dates at the end of the previous year).
- Only a handful of the nominated summer releases went on to win, such as
Crash, Gladiator, Pollock, Adaptation, Little Miss Sunshine, and Road to
Perdition. Are those surprise nominations, or are they deliberately trying to
beat the odds? It would be nice to know.
- All but one of the Oscar-winning movies of recent years were classified
as dramas, as were 94% of all nominated movies.
This definitely gives some guidelines for our next home video releases!
- Caption for exhibit:
Movie releases peak in Spring and Fall (left). Movies with nominations for
an Academy Award are painted in blue, Oscar-winning movies are shown in red.
The spineplot (right) emphasizes the conditional probabilities of nominations
and wins.
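The matching step described in the Process paragraph amounts to a join between the contest movies and the award records. The sketch below is a minimal illustration under our own assumptions: the record layout (`title`, `year`), the helper names `normalize` and `match_awards`, and the title normalization rule are hypothetical, not the actual format of either data source.

```python
import re


def normalize(title):
    """Lower-case a title and drop punctuation so small spelling differences still match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def match_awards(movies, awards):
    """Label each movie 'won', 'nominated' or 'none' via a join on (normalized title, year).

    movies: list of dicts with the hypothetical keys 'title' and 'year'.
    awards: dict mapping (normalized title, year) to 'won' or 'nominated'.
    """
    labeled = []
    for movie in movies:
        key = (normalize(movie["title"]), movie["year"])
        labeled.append({**movie, "oscar": awards.get(key, "none")})
    return labeled


# Toy records for illustration only.
movies = [{"title": "Gladiator", "year": 2000}, {"title": "Movie X", "year": 2003}]
awards = {("gladiator", 2000): "won"}
print([m["oscar"] for m in match_awards(movies, awards)])  # ['won', 'none']
```

Keying on the release year is a simplification: if award records are keyed by ceremony year rather than film year, the join needs a one-year offset.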
1.2 Scary Tuesdays
- Process:
We loaded the movie data into Mondrian and explored the relationship between
a movie's release weekday and its rating, as well as between movie genre and
rating.
- Image 1.2:
- Insight 1.2:
- This figure shows a barchart of the release day of the week, a histogram
of movie ratings and a spineplot of horror movies (0 = no, 1 = yes).
- Most movies are released on Fridays, followed by Tuesdays.
- Ratings are slightly skewed left with a peak between 6 and 7, indicating
that ratings are positive on average.
- Rating distributions look much the same for all release weekdays, i.e.
the weekday does not seem to influence ratings, except for movies released
on Tuesdays.
- Tuesday releases tend to get somewhat lower ratings, centered around 4.
- Horror movies tend to be released on Tuesdays. Horror movies make up
18.5% of all Tuesday releases - a scary connection!
- Caption for exhibit:
Horror movies released on Tuesdays seem to be doubly scary, because of both
their content and their ratings.
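The 18.5% figure is a conditional proportion: the share of horror movies among Tuesday releases. A minimal sketch of computing such per-weekday summaries, assuming hypothetical `(weekday, is_horror, rating)` tuples rather than the actual data layout used in Mondrian:

```python
from collections import defaultdict
from statistics import median


def by_weekday(movies):
    """Summarize movies per release weekday as (share of horror movies, median rating).

    movies: iterable of (weekday, is_horror, rating) tuples; this layout is hypothetical.
    The horror share is the conditional proportion P(horror | weekday).
    """
    groups = defaultdict(list)
    for weekday, is_horror, rating in movies:
        groups[weekday].append((is_horror, rating))
    summary = {}
    for weekday, rows in groups.items():
        horror_share = sum(1 for is_horror, _ in rows if is_horror) / len(rows)
        summary[weekday] = (horror_share, median(rating for _, rating in rows))
    return summary


# Toy records, not the contest data.
toy = [("Tue", True, 3.5), ("Tue", False, 4.5), ("Fri", False, 6.5), ("Fri", True, 6.0)]
print(by_weekday(toy))
```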
1.3 Box Office Flops and Surprises
- Process:
In Manet, we explored the relationship between budget and box office earnings
of movies, based on data from The-Numbers website.
- Image 1.3:
- Insight 1.3:
- The figure shows a scatterplot of the relationship between box
office earnings and budget.
- Higher budgets typically go along with higher earnings.
- Some of the most successful movies relative to their budgets are Meet the
Fockers and My Big Fat Greek Wedding, both with a small to medium budget
but very high earnings.
- A few of the big-budget movies might be considered flops because their box
office earnings, although large, fell short of their budgets, e.g. Sahara
(2005) and Van Helsing (2004).
- Superman Returns (2006) and Hulk (2003) earned close
to their budgets, but an average movie is expected to draw in 6% more
earnings than its budget.
- Box office earnings are seasonal: late summer and Christmas movies earn
more. This pattern lags slightly behind the number of movies released, which
peaks in late spring and late fall.
- Oscar-winning movies are never among the top box office earners, but
they completely dominate the market in January.
- Caption for exhibit:
Box office earnings versus movie budget (left). Blue movies are doing surprisingly
well, whereas red movies are flops. Christmas movies and late summer movies
earn more money (right). Here, Oscar winners are painted orange.
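The flop criterion used above (box office earnings below the budget) reduces to a ratio test. The following sketch encodes it; the function name `classify` and the "surprise" cut-off of three times the budget are our own hypothetical choices, not thresholds used in the analysis.

```python
def classify(budget, gross, flop_below=1.0, surprise_above=3.0):
    """Label a movie by the ratio of box office earnings to budget.

    flop_below=1.0 follows the text: earnings below the budget make a flop.
    surprise_above=3.0 is a hypothetical cut-off for a surprise success.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = gross / budget
    if ratio < flop_below:
        return "flop"
    if ratio > surprise_above:
        return "surprise"
    return "expected"


# Toy numbers (millions of dollars), not actual figures.
print(classify(150, 120), classify(5, 240), classify(100, 106))  # flop surprise expected
```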
1.4 Bankability: And the winner is ...
- Process:
To get this result, we extended the movie-person database with budget numbers
and aggregated by person. This way we get a list of the most "bankable"
people in the business. We matched this list of people with all the movies
they had made, retrieved the full cast of these movies, and extracted pairs
of actors/directors/cinematographers who have done at least two movies
together.
- Image 1.4
- Insight 1.4:
- This is a network of all people who worked with the most bankable
actors between 2000 and 2006.
- Orlando Bloom is the most bankable actor. He was involved in both the
box-office-busting Lord of the Rings trilogy and the Pirates of the
Caribbean series.
- Some of the pairs we found are well known and established, such as the
Clooney/Soderbergh power connection. The two of them produced and directed
several films together, such as Ocean's Eleven, Ocean's Twelve, The
Good German, Syriana, Good Night and Good Luck, and Solaris.
- Other connections are based on sequels, such as the Rodriguez/Banderas
team (Spy Kids 1, 2, and 3) or the Peet/Willis team (The Whole Nine Yards
and The Whole Ten Yards), but maybe there's an up-and-coming pair to watch
out for ...
- Some larger networks between actors also show up. We could call the
first one the "Sandler" clique (for obvious reasons).
- The second one stands out because we can find the current President,
George W. Bush, united with the former President, Bill Clinton. This
pair has a respectable three joint movie appearances: Bowling for Columbine,
Bush's Brain, and Enron: The Smartest Guys in the Room.
- Caption for exhibit:
The "Sandler" connection (left) and the Presidential network (right).
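The pair extraction in the Process paragraph is essentially co-occurrence counting over movie casts. A minimal sketch, assuming a hypothetical mapping from each movie to the set of people credited on it; the function name `frequent_pairs` is ours, not part of the original toolchain.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(casts, min_movies=2):
    """Return {(person1, person2): shared movie count} for pairs with >= min_movies.

    casts: dict mapping a movie title to the set of people credited on it
    (actors, directors, cinematographers); this layout is an assumption.
    """
    counts = Counter()
    for people in casts.values():
        # Sorting gives every unordered pair one canonical (a, b) key.
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_movies}


# Hypothetical people and movies.
casts = {
    "Movie A": {"P1", "P2", "P3"},
    "Movie B": {"P1", "P2"},
    "Movie C": {"P3", "P4"},
}
print(frequent_pairs(casts))  # {('P1', 'P2'): 2}
```

Counting every pair in a cast grows quadratically with cast size, which stays manageable when the casts are first restricted to movies involving the bankable people.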
1.5 The Frat Pack
- Process:
Following the same process as in 1.4, we extracted the largest connected network
of people in the movie business.
- Image 1.5:
- Insight 1.5:
- The largest connected network of top bankable actors is shown in figure
1.5.
- The person with the most connections is Rick Kain, a stunt double who,
with 25 movies between 2000 and 2006, is the busiest person of all.
- In the whole graph there is only a single K4 between actors, i.e. a complete
four-way connection in which each of the four is linked to the other three.
Look for it on the left-hand side of the figure.
- Owen Wilson, Ben Stiller, Will Ferrell and Vince Vaughn have each made
at least two movies with each of the others. In Starsky & Hutch
they even all appear side by side. They are part of a tight group of actors
jokingly called the "Frat Pack" (a nickname that plays on the fraternity
comedy Old School).
- Other members of the group, such as Luke Wilson and Jack Black, can be
found in the network in the immediate vicinity of the other four.
- Caption for exhibit:
Largest connected network among the 150 most bankable actors.
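Finding the largest connected network and checking it for a K4 are standard graph operations. The sketch below uses breadth-first search for the connected component and a neighbour-intersection search for 4-cliques; the edge-list input and all function names are hypothetical, and this is not necessarily how the original network was computed.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (person, person) collaboration edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def largest_component(adj):
    """Node set of the largest connected component, found by breadth-first search."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


def find_k4(adj):
    """List every 4-clique (K4) once, as a sorted tuple a < b < c < d."""
    cliques = []
    for a in sorted(adj):
        for b in sorted(n for n in adj[a] if n > a):
            # Candidates for c and d must be linked to both a and b.
            common = sorted(n for n in adj[a] & adj[b] if n > b)
            for c, d in combinations(common, 2):
                if d in adj[c]:
                    cliques.append((a, b, c, d))
    return cliques


# Toy graph: one K4 (w, x, y, z), a pendant node v, and a separate pair p-q.
edges = [("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z"),
         ("z", "v"), ("p", "q")]
adj = build_graph(edges)
print(sorted(largest_component(adj)), find_k4(adj))
```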
1.6 Hail to the Chief
- Process:
By combining the person database with the number of appearances in all movie
projects, we aimed for a list of "top visibility" people. However, the number
of appearances in movies has to be taken with a grain of salt: only separate
appearances are counted and their length is ignored, which means that a
20-minute dialogue counts as much as a 5-second news clip from the archives.
- Image 1.6:
- Insight 1.6:
- List of "most visible" actors according to the number of appearances
in movies between 2000 and 2006.
- The surprise winner is none other than the current President, George
W. Bush!
- Caption for exhibit:
The list of "most visible" actors features George W. Bush, a feature
of a strange measure!
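The visibility measure and its caveat can be made concrete in a few lines. This sketch assumes hypothetical `(person, movie)` credit pairs and counts each movie once per person, regardless of how long the appearance is:

```python
from collections import Counter


def visibility(credits, top=3):
    """Rank people by the number of distinct movies they appear in.

    credits: iterable of (person, movie) pairs; this layout is hypothetical.
    Each movie counts once per person regardless of screen time, which is the
    weakness noted above: an archive clip scores the same as a long dialogue.
    """
    distinct = set(credits)  # duplicate credits for the same movie count once
    counts = Counter(person for person, _ in distinct)
    return counts.most_common(top)


credits = [("A", "m1"), ("A", "m2"), ("A", "m2"), ("B", "m1")]
print(visibility(credits, top=2))  # [('A', 2), ('B', 1)]
```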
1.7 Genres over Time
- Process:
We pieced together the genres of movies since 1888 to get an idea of their
development over time.
- Image 1.7:
- Insight 1.7:
- Figure 1.7 is a stacked time series showing an excerpt of the
development of individual genres.
- Generally, genres stay fairly level over time, as can be seen, e.g.,
for mystery movies over the last couple of decades (top left).
- Animated movies have seen a couple of peaks in the past; just think
of Walt Disney or Krazy Kat dancing with Betty Boop.
- The time of the Film Noir (middle left) is officially gone after a flurry
of activity between 1930 and 1960.
- Interestingly, the number of war movies is strongly correlated with times
of war. We can easily recognize the World Wars, Korea, Vietnam and, to a
lesser degree, the Gulf Wars in the peaks (top right).
- Westerns seem to have dropped out of favor - their numbers are in a
steady decline.
- We've only seen the tip of the iceberg with Reality TV, it seems!
- Caption for exhibit:
Development of Genres over time (1888-2006). Most genres show the same steady
percentages as 'mystery' (top left). The other genres show interesting exceptions.
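The percentages behind a stacked genre plot are per-year shares. A minimal sketch, assuming hypothetical `(year, genre)` records; how multi-genre movies were counted in the original figure is not stated, so the one-record-per-genre rule below is our assumption.

```python
from collections import Counter, defaultdict


def genre_shares(records):
    """Turn (year, genre) records into {year: {genre: share of that year's genre labels}}.

    Stacking these shares year by year gives a percentage view of genres over time.
    A movie tagged with several genres contributes one record per genre (an
    assumption), so shares refer to genre labels, not to movies.
    """
    per_year = defaultdict(Counter)
    for year, genre in records:
        per_year[year][genre] += 1
    shares = {}
    for year, counts in sorted(per_year.items()):
        total = sum(counts.values())
        shares[year] = {genre: n / total for genre, n in counts.items()}
    return shares


# Toy records, not the contest data.
records = [(1950, "Western"), (1950, "War"), (1950, "Western"), (1951, "Mystery")]
print(genre_shares(records))
```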
1.8 Romancing the Population?
- Process:
The number of romance movies released by year is aligned with yearly population
changes in the U.S.
- Image 1.8:
- Insight 1.8:
- Time series plots of romance movies released by year and of yearly population
changes.
- Starting around 1918, we see a strong correlation: with a lag of about
10 to 12 years, the number of romantic movies closely mirrors the pattern
of yearly population changes.
- At a lag of ten years, the correlation between the two time series is
as high as 0.71 (right)!
- Caption for exhibit:
Ever wondered how babies are made? We won't go there, but just have a look
at the fascinating relationship between romantic movies and population increase!
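A lagged correlation like the one reported above can be computed by shifting one yearly series against the other and taking the Pearson correlation over the overlapping years. The sketch below shows the computation on toy series; the function names are hypothetical, and it does not reproduce the 0.71 value, which depends on the actual movie and population data.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] over the years both series cover.

    Both series must start in the same year and lag must be >= 0. Which series
    should lead is the caller's choice: swap x and y to reverse the direction.
    """
    n = min(len(x), len(y) - lag)
    return pearson(x[:n], y[lag:lag + n])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(x, y, k))


# Toy series in which y repeats x two steps later.
x = [1, 3, 2, 5, 4, 6]
y = [9, 9] + x
print(best_lag(x, y, 3))  # 2
```

Scanning a range of lags and picking the maximum, as `best_lag` does, tends to inflate the reported correlation, so the chosen lag is best read alongside the correlations at neighbouring lags.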
1.9 Movie Titles
- Process:
Movie titles were retrieved for all movies released since 1888. For the network,
the top eight words of each genre were extracted.
- Image 1.9:
- Insight 1.9:
- Tag cloud of words in titles of romance movies.
- Words with a female connotation seem to dominate the picture - a chick
flick effect?
- Network of the top eight words of each genre.
- Westerns are isolated, except for the WILD ones!
- Caption for exhibit:
Chick flick title words (left); network of the top eight words of each genre
(right).
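One way to build such a word network is to take the top-k title words per genre and connect genres that share a word. The sketch below assumes that construction, a hypothetical stop-word list, and simple lower-case tokenization; the original network may have been drawn differently, e.g. as a bipartite genre-word graph.

```python
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}  # hypothetical stop list


def top_words(titles, k=8):
    """The k most frequent non-stop-words across a list of titles."""
    counts = Counter(
        word
        for title in titles
        for word in re.findall(r"[a-z']+", title.lower())
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(k)]


def genre_word_network(titles_by_genre, k=8):
    """Link two genres whenever their top-k title words overlap.

    Returns {(genre1, genre2): shared words}; a genre without any edge is isolated.
    """
    tops = {genre: set(top_words(titles, k)) for genre, titles in titles_by_genre.items()}
    edges = {}
    for g1, g2 in combinations(sorted(tops), 2):
        shared = tops[g1] & tops[g2]
        if shared:
            edges[(g1, g2)] = shared
    return edges


# Toy titles, not the contest data.
titles = {
    "Western": ["Wild Bill", "The Wild Bunch", "Gun Law", "Gun Fury"],
    "Romance": ["Wild Hearts", "Love Story", "Love Affair", "Wild at Heart"],
    "War": ["Battle Cry", "Battle Zone", "Cry Havoc"],
}
print(genre_word_network(titles, k=2))  # {('Romance', 'Western'): {'wild'}}
```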
Conclusions
This analysis required an enormous amount of data cleaning and processing. More
than half of the movies are classified as short films, and the majority
of the movies have fewer than 1000 users rating them. These "movies" are
probably not what most people consider to be movies, and hence using these samples
will likely produce spurious findings. The findings we've reported were obtained
by carefully dissecting the data, taking subsets, and scaling it in different
ways to examine it at multiple resolutions.
Our toolbox contains a plethora of software and hacked-together code, which
allowed us to extract many storylines from the motion picture data. The findings
reported here are a small subset of these. More are revealed in the accompanying
video, and more detailed information on methods is available on the web page.
COMMENTS (optional)
Thanks to Robert Kosara, T.J. Jankun-Kelly, and
Eleanor Chlan for providing the original data and organizing the IEEE
InfoVis 2007 Contest.
Further thanks to Martin Wattenberg and
Fernanda Viégas for providing Many Eyes and pointing us towards
it.
Special thanks to Martin for the inspiration for the title! This
work was supported in part by the National Science Foundation under grant #0706949.