InfoVis 2005 Contest
Boom and Bust of Technology Companies at the Turn of the 21st Century

Contest webpage:

Authors and Affiliations:


Data description

This data set contains information on 84472 technology companies between 1989 and 2003. The companies produced 154912 unique products in this period. The period is notable for the rise of the internet, the dot-com bubble and crash, Y2K, the 9/11 tragedy, and changes between Democratic and Republican control of government.

TASK 1: Trends and multivariate relationships

1.1 Trends in technology companies and products over time

1.2 Trends by industry type

1.3 Is there anything between the East and West Coast?

TASK 2: Clusters

2.1 Is local growth fueled by natural disasters?

2.2 Software is out -- services are in

2.3 High market concentration in biochemical companies

TASK 3: Unusual features

3.1 There's something strange about Harris County, Texas!

3.2 Sales switch up between counties in Detroit, MI

3.3 Strange values for market concentration

TASK 4: Other findings

Data cleaning

We spent a lot of energy early in the data release period finding anomalies in the data and reporting them. This resulted in numerous revisions of the competition data. Some of the problems were fixed, but numerous problems still seem to remain. With data sets of this size, maintaining quality is very difficult. Here are some of the irregularities we found:

4.1 Can so many companies really be founded in 2000?

4.2 Why are there companies in the database before they are founded?
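
Checks of this kind could be sketched as follows. This is a minimal illustration, not the procedure we actually ran: the record layout and the field names `company_id`, `year` and `founded` are assumptions made for the example, and the sample records are made up.

```python
from collections import Counter

def records_before_founding(records):
    """Return records whose observation year precedes the stated founding year."""
    return [
        r for r in records
        if r["founded"] is not None and r["year"] < r["founded"]
    ]

def founding_year_counts(records):
    """Count distinct companies per stated founding year."""
    founded_by_company = {}
    for r in records:
        if r["founded"] is not None:
            founded_by_company[r["company_id"]] = r["founded"]
    return Counter(founded_by_company.values())

# Made-up sample records (hypothetical layout, not contest data).
sample = [
    {"company_id": 1, "year": 1995, "founded": 1998},  # listed before founding
    {"company_id": 1, "year": 1999, "founded": 1998},
    {"company_id": 2, "year": 2000, "founded": 2000},
    {"company_id": 3, "year": 2001, "founded": 2000},
]

suspect = records_before_founding(sample)
per_year = founding_year_counts(sample)
```

A spike in `per_year` for a single year, or a non-empty `suspect` list, flags records worth inspecting by hand; neither proves an error on its own.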


We were very surprised by many of our observations on the data. Initial disbelief was followed by intensive number crunching to check the values and extensive internet searches to find plausible explanations. In particular, the potential relationship between local growth in companies and natural disasters, and the increasing trend in the number of companies in Harris County, TX, came as surprises.

We arrived at the association between natural disasters and local hotspots through an astute observation by one of the team members. The chaotic popping up of hotspots around the country looked spurious, until one person asked, looking at the 1993-94 hotspot in Iowa: "When were the floods in Iowa?" This led to extensive searches of geographic locations and natural disasters, and it cascaded into ways to explain many hotspots. Most of these fall in the 1993-99 period, when Clinton was in office. Only then did we start to come across accusations in online news stories about suspect use of FEMA funding during the Clinton administration. Letterman cracked a Top 10 joke related to FEMA. Not all of the hotspots can be explained this way. We would also like to point out that this association between local economic activity and disasters is purely a proposal, not a conclusive finding.

The results on Harris County, TX, arose immediately from the longitudinal plots of county counts. The trend stands out in the graphic in a way that would probably be harder to detect numerically. Checking the numbers and finding no other county in the USA even close to this trend was also a surprise. Identifying it as a county in Texas was a tad surprising, and it was more surprising still to find, by accident, that it is the residence of the current president's dad. Harris County has many attractions, such as the Johnson Space Center, but this association raises big questions about political influence.
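
One way to check such a trend numerically is to fit a least-squares slope to each county's yearly company counts and rank the counties by slope. The sketch below is an illustration under assumptions, not the check we performed: the county labels and counts are made up, and a straight-line slope is only one of several possible trend measures.

```python
def trend_slope(years, counts):
    """Ordinary least-squares slope of counts against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    return sxy / sxx

def rank_counties_by_trend(series):
    """Sort counties by slope, steepest increase first.

    `series` maps a county label to a (years, counts) pair.
    """
    slopes = {
        county: trend_slope(years, counts)
        for county, (years, counts) in series.items()
    }
    return sorted(slopes.items(), key=lambda kv: kv[1], reverse=True)

# Made-up yearly counts for three hypothetical counties.
years = [1999, 2000, 2001, 2002, 2003]
series = {
    "county_a": (years, [100, 140, 180, 220, 260]),  # steady growth
    "county_b": (years, [200, 210, 190, 180, 170]),  # decline after 2000
    "county_c": (years, [50, 52, 51, 53, 52]),       # roughly flat
}

ranking = rank_counties_by_trend(series)
```

A county whose slope sits far above the rest of `ranking` is a candidate for the kind of outlier the plots revealed. The plot remains the better tool for spotting it in the first place.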

When we started exploring the data, we expected to see the bubble pop in Silicon Valley, some economic effects in the New York region after September 11, 2001, and the effects of Microsoft developing in the Seattle area. And we saw these. We also had other expectations that did not pan out: that companies that move a lot might be more likely to go bankrupt (disappear from the database), and that there might be movement away from the coasts to the mountain states and the Midwest after the bust. There is some movement of companies, but these results were less interesting.


Thanks to Georges Grinstein, Urska Cvek, Mark Derthick and Marjan Trutschl for such intriguing data, and the enormous amount of work that was clearly needed to pull it together.