Monthly Archives March 2008

Stock Market dataset is up

40 years of data on every NYSE-, AMEX- and NASDAQ-listed stock:

These links were busted before but should be working now.

Statistical Abstract of the United States

Added the Statistical Abstract of the United States: the messily, messily formatted analyzed tables released by the US Census Bureau. 1350+ tables, yum.

infochimps.org is live

Just in time for SxSWi – the site is live.

Now that we’ve got the skeleton of the website in place, we can go back and apply the necessary metadata/package/import workflow we’ve developed.

Here’s a rundown of the datasets you can look forward to seeing over the next few weeks:

  • demographics
    • world
      • World Bank development data: a variety of country-level data from the World Bank
      • CIA factbook
    • us
      • Statistical Abstract of the US: an exhaustive categorization of demographic, commercial and social data for the US
      • The full US Census Summary File 3, at the ZIP code level
  • money:
    • US stock market daily: daily open/close/low/high for all listed stocks since 1970
    • US campaign finance: expenditures in US presidential, Senate, House and governor races in 2004
    • Constantcurrency: a variety of currencies in constant dollars, pounds, etc., back to the 1600s
  • huge & miscellaneous:
    • infoboxen: all the infoboxes from Wikipedia, broken out into individual semantically labelled tables
  • joins:
    • Common coding systems for
      • country codes, including a useful keying database from common names (“USA”, “U.S.A”, “United States”, …) to ISO country code
      • languages
      • currencies, etc.
    • time—conversions among all of the (curiously many) competing means of measuring dates and times
  • health
    • odds of dying—all causes of death in the US, broken down by category and given as rate and odds
    • middle east conflict casualties: civilian and military deaths in Iraq (OIF) since 2003 and Afghanistan (OEF) since 2001
  • science, math & engineering:
    • nasa_eclipse: 5000 years of solar and lunar eclipses, lunar phases, and planetary transits, from NASA
    • 270,000+ MSDS (Material Safety Data Sheets) listing properties and hazards of common and industrial chemical substances
    • material properties: basic chemical and physical properties for common chemical substances
    • powergrid: network of power grid connections in the Western US (Strogatz 1998)
    • fastenerdata: screws, bolts, and threaded fasteners: dimensions, mechanical strengths and properties, and other useful information
    • mechanical properties: mechanical properties for a variety of useful materials
    • consts and units: universal constants and unit conversion factors
    • standard mathematical tables: tables of elementary functions (log, Bessel, etc.) over a large range
    • mathematical constants: the fundamental mathematical constants calculated to millions and occasionally billions of decimal places
  • Art and Culture
    • Every movie, act(or|ess), and film courtesy of imdb.com
    • Every musician, album, track and label, courtesy of musicbrainz.org
    • WANTED: ISBN => author, book, publisher dataset. If you have this, please contact us.
  • geo:
    • A huge assortment of GIS layers from nationalatlas.gov
    • Geographical place names & locations from geonames.org
    • TIGER/Line, a mapping from street address to location for the full US (this will take a while)
    • Postal codes – map from ZIP code to city and latitude/longitude
  • time
    • tzinfo: time zone info for everywhere
    • calendar_kitchensink: 3000 years of time zone, calendar conversion, moon phase, accounting information, etc.
    • accounting_calendar: last Fridays of each month, adjusted for holidays, etc.
    • holidays: major repeating holidays for most countries
  • language:
    • Usage frequency (in speech and print) of every English word, from the British National Corpus
    • Moby Word lists – Word Lists, Multiple Language Lists of Common Words, Hyphenation, Part of Speech, Pronunciation, Thesaurus
    • Natural Language Toolkit Corpora: NLTK’s word lists, semantic networks, lexical data, large text corpora; several languages
    • All the words legal to play in Scrabble™
  • sport:
    • Baseball:
      • retrosheet gamelogs: Game outcome and box score for every MLB game back to the 1890s
      • retrosheet event files: Play-by-play information for almost every game back to 1957 (and all since the mid-1970s)
      • baseballdatabank: Season and career stats for every MLB player, team, etc. of all time
      • MLB Gameday: Players, game state, pitch-by-pitch trajectory and outcome for ~half of the 2007 MLB games