- April 2, 2008
Many people don’t know how broad our rights to factual data actually are. Unlike the mishegaas that reigns in copyright land, the world of data is largely open (and rightfully so). To arrive at the age of ubiquitous information with a sound policy, however, we have to exercise those rights assertively, respectfully and prudently.
Let me start with the traditional IANAL and point out that if you take legal advice from a chimpanzee you deserve what you get. Instead, read iusmentis on database law and bitlaw on compilations and databases. (In which case you can probably skip the rest of this post.) (Also, the following applies only to the US, where the database laws are actually more liberal than elsewhere. I have no idea what the situation is outside the US.)
In general, a comprehensive assemblage of facts cannot be copyrighted. Copyright only applies where there is creative content. A comprehensive list of cars and retail prices cannot be copyrighted; a comprehensive collection of reviews of those cars can be copyrighted. A list of all the musical albums released each year is data; the lyrics and music within them are creative. A list of word tokens sorted by artist, genre, release date and song length is data, and a list of the top-100 selling albums by year is data. This principle comes from the important Feist Publications v. Rural Telephone Service case:
“Facts, whether alone or as part of a compilation, are not original and therefore may not be copyrighted. A factual compilation is eligible for copyright if it features an original selection or arrangement of facts, but the copyright is limited to the particular selection or arrangement. In no event may copyright extend to the facts themselves.” (Sandra Day O’Connor, for the Supreme Court)
“Collections of facts are not copyrightable per se … A compilation, like any other work, is copyrightable only if it satisfies the originality requirement (‘an original work of authorship’). Facts are never original, so the compilation author can claim originality, if at all, only in the way the facts are presented. The facts must be selected, coordinated, or arranged ‘in such a way’ as to render the work as a whole original.” (Sandra Day O’Connor, for the Supreme Court)
A presentation of data can be creative — you can’t xerox the blue book and hand that out. However, a conversion of otherwise unrestricted data into your own creative presentation satisfies this restriction. So would a presentation (original or converted) that did not arise from a creative act — you couldn’t claim copyright on a .CSV file of some dataset.
Besides “presentation” and a couple of edge cases (“hot news”, “selection and arrangement”), the main restriction to be aware of is “Terms of Service”. If you have to agree to terms of service that restrict the data, but you take it anyway, you can be guilty of trespass. My understanding there is that if a) the site can be accessed by robot (no person clicks anything) AND b) there is no robots.txt, they shouldn’t be able to sustain a claim that it’s a restricted resource.
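For the robots.txt half of that test, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The sample robots.txt text, the `infochimp-bot` user-agent name, the `example.com` URLs and the helper name `robot_may_fetch` are all made up for illustration. It only reports what a robots.txt file says about a URL; it says nothing about any terms of service you may have agreed to.

```python
from urllib.robotparser import RobotFileParser


def robot_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the text directly so the sketch runs without network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: everything is open except /private/.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/data/cars.csv",
        "https://example.com/private/prices.csv",
    ):
        allowed = robot_may_fetch(SAMPLE_ROBOTS, "infochimp-bot", url)
        print(url, "allowed" if allowed else "disallowed")
```

With the sample rules, the first URL is reported as allowed and the second as disallowed. An empty robots.txt permits everything, which is the closest this check gets to the "no robots.txt" case. Against a live site you would instead point the parser at the real file with `set_url()` and `read()`.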
I personally go by balancing two principles:
- It’s our world, and we deserve access to the information that describes it. Besides our legal rights, we have an even stronger moral claim to the chronicle of our collective story. And we all stand to benefit: there have to be incentives to gather and organize data, but the modest benefits of making a data provider a lot richer don’t stand against the much larger marginal benefit of making the world a tiny bit smarter.
- Be a good neighbor. A lot of work goes into gathering, processing, verifying and distributing an interesting dataset. If we infochimps run around ignoring people’s requests for modest usage conditions, we’ll have a bit more open data and a lot more pissed-off ex-kindred souls who feel like we stole their cake. Inevitably, this will mean that people won’t put data online for public access at all.
The best approach is:
- Scrupulously credit contributions, and make clear that contributors’ efforts are recognized and that we’ll link back to them for their ultimate benefit.
- Clearly state the usage restrictions requested by the contributor, adhere to them, and ask that recipients of the data do the same.
- Make clear the benefits to the world for making this data available.
- Make clear the benefits to the contributor: this data will, for free, be enhanced with metadata, converted for use by diverse tools, interlinked with other rich datasets, and power interesting projects. If your mission statement is “build reliable and exciting cars” or “make powerful music”, then your mission statement isn’t “explore and explain unexpected correlations among disparate rich information pools”. Let someone else do it for you, and let them build the tools to do so around your data. Consider how much baseball has benefited from its statistical revolution, fed by its incredibly rich ecosystem of open data.
- Finally, as far as scientific or government-prepared data that’s otherwise rights-free: gloves off, we’re taking that data. If you’re a researcher, and you’re not openly sharing your data, you’re not only a bad scientist but also a bad person. Ditto for data collected at taxpayer expense.