Sociology of Internet Data Sharing

The rise of the Internet allows widespread dissemination of scientific data, information, and knowledge. It provides an unprecedented opportunity for international collaboration. Yet significant sociological challenges remain, centering on issues of intellectual property and access to digital resources.

Under the traditional model of scientific progress and dissemination, a scientist or group of scientists conducts a study, typically by gathering primary data and analyzing the results, and then publishes the findings in a journal after a peer-review process that is usually anonymous. Scientists are often evaluated by the number of significant peer-reviewed papers they publish. Access to peer-reviewed journals is limited to those individuals and institutions that can afford subscriptions, though authors can distribute a limited number of reprints purchased from the publisher.

Since the late 1980s, this model has been changing. Information seekers now turn primarily to the World Wide Web (WWW) to access digital versions of journal articles, which are often accompanied by supplemental material and archived data. Because individuals can easily share electronic reprints, journals have difficulty controlling access to their products. Since not everyone can afford the high cost of subscriptions, open access journals have emerged that charge the authors of papers so that readers can have free access. The peer-review process is now expected to move faster, as it is often conducted entirely online. Some journals are experimenting with open peer review, in which a manuscript is placed online for public comment before acceptance.

The model for data dissemination has perhaps changed the most. It is now possible for entire data sets, and even live streams of data, to be made publicly available as soon as they are collected. Even so, most data are still not shared. Some of the barriers to data sharing are technological and financial: someone must invest time or money in the extra step of making data accessible online. Other significant barriers are sociological. The same technology that makes data easy to disseminate raises concerns that data are too easy to steal or to use without attribution. A reward system based solely on peer-reviewed journal articles does not reward the dissemination of the data sets on which those articles depend. If data are considered the intellectual property of the scientist, under what conditions would the scientist be willing to share them? Who should have access to them, and when?

There are calls for an open source approach to scientific data, inspired by the success of open source software. Open source software is developed and maintained with all of its source code publicly accessible online. Users of the software can publicly report problems and request new features. Anyone willing to fix those problems or develop the new features can then make the improved code available to others. Scientific data that are made easily accessible on the Internet could similarly be 'patched' or extended.

