Thursday, August 7, 2008 - 10:00 AM

SYMP 18-9: Swimming in a sea of data: Validating large datasets collected by citizen scientists

David Bonter and Caren B. Cooper. Cornell Lab of Ornithology

Background/Question/Methods

Continental-scale citizen science programs rely on dispersed networks of volunteers to gather information. Interactions between citizen scientists and principal investigators may be limited by the large spatial scale and the number of participants involved. Overseeing data collection and ensuring data integrity are therefore major challenges for citizen science programs. A novel approach to reviewing data through an automated, geographically explicit set of data filters has been developed for the Cornell Lab of Ornithology’s Project FeederWatch. The project seeks to monitor the distribution and abundance of birds in winter across the United States and Canada. More than 14,000 participants submit more than 100,000 checklists to the FeederWatch database annually, reporting more than 5 million bird observations. With approximately 500 species of birds reported, the potential for species misidentifications and other errors is substantial. The review system flags unusual observations and potential errors and requests confirmation from the observer as the data are being entered. “Confirmed” reports are then forwarded to experts for further review.
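The review flow described above (regional filter, observer confirmation, expert review) might be sketched roughly as follows. This is an illustration only, not the FeederWatch implementation: the region codes, the species limits, the `FILTERS` table, and the function names are all hypothetical, and the abstract does not say how the actual filters are defined.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    region: str   # hypothetical region code, e.g. a state or province
    species: str
    count: int


# Hypothetical filter table: (region, species) -> largest count considered
# plausible. A species with no entry for a region is treated as unexpected there.
FILTERS: Dict[Tuple[str, str], int] = {
    ("NY", "Black-capped Chickadee"): 20,
    ("NY", "Northern Cardinal"): 15,
    ("AZ", "Gambel's Quail"): 40,
}


def flag(obs: Observation) -> Optional[str]:
    """Return a reason string if the observation is unusual, else None."""
    limit = FILTERS.get((obs.region, obs.species))
    if limit is None:
        return "species not expected in region"
    if obs.count > limit:
        return "count exceeds regional maximum"
    return None


def review_status(obs: Observation, observer_confirmed: bool) -> str:
    """Route an observation through the three stages named in the abstract."""
    if flag(obs) is None:
        return "accepted"
    if not observer_confirmed:
        return "returned to observer"
    return "forwarded to expert review"
```

In this sketch, an unflagged report is accepted immediately, a flagged report goes back to the observer at entry time, and only a flagged report that the observer confirms reaches an expert.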

Results/Conclusions

Between November 2007 and March 2008, reviewers requested supporting documentation on 296 reports and received a response from 70% of participants. The review process led to the dismissal of 74 reports, the correction of 28 reports, and the confirmation of 104 reports, including the documentation of two first state records. Unverified reports remain flagged and are excluded from data analyses and web-based data output. This real-time validation system often allows participants to provide supporting documentation while the bird in question is still present at the site. The system also allows researchers to identify volunteers who need support and to focus educational efforts accordingly, ultimately improving data quality and integrity.
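As a quick consistency check on the figures above, the three outcomes can be tallied against the number of requests. The tally assumes that the outcomes are mutually exclusive and that each corresponds to one report with a response, which the abstract does not state explicitly. The abstract also gives the response rate per participant rather than per report, so the comparison is only approximate.

```python
# Figures taken from the abstract.
requested = 296
dismissed, corrected, confirmed = 74, 28, 104

# Assumption: each report falls into exactly one outcome category.
resolved = dismissed + corrected + confirmed
unresolved = requested - resolved
resolved_share = resolved / requested

print(f"resolved: {resolved}, unresolved: {unresolved}, share: {resolved_share:.1%}")
# prints "resolved: 206, unresolved: 90, share: 69.6%"
```

Under that assumption, the share of resolved reports is close to the stated 70% response rate.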