
Botany: The Yale Herbarium

Current Research in the Division of Botany

Rapid Digital Specimen Image and Data Capture:
A Web Service Solution

The proposed project offers a proof of concept and an initial implementation of “one-button” specimen imaging and data capture. Clicking the shutter on a digital camera would initiate a sequence that culminates in loading the label data and the specimen image into a structured collection database with minimal labor. Our ultimate goal is to reduce the total cost of digital collection data capture by significantly reducing both the human labor required and the total project duration. Significant gains can be achieved by developing appropriate protocols and methodologies, then packaging them as web services. Much of this can be accomplished by applying existing technology to data acquisition bottlenecks.

The technology required for digital image capture of specimens has become affordable, if not yet commonplace. Labor costs, rather than the cost of equipment, are likely the main impediment to making digital image capture ubiquitous. Most institutions reportedly do nothing more with digital images than make them available for web display. However, an image can serve an additional purpose as the basis for label data capture. Along with specimen images, label data, particularly georeferenced label data, is a valuable public product for collections.

Certain components of the technology we intend to implement derive from the computer vision and automated document processing domains and have been commercialized as off-the-shelf products. Our aim is to use open source or commercial solutions (and to develop solutions where necessary) that accelerate the herbarium specimen data capture process. Each of these solutions (other than camera operation) will be embedded into web services, providing benefits such as cross-platform interoperability and scalability.

Services will be designed and coded under an open source policy with future applicability in mind, so that distributed, mirrored services can provide redundancy, flexibility, and control for institutions wishing to operate some or all of their own services. Image capture stations could conceivably be set up at any number of institutions. On a larger scale, image capture operators could choose from among distributed, mirrored services for OCR, NHR, and NLP. Configuration options will allow service providers and clients to define the specifics of their own data pipeline.
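As a rough illustration of how a client might choose among mirrored services, the following Python sketch keeps an ordered list of mirrors for each pipeline stage and falls back to the next mirror when one is unavailable. Everything named here is an assumption made for illustration: the `PipelineConfig` class, the `choose_mirror` function, and the `example.org` endpoints are hypothetical and not part of the project. The availability check is passed in as a function so the sketch runs without any network access.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineConfig:
    # Stage name -> ordered list of mirror endpoints (all hypothetical URLs).
    mirrors: Dict[str, List[str]] = field(default_factory=dict)


def choose_mirror(config: PipelineConfig, stage: str,
                  is_available: Callable[[str], bool]) -> str:
    """Return the first mirror for `stage` that the availability check accepts."""
    for url in config.mirrors.get(stage, []):
        if is_available(url):
            return url
    raise LookupError(f"no available mirror for stage {stage!r}")


# Hypothetical client configuration: two OCR mirrors, one NLP service.
config = PipelineConfig(mirrors={
    "ocr": ["https://ocr.example.org/v1", "https://ocr-mirror.example.net/v1"],
    "nlp": ["https://nlp.example.org/v1"],
})

# Simulate the primary OCR service being offline.
offline = {"https://ocr.example.org/v1"}
selected = choose_mirror(config, "ocr", lambda url: url not in offline)
print(selected)
```

An institution running its own services could place its local endpoint first in the list and keep shared mirrors as fallbacks.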

Specific challenges in developing one-button herbarium specimen data capture include:

  1. Rapid image capture
  2. Web services development
  3. Image to text conversion of label data
  4. Text markup into data elements to simplify database loads
  5. Georeferencing

Our goal is to address these challenges as modular services that are mutually aware and configurable.
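The modular design described above can be sketched as a chain of stages, each taking a specimen record and returning an enriched copy. This is a minimal Python illustration under stated assumptions, not the project's implementation: the OCR stage is a placeholder that passes through pre-supplied text, the markup stage assumes a simple "Key: value" label layout, the georeferencing stage uses a one-entry lookup table with approximate coordinates, and the label text itself is invented sample data. All function and field names are hypothetical.

```python
import re
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

# Hypothetical gazetteer with approximate coordinates (latitude, longitude).
GAZETTEER = {"New Haven, Connecticut": (41.31, -72.93)}


def ocr_stage(record: Record) -> Record:
    # Placeholder: a real service would run OCR on the specimen image.
    # Here the label text is supplied directly so the sketch runs offline.
    return {**record, "text": record["label_text"]}


def markup_stage(record: Record) -> Record:
    # Assumes the first line is the scientific name and the remaining
    # lines follow a "Key: value" layout; real labels are far less regular.
    lines = [ln.strip() for ln in record["text"].splitlines() if ln.strip()]
    fields = {"scientific_name": lines[0]} if lines else {}
    for ln in lines[1:]:
        match = re.match(r"(\w+):\s*(.+)", ln)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return {**record, "fields": fields}


def georef_stage(record: Record) -> Record:
    # Look up the locality; coordinates stay None when it is unknown.
    locality = record.get("fields", {}).get("locality")
    return {**record, "coordinates": GAZETTEER.get(locality)}


def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    # Apply each configured stage in order.
    for stage in stages:
        record = stage(record)
    return record


# Invented sample label, for illustration only.
sample = {
    "image": "specimen_0001.jpg",
    "label_text": (
        "Quercus alba L.\n"
        "Locality: New Haven, Connecticut\n"
        "Collector: A. Botanist\n"
        "Date: 12 May 1901"
    ),
}
result = run_pipeline(sample, [ocr_stage, markup_stage, georef_stage])
print(result["fields"], result["coordinates"])
```

Because each stage shares the same signature, a client could reorder, omit, or replace stages (for example, swapping the placeholder OCR stage for a call to a remote service) without changing the rest of the chain.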

Principal Investigator Reed Beaman
Co-Principal Investigator Nico Cellinese
Co-Principal Investigator Michael Donoghue

Curatorial Affiliates

Reed S. Beaman
Lauren Brown
John Jay Engel
Leslie J. Mehrhoff
Norton G. Miller
Barbara M. Thiers

The honorary titles of Faculty Affiliate and Curatorial Affiliate are given to professionals who give some of their time and expertise to the Division. Affiliates are appointed by a vote of the Board of Curators and typically serve a 5-year renewable term.
