Members of DigIn attended the joint conference of the Society for the Preservation of Natural History Collections (SPNHC) and Biodiversity Information Standards (TDWG), hosted in Okinawa, Japan, from September 2–6, 2024. The shared SPNHC and TDWG meeting aimed to encourage collaboration between the organizations and welcome new participants from across Asia. The only other joint conference was held in 2018 in Dunedin, New Zealand, and both organizations recognized the value of hosting such conferences periodically.
At the conference, Johanna Loacker from the California Academy of Sciences (CAS) and Dean Pentcheff, Regina Wetzer, and Vijay Barve from the Natural History Museum of Los Angeles County (NHMLA) delivered presentations. Their talks are listed below.
This conference was the 39th annual meeting of the Society for the Preservation of Natural History Collections (SPNHC), a global organization dedicated to preserving, conserving, and managing natural history collections. The society advocates for collections care within academia, government, and the public. Its members identify and organize specimens and their data, conduct research on specimens and best practices for collections, and train future collection specialists, all to ensure that natural history collections remain accessible for current and future generations.
Originally established as the Taxonomic Databases Working Group, Biodiversity Information Standards (TDWG) is a non-profit scientific and educational association committed to fostering international collaboration among those who create, manage, and use biodiversity information. TDWG provides a platform for members to discuss biodiversity information management and to develop and promote data standards that enhance knowledge sharing about the planet's biological heritage.
Invertebrate biodiversity data is often scarce, particularly in developing countries. Citizen science initiatives can play a crucial role in addressing this gap. This project demonstrates the effectiveness of using a social media platform, the Facebook group SpiderIndia, to collect spider observation data from citizen scientists.
Citizen scientists submitted photographs and details about spiders they encountered. Taxonomic experts then curated the data to ensure accuracy. The final dataset encompasses over 15,000 observations, providing valuable insights into spider diversity and distribution across India. This data is publicly available on GBIF, facilitating further research on Indian spider populations.
This project highlights the potential of citizen science through social media to enhance our understanding of invertebrate biodiversity. By engaging citizen scientists, we can generate substantial datasets that contribute significantly to scientific knowledge.
Marine invertebrate specimens preserved in museum collections provide an indispensable window on our planet’s biodiversity. Centuries of collecting have resulted in millions of specimen lots, each documented internally with label information. Exposing that information digitally is essential both for providing access to specimens and for direct use of the occurrence data. But capturing data from labels in small jars and tiny vials is so time consuming that it has taken a major NSF-funded initiative, the DigIn digitization program, to make data capture possible across multiple institutions in the U.S. The digitization effort at the Natural History Museum of Los Angeles County has developed a range of accelerated approaches to maximize the speed of primary data capture from these specimens, including dedicated workstations, custom-designed labelling, and dedicated software. The goal is rapid and accurate collection of skeletal specimen data records (taxon, collection date, collecting location, etc.). A key challenge stems from the heterogeneity of label data. Label content may be anything from fully written collecting data to just a pencilled expedition station number. Capturing data across all these label types while maximizing throughput and minimizing error has driven the design of an adaptable approach using direct data entry with real-time custom error checking and selective label photography. This framework has permitted us to capture primary data on marine specimens for hundreds of thousands of specimen lots at a rate of about 40 seconds per lot for data-only capture, and 2 minutes per lot for label-photo capture. Achieving this rate of digitization is essential to capturing and sharing this biodiversity information.
The DigIn digitization program is successfully capturing primary taxonomic and collection data from hundreds of thousands of lots of wet-preserved marine invertebrate specimens at the Natural History Museum of Los Angeles County. In most cases, the data captured include expeditionary station identifiers, and in some cases label photographs. The digitization effort has been designed to be as rapid and accurate as possible. But capturing a collecting station identifier or a label photograph is only the first part of the digitization process. In some cases, collection information must simply be transcribed from photographs of written or typed labels. In many cases, the collecting expedition must be identified and the station data must be acquired so that specimens can be linked to their collecting data, based on the expedition and station identifier. Station data are not digitally available for the majority of the relevant marine expeditions, so they have been located, scanned, and processed into standard collecting event and locality formats. We use a set of balancing criteria to determine whether it is worthwhile to digitize entire expeditionary datasets (depending on the number of relevant specimens and the difficulty of digitizing the dataset) or just hand-enter the few stations needed. For fully digitized expeditions, numerical and visualization tools are helping us identify and correct the inevitable instances of erroneous station data (whether those errors are in the initial data or were introduced in the scanning, OCR, and formatting procedures). In the end, the process of minimal primary data collection from specimen labels, followed either by transcription or linkage with expeditionary collection records, is yielding hundreds of thousands of digital specimen records published to the world at aggregators such as iDigBio, GBIF, OBIS, and Invert-E-Base.
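As a rough illustration of the linkage step described in this abstract, the sketch below joins minimal specimen records to station records on an expedition and station identifier pair, and sets aside specimens whose station cannot be found. It is not the NHMLA software; the field names (`expedition`, `station`, `latitude`, `longitude`, `date`) and the functions `normalize_key` and `link_specimens` are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
StationKey = Tuple[str, str]


def normalize_key(expedition: str, station: str) -> StationKey:
    """Build a case- and whitespace-insensitive lookup key (hypothetical rule)."""
    return (expedition.strip().lower(), station.strip().lower())


def link_specimens(
    specimens: List[Record], stations: List[Record]
) -> Tuple[List[Record], List[Record]]:
    """Attach station data to each specimen; return (linked, unmatched)."""
    # Index station records by (expedition, station) for constant-time lookup.
    index = {
        normalize_key(str(s["expedition"]), str(s["station"])): s
        for s in stations
    }
    linked: List[Record] = []
    unmatched: List[Record] = []
    for spec in specimens:
        key = normalize_key(str(spec["expedition"]), str(spec["station"]))
        event = index.get(key)
        if event is None:
            # No station record yet: hold for hand entry or later digitization.
            unmatched.append(spec)
        else:
            # Copy the collecting-event fields onto a new specimen record.
            linked.append(
                {
                    **spec,
                    "latitude": event["latitude"],
                    "longitude": event["longitude"],
                    "date": event["date"],
                }
            )
    return linked, unmatched
```

The unmatched list corresponds to the specimens whose expeditions still need station data located or hand-entered; counting them per expedition is one simple input to the decision about whether a full expeditionary dataset is worth digitizing.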
The California Academy of Sciences Invertebrate Zoology (CASIZ) collection is one of 19 collaborators of the Documenting Marine Biodiversity Through Digitization of Invertebrate Collections (DigIn) TCN. This initiative targets underrepresented non-molluscan marine invertebrate specimens within US natural history collections to broaden access to diverse historical marine biodiversity collection data. To accomplish our target of digitizing 60,000 previously uncataloged lots across numerous phyla, the CASIZ curatorial staff developed comprehensive digitization workflows incorporating label scanning and direct data entry, leveraging the assistance of volunteers to efficiently capture specimen information from fluid and dry collections.
Natural history collections are often understaffed and rely heavily on volunteer labor to support curatorial duties. Determining the effectiveness and efficiency of various digitization workflows therefore provides valuable insights for optimizing resource allocation and more rapidly enhancing access to collections data.
In this presentation, we'll highlight CASIZ's efficiency gains from DigIn digitization workflows, showcasing how focused strategies and volunteer engagement accelerated progress toward our goals. By sharing our experiences, we aim to contribute practical insights to a broader discourse on natural history collection digitization, offering guidance for similar initiatives.