Congratulations to Ícaro Alzuru, ACIS Lab’s most recent PhD graduate, who defended his dissertation, “Human-Machine Extraction of Information from Biological Collections,” on Monday, March 30th, at 8:30 am. This dissertation is part of the HuMaIN project: http://humain.acis.ufl.edu/.
Ícaro is a doctoral candidate in the Department of Computer & Information Science & Engineering (CISE) at the University of Florida. He is currently a Graduate Research Assistant under the guidance of Dr. José A. B. Fortes. His research interests lie in Data Science and Big Data.
Dissertation: Human-Machine Extraction of Information from Biological Collections
At present, the digitization of biological collections, including the extraction of information from images of labels, is done entirely by humans (e.g., crowdsourcing volunteers and experts). This dissertation proposes hybrid human-machine information extraction methods that make the transcription of Darwin Core terms from specimen images more efficient.
The dissertation proposes accepting Darwin Core values generated by automated processes when the confidence that they are correct, itself estimated by automated methods, is high. Images whose extracted Darwin Core values have low confidence are transcribed by humans. The dissertation presents several methods of automated information extraction and confidence estimation.
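As a rough illustration of the accept-or-defer idea described above, the routing step might be sketched as follows. This is a minimal sketch, not the dissertation's implementation: the `Extraction` record, the `route` function, and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """One automatically extracted value (hypothetical record layout)."""
    image_id: str
    term: str          # Darwin Core term, e.g. "scientificName"
    value: str
    confidence: float  # estimated probability that the value is correct, in [0, 1]


def route(extractions, threshold=0.9):
    """Split extractions into auto-accepted values and values sent to humans.

    The threshold is an assumed parameter; in practice it would be tuned
    against the accuracy the collection requires.
    """
    accepted, needs_human = [], []
    for e in extractions:
        if e.confidence >= threshold:
            accepted.append(e)     # high confidence: keep the machine value
        else:
            needs_human.append(e)  # low confidence: queue for human transcription
    return accepted, needs_human


if __name__ == "__main__":
    batch = [
        Extraction("img1", "scientificName", "Quercus alba", 0.97),
        Extraction("img2", "recordedBy", "J. Smith", 0.42),
    ]
    accepted, needs_human = route(batch)
    print(len(accepted), "accepted;", len(needs_human), "sent to humans")
```

Raising the threshold sends more images to human transcribers and lowers the risk of accepting a wrong value; lowering it does the opposite.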
The experimental results suggest that applying the hybrid information extraction methods proposed by this research would reduce by 75% the number of crowdsourcing sessions required to extract Darwin Core terms from specimen images.