OCR

Steve Masover's picture

Legal scholars mining millions of bankruptcy case pages

Professors Ken Ayotte (Berkeley Law) and Jared Ellias (UC Hastings School of Law)

Large corporate bankruptcy cases don’t easily lend themselves to empirical research, according to UC Berkeley Law Professor Ken Ayotte, because “sample sizes are small, and the financial data that’s available on the company leading up to bankruptcy is usually sparse and unreliable. We know when the company files, we have some basic background information about it, and we see whether the company reorganizes or liquidates at the end of the case, but we know very little about what happens during the case to drive those outcomes.”

Quinn Dombrowski's picture

Success stories from BRC’s undergraduate ops internship program

A new team of undergraduate interns joined the Berkeley Research Computing (BRC) team this fall and quickly became an indispensable part of Savio cluster operations. The BRC Ops Intern Program was conceived as a pilot for a BRC undergraduate fellowship program, currently in development, that will provide UC Berkeley students with hands-on experience in HPC system operations, user support, and software engineering.

Digital Humanist aims to run OCR over a terabyte of rare book scans

Adam Anderson, Mellon Postdoctoral Fellow in the Digital Humanities

Since his college days at Brigham Young University (BYU), Adam Anderson has been measuring evenings and weekends in pages, rather than hours. “You can scan about 400 pages an hour, once you get in the groove,” he explains. Anderson, a Mellon Postdoctoral Fellow in Digital Humanities at UC Berkeley, has spent his career scanning texts in order to draw upon secondary literature in archaeology and computational linguistics.

Quinn Dombrowski's picture

Go from Analog to Digital Texts with OCR

An early modern text (English)

A collection of digitized texts marks the start of a research project. Or does it?

For many social science and humanities researchers, turning heaps of paper in archival boxes, or books painstakingly sourced from overlooked corners of the library, into searchable, editable, machine-readable digital texts can be a tedious, time-consuming process.