Using Digitized Archaeological Literature as Big Data: Lessons from Using Open-Source Software to Text Mine Archaeological Site Numbers and Citation Information from JSTOR across the United States and Canada for the Digital Index of North American Archaeology (DINAA)

Summary

This is an abstract from the "SAA 2023: Individual Abstracts" session, at the 88th annual meeting of the Society for American Archaeology.

The Digital Index of North American Archaeology (DINAA) now contains citations to professional journal articles that mention specific archaeological sites, in tens of thousands of instances, across the United States and Canada. DINAA researchers have developed methods to identify Smithsonian Trinomial (USA) and Borden Grid (Canada) archaeological site numbers published in over a dozen archaeological journals made digitally available through JSTOR. Through Open Context’s linked open data infrastructure, and DINAA’s use of site numbers as unique identifiers linked to spatiotemporal scientific and cultural data, these publications can be queried in their entirety by concepts such as cultural horizons, site types, diagnostic artifacts, or calendar year boundaries. Since 2021, JSTOR has provided the Constellate analytics service, through which users may tailor Jupyter Notebooks for text mining. Instructions are provided for applying these same methods to identify structured site numbers in any digitized archaeological publications, including journal articles beyond JSTOR, books, cultural resource management reports, and gray literature materials. These developments make indexing and connecting archaeological literature across the continent an immediately attainable possibility; strategies are suggested to help the archaeological community invest the ethical and practical effort needed to ensure it is done appropriately and completely.
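As a rough illustration of what identifying structured site numbers in text can look like, the following Python sketch uses regular expressions. It is not the DINAA workflow, which the abstract does not detail. The patterns are simplified assumptions: a Smithsonian Trinomial is treated as a one- or two-digit state number, a two- or three-letter county code, and a site number, with optional hyphens; a Borden number is treated as four letters alternating upper and lower case, a hyphen, and a site number. The function name `find_site_numbers` and the sample identifiers are hypothetical and not drawn from DINAA data.

```python
import re

# Assumed, simplified pattern: state number, county code, site number,
# optionally separated by hyphens (e.g. "41TV1364" or "12-Mo-123").
TRINOMIAL = re.compile(r"\b(\d{1,2})-?([A-Z][A-Za-z]{1,2})-?(\d{1,5})\b")

# Assumed, simplified pattern: four letters alternating upper/lower case,
# a hyphen, then the site number (e.g. "DjRi-3").
BORDEN = re.compile(r"\b([A-Z][a-z][A-Z][a-z])-(\d+)\b")


def find_site_numbers(text):
    """Return (system, site_id) pairs in order of appearance in text."""
    hits = []
    for m in TRINOMIAL.finditer(text):
        state, county, number = m.groups()
        # Drop optional hyphens so variants share one spelling.
        hits.append((m.start(), "trinomial", f"{state}{county}{number}"))
    for m in BORDEN.finditer(text):
        grid, number = m.groups()
        hits.append((m.start(), "borden", f"{grid}-{number}"))
    return [(system, site_id) for _, system, site_id in sorted(hits)]


if __name__ == "__main__":
    sample = "Excavations at 41TV1364 and 12-Mo-123 are compared with DjRi-3."
    for system, site_id in find_site_numbers(sample):
        print(system, site_id)
```

Patterns this loose will produce false positives in real article text (for example, catalog numbers or figure labels), so any production use would likely need validation against known state and county codes or Borden grid ranges.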

Cite this Record

Using Digitized Archaeological Literature as Big Data: Lessons from Using Open-Source Software to Text Mine Archaeological Site Numbers and Citation Information from JSTOR across the United States and Canada for the Digital Index of North American Archaeology (DINAA). Joshua J. Wells, Mackenzie Edmonds, Eric Kansa, Sarah Kansa, David Anderson. Presented at The 88th Annual Meeting of the Society for American Archaeology. 2023 (tDAR id: 474939)

Spatial Coverage

min long: -168.574; min lat: 7.014; max long: -54.844; max lat: 74.683

Record Identifiers

Abstract Id(s): 37275.0