Harvesting Zenodo Metadata with OAI-PMH (DataCite v3)
Zenodio’s zenodio.harvest module provides a Pythonic interface to record metadata in a Zenodo community collection.
Zenodio uses the standard OAI-PMH harvesting protocol and specifically retrieves DataCite v3-flavored XML, chosen as the richest and most recent metadata format that Zenodo offered.
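To make the protocol concrete, here is a minimal sketch of the kind of OAI-PMH request that a harvest involves. It is not Zenodio's own code. The endpoint URL, the oai_datacite3 metadata prefix, and the user-<community> set naming are assumptions about Zenodo's OAI-PMH service, and build_list_records_url is a hypothetical helper name, so check them against Zenodo's current documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed values; confirm against Zenodo's OAI-PMH documentation.
ZENODO_OAI_ENDPOINT = 'https://zenodo.org/oai2d'
DATACITE3_PREFIX = 'oai_datacite3'


def build_list_records_url(community_name):
    """Return an OAI-PMH ListRecords URL for one Zenodo community.

    Hypothetical helper for illustration; not part of zenodio.harvest.
    """
    params = [
        ('verb', 'ListRecords'),               # standard OAI-PMH verb
        ('metadataPrefix', DATACITE3_PREFIX),  # request DataCite v3 XML
        ('set', 'user-' + community_name),     # assumed community set name
    ]
    return ZENODO_OAI_ENDPOINT + '?' + urlencode(params)


url = build_list_records_url('lsst-dm')
print(url)
```

Fetching that URL would return XML with one record element per item in the community, which is the raw material the classes below wrap.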
Quick Start
To show quickly how harvesting metadata from Zenodo works, we’ll get records from LSST Data Management’s lsst-dm Community.
You begin by providing the community’s identifier to zenodio.harvest.harvest_collection():
from zenodio.harvest import harvest_collection
collection = harvest_collection('lsst-dm')
collection is a zenodio.harvest.Datacite3Collection instance for the Zenodo community’s record collection.
Use its records() method to generate Datacite3Record instances for each record stored in the Zenodo community:
for record in collection.records():
    print(record.title)
Or you can get a list of all records:
records = list(collection.records())
With these Datacite3Record instances you can access information about individual artifacts on Zenodo through simple class attributes.
For example:
record = records[0]
print(record.title)
print(record.issue_date)
print(record.doi)
print(record.abstract_html)
For information about authors, Zenodio provides an Author class.
For example:
authors = record.authors
print(','.join([a.last_name for a in authors]))
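As a sketch of how these attributes combine, the hypothetical helper below formats a one-line citation from any object that exposes title, issue_date, doi, and authors as described above. The stand-in record uses made-up placeholder values so that the example runs without network access; with Zenodio you would pass a Datacite3Record instead.

```python
from datetime import datetime
from types import SimpleNamespace


def format_citation(record):
    """Build a one-line citation from a record-like object.

    Hypothetical helper for illustration; not part of zenodio.harvest.
    """
    names = ', '.join(author.last_name for author in record.authors)
    year = record.issue_date.year
    return '{0} ({1}). {2}. doi:{3}'.format(
        names, year, record.title, record.doi)


# Stand-in record with placeholder values (not real Zenodo metadata).
demo_record = SimpleNamespace(
    title='Example Technical Note',
    issue_date=datetime(2016, 1, 15),
    doi='10.5281/zenodo.0000000',
    authors=[SimpleNamespace(last_name='Doe'),
             SimpleNamespace(last_name='Roe')],
)

citation = format_citation(demo_record)
print(citation)
```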
API Reference
Convenience Functions
zenodio.harvest.harvest_collection(community_name)
Harvest a Zenodo community’s record metadata.
Examples
You can harvest record metadata for a Zenodo community via its identifier name. For example, the identifier for LSST Data Management’s Zenodo collection is 'lsst-dm':
>>> from zenodio.harvest import harvest_collection
>>> collection = harvest_collection('lsst-dm')
collection is a Datacite3Collection instance. Use its records() method to generate Datacite3Record objects for individual records in the Zenodo collection.
Parameters: community_name (str) – Zenodo community identifier.
Returns: collection – The Datacite3Collection instance with record metadata downloaded from Zenodo.
Return type: zenodio.harvest.Datacite3Collection
Metadata Classes
class zenodio.harvest.Datacite3Collection(xml_records)
Zenodo metadata for a Community collection derived from DataCite v3 metadata.
Use the from_collection_xml() classmethod to build a Datacite3Collection from XML obtained from the Zenodo OAI-PMH API. Most users should instead call harvest_collection() to build a Datacite3Collection for a Community.
classmethod from_collection_xml(xml_content)
Build a Datacite3Collection from Datacite3-formatted XML.
Users should use zenodio.harvest.harvest_collection() to build a Datacite3Collection for a Community.
Parameters: xml_content (str) – Datacite3-formatted XML content.
Returns: collection – The collection parsed from Zenodo OAI-PMH XML content.
Return type: Datacite3Collection
records()
Yield records from the collection.
Yields: record (Datacite3Record) – The Datacite3Record for an individual resource in the Zenodo collection.
class zenodio.harvest.Datacite3Record(xml_dict)
Zenodo metadata for a single record.
Use Datacite3Record objects to access metadata about a record through convenient object properties.
Parameters: xml_dict (collections.OrderedDict) – A dict-like object mapping XML content for a single record (i.e., the contents of the record tag in OAI-PMH XML). This dict is typically generated from xmltodict.
abstract_html
Abstract text, marked up with HTML (str).
authors
List of Authors (zenodio.harvest.Author). Authors correspond to creators in the Datacite schema.
doi
Digital object identifier (str).
issue_date
Date when the DOI was issued (datetime.datetime).
title
Title of resource (str). If there are multiple titles, the first title is returned.
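To illustrate where these attributes come from, the sketch below pulls the same fields out of a small hand-written DataCite v3 fragment using only the standard library. It does not reproduce Zenodio's implementation, which works on dicts produced by xmltodict. The sample XML uses placeholder values, and the ISO date format for the Issued date is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace of the DataCite v3 metadata kernel.
NS = {'dc3': 'http://datacite.org/schema/kit-3'}

# Hand-written sample with placeholder values (not real Zenodo metadata).
SAMPLE_XML = """\
<resource xmlns="http://datacite.org/schema/kit-3">
  <identifier identifierType="DOI">10.5281/zenodo.0000000</identifier>
  <creators>
    <creator><creatorName>Doe, Jane</creatorName></creator>
  </creators>
  <titles>
    <title>Example Technical Note</title>
    <title titleType="AlternativeTitle">A Second Title</title>
  </titles>
  <dates>
    <date dateType="Issued">2016-01-15</date>
  </dates>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;Short abstract.&lt;/p&gt;</description>
  </descriptions>
</resource>
"""

resource = ET.fromstring(SAMPLE_XML)

doi = resource.findtext('dc3:identifier', namespaces=NS)
# findtext returns the first match, mirroring the "first title" rule.
title = resource.findtext('dc3:titles/dc3:title', namespaces=NS)
issued = resource.findtext(
    "dc3:dates/dc3:date[@dateType='Issued']", namespaces=NS)
issue_date = datetime.strptime(issued, '%Y-%m-%d')  # assumes ISO dates
abstract_html = resource.findtext(
    "dc3:descriptions/dc3:description[@descriptionType='Abstract']",
    namespaces=NS)
creator_names = [
    creator.findtext('dc3:creatorName', namespaces=NS)
    for creator in resource.findall('dc3:creators/dc3:creator', NS)
]

print(doi, title, issue_date.date(), creator_names)
```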
class zenodio.harvest.Author(last_first, orcid=None, affiliation=None)
Metadata about an author.
Author instances are typically built by Datacite3Record.authors.
Parameters:
- last_first (str) – Author’s name, formatted as ‘Last, First’.
- orcid (str, optional) – Author’s ORCiD.
- affiliation (str, optional) – Author’s affiliation.
last_first
str – Author’s name, formatted as ‘Last, First’.
orcid
str – Author’s ORCiD.
affiliation
str – Author’s affiliation.
first_name
Author’s first name (str).
classmethod from_xmldict(xml_dict)
Create an Author from DataCite v3 metadata converted by xmltodict.
Parameters: xml_dict (collections.OrderedDict) – A dict-like object mapping XML content for a single record (i.e., the contents of the record tag in OAI-PMH XML). This dict is typically generated from xmltodict.
last_name
Author’s last name (str).
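The first_name and last_name attributes relate to the ‘Last, First’ string in a straightforward way. The sketch below shows one plausible way to derive them. It is an assumption for illustration, not Zenodio's actual implementation, and split_last_first is a hypothetical name.

```python
def split_last_first(last_first):
    """Split a 'Last, First' name into (first_name, last_name).

    Hypothetical helper for illustration. A name without a comma is
    treated as a last name with an empty first name.
    """
    last, _, first = last_first.partition(',')
    return first.strip(), last.strip()


first_name, last_name = split_last_first('Doe, Jane')
print(first_name, last_name)
```

Splitting on the first comma only keeps any later commas, such as a suffix, inside the first-name part rather than raising an error.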