Including A Repository

For a data repository to be included in the POLDER Federated Search, the indexer has to know where the repository's data sets are and be able to retrieve metadata about them. The Federated Search App consumes JSON-LD metadata to make data sets searchable through its interface.

Broadly, if your data repository follows the POLDER Schema.org Best Practices (note: that document is still in progress), it is a good fit for the POLDER Federated Search. These Best Practices are based on the science-on-schema.org guidelines and are summarized below.

Metadata Fields

Most of the work of preparing a repository for the POLDER Federated Search lies in getting its metadata into a state the app can consume and search on. Keep in mind that you can only search on a field that exists: if you want people to be able to find your data set with, say, a date search, you have to attach temporal coverage information to it.
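To make the point concrete, here is a toy date-range filter. It is a hypothetical sketch, not the Federated Search App's actual search code, and it assumes temporal coverage is stored in schema.org's `temporalCoverage` property as a simple ISO 8601 `start/end` date interval. The record names and the `overlaps` and `date_search` helpers are illustrative only. A record that lacks the property can never match, however relevant it is.

```python
from datetime import date

# Hypothetical records; only the first one carries temporal coverage.
records = [
    {"name": "Sea ice thickness 2010-2012", "temporalCoverage": "2010-01-01/2012-12-31"},
    {"name": "Snow depth survey"},  # no temporalCoverage attached
]

def overlaps(coverage: str, start: date, end: date) -> bool:
    """Return True if a 'YYYY-MM-DD/YYYY-MM-DD' interval (or single date) overlaps [start, end]."""
    begin_text, _, end_text = coverage.partition("/")
    begin = date.fromisoformat(begin_text)
    finish = date.fromisoformat(end_text) if end_text else begin
    return begin <= end and finish >= start

def date_search(items, start: date, end: date):
    # A record without temporalCoverage is skipped: there is nothing to compare against.
    return [r["name"] for r in items
            if "temporalCoverage" in r and overlaps(r["temporalCoverage"], start, end)]

print(date_search(records, date(2011, 6, 1), date(2011, 8, 31)))
# ['Sea ice thickness 2010-2012']
```

The snow depth survey might well cover summer 2011, but without temporal coverage a date search has no way to know that.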

Required Metadata Fields

Identifier
Title (schema:name)
Description (can be any length, although Google Dataset Search requires it to be between 50 and 5000 characters)
Temporal coverage
Spatial coverage
Parameters/Variables
Citation
Creator/Author
Publisher
Licence
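A minimal sketch of a record that carries every required field, written as a Python dictionary and serialized to JSON-LD. The mapping from the field names above to schema.org properties (`identifier`, `temporalCoverage`, `spatialCoverage`, `variableMeasured`, `license`, and so on) follows common science-on-schema.org usage and is our assumption, not a POLDER specification. All values and URLs are placeholders, and `check_dataset` is a hypothetical pre-flight check, not part of the Federated Search App.

```python
import json

# Hypothetical placeholder record using schema.org property names.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/dataset/123",
    "identifier": "https://example.org/id/123",
    "name": "Sea ice thickness 2010-2012",  # Title (schema:name)
    "description": ("Placeholder description of sea ice thickness "
                    "measurements collected between 2010 and 2012."),
    "temporalCoverage": "2010-01-01/2012-12-31",
    "spatialCoverage": {"@type": "Place",
                        "geo": {"@type": "GeoShape", "box": "60.0 -140.0 70.0 -120.0"}},
    "variableMeasured": [{"@type": "PropertyValue", "name": "sea ice thickness", "unitText": "m"}],
    "citation": "Doe, J. (2013). Sea ice thickness 2010-2012. Example Data Centre.",
    "creator": {"@type": "Person", "name": "Jane Doe"},
    "publisher": {"@type": "Organization", "name": "Example Data Centre"},
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Assumed mapping of the required fields to schema.org properties.
REQUIRED = ["identifier", "name", "description", "temporalCoverage", "spatialCoverage",
            "variableMeasured", "citation", "creator", "publisher", "license"]

def check_dataset(record: dict) -> list:
    """Return a list of problems; an empty list means every required field is present."""
    problems = [f"missing {key}" for key in REQUIRED if not record.get(key)]
    desc = record.get("description", "")
    if desc and not 50 <= len(desc) <= 5000:
        problems.append("description outside Google Dataset Search's 50-5000 character range")
    return problems

print(check_dataset(dataset))  # []
jsonld = json.dumps(dataset, indent=2)  # the text a landing page would embed
```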

Optional Metadata Fields

SameAs: if you don’t like seeing duplicate search results, this is for you! It links to other records that describe the same data set.
Keywords are helpful for people doing text searches.
Version is not currently used, but in the future it could be used to display only the most recent version of a data set. See also: the science-on-schema.org (SOSO) provenance relationships guidelines.
Date Published
Distribution (i.e., how to get the data) is useful if the data can be obtained some way other than visiting the data set’s landing page (i.e., the sitemap URL that was indexed to get this data set’s metadata).
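The optional fields map onto the schema.org properties `sameAs`, `keywords`, `version`, `datePublished` and `distribution` (a `DataDownload` with `contentUrl` and `encodingFormat`). The sketch below adds them to a placeholder record and shows one toy way `sameAs` can collapse duplicates. All URLs are hypothetical, and `drop_duplicates` is an illustration of the idea, not how the POLDER app necessarily handles `sameAs`.

```python
import json

# Hypothetical placeholder values for the optional properties above.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/dataset/123",
    "name": "Sea ice thickness 2010-2012",
    # Other records describing the same data set (e.g. a copy held elsewhere).
    "sameAs": ["https://mirror.example.net/records/abc"],
    "keywords": ["sea ice", "Arctic", "thickness"],
    "version": "2",
    "datePublished": "2013-05-01",
    # A direct download, so users need not go through the landing page.
    "distribution": [{
        "@type": "DataDownload",
        "contentUrl": "https://example.org/files/sea-ice-2010-2012.csv",
        "encodingFormat": "text/csv",
    }],
}

def drop_duplicates(records):
    """Toy de-duplication: keep the first record of each group linked by @id/sameAs."""
    seen, kept = set(), []
    for r in records:
        ids = {r.get("@id"), *r.get("sameAs", [])} - {None}
        if ids & seen:
            continue  # some identifier was already seen, so treat it as a duplicate
        seen |= ids
        kept.append(r)
    return kept

mirror_copy = {"@id": "https://mirror.example.net/records/abc",
               "name": "Sea ice thickness 2010-2012"}
print(len(drop_duplicates([record, mirror_copy])))  # 1
print(json.dumps(record, indent=2))
```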

Other Requirements

Your metadata catalog should provide a sitemap so that harvesters such as Gleaner know which pages to retrieve metadata from. Alternatively, a robots.txt file that lists your sitemaps also works.
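A minimal sketch of both options using only the Python standard library: a sitemap in the sitemaps.org format listing each data set landing page, and a robots.txt that points to it with a `Sitemap:` line. The URLs and the `build_sitemap` helper are hypothetical, and this does not show Gleaner's own configuration; it only checks that a standard robots.txt parser picks up the sitemap entry.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(landing_pages):
    """Return sitemap XML with one <url><loc> entry per data set landing page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in landing_pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical landing pages, served for example at https://example.org/sitemap.xml
pages = ["https://example.org/dataset/1", "https://example.org/dataset/2"]
sitemap_xml = build_sitemap(pages)

# The robots.txt alternative: list the sitemap location(s) with "Sitemap:" lines.
robots_txt = "User-agent: *\nAllow: /\nSitemap: https://example.org/sitemap.xml\n"

# Check that a standard robots.txt parser (Python 3.8+) finds the sitemap entry.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.site_maps())  # ['https://example.org/sitemap.xml']
```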

Things That Are Nice to Have

Indexing is more reliable and faster if your metadata is included directly in the HTML of the data set landing page, rather than being injected by scripts after the page loads.
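The difference is easiest to see from the harvester's side. The sketch below assumes a harvester that reads the raw HTML without executing JavaScript (a harvester that does run scripts needs a headless browser, which is slower). `render_landing_page`, `JsonLdExtractor` and the `/inject-metadata.js` script name are hypothetical illustrations, not part of any POLDER tool.

```python
import html
import json
from html.parser import HTMLParser

def render_landing_page(record: dict) -> str:
    """Embed the JSON-LD directly in the server-rendered HTML."""
    payload = json.dumps(record).replace("</", "<\\/")  # keep '</script>' out of the payload
    return ("<!DOCTYPE html><html><head><title>" + html.escape(record["name"]) + "</title>"
            '<script type="application/ld+json">' + payload + "</script>"
            "</head><body></body></html>")

class JsonLdExtractor(HTMLParser):
    """Collect JSON-LD blocks present in the raw HTML; no JavaScript is run."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._chunks = None  # buffer while inside a JSON-LD <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._chunks is not None:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._chunks = None

def extract(page: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(page)
    parser.close()
    return parser.blocks

static_page = render_landing_page({"@context": "https://schema.org/", "@type": "Dataset",
                                   "name": "Sea ice thickness 2010-2012"})
# A page that only injects metadata after load looks empty to this harvester.
js_page = '<html><head><script src="/inject-metadata.js"></script></head></html>'

print(extract(static_page)[0]["name"])  # Sea ice thickness 2010-2012
print(extract(js_page))                 # []
```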

POLDER Federated Search Documentation