Frequently Asked Questions

What is the repository?

It is like a library for linguistic data and tools.

  • Search for data and tools and easily download them.
  • Deposit your data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).

What submissions do we accept?

We accept any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, MT systems, linguistic web services, etc. We do not strictly require you to upload the data itself, although uploading it is always preferable; if necessary, you can create a metadata-only record. We also support online license signing, so that restricted resources can be made available immediately.

When uploading language resources, please try to use one of the recommended formats mentioned in LRT Standards.

What is the PID (handle) good for?

It is a special permanent URL. It will keep resolving correctly even if the data is moved at some point in the distant future, so it should be used as the URL in citations.
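As a minimal sketch of how a handle relates to a citable URL, the snippet below turns a handle string into a link through the public Handle System proxy at hdl.handle.net. The function name and the handle `12345/EXAMPLE-1` are made up for illustration; use the PID shown on the item's page.

```python
# Public proxy of the Handle System; it redirects to the current location of the item.
HANDLE_PROXY = "https://hdl.handle.net/"

def handle_to_url(handle: str) -> str:
    """Return a resolvable URL for a handle given as 'prefix/suffix', 'hdl:...' or a proxy URL."""
    handle = handle.strip()
    for prefix in ("hdl:", "http://hdl.handle.net/", "https://hdl.handle.net/"):
        if handle.startswith(prefix):
            handle = handle[len(prefix):]
            break
    return HANDLE_PROXY + handle

# Hypothetical handle, for illustration only.
print(handle_to_url("hdl:12345/EXAMPLE-1"))
```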

How do I cite a submission?

See our policies.

How do I get the most out of my searches?

In contrast to other search engines, this one uses OR as the default operator; see the examples below, which clarify this. If you are not satisfied with the results of your searches, you may wish to go beyond plain-text searches: you can search only in certain fields, use negation, boost (give more weight to) some parts of the query, and more. The search engine is Solr, so use its syntax if you know it, or look it up in the Solr documentation.


PDT wordnet vs PDT AND wordnet
The default operator is OR; i.e. the first example searches for PDT OR wordnet in all text fields, while the second requires both terms.
dc.title:P?T && -dc.title:WordNet
Returns all items that have P?T in the title, where ? stands for any single character (e.g. PDT), and that do not have WordNet in the title.
dc.title:"Czech WordNet"
Use double quotes (") for exact matches and multiword expressions.
author:(Bojar && -Tamchyna) && (dc.language.iso:(ces AND eng) OR language:(czech AND english))
Searches for items by one author but not the other; only items covering both Czech and English are returned.
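
If you want to build such queries in a script, they need to be URL-encoded, because characters such as & and " have a special meaning in URLs. The following is a minimal sketch only: the endpoint `https://repository.example.org/discover` and the parameter name `query` are placeholders assumed for illustration, not the repository's documented interface.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; replace with the repository's real search URL.
SEARCH_URL = "https://repository.example.org/discover"

def quote_phrase(text: str) -> str:
    """Wrap text in double quotes for an exact match, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_search_url(query: str, base_url: str = SEARCH_URL) -> str:
    """URL-encode the query and append it as the assumed 'query' parameter."""
    return f"{base_url}?{urlencode({'query': query})}"

examples = [
    "PDT wordnet",                                # default operator: PDT OR wordnet
    "PDT AND wordnet",                            # both terms required
    "dc.title:P?T && -dc.title:WordNet",          # wildcard plus negation
    "dc.title:" + quote_phrase("Czech WordNet"),  # exact multiword match
]
for q in examples:
    print(build_search_url(q))
```

Building the fielded query as a plain string first and encoding it in one step keeps the Solr syntax readable and avoids encoding the operators by hand.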