Implemented Hygiene tests:
No Untyped references
Any reference to a URI in any context other than as the object of an annotation property must have a type triple for that URI. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0001.sparql
Crossing domains / ranges
If one property is a subproperty of another, then their domains (respectively, ranges) should not be subclasses in the opposite direction. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0002.sparql
https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0003.sparql
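As a rough illustration of the crossing-domains condition (a sketch only, not the contents of the linked test files), a query along these lines would flag a subproperty whose domain sits above its superproperty's domain. It assumes domains are asserted directly with rdfs:domain on named classes and that the query is run without inferencing; the variable names are illustrative. The range check would have the same shape with rdfs:range in place of rdfs:domain.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subProperty ?superProperty ?subDomain ?superDomain
WHERE {
  ?subProperty rdfs:subPropertyOf ?superProperty ;
               rdfs:domain ?subDomain .
  ?superProperty rdfs:domain ?superDomain .
  # "crossing": the superproperty's domain is a subclass of the subproperty's domain
  ?superDomain rdfs:subClassOf+ ?subDomain .
  # ignore the trivial case where both properties share the same domain
  FILTER (?subDomain != ?superDomain)
}
ORDER BY ?subProperty
```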
Labels and Definitions
Every Class and Property defined in FIBO must have an rdfs:label and a skos:definition https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0004.sparql
Ontology Metadata
Every Ontology defined in FIBO must have an rdfs:label, sm:copyright, dct:license, and dct:abstract https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0005.sparql
String printability
Text should not use special characters https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0114.sparql
References to owl:Thing
We should not make explicit references to owl:Thing https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0268.sparql
Unique Labels
Labels should be unique across FIBO for classes and properties. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1067.sparql
Multiple Inverses
Object properties shouldn't have more than one inverse. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1078.sparql
Annotation vocabulary
rdfs:comment shouldn't be used for FIBO annotation. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1079.sparql
Equivalent Named Classes
Equivalent classes may indicate polysemy spread across multiple classes https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1103.sparql
Disjunctive definitions
Definitions should not contain the "or" connective. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1127.sparql
Proposed Hygiene tests:
Definition Format
https://wiki.edmcouncil.org/display/FLT/Policy+for+Naming+Conventions
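One possible shape for this test, sketched from the discussion in the comments on this page (definitions should not start with a capital letter unless the first word is a proper noun, and should not end with a period). This is an assumption about what the test might check, not an agreed rule; it cannot recognize proper nouns, so some of the rows it returns would be legitimate definitions that need a manual look.

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?resource ?definition
WHERE {
  ?resource a ?type ;
            skos:definition ?definition .
  FILTER (?type IN (owl:Class, owl:ObjectProperty, owl:DatatypeProperty))
  # flag definitions that do not begin with a lower-case letter,
  # or that end with a period (optionally followed by whitespace)
  FILTER (!REGEX(STR(?definition), "^[a-z]")
          || REGEX(STR(?definition), "\\.\\s*$"))
}
ORDER BY ?resource
```

Running something like this over the released ontologies first would give the list of existing violations to raise as issues before the check is enforced on new content.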
No property may have more than one inverse
SPARQL:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?p1 ?p2 ?p
WHERE { ?p1 owl:inverseOf ?p .
        ?p2 owl:inverseOf ?p .
        FILTER (?p1 != ?p2) }
Annotations Conventions
Do not use rdfs:comment for anything. Here is a SPARQL query to catch any uses. I found two, so this is no big deal.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?Resource ?Type ?Comment
WHERE { ?Resource rdf:type ?Type .
        ?Resource rdfs:comment ?Comment .
        FILTER (?Type IN (owl:Class, owl:DatatypeProperty, owl:ObjectProperty))
        # ignore things in the owl namespace
        FILTER (!STRSTARTS(STR(?Resource), "http://www.w3.org/2002/07/owl#"))
        # ignore things in the skos namespace
        FILTER (!STRSTARTS(STR(?Resource), "http://www.w3.org/2004/02/skos/core#"))
}
ORDER BY ?Type ?Resource
No Unused Imports
Do not import an ontology unless something in it is explicitly referenced.
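Here is a SPARQL sketch that assumes the ontologies have been loaded as N-Quads, one named graph per ontology. The graph selection has not been checked against a triple store, and the patterns outside GRAPH assume the default graph is the union of the named graphs.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?Importer ?UnusedImported
WHERE { ?Importer a owl:Ontology .
        ?Importer owl:imports ?UnusedImported .
        # report the import only if no element defined by the imported ontology is referenced
        FILTER NOT EXISTS {
          ?ImportedElement rdfs:isDefinedBy ?UnusedImported .
          GRAPH ?g { ?Importer a owl:Ontology . # select the named graph containing the triple defining the ontology
                     ?x ?p ?ImportedElement . # check to see if an imported element is the object of a triple in this graph
          }
        }
}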
No Unsatisfied References
Any resource referenced should be explicitly declared.
Here is some SPARQL which should be run without inferencing.
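PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?source ?property ?ref
WHERE { ?property a owl:ObjectProperty .
        ?source ?property ?ref .
        FILTER NOT EXISTS { ?ref rdf:type ?t }
}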
8 Comments
Dennis Wisnosky
Any use of Alt Label will not be accepted. We will use synonyms as a sub annotation of Alt Label. This is our policy.
Elisa Kendall
See Policy: Naming Conventions for additional naming conventions and an alternative policy for definitions. The policy cited above, which recommends full sentences, is not actually recommended. The rule of thumb is that definitions should be able to be used to replace the concept in a sentence. One cannot do this if the definitions are long, or are complete sentences. That's what we have explanatory and other notes for.
Dean Allemang
This sounds right to me. What about capital letters and terminal period?
I am thinking of having a filter that at least checks that the first character is lower-case alpha, and the final character is not a period. How does that sound?
Elisa Kendall
I'll add more to the page - there should not be an initial capital unless the first word is a proper noun, so I'm not sure how you would test for that, but maybe if the first word is "A" or "The" you could do something there, and there should be no period at the end.
One challenge is that there are so many of these that need to be fixed ... including in released ontologies, that we might find a mess if we try to apply this when people upload content to GitHub at the moment ...
Maybe a first step would be to write a script to find them all in released ontologies so we can raise issues?
Dean Allemang
Yes, I noticed that; I just ran that query (find all the ones that don't begin with "a " or "the "). There are a lot (some start with numbers!).
Also, there are a surprising number that have no label or no definition - even in Release!
Pete Rivett
What about "any " as a valid start word? We have lots of those.
Pete Rivett
See my comments on the policy page.
My concern is that we focus too much on the form and not the content of definitions. It's far more important to me that the definitions provide adequate business meaning, and the use of short incomplete sentences tends to militate against that.
Dave Newman
An additional hygiene test for concepts is to check for duplicate labels