Implemented Hygiene tests:

No Untyped references

Any reference to a URI in any context other than as the object of an annotation property must have a type triple for that URI. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0001.sparql 

Crossing domains / ranges

If one property is a sub-property of another, then the domain (respectively range) of the super-property should not be a subclass of the domain (respectively range) of the sub-property. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0002.sparql

 https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0003.sparql

Labels and Definitions

Every Class and Property defined in FIBO must have an rdfs:label and a skos:definition https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0004.sparql

Ontology Metadata

Every Ontology defined in FIBO must have an rdfs:label, sm:copyright, dct:license, and dct:abstract. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0005.sparql

String printability

Text should not use special characters https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0114.sparql

References to owl:Thing

We should not make explicit references to owl:Thing https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene0268.sparql

Unique Labels

Labels should be unique across FIBO for classes and properties. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1067.sparql
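
One way such a check could be written is sketched below. It is not the contents of the linked test file. It assumes labels are compared as exact literals (so the same text with different language tags counts as different labels), and the variable names are illustrative.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

# Sketch only: list every label shared by more than one named class or property.
SELECT ?label (COUNT(DISTINCT ?resource) AS ?uses)
WHERE {
  VALUES ?type { owl:Class owl:ObjectProperty owl:DatatypeProperty }
  ?resource a ?type ;
            rdfs:label ?label .
  FILTER (isIRI(?resource))   # skip anonymous classes and restrictions
}
GROUP BY ?label
HAVING (COUNT(DISTINCT ?resource) > 1)
ORDER BY ?label
```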

Multiple Inverses

Object properties shouldn't have more than one inverse. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1078.sparql

Annotation vocabulary

rdfs:comment shouldn't be used for FIBO annotation. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1079.sparql

Equivalent Named Classes

Equivalent classes may indicate polysemy spread across multiple classes. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1103.sparql

Disjunctive definitions

Definitions should not contain the "or" connective. https://github.com/edmcouncil/fibo/blob/master/etc/testing/hygiene/testHygiene1127.sparql
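
A rough sketch of this kind of check follows. It is not the contents of the linked test file. It assumes definitions are held in skos:definition and treats "or" as a connective only when it stands alone as a word, so legitimate uses (for example inside a quoted phrase) will still be reported.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Sketch only: flag definitions that contain "or" as a separate word.
SELECT ?resource ?definition
WHERE {
  ?resource skos:definition ?definition .
  # "or" must be bounded by a non-letter (or the string edge) on both sides
  FILTER (REGEX(STR(?definition), "(^|[^A-Za-z])or([^A-Za-z]|$)", "i"))
}
ORDER BY ?resource
```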






Proposed Hygiene tests:

Definition Format

 https://wiki.edmcouncil.org/display/FLT/Policy+for+Naming+Conventions 
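
A possible first check, following the rule of thumb raised in the comments on this page (first character a lower-case letter, no terminal period), is sketched here. It assumes definitions are recorded with skos:definition; the variable names are illustrative. Definitions that legitimately begin with a proper noun will be reported as false positives.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Sketch only: report definitions whose first character is not a
# lower-case letter, or whose last non-blank character is a period.
SELECT ?resource ?definition
WHERE {
  ?resource skos:definition ?definition .
  FILTER (
    !REGEX(STR(?definition), "^[a-z]") ||
    REGEX(STR(?definition), "\\.\\s*$")
  )
}
ORDER BY ?resource
```

Running this against the released ontologies first would give a list of existing offenders to raise as issues before the check is enforced on new content.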


No property may have more than one inverse

SPARQL: 

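PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Report pairs of distinct properties declared as inverses of the same property.
SELECT ?p1 ?p2 ?p
WHERE { ?p1 owl:inverseOf ?p .
        ?p2 owl:inverseOf ?p .
        FILTER (?p1 != ?p2) }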

Annotations Conventions

Do not use rdfs:comment for anything. Here is a SPARQL query to catch any uses. I found only two, so this is no big deal.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT ?Resource ?Type ?Comment
WHERE { ?Resource rdf:type ?Type .
        ?Resource rdfs:comment ?Comment .
        FILTER (?Type IN (owl:Class, owl:DatatypeProperty, owl:ObjectProperty))
        # ignore things in the owl namespace
        FILTER (!STRSTARTS(STR(?Resource), "http://www.w3.org/2002/07/owl#"))
        # ignore things in the skos namespace
        FILTER (!STRSTARTS(STR(?Resource), "http://www.w3.org/2004/02/skos/core#"))
}
ORDER BY ?Type ?Resource

No Unused Imports

Do not import an ontology unless something in it is explicitly referenced.

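Here is a SPARQL sketch. It assumes the ontologies were loaded from N-Quads with each ontology in its own named graph; the graph selection has not been checked against a real store. An import is reported only when none of its elements appears as the object of a triple in the importing graph.

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?Importer ?UnusedImported
WHERE {
    # select the named graph containing the triple defining the importing ontology
    GRAPH ?g { ?Importer a owl:Ontology ;
                         owl:imports ?UnusedImported . }
    FILTER NOT EXISTS {
        # some element defined by the imported ontology ...
        GRAPH ?h { ?ImportedElement rdfs:isDefinedBy ?UnusedImported . }
        # ... is the object of a triple in the importing graph
        GRAPH ?g { ?x ?p ?ImportedElement . }
    }
}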

No Unsatisfied References

Any resource referenced should be explicitly declared.

Here is some SPARQL which should be run without inferencing.

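PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

# Report objects of object-property assertions that have no type triple.
SELECT ?source ?property ?ref
WHERE { ?property a owl:ObjectProperty .
        ?source ?property ?ref .
        FILTER NOT EXISTS { ?ref rdf:type ?t }
}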
              

 


    






8 Comments

  1. Any use of Alt Label will not be accepted.  We will use synonyms as a sub annotation of Alt Lab.  This is our policy.

  2. See Policy: Naming Conventions for additional naming conventions and an alternative policy for definitions.  The policy cited above, which recommends full sentences, is not actually recommended.  The rule of thumb is that definitions should be able to be used to replace the concept in a sentence.  One cannot do this if the definitions are long, or are complete sentences.  That's what we have explanatory and other notes for.

    1. This sounds right to me.  What about capital letters and terminal period? 

      I am thinking of having a filter that at least checks that the first character is lower-case alpha, and the final character is not a period.  How does that sound? 


  3. I'll add more to the page - there should not be an initial capital unless the first word is a proper noun, so I'm not sure how you would test for that, but maybe if the first word is "A" or "The" you could do something there, and there should be no period at the end.

    One challenge is that there are so many of these that need to be fixed ... including in released ontologies, that we might find a mess if we try to apply this when people upload content to GitHub at the moment ...

    Maybe a first step would be to write a script to find them all in released ontologies so we can raise issues?

    1. Yes, I noticed that; I just ran that query (find all the ones that don't begin with "a " or "the "). There are a lot (some start with numbers!).

      Also, there are a surprising number that have no label or no definition - even in Release!

      1. what about "any " as a valid start word - we have lots of those?

  4. See my comments on the policy page.

    My concern is that we focus too much on the form and not the content of definitions. It's far more important to me that the definitions provide adequate business meaning, and the use of short, incomplete sentences tends to militate against that.

  5. An additional hygiene test for concepts is to check for duplicate labels