The euBusinessGraph project aims at simplifying cross-border and cross-lingual collection, reconciliation, aggregation and analysis of company-related information from several authoritative and non-authoritative sources.
The euBusinessGraph project has drawn on the experience of its data providers and technology providers to tackle the complex task of combining company data from multiple sources. We have defined a common semantic model (ontology) to represent companies and their attributes in a consistent way. The model is:
- Based on project needs and provider datasets
- Rooted in, and reusing, existing ontologies and datasets
- Expressed in the comprehensive EBG Semantic Model document
- Formalized as ebg: ontology using schema:(domain|range)Includes
- Also defines URL patterns and authorities/lookup lists to use
- Will be validated with RDF Shapes
Common Semantic Model (ontology)
We created an initial company data model considering related works, data available from the partners, and the needs of their business cases. The model covers the following requirements:
• Capture the concept of a company and represent different types of companies.
• Represent company jurisdictions and registration information.
• Capture company contact information, such as the address and other locations.
• Capture social data of companies, such as their websites (together with Web languages), RSS/Atom feeds and Wikipedia URLs.
• Indicate whether a company is publicly traded, whether it is state owned, and whether it is registered in a startup register.
• Support languages: EN, IT, NO.
In developing the company data model, we have reused terms from appropriate existing ontologies, such as:
- EU Core Vocabs: W3C Org, RegOrg, Location, Person (not W3C)
- schema.org: widely used, some relevant properties (e.g., dates)
- ADMS: datasets and identifiers
Figure 1: Towards a common semantic model for company data
Since none of the existing ontologies covers the complete scope we need, we reuse where possible, and extend and compose by:
- Adding some classes and properties of our own (ebg: ontology)
- Using schema:(domain|range)Includes instead of rdfs:(domain|range) for easier composition (polymorphic rather than monomorphic properties)
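The difference between the two styles can be illustrated with a minimal sketch in plain Python. It is not taken from the ebg: ontology: the `ex:` property and class names are hypothetical, triples are modelled as simple tuples, and the "check" for schema:domainIncludes is only one possible reading, since schema.org does not attach formal inference rules to it. Under RDFS, every declared rdfs:domain applies to the subject at once, whereas schema:domainIncludes lists classes of which the subject is expected to be one.

```python
# Illustrative sketch only: "ex:" names are hypothetical, not part of ebg:.
RDFS_DOMAIN = "rdfs:domain"
SCHEMA_DOMAIN_INCLUDES = "schema:domainIncludes"

schema_triples = [
    # Polymorphic style: the property may be used on either class.
    ("ex:foundingDate", SCHEMA_DOMAIN_INCLUDES, "ex:Company"),
    ("ex:foundingDate", SCHEMA_DOMAIN_INCLUDES, "ex:Register"),
    # Monomorphic style: both domains apply to every subject.
    ("ex:legalName", RDFS_DOMAIN, "ex:Company"),
    ("ex:legalName", RDFS_DOMAIN, "ex:Person"),
]


def declared_classes(prop, predicate, triples):
    """Collect the classes attached to a property via the given predicate."""
    return {o for s, p, o in triples if s == prop and p == predicate}


def rdfs_inferred_types(prop, triples):
    """rdfs:domain: a subject using prop is inferred to be an instance
    of every declared domain class."""
    return declared_classes(prop, RDFS_DOMAIN, triples)


def domain_includes_ok(prop, subject_types, triples):
    """schema:domainIncludes (assumed reading): the subject is expected
    to belong to at least one of the listed classes."""
    expected = declared_classes(prop, SCHEMA_DOMAIN_INCLUDES, triples)
    return not expected or bool(expected & set(subject_types))
```

With rdfs:domain, composing two ontologies that each declare a domain for the same property forces subjects into both classes, which is why the looser schema.org style composes more easily.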
In addition, we define RDF Shapes (SHACL and ShEx) to validate incoming data.
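To give a feel for what shape-based validation checks, the following plain-Python sketch tests cardinality constraints on incoming triples. It is not SHACL or ShEx and does not reproduce the project's actual shapes: the `COMPANY_SHAPE` constraints and all `ex:` names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical shape: property -> (min_count, max_count); None = unbounded.
COMPANY_SHAPE = {
    "ex:legalName": (1, 1),
    "ex:registeredAddress": (1, None),
    "ex:website": (0, None),
}


def validate(node, triples, shape):
    """Return a list of cardinality violations for one node."""
    counts = Counter(p for s, p, o in triples if s == node)
    violations = []
    for prop, (lo, hi) in shape.items():
        n = counts[prop]
        if n < lo:
            violations.append(f"{prop}: expected at least {lo}, found {n}")
        elif hi is not None and n > hi:
            violations.append(f"{prop}: expected at most {hi}, found {n}")
    return violations


# Example incoming data with two legal names and no registered address.
incoming = [
    ("ex:acme", "ex:legalName", "ACME AS"),
    ("ex:acme", "ex:legalName", "Acme Ltd"),
]
report = validate("ex:acme", incoming, COMPANY_SHAPE)
```

In a real pipeline the same kind of constraint would be declared in SHACL or ShEx and evaluated by a shapes processor rather than hand-written code.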
In its first release, the model focuses on capturing key company information present in official registers, such as legal name, registered address and economic classification, as well as information from online resources related to the company, such as company websites, blogs and social media accounts. These aspects are explicitly incorporated into the model and describe company information that is shared across data providers and directly accessible through the graph. Additionally, the model supports advertising other company-related information that is available directly from data providers.
Figure 2: Company data attributes that are covered by the model
We have performed a thorough analysis of identifiers in the context of euBusinessGraph. From the analysis of the different identifier systems and the requirements of the business cases of the project, we singled out key aspects about identifiers and addressed them in the common semantic model.
Achieving matching and reconciliation across jurisdictions and registers requires careful modelling of identifier use. This release models the different cases through properties that describe the lifecycle of each identifier issued and by encoding a series of characteristics of the identifier system to which the identifier belongs. We follow a pragmatic approach when describing identifier systems in terms of these characteristics.
We model the expectations placed on a particular system, which should help determine to what extent an identifier can be used for matching and reconciliation. Additionally, we model web resources that are frequently found for identifier systems, such as search endpoints, templates for building identifier URLs through which company information can be reached, and other resources that describe the system’s rules. Finally, the model supports the representation of the different agents that are in charge of setting and maintaining rules, issuing identifiers and publishing identifier databases.
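As a rough illustration of these aspects, the sketch below describes an identifier system as a small Python data structure with an issuing agent, one matching-related expectation and a URL template. The class, its field names and the `register.example` URL are hypothetical and do not correspond to the actual ebg: terms.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class IdentifierSystem:
    """Hypothetical description of an identifier system."""
    name: str
    issuer: str                          # agent that issues identifiers
    unique_within_system: bool           # expectation relevant for matching
    url_template: Optional[str] = None   # e.g. ".../company/{id}"

    def url_for(self, identifier: str) -> Optional[str]:
        """Build the URL at which company information can be reached."""
        if self.url_template is None:
            return None
        return self.url_template.replace("{id}", quote(identifier, safe=""))


def usable_for_matching(system: IdentifierSystem) -> bool:
    """Assumed rule: only identifiers expected to be unique are matched on."""
    return system.unique_within_system


example_system = IdentifierSystem(
    name="Example register",
    issuer="Example authority",
    unique_within_system=True,
    url_template="https://register.example/company/{id}",
)
```

Calling `example_system.url_for("123 456")` percent-encodes the identifier before substituting it into the template, so identifiers containing spaces or slashes still yield a valid URL.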
Figure 3: Identifier system attributes that are covered by the model
Further information
For further details about the euBusinessGraph ontology:
- Deliverable D2.1, which describes the first release of the Company Data Model at month 12.
- The EBG Semantic Model document, which is continuously updated and represents the latest version of the ontology.
- The GitHub repository, which contains the source files used in developing and maintaining the ontology.