Category:ICT-Architecture


PLEASE NOTE that this is a high-level overview of the EPOS ICT architecture. More detailed documentation is available on the EPOS GitHub Technical Documentation page.


Organization

The EPOS architecture has been designed to organize and manage the interactions among different EPOS actors and assets. To make it possible for the EPOS enterprise to work as a single, but distributed, sustainable research infrastructure, its architecture takes into account technical, governance, legal and financial issues. Four complementary elements form the infrastructure:

  1. The National Research Infrastructures (NRIs) contribute to EPOS while being owned and managed at a national level and represent the basic EPOS data providers. These require significant economic resources, both in terms of construction and yearly operational costs, which are typically covered by national investments that must continue during EPOS implementation, construction and operation.
  2. The Thematic Core Services (TCS) enable integration across specific scientific communities. They represent a governance framework where data and services are provided and where each community discusses its implementation and sustainability strategies as well as legal and ethical issues.
  3. The Integrated Core Services (ICS) represent the e-infrastructure, consisting of services that will allow access to multidisciplinary resources provided by the NRIs and TCS. These will include data and data products as well as synthetic data from simulations, processing and visualization tools. The ICS will be composed of the ICS-Central Hub (ICS-C) and distributed computational resources, including processing and visualisation services (ICS-D). The ICS is the place where integration occurs.
  4. The Executive and Coordination Office (ECO) is the EPOS Headquarters and the legal seat (ERIC) of the distributed infrastructure governing the construction and operation of the ICS and coordinating the implementation of the TCS.

The European Research Infrastructure Consortium (ERIC) has been chosen by the Board of Governmental Representatives as the legal model for EPOS and is used in designing the Governance Model. This includes a General Assembly of members and an Executive Director, supported by a Coordination Office. A funding model has been designed that will support the sustainable construction and operation of the whole EPOS enterprise. The model includes complementary funding sources for each of the key EPOS elements.

Figure 1 describes the EPOS technical architecture organised in three layers.


Figure 1: EPOS technical architecture. The diagram shows the three layers in which the EPOS components (institutions and services) have been organized: the National Layer, the Community Layer and the Integration Layer, which also includes an Interoperability Layer.

The main concept is that the EPOS TCS data and services are provided to the ICS (see Fig. 1) by means of a communication layer called the interoperability layer, as shown in the functional architecture (Fig. 2). This layer contains all the technology needed to integrate data, data products, services and software (DDSS) from many scientific, thematic communities into the single integrated environment of the Integrated Core Services (ICS). The ICS represents the “core” of the whole e-infrastructure, and those responsible for its implementation will provide the specification of the “interoperability layer”. The ICS is conceptually a single, centralized facility, but in practice it is likely to be replicated (for resilience and performance) and localized for particular natural language groupings or legal jurisdictions.


Technical architecture

The ICS is made up of several modular, interoperable building blocks (Fig. 2). The technical architecture of EPOS adopts a three-layer structure. The first layer is the National Layer, where the National Research Infrastructures provide the DDSS. Data providers in this layer are independent national institutions or organizations with their own technical solutions, which may (or may not) follow international standards in providing data and data products to the community. The second layer, the Thematic Core Services (TCS), is the (European) Community Layer, where community standards are applied to the DDSS relevant to the specific thematic area of concern. The third (top) layer represents the integration of the DDSS that come from the TCSs, where high-level international standards are applied. At this level, metadata describing all DDSS need to be harmonized into a single metadata catalogue based on international standards.

During the Preparatory Phase of the EPOS Project, a European metadata catalogue standard, CERIF (Common European Research Information Format), was tested and used for the prototype development. It has subsequently been adopted in the EPOS Implementation Phase. For the DDSS from the various TCSs to be converted into the chosen metadata catalogue standard (i.e. CERIF), an additional layer is needed where TCS data sets are mapped and converted to CERIF. This is referred to as the “interoperability layer”. The various components of the ICS are now explained in more detail.


Figure 2: EPOS functional architecture, describing the technical functional components of EPOS. For each layer it specifies the ICT modules and their function. At the ICS layer it describes the design of the integrating e-infrastructure.



Metadata catalogue

Metadata describing the TCS DDSS are stored using the CERIF data model, which differs from most metadata standards in two ways: (1) it separates base entities from linking entities, thus providing a fully connected graph structure; (2) using the same syntax, it stores the semantics associated with attribute values both for base entities and (as the role of the relationship) for linking entities, which also store the temporal duration of the validity of the linkage. This provides great power and flexibility. CERIF also (as a superset) interoperates with widely adopted metadata formats such as DC (Dublin Core), DCAT (Data Catalog Vocabulary), CKAN (Comprehensive Knowledge Archive Network), INSPIRE (the EC version of ISO 19115 for geospatial data) and others.

The metadata catalogue will also manage the semantics, in order to provide the meaning of the attribute values. CERIF stores the semantics in a ‘semantic layer’ referenced from the syntactic layer, thus providing a single integrated semantic environment that is efficient because it uses standard IT (usually relational, but all other database/processing environments may be used). CERIF interoperates with OWL (Web Ontology Language), SKOS (Simple Knowledge Organization System) and other semantic representation languages.

Metadata from the communities will be mapped to the metadata catalogue in order to create appropriate links between common concepts in different disciplines. This process involves the harmonization and interoperability of the various DDSS from the different TCSs through dedicated software modules. It requires TCS APIs for converting DDSS to the TCS-specific metadata standard, and ICS APIs (wrappers) to map and store the result in the ICS metadata catalogue (i.e. CERIF). These TCS APIs and the corresponding ICS APIs collectively form the “interoperability layer”, which is the link between the TCSs and the ICS.
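
The separation between base entities and linking entities can be illustrated with a minimal Python sketch. It is not the CERIF schema: the class names, field names, identifiers and the "curator" role are hypothetical, chosen only to show how a link that carries a role and a validity period lets the catalogue answer "who was related to this dataset, in what role, on a given day".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BaseEntity:
    """A base entity such as a dataset or an organisation (hypothetical shape)."""
    entity_id: str
    kind: str
    name: str


@dataclass(frozen=True)
class LinkEntity:
    """A linking entity: relates two base entities with a role and a validity period."""
    source_id: str
    target_id: str
    role: str                   # a term that would come from the semantic layer
    start: date
    end: Optional[date] = None  # None means the link is still valid


def links_valid_on(links, entity_id, day):
    """Return the links touching entity_id whose validity period contains day."""
    return [
        link for link in links
        if entity_id in (link.source_id, link.target_id)
        and link.start <= day
        and (link.end is None or day <= link.end)
    ]


# Illustrative data only; none of these identifiers come from EPOS.
dataset = BaseEntity("ds-1", "dataset", "Example waveform archive")
org_a = BaseEntity("org-a", "organisation", "Institute A")
org_b = BaseEntity("org-b", "organisation", "Institute B")

links = [
    LinkEntity("org-a", "ds-1", "curator", date(2010, 1, 1), date(2015, 12, 31)),
    LinkEntity("org-b", "ds-1", "curator", date(2016, 1, 1)),
]

current = links_valid_on(links, "ds-1", date(2018, 6, 1))
print([link.source_id for link in current])
```

Because the role and the dates live on the link rather than on either base entity, a change of curator is recorded by closing one link and opening another, without modifying the dataset record itself.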


System Managing Software

The system managing software will manage the metadata catalogue and all other modules, such as the workflow engine and, more generally, all the resources needed to satisfy user requests.


Workflow engines and provenance

A key aspect of the semi-automatic composition of software to meet a user request is the provision of a workflow that links together the software services as they access the appropriate data. Many workflow engines are available, and each fits different use cases and architectures. We will take into account the computational models to be supported (i.e. those from the Computational Earth Science community) and the communities’ requirements. We anticipate that, following the experience gained by the EPOS partners in initiatives such as VERCE and ongoing work in RDA (Research Data Alliance), particular attention will be dedicated to cross-platform streaming libraries.

CERIF can also provide provenance information, since the linking entities associated with the role have both a start date/time and an end date/time as attributes. This handles versioning and, via the linking entity record, the relationship of one base entity instance (e.g. a dataset) to another. However, for comprehensive traceability of the processes and agents that contributed to the generation of a research product, we foresee the integration of the W3C PROV ontology into CERIF. This will guarantee interoperability with other institutional data archives, fostering data preservation and curation across domains.
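
As a rough illustration of a workflow that records provenance as it runs, the sketch below chains two processing steps and stores, for each step, what it used, what it generated and when it started and ended. It does not represent any particular workflow engine or a W3C PROV implementation; the record keys only loosely echo PROV vocabulary, and the function names and sample values are hypothetical.

```python
from datetime import datetime, timezone


def run_workflow(steps, data):
    """Run the steps in order, recording one provenance record per step."""
    provenance = []
    for name, func in steps:
        started = datetime.now(timezone.utc)
        result = func(data)
        ended = datetime.now(timezone.utc)
        provenance.append({
            "activity": name,       # which service or function ran
            "used": data,           # input of this step
            "generated": result,    # output of this step
            "started_at": started,
            "ended_at": ended,
        })
        data = result               # the output feeds the next step
    return data, provenance


def remove_mean(samples):
    """Subtract the mean from every sample."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def peak_amplitude(samples):
    """Return the largest absolute sample value."""
    return max(abs(s) for s in samples)


steps = [("remove_mean", remove_mean), ("peak_amplitude", peak_amplitude)]
result, provenance = run_workflow(steps, [1.0, 2.0, 6.0])

for record in provenance:
    print(record["activity"], "->", record["generated"])
```

Each record links an input to an output through a named activity with start and end times, which is the same kind of information the text describes as being held on CERIF linking entities.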

IAAA to data and computational resources (cloud, grid, HPC)

This module will manage and interoperate with all the major ‘common’ IAAA (Identification, Authentication, Authorisation, Accounting) services and standards from AAAI (Authentication, Authorisation, Accounting Infrastructure), such as SAML, OAuth, OpenID and X.509, with related products such as EduGAIN, Shibboleth, Kerberos and others, and with user directory services such as Microsoft Active Directory and LDAP. Addressing IAAA in a satisfactory way is a challenge at the present stage, one that is also being faced in other projects and initiatives following AAAI (e.g. AARC, EGI-Engage), with which EPOS is collaborating. The goal of this collaboration is to implement a smart IAAA mechanism that hides from the user all the complexity of delegation-based AAAI mechanisms.


ICS-D

As already described, Integrated Core Services – Distributed (ICS-D) will include services from external computing facilities. These will include HPC (High Performance Computing) machines for modelling and simulation, according to the requirements of the Computational Earth Science community, and HTC (High Throughput Computing) clusters for data-intensive applications such as data mining. The data workflow will be managed by EPOS ICS-C in order to provide the end user with appropriate computational services, even though the actual computations will be performed by ICS-D. Additional ICS-D services will provide visualization and processing capabilities. ICS-C will have to develop provisions for communicating with these external services in a seamless manner.
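
The division of labour between ICS-C and ICS-D can be pictured with a small Python sketch in which a central dispatcher chooses an external facility by job type. This is an assumption-laden illustration, not the EPOS design: the job types, the mapping table, the facility names and the selection rule (first registered facility of the right kind) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    """An external computing facility registered with the central hub."""
    name: str
    kind: str  # "HPC" or "HTC"


# Hypothetical mapping from job type to the kind of facility it needs.
JOB_KIND = {
    "simulation": "HPC",
    "modelling": "HPC",
    "data_mining": "HTC",
}


def select_facility(job_type, facilities):
    """Return the first registered facility of the kind the job type needs."""
    kind = JOB_KIND.get(job_type)
    if kind is None:
        raise ValueError(f"unknown job type: {job_type}")
    for facility in facilities:
        if facility.kind == kind:
            return facility
    raise LookupError(f"no {kind} facility registered")


# Illustrative registry; these site names are made up.
facilities = [Facility("hpc-site-1", "HPC"), Facility("htc-site-1", "HTC")]

print(select_facility("simulation", facilities).name)
print(select_facility("data_mining", facilities).name)
```

The user only names the kind of work to be done; which facility actually runs it stays a decision of the central component, in line with the seamless communication the text calls for.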


Web services / APIs

Wherever possible, EPOS-IP will use web services as the main vehicle for software services, defining APIs and implementing best practices for a sound microservices architecture, so that workflows can be composed semi-automatically. Web services and APIs, together with an appropriate mapping of metadata that will drive data convertors, will also be the driving technology for connecting the TCS with the ICS.
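
To make the idea of a metadata mapping that drives a data convertor more concrete, here is a minimal Python sketch. The field names on both sides are hypothetical and do not come from any TCS or from CERIF; the point is only that the convertor itself is generic and the mapping table carries the community-specific knowledge.

```python
# Hypothetical mapping: TCS-specific field name -> catalogue field name.
FIELD_MAP = {
    "station_code": "identifier",
    "network": "collection",
    "start": "valid_from",
}


def convert_record(tcs_record, field_map):
    """Rename mapped fields and collect unmapped ones so nothing is dropped silently."""
    converted, unmapped = {}, {}
    for key, value in tcs_record.items():
        if key in field_map:
            converted[field_map[key]] = value
        else:
            unmapped[key] = value
    return converted, unmapped


# Illustrative record only.
record = {
    "station_code": "ABC",
    "network": "XX",
    "start": "2010-01-01",
    "sensor": "broadband",
}

converted, unmapped = convert_record(record, FIELD_MAP)
print(converted)
print(unmapped)
```

Returning the unmapped fields separately is a design choice of this sketch: it makes gaps in a community's mapping visible instead of losing information during conversion.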

A detailed description of the whole EPOS e-infrastructure is outside the scope of this document. For further details, follow the link to the EPOS-ICT summary at: https://www.epos-ip.org/sites/default/files/repository/images/ICS-TCS-Integration-Guidelines-Level-2.pdf
