E-lab architecture

The proposed e-lab comprises three layers. The e-science layer and the data mining layer together form a generic knowledge discovery platform, which can be adapted to different scientific domains by customizing the third layer, the application layer. The project's overall research strategy can be summarized as the bottom-up construction of this three-tiered architecture.

The foundation of the e-science layer is a suite of open-source components developed by the University of Manchester (e.g., the myGrid e-science platform and the Taverna workflow editor). To build the e-LICO infrastructure (figure below), these components will be extended with tools for content creation (e.g., semantic annotation, ontology engineering) as well as mechanisms for multiple levels and modes of collaboration in experimental research.

Figure: The e-science layer


The data mining layer is the core of e-LICO; it will provide a comprehensive set of data mining tools for multimedia data (structured records, text, images, signals). Standard tools will be complemented with preprocessing or learning algorithms developed specifically to address problems of data-intensive, knowledge-rich sciences, such as extremely high dimensionality and undersampling, learning from heterogeneous data, and incorporating prior knowledge into learning. Methodologically sound use of these tools will be ensured by a knowledge-driven, planner-based data mining assistant (WP6), which will rely on a data mining ontology and knowledge base to plan the data mining process and propose ranked workflows for a given application problem. Extensive e-lab monitoring facilities will support comparison and analysis of experiments by a meta-miner, whose role will be to ensure that the data mining assistant's workflow recommendations improve with experience.
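The interplay between the assistant's ranked proposals and the meta-miner's feedback can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and function names, the scoring scale, and the blending rule are hypothetical and are not taken from the e-LICO design. The idea shown is only that a prior score from the knowledge base is gradually outweighed by performance observed in recorded experiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Workflow:
    """A candidate workflow proposed by the (hypothetical) planner."""
    name: str
    prior_score: float  # assumed score derived from the DM knowledge base


@dataclass
class MetaMiner:
    """Keeps performance records gathered by e-lab monitoring (illustrative)."""
    history: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, workflow_name: str, performance: float) -> None:
        self.history.setdefault(workflow_name, []).append(performance)

    def expected_performance(self, wf: Workflow) -> float:
        runs = self.history.get(wf.name, [])
        if not runs:
            return wf.prior_score  # no experience yet: rely on the prior
        # Blend prior and observed mean; observations gain weight with more runs.
        weight = len(runs) / (len(runs) + 1)
        return (1 - weight) * wf.prior_score + weight * sum(runs) / len(runs)


def rank_workflows(candidates: List[Workflow], meta_miner: MetaMiner) -> List[Workflow]:
    """Return candidates ordered from most to least promising."""
    return sorted(candidates, key=meta_miner.expected_performance, reverse=True)


candidates = [
    Workflow("svm_with_feature_selection", prior_score=0.6),
    Workflow("naive_bayes", prior_score=0.7),
]
meta_miner = MetaMiner()
before = rank_workflows(candidates, meta_miner)

# Two monitored experiments favour the first workflow.
meta_miner.record("svm_with_feature_selection", 0.9)
meta_miner.record("svm_with_feature_selection", 0.9)
after = rank_workflows(candidates, meta_miner)
```

In this toy setting the ranking starts from the prior scores alone and changes once experimental evidence has been recorded, which is the sense in which recommendations would "improve with experience".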

The application layer is always domain-specific. In the generic e-lab, it is an empty shell to be filled in by the domain user, who will use the tools available in the e-science and data mining (DM) layers to:

1. customize the infrastructure to the needs of the domain, e.g., identify in the e-science layer all the services that the user team would like to access and use;

2. either access existing domain ontologies or create a domain ontology using the collaborative authoring tools provided in the e-science layer;

3. design, run and analyse data mining experiments using tools (algorithms, workflows, models, datasets) in the data mining layer;

4. semantically annotate DM experiments and input data using the semantic annotation tools in the e-science layer.
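The four steps above can be pictured as progressively filling an initially empty data structure. The sketch below is purely illustrative: the class names, method names, service names, and ontology terms are hypothetical placeholders rather than part of any e-LICO interface, and experiments are only recorded, not executed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Experiment:
    workflow: str
    dataset: str
    annotations: Set[str] = field(default_factory=set)


@dataclass
class ApplicationLayer:
    """An 'empty shell' that a domain user fills in (hypothetical model)."""
    domain: str
    services: Set[str] = field(default_factory=set)
    ontology: Optional[str] = None
    experiments: List[Experiment] = field(default_factory=list)

    def select_services(self, available: Set[str], wanted: Set[str]) -> None:
        # Step 1: keep only the wanted services that the e-science layer offers.
        self.services = available & wanted

    def set_ontology(self, name: str) -> None:
        # Step 2: reuse an existing domain ontology or register a newly authored one.
        self.ontology = name

    def add_experiment(self, workflow: str, dataset: str) -> Experiment:
        # Step 3: register a data mining experiment (execution is out of scope here).
        experiment = Experiment(workflow, dataset)
        self.experiments.append(experiment)
        return experiment

    def annotate(self, experiment: Experiment, terms: Set[str]) -> None:
        # Step 4: attach ontology terms; this needs an ontology to draw them from.
        if self.ontology is None:
            raise ValueError("choose or create a domain ontology before annotating")
        experiment.annotations |= terms


# All concrete names below are made-up examples.
shell = ApplicationLayer(domain="kidney and urinary pathways")
shell.select_services(
    available={"workflow_editor", "annotation_tool", "ontology_editor"},
    wanted={"workflow_editor", "annotation_tool", "image_viewer"},
)
shell.set_ontology("example_domain_ontology")
exp = shell.add_experiment(workflow="marker_ranking", dataset="example_dataset")
shell.annotate(exp, {"bladder", "biomarker"})
```

The ordering constraint in `annotate` reflects one plausible reading of the list, namely that semantic annotation presupposes a domain ontology; the source does not state this dependency explicitly.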

 

Figure: The data mining and application layers


In the e-LICO prototype, the application layer will be instantiated for a systems biology task: biomarker discovery and pathway modelling for diseases affecting the kidney and urinary pathways (KUP). Domain-specific knowledge sources, such as a specialized ontology and a database on kidney and urinary pathways, will be collaboratively authored by European specialists in the area. The data mining e-lab will be showcased on the discovery of molecular markers and pathways involved in the onset and progression of diseases affecting the KUP, in particular bladder cancer.

The final deliverable of the project will be a free, experimental prototype open to continuous collaborative expansion and refinement by the research community.