3. Etienne URBAH

Etienne URBAH
LAL, Univ Paris-Sud, IN2P3/CNRS
Bat 200, 91898 ORSAY, France
Tel: +33 1 64 46 84 87
urbah (at) lal.in2p3.fr

Profile
Having a broad scientific background, I am able to easily:
– understand the general goals and needs of scientific users,
– capture their use cases and requirements, with the goal of automated data processing.

I work within a framework of professional software engineering and infrastructure operation, based on best practices and tools (CMMI, ITIL, UML):
– Documentation (even through reverse engineering if needed),
– Change and configuration management,
– Life cycle: capture of initial requirements, design, specification, prototyping, validation of the requirements, development, testing, integration, packaging, validation of the software, documentation, deployment, operation, capture of software issues and new requirements, …
– Traceability, resilience to attacks, modularity, robustness, user friendliness, performance.
This permits software quality, software improvement, and software adoption by users.

For me, ‘computing’ is only a tiny part of the more general ‘data processing’, which requires precise data management, with definition and enforcement of access rights.

Overall summary from 2006 to 2012
– Software Engineer at ‘LAL, Univ Paris-Sud, IN2P3/CNRS’

– Focus on ‘service grids’ (EGI), ‘desktop grids’ (BOINC, XtremWeb-HEP) and the ‘3G bridge’ connecting service grids to desktop grids.
Instead of ‘grid’, I prefer the concept of ‘distributed data processing’ using distributed infrastructures.

– Member of the EGEE-II, EDGeS and EDGI projects.

– Active contributor to OGF PGI and SIENA.

June 2010 – May 2012: Member of the EDGI project
– Standardization, in particular with OGF PGI and SIENA:
-* Proposal of a diagram presenting useful grid standards and OGF recommendations.
-* Proposal of a diagram presenting grid functionalities and interfaces, with the corresponding standards.
-* Active participation in PGI, JSDL and OCCI sessions at OGF29 and OGF30.

– Support for the scientific communities that have to develop a desktop grid version of their application(s).

– Quality Assurance for desktop grid versions of scientific applications (applications supported by other EDGI members, in order to avoid conflicts of interest).

May 2008 – April 2010: Member of the EDGeS project
– Standardization, in particular with OGF PGI:
-* Active participation in OGF events from OGF23 to OGF28,
-* Proposal of a document presenting requirements for credentials (obsolescence of GT2 proxies, broader adoption of SAML),
-* Proposal of a state diagram for the PGI Execution Service.

– Inside EDGeS, dissemination of information about useful grid standards and OGF recommendations, in particular:
-* RFC 3820 (X509 proxies)
-* GLUE
-* JSDL + BES

October 2006 – June 2008: Member of the EGEE-II project
Work inside JRA2 ‘Quality Assurance’.

– Design and implementation of tests of middleware performance, and publication of the results.

– Creation of the MIG web site presenting all available tools providing operational metrics on the EGEE grid.

– Review of EGEE-II deliverable DSA1.7, ‘Assessment of production Grid infrastructure service status’.

– For EGEE, proposal to fully separate:
-* Support for scientific communities, using academic best practices,
-* Operation of the distributed infrastructure, using ITIL best practices,
-* Software engineering, using CMMI best practices.

– Inside Software Engineering, proposal to fully separate responsibilities:
-* On one side, the project owner is responsible for requirements and validation,
-* On the other side, the project manager is responsible for design, specification, implementation, testing and integration.

3. Posters

– May 2012: Poster (PDF) introducing virtualization on Desktop Grids, at CHEP 2012, May 21-25, 2012, New York, USA.

– October 2010: Poster (PDF, PPT) describing (in French) the Software Processing of Scientific Data, for the Science Days 2010 (Fête de la Science).

– September 2010: Poster (PDF, PPT) describing useful standards and necessary interfaces, at EGI-TF (European Grid Initiative Technical Forum).

– May 2010: Poster (PDF, PPT) describing the Bridging of Institutional Grids, Desktop Grids and Academic Clouds, at France Grilles 2010.

– February 2008: Poster at the EGEE 3rd User Forum describing our work within the EDGeS project.

– May 2007: Poster at the EGEE 2nd User Forum.

DNA correlation

Abstract

The sequential organization of genomes, i.e. the relations between distant base pairs and regions within sequences, and its connection to the three-dimensional organization of genomes is still a largely unresolved problem. Long-range power-law correlations were found using correlation analysis on almost the entire observable scale of 132 completely sequenced chromosomes of 0.5 × 10⁶ to 3.0 × 10⁷ bp from Archaea, Bacteria, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens. The local correlation coefficients show a species-specific multi-scaling behaviour: close to random correlations on the scale of a few base pairs, a first maximum from 40 to 3,400 bp (for Arabidopsis thaliana and Drosophila melanogaster divided into two submaxima), and often a region of one or more second maxima from 10⁵ to 3 × 10⁵ bp. Within this multi-scaling behaviour, an additional fine-structure is present and attributable to codon usage in all except the human sequences, where it is related to nucleosomal binding. Computer-generated random sequences assuming a block organization of genomes, the codon usage, and nucleosomal binding explain these results. Mutation by sequence reshuffling destroyed all correlations. Thus, the stability of correlations seems to be evolutionarily tightly controlled and connected to the spatial genome organization, especially on large scales. In summary, genomes show a complex sequential organization related closely to their three-dimensional organization.
Keywords: Genome organization, Nuclear architecture, Long-range correlations, Scaling analysis, DNA sequence classification
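The scaling analysis described above boils down to asking how strongly bases separated by a distance ℓ remain correlated. The following is a minimal Python sketch of that idea only, not the authors' analysis pipeline: it maps a sequence to a G/C indicator series and computes a Pearson autocorrelation at a few distances. The indicator choice, the use of a plain Pearson coefficient, and the toy random sequence are all assumptions made for illustration.

```python
# Minimal sketch of distance-dependent correlation in a DNA sequence.
# Illustration of the general idea only, not the authors' analysis code.
import numpy as np

def indicator(seq: str, letters: str = "GC") -> np.ndarray:
    """0/1 series: 1 where the base belongs to `letters` (here G or C)."""
    return np.array([1.0 if b in letters else 0.0 for b in seq.upper()])

def correlation(x: np.ndarray, ell: int) -> float:
    """Pearson correlation between positions i and i + ell."""
    return float(np.corrcoef(x[:-ell], x[ell:])[0, 1])

# Toy data: a random sequence should stay close to zero at every distance,
# unlike the long-range power-law correlations reported in the abstract.
rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), size=100_000))
x = indicator(seq)
for ell in (1, 10, 100, 1_000, 10_000):
    print(f"distance {ell:>6}: correlation {correlation(x, ell):+.4f}")
```

Running the same loop on a real chromosome and then on a reshuffled copy of it would mirror the "mutation by sequence reshuffling destroyed all correlations" check mentioned in the abstract.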

A. Abuseiris, Erasmus – NL

Audio Analysis

DART: A Framework for Distributed Audio Analysis and Music Information Retrieval

Abstract

Audio analysis algorithms and frameworks for Music Information Retrieval (MIR) are expanding rapidly, providing new ways to garner non-trivial information from audio sources, beyond what can be ascertained from unreliable metadata such as ID3 tags. The analysis component of MIR requires extensive computational resources. MIR is a broad field, and many of the algorithms and analysis components in use become more accurate when given a larger dataset, and the analysis is often quite DSP/CPU intensive. A Desktop Grid based implementation would reduce computation time and provide access to potentially thousands of MP3 files on target machines: the files are analysed locally on the clients' machines, and only the metadata/results of the analysis are transferred back. This avoids legal issues and saves bandwidth.

The DART application framework developed at Cardiff University focuses on the analysis of audio, with a particular interest in MIR. The existing application is designed and created in Triana, a graphical workflow-design environment used as a development test bed for the algorithms that will be distributed. The algorithms are programmed in a modular way, which allows only the relevant building blocks of the workflow to be converted into the standalone DART Java application; this is in turn packaged as a multi-platform JAR executable and distributed to target machines across the Desktop Grid using BOINC or XtremWeb.
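The key design point above, analyse the audio locally and ship back only the results, can be sketched in a few lines. The snippet below is an illustrative Python mock-up of such a work unit, not DART code (DART itself is a Java application built from Triana workflows): the scan-and-upload pattern is the point being shown, while `analyse_track`, `run_workunit` and the result file name are hypothetical placeholders for a real MIR feature extractor and result channel.

```python
# Illustrative sketch (not DART code) of the "analyse locally, return only
# the results" pattern described above.  All names here are hypothetical.
import hashlib
import json
from pathlib import Path

def analyse_track(path: Path) -> dict:
    """Placeholder for a real MIR feature extractor.

    A real DART-style task would decode the audio and compute features
    (spectral statistics, tempo, ...); here only cheap file properties are
    recorded so that the sketch stays self-contained."""
    data = path.read_bytes()
    return {
        "file": path.name,                       # no full path leaves the host
        "size_bytes": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),  # lets the server deduplicate
    }

def run_workunit(music_dir: str, result_file: str) -> None:
    """Scan the local music folder and write ONLY the analysis results.

    The MP3 files themselves never leave the volunteer's machine; only the
    small JSON result file is uploaded, which is what saves bandwidth and
    avoids redistributing copyrighted audio."""
    tracks = Path(music_dir).expanduser().rglob("*.mp3")
    results = [analyse_track(p) for p in tracks]
    Path(result_file).write_text(json.dumps(results, indent=2))

if __name__ == "__main__":
    run_workunit("~/Music", "dart_results.json")
```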

Eddie Al-Shakarchi, Cardiff University – UK

MATLAB Application

Porting Multiparametric MATLAB Application for Image and Video Processing to Desktop Grid for High-Performance Distributed Computing

Abstract

Optical microscopy is usually used for structural characterization of materials in narrow ranges of magnification, over a small region of interest (ROI), and in a static regime. But many crucial processes of damage initiation and propagation take place dynamically, over the wide observable time domain from 10⁻³ s to 10³ s and over many scales, from 10⁻⁸ m (solitary defect sites) to 10⁻² m (correlated, linked networks of defects). We used such a microscope to observe, in a real-time regime, the dynamic behavior of the material under mechanical deformation in a loading machine, to record its evolution, and to apply our multiscale image processing software (MultiscaleIVideoP). Our calculations involve many parameters of the physical process (process rate, magnification, illumination conditions, hardware filters, etc.) and of the image processing (size distribution, anisotropy, localization, scaling parameters, etc.); hence the calculations are very slow. That is why we have an extreme need for more powerful computational resources. The grid version of the proposed application MultiscaleIVideoP would have a very wide range of potential users, because modern laboratories have commercial microscopes with digital output connected to a PC and perform everyday tasks of complex static and dynamic morphology analysis: in biology, geology, chemistry, physics, materials science, etc.

Deploying this application on a Grid computing infrastructure, utilising hundreds of machines at the same time, allows harnessing sufficient computational power to undertake the computations on a larger scale and in a much shorter timeframe. Running the computations and analysing the results on the Grid provides the extensive computational power required.
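To make the "many parameters, hence very slow" point concrete, the sketch below shows how a multiparametric run maps naturally onto independent grid work units: the Cartesian product of the parameter values is enumerated and each combination becomes one job. The parameter names and values are illustrative assumptions, not the actual MultiscaleIVideoP inputs.

```python
# Illustrative sketch: a multiparametric sweep expands combinatorially, and
# each combination is an independent work unit for the desktop grid.
# Parameter names/values are assumptions, not MultiscaleIVideoP's real inputs.
from itertools import product

process_rates  = [1, 10, 100]       # frames per second recorded
magnifications = [50, 200, 1000]    # optical magnification
roi_sizes      = [128, 256, 512]    # region-of-interest edge length (pixels)
scalings       = [0.5, 1.0, 2.0]    # scaling parameter of the analysis

workunits = [
    {"process_rate": p, "magnification": m, "roi": r, "scaling": s}
    for p, m, r, s in product(process_rates, magnifications, roi_sizes, scalings)
]
print(len(workunits), "independent work units")   # 3 * 3 * 3 * 3 = 81

# On a single lab PC these 81 runs execute one after another; serialized as
# one grid job each (e.g. one BOINC or XtremWeb work unit), they run in
# parallel on as many hosts as the infrastructure provides.
```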

Yuri Gordienko, Institut de Physique du Metal – Kiev – Ukraine

Defect Aggregation in Materials Science

Kinetics of Defect Aggregation in Materials Science Simulated in Desktop Grid Computing Environment Installed in Ordinary Material Science Lab

Abstract

Aggregation processes are investigated in many branches of science: defect aggregation in materials science, population dynamics in biology, city growth and evolution in sociology. A typical simulation of crystal defect aggregation by our application SLinCA (Scaling Laws in Cluster Aggregation) takes from several days to weeks on a single modern CPU, depending on the number of Monte Carlo steps (MCS). Moreover, thousands of scenarios have to be simulated with different initial configurations to get statistically reliable results. Porting to a distributed computing infrastructure (DCI) and parallel execution can reduce the waiting time and scale the simulated systems up to the desired realistic values. Deploying this application on a Grid computing infrastructure, utilising hundreds of machines at the same time, allows harnessing sufficient computational power to undertake the simulations on a larger scale and in a much shorter timeframe. Running the simulations and analysing the results on the Grid provides the extensive computational power required.
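As a rough illustration of the kind of Monte Carlo workload described above, here is a toy Python sketch, under stated assumptions and not SLinCA's actual kinetic model: point clusters random-walk on a one-dimensional ring and merge on contact, with one MCS defined as one attempted move per cluster. Each independent scenario (a different seed or initial configuration) is exactly the unit of work that maps onto one desktop grid job.

```python
# Toy cluster-aggregation Monte Carlo (illustrative only, not SLinCA itself):
# mass-1 clusters random-walk on a 1D ring and merge when they land on the
# same site.  One MCS = one attempted move per surviving cluster.
import random

def simulate(n_sites: int = 1000, n_clusters: int = 200,
             mcs: int = 10_000, seed: int = 1) -> list[int]:
    random.seed(seed)
    # site -> cluster mass; start with mass-1 clusters on distinct sites
    clusters = {pos: 1 for pos in random.sample(range(n_sites), n_clusters)}
    for _ in range(mcs):
        for pos in list(clusters):                # snapshot of current sites
            mass = clusters.pop(pos)
            new = (pos + random.choice((-1, 1))) % n_sites
            clusters[new] = clusters.get(new, 0) + mass   # merge on contact
    return sorted(clusters.values(), reverse=True)

# Each (seed, initial configuration) pair is an independent scenario, i.e.
# one desktop grid work unit; statistics are gathered over many such runs.
for seed in range(3):
    sizes = simulate(seed=seed)
    print(f"seed {seed}: {len(sizes)} clusters remain, largest {sizes[:3]}")
```

Distributing thousands of such runs, one seed per work unit, is what turns weeks of sequential CPU time into a single pass over the desktop grid.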

Yuri Gordienko, Institut de Physique du Metal – Kiev – Ukraine