2nd IDGF Tutorial – Using Desktop Grids for scientific computing (held in English)

– Date :   17 September 2010
– Time :   14:00 – 17:30
– Venue :  Beurs van Berlage, Amsterdam (NL)

Objective
_ This second IDGF tutorial provides an introduction to Desktop Grids :  computing on Desktop Grids,  programming for Desktop Grids,  and interconnecting Desktop Grids with standard scientific computing environments such as EGI (European Grid Infrastructure).

Desktop Grids collect computing resources that would otherwise sit idle and make them usable for scientific applications.
– A Desktop Grid can, for example, bring together desktop machines and/or classroom computers within a university.  It is then called a local Desktop Grid.
– A Desktop Grid can also bring together unused computing time donated by the general public.  It is then called a public Desktop Grid.
_ The tutorial provides an introduction to Desktop Grids and shows how to install one.

Before a scientific application can use the computing power of the machines in a Desktop Grid, it must be adapted.  This tutorial shows how to adapt an application to run on a Desktop Grid.

Desktop Grids connected through the ‘3G Bridge’ technology can be integrated into the European Grid Infrastructure (EGI).  The tutorial provides basic information about this technology.

Target audience
– Anyone in charge who is considering setting up a local or public Desktop Grid,
– Scientists and application developers looking for ways to use additional computing power,
– NGI / EGI grid operators seeking to extend their services with more computing power and considering the use of Desktop Grids.

Organization
_ This second IDGF tutorial is organized by IDGF (International Desktop Grid Federation), with the support of the EDGI project and Gridforum.nl.
_ It is facilitated by the EGI Technical Forum.
_ Local organizer :  AlmereGrid.

More information at

3. Etienne URBAH

Etienne URBAH
_ LAL, Univ Paris-Sud, IN2P3/CNRS
_ Bat 200 91898 ORSAY France
_ Tel: +33 1 64 46 84 87
_ urbah (at) lal.in2p3.fr

Profile
Having a broad scientific background, I can easily :
– understand the general goals and needs of scientific users,
– capture their use cases and requirements for automated data processing.

My framework is professional software engineering and infrastructure operation, based on best practices and tools (CMMI, ITIL, UML) :
– Documentation (even through reverse engineering if needed),
– Change and configuration management,
– Life cycle : capture of initial requirements, design, specification, prototyping, validation of the requirements, development, testing, integration, packaging, validation of the software, documentation, deployment, operation, capture of software issues and new requirements, …
– Traceability, resilience to attacks, modularity, robustness, user friendliness, performance.
This enables software quality, software improvement, and software adoption by users.

For me, ‘computing’ is only a tiny part of the more general ‘data processing’, which requires precise data management, with definition and enforcement of access rights.

Overall summary from 2006 to 2012
– Software Engineer at ‘LAL, Univ Paris-Sud, IN2P3/CNRS’

– Focus on ‘service grids’ (EGI), ‘desktop grids’ (BOINC, XtremWeb-HEP) and the ‘3G bridge’ connecting service grids to desktop grids.
Instead of ‘grid’, I prefer the concept of ‘distributed data processing’ using distributed infrastructures.

– Member of the EGEE-II, EDGeS and EDGI projects.

– Active contributor to OGF PGI and SIENA.

June 2010 – May 2012 : Member of the EDGI project
– Standardization, in particular with OGF PGI and SIENA :
-* Proposal of a Diagram presenting useful grid standards and OGF recommendations.
-* Proposal of a Diagram presenting grid functionalities and interfaces, with corresponding standards.
-* Active participation in PGI, JSDL and OCCI sessions at OGF29 and OGF30.

– Support to the scientific communities which have to develop a desktop grid version of their application(s).

– Quality Assurance for desktop grid versions of scientific applications (those supported by other EDGI members, in order to avoid conflicts of interest).

May 2008 – April 2010 : Member of the EDGeS project
– Standardization, in particular with OGF PGI :
-* Active participation in OGF events from OGF23 to OGF28,
-* Proposal of a document presenting requirements for credentials (obsolescence of GT2 proxies, broader adoption of SAML),
-* Proposal of a state diagram for the PGI Execution Service.

– Inside EDGeS, dissemination of information about useful grid standards and OGF recommendations, in particular :
-* RFC 3820 (X509 proxies)
-* GLUE
-* JSDL + BES

October 2006 – June 2008 : Member of the EGEE-II project
Work inside JRA2 ‘Quality Assurance’.

– Design and implementation of some tests concerning middleware performance, and publication of the results.

– Creation of the MIG web site presenting all available tools providing operational metrics on the EGEE grid.

– Review of EGEE-II DSA1.7: Assessment of production Grid infrastructure service status

– For EGEE, proposal to fully separate :
-* Support to scientific communities, using academic best practices,
-* Operation of the distributed infrastructure, using ITIL best practices,
-* Software engineering, using CMMI best practices.

– Inside Software Engineering, proposal to fully separate responsibilities :
-* On one side, the project owner is responsible for requirements and validation,
-* On the other side, the project manager is responsible for design, specification, implementation, testing and integration.

3. Posters

– May 2012 : PDF Poster introducing virtualization over Desktop Grid; CHEP 2012 – May 21-25th, 2012 – New York, USA

– October 2010 : PDF PPT Poster describing (in French) the Software Processing of Scientific Data, for the Science Days 2010.

– September 2010 : PDF PPT Poster describing useful standards and necessary interfaces, at EGI-TF (European Grid Initiative – Technical Forum).

– May 2010 : PDF PPT Poster describing the Bridging of Institutional Grids, Desktop Grids and Academic Clouds at France Grilles 2010.

– February 2008 : Poster at EGEE 3rd User Forum describing our work inside the EDGeS project.

– May 2007 : Poster at EGEE 2nd User Forum.

DNA correlation

Abstract

The sequential organization of genomes, i.e. the relations between distant base pairs and regions within sequences, and its connection to the three-dimensional organization of genomes is still a largely unresolved problem. Long-range power-law correlations were found using correlation analysis on almost the entire observable scale of 132 completely sequenced chromosomes of 0.5 × 10^6 to 3.0 × 10^7 bp from Archaea, Bacteria, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens. The local correlation coefficients show a species-specific multi-scaling behaviour: close to random correlations on the scale of a few base pairs, a first maximum from 40 to 3,400 bp (for Arabidopsis thaliana and Drosophila melanogaster divided in two submaxima), and often a region of one or more second maxima from 10^5 to 3 × 10^5 bp. Within this multi-scaling behaviour, an additional fine-structure is present and attributable to codon usage in all except the human sequences, where it is related to nucleosomal binding. Computer-generated random sequences assuming a block organization of genomes, the codon usage, and nucleosomal binding explain these results. Mutation by sequence reshuffling destroyed all correlations. Thus, the stability of correlations seems to be evolutionarily tightly controlled and connected to the spatial genome organization, especially on large scales. In summary, genomes show a complex sequential organization related closely to their three-dimensional organization.
Keywords: Genome organization, Nuclear architecture, Long-range correlations, Scaling analysis, DNA sequence classification

A. Abuseiris, Erasmus – NL
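
As a rough illustration of the idea in the abstract above, the following sketch builds a toy sequence with a block organization, measures how strongly the presence of one base at position i is correlated with its presence at position i + lag, and then repeats the measurement after reshuffling the sequence. This is not the authors' estimator, which the abstract does not specify: it is a plain Pearson autocorrelation of a base indicator. The function names `base_autocorrelation` and `block_sequence`, the toy alphabets, the block length, and the lag are all assumptions chosen for illustration.

```python
import random


def base_autocorrelation(seq, base, lag):
    """Pearson correlation between the indicator of `base` at i and at i + lag.

    Assumption: a simple stand-in for a correlation analysis, not the
    method used in the abstract.
    """
    if not 0 < lag < len(seq):
        raise ValueError("lag must be between 1 and len(seq) - 1")
    x = [1.0 if c == base else 0.0 for c in seq]
    n = len(x) - lag
    a, b = x[:n], x[lag:]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    var_a = sum((p - mean_a) ** 2 for p in a) / n
    var_b = sum((q - mean_b) ** 2 for q in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0  # constant indicator: correlation is undefined, report 0
    return cov / (var_a * var_b) ** 0.5


def block_sequence(n_blocks, block_len, rng):
    """Toy sequence of alternating AT-rich and GC-rich blocks (hypothetical model)."""
    parts = []
    for i in range(n_blocks):
        # Repeated letters in the alphabet bias the base composition of the block.
        alphabet = "AATTAATTGC" if i % 2 == 0 else "GGCCGGCCAT"
        parts.append("".join(rng.choice(alphabet) for _ in range(block_len)))
    return "".join(parts)


rng = random.Random(0)  # fixed seed so the run is reproducible
seq = block_sequence(n_blocks=40, block_len=500, rng=rng)

# Reshuffling keeps the base composition but destroys the block order.
letters = list(seq)
rng.shuffle(letters)
shuffled = "".join(letters)

print("block sequence, lag 50:", base_autocorrelation(seq, "A", 50))
print("reshuffled,     lag 50:", base_autocorrelation(shuffled, "A", 50))
```

In this toy model the block sequence keeps a clearly positive correlation at a lag shorter than the block length, while the reshuffled sequence, which has exactly the same base composition, gives a value close to zero.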