XtremWeb-HEP 7

Documentation

The documentation is available here.

Bug tracking

You can browse our Trac server.

Versions

  • Dec 12th, 2011 : XWHEP 7.6.4
    • Corrections

      1. a bug corrected on work update (e.g. on completion)
      2. a bug corrected on submission
    • New features
      • none
  • Oct 22nd, 2011 : XWHEP 7.6.3
    • Corrections

      1. Zipper.java modified so that entries whose path starts with ‘/’ or contains “../” are stored under their file name only (e.g. “/var/log/dummy.log” and “/home../var/log/dummy.log” are both compressed as “dummy.log”)
      2. a bug corrected in Zipper.java so that compressed entries no longer start with ‘/’
      3. to avoid misbehavior, a worker whose version differs from the server’s receives no job
    • New features
      • none
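The entry-naming rule from the first two 7.6.3 corrections can be sketched as follows. This is a minimal illustration, not the actual Zipper.java code: the class ZipEntryNames and its methods entryName and entryFor are hypothetical, and relative paths without “../” are assumed to be stored unchanged.

```java
import java.util.zip.ZipEntry;

/** Hypothetical sketch of the 7.6.3 entry-naming rule; not the actual Zipper.java code. */
public class ZipEntryNames {

    /** Returns the name under which a file path is stored in the archive. */
    static String entryName(String path) {
        // Absolute paths and paths containing "../" keep only their file name,
        // so the stored name neither starts with '/' nor contains "../".
        if (path.startsWith("/") || path.contains("../")) {
            return path.substring(path.lastIndexOf('/') + 1);
        }
        // Assumption: other relative paths are stored unchanged.
        return path;
    }

    /** Builds the ZipEntry for a path using the rule above. */
    static ZipEntry entryFor(String path) {
        return new ZipEntry(entryName(path));
    }

    public static void main(String[] args) {
        String[] paths = {"/var/log/dummy.log", "/home../var/log/dummy.log", "logs/dummy.log"};
        for (String p : paths) {
            System.out.println(p + " -> " + entryFor(p).getName());
        }
    }
}
```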
  • Oct 4th, 2011 : XWHEP 7.6.2
    • Corrections

      1. scripts are now compatible with dash
      2. data removal improved
      3. the bridge installation package corrected
    • New features
      • URI passthrough implemented to comply with EDGI JRA1
  • July 18th, 2011 : XWHEP 7.5.0
    • Corrections

      1. Windows XP and 7 supported (Vista not tested)
      2. on server side : access logs are now stored in HomeDir as found in the config file and not in /tmp; a bug corrected in group management; DB access improved
      3. leading and trailing spaces are now removed from the job command line
      4. the communication protocol has been improved so that the server always sends an answer, especially on error.
        The client now displays errors received from the server, if any.
        The client also sets its return code accordingly.

      5. the worker-server communication protocol has been corrected so that the worker still has a chance to upload results, even on a server file system error.
        This feature has existed for a long time (since INRIA XtremWeb 1.6) but was buggy.

      6. client GUI improved
      7. a security hole corrected on object removal
      8. on Mac OS X, the server is now correctly installed and works well.
        We don’t use Mac OS X specific mechanisms, like StartupItems or LaunchDaemons, since we had some problems with them.
        Instead, we use Linux-like scripts : xtremweb.server [start | stop | restart | status].
        There are two drawbacks : the server does not start automatically at boot time, and the server runs as root under Mac OS X.

      9. Attic removed until further notice : Attic may crash the worker under certain circumstances
    • New features
      • client can connect using its certificate and associated private key.
        The client-server connection is challenged using this key pair.
        The client certificate must be validated through a CA certificate path known by the server.

      • client accepts a new --xwout command line option to set the output file name (used with xwresults and xwdownload)
      • client can now retrieve applications by their name and users by their login
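The key-pair challenge can be illustrated with a generic challenge-response exchange built on the standard java.security API. The wire format of the real XWHEP protocol is not described here, so this is a sketch under assumptions: RSA keys, the SHA256withRSA signature algorithm, a 32-byte random challenge and the hypothetical class ChallengeSketch. In a real deployment the server would take the public key from the client certificate after validating that certificate against its CA cert paths.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.security.SignatureException;

/** Generic challenge-response sketch (assumed algorithms); not the actual XWHEP protocol. */
public class ChallengeSketch {
    private static final String ALGORITHM = "SHA256withRSA"; // assumption
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Server side: issues a fresh random challenge for each connection attempt. */
    static byte[] newChallenge() {
        byte[] nonce = new byte[32];
        RANDOM.nextBytes(nonce);
        return nonce;
    }

    /** Client side: signs the challenge with the private key matching its certificate. */
    static byte[] answer(byte[] challenge, PrivateKey clientKey) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(clientKey);
            signer.update(challenge);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Server side: checks the answer with the public key of the (already validated) certificate. */
    static boolean verify(byte[] challenge, byte[] answer, PublicKey certKey) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(certKey);
            verifier.update(challenge);
            return verifier.verify(answer);
        } catch (SignatureException e) {
            return false; // malformed answer: reject the connection
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo only: a fresh RSA key pair standing in for the client's certificate keys. */
    static KeyPair demoKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair client = demoKeyPair();
        byte[] challenge = newChallenge();
        byte[] response = answer(challenge, client.getPrivate());
        System.out.println("accepted: " + verify(challenge, response, client.getPublic()));
    }
}
```

A fresh challenge per connection is what prevents a recorded answer from being replayed later.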
  • May 12th, 2011 : XWHEP 7.4.1
    • Corrections

      1. worker, server: xtremwebconf.sh corrected (‘status’ param now accepted)
      2. worker, server: launching services no longer displays “ls: *.zip: No such file or directory”
      3. worker, server : log files are now stored in /var/log
      4. server : a bug corrected on DB cache usage
      5. client : a bug corrected on xwversion command
      6. client : a message is now displayed if the config file is not found
      7. client : the config file is no longer modified, since this appeared to be too confusing for the user
      8. works and datas tables modified as needed for the 3G bridge plugin : label and name set to char(150)
      9. a bug corrected on group jobs management
    • New features
      • none
  • Mar 25th, 2011 : XWHEP 7.4.0

    This version introduces corrections and new features that dramatically improve performance.

    We have benchmarked our versions using Grid5000. The runs were sent from 1 client to 1 server managing more than 2,000 workers.

    The time needed to submit 10,000 jobs from one client macro file is reduced by up to 40%, as shown in the next graph comparing XWHEP 7.3.2 and 7.4.0.

    The following graphs are extracted from a run using 7.4.0.

    The next graph shows job submission times (green), job launch times on the worker side (blue) and completion times (red). Submitting the 10K jobs took less than 30 min, for a total execution time of about 45 min.

    The last graph shows good load balancing across the 2,323 connected workers.

    • Corrections

      1. The communication layer now supports both connected and non-connected modes. The default is unchanged : non-connected mode (one message per socket), but the communication layer can now be used in connected mode, where it accepts up to 2000 messages per socket.

        To protect the platform against too easy DoS attacks, the server :

        • closes the socket after 2000 messages (the client handles this transparently) [this is why there are steps on the green line in the first graph : these are the reconnection times every 2000 messages]
        • sets a socket timeout (SOTIMEOUT from config file)
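How a client can keep the 2000-message limit transparent can be sketched by counting messages per connection and reconnecting just before the server would close the socket. This is an illustration under assumptions, not the XWHEP client: ConnectedModeClient and its methods are hypothetical, and connections are abstracted behind a Supplier so the sketch runs without a server. With a real java.net.Socket, the connect supplier would open the socket and call setSoTimeout with the SOTIMEOUT value.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical client-side sketch: count messages per connection and reconnect
 * when the server's per-socket limit is reached, so the limit stays transparent.
 */
public class ConnectedModeClient<C> {
    /** Server-side limit described above: a socket is closed after 2000 messages. */
    static final int MAX_MESSAGES_PER_SOCKET = 2000;

    private final Supplier<C> connect;    // e.g. opens a Socket and applies SOTIMEOUT (assumption)
    private final Consumer<C> disconnect;
    private C connection;
    private int sentOnConnection;
    private int connectionsOpened;

    ConnectedModeClient(Supplier<C> connect, Consumer<C> disconnect) {
        this.connect = connect;
        this.disconnect = disconnect;
    }

    /** Returns the connection to use for the next message, reconnecting when the limit is reached. */
    C connectionForNextMessage() {
        if (connection == null || sentOnConnection == MAX_MESSAGES_PER_SOCKET) {
            if (connection != null) {
                disconnect.accept(connection);
            }
            connection = connect.get();
            connectionsOpened++;
            sentOnConnection = 0;
        }
        sentOnConnection++;
        return connection;
    }

    int connectionsOpened() {
        return connectionsOpened;
    }

    public static void main(String[] args) {
        // Plain objects stand in for sockets so the sketch runs without a server.
        ConnectedModeClient<Object> client = new ConnectedModeClient<>(Object::new, c -> { });
        for (int i = 0; i < 5000; i++) {
            client.connectionForNextMessage();
        }
        System.out.println("connections opened: " + client.connectionsOpened()); // 3
    }
}
```

Each reconnection in this sketch corresponds to one of the steps visible on the green line of the submission graph.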
    • New features
      • the number of simultaneous connections to the DB is now read from the server config file
      • introducing SORETRIES (from config file) : maximum number of connection attempts on socket error
      • reintroducing (from former 5.x versions) a write-through cache in front of the DB to improve performance
      • reintroducing (from former 5.x versions) pools of TCP and DB handlers to reduce allocations/deallocations
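The write-through cache principle can be sketched as follows: every write goes to the database and the cache together, while reads are served from the cache and only reach the database on a miss. This is a minimal sketch, not the XWHEP server cache; the Store interface, the counting InMemoryStore that stands in for the database, and all names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a write-through cache in front of the DB; not the XWHEP implementation. */
public class WriteThroughCache<K, V> {

    /** Minimal stand-in for the DB access layer (assumed interface). */
    interface Store<K, V> {
        Optional<V> read(K key);
        void write(K key, V value);
    }

    /** In-memory Store that counts reads, used here in place of a real database. */
    static final class InMemoryStore<K, V> implements Store<K, V> {
        final Map<K, V> rows = new HashMap<>();
        int reads;

        public Optional<V> read(K key) {
            reads++;
            return Optional.ofNullable(rows.get(key));
        }

        public void write(K key, V value) {
            rows.put(key, value);
        }
    }

    private final Store<K, V> db;
    private final Map<K, V> cache = new HashMap<>();

    WriteThroughCache(Store<K, V> db) {
        this.db = db;
    }

    /** Every write goes to the DB first, then to the cache, so both hold the same value. */
    synchronized void put(K key, V value) {
        db.write(key, value);
        cache.put(key, value);
    }

    /** Reads are served from the cache; a miss falls back to the DB and fills the cache. */
    synchronized Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<V> fromDb = db.read(key);
        fromDb.ifPresent(value -> cache.put(key, value));
        return fromDb;
    }

    public static void main(String[] args) {
        InMemoryStore<String, String> db = new InMemoryStore<>();
        WriteThroughCache<String, String> works = new WriteThroughCache<>(db);
        works.put("work-1", "PENDING");
        System.out.println(works.get("work-1").orElse("?") + ", DB reads: " + db.reads); // PENDING, DB reads: 0
    }
}
```

Writing through (rather than writing back later) keeps the database authoritative at all times, at the cost of one DB write per update.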
  • Feb 10th, 2011 : XWHEP 7.3.2
    • Corrections

      1. a bug corrected on URL usage
    • New features
      • none
  • Feb 8th, 2011 : XWHEP 7.3.1
    • Corrections

      1. a bug corrected on communication proxy usage
    • New features
      • none
  • Feb 8th, 2011 : XWHEP 7.3.0
    • Corrections

      1. a bug corrected on communication layer
      2. a bug corrected on meta data usage
    • New features
      1. a proxy can now be defined in client and worker configuration files. This allows resource aggregation from Grid5000
  • Feb 2nd, 2011 : XWHEP 7.2.2
    • Corrections

      1. a bug on job management introduced in 7.2.1 has been corrected
    • New features
      1. none
  • Jan 19th, 2011 : XWHEP 7.2.1

    See details at Trac Report

    • Corrections

      1. xwconfigure corrected so that DB reset now works correctly
      2. client corrected : it now correctly displays access rights in hexadecimal and not in decimal
    • New features
      1. none
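The access-rights display fix can be illustrated with the value 0x755 used for public workers in the 7.0.0 notes: printed in decimal it reads as 1877, which hides the per-digit meaning, while printed in hexadecimal it reads as 755. The class and method names, and the 0x prefix in the output, are assumptions for illustration, not the XWHEP client code.

```java
/** Hypothetical sketch of the access-rights display fix; not the XWHEP client code. */
public class AccessRightsFormat {

    /** Behaviour before the fix: the value printed in decimal. */
    static String asDecimal(int rights) {
        return Integer.toString(rights);
    }

    /** Corrected display: the value printed in hexadecimal with a 0x prefix (assumed format). */
    static String asHex(int rights) {
        return "0x" + Integer.toHexString(rights);
    }

    /** Parses "0x755", "0X755" or "755" back into an int, always as hexadecimal. */
    static int parse(String text) {
        String digits = text.regionMatches(true, 0, "0x", 0, 2) ? text.substring(2) : text;
        return Integer.parseInt(digits, 16);
    }

    public static void main(String[] args) {
        int publicWorker = 0x755;
        System.out.println(asDecimal(publicWorker)); // 1877 (misleading)
        System.out.println(asHex(publicWorker));     // 0x755
    }
}
```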
  • Jan 20th, 2011 : two bugs found in XWHEP 7.2.0
    1. xwconfigure is erroneous when resetting the database; this may lead to some misbehavior.

      To reset the DB, please do it manually and relaunch the xwconfigure script.

    2. on client side, access rights are displayed in decimal and not in octal.
  • Jan 19th, 2011 : XWHEP 7.2.0

    See details at Trac Report

    • Corrections

      1. xwconfigure corrected
      2. server now handles SSL handshake error correctly and does not hang any more
      3. on client side, a valid X509 proxy is now used even if login/password are also provided
    • New features
      1. batch job management for EDGI/JRA2 SpeQuLoS
  • Nov 8th, 2010 : XWHEP 7.1.1
    • Corrections

      1. the server RPM installation package has been corrected
    • New features
      1. none
  • Nov 23rd, 2010 : XWHEP 7.1.0
    • Corrections

      1. hsqldb usage restored (http://hsqldb.org).
        hsqldb is a relational DB written in Java.
        It is embedded in the XWHEP server.
        This has not been tested in production
        (it works, but I can’t say anything about scalability and performance).

        Embedding hsqldb enables quick deployment.
        This specifically allows EDGI SA1 and SA2 to create quick demos and virtual disks.
    • New features
      1. a new script make-distribs.sh to manage several worker configurations
  • Nov 10th, 2010 : XWHEP 7.0.3
    • Corrections

      1. server installation packages corrected
  • Nov 5th, 2010 : XWHEP 7.0.2
    • Corrections

      1. forgot to clean up some little things… corrected
  • Nov 3rd, 2010 : XWHEP 7.0.0
    • Corrections

      1. on client side, a bug corrected when creating objects from an XML file
        (using the ‘--xwxml’ command line parameter)

      2. logger rewritten
      3. a bug corrected in cache; it now uses URI as key
      4. the bridge registers only once
      5. standard users can retrieve their own works, datas and tasks. Advanced privileges are needed to retrieve all works, datas or tasks.

        This aims to improve scalability by reducing the amount of unnecessary communication.

      6. a bug corrected on client side : some problems occurred on file access if two users were using the same client config file.

        (e.g. ‘sudo xwworks’ followed by ‘xwworks’…)

      7. on worker side, concurrent file access problems corrected; the worker can now manage several simultaneous jobs

        (min = 1; max = amount of detected CPUs)

      8. client GUI simplified and functional
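The bound on simultaneous worker jobs (at least one, at most one per detected CPU) can be sketched with Runtime.availableProcessors(), which reports the CPUs visible to the JVM, and a fixed thread pool. The method names and the use of an ExecutorService to run jobs are assumptions, not the XWHEP worker code; the resulting value would correspond to what the poolworksize column of the hosts table records.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of bounding the worker's simultaneous jobs; not the XWHEP worker code. */
public class WorkerPoolSize {

    /** Clamps a requested pool size to the range [1, detected CPUs]. */
    static int clamp(int requested, int detectedCpus) {
        int max = Math.max(1, detectedCpus);
        return Math.min(Math.max(1, requested), max);
    }

    /** Same rule, using the CPU count reported by the JVM. */
    static int clamp(int requested) {
        return clamp(requested, Runtime.getRuntime().availableProcessors());
    }

    public static void main(String[] args) {
        int size = clamp(Integer.MAX_VALUE); // ask for as many simultaneous jobs as allowed
        ExecutorService jobs = Executors.newFixedThreadPool(size);
        System.out.println("running up to " + size + " jobs at once");
        jobs.shutdown();
    }
}
```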
    • New features

      1. on worker side, introducing the Apple sandbox usage
      2. full usage of X509 certificates : credentials can be either login/password or an X509 certificate.

        The X509_USER_PROXY environment variable must be set before using the client.

        In conjunction with XtremWeb-HEP, users are encouraged to use jlite by Oleg Sukhoroslov (http://code.google.com/p/jlite).

        The X509_USER_PROXY may contain an X509 proxy as well as an X509 certificate “only”.

        This makes no difference when connecting to the XWHEP server. However, an X509 proxy allows the use of EGEE resources, whereas an X509 certificate does not.

        This is transparent for the end user. Resource usage is still best effort.

        The X509_CERT_DIR variable must be set in the server config file and point to the directory of CA certificates.

        The server validates certificates through its known certificate paths created from X509_CERT_DIR.

        This clearly means that self-signed user certificates are not allowed. Users with an X509 certificate that can be validated through the XW server CA cert paths are automatically registered with STANDARD_USER user rights.

        Users using login/password, however, still need to be registered by the XW administrator.

      3. introducing _history tables to decrease production table sizes by moving rows being deleted into _history tables
      4. introducing more logging levels (FINEST, CONFIG) to decrease debug outputs (for Gilles 😉 )
      5. in anticipation of a new, improved DG QoS, the database now stores (but this is not used yet)
        • number of pending, running and erroneous jobs per application
        • number of pending, running and erroneous jobs per worker
        • number of pending, running and erroneous jobs per user
        • usedcputime per user
        • webpage per application
        • webpage per usergroup
      6. new columns in hosts table
        • totaltmp : total space available in the partition used by the worker
        • freetmp : free space available in the partition used by the worker
        • poolworksize : the number of jobs the worker can run simultaneously
        • sgid : Service Grid Identifier. This deprecates pilotjob field (even if it is still used for the moment)

          This is automatically set by the worker from System.getenv("GLITE_WMS_JOBID"). It can still be faked by a malicious worker (just as the “pilotjob” field can), but the monitoring can now check whether it is a valid SGID, which was not possible with the “pilotjob” field.

      7. the client accepts a new parameter : “--xwshell”, which starts a client daemon.

        This daemon accepts incoming connections on port 4327, forwards each received XMLRPCCommand to the server and sends the answers back. This specifically aims to improve bridge performance.

      8. worker accessrights reflect confinement
        • a public worker has 0x755 accessrights
        • a group worker has 0x750 accessrights
        • a private worker has 0x700 accessrights
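The confinement levels listed above map one-to-one to accessrights values, which can be sketched as an enum. The enum and its methods are hypothetical, and reading the hexadecimal digits Unix-style as owner/group/others (0 meaning no access) is an assumption that the three values suggest but the notes do not state.

```java
/** Hypothetical sketch of the worker confinement / accessrights mapping listed above. */
public enum Confinement {
    PUBLIC(0x755),
    GROUP(0x750),
    PRIVATE(0x700);

    final int accessRights;

    Confinement(int accessRights) {
        this.accessRights = accessRights;
    }

    /** Recovers the confinement level from a worker's accessrights value. */
    static Confinement of(int accessRights) {
        for (Confinement c : values()) {
            if (c.accessRights == accessRights) {
                return c;
            }
        }
        throw new IllegalArgumentException("unknown worker accessrights: 0x" + Integer.toHexString(accessRights));
    }

    /** Assumption: the middle digit holds the group's rights; 0 means no access. */
    boolean groupMayUse() {
        return ((accessRights >> 4) & 0xF) != 0;
    }

    /** Assumption: the last digit holds everyone else's rights; 0 means no access. */
    boolean othersMayUse() {
        return (accessRights & 0xF) != 0;
    }
}
```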
      9. a new REST interface. Users can now connect to the server through HTTPS. Example :

        To retrieve work UIDs
        http://an_xwhep_server/?xwcommand=
        This gets :

        ...

        Then to retrieve a given work
        http://an_xwhep_server/?xwcommand=
        This gets :

      10. introducing Intel Itanium for Linux