News & Announcements
Important message for NIH Biowulf Users (Biowulf)
Date: 07 May 2012 13:05:00
From: steven fellini (sfellini@NIH.GOV)
Over the next few weeks, the Biowulf batch system will be reconfigured to ENFORCE MEMORY LIMITS on jobs submitted to the cluster. The first nodes to be reconfigured are the "e2666:g24" nodes.

This means that, if your job needs more memory than exists on the allocated node, rather than hang the node (which is what happens now, and requires both staff and operations intervention), the job will be killed by the system.

What will change?

-- you will no longer get emails from staff saying that a job has been deleted because of exceeding memory limits.
-- instead, the job will be killed without notification.
-- users will be responsible for keeping track of jobs that do not return results.
-- a new tool, 'jobcheck', will allow you to check the status of a completed job.

See below for an example of 'jobcheck' output.

$ jobcheck 756134
Job 756134 (memhog) ran for 0 hours 53 minutes 48 seconds
Started at 2012-05-02 13:50:30 and ended at 2012-05-02 14:44:18 with exit status 265
Used 22703 MB memory out of 22700 MB
JOB EXHAUSTED MEMORY and was probably killed by the system
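
Since killed jobs will no longer trigger a notification, it may help to check saved 'jobcheck' output with a script. The following is only a minimal sketch in Python, assuming the output always looks like the sample in this message. The function names (parse_memory_usage, exhausted_memory) are hypothetical and are not part of 'jobcheck', and the sketch only reads text you have already captured; it does not run 'jobcheck' itself.

```python
import re

# Matches the memory line as it appears in the sample output in this message.
# ASSUMPTION: the live 'jobcheck' tool always prints the line in this form.
MEMORY_LINE = re.compile(r"Used\s+(\d+)\s+MB memory out of\s+(\d+)\s+MB")


def parse_memory_usage(report):
    """Return (used_mb, limit_mb) from jobcheck-style text, or None if absent."""
    match = MEMORY_LINE.search(report)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def exhausted_memory(report):
    """Return True if the report suggests the job hit its memory limit."""
    # The explicit marker from the sample output is the strongest signal.
    if "JOB EXHAUSTED MEMORY" in report:
        return True
    usage = parse_memory_usage(report)
    if usage is None:
        return False
    used_mb, limit_mb = usage
    return used_mb >= limit_mb


# Sample text copied from the example in this message.
SAMPLE_REPORT = """\
Job 756134 (memhog) ran for 0 hours 53 minutes 48 seconds
Started at 2012-05-02 13:50:30 and ended at 2012-05-02 14:44:18 with exit status 265
Used 22703 MB memory out of 22700 MB
JOB EXHAUSTED MEMORY and was probably killed by the system
"""

if __name__ == "__main__":
    print(parse_memory_usage(SAMPLE_REPORT))
    print(exhausted_memory(SAMPLE_REPORT))
```

To use it, save the output of 'jobcheck' for a finished job to a file and pass the file contents to exhausted_memory. If the real output differs from the sample, the pattern will need adjusting.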