Enforce job-level memory limits in LSF
LSF enforcement of job memory limits with job termination:
This gives LSF control over how much memory jobs can use. LSF terminates any job that reaches the configured memory limit. LSF looks at the sum of the memory all job processes consume to determine if a job has reached the memory limit.
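A minimal sketch of the per-job rule described above, assuming memory usage has already been sampled per process in KB. The helper name job_exceeds_memlimit and the sample values are hypothetical, and LSF's own sampling by its daemons is not modeled. It shows why the sum matters: a job can exceed the limit even though no single process does. Treating "reaches/exceeds" as strictly greater than is also an assumption.

```python
def job_exceeds_memlimit(process_usage_kb: list[int], memlimit_kb: int) -> bool:
    """Hypothetical helper: compare a job's total memory with its limit.

    The per-job rule sums usage across every process in the job, so a job
    can go over the limit even when each process is individually below it.
    Treating "exceeds" as strictly greater than is an assumption.
    """
    total_kb = sum(process_usage_kb)
    return total_kb > memlimit_kb


if __name__ == "__main__":
    # Three processes of 2000 KB each, checked against MEMLIMIT = 5000 (KB).
    print(job_exceeds_memlimit([2000, 2000, 2000], 5000))  # True: 6000 > 5000
    print(job_exceeds_memlimit([1000, 1500], 5000))         # False: 2500 <= 5000
```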
Add memory limit parameters to lsf.conf:
LSB_MEMLIMIT_ENFORCE=Y
LSB_JOB_MEMLIMIT=Y
Specify a memory limit in lsb.queues or lsb.applications:
MEMLIMIT = 5000 #Memory limit of 5000 KB
Reconfigure LSF:
lsadmin reconfig
badmin reconfig
badmin hrestart all
You can specify a memory limit at the queue level (lsb.queues), at the application profile level (lsb.applications), or at job submission. Use the -M option when submitting jobs to specify a memory limit. For example,
bsub -M 5000 myjob.sh
LSF will allow this job to consume a maximum of 5000 KB of memory before terminating it.
The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by LSF is enabled. The per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS are enabled.
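The two modes can be sketched as a small model of which enforcement layers a usage sample would trip. Everything here is illustrative: the Enforcement class, the mode constants, and triggered_limits are hypothetical names, the same limit value is assumed for both layers, and the case where both parameters are set (as in the lsf.conf example earlier) is not modeled because the text does not describe it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enforcement:
    per_job: bool      # LSF compares the sum over all job processes
    per_process: bool  # the OS limits each process on its own


# The two modes as described above, when only one parameter is set to y.
JOB_MEMLIMIT_ONLY = Enforcement(per_job=True, per_process=False)
MEMLIMIT_ENFORCE_ONLY = Enforcement(per_job=True, per_process=True)


def triggered_limits(process_usage_kb: list[int], limit_kb: int,
                     mode: Enforcement) -> list[str]:
    """Return which enforcement layers a usage sample would trip.

    Simplification (assumption): the same limit applies to both layers.
    """
    hits = []
    if mode.per_job and sum(process_usage_kb) > limit_kb:
        hits.append("per-job")
    if mode.per_process and any(u > limit_kb for u in process_usage_kb):
        hits.append("per-process")
    return hits


if __name__ == "__main__":
    one_big_process = [6000]
    print(triggered_limits(one_big_process, 5000, JOB_MEMLIMIT_ONLY))
    print(triggered_limits(one_big_process, 5000, MEMLIMIT_ENFORCE_ONLY))
```

With LSB_JOB_MEMLIMIT only LSF acts on the oversized process; with LSB_MEMLIMIT_ENFORCE the OS per-process limit applies as well.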
LSB_JOB_MEMLIMIT disables per-process memory limit enforced by the OS and enables per-job memory limit enforced by LSF. When the total memory allocated to all processes in the job exceeds the memory limit, LSF sends the following signals to kill the job: SIGINT first, then SIGTERM, then SIGKILL.
On UNIX, the time interval (in seconds) between SIGINT, SIGTERM, and SIGKILL can be configured with the JOB_TERMINATE_INTERVAL parameter in lsb.params.
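The SIGINT, SIGTERM, SIGKILL sequence can be illustrated with a sketch that escalates signals against a single POSIX child process, with interval_s standing in for JOB_TERMINATE_INTERVAL. This is not LSF's implementation: the function terminate_with_escalation is hypothetical, and LSF signals every process in the job, while this sketch handles only one process. It stops early as soon as the process exits.

```python
import signal
import subprocess
import sys


def terminate_with_escalation(proc: subprocess.Popen,
                              interval_s: float) -> list[signal.Signals]:
    """Send SIGINT, SIGTERM, then SIGKILL, waiting interval_s after each.

    Stops as soon as the process exits and returns the signals sent.
    interval_s plays the role of JOB_TERMINATE_INTERVAL in this sketch.
    """
    sent = []
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        if proc.poll() is not None:
            break  # already exited: no further signals needed
        proc.send_signal(sig)
        sent.append(sig)
        try:
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            pass  # still running: escalate to the next signal
    proc.wait()  # reap the child; SIGKILL cannot be caught or ignored
    return sent


if __name__ == "__main__":
    # A child that ignores SIGINT and SIGTERM, so only SIGKILL stops it.
    child = (
        "import signal, time\n"
        "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    p = subprocess.Popen([sys.executable, "-c", child],
                         stdout=subprocess.PIPE, text=True)
    p.stdout.readline()  # wait until the child has installed its handlers
    print([s.name for s in terminate_with_escalation(p, 0.5)])
    print(p.returncode)
    p.stdout.close()
```

Waiting for the child's "ready" line avoids a race where a signal arrives before the child has set its handlers.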
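Reader comments:
If it can't be changed on its own, try putting that node in a queue of its own and changing the setting there. Then run test jobs.
Of course, my ultimate suggestion is to classify the jobs, identify the job types, and then limit the number of jobs; that approach may work better.
You may run into a bug with this.
When we ran vcs before, we got stuck in an endless loop and eventually gave up.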