Red Hat Bugzilla – Bug 817199
Munge restart fails
Last modified: 2013-08-30 09:47:58 EDT
Description of problem:
Munge won't restart in some circumstances. The problem can only be cured by a reboot.
Version-Release number of selected component (if applicable): 0.5.10
How reproducible: Always
Steps to Reproduce:
1. Queue and dequeue thousands of jobs to put munge into the problem state
(at this point, systemctl status munge.service still reports OK, but I don't
know whether it is really OK or not)
2. systemctl restart munge.service
Restart fails, logs show:
2012-04-28 09:51:48 Info: PRNG seeded with 1024 bytes from "/var/lib/munge/
2012-04-28 09:51:48 Info: Updating supplementary group mapping every 3600 s
2012-04-28 09:51:48 Info: Enabled supplementary group mtime check of "/etc/
2012-04-28 09:51:48 Error: Found existing socket "/var/run/munge/munge.socke
I'm puzzled why DAEMON_ARGS in /etc/sysconfig/munge doesn't include --force.
The related torque/pbs bug is bug 817198.
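One way to tell whether the "Found existing socket" error points at a leftover socket file or at a daemon that is still listening is to try connecting to the socket. The following is only a rough sketch, not part of munge: the function name probe_unix_socket and its return labels are made up for illustration, and it assumes the daemon listens on a Unix stream socket.

```python
import os
import socket
import stat


def probe_unix_socket(path, timeout=2.0):
    """Classify a Unix socket path (hypothetical helper, not a munge API).

    Returns one of: "missing", "not-a-socket", "stale", "live", "unreachable".
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    # Assumption: the daemon uses a stream socket.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The socket file exists but no process is accepting on it.
        return "stale"
    except OSError:
        # Timeouts, permission errors, and similar failures end up here.
        return "unreachable"
    finally:
        client.close()
    return "live"
```

Run as root against the socket path from the error message: "stale" would suggest a leftover file from a daemon that has gone away, while "live" or "unreachable" would suggest an old process is still holding the socket.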
I've now discovered that the failure to restart may not be the real problem. An old munge process is still running at the point the problem (inability to get any queued pbs jobs to run) arises, but it doesn't seem to be killed by systemctl restart munge.service, and pbs doesn't seem to be able to communicate with it. So the real problem looks like a zombified munge process. Happy to supply any useful diagnostics, just not sure what they are...
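As one possible diagnostic, the kernel's view of the lingering process can be read from /proc. The sketch below is an illustration only: the helper names parse_stat and find_processes are invented here, and it assumes a Linux /proc filesystem and that the daemon's command name is munged. It lists each matching PID with its state code, where Z means a true zombie (exited but not reaped), D means uninterruptible sleep, and S means ordinary sleep.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' rather than on whitespace.
    left, _, right = stat_line.rpartition(")")
    pid, _, comm = left.partition("(")
    return int(pid), comm, right.split()[0]


def find_processes(name, proc_root="/proc"):
    """List (pid, state) for every process whose command name equals name."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as handle:
                pid, comm, state = parse_stat(handle.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        if comm == name:
            matches.append((pid, state))
    return matches
```

Calling find_processes("munged") before and after systemctl restart munge.service would show whether the old PID survives the restart and what state it is in. A process in state S or D is hung rather than a zombie in the strict sense, which would change what to look at next.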
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '16'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 16's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 16 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" and open it against that version of Fedora.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.
(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
More information and reason for this action is here: