Description of problem:
Munge won't restart in some circumstances. The problem can only be cured by a reboot.

Version-Release number of selected component (if applicable):
0.5.10

How reproducible:
Always

Steps to Reproduce:
1. Queue and dequeue thousands of jobs to put munge into the problem state (at this point, systemctl status munge.service still reports OK, but I don't know whether it is really OK or not)
2. systemctl restart munge.service

Actual results:
Restart fails, logs show:
2012-04-28 09:51:48 Info: PRNG seeded with 1024 bytes from "/var/lib/munge/munge.seed"
2012-04-28 09:51:48 Info: Updating supplementary group mapping every 3600 seconds
2012-04-28 09:51:48 Info: Enabled supplementary group mtime check of "/etc/group"
2012-04-28 09:51:48 Error: Found existing socket "/var/run/munge/munge.socket.2"

Expected results:
munge restarts

Additional info:
I'm puzzled why DAEMON_ARGS in /etc/sysconfig/munge doesn't include --force
The related torque/pbs bug is bug 817198
I've now discovered that the failure to restart may not be the real problem - there is an old munge process still running at the point the problem (inability to get any queued pbs jobs to run) arises, but it doesn't seem to be killed by systemctl restart munge.service, and pbs doesn't seem to be able to communicate with it. So the real problem looks like a zombified munge process. Happy to supply any useful diagnostics, just not sure what they are...
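One diagnostic that might help separate the two cases described here (a leftover socket file versus an old process that still holds the socket) is to probe the socket path from the log directly. The following is only a minimal sketch: the function name `probe_unix_socket` and its return labels are made up for illustration, and it assumes the daemon listens on a stream-type Unix socket. It relies on the general behaviour that a bound Unix socket file stays on disk after its owner exits unless it is unlinked, so a refused connection suggests nothing is listening any more.

```python
import os
import socket
import stat


def probe_unix_socket(path, timeout=2.0):
    """Classify a Unix socket path (hypothetical helper, not part of munge).

    Returns one of: 'missing', 'not-a-socket', 'stale', 'unresponsive', 'live'.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    # Assumption: the daemon uses a SOCK_STREAM Unix socket.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The file exists, but no process is accepting connections on it.
        return "stale"
    except (socket.timeout, BlockingIOError):
        # Something owns the socket but is not taking new connections.
        return "unresponsive"
    finally:
        client.close()
    return "live"
```

Calling `probe_unix_socket("/var/run/munge/munge.socket.2")` as a user allowed to read that directory, before and after the failed restart, would show whether the path is a leftover file ("stale") or is still held by the old process ("live" or "unresponsive"). A "live" result only means the connection was accepted into the listen queue, not that the daemon answers requests correctly.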
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19