Bug 817199 - Munge restart fails
Product: Fedora
Classification: Fedora
Component: munge
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Assigned To: Steve Traylen
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-04-27 22:14 EDT by bob mckay
Modified: 2013-08-30 09:47 EDT (History)
Doc Type: Bug Fix
Last Closed: 2013-08-30 09:47:58 EDT
Type: Bug
Description bob mckay 2012-04-27 22:14:58 EDT
Description of problem:
Munge won't restart in some circumstances. The problem can only be cured by a reboot.

Version-Release number of selected component (if applicable): 0.5.10

How reproducible: Always

Steps to Reproduce:
1. Queue and dequeue thousands of jobs to put munge into the problem state
   (at this point, systemctl status munge.service still reports OK, but I don't
    know whether it is really OK or not)
2. systemctl restart munge.service
Actual results:
Restart fails, logs show:
2012-04-28 09:51:48 Info:      PRNG seeded with 1024 bytes from "/var/lib/munge/
2012-04-28 09:51:48 Info:      Updating supplementary group mapping every 3600 s
2012-04-28 09:51:48 Info:      Enabled supplementary group mtime check of "/etc/
2012-04-28 09:51:48 Error:     Found existing socket "/var/run/munge/munge.socke

Expected results:
munge restarts

Additional info:
I'm puzzled why DAEMON_ARGS in /etc/sysconfig/munge doesn't include --force
Comment 1 bob mckay 2012-04-27 22:19:50 EDT
The related torque/pbs bug is bug 817198
Comment 2 bob mckay 2012-04-30 06:17:43 EDT
I've now discovered that the failure to restart may not be the real problem. At the point the problem arises (no queued pbs jobs can be made to run), an old munge process is still running; systemctl restart munge.service doesn't seem to kill it, and pbs doesn't seem to be able to communicate with it. So the real problem looks like a zombified munge process. Happy to supply any useful diagnostics, just not sure what they are...
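One diagnostic that might help here, offered as a minimal sketch rather than anything munge itself provides: probe the socket path named in the "Found existing socket" log line and see whether anything still accepts connections on it. The function name `probe_unix_socket` and the result labels are made up for illustration, and the socket path is passed on the command line because the log line above is truncated. A "live" result only shows that some process is listening on the socket; it does not show that the process answers munge requests.

```python
import errno
import os
import socket
import stat
import sys


def probe_unix_socket(path, timeout=2.0):
    """Classify a Unix domain socket path (hypothetical helper, not part of munge).

    Returns one of: "missing", "not-a-socket", "stale", "live",
    "unresponsive", or "error: <ERRNO NAME>".
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The socket file exists but no process is listening on it.
        return "stale"
    except socket.timeout:
        # The connection attempt did not complete within `timeout` seconds.
        return "unresponsive"
    except OSError as exc:
        # Any other failure (for example EACCES) is reported by errno name.
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        client.close()
    # Some process accepted the connection; it may or may not be healthy.
    return "live"


if __name__ == "__main__" and len(sys.argv) == 2:
    print(probe_unix_socket(sys.argv[1]))
```

Run it with the full socket path from the log as the only argument, once before and once after `systemctl restart munge.service`. A "stale" result would point to a leftover socket file with no listener, while "live" after a failed restart would suggest the old process is still holding the socket.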
Comment 3 Fedora End Of Life 2013-01-16 12:35:40 EST
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
Comment 4 Fedora End Of Life 2013-04-03 13:41:16 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect bugs from pre-Fedora 19 development
cycles. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
