Bug 817199

Summary: Munge restart fails
Product: [Fedora] Fedora
Reporter: bob mckay <urilabob>
Component: munge
Assignee: Steve Traylen <steve.traylen>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 19
CC: steve.traylen
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-30 13:47:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description bob mckay 2012-04-28 02:14:58 UTC
Description of problem:
Munge won't restart in some circumstances. The problem can only be cured by a reboot.

Version-Release number of selected component (if applicable): 0.5.10

How reproducible: Always

Steps to Reproduce:
1. Queue and dequeue thousands of jobs to put munge into the problem state
   (at this point, systemctl status munge.service still reports OK, but I don't
    know whether it is really OK or not)
2. systemctl restart munge.service

Actual results:
Restart fails, logs show:
2012-04-28 09:51:48 Info:      PRNG seeded with 1024 bytes from "/var/lib/munge/munge.seed"
2012-04-28 09:51:48 Info:      Updating supplementary group mapping every 3600 seconds
2012-04-28 09:51:48 Info:      Enabled supplementary group mtime check of "/etc/group"
2012-04-28 09:51:48 Error:     Found existing socket "/var/run/munge/munge.socket.2"
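
The "Found existing socket" error does not say whether anything is still listening on that path. A minimal sketch of one way to check, assuming Python is available on the affected host; the helper name socket_state is hypothetical and is not part of munge:

```python
import os
import socket
import stat


def socket_state(path, timeout=2.0):
    """Classify a Unix socket path.

    Returns 'missing', 'not-a-socket', 'stale', 'unresponsive' or 'live'.
    Hypothetical diagnostic helper; it is not part of munge.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The socket file exists but no process is listening on it.
        return "stale"
    except (socket.timeout, BlockingIOError):
        # The connection could not be completed in time, for example
        # because the listener's accept backlog is full.
        return "unresponsive"
    finally:
        client.close()
    # A listener is bound to the path. This alone does not prove that
    # the daemon is actually answering requests.
    return "live"
```

Calling socket_state("/var/run/munge/munge.socket.2") just before the restart would distinguish a leftover socket file ("stale") from one that still has a listener ("live" or "unresponsive").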

Expected results:
munge restarts

Additional info:
I'm puzzled why DAEMON_ARGS in /etc/sysconfig/munge doesn't include --force

Comment 1 bob mckay 2012-04-28 02:19:50 UTC
The related torque/pbs bug is bug 817198.

Comment 2 bob mckay 2012-04-30 10:17:43 UTC
I've now discovered that the failure to restart may not be the real problem. At the point the problem arises (inability to get any queued pbs jobs to run), there is an old munge process still running, but it doesn't seem to be killed by systemctl restart munge.service, and pbs doesn't seem to be able to communicate with it. So the real problem looks like a zombified munge process. Happy to supply any useful diagnostics, just not sure what they are...
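
One diagnostic that might help here is the kernel's view of the leftover process. The sketch below reads the state field from /proc/<pid>/stat for every process with a given command name. It assumes a Linux /proc filesystem and that the daemon's command name is munged; the function names parse_stat and find_processes are hypothetical.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def find_processes(name, proc_root="/proc"):
    """List (pid, state) for every process whose command name equals name."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited, or we are not allowed to read it
        if comm == name:
            found.append((pid, state))
    return sorted(found)
```

Running find_processes("munged") before and after systemctl restart munge.service would show whether the old process survives the restart. A state of Z means a true zombie (exited but not yet reaped), D means uninterruptible sleep, and S means ordinary sleep, which would help distinguish a hung daemon from a defunct one.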

Comment 3 Fedora End Of Life 2013-01-16 17:35:40 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Fedora End Of Life 2013-04-03 17:41:16 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect bugs from development
cycles before Fedora 19. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19