Bug 506988 - grubby runs "forever" during kernel upgrades in systems with multiple disks
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: mkinitrd
Version: 10
Hardware: i686 Linux
Priority: low
Severity: medium
Assigned To: Peter Jones
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-06-19 12:56 EDT by Ken Chilton
Modified: 2009-12-18 04:34 EST
CC: 4 users

Doc Type: Bug Fix
Last Closed: 2009-12-18 04:34:28 EST


Attachments
strace output of grubby (3.75 MB, text/plain)
2009-06-22 10:55 EDT, Ken Chilton

Description Ken Chilton 2009-06-19 12:56:11 EDT
Description of problem:

I have a small Torque cluster (8 nodes) using a shared pool of disks for application storage.  Each node also has a private disk for the OS, swap, etc.

The shared pool of storage consists of 15 Fiber Channel disks, each with 11 partitions (three primary, eight extended).  LVM (lvm2-cluster) is used to stripe the partitions to present 11 LVs.  Each node in the cluster has at least one LV marked as active, which it then mounts (via fstab).  However, each node has /dev entries for all of the physical devices and all of their partitions.

So, in /dev (and in the /sys structure) there are many (well over 100) partitions visible to each node in this system.
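
A rough sketch of how such a pool might be created (illustrative only; the device names, sizes, and stripe parameters here are hypothetical, and the report does not include the actual commands):

pvcreate /dev/sd[b-p]1                       # one of the 11 partitions on each of the 15 FC disks
vgcreate --clustered y poolvg /dev/sd[b-p]1  # clustered VG; requires lvm2-cluster/clvmd
lvcreate -i 15 -I 64 -L 500G -n lv01 poolvg  # LV striped across all 15 PVs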

When upgrading the kernel recently, two parts of mkinitrd apparently hang: nash and grubby.  After a recent upgrade of nash, it no longer hangs; grubby still does.  By "hangs" I mean it runs for days, consuming nearly 100% of a CPU core.  As a result, kernel upgrades do not make the /etc/grub.conf changes, and other upgrades, such as those via yum, do not occur.

I have performed an strace on the grubby process.  I see that grubby is walking through the partitions.  The sequence for each partition (a rough shell approximation follows the list):
 a getdents, followed by a close
 an open to /sys/block/[the physical]/[the partition]/dev,
   a read(8:197\n),
   another read,
 a close,
 an access to the partition via /dev/[partition],
 an open of /proc/devices
  a read(Character devices:\n 1 mem\n 4 /d)
  another read
 a close,
 an open of /proc/misc
  a read(229 fuse\n 57 dlm_plock\n 58 dlm...)
  another read
 a close
 an open of /sys/block/[the physical]/[the partition]/slaves
   -- result is -1, no such file
 an open of /sys/block/[the physical]/[the partition]
  an fcntl64(F_GETFD)
  an fcntl64(F_SETFD, FD_CLOEXEC)
  a getdents
 an open of /sys/block/[the physical]/[the partition]/uevent/dev
   -- result is -1, not a directory
 an open of /sys/block/[the physical]/[the partition]/dev/dev
   -- result is -1, not a directory
 an open of /sys/block/[the physical]/[the partition]/subsystem/dev
   -- result is -1, no such file or directory
 an open of /sys/block/[the physical]/[the partition]/start/dev
   -- result is -1, not a directory
 an open of /sys/block/[the physical]/[the partition]/size/dev
   -- result is -1, not a directory
 an open of /sys/block/[the physical]/[the partition]/stat/dev
   -- result is -1, not a directory
 an open of /sys/block/[the physical]/[the partition]/power/dev
   -- result is -1, no such file or directory
 an open of /sys/block/[the physical]/[the partition]/holders/dev
   -- result is -1, no such file or directory
  a getdents
 a close
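
For illustration, a shell loop approximating that per-partition probe sequence (my reconstruction from the strace output, not grubby's actual code; grubby does this internally in C, but the sysfs and proc paths above are the real files it touches):

for part in /sys/block/*/*[0-9]; do
    cat "$part/dev" > /dev/null            # read major:minor, e.g. 8:197
    test -e "/dev/$(basename "$part")"     # access the /dev node
    cat /proc/devices > /dev/null          # full read of the device list
    cat /proc/misc > /dev/null             # full read of the misc devices
    ls "$part/slaves" 2> /dev/null         # usually ENOENT for plain partitions
    ls "$part" > /dev/null                 # getdents over the partition directory
done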

Every partition scan I have seen looks the same, but each partition takes longer to run the sequence than the one before.  The first few partitions complete it in a few seconds; by the time it gets to /dev/sdn, each takes several minutes.  Eventually, grubby stops making apparent forward progress.
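
For scale (my own arithmetic, not taken from grubby's source): 15 disks x 11 partitions is 165 partitions.  If probing partition k also re-walks the k-1 entries already seen, the total work over n partitions is 1 + 2 + ... + n = n(n+1)/2; for n = 165 that is 165 * 166 / 2 = 13,695 probe passes, which would match the observed pattern of each partition taking longer than the previous one.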

If I remove lvm2-cluster and reboot, I can get the system to come up with only the local /dev/sda (and /dev/sda1, /dev/sda2).  Then I can install the kernel without a hitch, and the whole rpm/mkinitrd/nash/grubby chain completes very quickly and makes the appropriate /boot/grub/grub.conf entries.  However, if I re-install lvm2-cluster, reboot, and see the 100+ partitions in /dev again, I can no longer do a kernel upgrade.
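
The workaround, spelled out as commands (reconstructed from the description above; the reboots are what change the visible partition set):

yum -y remove lvm2-cluster     # drop clvmd so the shared LVs stay inactive
reboot                         # come back up with only /dev/sda visible
yum -y upgrade kernel          # grubby now finishes quickly
yum -y install lvm2-cluster    # restore cluster LVM
reboot                         # the 100+ partitions reappear, and so does the hang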

A grubby command line which exhibits the problem was:
/sbin/grubby --add-kernel=/boot/vmlinuz-2.6.29.4-167.fc11.i686.PAE --initrd /boot/initrd-2.6.29.4-167.fc11.i686.PAE.img --copy-default --make-default --title "Fedora (2.6.29.4-167.fc11.i686.PAE)" --args=root=/dev/VolGroup00/LogVol00 --remove-kernel="TITLE=Fedora (2.6.29.4-167.fc11.i686.PAE)"

Version-Release number of selected component (if applicable):
grubby-6.0.86-2.fc11.i586

How reproducible:
Every time, on all eight nodes

Steps to Reproduce:
1. yum -y upgrade kernel
Actual results:
grubby runs for days, consuming nearly 100% of a CPU core

Expected results:
grubby should run quickly and modify /boot/grub/grub.conf as appropriate

Additional info:
available on request
Comment 1 Ken Chilton 2009-06-22 10:55:10 EDT
Created attachment 348924 [details]
strace output of grubby

This is an strace of the grubby run.  It did eventually complete.
Comment 2 Ken Chilton 2009-06-22 10:56:22 EDT
I have attached an strace of a grubby run that finally completed after several days.
Comment 3 Bug Zapper 2009-11-18 07:06:39 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue, and we are sorry that 
we may not be able to fix it before Fedora 10 reaches end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 4 Bug Zapper 2009-12-18 04:34:28 EST
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
