Bug 447841 - prevents new kernels from installing
Summary: prevents new kernels from installing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mkinitrd
Version: 5.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Peter Jones
QA Contact:
URL:
Whiteboard:
Duplicates: 448421 452927 460823
Depends On:
Blocks:
Reported: 2008-05-22 03:02 UTC by John T. Rose
Modified: 2018-10-20 03:16 UTC
CC List: 20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 22:12:28 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2009:0237
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: mkinitrd bug fix and enhancement update
Last Updated: 2009-01-20 16:06:39 UTC

Description John T. Rose 2008-05-22 03:02:54 UTC
Description of problem:

During and after an update from 5.1 to 5.2 the new kernel fails to install.
During the post I see

/sbin/mkinitrd: line 368: cd: slaves: No such file or directory

and mkinitrd continues running consuming nearly all CPU until killed.

This system is a Desktop+Workstation.

Version-Release number of selected component (if applicable):

mkinitrd-5.1.6-28
kernel-2.6.18-92

How reproducible:

yum update kernel reproduces the problem after the update to RHEL5.2

Steps to Reproduce:
1. see description above
  
Actual results:

mkinitrd hangs while installing new kernel

Expected results:

new kernel installs

Additional info:

Downgrading both mkinitrd and nash to the previous version 5.1.6-19 allows the
kernel to install successfully.

Comment 3 John T. Rose 2008-05-27 22:35:27 UTC
Ok consider the following snippet from mkinitrd:

findstoragedriverinsys () {
    if echo $PWD | grep -q "/sys/block/dm-[[:digit:]]$"; then
        while [ ! -L device ]; do
            if [ -L subsystem ]; then
                cd slaves
                for x in *;do
                    if [ -L $x ]; then
                        cd $x;
                        break
                    fi;
                done
            fi
        done
    fi

and the following directory tree:

[root@shadow ~]# ls -lRF /sys/block/dm-1/
/sys/block/dm-1/:
total 0
-r--r--r-- 1 root root 4096 May 27 16:48 dev
drwxr-xr-x 2 root root    0 May 27 16:48 holders/
-r--r--r-- 1 root root 4096 May 27 16:48 range
-r--r--r-- 1 root root 4096 May 27 16:50 removable
-r--r--r-- 1 root root 4096 May 27 16:50 size
drwxr-xr-x 2 root root    0 May 27 16:48 slaves/
-r--r--r-- 1 root root 4096 May 27 16:50 stat
lrwxrwxrwx 1 root root    0 May 27 16:48 subsystem -> ../../block/
--w------- 1 root root 4096 May 27 16:50 uevent

/sys/block/dm-1/holders:
total 0

/sys/block/dm-1/slaves:
total 0
lrwxrwxrwx 1 root root 0 May 27 16:48 hda4 -> ../../../block/hda/hda4/

This descends to /sys/block/dm-1/slaves/hda4/ and breaks the for loop. Back we
go to the top with

[root@shadow ~]# ls -lRF /sys/block/dm-1/slaves/hda4/
/sys/block/dm-1/slaves/hda4/:
total 0
-r--r--r-- 1 root root 4096 May 27 16:47 dev
drwxr-xr-x 2 root root    0 May 27 16:50 holders/
-r--r--r-- 1 root root 4096 May 27 16:48 size
-r--r--r-- 1 root root 4096 May 27 16:48 start
-r--r--r-- 1 root root 4096 May 27 16:50 stat
lrwxrwxrwx 1 root root    0 May 27 16:48 subsystem -> ../../../block/
--w------- 1 root root 4096 May 27 16:47 uevent

/sys/block/dm-1/slaves/hda4/holders:
total 0
lrwxrwxrwx 1 root root 0 May 27 16:50 dm-1 -> ../../../../block/dm-1/

Here the cd to the slaves directory fails because it doesn't exist, but we do cd
into the one link that does exist, subsystem. That takes us back to /sys/block/,
which has neither a device link nor a subsystem link, so the while loop spins
there forever without moving.

Hope the added detail sheds some light on the problem.
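
For illustration only, here is a minimal sketch of how a walk like this could be made to terminate. It is not the shipped fix and not the CentOS patch; find_backing_device and its max_depth argument are hypothetical names made up for this example. It assumes that the walk should give up whenever there is no slaves directory to descend into, and it only ever follows links under slaves/, so it can never wander into subsystem.

```shell
#!/bin/sh
# Hypothetical helper (not part of mkinitrd): starting from a sysfs block
# directory, follow slaves/ links until a directory containing a "device"
# link is found. Fails instead of spinning when it cannot make progress.
# The body runs in a subshell, so the caller's working directory is untouched.
find_backing_device() (
    max_depth=${2:-16}   # assumed cap; guards against symlink cycles
    depth=0
    cd "$1" || exit 1
    while [ ! -L device ]; do
        # No slaves directory here (e.g. a partition): nothing to descend into.
        [ -d slaves ] || exit 1
        [ "$depth" -lt "$max_depth" ] || exit 1
        depth=$((depth + 1))
        moved=0
        # Only consider entries under slaves/, never links such as "subsystem".
        for x in slaves/*; do
            if [ -L "$x" ] && cd "$x" 2>/dev/null; then
                moved=1
                break
            fi
        done
        # An empty slaves directory also means no progress is possible.
        [ "$moved" -eq 1 ] || exit 1
    done
    pwd -P
)
```

With the tree shown above, a call such as find_backing_device /sys/block/dm-1 would descend into slaves/hda4, find neither a device link nor a slaves directory there, and return a failure status rather than looping. How a caller should then locate the driver for the parent disk is a separate question that this sketch does not try to answer.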

Comment 4 Ian Pilcher 2008-06-03 17:53:19 UTC
*** Bug 448421 has been marked as a duplicate of this bug. ***

Comment 5 John T. Rose 2008-06-08 02:31:04 UTC
In my case I have two partitions that fall into this category: an
encrypted home partition and an encrypted swap partition.

I would request that this bug be marked a REGRESSION and considered for an
update in the current release. Thanks.

Comment 6 Devin Reade 2008-06-26 05:41:40 UTC
See also http://bugs.centos.org/view.php?id=2914 which contains a patch to fix
this.  Note that the patch has three chunks, only the first of which is strictly
necessary in this case.  See the above link for details.

My system uses encryption, too.

Comment 7 Till Maas 2008-06-27 11:08:36 UTC
I encountered this bug also on a machine that does not use encryption.

Comment 8 bugreports2005 2008-07-01 07:46:32 UTC
Since this bug causes kernel updates to hang in a busyloop I think the severity
should be elevated a bit from "medium". The patch two clicks from comment #6
works for me.

Please fix as soon as possible.

Comment 13 Ted Peterson 2008-08-12 19:32:11 UTC
I am seeing this bug on an x86_64 system that does not use encryption. Both the root file system and swap are on a dm device. Downgrading mkinitrd didn't help. Looks like we need another mkinitrd update.

Please fix this as soon as possible so we can upgrade the Kernel to 2.6.18-92.1.10.el5.

Comment 14 Peter Jones 2008-08-13 19:08:36 UTC
Can you please try the packages at http://people.redhat.com/pjones/mkinitrd/RHEL-5/ and report back if those work?

Comment 15 Sebastian Skracic 2008-08-14 12:01:49 UTC
I was able to successfully up2date kernels using nash-5.1.19.6-28.1.i386.rpm and mkinitrd-5.1.19.6-28.1.i386.rpm.  Thumbs up from me!

Comment 16 Ted Peterson 2008-08-14 18:49:00 UTC
Thanks. The updated nash and mkinitrd packages fixed the "/sbin/mkinitrd: line 368..." bug for me.  


(In reply to comment #13)
> I am seeing this bug on an x86_64 system that does not use encryption. Both
> the root file system and swap are on a dm device. Downgrading mkinitrd didn't
> help. Looks like we need another mkinitrd update.
> 
> Please fix this as soon as possible so we can upgrade the Kernel to
> 2.6.18-92.1.10.el5.

Comment 17 Jeff Lawson 2008-08-27 20:34:23 UTC
After installing the relevant RPMs from comment 14, I no longer received the 
"/sbin/mkinitrd: line 368: cd: slaves: No such file or directory" error, however I did encounter a new one:

$ rpm -Uvh http://people.redhat.com/pjones/mkinitrd/RHEL-5/x86_64/mkinitrd-5.1.19.6-28.1.x86_64.rpm http://people.redhat.com/pjones/mkinitrd/RHEL-5/x86_64/nash-5.1.19.6-28.1.x86_64.rpm http://people.redhat.com/pjones/mkinitrd/RHEL-5/x86_64/libbdevid-python-5.1.19.6-28.1.x86_64.rpm http://people.redhat.com/pjones/mkinitrd/RHEL-5/i386/mkinitrd-5.1.19.6-28.1.i386.rpm
...

$ yum update
...
Running Transaction
  Installing: kernel                       ######################### [1/1]
/sbin/scsi_id: option requires an argument -- s
/sbin/scsi_id: option requires an argument -- s

Installed: kernel.x86_64 0:2.6.18-92.1.10.el5
Complete!

Comment 18 Peter Jones 2008-09-02 18:03:37 UTC
(In reply to comment #17)
> $ yum update
> ...
> Running Transaction
>   Installing: kernel                       ######################### [1/1]
> /sbin/scsi_id: option requires an argument -- s
> /sbin/scsi_id: option requires an argument -- s

Jeff, this appears to be an unrelated problem.  Does the resulting initrd work for you?  I haven't fully narrowed this problem down yet, but as far as I can see, it should result in an unbootable system if you're using an EMC powerpath based multipath device for your root filesystem.  Is that your setup?

Comment 19 Jeff Lawson 2008-09-03 06:53:18 UTC
My system seems able to boot using the initrd, so I don't know the impact of that above scsi_id error message.  I'm using the SATA controller on my Gigabyte X46 DQ6 motherboard, so there's nothing EMC related here:

SCSI subsystem initialized
libata version 3.00 loaded.
ata_piix 0000:00:1f.2: version 2.12
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UDMA/133 cmd 0xd600 ctl 0xd700 bmdma 0xda00 irq 177
ata2: SATA max UDMA/133 cmd 0xd800 ctl 0xd900 bmdma 0xda08 irq 177
ata2.00: HPA detected: current 390719855, native 390721968
ata2.00: ATA-6: WDC WD2000JD-00GBB0, 02.05D02, max UDMA/100
ata2.00: 390719855 sectors, multi 16: LBA48
ata2.00: applying bridge limits
ata2.00: configured for UDMA/100
  Vendor: ATA       Model: WDC WD2000JD-00G  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 390719855 512-byte hdwr sectors (200049 MB)

Comment 20 Peter Jones 2008-09-18 16:27:29 UTC
*** Bug 452927 has been marked as a duplicate of this bug. ***

Comment 22 Zbysek MRAZ 2008-11-04 12:47:12 UTC
*** Bug 460823 has been marked as a duplicate of this bug. ***

Comment 23 Devin Reade 2008-11-10 03:26:18 UTC
This version described in comment 14 seems to be fine.  Can we get it pushed out
to the main stream 5.2 release?

Comment 24 John T. Rose 2008-11-10 04:29:36 UTC
I should report back that I tested installing the recent new kernel with the packages from comment #14 and all worked as expected for me.

Comment 28 Larry 2008-12-18 18:19:58 UTC
I am running into this issue with RHEL5.2.  The URL referred to in Comment #14 doesn't seem to be available anymore (404).  What is the best way to address it?

Comment 29 Peter C. Lai 2009-01-15 06:01:06 UTC
As far as I can tell, this is still broken, on i386 xen (dom0) as well. No encryption and the root filesystem should not be under dm so no idea why it would be running into this problem.

Comment 30 Peter C. Lai 2009-01-15 06:45:00 UTC
(In reply to comment #29)
> As far as I can tell, this is still broken, on i386 xen (dom0) as well. No
> encryption and the root filesystem should not be under dm so no idea why it
> would be running into this problem.

I manually patched mkinitrd to break out of the loop if the slave entry is not a directory. This allows mkinitrd to complete, but the kernel install then exits with "unable to find grub template". grub.conf has to be edited manually to add references to the new kernel and initrd, but the system is bootable after that.

Comment 31 errata-xmlrpc 2009-01-20 22:12:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0237.html

