Bug 76312

Summary: pivotroot panic on 7.3 kernel when an external scsi data disk is attached
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <dwtrusty>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium    
Version: 7.3   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2002-10-31 17:33:40 UTC

Description Need Real Name 2002-10-19 19:17:30 UTC

Description of problem:
We have been trying for weeks to solve this problem to no avail.

We tried to upgrade our Dell 4600 system to Red Hat 7.3.

The system has an external SCSI device with three SCSI disks (IDs 2, 3, and 4).
The external SCSI controller is an AIC7899.

The internal root disks are SCSI RAID using an AIC7890 controller.
Our 7.3 kernel is 2.4.18-10smp.

When our external SCSI disk device is connected, we get these errors:

   VFS: mounted root (ext2 filesystem)
   Redhat nash version 3.3.10
   Loading jbd module
   Journalled Block Device driver loaded
   Loading ext3 filesystem
   Mounting /proc filesystem
   Creating root device
   Mounting root filesystem
   kmod: failed to exec /sbin/modprobe -s -k block-major-8 errno=2
   mount: error 6 mounting ext3
   pivotroot: pivot_root (/sysroot, /sysroot/initrd) failed: 2
   freeing unused kernel memory: 320k
   kernel panic: No init found

Here are some more symptoms:
   1. If we disconnect the external SCSI device, it boots fine.
   2. We get the same error even if we try to boot with the boot floppy
      made during the Red Hat installation process.
   3. The filesystems on the external SCSI disks are not mounted (they are
      commented-out in /etc/fstab).
   4. The upgrade procedure would not start properly until we disconnected
      the external SCSI disks.
   5. We use GRUB as our boot utility, and we can select different
      kernels, so it can definitely see the boot disk at some
      point in the boot process.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Attach an external SCSI disk set to the server
2. Select kernel 2.4.18-10smp from the GRUB menu

Actual Results:  The same boot messages and kernel panic as listed under
"Description of problem" above.

Expected Results:  Normal boot

Additional info:

Comment 1 Arjan van de Ven 2002-10-20 09:14:28 UTC
This looks like the attached disks are enumerated BEFORE the disk you already
had (i.e. the one holding your root filesystem). Linux sorts by LUN ID, and
"sda" is the first disk.
Can you check in grub what the root= line looks like (type "e" to see it)?

In the "e" screen you can also edit that line. Can you try replacing the "a" in
sda with a "d"?

Comment 2 Need Real Name 2002-10-20 14:23:26 UTC
Here is the grub config info:

        root (hd0,2)
        kernel /vmlinuz-2.4.18-10smp ro root=/dev/sda8
        initrd /initrd-2.4.18-10smp.img

Is there a document showing the LUN algorithm you mentioned?

Comment 3 Arjan van de Ven 2002-10-20 14:26:38 UTC
I'm not sure if it's written down in a doc but basically it's
1) Load the drivers as per order in /etc/modules.conf
2) In case of multiple controllers per driver: use PCI bus order
3) Within a controller: sort by lun

and the disks are named "sda", "sdb", "sdc", etc., i.e. alphabetically
(and after sdz comes sdaa, etc.)
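
The three ordering rules above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the kernel's actual code: the controller layout is hypothetical, the driver names (aacraid, aic7xxx) are the ones that appear later in this report, and sorting disks by a (SCSI ID, LUN) pair is an assumed reading of "sort by lun".

```python
import string


def disk_name(index):
    """Map a 0-based disk index to its name: 0 -> sda, 25 -> sdz, 26 -> sdaa."""
    letters = ""
    n = index
    while n >= 0:
        letters = string.ascii_lowercase[n % 26] + letters
        n = n // 26 - 1
    return "sd" + letters


def assign_names(driver_order, controllers):
    """Name disks by driver load order, then PCI order, then (id, lun)."""
    names = {}
    index = 0
    # dict.fromkeys drops repeated driver names while keeping first-seen order.
    for driver in dict.fromkeys(driver_order):
        matching = [c for c in controllers if c["driver"] == driver]
        for ctrl in sorted(matching, key=lambda c: c["pci_order"]):
            for disk in sorted(ctrl["disks"]):
                names[(ctrl["name"], disk)] = disk_name(index)
                index += 1
    return names


# Hypothetical layout: one internal RAID volume, three external disks (IDs 2-4).
CONTROLLERS = [
    {"name": "internal-raid", "driver": "aacraid", "pci_order": 0,
     "disks": [(0, 0)]},
    {"name": "external", "driver": "aic7xxx", "pci_order": 1,
     "disks": [(2, 0), (3, 0), (4, 0)]},
]

external_first = assign_names(["aic7xxx", "aacraid"], CONTROLLERS)
raid_first = assign_names(["aacraid", "aic7xxx"], CONTROLLERS)
print(external_first[("internal-raid", (0, 0))])  # sdd
print(raid_first[("internal-raid", (0, 0))])      # sda
```

In this model, whichever driver loads first claims the early letters, so a root= argument that names a fixed device can point at the wrong disk once extra disks appear in front of it.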

Comment 4 Need Real Name 2002-10-20 17:37:55 UTC
Thanks.

Can you tell me how you determined the value of "/dev/sdd"?

The reason I ask is that when we boot the machine under the old 7.2 kernel,
device /dev/sdd is on the external (data) scsi disk set.

If your suspicion is correct as to the cause, it seems that there are
three ways to fix it:
   1. Change the /dev/sda8 to /dev/sdd8 in GRUB
   2. Change the external SCSI disks to have different LUNs (maybe 1,2,3 ?)
   3. Change the PCI order - do I have to move the cards to do this?

If you concur with these possible ways to fix, can you advise which one 
you think is best?

I really appreciate your help in solving this problem!!


Comment 5 Arjan van de Ven 2002-10-20 17:46:26 UTC
There is even a 4th, more elegant method, and we use it in RHL8:
mount-by-label.
Instead of having to specify the device name of the / partition, filesystem
labels (see man e2label) are used to find the right device automatically.
All this is done by the mkinitrd package/program, so you can probably just use
the 8.0 one ;)

As for sdd: it appears that you have 3 devices "in front" of the real one.
You can sort of see this yourself: the kernel prints out the SCSI map when it
boots. I know it's hard to read because it scrolls by fast, though ;(

Comment 6 Need Real Name 2002-10-20 18:09:37 UTC
OK Thanks!  I'll try it as soon as I can get a timeslot to reboot the system.
I will let you know what we find.



Comment 7 Need Real Name 2002-10-22 18:46:27 UTC
We tried changing the grub configuration to /dev/sdd8 but it did not work.

We also tried changing the drive sequence in the BIOS and that did not work
either.

Finally, we also tried changing the LUNs on the external disks to be LUN 1,
for each of the SCSI IDs (2,3,4).  That did not work either.

Please help us.

Comment 8 Need Real Name 2002-10-22 21:55:03 UTC
I did another test and have some more info.  

I think you are correct about the extra (external) SCSI devices being "seen"
first by the BIOS/kernel, and then throwing off the device naming on boot.

The PCI controllers are built into the motherboard on these Dell 4600 servers.
Changing the BIOS settings doesn't seem to affect the device naming sequence.

Is there any way from GRUB to control the device naming order? 

If not, can you tell me how to get the release 8 mkinitrd, and run it under 7.3?

Thanks!!!


Comment 9 Arjan van de Ven 2002-10-22 21:58:25 UTC
grub can't control this order ;(
However, you may be able to see the order, since the kernel prints it during boot.

To get the 8.0 mkinitrd you will need to download the mkinitrd src.rpm from the
RH ftp site and run rpmbuild --rebuild mkinitrd-<version>.src.rpm on it.
Then /boot/grub/grub.conf ought to have something like

root=LABEL=/

in it (assuming you named your root fs "/" with e2label).
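
As a rough illustration of that grub.conf change, the sketch below rewrites the root= argument on a kernel line to the label form. The helper name use_root_label is hypothetical, and it only edits text: it assumes the filesystem has already been labelled with e2label and that the initrd in use understands labels.

```python
import re


def use_root_label(kernel_line, label="/"):
    """Replace root=<device> on a GRUB kernel line with root=LABEL=<label>."""
    # A function is used as the replacement so the label is inserted literally.
    return re.sub(r"\broot=\S+", lambda m: "root=LABEL=" + label, kernel_line)


# Kernel line as posted in comment 2 of this report.
line = "kernel /vmlinuz-2.4.18-10smp ro root=/dev/sda8"
print(use_root_label(line))
```

The "root (hd0,2)" line in the same grub entry has no "=" and is left alone by this pattern; it tells GRUB where to find the kernel, not which device the kernel mounts as /.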


Comment 10 Need Real Name 2002-10-23 15:45:01 UTC
I tried to build the new version of mkinitrd and I get these errors:
rpmbuild --rebuild mkinitrd-3.4.28-1.src.rpm
Installing mkinitrd-3.4.28-1.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.94174
+ umask 022
+ cd /usr/src/redhat/BUILD
+ cd /usr/src/redhat/BUILD
+ rm -rf mkinitrd-3.4.28
+ /usr/bin/bzip2 -dc /usr/src/redhat/SOURCES/mkinitrd-3.4.28.tar.bz2
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd mkinitrd-3.4.28
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.94174
+ umask 022
+ cd /usr/src/redhat/BUILD
+ cd mkinitrd-3.4.28
+ make
for n in nash grubby ; do make -C $n; done
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/nash'
diet cc -Wall -DVERSION=\"3.4.28\" -g   -c -o nash.o nash.c
make[1]: diet: Command not found
make[1]: *** [nash.o] Error 127
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/nash'
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
cc -Wall -g -O2 -march=i386 -mcpu=i686 -DVERSION=\"3.4.28\"   -c -o grubby.o grubby.c
grubby.c: In function `addNewKernel':
grubby.c:1584: warning: `newLine' might be used uninitialized in this function
cc -Wall -g -O2 -march=i386 -mcpu=i686 -DVERSION=\"3.4.28\"   -c -o mount_by_label.o mount_by_label.c
mount_by_label.c: In function `uuidcache_init':
mount_by_label.c:169: warning: `deviceDir' might be used uninitialized in this function
cc -g  grubby.o mount_by_label.o /usr/lib/libpopt.a  -o grubby
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
+ make test
for n in nash grubby ; do make -C $n; done
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/nash'
diet cc -Wall -DVERSION=\"3.4.28\" -g   -c -o nash.o nash.c
make[1]: diet: Command not found
make[1]: *** [nash.o] Error 127
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/nash'
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
cd grubby; make test
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
Parse/write comparison...
Permission preservation...
GRUB default directive...
LILO default directive...
GRUB fallback directive...
GRUB new kernel argument handling...
GRUB remove directive...
GRUB update kernel argument handling...
LILO update kernel argument handling...
LILO add kernel...
GRUB add kernel...
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
+ '[' 0 '!=' 0 ']'
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.39436
+ umask 022
+ cd /usr/src/redhat/BUILD
+ cd mkinitrd-3.4.28
+ rm -rf /var/tmp/mkinitrd-root
+ make BUILDROOT=/var/tmp/mkinitrd-root mandir=/usr/share/man install
for n in nash grubby ; do make -C $n install BUILDROOT=/var/tmp/mkinitrd-root; done
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/nash'
mkdir -p /var/tmp/mkinitrd-root/sbin
mkdir -p /var/tmp/mkinitrd-root//usr/share/man/man8
install -m 755 -s nash /var/tmp/mkinitrd-root/sbin
install: cannot stat `nash': No such file or directory
make[1]: *** [install] Error 1
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/nash'
make[1]: Entering directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
mkdir -p /var/tmp/mkinitrd-root/sbin
mkdir -p /var/tmp/mkinitrd-root/usr/share/man/man8
install -m 755 -s grubby /var/tmp/mkinitrd-root/sbin
install -m 755 new-kernel-pkg /var/tmp/mkinitrd-root/sbin
install -m 644 grubby.8 /var/tmp/mkinitrd-root/usr/share/man/man8
make[1]: Leaving directory `/usr/src/redhat/BUILD/mkinitrd-3.4.28/grubby'
for i in sbin /usr/share/man/man8; do \
        if [ ! -d /var/tmp/mkinitrd-root/$i ]; then \
                mkdir -p /var/tmp/mkinitrd-root/$i; \
        fi; \
done
sed 's/%VERSIONTAG%/3.4.28/' < mkinitrd > /var/tmp/mkinitrd-root/sbin/mkinitrd
install -m755 installkernel /var/tmp/mkinitrd-root/sbin/installkernel
chmod 755 /var/tmp/mkinitrd-root/sbin/mkinitrd
install -m644 mkinitrd.8 /var/tmp/mkinitrd-root//usr/share/man/man8/mkinitrd.8
+ /usr/lib/rpm/brp-compress
+ /usr/lib/rpm/brp-strip
+ /usr/lib/rpm/brp-strip-comment-note
Processing files: mkinitrd-3.4.28-1
error: File not found: /var/tmp/mkinitrd-root/sbin/nash
error: File not found by glob: /var/tmp/mkinitrd-root/usr/share/man/man8/nash.8*
PreReq: dev
Requires: e2fsprogs /bin/sh fileutils grep mount gzip tar /sbin/insmod.static /sbin/losetup mktemp >= 1.5-5 findutils lvm filesystem >= 2.1.0


RPM build errors:
    File not found: /var/tmp/mkinitrd-root/sbin/nash
    File not found by glob: /var/tmp/mkinitrd-root/usr/share/man/man8/nash.8*
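
The first failure in the log above is "make[1]: diet: Command not found"; the later "File not found: .../sbin/nash" errors follow from it, because nash was never compiled. A minimal pre-flight check, assuming the tool list is simply the commands visible in this log (diet is presumably the dietlibc compiler wrapper), might look like:

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Commands that appear in the build log above; not an official requirements list.
BUILD_TOOLS = ["make", "cc", "diet", "bzip2", "tar"]

missing = missing_tools(BUILD_TOOLS)
if missing:
    print("install these before running rpmbuild --rebuild:", ", ".join(missing))
else:
    print("all build tools from the log were found")
```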


Comment 11 Need Real Name 2002-10-23 17:14:07 UTC
I am now able to get it to build and have created a new ramdisk.
I have a question about swap.  Here is my fstab:
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
/dev/sda6               /opt                    ext3    defaults        1 2
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2
LABEL=/var              /var                    ext3    defaults        1 2
/dev/sda2               swap                    swap    defaults        0 0
/dev/sda9               /usr/local              ext3    defaults        1 2
/dev/sdb1               /home                   ext3    defaults        1 2
/dev/cdrom              /mnt/cdrom              iso9660 noauto,owner,kudzu,ro 0 0
/dev/fd0                /mnt/floppy             auto    noauto,owner,kudzu 0 0

How do I label the swap device?




Comment 12 Need Real Name 2002-10-26 02:28:49 UTC
I have not been able to find a way to label the swap partition.

I heard that there was a kernel parameter "scsihosts" which could
help control the order.  I tried specifying "scsihosts=aacraid:aic7xxx"
on the kernel command line in grub, but it did not help.

I also heard about "devfs" as maybe helping.  How can I enable devfs
on the 7.3 kernel?

Thanks,

David


Comment 13 Arjan van de Ven 2002-10-31 17:33:33 UTC
ok labels for swap: we also fixed that in 8.0 by having swapon be smart and just
have an option for enabling all swap devices ;(

as for devfs: that doesn't actually fix this

Comment 14 Need Real Name 2002-11-05 00:04:47 UTC
I have a solution!  Here it is:

First I needed to change my /etc/modules.conf file to have the aic7xxx after the
aacraid module:
   alias parport_lowlevel parport_pc
   alias eth0 eepro100
   alias scsi_hostadapter aacraid
   alias scsi_hostadapter1 aacraid
   alias eth1 tg3
   alias scsi_hostadapter2 aic7xxx

Then I needed to run 'mkinitrd' to rebuild the initial ramdisk.

After I did these two things, it works!
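
To check a modules.conf for this ordering before rebuilding the ramdisk, a small sketch like the following could list the SCSI drivers in alias order. It assumes the scsi_hostadapter aliases are honored in numeric-suffix order (no suffix counting as 0); the function name scsi_driver_order is hypothetical.

```python
import re

ALIAS_RE = re.compile(r"\s*alias\s+scsi_hostadapter(\d*)\s+(\S+)")


def scsi_driver_order(modules_conf_text):
    """Return SCSI driver names sorted by their scsi_hostadapter suffix."""
    entries = []
    for line in modules_conf_text.splitlines():
        match = ALIAS_RE.match(line)
        if match:
            # An alias without a number is treated as position 0.
            entries.append((int(match.group(1) or 0), match.group(2)))
    return [driver for _, driver in sorted(entries)]


# The working modules.conf from this comment.
MODULES_CONF = """\
alias parport_lowlevel parport_pc
alias eth0 eepro100
alias scsi_hostadapter aacraid
alias scsi_hostadapter1 aacraid
alias eth1 tg3
alias scsi_hostadapter2 aic7xxx
"""

order = scsi_driver_order(MODULES_CONF)
print(order)  # ['aacraid', 'aacraid', 'aic7xxx']
```

With aacraid ahead of aic7xxx, the internal RAID volume is seen first, so root=/dev/sda8 again refers to the intended disk.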

Thanks,

David