Bug 244454

Summary: kernel upgrade hang in lvm-static (mkinitrd)
Product: Red Hat Enterprise Linux 5
Reporter: Axel Thimm <axel.thimm>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Docs Contact:
Priority: low
Version: 5.0
CC: agk, boris
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-06-23 12:34:46 UTC
Attachments:
Description                     Flags
Trace produced using set -x     none
strace output for lvm call      none

Description Axel Thimm 2007-06-15 19:19:02 UTC
Description of problem:
The upgrade to the latest kernel (2.6.18-8.1.6.el5) has been hanging for 4.5 hours in the following state:


root      3605  0.0  0.0  62888  1092 pts/0    S+   16:46   0:00  |           \_ /bin/sh /var/tmp/rpm-tmp.34420 4
root      3609  0.0  0.0  62892  1220 pts/0    S+   16:46   0:00  |               \_ /bin/bash /sbin/new-kernel-pkg --package kernel --mkinitrd --depmod --install 2.6.18-8.1.6.el5
root      3618  0.0  0.0  63288  1560 pts/0    S+   16:46   0:00  |                   \_ /bin/bash --norc /sbin/mkinitrd --allow-missing -f /boot/initrd-2.6.18-8.1.6.el5.img 2.6.18-8.1.6.el5
root      3872  0.0  0.0  63288   912 pts/0    S+   16:46   0:00  |                       \_ /bin/bash --norc /sbin/mkinitrd --allow-missing -f /boot/initrd-2.6.18-8.1.6.el5.img 2.6.18-8.1.6.el5
root      3873  0.0  0.0  56348   928 pts/0    S+   16:46   0:00  |                           \_ lvm.static lvs --ignorelockingfailure --noheadings -o vg_name /dev/systemjunior/root

# strace -p 3873
Process 3873 attached - interrupt to quit
read(3,  <unfinished ...>
Process 3873 detached

# lsof -p 3873
COMMAND    PID USER   FD   TYPE             DEVICE     SIZE     NODE NAME
lvm.stati 3873 root  cwd    DIR              253,0     4096        2 /
lvm.stati 3873 root  rtd    DIR              253,0     4096        2 /
lvm.stati 3873 root  txt    REG              253,0  1696856  7798889 /sbin/lvm.static
lvm.stati 3873 root  mem    REG              253,0 55516736  5475361 /usr/lib/locale/locale-archive
lvm.stati 3873 root  mem    REG              253,0    25462  5570819 /usr/lib64/gconv/gconv-modules.cache
lvm.stati 3873 root    0r  FIFO                0,6          99681795 pipe
lvm.stati 3873 root    1w  FIFO                0,6          99682059 pipe
lvm.stati 3873 root    2w   CHR                1,3              1457 /dev/null
lvm.stati 3873 root    3u  unix 0xffff880043e73c40          99682062 socket
# lsof | grep 99682062
lvm.stati  3873      root    3u     unix 0xffff880043e73c40                      99682062 socket

So lvm.static is blocked reading from a socket that no other process appears to be talking to.
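
The lsof | grep check above can also be sketched in Python by walking /proc/<pid>/fd and comparing each symlink target with "socket:[<inode>]". This is only an illustration: the function name find_fd_holders and its proc_root parameter are made up for this sketch and are not part of lvm or lsof. Note that the peer of a connected unix socket has its own inode number, so an empty result for one inode does not by itself show that nothing is connected at the other end.

```python
import os


def find_fd_holders(inode, proc_root="/proc"):
    """Return sorted (pid, fd) pairs whose descriptor is socket:[inode].

    Hypothetical helper for illustration; proc_root is a parameter only so
    the function can be pointed at a test directory.
    """
    target = "socket:[%d]" % inode
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory (e.g. /proc/meminfo)
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if link == target:
                holders.append((int(entry), int(fd)))
    return sorted(holders)


# Example call with the inode from the lsof output above (run as root to see
# descriptors of all processes):
#   find_fd_holders(99682062)
```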

Version-Release number of selected component (if applicable):
kernel-2.6.18-8.1.6.el5
lvm2-2.02.16-3.el5
mkinitrd-5.1.19.6-1

How reproducible:
-

Steps to Reproduce:
1. Run yum upgrade on an otherwise up-to-date RHEL5 system.
  
Actual results:
See above

Expected results:
No hanging of mkinitrd

Additional info:

Comment 1 Boris Folgmann 2009-05-11 12:05:22 UTC
I have the same problem on RHEL 5.3! The affected system was carefully upgraded from 4.7 using a 5.3 DVD. All issues with .rpmsave/.rpmnew files have been resolved. Everything is working fine apart from the problem with kernel upgrades. Here's my process tree:

 9909    |     - /usr/bin/python /usr/bin/yum upgrade
10510    |       - /bin/sh /var/tmp/rpm-tmp.20713 3
10512    |         - /bin/bash /sbin/new-kernel-pkg --package kernel-PAE --mkinitrd --depmod --install 2.6.18-128.1.10.el5PAE
10521    |           - /bin/bash --norc /sbin/mkinitrd --allow-missing -f /boot/initrd-2.6.18-128.1.10.el5PAE.img 2.6.18-128.1.10.el5PAE
10649    |             - /bin/bash --norc /sbin/mkinitrd --allow-missing -f /boot/initrd-2.6.18-128.1.10.el5PAE.img 2.6.18-128.1.10.el5PAE
10650 D  |               - lvm.static lvs --ignorelockingfailure --noheadings -o vg_name /dev/md1

I don't know what this lvm.static call is for. /dev/md1 is used for swap only. The only LVM2 physical volume I have is on /dev/md2. /dev/md0 is used directly for /boot. All three /dev/mdX devices are RAID1 arrays of two IDE hard disks.

Nevertheless I found out why it is hanging! Running

strace lvm.static lvs --ignorelockingfailure --noheadings -o vg_name /dev/md1

instead of strace -p PID as Axel tried gave me the whole picture:
lvm.static is trying to access /dev/cdrom. This fails as there is no CD in the drive. If there is a CD in the drive, the kernel upgrade succeeds!
The drive is a Panasonic slot-in IDE CD-ROM.

Please fix this problem, as it is really annoying and certainly a bug. It is 100% reproducible and happens every time on my system.

Comment 2 Alasdair Kergon 2009-05-11 13:04:06 UTC
- lvm.static lvs --ignorelockingfailure --noheadings -o vg_name /dev/md1

(1) Why is this running lvm.static instead of lvm?  (lvm.static is deprecated and nothing should be using it any more.)
(2) Why is it ignoring locking failure when run as a normal process on the system?
(Is it because in some other cases it has to be run in environments that need that argument?)
(3) Why is it using 'lvs' to output a VG property?  ('vgs' would suffice.)
(4) Why is it asking LVM to say whether or not there is a VG called 'md1'?

Comment 3 Boris Folgmann 2009-05-11 14:13:21 UTC
As /sbin/mkinitrd is a shell script I added a set -x at the top and called
/sbin/mkinitrd --allow-missing -f /boot/initrd-testing.img 2.6.18-128.1.10.el5PAE

It's now hanging at 
lvm.static lvs --ignorelockingfailure --noheadings -o vg_name /dev/Volume00/root
making no progress.

Please see the attached output from set -x which reveals every command that is called.

Comment 4 Boris Folgmann 2009-05-11 14:15:43 UTC
Created attachment 343450 [details]
Trace produced using set -x

Comment 5 Boris Folgmann 2009-05-11 14:23:25 UTC
I could simply interrupt lvm.static with Ctrl-C. To show you that it hangs at

open("/dev/cdrom", O_RDONLY|O_DIRECT|O_LARGEFILE|O_NOATIME

I called

strace lvm.static lvs --ignorelockingfailure --noheadings -o vg_name /dev/Volume00/root

Please see the attached output. Note that /dev/Volume00/root is an ext3 logical volume mounted as /.
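
To check independently of lvm whether a plain open() on the device blocks, the call can be made in a throwaway child process with a time limit. The following is a minimal sketch, not something taken from lvm or mkinitrd: the name probe_open, the 5-second default, and the three result strings are assumptions made for illustration.

```python
import os
import subprocess
import sys

# Child program: attempt the open() and exit 0 on success, 1 on any OS error.
_CHILD = (
    "import os, sys\n"
    "try:\n"
    "    os.open(sys.argv[1], int(sys.argv[2]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)


def probe_open(path, flags=os.O_RDONLY, timeout=5.0):
    """Return 'opened', 'failed' or 'blocked' for open(path, flags).

    Hypothetical helper; the open runs in a child so a hang cannot
    take the calling process with it.
    """
    child = subprocess.Popen([sys.executable, "-c", _CHILD, path, str(flags)])
    try:
        status = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child in uninterruptible sleep (state D) will
        # not die until the kernel call returns, so we do not wait for it.
        child.kill()
        return "blocked"
    return "opened" if status == 0 else "failed"
```

On Linux, probe_open("/dev/cdrom", os.O_RDONLY | os.O_DIRECT | os.O_NOATIME) approximates the flags seen in the strace line above. A result of "blocked" with no disc in the drive would point at the open itself rather than at anything specific to lvm.static.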

Comment 6 Boris Folgmann 2009-05-11 14:25:33 UTC
Created attachment 343451 [details]
strace output for lvm call

Comment 8 Milan Broz 2010-06-23 12:34:46 UTC
There were some changes to mkinitrd to cache dm devices (and not scan everything, bug #516047), so a kernel update should be quick even with many DM/LVM devices.

As for the hang on opening /dev/cdrom, this is possibly a different problem. I think it must block other commands as well, such as "blkid -p /dev/cdrom" or "blockdev --getsz /dev/cdrom"? (In that case it may be a kernel problem.)
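
One way to collect that data for a reopened report is to run both commands under a time limit and record which ones fail to return. This is a rough sketch under stated assumptions: probe_commands, CDROM_PROBES, and the 10-second limit are invented for illustration, and only the two command lines themselves come from the comment above.

```python
import subprocess


def probe_commands(commands, timeout=10.0):
    """Run each argv list; map its command line to an exit status,
    'blocked' on timeout, or 'not runnable' if it cannot be started."""
    results = {}
    for argv in commands:
        label = " ".join(argv)
        try:
            child = subprocess.Popen(
                argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
        except OSError:
            results[label] = "not runnable"  # e.g. command not installed
            continue
        try:
            results[label] = child.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Best effort; not waited on in case the child is stuck in the kernel.
            child.kill()
            results[label] = "blocked"
    return results


# The two commands suggested above; run as root with no disc in the drive.
CDROM_PROBES = [
    ["blkid", "-p", "/dev/cdrom"],
    ["blockdev", "--getsz", "/dev/cdrom"],
]
```

If probe_commands(CDROM_PROBES) reports "blocked" for these as well, that would support the idea that the hang is below LVM.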

Closing this for now. If you still see the problem with a recent update, please reopen it with new logs. Thanks.