Bug 213476

Summary: udev timeout at boot; vol_id process stuck
Product: [Fedora] Fedora Reporter: Gary Myers <gmyers>
Component: udev Assignee: Harald Hoyer <harald>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 6   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: F7 Doc Type: Bug Fix
Last Closed: 2007-09-20 12:01:53 UTC Type: ---
Attachments:
Description                                        Flags
Output of strace to where vol_id processes hang.   none
Output of 'strace -Ff /sbin/start_udev'            none

Description Gary Myers 2006-11-01 16:57:59 UTC
On boot, udev-095-14 fails to start in a timely manner; the server then takes
over 10 minutes to boot, with further errors generated because vol_id processes
hog the CPU.

1. Error occurs on all reboots.
2. Commenting out /sbin/start_udev in /etc/rc.d/rc.sysinit allows for fast
restart, but causes other apps to fail.
  
Actual results:

/dev fills with .tmp files and vol_id processes hog CPU.

Expected results:

4 minute reboot, not 15!

Additional info:

Server hardware is Dell PowerEdge 1850, BIOS A05. Two hard drives - server 1 has
dmraid (software); server 2 has hardware RAID. Server 1 has 3 dual port NICs,
server 2 has one dual port NIC.

kill -SIGTERM on udev process cleanly exits processes and returns CPU to idle.
Running /sbin/start_udev causes vol_id to hang again.

/lib/udev/vol_id is statically linked to 2.6.9 kernel. File size is 526k on
x86_64. Replacing file with /sbin/vol_id from FC5 (udev-084-13.fc5.2) stops hang
and "Starting udev" returns quickly with OK.

Comment 1 Gary Myers 2006-11-02 13:45:34 UTC
Timeout error reproduced on Dell PowerEdge 850. Temporarily fixed with older
copy of 'vol_id'. As this is not a Plug-and-play environment, there is no issue
at present.

Comment 2 Gary Myers 2006-11-02 14:02:59 UTC
Output of "ps aux | grep udev" showing hung vol_id processes:

root      3067  0.2  0.0  12564   668 ?        S<s  13:58   0:00 /sbin/udevd -d
root      3829 91.6  0.0    760   128 ?        R<   13:58   3:07 /lib/udev/vol_id --export /dev/.tmp-9-0
root      3866 92.5  0.0    764   132 ?        R<   13:58   3:09 /lib/udev/vol_id --export /dev/.tmp-8-0
root      3907  0.0  0.0  12564   640 ?        S<   14:01   0:00 /sbin/udevd -d
root      3908  0.0  0.0  12564   640 ?        S<   14:01   0:00 /sbin/udevd -d
root      3909  0.0  0.0  12564   640 ?        S<   14:01   0:00 /sbin/udevd -d
root      3910 49.8  0.0    760   124 ?        R<   14:01   0:12 /lib/udev/vol_id --export /dev/.tmp-8-3
root      3911 33.6  0.0    760   128 ?        R<   14:01   0:08 /lib/udev/vol_id --export /dev/.tmp-8-2
root      3912 31.7  0.0    760   128 ?        R<   14:01   0:07 /lib/udev/vol_id --export /dev/.tmp-8-1
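
To pick the busy vol_id processes out of a listing like the one above without reading it by eye, something along these lines may help. It is only a sketch: the function name `busy_vol_id` and the 25% default threshold are made up for illustration, and it assumes the `ps aux` column layout shown above (PID in column 2, %CPU in column 3).

```shell
# Hypothetical helper: print the PIDs of vol_id processes whose %CPU exceeds
# a threshold. Reads `ps aux`-style lines on stdin.
busy_vol_id() {
    threshold="${1:-25}"   # illustrative default, not a udev setting
    awk -v t="$threshold" '/vol_id --export/ && ($3 + 0) > t { print $2 }'
}

# Usage:
#   ps aux | busy_vol_id 25
```

The PIDs it prints could then be passed to `kill -SIGTERM`, which is what cleared the hang here.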


Comment 3 Gary Myers 2006-11-02 14:11:36 UTC
Created attachment 140131 [details]
Output of strace to where vol_id processes hang.

strace output ends at "wait4(-1", then I pressed Ctrl + C to exit as 'vol_id'
had hung up.

Comment 4 Harald Hoyer 2006-11-02 14:15:05 UTC
Please run strace with the -F and -f flags:
# strace -Ff

Comment 5 Gary Myers 2006-11-02 14:24:02 UTC
Created attachment 140134 [details]
Output of 'strace -Ff /sbin/start_udev'

Re-run of strace with F and f flags.

Comment 6 Harald Hoyer 2006-11-02 14:34:32 UTC
there is no vol_id in the last strace...
you may run vol_id alone.. not the whole start_udev :)

# for i in /dev/hd* /dev/sd*; do /lib/udev/vol_id --export $i;done

Comment 7 Gary Myers 2006-11-02 15:01:08 UTC
Running 'for i in /dev/hd* /dev/sd*; do /lib/udev/vol_id --export $i; done'
produces:

/dev/hda: error open volume

And "ps aux | grep vol_id" reports /lib/udev/vol_id --export /dev/sda running at
100% CPU. Killing the process with -SIGTERM moves on to /dev/sda1 and hogs the
processor again. /dev/sda2 and /dev/sda3 produce the same results. Killing all
processes returns the CPU to normal.
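
Killing each hung probe by hand gets tedious, so the loop from comment 6 could be wrapped with a time limit. This is a rough sketch, not something that was run on these servers: it assumes the GNU coreutils `timeout` command is installed (it may not exist on a release as old as FC6), and `probe_devices` and the `PROBE` variable are hypothetical names introduced here.

```shell
# Hypothetical wrapper: run a probe command against each device, sending
# SIGTERM if it has not finished within the given number of seconds.
# PROBE defaults to the vol_id invocation used in this report.
probe_devices() {
    limit="$1"; shift
    for dev in "$@"; do
        [ -e "$dev" ] || continue   # skip globs that matched nothing
        # PROBE is left unquoted on purpose so "cmd --flag" splits into words.
        timeout -s TERM "$limit" ${PROBE:-/lib/udev/vol_id --export} "$dev" >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 124 ]; then   # 124 is timeout's "timed out" status
            echo "$dev: timed out after ${limit}s"
        else
            echo "$dev: exit status $status"
        fi
    done
}

# Usage:
#   probe_devices 10 /dev/hd* /dev/sd*
```

SIGTERM is used because that signal was enough to stop the hung vol_id processes in the tests above.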

Testing carried out on Dell PowerEdge 850 with LSI MegaRAID SCSI RAID with two
73GB drives in RAID 1. This is a production server, but I'll not be fried if I
kill it :D


Output of 'strace /lib/udev/vol_id --export /dev/sda':

8534  execve("/lib/udev/vol_id", ["/lib/udev/vol_id", "--export", "/dev/sda"], [/* 18 vars */]) = 0
8534  uname({sys="Linux", node="theoline.aminocom.com", ...}) = 0
8534  brk(0)                            = 0x686000
8534  brk(0x686f20)                     = 0x686f20
8534  arch_prctl(ARCH_SET_FS, 0x686850) = 0
8534  brk(0x6a7f20)                     = 0x6a7f20
8534  brk(0x6a8000)                     = 0x6a8000
8534  open("/dev/sda", O_RDONLY)        = 3
8534  ioctl(3, BLKGETSIZE64, 0x7fffe4ac2218) = 0
8534  open("/etc/passwd", O_RDONLY)     = 4
8534  fstat(4, {st_mode=S_IFREG|0644, st_size=2654, ...}) = 0
8534  mmap(NULL, 2654, PROT_READ, MAP_SHARED, 4, 0) = 0x2aaaaaaab000
8534  close(4)                          = 0
8534  --- SIGTERM (Terminated) @ 0 (0) ---
8534  +++ killed by SIGTERM +++
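
The trace shows no system calls between close(4) and the SIGTERM while the process sat at 100% CPU, which would be consistent with a loop in user space rather than a blocked read, though that is an inference and not confirmed here. One way to check a hung process is to look at its scheduler state. A minimal sketch, assuming a Linux /proc filesystem; `proc_state` is a hypothetical helper name:

```shell
# Hypothetical helper: print the one-letter state of a PID, taken from the
# "State:" line of /proc/PID/status (R = running, S = sleeping,
# D = uninterruptible disk sleep, and so on).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Usage (pgrep -o picks the oldest matching process):
#   proc_state "$(pgrep -o vol_id)"
```

A state of R alongside high CPU use points towards spinning in user space; D would suggest the process is waiting on the device instead.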

Comment 8 Harald Hoyer 2007-09-20 10:53:38 UTC
still a problem with latest kernel/udev/fedora versions?

Comment 9 Gary Myers 2007-09-20 11:00:01 UTC
No longer a problem with Fedora 7. Updated all Dell servers with this issue some
time ago and they are all fine.

Problem solved and closed.

:)

(In reply to comment #8)
> still a problem with latest kernel/udev/fedora versions?