Bug 453919
Summary: | On server HP ProLiant DL380 G5 randomly freeze during boot
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 9
Hardware: | i386
OS: | Linux
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | low
Reporter: | Dario Lesca <d.lesca>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | jarod, lwang, menthos
Fixed In Version: | kernel-PAE-2.6.25.6-27.fc8.i686
Doc Type: | Bug Fix
Last Closed: | 2008-07-22 16:05:18 UTC
Description
Dario Lesca
2008-07-03 08:52:54 UTC
I think this was a bug in the watchdog driver that should be fixed in the latest update. Can you test that, please? Anything newer than 2.6.25.7-67 should have the fix.

I am out of the office, so I updated the server remotely over ssh to the latest kernel (kernel-2.6.25.9-76.fc9.i686) and rebooted it. After 10 minutes the server has still not come back up (sigh!):

[lesca@s-wallgate ~]$ ssh -v 192.168.8.115
OpenSSH_4.5p1, OpenSSL 0.9.8b 04 May 2006
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to 192.168.8.115 [192.168.8.115] port 22.
debug1: connect to address 192.168.8.115 port 22: No route to host
ssh: connect to host 192.168.8.115 port 22: No route to host

I assume that the update did not solve the problem. Any suggestions?

Do you happen to have a serial console or similar, to find out whether it crashed in the same way?

Created attachment 311109 [details]
the /var/log/message file
/var/log/message file, taken via rescue CD
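
The advice above, that anything newer than 2.6.25.7-67 should carry the watchdog fix, can be checked mechanically against an installed kernel string. The following is only a minimal sketch: `parse_kernel` and `has_fix` are hypothetical helper names, the comparison is a simplification and not rpm's real version-comparison algorithm, and it ignores the dist tag, so comparing build numbers across fc8 and fc9 is not meaningful.

```python
import re


def parse_kernel(version_release):
    """Split e.g. '2.6.25.9-76.fc9.i686' into ((2, 6, 25, 9), 76).

    Simplified on purpose: the dist tag (fc8/fc9), flavour (PAE) and
    architecture are ignored, which real rpm comparison would not do.
    """
    version, _, release = version_release.partition("-")
    numbers = tuple(int(part) for part in version.split("."))
    build = re.match(r"\d+", release)
    return numbers, int(build.group()) if build else 0


def has_fix(version_release, first_fixed="2.6.25.7-67"):
    """True if the kernel is strictly newer than the threshold build.

    The default threshold is the one quoted in this report.
    """
    return parse_kernel(version_release) > parse_kernel(first_fixed)


if __name__ == "__main__":
    for kernel in ("2.6.25-14.fc9.i686", "2.6.25.9-76.fc9.i686"):
        print(kernel, "newer than threshold:", has_fix(kernel))
```

By this check the 2.6.25.9-76.fc9 kernel installed above is newer than the threshold, so a continued failure to boot would point to something other than the watchdog bug.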
Comment on attachment 311109 [details] the /var/log/message file

With the new kernel the server freezes during boot when it shows the boot message "start udev ...". The attachment (https://bugzilla.redhat.com/attachment.cgi?id=311109) was taken from the HP server via a rescue CD.

I'm getting an F8 install onto a DL380 G5 right now...

Then try updating it with the new kernel and reboot. Tell us what happens.

I have done some tests; see the attachment.

Created attachment 311164 [details]
output debug of start_udev and settle
The attached file collects the output of "sh -x /sbin/start_udev" and the
output of "strace -f /sbin/udevsettle" (the command which freezes) when the
server boots with kernels 2.6.24 and 2.6.25.
In this case I worked with Fedora 8 + updates.
Hope this helps.
Base F8 install (kernel 2.6.23.1): boots fine.
Base F8 install with kernel-2.6.25.9-40.fc8 installed on top: boots fine.
Base F8 install with kernel-2.6.25.9-40.fc8 and udev-118-1.fc8 installed on top: boots fine.
Updating the rest of the OS now too, will bounce the box again once that's done.

One difference here I'm noting... The bug report is for i386, I think I actually put an x86_64 install on the machine here. Has anyone else tried x86_64 and/or had problems there too? If I can't reproduce any problems here (I'm going to start stepping back through 2.6.25.x kernel builds shortly, if 2.6.25.9 still boots okay), I suppose I should have this machine reinstalled with a 32-bit load...

Also booted fine with a fully up-to-date F8 on it. Will try an earlier 2.6.25 kernel to see if I can get it to tank, then on to a 32-bit load...

2.6.25.4-16.fc8 booted just fine too.

So I somewhat stupidly did an F9 i386 install instead of an F8 i386 install, but right off the bat, there's the spew w/kernel-2.6.25-14.fc9.i686.

Starting udev: general protection fault: 0018 [#1] SMP
Modules linked in: hpwdt(+) pcspkr i5000_edac edac_core dm_snapshot dm_zero dm_mirror dm_mod cciss scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]

Pid: 1122, comm: modprobe Not tainted (2.6.25-14.fc9.i686 #1)
EIP: 0060:[<c0100010>] EFLAGS: 00010046 CPU: 0
EIP is at 0xc0100010
EAX: 00000018 EBX: c00ffee0 ECX: 000f1fff EDX: 00000000
ESI: f88dd608 EDI: f7a1b000 EBP: f7301dd0 ESP: f7301db0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process modprobe (pid: 1122, ti=f7301000 task=f6b42000 task.ti=f7301000)
Stack: f7a1b000 c00f0000 f7301dd0 c00f0000 c00ffee0 f88dd57c f88dd54c f7a1b000
       f7301de4 c04feac9 f7a1b054 00000000 f88dd57c f7301df8 c0562826 f7a1b054
       f7a1b10c f88dd57c f7301e0c c0562935 f7301e18 00000000 c0723490 f7301e30
Call Trace:
 [<c04feac9>] ? pci_device_probe+0x39/0x59
 [<c0562826>] ? driver_probe_device+0xa0/0x136
 [<c0562935>] ? __driver_attach+0x79/0xaf
 [<c05621d3>] ? bus_for_each_dev+0x3b/0x63
 [<c05626cb>] ? driver_attach+0x14/0x16
 [<c05628bc>] ? __driver_attach+0x0/0xaf
 [<c0561ba4>] ? bus_add_driver+0x9d/0x1ba
 [<c0562ab8>] ? driver_register+0x47/0xa7
 [<c047592d>] ? __vunmap+0x93/0x9b
 [<c04fec75>] ? __pci_register_driver+0x35/0x64
 [<f888e017>] ? hpwdt_init+0x17/0x19 [hpwdt]
 [<c0446f93>] ? sys_init_module+0x17be/0x18f6
 [<c04d3577>] ? selinux_file_permission+0x100/0x106
 [<c04374b1>] ? param_get_int+0x0/0x15
 [<c04cc41c>] ? security_file_permission+0xf/0x11
 [<c04835e1>] ? sys_read+0x3b/0x60
 [<c0405bf2>] ? syscall_call+0x7/0xb
 [<c0620000>] ? acpi_pci_root_add+0x22f/0x2a0
 =======================
Code: 20 20 38 30 33 43 4f 4d 50 41 51 ea 00 50 00 f0 31 32 2f 33 31 2f 39 39 20 fc 00 fc f6 86 11 02 00 00 40 75 10 fa b8 18 00 00 00 <8e> d8 8e c0 8e e0 8e e8 8e d0 8d a6 e8 01 00 00 e8 00 00 00 00
EIP: [<c0100010>] 0xc0100010 SS:ESP 0068:f7301db0
---[ end trace d77408c1b65ae1c8 ]---

Simply blacklisting hpwdt gets 2.6.25-14.fc9.i686 booting. Installing 2.6.25.9-76.fc9.i686 now...

All's well on my end with 2.6.26.9-76.fc9.i686 running. Well, at least, the machine booted w/hpwdt un-blacklisted, and I've encountered no spew and no hangs. Will have to try the same 2.6.25.x F8 kernel as Dario to see if I can reproduce the hang in udev.

That should have been 'with 2.6.25.9-76.fc9.i686 running' in the prior comment.

Things look to be okay with 2.6.25.9-40.fc8.i686 as well. Hrm. Dario, any interesting devices in that machine that aren't in mine? Ew, lspci is long and ugly, but here she comes...

# lspci
00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev b1)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev b1)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1c.1 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 2 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
01:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
01:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 03)
01:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 03)
01:04.4 USB Controller: Hewlett-Packard Company Proliant iLO2 virtual USB controller
01:04.6 IPMI SMIC interface: Hewlett-Packard Company Proliant iLO2 virtual UART
02:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c2)
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
04:00.0 PCI bridge: Broadcom EPB PCI-Express to PCI-X Bridge (rev c2)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 11)
06:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 01)
09:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
09:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
0a:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
0a:01.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E2 (rev 01)
0a:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)

See this: http://www.smolts.org/client/show/pub_e04ad1d7-e691-4b54-a3b6-1a5ff974d5bd

I cannot run lspci right now. I installed the F8 respin with the 2.6.24.3-50 kernel, then updated it. Now I have two kernels installed, and if I boot with:

2.6.24.3-50.fc8.i686: boots.
2.6.25.9-40.fc8.i386: does NOT boot.

If I install and update F8 or F9 x86_64, everything works fine. I can do some testing on the server on Saturday... any suggestions?

smolt isn't quite as detailed as lspci, but it seems these two boxes have more or less identical hardware. That's rather annoying that I can't reproduce the udev hang on my end. :\

So to recap progress so far, the general protection fault in the hpwdt module is gone in 2.6.25.9, but we still have a hang when we get to udev startup, correct? And only on a 32-bit install. At the moment, I'm tempted to reassign this bug over to udev and see if any of the udev folks have some insight into how to further trouble-shoot this, since the strace of the hang doesn't really give us much to go on.

Just for clarity's sake, is the failure mode with a Fedora 9 32-bit install and the latest updates 2.6.25.9 kernel the same as an F8 32-bit install? (I've only tried F9 32-bit, which is fine here, so I'm wondering if this issue is specific to F8 somehow -- maybe something in udev that is fixed in F9?). Installing 32-bit F8 now myself...

No problems with either the initial 32-bit F8 install or a fully yum updated F8 install. Phooey.

Yesterday I updated an F8 install on an ML380 G5 with the new kernel:

[lesca@s-vmware ~]$ uname -rv
2.6.25.10-47.fc8PAE #1 SMP Mon Jul 7 18:32:37 EDT 2008

The system boots and everything works fine. With the previous kernel version, kernel-PAE-2.6.25.6-27.fc8.i686, the server did not boot and stopped at "Start Udev".

Thanks to all.

Hrm. Would be good to figure out exactly what change it was that fixed things. Just the same, will close this bug out as fixed in the current release.
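
For anyone triaging a similar oops: in the "Modules linked in:" line of the general protection fault quoted earlier in this report, hpwdt carries a "(+)" marker, which the kernel prints for a module that is still being loaded, and that matches the hpwdt_init frame in the call trace. The sketch below pulls such modules out of an oops header. `loading_modules` is a hypothetical helper name, and the sample text is abridged from this report.

```python
import re

# Abridged from the oops in this report (module list shortened).
OOPS_HEADER = (
    "Starting udev: general protection fault: 0018 [#1] SMP\n"
    "Modules linked in: hpwdt(+) pcspkr i5000_edac edac_core cciss "
    "[last unloaded: scsi_wait_scan]\n"
    "Pid: 1122, comm: modprobe Not tainted (2.6.25-14.fc9.i686 #1)\n"
)


def loading_modules(oops_text):
    """Return modules flagged '+' (still loading) in an oops header."""
    # Take the module list up to the '[last unloaded: ...]' note or end of line.
    found = re.search(r"Modules linked in:([^\[\n]*)", oops_text)
    if not found:
        return []
    # Each flagged entry looks like 'name(flags)'; unflagged names are skipped.
    flagged = re.findall(r"([\w-]+)\(([^)]*)\)", found.group(1))
    return [name for name, flags in flagged if "+" in flags]


if __name__ == "__main__":
    print(loading_modules(OOPS_HEADER))
```

A module reported this way is a reasonable first candidate for blacklisting while the fault is investigated, which is what got 2.6.25-14.fc9.i686 booting here.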