Bug 364981
Summary: | rc.sysinit hangs on Mac Pro | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Gabriel Somlo <somlo> | ||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 8 | CC: | acpi-bugzilla, chris.brown, harald, jonstanley | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | i386 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2008-02-04 17:54:07 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 165150 | ||||||
Attachments: |
|
Description
Gabriel Somlo
2007-11-03 01:56:21 UTC
This is most likely a kernel problem with a module, that gets loaded from udev. Try adding "udevinfo" to the kernel command line and maybe we can get the name of the module. On http://torrent.fedoraproject.org/ you will find "F8 Release Candidate 3", which you may try first. I'm in the process of downloading f8rc3, but wouldn't running yum update on test2 get me updated to the same package versions and all ? The following is the tail end of what gets logged to the terminal while the machine is attempting to boot. At no point in time can I shift+page_up to see what comes before this. There's a cca 1-minute delay before the last two lines of text are logged and udev fails. I'm not sure this is useful, and fear that whatever useful info might get dumped to the terminal before all this stuff pushes it out of the way. Is there any better method to capture all this information (like maybe a kernel command line that would force /var/log/messages to be written to disk in unbuffered mode, so I could find it on a subsequent boot from the rescue image ? udevd[693]: udev_done: seq 1047, pid [1515] exit with 0, 0 seconds old udevd[693]: udev_done: seq 1050, pid [1519] exit with 0, 0 seconds old udevd[693]: msg_queue_insert: seq 1057 queued, 'add' 'firewire' udevd-event[1531]: run_program: 'fw_unit_symlinks.sh fw0 0' udevd[693]: udev_event_run: seq 1057 forked, pid [1531], 'add' 'firewire', 0 seconds old udevd-event[1531]: run_program: '/lib/udev/fw_unit_symlinks.sh' (stdout) '' udevd-event[1531]: run_program: '/lib/udev/fw_unit_symlinks.sh' returned with status 0 udevd-event[1531]: udev_rules_get_name: no node name set, will use kernel name 'fw0' udevd-event[1531]: udev_db_get_device: no db file to read /dev/.udev/db\x2fdevices\x2fpci0000:00\x2f0000:00:1e.0\x2f0000:10:0b.0\x2ffw0: No such file or directory udevd-event[1531]: udev_node_add: creating device node '/dev/fz0', major=250, minor=0, mode=0660, uid=0, gid=0 udevd-event[1531]: name_index: creating index: '/dev/.udev/names/fw0/\x2fdevices\x2fpci0000:00\x2f0000:00:1e.0\x2f0000:10:0b.0\x2ffw0' udevd-event[1531]: pass_env_to_socket: passed -1 bytes to socket '/org/freedesktop/hal/udev_event', udevd-event[1531]: pass_env_to_socket: passed -1 bytes to socket '/org/kernel/udev/monitor', udevd-event[1531]: udev_event_run: seq 1057 finished with 0 udevd[693]: udev_done: seq 1057, pid [1531] exit with 0, 0 seconds old ... <cca 1-minute delay here> udevd[693]: get_ctrl_msg: udevd message (ENV) received, unset 'STARTUP' Wait timeout. Will continue in the background. [FAILED] loading default keymap (us): [OK] setting hostname glstest.net cmu.edu: [OK] <machine hangs here, or at least stops accepting keyboard input> > Is there any better method to capture all this
> information (like maybe a kernel command line
> that would force /var/log/messages to be written
> to disk in unbuffered mode, so I could find it
> on a subsequent boot from the rescue image ?
Not really, syslog is not running and / is not mounted writeable, yet.
Maybe a serial console setup would help.
You may: # mv /etc/udev/rules.d/80-drivers.rules /etc/udev/rules.d/80-drivers.rules.bak Then no modules are autoloaded during the startup. If you have a running syslog after that, with a rw mounted /, you can move the rules back and do: # udevcontrol log_priority=debug # udevtrigger Then all messages are logged to /var/log/messages. I followed instructions as per comment #4. System came up fine. I then ran udevcontrol and udevtrigger. Lots of udevd entries in /var/log/messages (starting at timestamp 17:51:49, but I included all preceding entries from the time the machine was booted just in case). one process in particular got stuck using 100%cpu: /sbin/modprobe acpi:ACPI0001: killing it was impossible, even with '-9'. I'm attaching the contents of /var/log/messages as a separate upload. Created attachment 249821 [details]
contents of syslog including boot dmesg and udev debug output
ok, then try the following, which should give us the kernel module: # modprobe -v acpi:ACPI0001: [root@harryh ~]# modprobe -v acpi:ACPI0001: insmod /lib/modules/2.6.23.1-31.fc8/kernel/drivers/acpi/sbs.ko [root@harryh ~]# modinfo /lib/modules/2.6.23.1-31.fc8/kernel/drivers/acpi/sbs.ko filename: /lib/modules/2.6.23.1-31.fc8/kernel/drivers/acpi/sbs.ko license: GPL description: Smart Battery System ACPI interface driver author: Rich Townsend alias: acpi*:ACPI0005:* alias: acpi*:ACPI0001:* depends: vermagic: 2.6.23.1-31.fc8 SMP mod_unload parm: update_time:uint parm: capacity_mode:uint on my system... ok, you can do the following: # echo "blacklist sbs" >> /etc/modprobe.conf and move back the udev rules: # mv /etc/udev/rules.d/80-drivers.rules.bak /etc/udev/rules.d/80-drivers.rules Hope it works after that. blacklisting sbs fixes the problem, and now my machine boots fine -- Thanks! Should we (I ?) file a bug against something else (kernel ?) or do you think this must be a well-known issue in those circles already ? The only reference to this I found on google (searched for 'sbs module "mac pro"' was https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/96480 which seems kinda old... This bug is already assigned to component "kernel". Changed the version to "f8". Possibly fixed by commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f Per comment #12, this may be fixed... Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug and will try and assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? If the problem no longer exists then please close this bug or I'll do so in a few days if there is no additional information lodged. Doing a 'modprobe sbs' still hangs the machine. If done during bootup by rc.sysinit, it will prevent the machine from booting. If done from a terminal after the machine is fully up, the 'modprobe' process will hang with 100% of one CPU; one can still log in and run jobs on the machine, but the machine will hang at the latest when one attempts a reboot. This is as of kernel 2.6.23.9-85.fc8. For whatever it's worth, here's an excerpt of 'dmidecode' from the machine it's happening on: Handle 0x001C, DMI type 0, 24 bytes. BIOS Information Vendor: Apple Computer, Inc. Version: MP11.88Z.005D.B00.0709141354 Release Date: 09/14/07 Address: 0x00000 Runtime Size: 1024 kB ROM Size: 2048 kB Characteristics: PCI is supported BIOS is upgradeable BIOS shadowing is allowed Boot from CD is supported Selectable boot is supported ACPI is supported Smart battery is supported BIOS boot specification is supported Function key-initiated network boot is supported BIOS Revision: 0.0 Firmware Revision: 0.0 Handle 0x001D, DMI type 1, 27 bytes. System Information Manufacturer: Apple Computer, Inc. Product Name: MacPro1,1 Version: 1.0 Serial Number: G87421B908S UUID: 9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97 Wake-up Type: Power Switch SKU Number: System SKU# Family: MacPro Handle 0x001E, DMI type 2, 16 bytes. Base Board Information Manufacturer: Apple Computer, Inc. Product Name: Mac-F4208DC8 Version: PVT Serial Number: Base Board Serial# Asset Tag: Base Board Asset Tag# Features: Board is a hosting board Board is replaceable Location In Chassis: Part Component Chassis Handle: 0x001F Type: Motherboard Contained Object Handles: 0 Handle 0x001F, DMI type 3, 21 bytes. Chassis Information Manufacturer: Apple Computer, Inc. Type: Unknown Lock: Not Present Version: Mac-F4208DC8 Serial Number: G87421B908S Asset Tag: Asset Tag# Boot-up State: Safe Power Supply State: Safe Thermal State: Other Security Status: Other OEM Information: 0x00000000 Heigth: Unspecified Number Of Power Cords: Unspecified Contained Elements: 0 adding some metadata to triage this to ACPI folks. Here's some more debugging information: when sbs.ko is loaded, acpi_sbs_init() eventually calls acpi_bus_register_driver() and never returns. The following info is logged to /var/log/messages after the call to acpi_bus_register_driver() is made: ACPI: Smart Battery System [SMB0]: AC Adapter [AC0] (on-line) ACPI Exception (sbs-0294): AE_ERROR, transaction 1 error [20070126] ACPI Exception (sbs-0380): AE_ERROR, acpi_ec_sbs_access() failed [20070126] ACPI Exception (sbs-0498): AE_ERROR, acpi_sbs_write_word() failed [20070126] ACPI Exception (sbs-1309): AE_ERROR, acpi_battery_select() failed [20070126] ACPI Exception (sbs-1684): AE_ERROR, acpi_battery_add() failed [20070126] BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:2730] Pid: 2730, comm: modprobe EIP: 0060:[<c0434a8d>] CPU: 1 EIP is at lock_timer_base+0x10/0x35 EFLAGS: 00200246 Not tainted (2.6.23.9-85.fc8 #1) EAX: f2c652bc EBX: 00000000 ECX: f2f04dc8 EDX: f2f04db0 ESI: 00000000 EDI: f2c652bc EBP: f2f04db0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 002ea09c CR3: 33b75000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0434ac3>] try_to_del_timer_sync+0x11/0x4a [<c0434b06>] del_timer_sync+0xa/0x14 [<f8a6b0c0>] acpi_sbs_remove+0x3c/0xa8 [sbs] [<f8a6c878>] acpi_sbs_add+0x286/0x2e9 [sbs] [<c05321ae>] acpi_device_probe+0x33/0x7c [<c0566f93>] driver_probe_device+0xea/0x168 [<c04f3561>] kobject_uevent_env+0x37a/0x3a1 [<c05670b7>] __driver_attach+0x0/0xa5 [<c0567127>] __driver_attach+0x70/0xa5 [<c056651f>] bus_for_each_dev+0x39/0x5b [<c0566dee>] driver_attach+0x16/0x18 [<c05670b7>] __driver_attach+0x0/0xa5 [<c0566803>] bus_add_driver+0x6d/0x16d [<f88eb103>] acpi_sbs_init+0x103/0x14b [sbs] [<c0437d1f>] blocking_notifier_call_chain+0x17/0x1a [<c044ab86>] sys_init_module+0x150d/0x1651 [<c0532441>] acpi_bus_register_driver+0x0/0x38 [<c040518a>] syscall_call+0x7/0xb ======================= BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:2730] Pid: 2730, comm: modprobe EIP: 0060:[<c0434aa9>] CPU: 1 EIP is at lock_timer_base+0x2c/0x35 EFLAGS: 00200246 Not tainted (2.6.23.9-85.fc8 #1) EAX: f2c652bc EBX: 00000000 ECX: f2f04dc8 EDX: f2f04db0 ESI: 00000000 EDI: f2c652bc EBP: f2f04db0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 002ea09c CR3: 33b75000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0434ac3>] try_to_del_timer_sync+0x11/0x4a [<c0434b06>] del_timer_sync+0xa/0x14 [<f8a6b0c0>] acpi_sbs_remove+0x3c/0xa8 [sbs] [<f8a6c878>] acpi_sbs_add+0x286/0x2e9 [sbs] [<c05321ae>] acpi_device_probe+0x33/0x7c [<c0566f93>] driver_probe_device+0xea/0x168 [<c04f3561>] kobject_uevent_env+0x37a/0x3a1 [<c05670b7>] __driver_attach+0x0/0xa5 [<c0567127>] __driver_attach+0x70/0xa5 [<c056651f>] bus_for_each_dev+0x39/0x5b [<c0566dee>] driver_attach+0x16/0x18 [<c05670b7>] __driver_attach+0x0/0xa5 [<c0566803>] bus_add_driver+0x6d/0x16d [<f88eb103>] acpi_sbs_init+0x103/0x14b [sbs] [<c0437d1f>] blocking_notifier_call_chain+0x17/0x1a [<c044ab86>] sys_init_module+0x150d/0x1651 [<c0532441>] acpi_bus_register_driver+0x0/0x38 [<c040518a>] syscall_call+0x7/0xb ======================= BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:2730] ... still happening with 2.6.23.14-107.fc8 sbs has been rewritten in 2.6.24 and the problem should be gone. I'd suggest blacklisting the driver for now. Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage Please test against a 2.6.24 and update this bug when you are able. is there a 2.6.24 test (s)rpm I could use for this ? happy to confirm the problem no longer occurs with kernel 2.6.24-9. |