Bug 364981

Summary: rc.sysinit hangs on Mac Pro
Product: [Fedora] Fedora Reporter: Gabriel Somlo <somlo>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 8CC: acpi-bugzilla, chris.brown, harald, jonstanley
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-02-04 17:54:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 165150    
Attachments:
Description Flags
contents of syslog including boot dmesg and udev debug output none

Description Gabriel Somlo 2007-11-03 01:56:21 UTC
Description of problem:
I have a Mac Pro 2xDualcore 2.66Ghz on which I installed f8test2. Install
worked fine, but when I rebooted into the installed OS, rc.sysinit went
as far as starting udev (/sbin/start_udev), then the machine crashed.

After booting via the rescue image, I ran 'yum update' and tried again.

This time, it went as far as setting the host name before the system
froze up again.

Version-Release number of selected component (if applicable):


How reproducible:

Watch f8test boot on a mac pro; it should hang during rc.sysinit.

Steps to Reproduce:
1. wipe the hard drive clean: dd if=/dev/zero of=/dev/sda bs=1024k count=1
2. install f8test (I used the test2 disk)
3. boot via the rescue image, start network, chroot /mnt/sysimage
4. update: yum update (mine is up-to-date as of Nov. 01 9pm EST)
5. reboot from hard drive
  
Actual results:

system will hang during rc.sysinit shortly after calling /sbin/start_udev

Expected results:

system should boot into f8test

Additional info:

Comment 1 Harald Hoyer 2007-11-05 08:07:30 UTC
This is most likely a kernel problem with a module, that gets loaded from udev.

Try adding "udevinfo" to the kernel command line and maybe we can get the name
of  the module.

On http://torrent.fedoraproject.org/ you will find "F8 Release Candidate 3",
which you may try first.

Comment 2 Gabriel Somlo 2007-11-05 22:50:47 UTC
I'm in the process of downloading f8rc3, but wouldn't
running yum update on test2 get me updated to the same
package versions and all ?

The following is the tail end of what gets logged to
the terminal while the machine is attempting to boot.

At no point in time can I shift+page_up to see what
comes before this.

There's a cca 1-minute delay before the last two
lines of text are logged and udev fails.

I'm not sure this is useful, and fear that whatever
useful info might get dumped to the terminal before
all this stuff pushes it out of the way.

Is there any better method to capture all this
information (like maybe a kernel command line
that would force /var/log/messages to be written
to disk in unbuffered mode, so I could find it
on a subsequent boot from the rescue image ?


udevd[693]: udev_done: seq 1047, pid [1515] exit with 0, 0 seconds old
udevd[693]: udev_done: seq 1050, pid [1519] exit with 0, 0 seconds old
udevd[693]: msg_queue_insert: seq 1057 queued, 'add' 'firewire'
udevd-event[1531]: run_program: 'fw_unit_symlinks.sh fw0 0'
udevd[693]: udev_event_run: seq 1057 forked, pid [1531], 'add' 'firewire', 0
seconds old
udevd-event[1531]: run_program: '/lib/udev/fw_unit_symlinks.sh' (stdout) ''
udevd-event[1531]: run_program: '/lib/udev/fw_unit_symlinks.sh' returned with
status 0
udevd-event[1531]: udev_rules_get_name: no node name set, will use kernel name 'fw0'
udevd-event[1531]: udev_db_get_device: no db file to read
/dev/.udev/db\x2fdevices\x2fpci0000:00\x2f0000:00:1e.0\x2f0000:10:0b.0\x2ffw0:
No such file or directory
udevd-event[1531]: udev_node_add: creating device node '/dev/fz0', major=250,
minor=0, mode=0660, uid=0, gid=0
udevd-event[1531]: name_index: creating index:
'/dev/.udev/names/fw0/\x2fdevices\x2fpci0000:00\x2f0000:00:1e.0\x2f0000:10:0b.0\x2ffw0'
udevd-event[1531]: pass_env_to_socket: passed -1 bytes to socket
'/org/freedesktop/hal/udev_event',
udevd-event[1531]: pass_env_to_socket: passed -1 bytes to socket
'/org/kernel/udev/monitor',
udevd-event[1531]: udev_event_run: seq 1057 finished with 0
udevd[693]: udev_done: seq 1057, pid [1531] exit with 0, 0 seconds old

...
<cca 1-minute delay here>

udevd[693]: get_ctrl_msg: udevd message (ENV) received, unset 'STARTUP'
Wait timeout. Will continue in the background.

                                      [FAILED]
loading default keymap (us):          [OK]
setting hostname glstest.net cmu.edu: [OK]

<machine hangs here, or at least stops accepting keyboard input>


Comment 3 Harald Hoyer 2007-11-06 07:40:51 UTC
> Is there any better method to capture all this
> information (like maybe a kernel command line
> that would force /var/log/messages to be written
> to disk in unbuffered mode, so I could find it
> on a subsequent boot from the rescue image ?

Not really, syslog is not running and / is not mounted writeable, yet.

Maybe a serial console setup would help.

Comment 4 Harald Hoyer 2007-11-06 07:46:48 UTC
You may:
# mv /etc/udev/rules.d/80-drivers.rules /etc/udev/rules.d/80-drivers.rules.bak

Then no modules are autoloaded during the startup. 
If you have a running syslog after that, with a rw mounted /, you can move the
rules back and do:

# udevcontrol log_priority=debug
# udevtrigger

Then all messages are logged to /var/log/messages.

Comment 5 Gabriel Somlo 2007-11-07 02:14:04 UTC
I followed instructions as per comment #4. System came up fine.
I then ran udevcontrol and udevtrigger. Lots of udevd entries in
/var/log/messages (starting at timestamp 17:51:49, but I included
all preceding entries from the time the machine was booted just
in case).

one process in particular got stuck using 100%cpu:

      /sbin/modprobe acpi:ACPI0001:

killing it was impossible, even with '-9'.

I'm attaching the contents of /var/log/messages as a separate upload.

Comment 6 Gabriel Somlo 2007-11-07 02:16:12 UTC
Created attachment 249821 [details]
contents of syslog including boot dmesg and udev debug output

Comment 7 Harald Hoyer 2007-11-07 09:29:50 UTC
ok, then try the following, which should give us the kernel module:

# modprobe -v acpi:ACPI0001:

Comment 8 Harald Hoyer 2007-11-07 09:30:51 UTC
[root@harryh ~]# modprobe -v acpi:ACPI0001:
insmod /lib/modules/2.6.23.1-31.fc8/kernel/drivers/acpi/sbs.ko

[root@harryh ~]# modinfo /lib/modules/2.6.23.1-31.fc8/kernel/drivers/acpi/sbs.ko
filename:       /lib/modules/2.6.23.1-31.fc8/kernel/drivers/acpi/sbs.ko
license:        GPL
description:    Smart Battery System ACPI interface driver
author:         Rich Townsend
alias:          acpi*:ACPI0005:*
alias:          acpi*:ACPI0001:*
depends:
vermagic:       2.6.23.1-31.fc8 SMP mod_unload
parm:           update_time:uint
parm:           capacity_mode:uint

on my system...

Comment 9 Harald Hoyer 2007-11-07 09:32:25 UTC
ok, you can do the following:

# echo "blacklist sbs" >> /etc/modprobe.conf

and move back the udev rules:

# mv /etc/udev/rules.d/80-drivers.rules.bak /etc/udev/rules.d/80-drivers.rules

Hope it works after that.

Comment 10 Gabriel Somlo 2007-11-07 20:36:48 UTC
blacklisting sbs fixes the problem, and now my machine boots fine -- Thanks!

Should we (I ?) file a bug against something else (kernel ?) or do you think
this must be a well-known issue in those circles already ?
The only reference to this I found on google (searched for 'sbs module "mac pro"'
was https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/96480
which seems kinda old...

Comment 11 Harald Hoyer 2007-11-08 11:44:54 UTC
This bug is already assigned to component "kernel".
Changed the version to "f8".

Comment 12 Chuck Ebbert 2007-11-08 20:11:43 UTC
Possibly fixed by commit 8fa5913d54f3b1e09948e6a0db34da887e05ff1f


Comment 13 Jon Stanley 2007-12-30 05:59:25 UTC
Per comment #12, this may be fixed...

Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 14 Gabriel Somlo 2007-12-30 06:48:20 UTC
Doing a 'modprobe sbs' still hangs the machine. If done during bootup 
by rc.sysinit, it will prevent the machine from booting. If done from
a terminal after the machine is fully up, the 'modprobe' process will
hang with 100% of one CPU; one can still log in and run jobs on the
machine, but the machine will hang at the latest when one attempts a
reboot.

This is as of kernel 2.6.23.9-85.fc8.

For whatever it's worth, here's an excerpt of 'dmidecode' from the
machine it's happening on:

Handle 0x001C, DMI type 0, 24 bytes.
BIOS Information
        Vendor: Apple Computer, Inc.
        Version:     MP11.88Z.005D.B00.0709141354
        Release Date: 09/14/07
        Address: 0x00000
        Runtime Size: 1024 kB
        ROM Size: 2048 kB
        Characteristics:
                PCI is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                ACPI is supported
                Smart battery is supported
                BIOS boot specification is supported
                Function key-initiated network boot is supported
        BIOS Revision: 0.0
        Firmware Revision: 0.0

Handle 0x001D, DMI type 1, 27 bytes.
System Information
        Manufacturer: Apple Computer, Inc.
        Product Name: MacPro1,1
        Version: 1.0
        Serial Number: G87421B908S
        UUID: 9CFE245E-D0C8-BD45-A79F-54EA5FBD3D97
        Wake-up Type: Power Switch
        SKU Number: System SKU#
        Family: MacPro

Handle 0x001E, DMI type 2, 16 bytes.
Base Board Information
        Manufacturer: Apple Computer, Inc.
        Product Name: Mac-F4208DC8
        Version: PVT
        Serial Number: Base Board Serial#
        Asset Tag: Base Board Asset Tag#
        Features:
                Board is a hosting board
                Board is replaceable
        Location In Chassis: Part Component
        Chassis Handle: 0x001F
        Type: Motherboard
        Contained Object Handles: 0

Handle 0x001F, DMI type 3, 21 bytes.
Chassis Information
        Manufacturer: Apple Computer, Inc.
        Type: Unknown
        Lock: Not Present
        Version: Mac-F4208DC8
        Serial Number: G87421B908S
        Asset Tag: Asset Tag#
        Boot-up State: Safe
        Power Supply State: Safe
        Thermal State: Other
        Security Status: Other
        OEM Information: 0x00000000
        Heigth: Unspecified
        Number Of Power Cords: Unspecified
        Contained Elements: 0


Comment 15 Jon Stanley 2007-12-30 06:59:38 UTC
adding some metadata to triage this to ACPI folks.

Comment 16 Gabriel Somlo 2007-12-30 20:08:13 UTC
Here's some more debugging information: when sbs.ko is loaded, acpi_sbs_init()
eventually calls acpi_bus_register_driver() and never returns. The following
info is logged to /var/log/messages after the call to acpi_bus_register_driver()
is made:

ACPI: Smart Battery System [SMB0]: AC Adapter [AC0] (on-line)
ACPI Exception (sbs-0294): AE_ERROR, transaction 1 error [20070126]
ACPI Exception (sbs-0380): AE_ERROR, acpi_ec_sbs_access() failed [20070126]
ACPI Exception (sbs-0498): AE_ERROR, acpi_sbs_write_word() failed [20070126]
ACPI Exception (sbs-1309): AE_ERROR, acpi_battery_select() failed [20070126]
ACPI Exception (sbs-1684): AE_ERROR, acpi_battery_add() failed [20070126]
BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:2730]

Pid: 2730, comm:             modprobe
EIP: 0060:[<c0434a8d>] CPU: 1 
EIP is at lock_timer_base+0x10/0x35
 EFLAGS: 00200246    Not tainted  (2.6.23.9-85.fc8 #1) 
EAX: f2c652bc EBX: 00000000 ECX: f2f04dc8 EDX: f2f04db0
ESI: 00000000 EDI: f2c652bc EBP: f2f04db0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 002ea09c CR3: 33b75000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0434ac3>] try_to_del_timer_sync+0x11/0x4a
 [<c0434b06>] del_timer_sync+0xa/0x14
 [<f8a6b0c0>] acpi_sbs_remove+0x3c/0xa8 [sbs]
 [<f8a6c878>] acpi_sbs_add+0x286/0x2e9 [sbs]
 [<c05321ae>] acpi_device_probe+0x33/0x7c
 [<c0566f93>] driver_probe_device+0xea/0x168
 [<c04f3561>] kobject_uevent_env+0x37a/0x3a1
 [<c05670b7>] __driver_attach+0x0/0xa5
 [<c0567127>] __driver_attach+0x70/0xa5
 [<c056651f>] bus_for_each_dev+0x39/0x5b
 [<c0566dee>] driver_attach+0x16/0x18
 [<c05670b7>] __driver_attach+0x0/0xa5
 [<c0566803>] bus_add_driver+0x6d/0x16d
 [<f88eb103>] acpi_sbs_init+0x103/0x14b [sbs]
 [<c0437d1f>] blocking_notifier_call_chain+0x17/0x1a
 [<c044ab86>] sys_init_module+0x150d/0x1651 
 [<c0532441>] acpi_bus_register_driver+0x0/0x38
 [<c040518a>] syscall_call+0x7/0xb
 =======================
BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:2730]

Pid: 2730, comm:             modprobe
EIP: 0060:[<c0434aa9>] CPU: 1 
EIP is at lock_timer_base+0x2c/0x35
 EFLAGS: 00200246    Not tainted  (2.6.23.9-85.fc8 #1) 
EAX: f2c652bc EBX: 00000000 ECX: f2f04dc8 EDX: f2f04db0
ESI: 00000000 EDI: f2c652bc EBP: f2f04db0 DS: 007b ES: 007b FS: 00d8
CR0: 8005003b CR2: 002ea09c CR3: 33b75000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0434ac3>] try_to_del_timer_sync+0x11/0x4a
 [<c0434b06>] del_timer_sync+0xa/0x14
 [<f8a6b0c0>] acpi_sbs_remove+0x3c/0xa8 [sbs]
 [<f8a6c878>] acpi_sbs_add+0x286/0x2e9 [sbs]
 [<c05321ae>] acpi_device_probe+0x33/0x7c
 [<c0566f93>] driver_probe_device+0xea/0x168
 [<c04f3561>] kobject_uevent_env+0x37a/0x3a1
 [<c05670b7>] __driver_attach+0x0/0xa5
 [<c0567127>] __driver_attach+0x70/0xa5
 [<c056651f>] bus_for_each_dev+0x39/0x5b
 [<c0566dee>] driver_attach+0x16/0x18
 [<c05670b7>] __driver_attach+0x0/0xa5
 [<c0566803>] bus_add_driver+0x6d/0x16d
 [<f88eb103>] acpi_sbs_init+0x103/0x14b [sbs]
 [<c0437d1f>] blocking_notifier_call_chain+0x17/0x1a
 [<c044ab86>] sys_init_module+0x150d/0x1651 
 [<c0532441>] acpi_bus_register_driver+0x0/0x38
 [<c040518a>] syscall_call+0x7/0xb
 =======================
BUG: soft lockup - CPU#1 stuck for 11s! [modprobe:2730]

...


Comment 17 Gabriel Somlo 2008-01-28 19:28:37 UTC
still happening with 2.6.23.14-107.fc8

Comment 18 Chuck Ebbert 2008-01-31 23:51:03 UTC
sbs has been rewritten in 2.6.24 and the problem should be gone. I'd suggest
blacklisting the driver for now.

Comment 19 Christopher Brown 2008-02-03 23:40:56 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

Please test against a 2.6.24 and update this bug when you are able.

Comment 20 Gabriel Somlo 2008-02-04 12:19:30 UTC
is there a 2.6.24 test (s)rpm I could use for this ?

Comment 22 Gabriel Somlo 2008-02-04 17:54:07 UTC
happy to confirm the problem no longer occurs with kernel 2.6.24-9.