Bug 235787 - FC6 kernels above 2.6.20-1.2925.fc6 unbootable on M2R32-MVP board
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: x86_64 Linux
Priority: medium, Severity: high
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2007-04-10 02:00 EDT by Michal Jaegermann
Modified: 2007-11-30 17:12 EST
Fixed In Version: 2.6.20-1.2952.fc6
Doc Type: Bug Fix
Last Closed: 2007-06-18 22:19:22 EDT


Attachments
dmesg output from booting a rawhide kernel on M2R32-MVP (17.66 KB, text/plain)
2007-04-10 02:00 EDT, Michal Jaegermann
PCI layout on M2R32-MVP (1.42 KB, text/plain)
2007-04-10 02:02 EDT, Michal Jaegermann

Description Michal Jaegermann 2007-04-10 02:00:25 EDT
Description of problem:

On the board in question 2.6.19-1.2911.6.5.fc6 is the last kernel
which boots without any extra parameters.

2.6.20-1.2925.fc6 still boots, but in order to achieve that feat
it requires 'acpi=off irqpoll'.  See bug 232490 for details.
This kernel, in cooperation with hald, also produces a
constant flood of "hde: status error: status=0x58" messages in
the logs.  Details are given in bug 221691 starting with comment #12.
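
For reference, a minimal sketch of how the status byte in that message breaks down, assuming the standard ATA status register bit layout.  The bit labels and the decode_status helper are illustrative names, not something taken from this report or from the kernel source.

```python
# Assumed layout: standard ATA status register, one flag per bit.
# The labels below are illustrative names for those flags.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of all flags set in an 8-bit status value."""
    return [name for mask, name in STATUS_BITS if value & mask]


# The value from the "hde: status error: status=0x58" messages.
print(decode_status(0x58))
```

Under that assumption 0x58 decodes to DriveReady, SeekComplete and DataRequest, with the Error bit itself clear.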

At last I had an opportunity to try 2.6.20-1.2933.fc6 and
2.6.20-1.2943.fc6 on that hardware, and no matter which
options I try (those tried include, separately or in
combination, 'acpi=off', 'pci=noacpi', 'pci=nomsi', 'acpi=noirq',
'irqpoll', 'irqfixup') I invariably get during boot:
hdc: IRQ probe failed (0xfffffffffffefefa)
followed by more of the same for some interfaces not actually used.
After that I see, quite regularly if slowly, "hdc: lost interrupt".
With that I cannot really get anything out of the hard disk, and
there is a pretty good chance that I will not be able to reboot
from the keyboard.  In some cases I cannot even get the reset
button to work and have to power down the whole machine before
being able to boot again.

Now for better news.  It is possible to boot
kernel-2.6.20-1.3053.fc7 on this board.  If I skip 'acpi=off' the
boot gets stuck immediately after "Setting up hotplug" is printed.
Still, with 'acpi=off' this kernel boots.  On the first try,
though, I immediately got the following (and never reached a shell
prompt):

BUG: soft lockup detected on CPU#0!

Call Trace:
 <IRQ>  [<ffffffff802bc22d>] softlockup_tick+0xcb/0xe0
 [<ffffffff8024cde5>] run_local_timers+0x13/0x15
 [<ffffffff80293548>] update_process_times+0x4c/0x78
 [<ffffffff8027593c>] smp_local_timer_interrupt+0x34/0x54
 [<ffffffff802760e9>] smp_apic_timer_interrupt+0x43/0x5a
 [<ffffffff8025cdcb>] apic_timer_interrupt+0x6b/0x70
 [<ffffffff80288490>] task_running_tick+0x38/0x2c6
 [<ffffffff80211ca5>] __do_softirq+0x4b/0xe3
 [<ffffffff80211cad>] __do_softirq+0x53/0xe3
 [<ffffffff8034a5c1>] kobject_uevent_env+0x28c/0x46e
 [<ffffffff8025d31c>] call_softirq+0x1c/0x28
 [<ffffffff8026be1b>] do_softirq+0x3d/0xab
 [<ffffffff802901c3>] irq_exit+0x4e/0x50
 [<ffffffff802760ee>] smp_apic_timer_interrupt+0x48/0x5a
 [<ffffffff8025cdcb>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff8034a5a4>] kobject_uevent_env+0x26f/0x46e
 [<ffffffff802a3b6a>] lock_release+0x153/0x160
 [<ffffffff802639eb>] _spin_unlock+0x1e/0x2a
 [<ffffffff8034a5c1>] kobject_uevent_env+0x28c/0x46e
 [<ffffffff80256eaf>] kobject_uevent+0xb/0xd
 [<ffffffff803b97ee>] device_add+0x457/0x769
 [<ffffffff803b9b19>] device_register+0x19/0x1e
 [<ffffffff803ba0c0>] device_create+0xdf/0x110
 [<ffffffff802a29b5>] debug_check_no_locks_freed+0x120/0x12f
 [<ffffffff802a2871>] trace_hardirqs_on+0x136/0x15a
 [<ffffffff80263db6>] _spin_unlock_irqrestore+0x3f/0x47
 [<ffffffff802a2871>] trace_hardirqs_on+0x136/0x15a
 [<ffffffff8039e7c5>] vcs_make_sysfs+0x33/0x62
 [<ffffffff803a40c2>] con_open+0x89/0x9b
 [<ffffffff8039bc59>] tty_open+0x19c/0x321
 [<ffffffff80248bc1>] chrdev_open+0x148/0x198
 [<ffffffff80248a79>] chrdev_open+0x0/0x198
 [<ffffffff8021eaa2>] __dentry_open+0xe9/0x1c3
 [<ffffffff80226319>] nameidata_to_filp+0x2d/0x3e
 [<ffffffff802281dd>] do_filp_open+0x36/0x46
 [<ffffffff802639f3>] _spin_unlock+0x26/0x2a
 [<ffffffff802158d5>] get_unused_fd+0xfb/0x10c
 [<ffffffff802198e2>] do_sys_open+0x4f/0xd0
 [<ffffffff8023220f>] sys_open+0x1b/0x1d
 [<ffffffff8025c11e>] system_call+0x7e/0x83

This could be an effect of the presence of 'irqpoll', though.
When I later tried without it I was able to log in, and at
runlevel 5 as well.  A dmesg output from a boot with
2.6.20-1.3053.fc7 is attached.  BTW - it appears that the
"message flood" problem from 2.6.20-1.2925.fc6 is absent with
2.6.20-1.3053.fc7 (or at least it did not show up in the short
time I could test that one).

The output of 'cat /proc/interrupts' differs somewhat between
2.6.19-1.2911.6.5.fc6 and 2.6.20-1.3053.fc7 (both from runlevel 3).
Here it is for the first one:

           CPU0       CPU1
  0:   19977824          0   IO-APIC-edge      timer
  1:        238         13   IO-APIC-edge      i8042
  6:          3          0   IO-APIC-edge      floppy
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:       6047      74820   IO-APIC-edge      ide0
 15:        488         64   IO-APIC-edge      ide1
 16:        612     178336   IO-APIC-fasteoi   ide2, ohci_hcd:usb1, HDA Intel
 17:         52       1230   IO-APIC-fasteoi   ohci_hcd:usb2, ohci_hcd:usb4, libata
 18:        247    2355535   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb5
 19:       1311    1107319   IO-APIC-fasteoi   ehci_hcd:usb6
 23:        295     140394   IO-APIC-fasteoi   eth0
NMI:          0          0
LOC:   19976421   19976349
ERR:          0

and this is for 2.6.20-1.3053.fc7:

           CPU0       CPU1
  0:      84608          0   IO-APIC-edge      timer
  1:        298          0   IO-APIC-edge      i8042
  2:          0          0    XT-PIC-XT        cascade
  8:          0          0   IO-APIC-edge      rtc
 12:          4          0   IO-APIC-edge      i8042
 16:        489          0   IO-APIC-fasteoi   ohci_hcd:usb1, libata, HDA Intel
 17:         29          0   IO-APIC-fasteoi   ohci_hcd:usb2, ohci_hcd:usb4, libata
 18:          3          0   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb5
 19:        361          0   IO-APIC-fasteoi   ehci_hcd:usb6
 21:       4020          0   IO-APIC-fasteoi   libata
 23:          0          0   IO-APIC-fasteoi   eth0
NMI:          0          0
LOC:      84483      84542
ERR:          0

I do not know if something more delicate than 'acpi=off' would
be enough with 2.6.20-1.3053.fc7.
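
The two /proc/interrupts listings above can also be compared mechanically.  What follows is only a rough sketch, assuming the usual layout of a CPU header row followed by "label: counts description" rows; parse_interrupts and compare are hypothetical helper names, and the data is copied from the two listings in this report.

```python
OLD = """\
           CPU0       CPU1
  0:   19977824          0   IO-APIC-edge      timer
  1:        238         13   IO-APIC-edge      i8042
  6:          3          0   IO-APIC-edge      floppy
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:       6047      74820   IO-APIC-edge      ide0
 15:        488         64   IO-APIC-edge      ide1
 16:        612     178336   IO-APIC-fasteoi   ide2, ohci_hcd:usb1, HDA Intel
 17:         52       1230   IO-APIC-fasteoi   ohci_hcd:usb2, ohci_hcd:usb4, libata
 18:        247    2355535   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb5
 19:       1311    1107319   IO-APIC-fasteoi   ehci_hcd:usb6
 23:        295     140394   IO-APIC-fasteoi   eth0
NMI:          0          0
LOC:   19976421   19976349
ERR:          0
"""

NEW = """\
           CPU0       CPU1
  0:      84608          0   IO-APIC-edge      timer
  1:        298          0   IO-APIC-edge      i8042
  2:          0          0    XT-PIC-XT        cascade
  8:          0          0   IO-APIC-edge      rtc
 12:          4          0   IO-APIC-edge      i8042
 16:        489          0   IO-APIC-fasteoi   ohci_hcd:usb1, libata, HDA Intel
 17:         29          0   IO-APIC-fasteoi   ohci_hcd:usb2, ohci_hcd:usb4, libata
 18:          3          0   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb5
 19:        361          0   IO-APIC-fasteoi   ehci_hcd:usb6
 21:       4020          0   IO-APIC-fasteoi   libata
 23:          0          0   IO-APIC-fasteoi   eth0
NMI:          0          0
LOC:      84483      84542
ERR:          0
"""


def parse_interrupts(text):
    """Map each row label to (per-CPU counts, description string)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        n = 0  # number of leading numeric fields (at most one per CPU)
        while n < min(ncpu, len(fields)) and fields[n].isdigit():
            n += 1
        table[label.strip()] = ([int(f) for f in fields[:n]],
                                " ".join(fields[n:]))
    return table


def compare(old, new):
    """Return labels only in old, only in new, and with a changed description."""
    only_old = set(old) - set(new)
    only_new = set(new) - set(old)
    changed = {k for k in set(old) & set(new) if old[k][1] != new[k][1]}
    return only_old, only_new, changed


only_old, only_new, changed = compare(parse_interrupts(OLD),
                                      parse_interrupts(NEW))
print("only in 2.6.19-1.2911.6.5.fc6:", sorted(only_old, key=int))
print("only in 2.6.20-1.3053.fc7:", sorted(only_new, key=int))
print("description changed:", sorted(changed))
```

On this data the sketch lists IRQs 6, 9, 14 and 15 only under the older kernel, IRQs 2 and 21 only under the newer one, and a changed device list on IRQ 16.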
Comment 1 Michal Jaegermann 2007-04-10 02:00:25 EDT
Created attachment 152102
dmesg output from booting a rawhide kernel on M2R32-MVP
Comment 2 Michal Jaegermann 2007-04-10 02:02:55 EDT
Created attachment 152106
PCI layout on M2R32-MVP
Comment 3 Chuck Ebbert 2007-04-16 20:21:59 EDT
Can you try kernel 1.2944?
Comment 4 Michal Jaegermann 2007-04-23 14:18:07 EDT
> Can you try kernel 1.2944?

Ok, at last I was able to, even if only indirectly, via telephone.
I was told that, when not booting 'quiet', the last thing printed
on the screen is:

hda: max request size: 512KiB

(I am not entirely sure if that last message is for hda or hdc).
Then there is a long pause, after which all these "lost
interrupt" messages start to show up, slowly, and one has to push
the reset button to get the machine back.
Comment 5 Michal Jaegermann 2007-04-23 14:20:09 EDT
I should add to the previous comment: this was tried without extra
kernel parameters and also with 'acpi=off'.  There was no difference.
Comment 6 Michal Jaegermann 2007-06-18 22:19:22 EDT
I had at last an opportunity to try 2.6.20-1.2952.fc6 (the machine
is remote and I cannot risk leaving it with a stuck boot).
That kernel booted on the board in question without any extra parameters.
Who knows what will happen with the next one.
