Bug 230805

Summary: Soft Lockup detected when loading cyclades firmware
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Andreas Thienemann <andreas>
Assignee: Aristeu Rozanski <arozansk>
QA Contact: Brian Brock <bbrock>
CC: dzickus
Doc Type: Bug Fix
Last Closed: 2008-02-07 21:06:43 UTC

Description Andreas Thienemann 2007-03-02 23:24:01 UTC
Description of problem:
When loading the firmware for the Cyclades-Z serial port multiplexer card on a
4-way (2x dual-core) Opteron, the system quickly becomes unresponsive and crashes.

Calling dmesg right after loading the firmware with cyzload -f cyzfirm.bin shows
the following:


Cyclades driver 2.3.2.20 2004/02/25 18:14:16
        built Nov  9 2006 18:53:40
Cyclades-8Zo/PCI #1: 0xff500000-0xff57ffff, 8 channels starting from port 0.
BUG: soft lockup detected on CPU#1!

Call Trace:
 [<ffffffff80069632>] show_trace+0x34/0x47
 [<ffffffff80069657>] dump_stack+0x12/0x17
 [<ffffffff800b4d8b>] softlockup_tick+0xdb/0xed
 [<ffffffff8009432e>] update_process_times+0x42/0x68
 [<ffffffff8007427c>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074938>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
DWARF2 unwinder stuck at apic_timer_interrupt+0x66/0x6c
Leftover inexact backtrace:
 <IRQ>  [<ffffffff88451e8d>] :cyclades:cyz_poll+0x2fc/0x77f
 [<ffffffff88451b91>] :cyclades:cyz_poll+0x0/0x77f
 [<ffffffff80093b54>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c0e>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c6b0>] call_softirq+0x1c/0x28
 [<ffffffff8006a7eb>] do_softirq+0x2c/0x85
 [<ffffffff80068df5>] default_idle+0x0/0x50
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068e1e>] default_idle+0x29/0x50
 [<ffffffff800472a9>] cpu_idle+0x95/0xb8
 [<ffffffff8007409a>] start_secondary+0x45a/0x469

BUG: soft lockup detected on CPU#1!

Call Trace:
 [<ffffffff80069632>] show_trace+0x34/0x47
 [<ffffffff80069657>] dump_stack+0x12/0x17
 [<ffffffff800b4d8b>] softlockup_tick+0xdb/0xed
 [<ffffffff8009432e>] update_process_times+0x42/0x68
 [<ffffffff8007427c>] smp_local_timer_interrupt+0x23/0x47
 [<ffffffff80074938>] smp_apic_timer_interrupt+0x41/0x47
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
DWARF2 unwinder stuck at apic_timer_interrupt+0x66/0x6c
Leftover inexact backtrace:
 <IRQ>  [<ffffffff88451ea4>] :cyclades:cyz_poll+0x313/0x77f
 [<ffffffff88451b91>] :cyclades:cyz_poll+0x0/0x77f
 [<ffffffff80093b54>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c0e>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c6b0>] call_softirq+0x1c/0x28
 [<ffffffff8006a7eb>] do_softirq+0x2c/0x85
 [<ffffffff80068df5>] default_idle+0x0/0x50
 [<ffffffff8005c042>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80068e1e>] default_idle+0x29/0x50
 [<ffffffff800472a9>] cpu_idle+0x95/0xb8
 [<ffffffff8007409a>] start_secondary+0x45a/0x469

[root@sysiphus2 bin]# dmesg

The second dmesg call hangs, as the kernel seems to have locked up.
The message about the cyclades driver at the top comes from the successful
module insertion with modprobe cyclades.
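
For reading traces like the ones above, a small script can pull out which CPU
reported the soft lockup and which frames belong to a loadable module (the
lines of the form :module:function+offset/size). This is only a sketch: the
function name summarize_lockups and the regular expressions are assumptions
written for this log format and are not part of any existing tool.

```python
import re

# Module frames in this kernel's trace format look like:
#  [<ffffffff88451e8d>] :cyclades:cyz_poll+0x2fc/0x77f
MODULE_FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+:(\w+):(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)
LOCKUP = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")


def summarize_lockups(log_text):
    """Return one dict per soft-lockup report: CPU number plus module frames."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = LOCKUP.search(line)
        if header:
            # A new report starts; later frames are attributed to it.
            current = {"cpu": int(header.group(1)), "module_frames": []}
            reports.append(current)
            continue
        if current is None:
            continue
        frame = MODULE_FRAME.search(line)
        if frame:
            _addr, module, func, offset, size = frame.groups()
            current["module_frames"].append(
                (module, func, int(offset, 16), int(size, 16))
            )
    return reports


# Shortened excerpt of the first trace in this report.
SAMPLE = """\
BUG: soft lockup detected on CPU#1!

Call Trace:
 [<ffffffff80069632>] show_trace+0x34/0x47
 <IRQ>  [<ffffffff88451e8d>] :cyclades:cyz_poll+0x2fc/0x77f
 [<ffffffff88451b91>] :cyclades:cyz_poll+0x0/0x77f
 [<ffffffff80093b54>] run_timer_softirq+0x133/0x1b0
"""

if __name__ == "__main__":
    for report in summarize_lockups(SAMPLE):
        print(report)
```

On both traces above, the only module frames are in cyz_poll of the cyclades
module, running from the timer softirq, which is where the CPU was spending
its time when the watchdog fired.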

Version-Release number of selected component (if applicable):
kernel-2.6.18-1.2747.el5

How reproducible:
Always

Steps to Reproduce:
1. modprobe cyclades
2. cyzload -f $firmware_file
3. lean back, wait a few seconds.
  
If there's anything I can do to help debugging this, just say so. thx.

Comment 1 Aristeu Rozanski 2007-09-12 12:26:33 UTC
Hi Andreas,
just to make sure: have you tested this card with this firmware successfully on
another machine, or perhaps on another RHEL version?


Comment 2 Andreas Thienemann 2007-09-12 12:39:23 UTC
Hey Aristeu,

not yet. I'll have a new 5.1beta CSB build available soonish though. I'll try
the card on it then.

Comment 3 Aristeu Rozanski 2007-09-14 13:03:14 UTC
Andreas, I've looked into where this problem could be coming from, and my
suspicion that it is a firmware problem has grown. Are you willing to test a
kernel that is a bit more verbose when the problem happens?



Comment 4 Andreas Thienemann 2007-09-14 22:02:10 UTC
Hello,

no problem, I'll gladly try the debugging kernel. I think I'll have access to
the test-rig next week or the week after, so unfortunately the answer might
take a while. :(

Comment 5 Andreas Thienemann 2007-09-18 20:45:26 UTC
Aristeu, I got access to the test-rig today but could only install the rhel5
2.6.18-8 kernel on the box. The problem is the same though, down to the stacktrace.
I'll have access to the system from now on, so I'd be glad to try your debugging
kernel.


Comment 6 Aristeu Rozanski 2007-09-18 21:05:39 UTC
Andreas, did you try other firmware versions?


Comment 7 Andreas Thienemann 2007-09-18 21:26:59 UTC
I tried the latest firmware there is, which is from 2005.
ftp://ftp.cyclades.com/pub/cyclades/async/linux/cyc_async-700-1.tar.gz



Comment 8 Aristeu Rozanski 2007-09-25 19:37:28 UTC
Andreas, the test kernels are at:
http://people.redhat.com/arozansk/cyclades/
Try to capture the dmesg output and attach it here (using a serial console may help).


Comment 9 Andreas Thienemann 2007-09-25 22:59:23 UTC
(In reply to comment #8)

> http://people.redhat.com/arozansk/cyclades/
> try to get the dmesg output and attach here (using serial console may help)

I just tried your kernel and it's looking quite good right now. The system is
stable, firmware has been loaded into the card and the cyclades.ko module has
been loaded as well.


serial output is rather sparse right now:

Cyclades driver 2.3.2.20 2004/02/25 18:14:16
        built Sep 19 2007 13:54:39
Cyclades-8Zo/PCI #1: 0xff500000-0xff57ffff, 8 channels starting from port 0.


I'll leave the machine running overnight and do some tests tomorrow, but right
now it's looking quite good, since in the past the box locked up almost
instantly after loading the firmware.

Comment 10 Aristeu Rozanski 2007-10-02 13:56:22 UTC
Andreas, any news? Is it still running?



Comment 11 Aristeu Rozanski 2008-01-07 16:46:53 UTC
Andreas, any updates on this one?


Comment 12 Andreas Thienemann 2008-01-07 16:53:51 UTC
Sorry, missed that. Thx for the heads-up.

System seems to be running fine for some time now in production use.

Comment 13 Aristeu Rozanski 2008-01-07 17:00:46 UTC
OK, this is strange. The patch does nothing but warn when it gets too many
packets from the card. Please try it with the newest RHEL5 kernel you have
access to and tell me how it goes.
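
To illustrate the kind of change described here, the following is a user-space
sketch of a poll routine that keeps processing packets as before and only
logs a warning once a single pass has handled more than a threshold. It is not
the actual test-kernel patch, which is not shown in this report: the names
poll_once and MAX_PACKETS_PER_POLL and the threshold value are hypothetical.

```python
import logging

log = logging.getLogger("cyz_poll_sketch")

# Hypothetical threshold; the limit used by the real patch is not stated here.
MAX_PACKETS_PER_POLL = 64


def poll_once(fetch_packet, handle_packet, limit=MAX_PACKETS_PER_POLL):
    """Drain packets until fetch_packet() returns None.

    Behaviour is unchanged apart from one warning when more than `limit`
    packets arrive in a single pass. Returns the number of packets handled.
    """
    handled = 0
    warned = False
    while True:
        packet = fetch_packet()
        if packet is None:
            break
        handle_packet(packet)
        handled += 1
        if handled > limit and not warned:
            # Only report the condition; processing is not cut short.
            log.warning("poll handled more than %d packets in one pass", limit)
            warned = True
    return handled


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    source = iter(range(100))  # stand-in for packets coming from the card
    count = poll_once(lambda: next(source, None), lambda pkt: None)
    print("handled", count, "packets")
```

Because a loop like this only logs and never stops early, it would not by
itself be expected to prevent a lockup, which is why the stable result with
the test kernel is surprising.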


Comment 14 Aristeu Rozanski 2008-02-07 16:50:32 UTC
Andreas, any news?


Comment 15 Andreas Thienemann 2008-02-07 20:37:25 UTC
Sorry, forgot about that. Newest kernels are fine. It has now been running
for a month without much trouble on 2.6.18-59.el5bbHPmgmtxen, a special
xen build from brew.

I'd say, close that bug.