Bug 480315

Summary: BUG: soft lockup - CPU#0 stuck for 10s! [dmsetup:27345]
Product: Red Hat Enterprise Linux 5 Reporter: Richard Wojdak <richard.wojdak>
Component: kernelAssignee: Marcus Barrow <mbarrow>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: high    
Version: 5.3CC: agk, andrew.vasquez, andriusb, christophe.varoqui, coughlan, dwysocha, edamato, egoggin, heinzm, jbrassow, junichi.nomura, kueda, lmb, mbarrow, mbroz, prockai, qlogic-redhat-ext, richard.wojdak, rpacheco, tranlan, yufan.chen
Target Milestone: rcKeywords: OtherQA
Target Release: 5.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-14 13:55:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 460170    
Attachments:
Description Flags
Serail console output showing soft lockup - WINZIP file
none
Qlogic serial console output with extended error logging enabled none

Description Richard Wojdak 2009-01-16 13:29:37 UTC
Created attachment 329205 [details]
Serail console output showing soft lockup - WINZIP file

Description of problem:
On Thu, 15 Jan 2009, Wojdak, Richard wrote:


Test: Hazard C8 Failure: System Hang - can't input at console /Hazard Process  
      Timeouts

Problem: BUG: soft lockup - CPU#0 stuck for 10s! [dmsetup:27345]
	Pid: 27345, comm:              dmsetup
	EIP: 0060:[<c0415911>] CPU: 0
	EIP is at smp_call_function+0x99/0xc3
	 EFLAGS: 00000297    Not tainted  (2.6.18-128.el5PAE #1)
	EAX: 00000006 EBX: 00000001 ECX: 00000001 EDX: 000000fb
	ESI: 00000007 EDI: 00000000 EBP: c0415a86 DS: 007b ES: 007b
	CR0: 8005003b CR2: 0804b590 CR3: 379271e0 CR4: 000006f0
	 [<c0415a86>] do_flush_tlb_all+0x0/0x5a
	 [<c0415a86>] do_flush_tlb_all+0x0/0x5a
	 [<c0428972>] on_each_cpu+0x10/0x1f
	 [<c04159c7>] flush_tlb_all+0x13/0x15
	 [<c0466788>] __remove_vm_area+0x1f/0x30
	 [<c04667ad>] remove_vm_area+0x14/0x21
	 [<c04667f2>] __vunmap+0x38/0xbd
	 [<f88e56bf>] ctl_ioctl+0x21b/0x238 [dm_mod]
	 [<f88e4594>] dev_status+0x0/0x2b [dm_mod]
	 [<c0480e27>] do_ioctl+0x47/0x5d
	 [<c0481087>] vfs_ioctl+0x24a/0x25c
	 [<c04810e1>] sys_ioctl+0x48/0x5f
	 [<c0404f17>] syscall_call+0x7/0xb
	 =======================
	BUG: soft lockup - CPU#0 stuck for 10s! [dmsetup:27345]

Environment: HP Proliant Blade BL680G5 
             RH53rc2 X86 and x86_64 with Inbox/HPDM 4.2.0-Beta1 
             QMH2462 FC HBAs - 6 total paths
             BFS/MSA1500  
             105 LUNs under test

This issue was also seen on x86 (32 bit), but 32 bit also has out-of-memory issues. I have not seen the out-of-memory problem on 64 bit.
 

Again, this looks similar to the out-of-memory issue I just replied to, TLB flushes are taking an inordinate amount of time after I/O failures.

Going through the logs, I don't see any qla2xxx function in any of the
backtraces.  I'd suggest you ask RH take a look as to why the TLB
flushes are taking such a long time (and appear to be choking up
memory).
  

	end_request: I/O error, dev sdaa, sector 3408169^M
	end_request: I/O error, dev sdaa, sector 3408369^M
	end_request: I/O error, dev sdaa, sector 3408569^M
	end_request: I/O error, dev sdaa, sector 3408769^M


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.Run Hazard C8 
2.System will hang within 1 hour

  
Actual results:


Expected results:


Additional info:

Comment 1 Richard Wojdak 2009-01-21 20:07:30 UTC
Created attachment 329647 [details]
Qlogic serial console output with extended error logging enabled

Comment 2 Richard Wojdak 2009-02-24 15:10:28 UTC
This issue was corrected in a new QLogic driver. HP requests this new driver be incorporated into a Red Hat update. 

The debug version that resolved this issue was qla2xxx_8.02.00.06.05.03-k_HPQ1. QLogic should supply Red Hat with a driver that has the fix for this issue.


This trace looks much like the last one and has the benefit of 
> sufficient length to check for abort status (ABTS). The trace again 
> shows very heavy Write traffic to the MSA. When the MSA gets to a 
> Target Port queue depth of 128 outstanding exchanges/IO-commands it 
> begins issuing queue full responses (Expert counted 7,700 times that 
> the 128 outstanding/open exchanges condition was met at the MSA port 
> during the trace. There are 435 queue full responses issued in the two 
> and a half minutes of trace – they start 14 seconds into the trace and 
> are seen consistently throughout.
> 
> In fact there is a period starting at timestamp
> 000:01:35.611_146_265 and lasting through timestamp 
> 000:01:35.647_403_540 (~36 milliseconds) where the Initiator port is 
> sending SCSI commands every 300 microseconds at the start of the time 
> period down to every 100 microseconds at the end of the time period; 
> even though every one of the many dozens of commands receives a queue 
> full response. The back-off algorithm appears to be a cause for 
> concern. I’d recommend that QLogic be asked to take a look at the back 
> off algorithm.

It actually turns out the queue-full 'back-off' logic is not even triggering due to precedence checking of statuses within the driver.
Here's how the driver receives status for these I/Os returning back a scsi status of task-set-full (queue-full):

	scsi(4:1:4) UNDERRUN status detected 0x15-0x28. resid=0x0 fw_resid=0x5000 cdb=0x2a os_underflow=0x5000
	scsi(4:0:1:4) Dropped frame(s) detected (5000 of 5000 bytes)...retrying command.

the ISP firmware is returning back a completion-status of CS_DATA_UNDERRUN for the WRITE_10 since none of the 5000h bytes of data were successfully transferred.  The driver upon receiving the CS_DATA_UNDERRUN status will check to see if there was any mismatch in both residual-underrun status (bits 8-15 of the scsi-status) and residual-underrun count reported by the storage.  As is the case with this MSA and the TASK_SET_BUSY status, if the storage reports no residual-underrun (BIT-11 cleared), then the driver will clear
(zero-out) any lower scsi-status (bits 7-0) and treat the I/O as failed due to a mismatch in data transfer -- a 'dropped frame' message and the I/O being returned to the midlayer with a DID_BUS_BUSY status.

Regardless of 'interpretation' or 'over-interpretation of distilled statuses', ideally, the driver should be falling into it's TASK_SET_FULL handling and returning back an appropriate status to the midlayer (DID_OK logically-ORd with TASK_SET_FUL

Comment 4 Marcus Barrow 2009-02-27 15:24:34 UTC
This patch is currently undergoing further development and testing. We are trying to make sure that all conditions are properly handled. It will be posted here soon.

Comment 5 RHEL Program Management 2009-02-27 17:45:33 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Andrius Benokraitis 2009-07-14 13:55:34 UTC
I'm assuming this issue was solved via another BZ - if this isn't the case, please reopen.

Comment 9 Marcus Barrow 2009-10-08 16:46:11 UTC
The patch to fix this problem is being included as part of the rhel 5.5 updates in BZ 519447 - updates for qla2xxx.

THe relevant parts of that patch do the following:
- Check for FW valid flags
- Do not mask off lscsi status, instead pass it to midlayer to handle it

*** This bug has been marked as a duplicate of bug 519447 ***