Bug 702674

Summary: powerpc: Only sleep in rtas_busy_delay if we have useful work to do
Product: Red Hat Enterprise Linux 6 Reporter: IBM Bug Proxy <bugproxy>
Component: kernel Assignee: Steve Best <sbest>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1 CC: arozansk, balkov, dhowells, eguan, jkachuck, kzhang, pbenas, peterm, syeghiay
Target Milestone: rc Keywords: OtherQA
Target Release: ---   
Hardware: ppc64   
OS: All   
Whiteboard:
Fixed In Version: kernel-2.6.32-153.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-06 13:24:35 UTC Type: ---
Bug Depends On:    
Bug Blocks: 684953    
Attachments:
Description Flags
Backport of eca590f402332ab873d13f2d8d00fa0b91cfff36 none

Description IBM Bug Proxy 2011-05-06 14:40:16 UTC
---Problem Description---
With sufficient memory and the dynamic DMA window enabled, RHEL6.1 kernels will produce spurious messages like:

INFO: task modprobe:3333 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
modprobe      D 0000008060df9708     0  3333   2795 0x00008000
Call Trace:
[c0000016f2217670] [c0000016f2217720] 0xc0000016f2217720 (unreliable)
[c0000016f2217840] [c000000000014288] .__switch_to+0xf8/0x1d0
[c0000016f22178d0] [c00000000059cc68] .schedule+0x408/0xd30
[c0000016f2217bb0] [c0000000003aefd4] .wait_for_device_probe+0x64/0xc0
[c0000016f2217c70] [d00000001d1e0014] .wait_scan_init+0x10/0xc4 [scsi_wait_scan]
[c0000016f2217ce0] [c00000000000976c] .do_one_initcall+0x5c/0x200
[c0000016f2217d90] [c0000000000dd5dc] .SyS_init_module+0x14c/0x2c0
[c0000016f2217e30] [c000000000008564] syscall_exit+0x0/0x40

These messages appear while firmware is setting up the TCEs. This is fixed upstream by commit eca590f402332ab873d13f2d8d00fa0b91cfff36 in benh's tree, which makes the RTAS delay loop sleep only if something else could be done. The change reduces the time required for dynamic DMA window (DDW) initialization from the order of minutes to the order of seconds.
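
The idea behind the fix can be sketched in user-space C. This is a simplified model, not the upstream patch: every `sketch_` name is hypothetical, the status values (-2 for busy, 9900 to 9905 for extended delays) are assumed from the RTAS convention and should be checked against the kernel's asm/rtas.h, and a counter stands in for the kernel's sleep call. The `other_work_pending` flag is an assumption standing in for whatever test the kernel uses to decide that another task could run.

```c
#include <stdbool.h>

/* Assumed RTAS status conventions; verify against the kernel's asm/rtas.h. */
#define SKETCH_RTAS_BUSY      (-2)
#define SKETCH_EXT_DELAY_MIN  9900
#define SKETCH_EXT_DELAY_MAX  9905

/* Total time this model would have slept, in ms (stand-in for a real sleep). */
unsigned int sketch_slept_ms;

/* Map a firmware status to the delay it suggests, in ms; 0 means "not busy". */
unsigned int sketch_busy_delay_time(int status)
{
    unsigned int ms = 0;

    if (status == SKETCH_RTAS_BUSY) {
        ms = 1;
    } else if (status >= SKETCH_EXT_DELAY_MIN &&
               status <= SKETCH_EXT_DELAY_MAX) {
        /* Extended delay: 10^(status - 9900) ms. */
        int order = status - SKETCH_EXT_DELAY_MIN;

        for (ms = 1; order > 0; order--)
            ms *= 10;
    }
    return ms;
}

/*
 * Return nonzero if the caller should retry the firmware call.
 * Sleep only when something else could use the CPU; otherwise
 * return at once so the caller retries immediately.
 */
unsigned int sketch_busy_delay(int status, bool other_work_pending)
{
    unsigned int ms = sketch_busy_delay_time(status);

    if (ms && other_work_pending)
        sketch_slept_ms += ms;  /* hypothetical stand-in for sleeping ms */
    return ms;
}
```

A caller would typically repeat the firmware call while `sketch_busy_delay()` returns nonzero. When TCE setup issues many such calls and each one previously slept for the full suggested delay, the sleeps add up; skipping the sleep when nothing else is runnable removes most of that time.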
  
---uname output---
Linux ahi02.upt.austin.ibm.com 2.6.32-131.0.10.el6.ppc64 #1 SMP Wed Apr 27 15:28:11 EDT 2011 ppc64 ppc64 ppc64 GNU/Linux
 
Machine Type = CHRP IBM,7891-74X (P7 Falcon Double Wide Blade) 

---Steps to Reproduce---
Boot the current 6.1 kernel with a 64-bit adapter under P7IOC and suitable firmware on a Falcon platform. Dynamic DMA windows will be initialized for those adapters and a spurious timeout message will occur.

Comment 1 IBM Bug Proxy 2011-05-06 14:40:21 UTC
Created attachment 497373
Backport of eca590f402332ab873d13f2d8d00fa0b91cfff36

Comment 3 RHEL Program Management 2011-05-13 18:49:36 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 4 Steve Best 2011-05-18 13:39:55 UTC
Posted to the rh-kernel mailing list:
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-May/msg00542.html

Comment 7 Aristeu Rozanski 2011-08-11 19:48:17 UTC
Patch(es) available on kernel-2.6.32-153.el6

Comment 10 IBM Bug Proxy 2011-10-04 18:41:22 UTC
------- Comment From nacc.com 2011-10-04 14:27 EDT-------
Hello Red Hat,

We would request this fix be backported to 6.1.z, per the following customer impact:

There is a large IBM customer who is seeing this bug. He has tried out the
patch that Anton submitted and confirms that it fixes his problem.

This customer is interested in learning if this will make it to RHEL 6.1
z-stream. One (z-stream) is due out tomorrow. However, I suspect we would
have known by now if this was going into that one. If not, can we push for the
next one expected in about 6 weeks?

BTW, the slow boot problem was recreated with DDR adapters too (with a ps704)
at the customer site.

Thanks,
Nish

Comment 12 errata-xmlrpc 2011-12-06 13:24:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html