
Bug 643080

Summary: tasks blocked after putting Nehalem CPU offline
Product: Red Hat Enterprise Linux 5
Reporter: Jan Tluka <jtluka>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: low
Version: 5.6
CC: jburke, sassmann
Target Milestone: beta
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-01-13 21:56:47 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 640580
Attachments:
RHEL5 fix for this issue (Flags: none)
RHEL5 fix for this issue (Flags: none)

Description Jan Tluka 2010-10-14 15:26:44 UTC
Description of problem:
When putting a Nehalem CPU offline, the echo command hangs, and after 120 seconds the following message appears in the system log (similar messages follow after every further 120 seconds).

INFO: task scsi_eh_1:1668 blocked for more than 120 seconds. 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
scsi_eh_1     D 0000003A  2704  1668    131          1669  1667 (L-TLB) 
       f7c6bf2c 00000046 01acf99b 0000003a f88f0666 f63c5328 f63c5328 0000000a  
       f6c62aa0 01acfdaf 0000003a 00000414 00000008 f6c62bac c3049ba0 f5063c80  
       f6c62aa0 00000000 f7c241b0 c3046aa0 f5063c80 f7c24000 f5063c80 00000042  
Call Trace: 
 [<f88f0666>] scsi_next_command+0x25/0x2f [scsi_mod] 
 [<c061fb1d>] __mutex_lock_slowpath+0x4d/0x7c 
 [<f88ef7f8>] scsi_error_handler+0x0/0x422 [scsi_mod] 
 [<c061fb5b>] .text.lock.mutex+0xf/0x14 
 [<c0433f5c>] flush_workqueue+0x2f/0x66 
 [<f8961ee9>] ata_port_flush_task+0xd/0x56 [libata] 
 [<f896d099>] ata_scsi_error+0x17/0x56a [libata] 
 [<f88ef7f8>] scsi_error_handler+0x0/0x422 [scsi_mod] 
 [<f88ef7f8>] scsi_error_handler+0x0/0x422 [scsi_mod] 
 [<f88ef89e>] scsi_error_handler+0xa6/0x422 [scsi_mod] 
 [<c041ebb9>] complete+0x2b/0x3d 
 [<f88ef7f8>] scsi_error_handler+0x0/0x422 [scsi_mod] 
 [<c0436abb>] kthread+0xc0/0xed 
 [<c04369fb>] kthread+0x0/0xed 
 [<c0405c87>] kernel_thread_helper+0x7/0x10 
 ======================= 

This is the log from an i386 installation, but x86_64 is affected as well.
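
When several of these reports accumulate, it can help to list only the task names and PIDs. The following is a minimal sketch, assuming the log lines keep the "INFO: task NAME:PID blocked for more than N seconds." wording shown above; blocked_tasks is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: print "NAME PID" for every hung-task report read from stdin.
# blocked_tasks is a hypothetical helper name introduced for illustration.
# Assumes the message format seen in the log above; any prefix that a
# syslog daemon puts in front of "INFO:" is skipped by the leading ".*".
blocked_tasks() {
    sed -n 's/^.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than [0-9][0-9]* seconds.*$/\1 \2/p'
}
```

It could be fed from the kernel ring buffer, for example with "dmesg | blocked_tasks", or from a saved copy of the system log.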

Version-Release number of selected component (if applicable):
kernel-2.6.18-225.el5
RHEL5.6-Server-20100930.0

How reproducible:
always

Steps to Reproduce:
1. echo 0 > /sys/devices/system/cpu/cpu1/online
2. watch the system log for the next 2 minutes
3. kill -9 `pgrep echo`
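
The steps above can be wrapped so that the invoking shell does not hang together with the write. This is a minimal sketch, not the reporter's test procedure: try_offline is a hypothetical helper name, and the 130-second default is an assumption chosen to land just past the 120-second hung-task threshold.

```shell
#!/bin/sh
# Sketch: write 0 to a CPU "online" control file in the background and
# report whether the write finished within a time limit.
# try_offline is a hypothetical helper name introduced for illustration.
try_offline() {
    online_file=$1      # e.g. /sys/devices/system/cpu/cpu1/online
    limit=${2:-130}     # seconds to wait (assumed default, see lead-in)

    # Do the write in a background subshell so the caller keeps running
    # even if the write blocks as described in this report.
    ( echo 0 > "$online_file" ) &
    writer=$!

    waited=0
    while kill -0 "$writer" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "write to $online_file still blocked after ${limit}s (pid $writer)"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # The writer has exited; collect its exit status.
    if wait "$writer"; then
        echo "write to $online_file completed"
    else
        echo "write to $online_file failed"
        return 2
    fi
}
```

Run as root, "try_offline /sys/devices/system/cpu/cpu1/online" would correspond to step 1. On an affected kernel the helper only reports the stuck writer; as noted under the actual results, that process cannot be killed.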
  
Actual results:
Blocked-task messages appear in the log, and I am not able to kill the echo command.


Expected results:
The echo command returns and the CPU is offlined, with no messages about blocked tasks in the system log.

Additional info:

Comment 2 Prarit Bhargava 2010-10-14 19:13:38 UTC
Weird ... all other cpus except 1 appear to be onlining/offlining properly.

P.

Comment 4 Prarit Bhargava 2010-10-15 15:14:13 UTC
Created attachment 453743 [details]
RHEL5 fix for this issue

Comment 6 RHEL Program Management 2010-10-15 15:29:58 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Prarit Bhargava 2010-10-18 12:49:17 UTC
Created attachment 454102 [details]
RHEL5 fix for this issue

Comment 9 Stefan Assmann 2010-10-19 13:36:05 UTC
*** Bug 643869 has been marked as a duplicate of this bug. ***

Comment 10 Jarod Wilson 2010-10-19 18:53:54 UTC
in kernel-2.6.18-228.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 14 errata-xmlrpc 2011-01-13 21:56:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html