Bug 659480

Summary: UV: WAR for interrupt-IOPort deadlock
Product: Red Hat Enterprise Linux 6 Reporter: George Beshers <gbeshers>
Component: kernel Assignee: George Beshers <gbeshers>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0 CC: dhoward, dwa, gbeshers, loriann, snagar, tee
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.32-92.el6 Doc Type: Bug Fix
Doc Text:
Prior to this update, running the hwclock --systohc command could halt a running system. This was due to interrupt transactions being erroneously looped back from a local IOH (Input/Output Hub), through the IOH, to a local CPU, which caused a conflict with I/O port operations and other transactions. With this update, the conflicts are avoided and the system continues to run after executing the hwclock --systohc command.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 12:04:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 661953, 662543, 662921    
Attachments:
Description Flags
Tested patch none

Description George Beshers 2010-12-02 21:24:51 UTC
Created attachment 464378
Tested patch

Description of problem:
  Problem originally noticed when
  'hwclock --systohc' halted a running system.



Version-Release number of selected component (if applicable):
  kernel-2.6.32-71


How reproducible:
  Easily, but not 100%.


Steps to Reproduce:
1. boot system on UV hardware
2. hwclock --systohc    ## usually sufficient
3. hwclock

On one system, the following was also needed before running hwclock:
	echo 4 >/proc/irq/8/smp_affinity
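
Note on the affinity value: /proc/irq/N/smp_affinity takes a hexadecimal CPU bitmask, and IRQ 8 is conventionally the RTC interrupt on x86, so writing 4 steers that interrupt to CPU 2. The sketch below decodes such a mask. mask_to_cpus is a hypothetical helper name, not an existing tool, and it assumes a single hex word without the comma-separated groups that large systems print.

```shell
# Decode a hexadecimal smp_affinity mask into the CPU numbers it selects.
# mask_to_cpus is a hypothetical helper used for illustration only.
mask_to_cpus() {
    mask=$((0x$1))      # interpret the argument as hex, e.g. "4" -> 4
    cpu=0
    cpus=""
    while [ "$mask" -ne 0 ]; do
        # Bit N set in the mask means CPU N may service the interrupt.
        if [ $((mask & 1)) -eq 1 ]; then
            cpus="${cpus:+$cpus }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$cpus"
}

mask_to_cpus 4    # the value written to /proc/irq/8/smp_affinity
```

For the value 4 this prints 2, meaning only CPU 2 is selected.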



  
Actual results:

  System halts with CATERR.


Expected results:
  System continues to run.


Additional info:
  The attached patch has been tested inside SGI on multiple systems.

PV# 1012363

Comment 3 George Beshers 2010-12-03 21:19:15 UTC
Went ahead and posted as this was seen at a customer site.

Comment 4 George Beshers 2010-12-13 18:18:46 UTC
Reposted per Don's and Aristeu's request.

George

Comment 7 Aristeu Rozanski 2010-12-16 20:54:21 UTC
Patch(es) available on kernel-2.6.32-92.el6

Comment 9 George Beshers 2010-12-22 17:43:48 UTC
This has been tested on both UV100 and a very large UV1000
system inside SGI; the problem is solved :).

George

Comment 11 Martin Prpič 2011-02-23 15:12:02 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, running the hwclock --systohc command could halt a running system. This was due to interrupt transactions being erroneously looped back from a local IOH (Input/Output Hub), through the IOH, to a local CPU, which caused a conflict with I/O port operations and other transactions. With this update, the conflicts are avoided and the system continues to run after executing the hwclock --systohc command.

Comment 12 errata-xmlrpc 2011-05-19 12:04:24 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html