Bug 432601 - ia32el: 32-bit application (had) causes system freeze on ia32el-1.6-14.EL4
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: redhat-release
Version: 4.7
Hardware: ia64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Petr Machata
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 245608 RHEL4u7_relnotes 444823
Reported: 2008-02-13 09:29 UTC by Eric Lin
Modified: 2015-05-05 01:33 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-01 12:48:26 UTC
Target Upstream Version:
Embargoed:


Description Eric Lin 2008-02-13 09:29:53 UTC
A 32-bit application (the High Availability Daemon (had) of Veritas Cluster 
Server) runs fine with an earlier version of the ia32el package [ia32el-1.2.4]. 
However, the version of ia32el shipped with RHEL 4 U4/U5 [ia32el-1.6-14.EL4] 
causes a system hang when the application is run. 

The problem is seen only when the number of CPUs is 1. Also, when the 
application is run under strace or gdb, it starts without any problems.

Comment 1 Eric Lin 2008-02-14 02:32:27 UTC
Root cause:

Some locks in IA-32 EL are implemented with an atomic cmpxchg and sched_yield(). 
HAD runs as a real-time thread and spins on an internal lock used by IA-32 EL 
(because IA-32 EL executes code on behalf of HAD), while the lock is held by 
another, low-priority thread (the so-called Translation Thread created by 
IA-32 EL). As long as the Translation Thread does not release the lock, the 
real-time thread keeps spinning endlessly and the system appears to freeze. 
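
For illustration, the locking pattern described above (an atomic compare-and-exchange retried with sched_yield() between attempts) can be sketched in C as follows. This is a minimal sketch, not IA-32 EL's actual code: the yield_lock type and the yield_lock_* function names are hypothetical, and C11 atomics stand in for the cmpxchg instruction. Under a real-time policy such as SCHED_FIFO, sched_yield() does not hand the CPU to lower-priority threads, so on a single CPU a real-time caller of yield_lock_acquire() would never let a lower-priority lock holder run and release the lock.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not IA-32 EL's data structure. */
typedef struct {
    atomic_int held;   /* 0 = free, 1 = held */
} yield_lock;

/* Make one attempt to take the lock with an atomic compare-and-exchange. */
static bool yield_lock_try(yield_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/*
 * Retry until the lock is taken, yielding the CPU between attempts.
 * If the caller is a real-time thread and the holder has lower priority,
 * sched_yield() returns to the caller immediately on a single CPU,
 * so this loop can spin forever.
 */
static void yield_lock_acquire(yield_lock *l)
{
    while (!yield_lock_try(l))
        sched_yield();
}

/* Release the lock so that a waiting thread's next attempt can succeed. */
static void yield_lock_release(yield_lock *l)
{
    atomic_store(&l->held, 0);
}
```

With two or more CPUs the lock holder can run on another processor and release the lock, which is consistent with the problem being seen only when the number of CPUs is 1.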

For this specific application (HAD), the Translation Thread is the feature that 
exposes the issue: because it turns a single-threaded program into a 
multi-threaded one, the spin lock used internally by IA32EL becomes a problem. 
But for genuinely multi-threaded applications, this kind of lock can be a 
problem even without a Translation Thread in IA32EL, so we plan to provide a 
definitive fix for this problem in the version of IA32EL currently in development.

We have disabled the Translation Thread in the IA32EL shipped with RHEL 5.1. So, as a 
temporary workaround, we recommend that customers use the IA-32 EL from RHEL 5.1. 


Comment 2 Ronald Pacheco 2008-02-15 17:58:42 UTC
Eric,

The way I read this is that one can work around this problem by using the ia32el
package that is shipped with RHEL5.  Assuming so, then perhaps the easiest way
to resolve this bug is to document this in a knowledge base article.  I am also
bearing in mind that Intel is shipping dual cores across the entire product
line, so the case of cpus=1 is rather rare.

Please confirm.

Comment 3 Eric Lin 2008-02-18 05:22:18 UTC
Ronald, 

One correction: the workaround should be to use the ia32el package from RHEL 5 
U1. And yes, we'd like to document it in a knowledge base article. Is there a 
process for that? 

Comment 7 RHEL Program Management 2008-04-15 20:49:35 UTC
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 

Comment 10 Eric Lin 2008-05-16 01:54:12 UTC
The issue is waiting to be verified by customers.

Comment 11 Don Domingo 2008-06-02 23:15:26 UTC
Hi,

the RHEL4.7 release notes deadline is on June 17, 2008 (Tuesday). they will
undergo a final proofread before being dropped to translation, at which point no
further additions or revisions will be entertained.

a mockup of the RHEL4.7 release notes can be viewed here:
http://intranet.corp.redhat.com/ic/intranet/RHEL4u7relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 13 Ronald Pacheco 2008-06-03 14:56:50 UTC
Eric/Keve,

Can you provide us with an update?

Comment 14 Eric Lin 2008-06-04 09:31:02 UTC
We already have the workaround: use the ia32el shipped with RHEL 5 instead of 
the one in RHEL 4.7.
Gary Case (gcase) is currently verifying the workaround with the 
customers.
We can close this bug now.


Comment 15 Andrius Benokraitis 2008-06-25 03:03:25 UTC
Don (and others), do you think we need to document the workaround in Comment #14
in the release notes before closing this out?

Comment 16 Eric Lin 2008-06-25 04:26:33 UTC
Yes, please. Please note that users need this workaround only if threads of 
their application use real-time priority.
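
One way to check whether the condition above applies is to query the scheduling policy with sched_getscheduler(). The sketch below is only an illustration under stated assumptions: it treats SCHED_FIFO and SCHED_RR (the standard POSIX real-time policies) as "real-time priority", and the helper names policy_is_realtime() and process_is_realtime() are hypothetical. On Linux the call reports the policy of the single thread identified by the ID passed in, so each thread of a multi-threaded application would have to be checked separately.

```c
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>

/* True if the policy value names a POSIX real-time scheduling class. */
static bool policy_is_realtime(int policy)
{
#ifdef SCHED_RESET_ON_FORK
    /* Newer Linux kernels may OR this flag into the returned value. */
    policy &= ~SCHED_RESET_ON_FORK;
#endif
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/*
 * Returns 1 if the given ID runs under a real-time policy, 0 if it does
 * not, and -1 if the policy could not be read (errno is set by the call).
 * An ID of 0 means the calling thread.
 */
static int process_is_realtime(pid_t pid)
{
    int policy = sched_getscheduler(pid);
    if (policy < 0)
        return -1;
    return policy_is_realtime(policy) ? 1 : 0;
}
```

A result of 1 for any thread of the application suggests the workaround is relevant; a result of 0 for all of them suggests it is not.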

Comment 17 Tim Burke 2008-06-25 20:33:18 UTC
Eric,

Can you please make a specific suggestion of how we should word the release note?


Comment 18 Eric Lin 2008-06-26 00:34:29 UTC
How about the following?

An x86 application with one or more SCHED_PR threads may hang due to a 
bug in the IA-32 EL V6 shipped with this OS release. The workaround is to use 
IA-32 EL V6 Update 1, shipped with RHEL 5.

Comment 22 Chris Ward 2008-07-29 07:26:51 UTC
Partners, I would like to thank you all for your participation in assuring the
quality of this RHEL 4.7 Update Release. My hat's off to you all. Thanks.

Comment 23 Eric Lin 2008-12-15 01:13:59 UTC
Intel will fix this regression in the latest IA-32 EL release,
targeting RHEL5.4.

