Bug 432601 - ia32el: 32-bit application (had) causes system freeze on ia32el-1.6-14.EL4
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: redhat-release
Version: 4.7
Hardware: ia64 Linux
Priority: high Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Petr Machata
Keywords: OtherQA, Regression, Reopened
Depends On:
Blocks: 245608 RHEL4u7_relnotes 444823
Reported: 2008-02-13 04:29 EST by Eric Lin
Modified: 2015-05-04 21:33 EDT

Doc Type: Bug Fix
Last Closed: 2008-07-01 08:48:26 EDT


Description Eric Lin 2008-02-13 04:29:53 EST
A 32-bit application (the High Availability daemon (had) of Veritas Cluster
Server) runs fine with an earlier version of the ia32el package [ia32el-1.2.4].
However, the version of ia32el shipped with RHEL 4 U4/U5 [ia32el-1.6-14.EL4]
causes a system hang when the application is run.

The problem is only seen when the number of CPUs is 1. Also, when the
application is run under strace or gdb, it starts without any problems.
Comment 1 Eric Lin 2008-02-13 21:32:27 EST
Root cause:

Some locks in IA-32 EL are implemented with an atomic cmpxchg and
sched_yield(). HAD runs as a real-time thread and spins on an internal lock
used by IA-32 EL (because IA-32 EL executes code on behalf of HAD), while the
lock is held by another, low-priority thread (the so-called Translation Thread
created by IA-32 EL). With a single CPU, the Translation Thread cannot run
while the real-time thread is spinning, so it never releases the lock; the
real-time thread spins endlessly and the system appears to freeze.
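
To illustrate the pattern described above, here is a minimal sketch in C of a
lock built from an atomic compare-and-swap plus sched_yield(). The names
(yield_lock, yield_lock_acquire, yield_lock_release) are hypothetical, and this
is not the IA-32 EL source; it only shows, under those assumptions, why such a
lock can spin forever when the waiter has real-time priority and the holder
does not.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} yield_lock;

void yield_lock_acquire(yield_lock *l)
{
    int expected = 0;

    /* Atomically swap 0 -> 1 (a compare-and-swap, typically a
     * cmpxchg instruction on x86). */
    while (!atomic_compare_exchange_weak(&l->held, &expected, 1)) {
        /* A failed exchange stores the current value in 'expected',
         * so reset it before retrying. */
        expected = 0;

        /* Give up the CPU. For a SCHED_FIFO or SCHED_RR caller this
         * only moves it to the end of the run queue for its own
         * priority, so a lower-priority lock holder still does not
         * get to run on a single CPU and the loop never ends. */
        sched_yield();
    }
}

void yield_lock_release(yield_lock *l)
{
    atomic_store(&l->held, 0);
}
```

With ordinary (non-real-time) threads the sched_yield() call lets the holder
make progress, which is consistent with the hang only showing up for
real-time threads.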

For this specific application (HAD), the Translation Thread is the feature
that exposes the issue: because it turns a single-threaded program into a
multi-threaded one, the spin lock used internally by IA32EL becomes a problem.
But for genuinely multi-threaded applications, this kind of lock can be a
problem even without a Translation Thread in IA32EL, so we plan to provide a
definitive fix for this problem in the version of IA32EL currently under
development.

We have disabled the Translation Thread in the IA32EL shipped with RHEL5.1. So
as a temporary workaround, we recommend that customers use the IA-32 EL from
RHEL5.1.
Comment 2 Ronald Pacheco 2008-02-15 12:58:42 EST
Eric,

The way I read this is that one can work around this problem by using the ia32el
package that is shipped with RHEL5.  Assuming so, then perhaps the easiest way
to resolve this bug is to document this in a knowledge base article.  I am also
bearing in mind that Intel is shipping dual cores across the entire product
line, so the case of cpus=1 is rather rare.

Please confirm.
Comment 3 Eric Lin 2008-02-18 00:22:18 EST
Ronald, 

One correction: the workaround should be to use the ia32el package from RHEL 5
U1. And yes, we'd like to document it in a knowledge base article. Is there a
process for that?
Comment 7 RHEL Product and Program Management 2008-04-15 16:49:35 EDT
Product Management has reviewed and declined this request.  You may appeal this
decision by reopening this request. 
Comment 10 Eric Lin 2008-05-15 21:54:12 EDT
The issue is waiting to be verified by customers
Comment 11 Don Domingo 2008-06-02 19:15:26 EDT
Hi,

the RHEL4.7 release notes deadline is on June 17, 2008 (Tuesday). they will
undergo a final proofread before being dropped to translation, at which point no
further additions or revisions will be entertained.

a mockup of the RHEL4.7 release notes can be viewed here:
http://intranet.corp.redhat.com/ic/intranet/RHEL4u7relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don
Comment 13 Ronald Pacheco 2008-06-03 10:56:50 EDT
Eric/Keve,

Can you provide us with an update?
Comment 14 Eric Lin 2008-06-04 05:31:02 EDT
We already have a workaround: use the IA-32 EL shipped in RHEL5 instead of the
one in RHEL 4.7.
Gary Case (gcase@redhat.com) is currently verifying the workaround with the
customers.
We could close this bug now.
Comment 15 Andrius Benokraitis 2008-06-24 23:03:25 EDT
Don (and others), do you think we need to document the workaround in Comment #14
in the release notes before closing this out?
Comment 16 Eric Lin 2008-06-25 00:26:33 EDT
Yes, please. Please note that users need this workaround only if threads of
their application use real-time priority.
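
For readers who want to check whether they are in the affected configuration,
the conditions reported in this bug (a real-time scheduling policy and a single
CPU) can be tested with a few standard calls. This is only a rough sketch: the
function names are hypothetical, it inspects only the calling thread, and a
positive result means the workaround may be relevant, not that a hang will
occur.

```c
#include <sched.h>
#include <unistd.h>

/* Returns 1 if the caller is scheduled under a real-time policy
 * (SCHED_FIFO or SCHED_RR), 0 otherwise (including on error, when
 * sched_getscheduler() returns -1). */
int uses_realtime_policy(void)
{
    int policy = sched_getscheduler(0);   /* 0 = the caller itself */
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Returns 1 if exactly one CPU is currently online. */
int single_cpu_online(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN) == 1;
}

/* Hypothetical helper combining the two conditions from this report. */
int may_need_workaround(void)
{
    return uses_realtime_policy() && single_cpu_online();
}
```

An application with several threads would need to run the policy check in each
thread, since threads can have different scheduling policies.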
Comment 17 Tim Burke 2008-06-25 16:33:18 EDT
Eric,

Can you please make a specific suggestion of how we should word the release note?
Comment 18 Eric Lin 2008-06-25 20:34:29 EDT
How about the following?

An x86 application with one or more SCHED_PR threads may hang due to a bug in
the IA-32 EL V6 shipped with this OS release. The workaround is to use the
IA-32 EL V6 Update 1 shipped with RHEL 5.
Comment 22 Chris Ward 2008-07-29 03:26:51 EDT
Partners, I would like to thank you all for your participation in assuring the
quality of this RHEL 4.7 Update Release. My hat's off to you all. Thanks.
Comment 23 Eric Lin 2008-12-14 20:13:59 EST
Intel will fix this regression in the latest IA-32 EL release,
targeting RHEL 5.4.
