Bug 104520

Summary:

SMP Kernel hang on shutdown with Intel SRCZCR Raid Controller

Product:

Red Hat Enterprise Linux 3

Reporter:

Jason Sauve <jasonsauve77>

Component:

kernel

Assignee:

Doug Ledford <dledford>

Status:

CLOSED ERRATA

QA Contact:

Severity:

high

Docs Contact:

Priority:

high

Version:

3.0

CC:

alietss, andreas.aretz, bbrock, bruce.grove, cgadd, chun.ming.li, coughlan, danielk, dledford, jneedle, keldon, ltroan, petrides, rf, rknepper, tao, t.koenig, wusel+rhbug

Hardware:

i686

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2004-12-03 01:41:42 UTC

Bug Blocks:

106472

Attachments:

Description	Flags
Output of kernel sysrq	none
Kernel boot log with nmi_watchdog=1	none
kernel nmi_watchdog=1 log	none
oops from kernel-2.4.21-1.1931.2.399.entsmp	none
Proposed fix	none
Compressed cpio archive of all the i686 modules needed to solve the problem	none
Modules for the 2.4.21-4.0.2.EL kernel	none

Description Jason Sauve 2003-09-16 17:36:03 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Description of problem:
When issuing a "shutdown" or "reboot", the shutdown process hangs at the 
following line:

GDT: Flushing all host drives .. Starting timer : 0 0

If you press CTRL+ALT+DEL the error message repeats itself and still hangs 
there.

This problem could likely cause data loss due to improper data flushing of the 
SRCZCR RAID controller and the GDTH kernel driver.

This problem does not occur with the corresponding version of the non-SMP kernel.




Version-Release number of selected component (if applicable):
kernel-2.4.21-1.1931.2.423entsmp

How reproducible:
Always

Steps to Reproduce:
1. Boot with SMP kernel for AS Beta 1 or 2
2. Issue a shutdown or reboot command

    

Actual Results:  Kernel hangs on shutdown, will not reboot without intervention

Expected Results:  Normal shutdown or reboot

Additional info:

With RHEL AS 3.0 Beta 1 kernel core dumps (problem is even worse). 

My understanding is that this problem is reproducible on other similar newer 
model Intel RAID controllers as well.

Comment 1 Jason Sauve 2003-09-18 13:25:49 UTC

In addition to this, when you execute "cat /proc/scsi/gdth/0" it also hangs.

Comment 2 Arjan van de Ven 2003-09-18 13:27:42 UTC

Have you tried this on the kernel available via RHN?

Comment 3 Jason Sauve 2003-09-18 13:41:26 UTC

The kernel I downloaded (2.4.21-1.1931.2.423.entsmp) came from up2date (RHN). 
I don't know whether it matters or not, but the system is running dual Xeon 
2.4GHz CPUs with hyperthreading. Just out of curiosity I disabled 
hyperthreading in the BIOS (the motherboard is an Intel 7501WV2) and the problem 
still occurs.

Comment 4 Jason Sauve 2003-09-19 17:55:46 UTC

Just installed kernel-smp-2.4.21-2.EL from RHN. Problem is still unresolved. 
Unable to issue shutdown/halt/reboot successfully

Comment 5 Jason Sauve 2003-09-19 18:45:44 UTC

I found more info. The problem I'm encountering with RHEL 3.0 AS beta1 and beta2 
is the same problem that was found with Red Hat 7.2/7.3/8.0, as is well 
documented on Intel's website (see "Red Hat Linux* 8.0 segmentation fault with 
an Intel® RAID controller installed" within the document 
ftp://download.intel.com/support/motherboards/server/srczcr/tested_hwos.pdf).

Excerpt:
-------------------
When using the normal installation of Red Hat Linux* 8.0 with the 2.4.18-14
kernel and an Intel RAID controller installed, the following issue is seen:
4. A shutdown command results in a segmentation fault.
5. It is not possible to use some tools such as storcon.
6. Accessing the proc file system (via cat /proc/scsi/gdth/#, where "#" stands
for the controller number) also results in a segmentation fault.

This issue occurs only when using Red Hat kernel version 2.4.18-14 installed
with SMP support, and it is not server board or RAID controller specific.
-------------------

Using Taroon's Beta 1 kernel, you get segmentation faults. With the Beta 2 
kernel you get a hang with it saying "Starting timer : 0 0".

These problems are only reproducible on the SMP version of Taroon's kernel. 
By default it uses ext3 (with journaling), and it draws similarities to 
RHBA-2002-292. The Intel SE7501WV2 board has dual Intel 1GB NICs.

Hope this helps shed some light on the problem.

Comment 6 Jason Sauve 2003-09-24 17:51:58 UTC

Ok guys, I've made progress on this bug. I just downloaded kernel source 
2.4.22 from www.kernel.org and compiled it using the Red Hat config file 
kernel-2.4.21-i686-smp.config.

After compiling successfully with SMP, booting the system, and issuing a 
shutdown or reboot command all is well. The system shuts down as expected, and 
as well, performing 'cat /proc/scsi/gdth/0' works (output is attached).

It would seem that the gdth driver in the 2.4.21 kernel is flawed. Maybe it has 
something to do with the "gdth register failure path" entry in the Changelog 
section "Summary of changes from v2.4.22-pre2 to v2.4.22-pre3".

It would seem that booting with the 2.4.22 kernel breaks XFree86 and 
hyperthreading amongst who knows what else. But I'd bet that AS 3.0 Beta 
wasn't made to work with that version kernel anyhow.

I would really appreciate it if someone could give an answer as to "if and 
when" a fix will be made accessible in RPM format from RHN. I don't want an 
unresolved bug to deter my purchasing a 5 year support license for AS 3.0 when 
it is released. We're very eager to install the final *working* product on our 
servers and are anxiously awaiting its release.

Comment 7 Doug Ledford 2003-10-02 12:00:59 UTC

I don't have hardware here to reproduce.  Can you get me the output of
alt-scroll_lock, shift-scroll_lock, and ctrl-scroll_lock after the Starting
timer: 1 1 message is displayed?

Reassigning to me since there is a good chance this might be iorl patch related.

Comment 10 Jason Sauve 2003-10-02 19:33:28 UTC

Created attachment 94894 [details]
Output of kernel sysrq

Comment 11 Arjan van de Ven 2003-10-06 07:01:25 UTC

*** Bug 106328 has been marked as a duplicate of this bug. ***

Comment 13 Jason Sauve 2003-10-20 14:20:09 UTC

How can I obtain taroon-rc to test whether or not this may have already been 
fixed? I've been dead in the water with no word for a couple weeks now on this 
one.

Comment 14 Larry Troan 2003-10-26 21:40:12 UTC

FROM ISSUE TRACKER
Event posted 10-20-2003 12:11pm by chunli with duration of 0.00
Tested with RHEL3 RC3 kernel 2.4.21-4 and got the same failure.  The OS hangs when
trying to access the RAID controller.

Comment 16 Doug Ledford 2003-10-29 18:33:43 UTC

Can you boot this machine with the option nmi_watchdog=1 on the kernel command
line and then attempt to reboot?  When it locks up it will eventually print out an
oops report that should give a traceback to see where we are spinning and
waiting (it sounds like somewhere we are trying to take a double lock, but I'm
not totally convinced of that; this should let me know).
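
For anyone following along, one way to confirm that the option actually took effect after the reboot is to inspect the running kernel's command line. The helper below is only a sketch: the function name has_nmi_watchdog is made up for illustration, and it assumes the command line is handed in as a single string (for example the contents of /proc/cmdline).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line string enables
# the NMI watchdog (any nmi_watchdog= value other than 0).
has_nmi_watchdog() {
    for arg in $1; do
        case "$arg" in
            nmi_watchdog=0) ;;            # explicitly disabled, keep looking
            nmi_watchdog=*) return 0 ;;   # e.g. nmi_watchdog=1 or nmi_watchdog=2
        esac
    done
    return 1
}
```

On a live system this would be called as has_nmi_watchdog "$(cat /proc/cmdline)".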

Comment 17 Jason Sauve 2003-10-30 14:45:08 UTC

Created attachment 95602 [details]
Kernel boot log with nmi_watchdog=1

Comment 18 Jason Sauve 2003-10-30 14:47:18 UTC

Doug,

Booting with nmi_watchdog=1 resulted in a failure, so I've attached the kernel 
boot log (as you will see, it fails on CPU0 when testing). When I tried to access 
the SRCZCR controller via /proc/scsi/gdth/0 it simply hung again, but the kernel 
didn't OOPS. The kernel only OOPSed on the original taroon-beta2 kernel. All 
later ones just hung in a deadlock-type state with the message "Starting 
timer: 0 0".

One other thing I've noticed is that 'top', after running for a while, reports 
CPU timing problems; maybe this is the root cause of the SCSI problems. In the 
kernel boot log you will notice there are 4 CPUs. There are actually two CPUs, 
but hyperthreading is enabled. Disabling hyperthreading in the BIOS still does 
not resolve the problem.

Let me know how I can be of further assistance. If you'd like to contact me 
directly by phone, please let me know; it's not a problem, since I know you 
guys don't have the same hardware in house.

Comment 19 Jason Sauve 2003-10-30 17:07:58 UTC

Good news!

After giving up on the possibility of nmi_watchdog=1 working (after waiting 
about 5 minutes after initiating a shutdown), I left the machine alone for a 
couple of hours. When I came back to it, to my surprise the watchdog information 
was there! I ran it again to get the results a second time and timed how 
long it took: approx. 50 minutes for the watchdog trace to show up on the console. 
Attached is your output!

Comment 20 Jason Sauve 2003-10-30 17:12:23 UTC

Created attachment 95607 [details]
kernel nmi_watchdog=1 log

Comment 21 Doug Ledford 2003-10-30 17:42:00 UTC

Yeah, I saw a patch get posted internally to fix the watchdog timeout problem (I
think if you use nmi_watchdog=2 that it might make the timeout happen in 30
seconds or so like it's supposed to).  Our next kernel release won't have that
problem.

Can you try booting the machine without the serial console enabled and see if it
still won't reboot?  (The netdump log is, umm, weird...I'm gonna have to
disassemble the module to see why it's showing up the way it is)

Comment 22 Jason Sauve 2003-10-30 19:04:25 UTC

Created attachment 95614 [details]
oops from kernel-2.4.21-1.1931.2.399.entsmp 

Doing a normal boot and shutdown without the serial console attached produces
the same bug as I originally detected without the aid of a serial console. I
tried it again just to be sure.

I decided to boot with the original taroon-beta2 kernel which OOPSes
immediately on shutdown (since this is the only known kernel to my knowledge
that will OOPS without the aid of nmi_watchdog kernel param, all the later
kernels just hang on shutdown/reboot or accessing /proc/scsi/gdth/0). 

Please let me know if this output is of better assistance to you. If you want,
I can provide ctrl+alt+shift scroll lock output from
kernel-2.4.21-1.1931.2.399.entsmp as well.

PS: I am guessing that a prior thread comment was a post from Chun Li at Intel
confirming that he was able to reproduce the error on the same or similar
hardware?

Comment 23 Doug Ledford 2003-10-30 19:25:20 UTC

Created attachment 95615 [details]
Proposed fix

I think this will actually solve the problem entirely.  If you could apply this
to kernel sources, test it, and let me know the results I would appreciate it.

Comment 24 Jason Sauve 2003-10-30 20:36:27 UTC

I'm ecstatic! I applied the patch to the kernel source (kernel-2.4.21-3.EL.smp)
and the problem is fixed! I will leave you to set the ticket to resolved and 
to add any additional comments. At the same time I'm actually quite surprised: 
this is a bug that seems to keep resurfacing with each release of Red Hat all 
the way back to 7.3 (Bugzilla #66867, #66867). 

Now all I would like to know is when it will be made available in RPM format 
for RHEL AS/ES 3.0; we're ready to make a purchase. Is it possible that a 
fix can be provided for the installation media? I don't know if this is something 
that would cause any amount of data loss if I were to install with the 
unpatched kernel and then apply the updated kernel RPM afterwards. 

I would appreciate it if someone could email me back directly regarding my 
questions. 

Thanks a bunch! Great work.

Comment 25 Jason Sauve 2003-10-30 20:48:02 UTC

PS: The "Starting timer: 0 0" message still appears when booting, shutting 
down and when issuing a /proc/scsi/gdth/0 command. Not sure what this is and 
whether or not it should be showing up. Doesn't seem to impact anything other 
than the console display.

FYI: The other time where this resurfaced was Bugzilla Bug #72207

Comment 26 Doug Ledford 2003-10-30 21:08:16 UTC

It will be available in RPM format with our first kernel update (it has
definitely made the cut off deadline for that).  As far as what to do between
now and when that comes out, you could install the system and rebuild the kernel
RPM with this patch added and use that until the next kernel is released.

For the most part all the flush routine does is make sure there is no latent
write data in the controller's RAM cache before powering down.  However, since
the cache will get written out eventually even without the flush, you are still
pretty safe.  The machine hangs on shutdown, no more writes will go to the
device, then all you have to do is let it sit for a few moments (enough time to
make sure the controller isn't writing to the disks any longer), then hit the
reset button and the data will have been written out and things should be fine.

The starting timer message is just informational and will be printed out any
time scsi_get_host_dev() is called (the gdth flush routine calls this function).

FWIW, the bugs that have happened have actually been different bugs, they've
just all been different bugs in scsi_get_host_dev and scsi_free_host_dev which
are both rarely used functions (only a couple drivers actually use them and the
core scsi layer doesn't use them at all) so bugs sometimes accidentally creep in.

Comment 28 Doug Ledford 2003-11-10 23:56:20 UTC

*** Bug 109639 has been marked as a duplicate of this bug. ***

Comment 29 Doug Ledford 2003-11-10 23:57:47 UTC

*** Bug 109652 has been marked as a duplicate of this bug. ***

Comment 30 Bruce Grove 2003-11-18 18:23:43 UTC

This problem is seen on the Sun Microsystems V60/65x's

Comment 31 Kai 'wusel' Siering 2003-11-19 00:02:19 UTC

I've noticed the same behaviour (hang on shutdown, crash on accessing
/proc/scsi/gdth/0) with a GDT 6523RS.

When will this patch be included in an errata kernel?

Quoting RHBA-2003:308:

|Fixes
|amd64 has significant bug in 32 bit emulation

I'm running this (2.4.21-4.0.1.ELsmp), which, according to "up2date -u",
is the latest.

System info:

CPU0: Intel Pentium III (Coppermine) stepping 06
CPU1: Intel Pentium III (Coppermine) stepping 06
Total of 2 processors activated (3991.14 BogoMIPS).
scsi0 : GDT6523RS

Problem is still there, system becomes unusable on "cat
/proc/scsi/gdth/0".

Comment 32 rainer froemmel 2003-11-27 15:30:11 UTC

We have the same problem.

Could you please provide appropriate information for recompiling the 
AS30 kernel?

The AS30 support (which we are paying for) is not aware of any 
"to-do list" describing how an AS30 kernel is to be rebuilt.

Please check Service Request Number 266421 or get in touch briefly with
Steffen Mann.

Comment 33 Doug Ledford 2003-12-01 13:31:53 UTC

The 4.0.1.EL kernel is an errata kernel.  There is a very significant
difference between an errata kernel and an update kernel.  We release
errata kernels whenever we have to for security reasons, and we only
release update kernels on an occasional basis.  The errata kernel in
question (4.0.1.EL) does not have the fix for this included (errata
kernels get the security updates only, not a bunch of other stuff, so
that we can get them through QA quicker).  The next update kernel
(which is getting ready to go into our QA phase) has the fix applied.
It should be out soon.  In the meantime I'll try and build some
replacement scsi_mod.o modules and attach them to this bug report to
take care of the problem.  These modules will be built against a
4.0.1.EL kernel, so that's what you'll need to have installed to use them.

Comment 34 Need Real Name 2003-12-01 21:21:01 UTC

*** Bug 111201 has been marked as a duplicate of this bug. ***

Comment 35 Doug Ledford 2003-12-12 17:11:48 UTC

*** Bug 111887 has been marked as a duplicate of this bug. ***

Comment 36 rainer froemmel 2003-12-13 11:42:52 UTC

Please attach a to-do list for recompiling the AS 3.0 kernel.

Comment 37 Kai 'wusel' Siering 2003-12-14 02:31:57 UTC

Referring to Comment #33:

Doug,

what is the "security update" of 4.0.1.EL? The errata states the
previous kernel "limited the amount of virtual address space [...] to
an unnecessary degree. [...] These updated kernel packages
significantly raise this address space limit." The fixing of the
do_brk() bug was not made a reason on releasing that errata kernel,
initially ...

So, following your argumentation, 4.0.1.EL should either not have been
issued then (x86-64-users would just have to wait for their fix as do
GDTH-users for theirs) or at least the x86-64 address space issue must
not have been included in it.

I don't want to start a discussion here (did that already with RH in
IT, Issue #29695); I just want to point out that, at least from my
point of view, your policy on issuing or not issuing an errata kernel
seems to be based on market share or something, but not on solving
production issues in general. From my point of view, that's the wrong
approach.
Having to wait for the next quarterly update to happen in order to get
this fixed -- or run unsupported kernels until then -- for me does
not match RHEL's claim of being an "Enterprise" level OS. That kind of
problem, until now, I only expected from volunteer-driven projects.

YMMV,
-kai (forced to run UP or custom kernels)

Comment 38 cgadd 2003-12-19 00:33:45 UTC

Doug, 
Any word on the replacement scsi_mod.o module?

Comment 39 Andreas Aretz 2003-12-19 07:34:03 UTC

Doug,

how long do we have to wait for a kernel update fixing the GDTH 
problem? 
This problem should be well known to Red Hat, because it has been 
detected in some legacy distributions of Red Hat too.
I think anyone can make a mistake, but one has to learn from it so 
that the mistake isn't made again :-)) !!
As long as this problem is not fixed, we can't sell any servers 
together with RHEL3 and have to inform our customers not to use 
RHEL3 !!!

A patch RPM installing the correct scsi_mod.o module, if it were 
available soon (!!), would be an acceptable solution until the update 
kernel is available !!

Rgds

Andreas

Comment 40 Doug Ledford 2003-12-23 17:56:50 UTC

Created attachment 96682 [details]
Compressed cpio archive of all the i686 modules needed to solve the problem

The attached cpio archive contains files that should solve your problems.  Here
are the directions for installing the files:

1. Save the file into /tmp
2. As root do:
   a.  cd /tmp
   b.  zcat modules.cpio.gz-i686 | cpio -ivd
   c.  cd /lib/modules
   d.  for i in 2.4.21-4.0.1*; do cp /tmp/lib/modules/${i}/kernel/drivers/scsi/* ${i}/kernel/drivers/scsi; done
   e.  cd /boot
   f.  for i in initrd-2.4.21-4.0.1*; do VERSION=`basename $i .img`; mkinitrd -v -f $i $VERSION; done

That's it.  Reboot into your normal 4.0.1 kernel and it should be working
properly now.

Note: I don't have this hardware in my home lab (which is where I am since I'm
already on Christmas vacation) so this is untested.  I manually verified that
the symbol versions didn't change, but I haven't actually booted these modules
up.  Saving a copy of your original initrd images under a different name and
adding a new line to your /etc/grub.conf file that boots the same kernel but
with the saved initrd images would be wise until these modules have been
verified.  Modules for the i686 UP, i686 SMP, i686 hugemem, and i386 BOOT
kernels are included in this package.  I didn't bother with Athlon modules
since I haven't seen anyone request a fix for this on an Athlon machine, nor on
any machines other than x86 (such as x86_64, ia64, or s390).
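
The advice above about keeping a copy of the original initrd images could be sketched roughly as follows. This is an illustration, not part of the attached archive: the function name backup_initrds and the ".orig" suffix are assumptions, and the saved image would still need its own entry in /etc/grub.conf to be bootable.

```shell
#!/bin/sh
# Hypothetical helper: copy each initrd image matching a pattern in a
# directory to "<name>.orig", unless such a backup already exists.
backup_initrds() {
    dir=$1
    pattern=$2
    for img in "$dir"/$pattern; do
        [ -f "$img" ] || continue                   # pattern matched nothing
        [ -e "$img.orig" ] || cp -p "$img" "$img.orig"
    done
}
```

Run as root before step f, for example: backup_initrds /boot 'initrd-2.4.21-4.0.1*.img'. Because the pattern ends in ".img", a second run does not pick up the ".orig" copies.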

Comment 41 Need Real Name 2003-12-23 23:38:47 UTC

Hello all, and merry Christmas. Hey Doug, I tested your modules and 
they work great, at least for me (for my hardware details see bug 
#111201). I can reboot and shut down our servers OK. Doug, this is the 
second time you've saved my a... in four years of Red Hat use; in the past 
it was with Intel 440GX boards. Thanks, and keep playing with SCSI 
hardware.
Ahh, just one question: when the new update kernel sees the light, if I 
do an rpm update, is this going to overwrite those files?

Bye all and Happy New Year

Comment 42 Doug Ledford 2003-12-24 03:13:57 UTC

The next kernel already has the same patch in it that I used here.  It
shouldn't have any problems at all.

Comment 43 rainer froemmel 2003-12-29 15:16:11 UTC

We tried to install the cpio archive without any success; that is, 
the box still hangs on reboot.
Hardware used: Dell SC1600 + Intel SRCU32 adapter

Comment 44 Andreas Aretz 2003-12-30 08:47:45 UTC

Doug,

the kernel version for which you compiled the scsi i686 modules is
2.4.21-4.0.1* (2.4.21-4.0.1.ELsmp, for example).

The kernel version you get after installing RHEL 3 AS
is 2.4.21-4.* (2.4.21-4.ELsmp, for example) !!!!

Because of this, your precompiled modules can't be used with the 
generic RHEL 3 kernel versions !!!

Would you please compile the suitable ones and attach them to this 
bugzilla report ?

Thanks and best regards

Andreas
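
A quick way to see whether an unpacked module archive covers the kernel you are actually running might look like the sketch below. The directory layout is taken from the install instructions in this report (lib/modules/<release>/kernel/drivers/scsi under the unpack directory); the function name archive_has_modules is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the archive unpacked under $1 contains
# SCSI modules for the kernel release given in $2.
archive_has_modules() {
    [ -d "$1/lib/modules/$2/kernel/drivers/scsi" ]
}
```

For example, archive_has_modules /tmp "$(uname -r)" would fail on a 2.4.21-4.ELsmp system if the archive only ships 2.4.21-4.0.1 directories.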

Comment 45 cgadd 2003-12-30 18:04:58 UTC

When I tried to run the "fix" commands, I got an error on step "F":

#for i in initrd-2.4.21-4.0.1*; do VERSION=`basename $i .img`; mkinitrd -v -f $i $VERSION; done

/lib/modules/initrd-2.4.21-4.0.1.EL is not a directory
/lib/modules/initrd-2.4.21-4.0.1.ELsmp is not a directory

I manually ran mkinitrd like:

#mkinitrd -v -f initrd-2.4.21-4.0.1.ELsmp.img 2.4.21-4.0.1.ELsmp

and it worked ok.

I'll be rebooting the machine in a few hours, so I'll know then if the
fix works for me.
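
The error in step "F" appears to come from basename only stripping the ".img" suffix, which leaves the "initrd-" prefix in the version string passed to mkinitrd. A possible corrected loop, sketched with a hypothetical helper named initrd_version, strips both:

```shell
#!/bin/sh
# Hypothetical helper: turn an initrd file name into the kernel version
# that mkinitrd expects as its second argument.
initrd_version() {
    name=${1##*/}            # drop any leading directory
    name=${name#initrd-}     # drop the "initrd-" prefix
    printf '%s\n' "${name%.img}"   # drop the ".img" suffix
}

# Intended use (as root, in /boot), mirroring step f of the instructions:
#   for i in initrd-2.4.21-4.0.1*.img; do
#       mkinitrd -v -f "$i" "$(initrd_version "$i")"
#   done
```

This produces the same arguments as the manual mkinitrd invocation shown above.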

Comment 46 cgadd 2004-01-01 01:46:17 UTC

Finally got around to a reboot last night, no problems whatsoever.

For the record, my hardware is a 7501WV2, 3GHz Xeons, 6 GB of RAM,
SRCZCR RAID with 6 120GB drives.

Comment 47 Doug Ledford 2004-01-07 17:17:08 UTC

To Andreas Aretz:

The current released Red Hat kernel version is 2.4.21-4.0.1.EL. 
You'll need to update your system to the latest official Red Hat
kernel.  Once a kernel has been retired by a new kernel release we
don't keep making changes to it or for it.

Comment 48 Andreas Aretz 2004-01-08 10:07:59 UTC

Doug, 

got it!

But kernel update 2.4.21-4.0.1.EL is no longer available.

The current release is 2.4.21-4.0.2.EL.

Would you please compile the suitable patch for this release?

Thanks and regards

Andreas

Comment 49 Kai 'wusel' Siering 2004-01-08 18:36:54 UTC

It must be a very bad joke -- this bug is still not fixed in the
SECOND errata kernel of RHEL v3 (RHSA-2003:416,
kernel-smp-2.4.21-4.0.2.EL), according to the ChangeLog. Thanks for
pissing me off again, Red Hat.

Comment 50 cgadd 2004-01-09 00:05:13 UTC

It was clearly stated above that it would be in the next UPDATE
kernel, not in the errata kernel.

The posted cpio archive does fix the problem.

Comment 51 Doug Ledford 2004-01-09 14:32:14 UTC

Created attachment 96856 [details]
Modules for the 2.4.21-4.0.2.EL kernel

This module archive is the same as the last one except compiled against the
2.4.21-4.0.2.EL kernel sources.  It should solve the problem on the current
kernel.  And, to answer the issue raised, the 4.0.2 kernel is a security errata
not an update kernel.  The security errata kernels are fast tracked through the
system in order to get them in the hands of users quicker.  In order to keep
the time needed to QA security errata kernels down, we are *not* allowed to
shove a bunch of non-security related patches into those kernels.  The next
update kernel, which contains this patch, will be released as soon as it passes
our rather rigorous update QA process.  You can also tell the difference
between security updates and regular updates by looking at the kernel version
number.  In this case, the initial kernel version was 2.4.21-4.EL.  The next
update kernel will be at least 2.4.21-5.EL.  Any point releases, such as 4.0.1
and 4.0.2, are built from the 4.EL base source code + just the minimal updates
for the errata.  If the kernel source has been updated with a full gamut of
patches, then it will have the major Red Hat release number incremented. 
Knowing that information, you can then gather that the first Red Hat kernel to
contain this patch will have a minimum number of 2.4.21-5.EL.
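
The numbering rule described above can be sketched as a small shell check. This is only an illustration of that description, not an official tool: the function names are hypothetical, and it assumes version strings of the form 2.4.21-<release>.EL<flavour>.

```shell
#!/bin/sh
# Hypothetical helper: print the Red Hat release part of a kernel version,
# e.g. "2.4.21-4.0.2.ELsmp" -> "4.0.2".
rh_release() {
    rel=${1#*-}                    # drop the upstream version and first hyphen
    printf '%s\n' "${rel%%.EL*}"   # drop ".EL" and any flavour suffix
}

# Succeed if the release is a point release (e.g. 4.0.1), i.e. an errata
# kernel built from the base source plus minimal fixes.
is_point_release() {
    case $(rh_release "$1") in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Succeed if the major Red Hat release number is at least $2.
has_update_level() {
    major=$(rh_release "$1")
    major=${major%%.*}
    [ "$major" -ge "$2" ]
}
```

By this rule, has_update_level "$(uname -r)" 5 would indicate a kernel new enough to carry the fix, while 4.0.1 and 4.0.2 are recognised as point releases of the 4.EL base.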

Comment 52 Kai 'wusel' Siering 2004-01-12 17:36:09 UTC

Doug,

thank you for your work on this matter; I do appreciate this, don't
get me wrong.

But I have to correct you in terms of the state of 2.4.21-4.0.1.EL --
that one came with RHBA-2003:308 and only "accidentally" fixed a
security issue (do_brk()); the purpose of RHBA-2003:308 was to fix
x86_64 issues. Therefore my company as a paying customer, and as it
seems others as well, expect at least the next issued RHSA kernel to
include any service-disrupting bug fixes -- that's standard for RHEL
2.1 (see RHSA-2003:408) and should still be standard for RHEL 3.x.

The policy for RHEL 3.x, that is, to fix bugs only four times a year
with Quarterly Updates unless it's security related, has to be
considered a major step backwards, and since Red Hat provides
support only for Red Hat-backed kernels, it actually voids any
reason to go the RHEL route. What if QU1 fixes GDT and breaks QLC?
Will I have to wait until QU2 to get a Red Hat supported kernel which
supports systems with GDT as well as systems with QLC?

That's my concern, given that a fix for this GDT issue has been known for
3 months now but is STILL not available via RHN. Since rebooting SMP
systems with any GDT SCSI HBA to an errata (read: one with security
fixes) kernel is not possible (someone has to press RESET locally
somehow), not releasing the fix via RHN (i.e. via an errata kernel)
becomes a pain in the ass for at least some people.

Comment 53 Rich Knepper 2004-01-16 15:43:59 UTC

Any chance of release of the patch, so admins can update their own
kernels from kernel-source rpms when errata fixes come out?

Comment 54 Doug Ledford 2004-01-16 15:50:24 UTC

The patch is already in this bug report.  Just check the attached file
list.  As far as errata kernels go, barring some emergency security
update in the very near future, the next kernel update will already
have the fix in it.

Comment 55 Daren Grant 2004-02-02 22:25:58 UTC

I just finished loading EL 3.0 update 1 using kernel 2.4.21-9.ELsmp.
I'm seeing this type of problem on my Dell PE 6400, so with this
update it doesn't appear to be fixed, or at least not all the way.
Any help?

Comment 56 Doug Ledford 2004-02-04 15:50:19 UTC

To Daren Grant (grant.csc.mil): 

The patch to solve the particular problem in this bug report is in
fact in the 2.4.21-9.EL kernel and I'm positive it solves the problem
it attempted to solve.  If you are still having problems, then you
need to open a different bug report with a description of your
hardware and exactly what the problem is, including any error messages
from the kernel, etc.

Comment 57 Ernie Petrides 2004-04-08 20:30:21 UTC

*** Bug 105032 has been marked as a duplicate of this bug. ***

Comment 58 Ernie Petrides 2004-12-03 01:41:42 UTC

An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-017.html