Bug 453507

Summary: kernel panic with kernel version 2.6.9-67.0.20.EL
Product: Red Hat Enterprise Linux 4
Reporter: Jimmy Cho <jcho>
Component: kernel
Assignee: Vitaly Mayatskikh <vmayatsk>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: high
Version: 4.8
CC: ajadhav, bernhard.furtmueller, duck, eric.eisenhart, gergnz, herrold, jan.iven, jburke, kajtzu, k.georgiou, linux, me, mishu, mmatsuya, mvaliyav, pasteur, pgervase, phaleintx, pzijlstr, qcai, rainer.traut, sputhenp, tao, tizod, vgoyal
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-05-18 19:26:01 UTC
Bug Blocks: 455072, 455074, 461297    
Attachments:
extracted kernel log entries in /var/log/message (flags: none)
kernel panic log (flags: none)
lock sighand->siglock in release_task() (flags: none)
reproducer (flags: none)
simpler version of patch (flags: none)

Description Jimmy Cho 2008-07-01 04:16:43 UTC
Description of problem:

After updating the kernel to version 2.6.9-67.0.20.EL, the kernel panicked
twice and the system had to be rebooted each time.  I have reverted to
2.6.9-67.0.15.EL.

The kernel was updated on June 27 at 17:00.  The first kernel panic occurred
on June 29 at 2:12, the second on June 30 at 14:59.

I have attached the relevant log entries.


Version-Release number of selected component (if applicable):

2.6.9-67.0.20.EL

How reproducible:

After updating the kernel to version 2.6.9-67.0.20.EL

Steps to Reproduce:
1. Just run the updated kernel

Actual results:

The kernel panicked twice

Expected results:


Additional info:

Comment 1 Jimmy Cho 2008-07-01 04:16:43 UTC
Created attachment 310639
extracted kernel log entries in /var/log/message

Comment 3 Con Tassios 2008-07-08 23:50:12 UTC
Created attachment 311326
kernel panic log

Same problem experienced 4 days after installing 2.6.9-67.0.20.ELsmp kernel on
a system that previously had no stability issues.

Comment 4 Phil Hale 2008-07-09 05:04:14 UTC
Seeing a similar issue on two of my 4.6 boxes running MailScanner as MX filters.

Comment 5 glshank 2008-07-10 14:22:39 UTC
Same here. It panics during heavy IO via MySQL.

Comment 6 Ivan Vecera 2008-07-10 15:34:13 UTC
The problem is in the next_thread() function; I'm assigning this issue to Vitaly.

Comment 7 Vitaly Mayatskikh 2008-07-10 16:14:15 UTC
This is a race between release_task() and sys_times()->next_thread().
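
For illustration only, here is a minimal sketch of the kind of userspace workload that exercises both sides of that race: one thread calls times() while other threads in the same process are exiting. This is not the attached reproducer, whose contents are not shown in this report; the names short_lived() and poll_times_during_exits() and the loop counts are hypothetical. On a kernel without the race it simply completes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/times.h>

/* Worker that exits immediately, so the kernel has to release its task. */
static void *short_lived(void *arg)
{
    return arg;
}

/*
 * Start `rounds` short-lived threads one at a time and call times()
 * repeatedly while each one is exiting.  Per this report, sys_times() in
 * the affected kernel walks the thread group with next_thread().
 * Returns the number of rounds that completed.
 */
static int poll_times_during_exits(int rounds)
{
    int done = 0;

    assert(rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        pthread_t tid;
        struct tms buf;

        if (pthread_create(&tid, NULL, short_lived, NULL) != 0)
            break;
        /* Loop count is arbitrary; it only widens the window. */
        for (int j = 0; j < 100; j++)
            (void)times(&buf);
        if (pthread_join(tid, NULL) != 0)
            break;
        done++;
    }
    return done;
}
```

Calling poll_times_during_exits() with a large round count from main() keeps thread exits and times() calls overlapping; whether that is enough to hit the panic on an affected kernel is an assumption, not something confirmed here.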

Comment 14 Vitaly Mayatskikh 2008-07-11 23:03:31 UTC
Created attachment 311625
lock sighand->siglock in release_task()

This patch fixes the problem.

Comment 15 Vitaly Mayatskikh 2008-07-11 23:04:07 UTC
Created attachment 311626
reproducer

Comment 16 Vitaly Mayatskikh 2008-07-14 08:25:07 UTC
Created attachment 311684
simpler version of patch

Unhash the process with sighand->siglock held.
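
The idea can be sketched in userspace, as an analogy rather than the actual kernel patch (which is only available as an attachment). The sketch assumes the walker in sys_times() holds sighand->siglock while following next_thread(), so unlinking an exiting thread under that same lock means the walker never follows a pointer into a removed entry. All model_* names are hypothetical, and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a task in a thread group. */
struct model_task {
    struct model_task *next;   /* circular thread-group list */
    struct model_task *prev;
    long ticks;                /* stands in for the task's CPU time */
};

/* Stands in for sighand->siglock: one lock guards the whole ring. */
static pthread_mutex_t model_siglock = PTHREAD_MUTEX_INITIALIZER;

/* Make t a ring of one. */
static void model_init(struct model_task *t, long ticks)
{
    t->next = t->prev = t;
    t->ticks = ticks;
}

/* Link t into the ring right after leader. */
static void model_add(struct model_task *leader, struct model_task *t)
{
    pthread_mutex_lock(&model_siglock);
    t->next = leader->next;
    t->prev = leader;
    leader->next->prev = t;
    leader->next = t;
    pthread_mutex_unlock(&model_siglock);
}

/* Reader: plays the role of sys_times() walking with next_thread(). */
static long model_times(struct model_task *leader)
{
    long total = 0;

    assert(leader != NULL);
    pthread_mutex_lock(&model_siglock);
    struct model_task *t = leader;
    do {
        total += t->ticks;
        t = t->next;
    } while (t != leader);
    pthread_mutex_unlock(&model_siglock);
    return total;
}

/* Writer: plays the role of release_task() unhashing an exiting thread.
 * Taking the same lock as the reader is the point of the sketch. */
static void model_release(struct model_task *t)
{
    pthread_mutex_lock(&model_siglock);
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->next = t->prev = t;
    pthread_mutex_unlock(&model_siglock);
}
```

If model_release() unlinked the entry without taking model_siglock, a concurrent model_times() could be standing on the entry being removed, which is the userspace analogue of the race described in comment 7.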

Comment 17 Tom Sightler 2008-07-14 13:42:11 UTC
We had two of these this weekend.  Previously stable systems were upgraded to
the latest kernel and two locked up last night, not even lasting 24 hours.  In
my opinion this bug should be urgent.


Comment 18 Prarit Bhargava 2008-07-15 12:56:20 UTC
*** Bug 455274 has been marked as a duplicate of this bug. ***

Comment 20 Linda Wang 2008-07-17 20:44:37 UTC
This is to back out the patch in 4.8.

Comment 21 RHEL Program Management 2008-07-17 21:01:08 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 23 Vivek Goyal 2008-07-22 20:33:11 UTC
Committed in 78.1.EL.  RPMs are available at http://people.redhat.com/vgoyal/rhel4/

For the time being, the original two sys_times patches (bz 435280) have been
reverted to resolve the issue.

The following commits were reverted:

f9c0ff860ebf6aa16fbd3bfaabd77d72c267d449
47a5118b6f0f22add09c28c10e09f93034a9b8d9



Comment 27 Phil Randal 2008-07-29 08:54:32 UTC
We've just had an RHEL 4.7 box crash with this one.

Is there any timescale for the fix to be released through the normal channels?

Comment 28 Prarit Bhargava 2008-08-06 12:57:06 UTC
*** Bug 456997 has been marked as a duplicate of this bug. ***

Comment 29 Prarit Bhargava 2008-08-06 12:57:16 UTC
*** Bug 456993 has been marked as a duplicate of this bug. ***

Comment 31 RHEL Program Management 2008-09-03 13:02:19 UTC
Updating PM score.

Comment 36 errata-xmlrpc 2009-05-18 19:26:01 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html