Bug 285021

Summary:	Kernel panic when sending SIGINT to GNU software rrdtool locks:1798
Product:	Red Hat Enterprise Linux 4	Reporter:	Bryan Heitman <bryanh>
Component:	kernel	Assignee:	Ric Wheeler <rwheeler>
Status:	CLOSED WONTFIX	QA Contact:
Severity:	high	Docs Contact:
Priority:	medium
Version:	4.5	CC:	dkwon, gvg, ricardo.labiaga
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
URL:	http://www.sqlpaste.com/?entry_id=105195
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-06-20 16:13:50 UTC	Type:	---

Description Bryan Heitman 2007-09-10 19:45:38 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506)

Description of problem:
When we send CTRL + C (SIGINT) to a certain running application (GNU rrdtool software from Tobias), the kernel panics. Here is the oops:

----------- [cut here ] --------- [please bite here ] --------- 
Kernel BUG at locks:1798 
invalid operand: 0000 [1] SMP  
CPU 1  
Modules linked in: nfs lockd nfs_acl ipt_REJECT ipt_LOG ipt_limit iptable_nat ip_conntrack iptable_filter ip_tables md5 ipv6 autofs4 w83627hf w83781d i2c_sensor i2c_isa i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd shpchp e1000 floppy ext3 jbd ata_piix libata sd_mod scsi_mod 
Pid: 5690, comm: rrdtool Not tainted 2.6.9-55.ELsmp 
RIP: 0010:[<ffffffff8018f10c>] <ffffffff8018f10c>{locks_remove_flock+201} 
RSP: 0018:00000100602a5e48  EFLAGS: 00010246 
RAX: 000001007d0c24c0 RBX: 000001005d7604d0 RCX: 0000000000000002 
RDX: 0000000000000000 RSI: 000000000000007c RDI: ffffffff804ee400 
RBP: 000001005d7603c0 R08: ffffffffa0157c28 R09: 0000000300000000 
R10: ffffffffffffbef8 R11: 0000010060a7d6c0 R12: 0000010060a7d6c0 
R13: 000001005f6fcb88 R14: 0000002a956a3080 R15: 0000000000000000 
FS:  0000002a960f1a60(0000) GS:ffffffff804ed780(0000) knlGS:0000000000000000 
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
CR2: 0000002a95580d86 CR3: 000000007e2a8000 CR4: 00000000000006e0 
Process rrdtool (pid: 5690, threadinfo 00000100602a4000, task 000001007b81e7f0) 
Stack: 00000100602a5e78 ffffffffa020bb79 0000010060a7d6c0 0000010060a7d6c0  
       000001005d7604d0 ffffffff8018f039 0000000001345929 0000000046e59afe  
       0000000000000000 0000000046e59afc  
Call Trace:<ffffffffa020bb79>{:lockd:nlmclnt_locks_release_private+33}  
       <ffffffff8018f039>{locks_remove_posix+374} <ffffffff8017a9f9>{__fput+73}  
       <ffffffff801795f8>{filp_close+103} <ffffffff80179681>{sys_close+130}  
       <ffffffff8011026a>{system_call+126}  

Code: 0f 0b 1d af 32 80 ff ff ff ff 06 07 48 89 c3 48 8b 03 eb ba  
RIP <ffffffff8018f10c>{locks_remove_flock+201} RSP <00000100602a5e48> 
 <0>Kernel panic - not syncing: Oops 
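
For readers following the trace, the frames above can be listed one per line. This is a minimal Python sketch that assumes the `<address>{[:module:]symbol+offset}` layout seen in this oops; the names `FRAME_RE` and `parse_frames` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Call-trace lines copied from the oops above.
OOPS_TRACE = """\
Call Trace:<ffffffffa020bb79>{:lockd:nlmclnt_locks_release_private+33}
       <ffffffff8018f039>{locks_remove_posix+374} <ffffffff8017a9f9>{__fput+73}
       <ffffffff801795f8>{filp_close+103} <ffffffff80179681>{sys_close+130}
       <ffffffff8011026a>{system_call+126}
"""

# Assumed frame layout: <16 hex digits>{[:module:]symbol+offset}
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(text):
    """Return (module, symbol, offset) tuples in the order they are printed.

    Frames without a :module: prefix are labelled "kernel"; the offset is
    kept as the string that appears in the oops.
    """
    return [
        (m.group(2) or "kernel", m.group(3), m.group(4))
        for m in FRAME_RE.finditer(text)
    ]


if __name__ == "__main__":
    for module, symbol, offset in parse_frames(OOPS_TRACE):
        print(f"{module:8s} {symbol}+{offset}")
```

The listing shows the one frame attributed to the lockd module alongside the close() path (sys_close, filp_close, __fput) in the core kernel.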

Version-Release number of selected component (if applicable):
2.6.9-55.0.2.ELsmp

How reproducible:
Always


Steps to Reproduce:
1. Start rrdtool update
2. Send CTRL + C to terminal
3. Grab kernel panic output from terminal

Actual Results:
Kernel panic every time. Contact me to reproduce.

Expected Results:
Kernel should not have crashed.

Additional info:
rrdtool version is 1.3 beta

Comment 1 Peter Staubach 2007-09-13 20:42:47 UTC

I am not testing with rrdtool, but the Connectathon testsuite does
something which sounds like this.  It has a test which acquires a
lock and is then signalled.  This test passes and the system does
not fail.

How do I reproduce this situation, please?

Comment 2 Bryan Heitman 2007-09-13 21:14:46 UTC

Hi Peter,

Could you compile and run rrdtool 1.3 beta to reproduce? Alternatively, I can
give you temporary remote access to the machine. Up to you.

Comment 3 Peter Staubach 2007-09-14 15:37:40 UTC

Where do I find rrdtool 1.3 beta, please?

Have you tried this on different file systems?

Comment 4 Bryan Heitman 2007-09-14 15:40:28 UTC

Only ext3

You can download rrdtool here and compile it. If you need help getting it
running, let me know:
http://oss.oetiker.ch/rrdtool/pub/beta/rrdtool-1.3beta1.tar.gz

Comment 5 Peter Staubach 2007-09-14 15:53:56 UTC

Wow.  That has quite the complicated build process.

I will work on it, but it will take a while.  A simpler testcase
might accelerate the work.
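
A simpler test case in the spirit of the lock-then-signal test mentioned in Comment 1 might look like the sketch below. It is not taken from rrdtool or from the Connectathon suite, and it has not been confirmed to trigger this panic. The function name `lock_then_signal` and the choice of a whole-file `fcntl` lock are assumptions. Because the trace involves lockd, the target path would presumably need to be on an NFS mount, which is also an assumption.

```python
import fcntl
import os
import signal
import sys
import tempfile


def lock_then_signal(path):
    """Fork a child that takes a POSIX lock on `path`, then send it SIGINT.

    Returns the number of the signal that terminated the child, or None if
    the child exited without being killed by a signal.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: use the default SIGINT action so the signal kills it.
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            os.close(ready_r)
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            fcntl.lockf(fd, fcntl.LOCK_EX)  # POSIX (fcntl) lock on the whole file
            os.write(ready_w, b"1")         # tell the parent the lock is held
            signal.pause()                  # hold the lock until signalled
        finally:
            os._exit(1)                     # only reached if something failed

    # Parent: wait until the child holds the lock, then interrupt it.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    os.kill(pid, signal.SIGINT)
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    # Hypothetical usage: pass a path on the file system under test;
    # without an argument a temporary local file is used.
    target = (sys.argv[1] if len(sys.argv) > 1
              else os.path.join(tempfile.mkdtemp(), "lockfile"))
    print("child terminated by signal:", lock_then_signal(target))
```

On a healthy system the child is simply killed by SIGINT and the kernel drops its lock during process teardown. The pipe handshake only ensures the signal arrives while the lock is held.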

Comment 6 Bryan Heitman 2007-09-14 16:11:02 UTC

Sorry, Peter. If you have any trouble reproducing this, let me know. I
will be happy to provide the commands I am using.

Comment 7 Ricardo Labiaga 2007-12-06 20:23:34 UTC

We hit this same panic when running sio
(http://www.netapp.com/go/techontap/tot-march2006/0306tot_monthlytoolSIO.html)
against a set of NetApp filers.  The filers were configured in a cluster to
takeover/giveback for the test.  We have not tried to reproduce this problem
with a single filer and plain reboot.

The specific steps we used to reproduce the problem:

- Takeover/giveback the filer cluster every 3 minutes (we used a FAS3050):
  - Set up and enable clustering.
  - Say A and B are the filers. All client traffic is targeted at B.
  - Let A take over and give back every 3 minutes, so that clients are handled
    by both A and B.

- Run the following SIO sequence on the clients repeatedly.
============================================
#!/usr/local/bin/bash
# DIR is not set in this script; it must come from the environment.
FILE="$DIR/$(hostname)"

ITER=1

while true
do
        echo "$(date) : Iteration $ITER start"
        touch "$FILE"
        # Fill pass (-fillonce)
        /usr/local/test/bin/sio 0 0 8k 0 1m 0 4 "$FILE" -fillonce
        if [ $? -ne 0 ]; then
                echo "$(date) : Fillonce failed"
        fi

        # I/O test pass
        /usr/local/test/bin/sio 66 100 8k 0 1m 300 4 "$FILE"
        if [ $? -ne 0 ]; then
                echo "$(date) : sio-test failed"
        fi

        rm -rf "$FILE"

        echo "$(date) : Iteration $ITER end"
        ITER=$((ITER + 1))
done
============================================

- We used 14 RHEL4.4 clients and around 20 RHEL3.8 clients running the above
sequence for 6 hrs.
- We were able to get 2-4 RHEL4.4 clients to panic each time.

Note that not all of the RHEL4.4 clients hit the panic.

Comment 11 Jiri Pallich 2012-06-20 16:13:50 UTC

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested a review has reached End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.