From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506)

Description of problem:
When sending a CTRL + C or SIGINT signal to a certain running application (GNU rrdtool software from Tobias), we receive a kernel panic, here is the oops:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at locks:1798
invalid operand: 0000 [1] SMP
CPU 1
Modules linked in: nfs lockd nfs_acl ipt_REJECT ipt_LOG ipt_limit iptable_nat ip_conntrack iptable_filter ip_tables md5 ipv6 autofs4 w83627hf w83781d i2c_sensor i2c_isa i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd shpchp e1000 floppy ext3 jbd ata_piix libata sd_mod scsi_mod
Pid: 5690, comm: rrdtool Not tainted 2.6.9-55.ELsmp
RIP: 0010:[<ffffffff8018f10c>] <ffffffff8018f10c>{locks_remove_flock+201}
RSP: 0018:00000100602a5e48 EFLAGS: 00010246
RAX: 000001007d0c24c0 RBX: 000001005d7604d0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 000000000000007c RDI: ffffffff804ee400
RBP: 000001005d7603c0 R08: ffffffffa0157c28 R09: 0000000300000000
R10: ffffffffffffbef8 R11: 0000010060a7d6c0 R12: 0000010060a7d6c0
R13: 000001005f6fcb88 R14: 0000002a956a3080 R15: 0000000000000000
FS: 0000002a960f1a60(0000) GS:ffffffff804ed780(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95580d86 CR3: 000000007e2a8000 CR4: 00000000000006e0
Process rrdtool (pid: 5690, threadinfo 00000100602a4000, task 000001007b81e7f0)
Stack: 00000100602a5e78 ffffffffa020bb79 0000010060a7d6c0 0000010060a7d6c0
       000001005d7604d0 ffffffff8018f039 0000000001345929 0000000046e59afe
       0000000000000000 0000000046e59afc
Call Trace:<ffffffffa020bb79>{:lockd:nlmclnt_locks_release_private+33}
       <ffffffff8018f039>{locks_remove_posix+374}
       <ffffffff8017a9f9>{__fput+73}
       <ffffffff801795f8>{filp_close+103}
       <ffffffff80179681>{sys_close+130}
       <ffffffff8011026a>{system_call+126}

Code: 0f 0b 1d af 32 80 ff ff ff ff 06 07 48 89 c3 48 8b 03 eb ba
RIP <ffffffff8018f10c>{locks_remove_flock+201} RSP <00000100602a5e48>
<0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):
2.6.9-55.0.2.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Start rrdtool update
2. Send CTRL + C to terminal
3. Grab kernel panic output from terminal

Actual Results:
Kernel panic, every time, contact me to reproduce.

Expected Results:
Kernel should not have crashed.

Additional info:
rrdtool version is 1.3 beta
I am not testing with rrdtool, but the Connectathon testsuite does something which sounds like this. It has a test which acquires a lock and is then signalled. This test passes and the system does not fail. How do I reproduce this situation, please?
Hi Peter, could you compile and run rrdtool 1.3 beta to reproduce? Or I can give you temporary remote access to the machine. Up to you.
Where do I find rrdtool 1.3 beta, please? Have you tried this on different file systems?
Only ext3. You can download rrdtool here and compile it; if you need help getting it running, let me know:
http://oss.oetiker.ch/rrdtool/pub/beta/rrdtool-1.3beta1.tar.gz
Wow. That has quite the complicated build process. I will work on it, but it will take a while. A simpler testcase might accelerate the work.
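A simpler testcase along these lines might look like the sketch below. It is not confirmed to reproduce the panic. It assumes the trigger is a POSIX (fcntl) lock that is still held when the file is closed after SIGINT, and that the file sits on an NFS mount, since the call trace goes through lockd. The function names and the path in the usage note are hypothetical and are not taken from rrdtool.

```python
import fcntl
import os
import signal


def acquire_posix_lock(path):
    """Open path read-write (creating it if needed) and take an
    exclusive POSIX lock on the whole file. Returns the open fd."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
    return fd


def is_locked_by_other(path):
    """Return True if a different process holds a conflicting POSIX lock.

    Call this only from a process that holds no lock on path itself:
    closing any fd for a file drops all POSIX locks the process has on it.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return True  # lock is held elsewhere
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def hold_lock_until_sigint(path):
    """Take the lock, sleep until SIGINT arrives, then close the file
    while the lock is still held (assumed to be the interesting path)."""
    # Make sure SIGINT raises KeyboardInterrupt even if it was ignored
    # when the interpreter started.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    fd = None
    try:
        fd = acquire_posix_lock(path)
        signal.pause()  # wait for CTRL + C / SIGINT
    except KeyboardInterrupt:
        pass
    finally:
        if fd is not None:
            os.close(fd)  # no explicit unlock: close() must release the lock
```

To try it, call hold_lock_until_sigint("/mnt/nfs/testfile") from a Python prompt on the affected client (the path is a placeholder for a file on the NFS mount) and press CTRL + C. On a healthy kernel the call simply returns and the lock is gone.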
Sorry Peter, if you have any trouble with re-creation of this let me know. I will be happy to provide the commands I am using.
We hit this same panic when running sio (http://www.netapp.com/go/techontap/tot-march2006/0306tot_monthlytoolSIO.html) against a set of NetApp filers. The filers were configured in a cluster to takeover/giveback for the test. We have not tried to reproduce this problem with a single filer and plain reboot.

The specific steps we used to reproduce the problem:
- Takeover/Giveback the filer cluster every 3 minutes (We used FAS3050)
- Setup and Enable clustering
- Say A and B are the filers. All client traffic is targeted to B.
- Let A takeover and giveback every 3 minutes, so that clients will be handled by both A & B.
- Run the following SIO sequence on the clients repeatedly.
============================================
#!/usr/local/bin/bash
FILE=$DIR/$(hostname)
ITER=1
while [ true ]
do
    echo $(date) : Iteration $ITER start
    touch $FILE
    /usr/local/test/bin/sio 0 0 8k 0 1m 0 4 $FILE -fillonce
    if [ $? -ne 0 ]; then
        echo $(date) : Fillonce failed
    fi
    /usr/local/test/bin/sio 66 100 8k 0 1m 300 4 $FILE
    if [ $? -ne 0 ]; then
        echo $(date) : sio-test failed
    fi
    rm -rf $FILE
    echo $(date) : Iteration $ITER end
    ITER=$(expr $ITER + 1)
done
============================================
- We used 14 RHEL4.4 clients and around 20 RHEL3.8 clients running the above sequence for 6 hrs.
- We were able to get 2-4 RHEL4.4 clients to panic each time. Note that not all of the RHEL4.4 clients hit the panic.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.