Bug 702456 - kernel panic: invalid opcode: 0000 [#1] SMP ...shared_cpu_map
Summary: kernel panic: invalid opcode: 0000 [#1] SMP ...shared_cpu_map
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 846704
 
Reported: 2011-05-05 17:27 UTC by daryl herzmann
Modified: 2018-11-29 21:10 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-04-23 12:47:24 UTC
Target Upstream Version:
Embargoed:


Attachments
crash log output (63.92 KB, text/plain), 2011-05-05 17:27 UTC, daryl herzmann
screen shot of console after crash (78.00 KB, image/jpeg), 2011-07-01 21:58 UTC, Lee Drengenberg

Description daryl herzmann 2011-05-05 17:27:29 UTC
Created attachment 497171 [details]
crash log output

I have a Dell PE R510 running fully updated RHEL 6.0, 64-bit.  I've had the machine for a few months now, but am now hitting a kernel panic when running some scientific software.

I'm attaching the crash report; the highlights of the file:

invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 6 
Process swapper (pid: 0, threadinfo ffff88081e372000, task ffff88041e0ec080)
Stack:
 ffff88081e373ea8 0000000000000000 0000000000000001 0000000000000082
<0> ffff88081e373ea8 ffff88043e471ec8 000000004dc16f1e 0000000000000082
<0> 0000000000010518 ffff88043e471f80 00000bfd16d6e22c ffff88041e0ec080
Call Trace:
 [<ffffffff81096214>] ? hrtimer_start_range_ns+0x14/0x20
 [<ffffffff81011ece>] cpu_idle+0xee/0x110
 [<ffffffff814c2658>] start_secondary+0x1fc/0x23f

Linux xxx 2.6.32-71.24.1.el6.x86_64 #1 SMP Sat Mar 26 16:05:19 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Thank you.

Comment 2 RHEL Program Management 2011-05-06 06:01:08 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected, as it is not proposed as an
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 Prarit Bhargava 2011-05-13 13:09:33 UTC
Hi Daryl,

Can you tell me what scientific software you're running?

Thanks,

P.

Comment 4 daryl herzmann 2011-05-13 13:15:48 UTC
Hi Prarit,

I have yet to figure out if one particular application is to blame or not.  My users run a whole suite of them, including:

GEMPAK - http://www.unidata.ucar.edu/software/gempak/
NCL - http://ncl.ucar.edu
NCO - http://nco.sf.net

I realize I am not giving you much to go on here :(  I was hoping the crash log provided enough information to give some hints.

I went back to a kernel that has been 'more stable' for me, 2.6.32-71.14.1.el6.x86_64, and have gone 3 days without a crash.


daryl

Comment 5 Prarit Bhargava 2011-05-13 13:36:55 UTC
Hey Daryl, no problem.  Just to be clear, the panic does NOT happen if those applications are disabled?

P.

Comment 6 daryl herzmann 2011-05-13 13:54:00 UTC
Hi Prarit,

The panic has happened in the past while the machine was running one or more of those applications listed.  So I have not seen a panic while the system was 'idle'.  I am not 100% sure, though, as when the panic occurs, crash dump runs and the machine reboots nicely, so I am not always noticing the panic.  I tried installing the 6.1 beta kernel and the panic occurred as well, but crash dump froze up, so the machine hung and I had to power cycle it at that point.

daryl

Comment 7 Prarit Bhargava 2011-05-19 12:52:24 UTC
Daryl, please boot your system with the 'debug' option.  The next time the panic happens you should get additional output including the modules that are loaded on the system.

I'd like to know if any of the applications you use load a kernel module.

Thanks,

P.
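
[Note: on RHEL 6 the kernel command line lives in /boot/grub/grub.conf (GRUB Legacy), so adding 'debug' is a one-line edit.  A minimal sketch; the kernel version, root= device, and initrd name are illustrative and will differ per system:

title Red Hat Enterprise Linux (2.6.32-71.24.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-71.24.1.el6.x86_64 ro root=/dev/mapper/vg_root-lv_root rhgb quiet debug
        initrd /initramfs-2.6.32-71.24.1.el6.x86_64.img
]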

Comment 8 daryl herzmann 2011-05-19 13:43:40 UTC
Hi Prarit,

Thanks for the continued help.  We just have an academic license, so we can't contact GSS about this...  Anyway, the applications are pure userland and load no kernel modules.  We have been experimenting some with the various suspected applications and we think we have it narrowed down to one of them (GEMPAK).  I have added your 'debug' suggestion to grub and will use it after the next crash.

daryl

Comment 9 daryl herzmann 2011-06-14 12:37:47 UTC
Hello,

Here is the output from a crash with kernel 2.6.32-131.2.1.el6.x86_64 and debug enabled:

crash> bt
PID: 0      TASK: ffff88081d8ceb00  CPU: 6   COMMAND: "swapper"
 #0 [ffff88041e14baf0] machine_kexec at ffffffff810310cb
 #1 [ffff88041e14bb50] crash_kexec at ffffffff810b6312
 #2 [ffff88041e14bc20] oops_end at ffffffff814de190
 #3 [ffff88041e14bc50] die at ffffffff8100f2eb
 #4 [ffff88041e14bc80] do_trap at ffffffff814dda84
 #5 [ffff88041e14bce0] do_invalid_op at ffffffff8100ceb5
 #6 [ffff88041e14bd80] invalid_op at ffffffff8100bf5b
    [exception RIP: subbuf_splice_actor+586]
    RIP: ffffffff810dceea  RSP: ffff88041e14be30  RFLAGS: 00010086
    RAX: 0000000000011240  RBX: ffff88043e470f88  RCX: 0000000000000016
    RDX: ffff88043e460000  RSI: ffff88041e14be90  RDI: ffff88043e470f40
    RBP: ffff88041e14be38   R8: 0000000000000001   R9: 0000000000000001
    R10: 0000f116d8456069  R11: 0000000000000001  R12: ffff88043e471040
    R13: ffff88041e14be90  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88041e14be40] lock_hrtimer_base at ffffffff81092491
 #8 [ffff88041e14be70] hrtimer_try_to_cancel at ffffffff810932b7
 #9 [ffff88041e14beb0] hrtimer_cancel at ffffffff81093382
#10 [ffff88041e14bed0] tick_nohz_restart_sched_tick at ffffffff8109e5a7
#11 [ffff88041e14bf00] cpu_idle at ffffffff81009eb9

Please let me know if you need other output from the vmcore.

thanks!
  daryl
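
[Note: besides 'bt', the crash utility can answer the module question from comment 7 directly from the same vmcore.  A sketch, assuming the matching kernel-debuginfo package is installed and the dump landed under the default kdump path (both paths illustrative):

crash /usr/lib/debug/lib/modules/2.6.32-131.2.1.el6.x86_64/vmlinux \
      /var/crash/127.0.0.1-2011-06-14-12:00:00/vmcore
crash> log      # kernel ring buffer, including the full oops text
crash> mod      # modules loaded at the time of the panic
crash> sys      # kernel version, panic string, basic system info
]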

Comment 10 daryl herzmann 2011-06-30 13:31:29 UTC
Hi,

Anything further I can provide to help this bug out?  We continue to see crashes with the latest kernel.

thank you!
  daryl

Comment 11 Lee Drengenberg 2011-07-01 21:57:13 UTC
Just to add another data point: I hit what appears to be the same problem.

My machine is a Dell R410 running RHEL 6.0 with no updates.  It has four 500 GB SAS drives arranged in 2 RAID 1 arrays.  Those are in 1 LVM group.  I just recently created 2 striped logical volumes on that group so data would be split across the 2 RAID arrays for performance.  I believe it's a "SAS 6/iR SAS internal RAID adapter for Hot Plug Configuration, PCI-Express"
24 GB RAM.  6 NICs (2 on-board and a card with 4 more).

This machine started with a minimal server install and hasn't had very much added on top of that.  The application I'm running performance tests on uses the FUSE library.  Our application, written in C, creates a file system, and I'm hammering on it, writing several hundred thousand files, in some cases with 8 threads from a Java test framework.

On the screen I got:

Message from syslogd@cfs25 at Jul  1 15:38:17 ...
 kernel:------------[ cut here ]------------

Message from syslogd@cfs25 at Jul  1 15:38:17 ...
 kernel:invalid opcode: 0000 [#1] SMP

Message from syslogd@cfs25 at Jul  1 15:38:17 ...
 kernel:last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0/host0/target0:1:2/0:1:2:0/block/sda/queue/scheduler


I don't see a crash dump file anywhere on the system.  I did capture the console screen image, which I can add as an attachment.
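
[Note: striped LVs of the sort described above are typically created with lvcreate's --stripes option; the sizes and the volume/group names below are hypothetical:

lvcreate --stripes 2 --stripesize 64 --size 200G --name lv_test vg_raid
]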

Comment 12 Lee Drengenberg 2011-07-01 21:58:00 UTC
Created attachment 510945 [details]
screen shot of console after crash

Comment 13 daryl herzmann 2011-07-20 12:40:00 UTC
Hi,

Anything further I can provide to help this bug out?  We continue to see crashes with the latest kernel, 2.6.32-131.6.1.el6.x86_64.

thank you!
  daryl

Comment 14 Prarit Bhargava 2011-08-08 17:15:52 UTC
Daryl and Lee, can you boot your systems with 'vga=791 debug' and remove 'quiet' from the kernel parameters?

Thanks,

P.
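
[Note: rather than editing grub.conf by hand, grubby can apply this change to every installed kernel in one step; a sketch:

grubby --update-kernel=ALL --args="vga=791 debug" --remove-args="quiet"
]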

Comment 15 daryl herzmann 2011-08-08 17:28:54 UTC
Hi Prarit,

Didn't comment #9 have what you just requested?

daryl

Comment 16 Prarit Bhargava 2011-08-08 17:34:30 UTC
(In reply to comment #15)
> Hi Prarit,
> 
> Didn't comment #9 have what you just requested?
> 
> daryl

Oh geez ... I didn't see that. :(  Looking now ...

P.

Comment 17 RHEL Program Management 2011-10-07 15:34:13 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected, as it is not proposed as an
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 18 daryl herzmann 2011-10-11 18:47:03 UTC
Had another system (Dell T410) with a similar crash this weekend:

PID: 31225  TASK: ffff8801e4a700c0  CPU: 9   COMMAND: "python"
 #0 [ffff880288cdb960] machine_kexec at ffffffff810310cb
 #1 [ffff880288cdb9c0] crash_kexec at ffffffff810b6392
 #2 [ffff880288cdba90] oops_end at ffffffff814de670
 #3 [ffff880288cdbac0] die at ffffffff8100f2eb
 #4 [ffff880288cdbaf0] do_trap at ffffffff814ddf64
 #5 [ffff880288cdbb50] do_invalid_op at ffffffff8100ceb5
 #6 [ffff880288cdbbf0] invalid_op at ffffffff8100bf5b
    [exception RIP: migration_entry_wait+385]
    RIP: ffffffff8115f231  RSP: ffff880288cdbca8  RFLAGS: 00010246
    RAX: ffffea0000000000  RBX: ffffea0003a93570  RCX: ffff88010cd93768
    RDX: 000000000010bc62  RSI: ffff8801cf1624b8  RDI: 000000002178c43e
    RBP: ffff880288cdbcc8   R8: ffff8801cf1624b8   R9: 0000000000000008
    R10: 0000000000000000  R11: 0000000000726740  R12: ffffea0003acf838
    R13: 000000010cd93768  R14: 000000010cd93067  R15: 00002adfd2eedff8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #7 [ffff880288cdbcd0] handle_pte_fault at ffffffff811381d8
 #8 [ffff880288cdbdb0] handle_mm_fault at ffffffff811383b8
 #9 [ffff880288cdbe00] __do_page_fault at ffffffff810414e9
#10 [ffff880288cdbf20] do_page_fault at ffffffff814e067e
#11 [ffff880288cdbf50] page_fault at ffffffff814dda05
    RIP: 0000000000434aa3  RSP: 00007fff09fc5d50  RFLAGS: 00010287
    RAX: 00002adfd2dfa010  RBX: 00002adfcdde6840  RCX: 000000000000000c
    RDX: 000000000072e3e0  RSI: 000000000001e7fd  RDI: 000000000164c710
    RBP: 00000000038a71f8   R8: 00002adfcd677134   R9: 0000000000000009
    R10: 0000000000000000  R11: 0000000000726740  R12: 00002adfcdde6840
    R13: 000000000164c710  R14: 000000000001e7fd  R15: 0000000000000020
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

kernel: 2.6.32-131.12.1

Comment 19 daryl herzmann 2011-11-01 12:52:31 UTC
Hi,  Just had another crash with the RHEL6.2 beta kernel installed:


PID: 24751  TASK: ffff8801d6b85500  CPU: 14  COMMAND: "python"
 #0 [ffff8801b58e31d0] machine_kexec at ffffffff81031e0b
 #1 [ffff8801b58e3230] crash_kexec at ffffffff810b8e92
 #2 [ffff8801b58e3300] oops_end at ffffffff814ef640
 #3 [ffff8801b58e3330] no_context at ffffffff810422db
 #4 [ffff8801b58e3380] __bad_area_nosemaphore at ffffffff81042565
 #5 [ffff8801b58e33d0] bad_area_nosemaphore at ffffffff81042633
 #6 [ffff8801b58e33e0] __do_page_fault at ffffffff81042ced
 #7 [ffff8801b58e3500] do_page_fault at ffffffff814f161e
 #8 [ffff8801b58e3530] page_fault at ffffffff814ee9d5
    [exception RIP: __br_deliver+97]
    RIP: ffffffffa037a7f1  RSP: ffff8801b58e35e0  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff8801fdacc7c0  RCX: ffffc90013c52290
    RDX: ffff880329aab49c  RSI: ffff8801aa9d0da0  RDI: ffff880329aab49c
    RBP: ffff8801b58e3600   R8: ffff880329aab49c   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000004  R12: 0000000000000000
    R13: ffff8801fdacc7f8  R14: ffff88032afee8ce  R15: ffff8803290fe4c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff8801b58e3608] br_deliver at ffffffffa037a8c5 [bridge]
#10 [ffff8801b58e3618] br_dev_xmit at ffffffffa03795b8 [bridge]
#11 [ffff8801b58e3648] dev_hard_start_xmit at ffffffff8142ba98
#12 [ffff8801b58e3698] dev_queue_xmit at ffffffff81430226
#13 [ffff8801b58e36e8] ip_finish_output at ffffffff8146730c
#14 [ffff8801b58e3728] ip_output at ffffffff81467598
#15 [ffff8801b58e3758] ip_local_out at ffffffff81466895
#16 [ffff8801b58e3778] ip_queue_xmit at ffffffff81466d70
#17 [ffff8801b58e3828] tcp_transmit_skb at ffffffff8147baae
#18 [ffff8801b58e3898] tcp_send_ack at ffffffff8147d3d9
#19 [ffff8801b58e38b8] tcp_cleanup_rbuf at ffffffff8146e106
#20 [ffff8801b58e38d8] tcp_read_sock at ffffffff814717e2
#21 [ffff8801b58e3938] xs_tcp_data_ready at ffffffffa0397022 [sunrpc]
#22 [ffff8801b58e3988] tcp_rcv_established at ffffffff81479df2
#23 [ffff8801b58e39e8] tcp_v4_do_rcv at ffffffff81481f93
#24 [ffff8801b58e3a88] tcp_v4_rcv at ffffffff81483781
#25 [ffff8801b58e3b08] ip_local_deliver_finish at ffffffff8146150d
#26 [ffff8801b58e3b38] ip_local_deliver at ffffffff81461798
#27 [ffff8801b58e3b68] ip_rcv_finish at ffffffff81460c5d
#28 [ffff8801b58e3ba8] ip_rcv at ffffffff814611e5
#29 [ffff8801b58e3be8] __netif_receive_skb at ffffffff8142b19b
#30 [ffff8801b58e3c48] netif_receive_skb at ffffffff8142d238
#31 [ffff8801b58e3c88] br_handle_frame_finish at ffffffffa037b6c8 [bridge]
#32 [ffff8801b58e3cd8] br_handle_frame at ffffffffa037b92a [bridge]
#33 [ffff8801b58e3d18] __netif_receive_skb at ffffffff8142b219
#34 [ffff8801b58e3d78] netif_receive_skb at ffffffff8142d238
#35 [ffff8801b58e3db8] napi_gro_complete at ffffffff8142d3e4
#36 [ffff8801b58e3de8] napi_gro_flush at ffffffff8142d85f
#37 [ffff8801b58e3e08] napi_complete at ffffffff8142d8a4
#38 [ffff8801b58e3e28] bnx2_poll_msix at ffffffffa01982f5 [bnx2]
#39 [ffff8801b58e3e68] net_rx_action at ffffffff8142fae3
#40 [ffff8801b58e3ec8] __do_softirq at ffffffff810720e1
#41 [ffff8801b58e3f38] call_softirq at ffffffff8100c20c
#42 [ffff8801b58e3f50] do_softirq at ffffffff8100de45
#43 [ffff8801b58e3f70] irq_exit at ffffffff81071ec5
#44 [ffff8801b58e3f80] do_IRQ at ffffffff814f3ea5
--- <IRQ stack> ---
#45 [ffff8801d9757f58] ret_from_intr at ffffffff8100ba13
    RIP: 00002b41f5746098  RSP: 00007fff5ac5d868  RFLAGS: 00000212
    RAX: 0000000000f1b560  RBX: 00000000019f63e0  RCX: 00000000000001b8
    RDX: 0000000000000000  RSI: 00000000000007d1  RDI: 0000000000000008
    RBP: ffffffff8100ba0e   R8: 000000000388ad70   R9: 0000000001c183c0
    R10: 0000000000000008  R11: 0000000000000fa0  R12: 00007fff5ac5dbc0
    R13: 00007fff5ac5de40  R14: 0000000000000002  R15: 0000000000000002
    ORIG_RAX: ffffffffffffff57  CS: 0033  SS: 002b

Comment 20 daryl herzmann 2012-02-10 17:00:26 UTC
Hi, this system crashes about once or twice a week under load.  Here's the most recent crash output from 2.6.32-220.4.1

BUG: unable to handle kernel paging request at 0000000000011200
IP: [<ffffffff810ef17a>] tracing_saved_cmdlines_read+0x4a/0x1d0
PGD 4243d0067 PUD 41d53d067 PMD 0 
Oops: 0002 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 7 
Modules linked in: nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ipmi_si mpt2sas scsi_transport_sas raid_class mptctl(U) mptbase(U) ipmi_devintf ipmi_msghandler dell_rbu nfsd autofs4 nfs fscache nfs_acl auth_rpcgss lockd sunrpc ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs power_meter ses enclosure sg bnx2 dcdbas microcode serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]

Pid: 0, comm: swapper Not tainted 2.6.32-220.4.1.el6.x86_64 #1 Dell Inc. PowerEdge R510/0DPRKF
RIP: 0010:[<ffffffff810ef17a>]  [<ffffffff810ef17a>] tracing_saved_cmdlines_read+0x4a/0x1d0
RSP: 0018:ffff880425c2fe30  EFLAGS: 00010086
RAX: 0000000000011200 RBX: ffff880028270f48 RCX: 0000000000000016
RDX: ffff880028260000 RSI: ffff880425c2fe90 RDI: ffff880028270f00
RBP: ffff880425c2fe38 R08: 0000000000000001 R09: 0000000000000001
R10: 0000e75241c898a9 R11: 0000000000000001 R12: ffff880028271000
R13: ffff880425c2fe90 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028260000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000011200 CR3: 0000000322962000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880425c2e000, task ffff88082626c100)
Stack:
 0000000000000086 ffff880425c2fe68 ffffffff81094e41 ffff880425c2fea8
<0> ffff880028271000 00000000ffffffff 0000000000000007 ffff880425c2fea8
<0> ffffffff81095c87 0000000000000000 000000000000000d ffff880028271000
Call Trace:
 [<ffffffff81094e41>] lock_hrtimer_base+0x31/0x60
 [<ffffffff81095c87>] hrtimer_try_to_cancel+0x27/0xd0
 [<ffffffff81095d52>] hrtimer_cancel+0x22/0x30
 [<ffffffff810a1027>] tick_nohz_restart_sched_tick+0x107/0x190
 [<ffffffff81009e39>] cpu_idle+0xe9/0x110
 [<ffffffff814e5ebc>] start_secondary+0x202/0x245
Code: 00 00 48 89 55 c0 48 89 4d b8 48 89 df e8 df fb 06 00 48 89 df 49 89 c7 e8 34 e5 06 00 44 8b 05 f5 72 ad 00 45 85 c0 0f 85 d7 00 <00> 00 4d 85 ff 48 c7 c0 f4 ff ff ff 0f 84 b8 00 00 00 4c 8b 25 
RIP  [<ffffffff810ef17a>] tracing_saved_cmdlines_read+0x4a/0x1d0
 RSP <ffff880425c2fe30>
CR2: 0000000000011200

Comment 21 Prarit Bhargava 2012-02-13 14:18:08 UTC
Hi Daryl, thanks for the full panic trace.

Are you using *any* sort of kernel tracing while you're running?

Thanks,

P.

Comment 22 daryl herzmann 2012-02-13 14:23:08 UTC
Prarit,  Thanks; sorry, I am ignorant as to what you are asking.  I would be happy to do whatever you suggest.

daryl

Comment 23 daryl herzmann 2012-04-11 13:59:57 UTC
Hi,  I have seen this crash with the latest kernel, 2.6.32-220.7.1, twice in the past day. :(

Comment 24 Prarit Bhargava 2012-04-12 12:55:05 UTC
Hi Daryl,

I'm turning my attention back to this.  A few things to do/try:

1.  Can you send me the output of sosreport on your system?

2.  Are you seeing any time-related failures on your system prior to the panic?

Thanks,

P.
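
[Note: sosreport ships in the 'sos' package on RHEL 6 and must run as root; a minimal sketch (the archive name is illustrative):

yum install sos
sosreport
# writes something like /tmp/sosreport-<hostname>-<date>.tar.xz when it finishes
]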

Comment 25 daryl herzmann 2012-04-18 15:35:58 UTC
Hi,

1. I have emailed you with my SOS report.
2. If those messages would show up in /var/log/messages, then no.

thanks!

Comment 28 daryl herzmann 2012-05-15 18:50:36 UTC
Hi Prarit,  Did you get my sosreport via email?

Comment 30 Suzanne Logcher 2012-05-18 20:48:17 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 31 daryl herzmann 2012-05-19 18:13:50 UTC
Suzanne,  We are a university and do not have access to that level of support.  Our only option is to open Bugzilla tickets and hope for the kind help of engineers.

Comment 35 daryl herzmann 2012-06-25 23:31:54 UTC
2.6.32-279.el6.x86_64 just crashed and auto-rebooted in a similar manner.

Comment 36 daryl herzmann 2012-07-11 12:07:30 UTC
I worked with Dell and ran their full hardware diagnostic tool.  No errors were found. :(

Comment 37 daryl herzmann 2012-08-10 11:54:12 UTC
Hi Prarit,  Did you get my sosreport via email?

Comment 38 daryl herzmann 2012-08-17 14:57:57 UTC
My current attempt to get a stable machine:

 * Fresh install of RHEL 6.3
 * Disabled hyperthreading  
 * Disabled CPU C-states

Uptime of 4 days without a crash/reboot, crosses fingers!
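
[Note: C-states can also be capped from the kernel command line when BIOS access is impractical; the parameters below are commonly used on RHEL 6 for this, though the effect is not identical to a BIOS disable.  Append to the kernel line in /boot/grub/grub.conf:

intel_idle.max_cstate=0 processor.max_cstate=1
]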

Comment 39 daryl herzmann 2012-08-31 19:09:47 UTC
At the risk of jinxing it, no kernel crashes in the past 18 days.  My suspicion is that disabling c-states did the trick.

Comment 40 Prarit Bhargava 2013-04-23 12:47:24 UTC
No updates in a while.  Please reopen if this is still a problem.

Comment 41 daryl herzmann 2013-04-23 12:51:44 UTC
For posterity, I believe this issue goes away when c-states are disabled in the BIOS.  I just had a RHEL 6.4 machine, which did not have c-states disabled, produce the trace below.  I disabled c-states for this machine and have not seen the crash since.

<4>------------[ cut here ]------------
<2>kernel BUG at include/linux/swapops.h:126!
<4>invalid opcode: 0000 [#1] SMP 
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 7 
<4>Modules linked in: iptable_filter ip_tables nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs vhost_net macvtap macvlan tun kvm_intel kvm raid456 async_raid6_recov async_pq power_meter raid6_pq async_xor dcdbas xor microcode serio_raw async_memcpy async_tx iTCO_wdt iTCO_vendor_support i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix wmi mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 4581, comm: ssh Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc. PowerEdge T410/0Y2G6P
<4>RIP: 0010:[<ffffffff8116c501>]  [<ffffffff8116c501>] migration_entry_wait+0x181/0x190
<4>RSP: 0000:ffff8801c1703c88  EFLAGS: 00010246
<4>RAX: ffffea0000000000 RBX: ffffea0003bf6f58 RCX: ffff880236437580
<4>RDX: 00000000001121fd RSI: ffff8801c040e5d8 RDI: 000000002243fa3e
<4>RBP: ffff8801c1703ca8 R08: ffff8801c040e5d8 R09: 0000000000000029
<4>R10: ffff8801d6850200 R11: 00002ad7d96cbf5a R12: ffffea0007bdec18
<4>R13: 0000000236437580 R14: 0000000236437067 R15: 00002ad7d76b0000
<4>FS:  00002ad7dace2880(0000) GS:ffff880028260000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00002ad7d76b0000 CR3: 00000001bb686000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process ssh (pid: 4581, threadinfo ffff8801c1702000, task ffff880261aa7500)
<4>Stack:
<4> ffff88024b5f22d8 0000000000000000 000000002243fa3e ffff8801c040e5d8
<4><d> ffff8801c1703d88 ffffffff811441b8 0000000000000000 ffff8801c1703d08
<4><d> ffff8801c1703eb8 ffff8801c1703dc8 ffff880328cb48c0 0000000000000040
<4>Call Trace:
<4> [<ffffffff811441b8>] handle_pte_fault+0xb48/0xb50
<4> [<ffffffff81437dbb>] ? sock_aio_write+0x19b/0x1c0
<4> [<ffffffff8112c6d4>] ? __pagevec_free+0x44/0x90
<4> [<ffffffff811443fa>] handle_mm_fault+0x23a/0x310
<4> [<ffffffff810474c9>] __do_page_fault+0x139/0x480
<4> [<ffffffff81194fb2>] ? vfs_ioctl+0x22/0xa0
<4> [<ffffffff811493a0>] ? unmap_region+0x110/0x130
<4> [<ffffffff81195154>] ? do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8151339e>] do_page_fault+0x3e/0xa0
<4> [<ffffffff81510755>] page_fault+0x25/0x30
<4>Code: e8 f5 2f fc ff e9 59 ff ff ff 48 8d 53 08 85 c9 0f 84 44 ff ff ff 8d 71 01 48 63 c1 48 63 f6 f0 0f b1 32 39 c1 74 be 89 c1 eb e3 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 
<1>RIP  [<ffffffff8116c501>] migration_entry_wait+0x181/0x190
<4> RSP <ffff8801c1703c88>
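
[Note: to confirm which C-states the kernel still sees after a BIOS change like this, the cpupower tool can report them; on RHEL 6 it ships in the cpupowerutils package, to the best of my knowledge.  A sketch:

yum install cpupowerutils
cpupower idle-info     # lists the C-states available to each CPU
]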

