Bug 198350

Summary: AMD64 crash invalid opcode: 0000 [1] SMP
Product: [Fedora] Fedora
Reporter: Rhys Compton <rhys.compton>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: pfrields, trevin, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: NeedsRetesting
Doc Type: Bug Fix
Last Closed: 2006-11-20 23:48:30 UTC

Description Rhys Compton 2006-07-11 10:10:00 UTC
Description of problem:
The kernel crashes periodically.

Version-Release number of selected component (if applicable):
2.6.16-1.2111_FC5 #1 SMP Thu May 4 21:16:04 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

  (although have experienced in a number of x86_64 kernels)



How reproducible:
Periodically; unfortunately not on demand.

Additional info:

The machine is a single-CPU AMD64 with 1GB of memory.  The system still responds
to sysrq but nothing else.  sysrq 'e' and 'i' have no effect, but 'b' does reboot.

The error stack trace is:


----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at include/linux/list.h:167
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
CPU 0
Modules linked in: ppp_deflate zlib_deflate iptable_filter ip_tables x_tables
netconsole netdump radeon drm lirc_serial(U) lirc_dev(U) nfsd exportfs lockd
nfs_acl autofs4 sunrpc nls_utf8 vfat fat dm_mirror dm_mod video button battery
ac ppp_async ppp_generic slhc crc_ccitt ipv6 ipaq snd_usb_audio snd_usb_lib
snd_rawmidi snd_hwdep usbserial lp parport_pc parport floppy nvram ohci_hcd
ehci_hcd bttv video_buf compat_ioctl32 i2c_algo_bit v4l2_common btcx_risc
ir_common tveeprom videodev snd_intel8x0 snd_ac97_codec snd_ac97_bus
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss i2c_nforce2 i2c_core snd_pcm forcedeth snd_timer snd soundcore
snd_page_alloc ext3 jbd sata_nv libata sd_mod scsi_mod
Pid: 3, comm: ksoftirqd/0 Tainted: GF     2.6.16-1.2111_FC5 #1
RIP: 0010:[<ffffffff80179ff8>] <ffffffff80179ff8>{free_block+107}
RSP: 0018:ffffffff80473e78  EFLAGS: 00210006
RAX: 0000000000000000 RBX: ffff810007103e10 RCX: 000000000000003f
RDX: ffff81003aaea000 RSI: ffff810007103000 RDI: ffff810037ed0d40
RBP: ffff810037ed5040 R08: ffff810037ed7000 R09: 000000000000000a
R10: ffff81001dfdaa70 R11: ffffffff80339af6 R12: ffff810037ed6dc0
R13: 0000000000000033 R14: 0000000000000000 R15: 000000000000003c
FS:  00002aaaad062380(0000) GS:ffffffff8050c000(0000) knlGS:00000000f7b416b0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000087dc30 CR3: 000000001f2a5000 CR4: 00000000000006e0
Process ksoftirqd/0 (pid: 3, threadinfo ffff810037eb4000, task ffff810037e917e0)
Stack: 0000000000000000 ffff810037ed0d40 000000000000003c ffff810037ed6c00
       0000000000000000 ffff810037ed5040 ffff810037ed0d90 ffffffff80179d5f
       0000000000200096 ffff810037ed6c00
Call Trace: <IRQ> <ffffffff80179d5f>{cache_flusharray+125}
       <ffffffff80179ec4>{kmem_cache_free+304}
<ffffffff80143879>{__rcu_process_callbacks+304}
       <ffffffff80143918>{rcu_process_callbacks+35}
<ffffffff80137b15>{tasklet_action+98}
       <ffffffff801378ad>{__do_softirq+85} <ffffffff801379f4>{ksoftirqd+0}
       <ffffffff8010bc02>{call_softirq+30} <EOI> <ffffffff8010cb68>{do_softirq+44}
       <ffffffff80137a53>{ksoftirqd+95} <ffffffff80145da2>{kthread+254}
       <ffffffff801379f4>{ksoftirqd+0} <ffffffff8010b8b2>{child_rip+8}
       <ffffffff801379f4>{ksoftirqd+0} <ffffffff80145ca4>{kthread+0}
       <ffffffff8010b8aa>{child_rip+0}

Code: 0f 0b 68 8a 39 36 80 c2 a7 00 48 8b 06 48 39 70 08 74 0a 0f
RIP <ffffffff80179ff8>{free_block+107} RSP <ffffffff80473e78>
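
Note on the trace above: a BUG in include/linux/list.h reached from free_block is most likely a list-consistency assertion tripping while a slab is unlinked from its list, which usually means the list pointers were overwritten earlier by something else. The following is only a minimal Python sketch of that kind of check, not the kernel's code. The names Node, list_add, checked_list_del and ListCorruption are made up for illustration.

```python
class ListCorruption(Exception):
    """Raised where the kernel would hit BUG()."""


class Node:
    """Illustrative stand-in for the kernel's struct list_head."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list head points at itself
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head` (no checking)."""
    nxt = head.next
    nxt.prev = new
    new.next = nxt
    new.prev = head
    head.next = new


def checked_list_del(entry):
    """Unlink `entry`, refusing to touch a list whose neighbours disagree."""
    if entry.prev.next is not entry:
        raise ListCorruption("prev->next does not point back at entry")
    if entry.next.prev is not entry:
        raise ListCorruption("next->prev does not point back at entry")
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry.prev = None  # poison, so reuse fails loudly
```

If a stray write changes a neighbour's prev or next pointer, the next checked_list_del on the adjacent entry raises instead of silently propagating the damage. That is why the crash shows up in the slab allocator rather than in whatever code caused the corruption.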




/proc/cpuinfo (for processor details)
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 47
model name      : AMD Athlon(tm) 64 Processor 3200+
stepping        : 2
cpu MHz         : 1000.000
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
pni lahf_lm
bogomips        : 2202.65
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

Comment 1 Dave Jones 2006-07-29 06:44:51 UTC
Please try and reproduce this on the latest errata kernels which should give
extra diagnostics when this BUG() is hit.

You also seem to have insmod -f'd something.  (I'm assuming those lirc drivers?)
It may also be worth a shot to see if it still happens without those.
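
For reference, the "insmod -f" inference comes from the taint letters on the "Pid:" line of the oops ("Tainted: GF"). Below is a small Python sketch that pulls those letters out of an oops line and explains them. The function name decode_taint and the table are illustrative, and only the three letters that appear in this bug are covered; other letters are reported as unknown.

```python
import re

# Only the flags that appear in this report; other letters exist.
TAINT_FLAGS = {
    "G": "all loaded modules have GPL-compatible licenses",
    "P": "a proprietary module has been loaded",
    "F": "a module was force-loaded (e.g. insmod -f)",
}


def decode_taint(line):
    """Return [(letter, meaning), ...] for the 'Tainted:' field of an oops line."""
    match = re.search(r"Tainted: ([A-Z ]+)", line)
    if match is None:
        return []  # e.g. "Not tainted" or a line without the field
    letters = [ch for ch in match.group(1) if ch != " "]
    return [(ch, TAINT_FLAGS.get(ch, "unknown flag")) for ch in letters]


if __name__ == "__main__":
    oops = "Pid: 3, comm: ksoftirqd/0 Tainted: GF     2.6.16-1.2111_FC5 #1"
    for letter, meaning in decode_taint(oops):
        print(letter, "-", meaning)
```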



Comment 2 Dave Jones 2006-10-16 18:59:36 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 3 Trevin Beattie 2006-11-15 05:06:01 UTC
I've been seeing the same problem on my FC6 system, running kernel
2.6.18-1.2798.fc6.  My computer is a dual-Opteron 270HE, so luckily these kernel
crashes don't completely bring down the system, but I get Gnome panel applets
randomly crashing to the point where I eventually need to reboot.  (Don't know
if that's actually related, but it could be.)

I thought that the problem might be caused by one of the tainted 3rd-party
modules, but I've just seen the crash occur in a VMware guest which didn't have
any extra kernel modules installed.  Unfortunately that vm only has a single CPU
allocated, so it's locked up hard (though I'm surprised it was able to spit out
"fc6-x64 kernel: invalid opcode: 0000 [1] SMP" in a gnome-terminal window).

This is the most recent entry from my (host) system log:

Nov 12 20:01:07 hydra kernel: list_add corruption. prev->next should be ffff81014fd594a8, but was ffff81014f5e7480
Nov 12 20:01:07 hydra kernel: ----------- [cut here ] --------- [please bite here ] ---------
Nov 12 20:01:07 hydra kernel: Kernel BUG at lib/list_debug.c:31
Nov 12 20:01:07 hydra kernel: invalid opcode: 0000 [1] SMP
Nov 12 20:01:07 hydra kernel: last sysfs file: /class/net/vmnet1/statistics/collisions
Nov 12 20:01:07 hydra kernel: CPU 1
Nov 12 20:01:07 hydra kernel: Modules linked in: fglrx(U) nfsd exportfs lockd nfs_acl autofs4 smsc47b397 hwmon eeprom i2c_isa hidp rfcomm l2cap bluetooth vmnet(U) vmmon(U) sunrpc ipv6 ip_conntrack_netbios_ns xt_tcpudp xt_state ip_conntrack nfnetlink ipt_LOG ipt_REJECT iptable_filter ip_tables x_tables cpufreq_ondemand dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 k8_edac snd_rawmidi snd_ac97_codec snd_ac97_bus ide_cd forcedeth snd_seq_dummy edac_mc shpchp floppy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss i2c_nforce2 i2c_core snd_pcm sg ohci_hcd ehci_hcd snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore cdrom serio_raw emu10k1_gp gameport pcspkr ext3 jbd sata_nv libata sd_mod scsi_mod
Nov 12 20:01:07 hydra kernel: Pid: 2550, comm: pcscd Tainted: P      2.6.18-1.2798.fc6 #1
Nov 12 20:01:07 hydra kernel: RIP: 0010:[<ffffffff8033eb5c>]  [<ffffffff8033eb5c>] __list_add+0x48/0x68
Nov 12 20:01:07 hydra kernel: RSP: 0018:ffff810076345dd8  EFLAGS: 00010286
Nov 12 20:01:07 hydra kernel: RAX: 0000000000000058 RBX: ffff81014fd594a8 RCX: ffffffff80556eb8
Nov 12 20:01:07 hydra kernel: RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff80556ea0
Nov 12 20:01:07 hydra kernel: RBP: ffff81014f5e7480 R08: ffffffff80556eb8 R09: 0000000000007015
Nov 12 20:01:07 hydra kernel: R10: ffff810076345a78 R11: 0000000000000080 R12: ffff810071c81e80
Nov 12 20:01:07 hydra kernel: R13: ffff81007f261570 R14: ffff810065d6ad40 R15: ffff81000375f6d0
Nov 12 20:01:07 hydra kernel: FS:  0000000040a00940(0063) GS:ffff8100036571c0(0000) knlGS:00000000f79a76d0
Nov 12 20:01:07 hydra kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 12 20:01:07 hydra kernel: CR2: 00002aaaaaac6000 CR3: 0000000075c78000 CR4: 00000000000006e0
Nov 12 20:01:07 hydra kernel: Process pcscd (pid: 2550, threadinfo ffff810076344000, task ffff81000379d810)
Nov 12 20:01:07 hydra kernel: Stack:  0000000000000000 ffff810071c81e80 0000000000000081 ffffffff803dabed
Nov 12 20:01:07 hydra kernel:  0000000000000000 ffffffff8058c520 ffff81007f261570 ffff810065d6ad40
Nov 12 20:01:07 hydra kernel:  ffff810003668880 ffffffff80247e54 ffff810065d6ad40 ffff81014ffae128
Nov 12 20:01:07 hydra kernel: Call Trace:
Nov 12 20:01:07 hydra kernel:  [<ffffffff803dabed>] usbdev_open+0x1db/0x206
Nov 12 20:01:07 hydra kernel:  [<ffffffff80247e54>] chrdev_open+0x149/0x198
Nov 12 20:01:07 hydra kernel:  [<ffffffff8021e29f>] __dentry_open+0xd9/0x1e2
Nov 12 20:01:07 hydra kernel:  [<ffffffff8022744c>] do_filp_open+0x2a/0x38
Nov 12 20:01:07 hydra kernel:  [<ffffffff80219499>] do_sys_open+0x44/0xbe
Nov 12 20:01:07 hydra kernel:  [<ffffffff8025c00e>] system_call+0x7e/0x83
Nov 12 20:01:07 hydra kernel: DWARF2 unwinder stuck at system_call+0x7e/0x83
Nov 12 20:01:07 hydra kernel: Leftover inexact backtrace:
Nov 12 20:01:07 hydra kernel: 
Nov 12 20:01:07 hydra kernel: 
Nov 12 20:01:07 hydra kernel: Code: 0f 0b 68 d7 f9 48 80 c2 1f 00 4c 89 63 08 49 89 1c 24 4c 89
Nov 12 20:01:07 hydra kernel: RIP  [<ffffffff8033eb5c>] __list_add+0x48/0x68
Nov 12 20:01:07 hydra kernel:  RSP <ffff810076345dd8>
Nov 13 01:12:37 hydra named[2285]: unexpected RCODE (SERVFAIL) resolving 'static.technorati.com/AAAA/IN': 208.66.64.37#53
Nov 13 03:55:08 hydra kernel:  BUG: warning at kernel/cpu.c:56/unlock_cpu_hotplug() (Tainted: P     )
Nov 13 03:55:08 hydra kernel: 
Nov 13 03:55:08 hydra kernel: Call Trace:
Nov 13 03:55:08 hydra kernel:  [<ffffffff8026929b>] show_trace+0x34/0x47
Nov 13 03:55:08 hydra kernel:  [<ffffffff802692c0>] dump_stack+0x12/0x17
Nov 13 03:55:08 hydra kernel:  [<ffffffff802a044f>] unlock_cpu_hotplug+0x47/0x74
Nov 13 03:55:08 hydra kernel:  [<ffffffff884082aa>] :cpufreq_ondemand:do_dbs_timer+0x11c/0x174
Nov 13 03:55:08 hydra kernel:  [<ffffffff8024bf5f>] run_workqueue+0x9a/0xed
Nov 13 03:55:08 hydra kernel:  [<ffffffff8024893a>] worker_thread+0xf0/0x122
Nov 13 03:55:08 hydra kernel:  [<ffffffff80232843>] kthread+0xf6/0x12a
Nov 13 03:55:08 hydra kernel:  [<ffffffff8025cea5>] child_rip+0xa/0x11
Nov 13 03:55:08 hydra kernel: DWARF2 unwinder stuck at child_rip+0xa/0x11
Nov 13 03:55:08 hydra kernel: Leftover inexact backtrace:
Nov 13 03:55:08 hydra kernel:  [<ffffffff8029b98a>] keventd_create_kthread+0x0/0x66
Nov 13 03:55:08 hydra kernel:  [<ffffffff8023274d>] kthread+0x0/0x12a
Nov 13 03:55:08 hydra kernel:  [<ffffffff8025ce9b>] child_rip+0x0/0x11
Nov 13 03:55:08 hydra kernel: 
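
For anyone reading the "list_add corruption" line in the 20:01:07 entry: it reports that, just before inserting a new element, the element that should precede it did not point at the element that should follow it. Here is a rough Python sketch of such a pre-insert check, modelled only on the wording of the log message. It is not the code in lib/list_debug.c, and Node, checked_list_add, list_add_tail and ListCorruption are hypothetical names.

```python
class ListCorruption(Exception):
    """Raised where the kernel would print the message and BUG()."""


class Node:
    """Illustrative stand-in for the kernel's struct list_head."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list head points at itself
        self.prev = self


def checked_list_add(new, prev, nxt):
    """Insert `new` between `prev` and `nxt` after verifying they are adjacent."""
    if nxt.prev is not prev:
        raise ListCorruption(
            "list_add corruption. next->prev should be %s, but was %s"
            % (prev.name, nxt.prev.name))
    if prev.next is not nxt:
        raise ListCorruption(
            "list_add corruption. prev->next should be %s, but was %s"
            % (nxt.name, prev.next.name))
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new


def list_add_tail(new, head):
    """Append `new` at the end of the list anchored at `head`."""
    checked_list_add(new, head.prev, head)
```

In this sketch, appending to a list whose last element has had its next pointer overwritten raises the "prev->next should be ..." error. The check only detects the damage at insert time; it says nothing about which code did the overwriting.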

Does the 2.6.18-1.2849.fc6 kernel address this issue?  If so, I'll give it a shot.


Comment 4 Dave Jones 2006-11-20 23:48:30 UTC
That binary module you loaded could be doing *anything*, and I've definitely seen
kernel memory corruption from it before.   It also bears no resemblance to the
original bug reported here.

Closing due to inactivity from original reporter.