Bug 1330878 - qemu-kvm over glusterfs over ext4 kills the host (OOPs within mm/slab.c)
Summary: qemu-kvm over glusterfs over ext4 kills the host (OOPs within mm/slab.c)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 23
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-27 08:27 UTC by Gilboa Davara
Modified: 2016-05-03 05:58 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-03 05:58:37 UTC
Type: Bug


Description Gilboa Davara 2016-04-27 08:27:32 UTC
Description of problem:
We run a cluster of Fedora 23 VM hosts using shared glusterfs storage.
Installing a new Fedora 23 guest (every time) or starting an existing F23 guest (rarely) kills the host with a kernel Oops.
We've managed to trigger the same Oops on 3 different members of this cluster, all with different hardware configurations (dual Xeon E5-26XXV3 / 64GB, dual Xeon E5-26XXV2 / 96GB, quad Xeon E5-46XXV2 / 256GB).

Version-Release number of selected component (if applicable):
Kernel 4.4.6-301.fc23.x86_64 (hosts).

How reproducible:
Always.

Steps to Reproduce:
1. Start the installation of a VM guest on glusterfs-based shared storage.
2. Partition the guest drives.
3. The host Oopses.


Additional info:

OOPs trace:
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: kernel BUG at mm/slub.c:3627!
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: invalid opcode: 0000 [#2] SMP 
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: Modules linked in: tun fuse bridge stp llc bonding cfg80211 rfkill nf_log_ipv4 nf_log_common xt_LOG xt_conntrack ip6table_security ip6table_raw ip6table_mangle ip6table_filter ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support joydev hpwdt ipmi_ssif hpilo sb_edac edac_core shpchp ioatdma ipmi_si ipmi_msghandler lpc_ich i2c_i801 tpm_tis pcc_cpufreq tpm wmi acpi_power_meter acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc mgag200 ixgbe drm_kms_helper ttm drm mdio vxlan igb ip6_udp_tunnel
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: udp_tunnel hpsa ptp pps_core dca scsi_transport_sas i2c_algo_bit fjes
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: CPU: 4 PID: 9086 Comm: rpm Tainted: G      D         4.4.6-301.fc23.x86_64 #1
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: Hardware name: HP ProLiant DL160 Gen9, BIOS U20 07/20/2015
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: task: ffff88096a7e8000 ti: ffff880d27688000 task.ti: ffff880d27688000
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: RIP: 0010:[<ffffffff8120ddb3>]  [<ffffffff8120ddb3>] kfree+0x153/0x160
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: RSP: 0018:ffff880d2768bdf8  EFLAGS: 00010246
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: RAX: ffffea00013e92a0 RBX: ffff88004fa4bd80 RCX: ffffea00216883c0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: RDX: 0000000000000000 RSI: 000000000001a260 RDI: ffffea00013e92c0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: RBP: ffff880d2768be10 R08: 0000000000000001 R09: 00000001802a0029
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: R10: ffffea00013e92c0 R11: ffff880f230e8510 R12: 0000000000000000
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: R13: ffffffff812b55db R14: ffff880c4dcc5840 R15: ffff88084fa4b960
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: FS:  00007ff9059d4700(0000) GS:ffff88107fc40000(0000) knlGS:0000000000000000
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: CR2: 0000564796250828 CR3: 0000000b95c30000 CR4: 00000000001426e0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: Stack:
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: ffff88004fa4bd80 0000000000000000 0000000000000000 ffff880d2768be48
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: ffffffff812b55db ffff880c4dcc5840 0000000040000010 ffff880b8535c4c0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: ffff881059928020 ffff880b852f8f00 ffff880d2768be60 ffffffff812b561e
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: Call Trace:
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff812b55db>] free_rb_tree_fname+0x4b/0x70
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff812b561e>] ext4_release_dir+0x1e/0x30
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff8122fc3c>] __fput+0xdc/0x1e0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff8122fd7e>] ____fput+0xe/0x10
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff810c0b13>] task_work_run+0x73/0x90
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff81003242>] exit_to_usermode_loop+0xc2/0xd0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff81003d31>] syscall_return_slowpath+0xa1/0xb0
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: [<ffffffff817a070c>] int_ret_from_sys_call+0x25/0x8f
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: Code: d7 e8 42 74 fa ff eb 8c 41 b8 01 00 00 00 48 89 d9 48 89 da 4c 89 d6 e8 6c fc ff ff e9 73 ff ff ff 0f 0b 49 8b 42 20 a8 01 75 c5 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 
Apr 26 16:39:10 office-wx-hv1-dl160 kernel: RIP  [<ffffffff8120ddb3>] kfree+0x153/0x160

qemu command line:
/usr/bin/qemu-system-x86_64
 -machine accel=kvm
 -serial telnet::9010,server,nowait
 -net nic,vlan=1,macaddr=XXX,model=rtl8139
 -net tap,vlan=1,ifname=tap1000
 -net nic,vlan=2,macaddr=XXX,model=virtio
 -net tap,vlan=2,ifname=tap1001
 -net nic,vlan=3,macaddr=XXX,model=virtio
 -net tap,vlan=3,ifname=tap1002
 -net nic,vlan=4,macaddr=XXX,model=virtio
 -net tap,vlan=4,ifname=tap1003
 -net nic,vlan=5,macaddr=XXX,model=virtio
 -net tap,vlan=5,ifname=tap1004
 -net nic,vlan=6,macaddr=XX,model=virtio
 -net tap,vlan=6,ifname=tap1005
 -m 32768
 -name office-wx-vmprobe
 -drive file=/usr/drives/kvm/gv2/office-wx-vmprobe/office-wx-vmprobe_1.img,if=scsi,cache=none,format=raw
 -drive file=/usr/drives/kvm/gv2/office-wx-vmprobe/office-wx-vmprobe_2.img,if=scsi,cache=none,format=raw
 -smp 8,cores=8
 -usb -usbdevice tablet
 -vga qxl
 -spice port=5910,disable-ticketing
 -device virtio-serial
 -chardev spicevmc,id=vdagent,debug=0,name=vdagent
 -device virtserialport,chardev=vdagent,name=com.redhat.spice.0
 -boot c

Comment 1 Gilboa Davara 2016-04-27 08:42:05 UTC
We plan to move the glusterfs backing FS on one of the machines to xfs, to see whether this is a bug within ext4 (as opposed to KVM).

Comment 2 Gilboa Davara 2016-04-28 11:39:39 UTC
Switching to xfs doesn't change anything.

*Both* xfs and ext4 are crashing now.


Apr 28 14:23:29 office-wx-hv1-dl160 kernel: IP: [<ffffffff8120e540>] __kmalloc+0xa0/0x260
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: PGD 2051067 PUD 107ffff067 PMD 0 
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: Oops: 0000 [#1] SMP 
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: Modules linked in: tun fuse bridge stp llc bonding cfg80211 rfkill nf_log_ipv4 nf_log_common xt_LOG xt_conntrack ip6table_security ip6table_raw ip6table_mangle ip6table_filter ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xfs libcrc32c vfat fat intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm iTCO_wdt irqbypass crct10dif_pclmul iTCO_vendor_support crc32_pclmul crc32c_intel hpwdt hpilo joydev sb_edac lpc_ich edac_core wmi shpchp i2c_i801 ipmi_ssif tpm_tis tpm ipmi_si ioatdma ipmi_msghandler acpi_power_meter pcc_cpufreq acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc sunrpc ixgbe mgag200 drm_kms_helper ttm mdio vxlan
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: drm ip6_udp_tunnel igb udp_tunnel hpsa ptp pps_core dca scsi_transport_sas i2c_algo_bit fjes
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: CPU: 10 PID: 8621 Comm: glusterfsd Not tainted 4.4.6-301.fc23.x86_64 #1
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: Hardware name: HP ProLiant DL160 Gen9, BIOS U20 07/20/2015
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: task: ffff88104d0c3c00 ti: ffff880902dfc000 task.ti: ffff880902dfc000
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RIP: 0010:[<ffffffff8120e540>]  [<ffffffff8120e540>] __kmalloc+0xa0/0x260
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RSP: 0018:ffff880902dffce0  EFLAGS: 00010246
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RAX: ffff88007d781600 RBX: 0000000002400240 RCX: 000000000000007b
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RDX: 0000000000004e50 RSI: 0000000000000000 RDI: 000000000001a260
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RBP: ffff880902dffd18 R08: ffff88107fd1a260 R09: ffff88085f8037c0
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: R10: ffff88007d781600 R11: ffff881032ce8ef0 R12: 0000000002400240
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: R13: 0000000000000050 R14: ffffffffa0486581 R15: ffff88085f8037c0
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: FS:  00007f6fbb4fa700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: CR2: ffff88007d781600 CR3: 00000002dd28f000 CR4: 00000000001426e0
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: Stack:
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: ffff88105c3d0f00 ffff880902dffd3c 0000000000000000 0000000002400240
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: 0000000000000050 0000000000000000 0000000051eb851f ffff880902dffd68
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: ffffffffa0486581 0000000000000000 ffff88104d0c4278 ffff88104d0c3c00
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: Call Trace:
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffffa0486581>] kmem_alloc+0x81/0x120 [xfs]
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffffa04626fa>] xfs_attr_shortform_list+0x8a/0x3c0 [xfs]
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffffa0478bdf>] ? xfs_ilock+0xff/0x130 [xfs]
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffffa046348a>] xfs_attr_list_int+0xca/0xd0 [xfs]
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffffa0486390>] xfs_vn_listxattr+0x90/0x1c0 [xfs]
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffffa04860e0>] ? xfs_xattr_get+0x80/0x80 [xfs]
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffff81252a03>] vfs_listxattr+0x43/0x70
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffff81252a87>] listxattr+0x57/0x130
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffff81252bbe>] path_listxattr+0x5e/0xb0
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffff81253790>] SyS_llistxattr+0x10/0x20
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: [<ffffffff817a05ae>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: Code: 49 83 78 10 00 4d 8b 10 0f 84 3f 01 00 00 4d 85 d2 0f 84 36 01 00 00 49 63 41 20 49 8b 39 4c 01 d0 40 f6 c7 0f 0f 85 a2 01 00 00 <48> 8b 18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RIP  [<ffffffff8120e540>] __kmalloc+0xa0/0x260
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: RSP <ffff880902dffce0>
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: CR2: ffff88007d781600
Apr 28 14:23:29 office-wx-hv1-dl160 kernel: ---[ end trace e355c15b87a26b01 ]---
Apr 28 14:23:30 office-wx-hv1-dl160 abrt-dump-journal-oops: abrt-dump-journal-oops: Found oopses: 1
Apr 28 14:23:30 office-wx-hv1-dl160 abrt-dump-journal-oops: abrt-dump-journal-oops: Creating problem directories
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: BUG: unable to handle kernel paging request at ffff88007d781600
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: IP: [<ffffffff8120e540>] __kmalloc+0xa0/0x260
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: PGD 2051067 PUD 107ffff067 PMD 0 
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Oops: 0000 [#2] SMP 
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Modules linked in: tun fuse bridge stp llc bonding cfg80211 rfkill nf_log_ipv4 nf_log_common xt_LOG xt_conntrack ip6table_security ip6table_raw ip6table_mangle ip6table_filter ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xfs libcrc32c vfat fat intel_rapl iosf_mbi x86_pkg_temp_thermal coretemp kvm_intel kvm iTCO_wdt irqbypass crct10dif_pclmul iTCO_vendor_support crc32_pclmul crc32c_intel hpwdt hpilo joydev sb_edac lpc_ich edac_core wmi shpchp i2c_i801 ipmi_ssif tpm_tis tpm ipmi_si ioatdma ipmi_msghandler acpi_power_meter pcc_cpufreq acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace binfmt_misc sunrpc ixgbe mgag200 drm_kms_helper ttm mdio vxlan
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: drm ip6_udp_tunnel igb udp_tunnel hpsa ptp pps_core dca scsi_transport_sas i2c_algo_bit fjes
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: CPU: 10 PID: 1175 Comm: abrt-dump-journ Tainted: G      D         4.4.6-301.fc23.x86_64 #1
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Hardware name: HP ProLiant DL160 Gen9, BIOS U20 07/20/2015
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: task: ffff881054a38000 ti: ffff88105ac18000 task.ti: ffff88105ac18000
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RIP: 0010:[<ffffffff8120e540>]  [<ffffffff8120e540>] __kmalloc+0xa0/0x260
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RSP: 0018:ffff88105ac1b920  EFLAGS: 00010246
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RAX: ffff88007d781600 RBX: 0000000002408040 RCX: 0000000000000000
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RDX: 0000000000004e50 RSI: 0000000000000000 RDI: 000000000001a260
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RBP: ffff88105ac1b958 R08: ffff88107fd1a260 R09: ffff88085f8037c0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: R10: ffff88007d781600 R11: 0000000000000000 R12: 0000000002408040
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: R13: 0000000000000060 R14: ffffffff812e7ed9 R15: ffff88085f8037c0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: FS:  00007f4c8e587880(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: CR2: ffff88007d781600 CR3: 000000105498d000 CR4: 00000000001426e0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Stack:
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: 0000000000000246 ffff88107ffece00 0000000000000000 0000000000000000
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: 0000000000000000 0000000000000000 ffff881057620400 ffff88105ac1b9b8
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: ffffffff812e7ed9 0000000000000000 000000005ac1bab8 0000000000000000
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Call Trace:
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812e7ed9>] ext4_find_extent+0x1b9/0x320
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812ec748>] ext4_ext_map_blocks+0x88/0xea0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812bcdeb>] ext4_map_blocks+0x9b/0x4a0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff81263f3a>] ? __find_get_block+0x10a/0x110
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812bd8d1>] ext4_getblk+0x51/0x190
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff81350af5>] ? type_attribute_bounds_av+0x65/0x2e0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812c7e52>] ext4_find_entry+0x382/0x720
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff81247ee3>] ? find_inode_fast+0x53/0x90
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8120ce87>] ? kmem_cache_alloc+0x197/0x200
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812465a5>] ? __d_alloc+0x25/0x170
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff812c8724>] ext4_lookup+0x64/0x170
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123791d>] lookup_real+0x1d/0x60
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff81238e92>] __lookup_hash+0x42/0x60
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123aa16>] walk_component+0x226/0x300
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123947b>] ? path_init+0x1eb/0x380
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123b96d>] path_lookupat+0x5d/0x110
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123d5a1>] filename_lookup+0xb1/0x180
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff81210d38>] ? __kmalloc_track_caller+0x1b8/0x250
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8120ce87>] ? kmem_cache_alloc+0x197/0x200
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123d1a6>] ? getname_flags+0x56/0x1f0
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8133e57b>] ? selinux_cred_prepare+0x1b/0x30
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8123d746>] user_path_at_empty+0x36/0x40
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff8122c584>] SyS_access+0xb4/0x230
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: [<ffffffff817a05ae>] entry_SYSCALL_64_fastpath+0x12/0x71
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Code: 49 83 78 10 00 4d 8b 10 0f 84 3f 01 00 00 4d 85 d2 0f 84 36 01 00 00 49 63 41 20 49 8b 39 4c 01 d0 40 f6 c7 0f 0f 85 a2 01 00 00 <48> 8b 18 48 8d 4a 01 4c 89 d0 65 48 0f c7 0f 0f 94 c0 84 c0 74 
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RIP  [<ffffffff8120e540>] __kmalloc+0xa0/0x260
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: RSP <ffff88105ac1b920>
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: CR2: ffff88007d781600
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: ---[ end trace e355c15b87a26b02 ]---
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: BUG: unable to handle kernel paging request at ffff88007d781600
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: IP: [<ffffffff8120cf73>] kmem_cache_alloc_trace+0x83/0x210
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: PGD 2051067 PUD 107ffff067 PMD 0 
Apr 28 14:23:30 office-wx-hv1-dl160 kernel: Oops: 0000 [#3] SMP
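To compare the oopses above at a glance, the logs can be reduced to one row per oops. The following is a minimal sketch, not an established tool: the function name `summarize_oopses` is made up for illustration, and the regular expressions only assume the line shapes visible in the traces in this report (the `Comm:`, `RIP: 0010:[<...>]` and `CR2: ... CR3:` lines).

```python
import re

# Line shapes assumed from the traces quoted in this report.
COMM_RE = re.compile(r"Comm: (\S+)")
RIP_RE = re.compile(r"RIP: \w+:\[<[0-9a-f]+>\]\s+\[<[0-9a-f]+>\]\s+(\S+?)\+0x")
CR2_RE = re.compile(r"CR2: ([0-9a-f]{16}) CR3:")


def summarize_oopses(lines):
    """Return one (process, faulting symbol, CR2) tuple per oops register dump."""
    results = []
    comm = symbol = None
    for line in lines:
        m = COMM_RE.search(line)
        if m:
            comm = m.group(1)
        m = RIP_RE.search(line)
        if m:
            symbol = m.group(1)
        m = CR2_RE.search(line)
        if m:
            # The "CR2: ... CR3: ..." line closes the register dump of one oops.
            results.append((comm, symbol, m.group(1)))
            comm = symbol = None
    return results


if __name__ == "__main__":
    import sys
    for row in summarize_oopses(sys.stdin):
        print(*row)
```

Fed the log from this comment (for example piped in from a saved journal excerpt), this lists `glusterfsd` and `abrt-dump-journ` both faulting in `__kmalloc` on the same address, ffff88007d781600. Unrelated processes tripping over the same address inside the allocator looks more like shared slab corruption than a fault in either filesystem, though that reading is an inference from the log rather than something the sketch proves.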

Comment 3 Laura Abbott 2016-04-28 17:26:24 UTC
Can you try booting with slub_debug=PFZU on the kernel command line? This will help rule out some classes of errors.
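After rebooting, it is worth confirming that the option actually reached the kernel. The sketch below is only an illustration: the helper names are hypothetical, and the flag meanings are taken from the kernel's SLUB documentation (P = poisoning, F = sanity checks, Z = red zoning, U = alloc/free user tracking). It parses a kernel command line such as the contents of /proc/cmdline.

```python
# Flag letters as described in the kernel's SLUB documentation.
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking (alloc/free call sites)",
}


def slub_debug_flags(cmdline):
    """Return the set of slub_debug flag letters found in a kernel command line.

    A bare "slub_debug" (no "=") is not handled here and yields an empty set.
    """
    for token in cmdline.split():
        if token.startswith("slub_debug="):
            value = token.split("=", 1)[1]
            # Anything after a comma restricts the flags to named caches.
            return set(value.split(",", 1)[0].upper())
    return set()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    for flag in sorted(slub_debug_flags(read_cmdline())):
        print(flag, SLUB_DEBUG_FLAGS.get(flag, "other"))
```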

Comment 4 Gilboa Davara 2016-05-01 06:27:30 UTC
We'll try.
We lost the gluster volume late last night after 3/3 nodes went kaboom, one after another.

We'll create a new gluster volume and report back.

Comment 5 Gilboa Davara 2016-05-02 13:59:37 UTC
With slub_debug=PFZU the host doesn't crash, but we're seeing a lot of "Poison overwritten" messages.
E.g.

May  2 15:49:37 office-wx-hv1-dl160 kernel: =============================================================================
May  2 15:49:37 office-wx-hv1-dl160 kernel: BUG kmalloc-96 (Not tainted): Poison overwritten
May  2 15:49:37 office-wx-hv1-dl160 kernel: -----------------------------------------------------------------------------
May  2 15:49:37 office-wx-hv1-dl160 kernel: Disabling lock debugging due to kernel taint
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: 0xffff88104ef2b854-0xffff88104ef2b854. First byte 0x0 instead of 0x6b
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Allocated in fuse_direct_IO+0xfe/0x310 [fuse] age=2 cpu=3 pid=4470
May  2 15:49:37 office-wx-hv1-dl160 kernel:     ___slab_alloc+0x486/0x4e0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __slab_alloc+0x20/0x40
May  2 15:49:37 office-wx-hv1-dl160 kernel:     kmem_cache_alloc_trace+0x1b2/0x210
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     generic_file_read_iter+0x47b/0x5c0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_file_read_iter+0x4c/0x70 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __vfs_read+0xc6/0x100
May  2 15:49:37 office-wx-hv1-dl160 kernel:     vfs_read+0x86/0x130
May  2 15:49:37 office-wx-hv1-dl160 kernel:     SyS_pread64+0x95/0xb0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     entry_SYSCALL_64_fastpath+0x12/0x71
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Freed in fuse_direct_IO+0x1ee/0x310 [fuse] age=1 cpu=3 pid=4470
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __slab_free+0x195/0x250
May  2 15:49:37 office-wx-hv1-dl160 kernel:     kfree+0x144/0x160
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_direct_IO+0x1ee/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     generic_file_read_iter+0x47b/0x5c0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_file_read_iter+0x4c/0x70 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __vfs_read+0xc6/0x100
May  2 15:49:37 office-wx-hv1-dl160 kernel:     vfs_read+0x86/0x130
May  2 15:49:37 office-wx-hv1-dl160 kernel:     SyS_pread64+0x95/0xb0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     entry_SYSCALL_64_fastpath+0x12/0x71
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Slab 0xffffea00413bca00 objects=38 used=38 fp=0x          (null) flags=0xdffff800004080
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Object 0xffff88104ef2b850 @offset=14416 fp=0xffff88104ef2b9f8
May  2 15:49:37 office-wx-hv1-dl160 kernel: Bytes b4 ffff88104ef2b840: b9 bb 07 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff88104ef2b850: 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkk.kkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff88104ef2b860: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff88104ef2b870: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff88104ef2b880: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff88104ef2b890: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff88104ef2b8a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
May  2 15:49:37 office-wx-hv1-dl160 kernel: Redzone ffff88104ef2b8b0: bb bb bb bb bb bb bb bb                          ........
May  2 15:49:37 office-wx-hv1-dl160 kernel: Padding ffff88104ef2b9f0: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
May  2 15:49:37 office-wx-hv1-dl160 kernel: CPU: 3 PID: 4470 Comm: qemu-system-x86 Tainted: G    B           4.4.6-301.fc23.x86_64 #1
May  2 15:49:37 office-wx-hv1-dl160 kernel: Hardware name: HP ProLiant DL160 Gen9, BIOS U20 07/20/2015
May  2 15:49:37 office-wx-hv1-dl160 kernel: 0000000000000086 00000000f2422ffc ffff88084030fab8 ffffffff813b542e
May  2 15:49:37 office-wx-hv1-dl160 kernel: ffff88085f804780 ffff88104ef2b850 ffff88084030faf8 ffffffff8120a0f5
May  2 15:49:37 office-wx-hv1-dl160 kernel: 0000000000000008 ffff881000000001 ffff88104ef2b855 ffff88085f804780
May  2 15:49:37 office-wx-hv1-dl160 kernel: Call Trace:
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff813b542e>] dump_stack+0x63/0x85
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120a0f5>] print_trailer+0x145/0x1f0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120a265>] check_bytes_and_report+0xc5/0x110
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120a474>] check_object+0x1c4/0x240
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] ? fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120ade4>] alloc_debug_processing+0x104/0x180
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120cc56>] ___slab_alloc+0x486/0x4e0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] ? fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120e16e>] ? kmem_cache_free+0x1de/0x1f0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa0579a10>] ? fuse_request_free+0x40/0x50 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] ? fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120ccd0>] __slab_alloc+0x20/0x40
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120d0a2>] kmem_cache_alloc_trace+0x1b2/0x210
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff811aeadb>] generic_file_read_iter+0x47b/0x5c0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05800cc>] fuse_file_read_iter+0x4c/0x70 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8122d746>] __vfs_read+0xc6/0x100
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8122e176>] vfs_read+0x86/0x130
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8122f085>] SyS_pread64+0x95/0xb0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff817a05ae>] entry_SYSCALL_64_fastpath+0x12/0x71
May  2 15:49:37 office-wx-hv1-dl160 kernel: FIX kmalloc-96: Restoring 0xffff88104ef2b854-0xffff88104ef2b854=0x6b
May  2 15:49:37 office-wx-hv1-dl160 kernel: FIX kmalloc-96: Marking all objects used
May  2 15:49:37 office-wx-hv1-dl160 kernel: =============================================================================
May  2 15:49:37 office-wx-hv1-dl160 kernel: BUG kmalloc-96 (Tainted: G    B          ): Poison overwritten
May  2 15:49:37 office-wx-hv1-dl160 kernel: -----------------------------------------------------------------------------
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: 0xffff8810508793e4-0xffff8810508793e4. First byte 0x0 instead of 0x6b
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Allocated in fuse_direct_IO+0xfe/0x310 [fuse] age=2 cpu=5 pid=4470
May  2 15:49:37 office-wx-hv1-dl160 kernel:     ___slab_alloc+0x486/0x4e0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __slab_alloc+0x20/0x40
May  2 15:49:37 office-wx-hv1-dl160 kernel:     kmem_cache_alloc_trace+0x1b2/0x210
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     generic_file_read_iter+0x47b/0x5c0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_file_read_iter+0x4c/0x70 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __vfs_read+0xc6/0x100
May  2 15:49:37 office-wx-hv1-dl160 kernel:     vfs_read+0x86/0x130
May  2 15:49:37 office-wx-hv1-dl160 kernel:     SyS_pread64+0x95/0xb0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     entry_SYSCALL_64_fastpath+0x12/0x71
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Freed in fuse_direct_IO+0x1ee/0x310 [fuse] age=1 cpu=5 pid=4470
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __slab_free+0x195/0x250
May  2 15:49:37 office-wx-hv1-dl160 kernel:     kfree+0x144/0x160
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_direct_IO+0x1ee/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     generic_file_read_iter+0x47b/0x5c0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     fuse_file_read_iter+0x4c/0x70 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel:     __vfs_read+0xc6/0x100
May  2 15:49:37 office-wx-hv1-dl160 kernel:     vfs_read+0x86/0x130
May  2 15:49:37 office-wx-hv1-dl160 kernel:     SyS_pread64+0x95/0xb0
May  2 15:49:37 office-wx-hv1-dl160 kernel:     entry_SYSCALL_64_fastpath+0x12/0x71
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Slab 0xffffea0041421e00 objects=38 used=38 fp=0x          (null) flags=0xdffff800004080
May  2 15:49:37 office-wx-hv1-dl160 kernel: INFO: Object 0xffff8810508793e0 @offset=5088 fp=0xffff88105087a2c8
May  2 15:49:37 office-wx-hv1-dl160 kernel: Bytes b4 ffff8810508793d0: e7 a2 07 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff8810508793e0: 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkk.kkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff8810508793f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff881050879400: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff881050879410: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff881050879420: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
May  2 15:49:37 office-wx-hv1-dl160 kernel: Object ffff881050879430: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
May  2 15:49:37 office-wx-hv1-dl160 kernel: Redzone ffff881050879440: bb bb bb bb bb bb bb bb                          ........
May  2 15:49:37 office-wx-hv1-dl160 kernel: Padding ffff881050879580: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
May  2 15:49:37 office-wx-hv1-dl160 kernel: CPU: 5 PID: 4470 Comm: qemu-system-x86 Tainted: G    B           4.4.6-301.fc23.x86_64 #1
May  2 15:49:37 office-wx-hv1-dl160 kernel: Hardware name: HP ProLiant DL160 Gen9, BIOS U20 07/20/2015
May  2 15:49:37 office-wx-hv1-dl160 kernel: 0000000000000086 00000000f2422ffc ffff88084030fab8 ffffffff813b542e
May  2 15:49:37 office-wx-hv1-dl160 kernel: ffff88085f804780 ffff8810508793e0 ffff88084030faf8 ffffffff8120a0f5
May  2 15:49:37 office-wx-hv1-dl160 kernel: 0000000000000008 ffff881000000001 ffff8810508793e5 ffff88085f804780
May  2 15:49:37 office-wx-hv1-dl160 kernel: Call Trace:
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff813b542e>] dump_stack+0x63/0x85
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120a0f5>] print_trailer+0x145/0x1f0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120a265>] check_bytes_and_report+0xc5/0x110
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120a474>] check_object+0x1c4/0x240
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] ? fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120ade4>] alloc_debug_processing+0x104/0x180
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120cc56>] ___slab_alloc+0x486/0x4e0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] ? fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120e16e>] ? kmem_cache_free+0x1de/0x1f0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa0579a10>] ? fuse_request_free+0x40/0x50 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] ? fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120ccd0>] __slab_alloc+0x20/0x40
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8120d0a2>] kmem_cache_alloc_trace+0x1b2/0x210
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05834de>] fuse_direct_IO+0xfe/0x310 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff811aeadb>] generic_file_read_iter+0x47b/0x5c0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffffa05800cc>] fuse_file_read_iter+0x4c/0x70 [fuse]
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8122d746>] __vfs_read+0xc6/0x100
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8122e176>] vfs_read+0x86/0x130
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff8122f085>] SyS_pread64+0x95/0xb0
May  2 15:49:37 office-wx-hv1-dl160 kernel: [<ffffffff817a05ae>] entry_SYSCALL_64_fastpath+0x12/0x71
May  2 15:49:37 office-wx-hv1-dl160 kernel: FIX kmalloc-96: Restoring 0xffff8810508793e4-0xffff8810508793e4=0x6b
May  2 15:49:37 office-wx-hv1-dl160 kernel: FIX kmalloc-96: Marking all objects used
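With many reports like these, it helps to check whether the stray write always lands at the same place inside the object. The sketch below (hypothetical helper name `corrupted_offsets`; the regular expressions assume only the `INFO: 0x...-0x.... First byte` and `INFO: Object 0x... @offset=` line shapes shown above) pairs each corrupted range with its object base address and reports the offset.

```python
import re

# Line shapes assumed from the SLUB reports quoted in this comment.
RANGE_RE = re.compile(
    r"INFO: 0x([0-9a-f]+)-0x([0-9a-f]+)\. First byte (0x[0-9a-f]+) instead of 0x[0-9a-f]+"
)
OBJECT_RE = re.compile(r"INFO: Object 0x([0-9a-f]+) @offset=")


def corrupted_offsets(lines):
    """Return (offset_in_object, length_in_bytes, first_bad_byte) per report."""
    results = []
    pending = None
    for line in lines:
        m = RANGE_RE.search(line)
        if m:
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            pending = (start, end - start + 1, int(m.group(3), 16))
            continue
        m = OBJECT_RE.search(line)
        if m and pending is not None:
            base = int(m.group(1), 16)
            results.append((pending[0] - base, pending[1], pending[2]))
            pending = None
    return results


if __name__ == "__main__":
    import sys
    for offset, length, byte in corrupted_offsets(sys.stdin):
        print(f"offset={offset} length={length} byte={byte:#04x}")
```

For the two reports above this gives offset 4, length 1, byte 0x00 both times: a single zero byte written to the same offset of a freed kmalloc-96 object that was allocated and freed in fuse_direct_IO. A fixed offset like this is what one would expect from a field of a freed structure being cleared after the free, although the log alone does not identify the writer.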

Comment 6 Laura Abbott 2016-05-02 17:10:00 UTC
That's definitely a problem. Several FUSE fixes came in with 4.4.7/4.4.8; the most promising is "fuse: do not use iocb after it may have been freed". Please test on 4.4.8 to see if the problem is still present.
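For a cluster with several hosts, a quick check of which nodes still run a kernel older than 4.4.8 can be scripted. This is a rough sketch with hypothetical helper names; it compares only the leading `major.minor.patch` of the release string, so it is meaningful only within the 4.4.y stable series discussed here (other branches picked up the fixes at different versions).

```python
import platform
import re


def kernel_version(release):
    """Parse the leading 'major.minor.patch' of a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def has_fuse_fixes(release, fixed=(4, 4, 8)):
    """True if `release` is at or past the version suggested in this comment."""
    return kernel_version(release) >= fixed


if __name__ == "__main__":
    release = platform.release()  # e.g. "4.4.6-301.fc23.x86_64" on the affected hosts
    status = "ok" if has_fuse_fixes(release) else "older than 4.4.8"
    print(release, status)
```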

Comment 7 Gilboa Davara 2016-05-03 05:58:13 UTC
Seems to be solved w/ 4.4.8.

Many, many thanks for the help.

- Gilboa

