Bug 1254310

Summary: SLUB allocator breaks fuse_direct_IO (SLAB works)
Product: Fedora
Reporter: Jillian Morgan <penguin.wrangler>
Component: kernel
Assignee: Miklos Szeredi <mszeredi>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 22
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, maik, mchehab, mszeredi
Hardware: x86_64   
OS: Linux   
Whiteboard: fuse gluster ovirt
Doc Type: Bug Fix
Last Closed: 2016-05-08 01:55:06 UTC
Type: Bug
Attachments:
proposed patch #1 (flags: none)
proposed patch #2 (flags: none)

Description Jillian Morgan 2015-08-17 16:41:10 UTC
Description of problem:

After upgrading a node from F20 to F21, the node crashes when accessing a glusterfs volume.
The remaining F20 nodes have no problem accessing the volume.


Aug 16 20:24:25 bagel kernel: [ 1810.077267] ------------[ cut here ]------------
Aug 16 20:24:25 bagel kernel: [ 1810.081945] kernel BUG at mm/slub.c:3413!
Aug 16 20:24:25 bagel kernel: [ 1810.085998] invalid opcode: 0000 [#1] SMP
Aug 16 20:24:25 bagel kernel: [ 1810.090177] Modules linked in: vhost_net vhost macvtap macvlan ebt_arp ebtable_nat fuse nfsv3 nfs_acl nfs lockd grace sunrpc fscache ebtable_filter ebtables ip6table_filter ip6_tables softdog scsi_transport_iscsi xt_physdev br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack vfat fat coretemp kvm_intel kvm bcache iTCO_wdt crct10dif_pclmul ipmi_devintf crc32_pclmul iTCO_vendor_support gpio_ich igb crc32c_intel ptp pps_core lpc_ich ghash_clmulni_intel i2c_i801 mfd_core ipmi_si dca ipmi_msghandler i2c_ismt tpm_tis shpchp tpm acpi_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm 8021q garp mrp tun bridge stp llc bonding
Aug 16 20:24:25 bagel kernel: [ 1810.149526] CPU: 1 PID: 4794 Comm: qemu-system-x86 Not tainted 4.1.4-100.fc21.x86_64 #1
Aug 16 20:24:25 bagel kernel: [ 1810.157603] Hardware name: Supermicro A1SRM-2758F/A1SRM-2758F, BIOS 1.2 02/16/2015
Aug 16 20:24:25 bagel kernel: [ 1810.165246] task: ffff88085a1313c0 ti: ffff8803b09b4000 task.ti: ffff8803b09b4000
Aug 16 20:24:25 bagel kernel: [ 1810.172800] RIP: 0010:[<ffffffff81208532>]  [<ffffffff81208532>] kfree+0x152/0x160
Aug 16 20:24:25 bagel kernel: [ 1810.180467] RSP: 0018:ffff8803b09b7c98  EFLAGS: 00010246
Aug 16 20:24:25 bagel kernel: [ 1810.185833] RAX: 005ffff80000002c RBX: ffff8802008b9960 RCX: dead000000200200
Aug 16 20:24:25 bagel kernel: [ 1810.193032] RDX: 000077ff80000000 RSI: ffff88085a1313c0 RDI: ffff8802008b9960
Aug 16 20:24:25 bagel kernel: [ 1810.200231] RBP: ffff8803b09b7cb8 R08: ffff8803b09b7c80 R09: ffffea0008022e40
Aug 16 20:24:25 bagel kernel: [ 1810.207431] R10: 0000000000002fe4 R11: 0000000000000000 R12: 0000000149928000
Aug 16 20:24:25 bagel kernel: [ 1810.214629] R13: ffffffffa02e5c8c R14: ffff8803b09b7e50 R15: ffff8801009b5600
Aug 16 20:24:25 bagel kernel: [ 1810.221829] FS:  00007f35609ff700(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
Aug 16 20:24:25 bagel kernel: [ 1810.229992] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 16 20:24:25 bagel kernel: [ 1810.235799] CR2: 00007fbf24022a98 CR3: 0000000100a81000 CR4: 00000000001027e0
Aug 16 20:24:25 bagel kernel: [ 1810.243001] Stack:
Aug 16 20:24:25 bagel kernel: [ 1810.245037]  ffff8802008b9960 ffff8802008b9960 0000000149928000 ffff8803b09b7da8
Aug 16 20:24:25 bagel kernel: [ 1810.252590]  ffff8803b09b7d48 ffffffffa02e5c8c 0000000000004800 ffff8806eea842c0
Aug 16 20:24:25 bagel kernel: [ 1810.260145]  0000000000004800 00000001f4000000 000000014992c800 0000000000000000
Aug 16 20:24:25 bagel kernel: [ 1810.267699] Call Trace:
Aug 16 20:24:25 bagel kernel: [ 1810.270189]  [<ffffffffa02e5c8c>] fuse_direct_IO+0x20c/0x340 [fuse]
Aug 16 20:24:25 bagel kernel: [ 1810.276525]  [<ffffffff811ac2fa>] generic_file_read_iter+0x4ca/0x600
Aug 16 20:24:25 bagel kernel: [ 1810.282941]  [<ffffffffa02e22ac>] fuse_file_read_iter+0x4c/0x70 [fuse]
Aug 16 20:24:25 bagel kernel: [ 1810.289531]  [<ffffffff81227e1e>] __vfs_read+0xce/0x100
Aug 16 20:24:25 bagel kernel: [ 1810.294810]  [<ffffffff8122849a>] vfs_read+0x8a/0x140
Aug 16 20:24:25 bagel kernel: [ 1810.299910]  [<ffffffff812295c2>] SyS_pread64+0x92/0xc0
Aug 16 20:24:25 bagel kernel: [ 1810.305186]  [<ffffffff8179a76e>] system_call_fastpath+0x12/0x71
Aug 16 20:24:25 bagel kernel: [ 1810.311253] Code: 00 4d 8b 49 30 e9 35 ff ff ff 0f 1f 80 00 00 00 00 4c 89 d1 48 89 da 4c 89 ce e8 ca f9 ff ff e9 73 ff ff ff 0f 1f 44 00 00 0f 0b <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 89
Aug 16 20:24:25 bagel kernel: [ 1810.336949] RIP  [<ffffffff81208532>] kfree+0x152/0x160
Aug 16 20:24:25 bagel kernel: [ 1810.344889]  RSP <ffff8803b09b7c98>
Aug 16 20:24:25 bagel kernel: [ 1810.360802] ---[ end trace 76f7ea1ab5ea1b36 ]---


Version-Release number of selected component (if applicable):

kernel-4.1.4-100.fc21.x86_64
glusterfs-fuse-3.5.5-2.fc21.x86_64


How reproducible:

Every time. Whenever I started a VM whose disk lived on the gluster volume, the crash happened immediately. The node became mostly unresponsive and required a hard reset.


Steps to Reproduce:
1. Start with a glusterfs distributed-replicated volume across 3 F20 nodes.
2. Upgrade one node from F20 to F21.
3. Attempt to run a VM on the new F21 node (accessing a disk image on the gluster volume).
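The call traces run from pread64 through generic_file_read_iter into fuse_direct_IO, which for a FUSE file is the O_DIRECT read path; qemu opens disk images with O_DIRECT when configured with cache=none (assumed here for the oVirt VMs). As a hypothetical sketch of that I/O pattern without oVirt or a VM, the Python below issues page-aligned O_DIRECT reads against one file. It uses preadv rather than pread because os.pread allocates its own buffer, which is not guaranteed to be aligned; both syscalls reach fuse_file_read_iter. The function name, the 4096-byte block size and the example path are assumptions, and this has not been confirmed to reproduce the crash on its own, so run it only on a disposable test node.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment: O_DIRECT wants offset, length and buffer aligned


def direct_pread_loop(path: str, count: int = 16, direct: bool = True) -> int:
    """Read up to `count` consecutive BLOCK-sized chunks of `path` with preadv.

    With direct=True the file is opened with O_DIRECT, so on a FUSE mount each
    read should take the kernel's direct-I/O path seen in the call trace.
    Returns the total number of bytes read.
    """
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT buffer alignment.
    buf = mmap.mmap(-1, BLOCK)
    total = 0
    try:
        for i in range(count):
            n = os.preadv(fd, [buf], i * BLOCK)  # positional read at an aligned offset
            if n == 0:  # end of file
                break
            total += n
    finally:
        buf.close()
        os.close(fd)
    return total
```

For example, direct_pread_loop("/mnt/gluster/vm-disk.img") against a hypothetical glusterfs FUSE mount; passing direct=False runs the same loop through the page cache instead.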

Actual results:
Accessing files on the gluster volume crashes the node.


Expected results:
No crash.

Additional info:

Comment 1 Fedora End Of Life 2015-11-04 10:12:01 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 21 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Jillian Morgan 2015-11-08 10:45:19 UTC
Confirming that this still occurs on fresh Fedora 22 install with:

kernel-4.2.5-201.fc22.x86_64
glusterfs-fuse-3.7.5-1.fc22.x86_64
fuse-2.9.4-3.fc22.x86_64
fuse-libs-2.9.4-3.fc22.x86_64

Comment 3 Jillian Morgan 2015-11-11 02:20:09 UTC
Found a workaround (not a fix):

I recompiled kernel-4.2.5-201.fc22.x86_64 to use the older SLAB allocator instead of the default SLUB allocator. Problem avoided. No more crash when using glusterfs (fuse).
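SLAB and SLUB are alternatives in the same Kconfig allocator choice, so this workaround means rebuilding with CONFIG_SLAB=y and CONFIG_SLUB unset. A minimal sketch, assuming a plain kernel .config rather than the Fedora rpmbuild configuration the reporter actually used (the helper name is hypothetical): it flips only those two symbols, after which make olddefconfig should be run so Kconfig can drop options that depend on SLUB, such as CONFIG_SLUB_DEBUG. The kernel tree's own scripts/config --disable SLUB --enable SLAB does the same job. SLAB has since been removed from mainline (in 6.8), so this only applies to kernels of that era.

```python
import re


def use_slab_allocator(config_text: str) -> str:
    """Return .config text with SLUB deselected and SLAB selected.

    Only the two choice symbols are rewritten; dependent options such as
    CONFIG_SLUB_DEBUG are left for `make olddefconfig` to resolve.
    """
    out = []
    seen_slab = False
    for line in config_text.splitlines():
        # fullmatch keeps CONFIG_SLUB_DEBUG, CONFIG_SLAB_FREELIST_* etc. untouched
        if re.fullmatch(r"(# )?CONFIG_SLUB(=y| is not set)", line):
            out.append("# CONFIG_SLUB is not set")
        elif re.fullmatch(r"(# )?CONFIG_SLAB(=y| is not set)", line):
            out.append("CONFIG_SLAB=y")
            seen_slab = True
        else:
            out.append(line)
    if not seen_slab:
        out.append("CONFIG_SLAB=y")
    return "\n".join(out) + "\n"
```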

Now.. what the -bleep- is wrong with SLUB?

While using SLAB is a workaround (at least it seems to be working so far; knock on wood), I am uncertain what performance impact it will have on my virtualization cluster. :-(

Since there is no runtime or boot-time way to switch between allocators, I now have to compile my own customized kernel from here on out. Not a big deal, but a nuisance, and I have to take extra care never to boot a distro-built kernel by mistake and have everything come crashing down.
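A sketch of such a guard, assuming the distribution installs each kernel's build configuration as /boot/config-<release> (Fedora kernel packages do): it reports which allocator the running kernel was built with, so a startup check on each node could, hypothetically, refuse to launch VMs on a SLUB kernel. The helper names are illustrative, not part of any existing tool.

```python
import os
import platform


def slab_allocator(config_text: str) -> str:
    """Return 'SLAB', 'SLUB' or 'SLOB' from kernel .config text, else 'unknown'."""
    lines = config_text.splitlines()
    for name in ("SLAB", "SLUB", "SLOB"):
        if f"CONFIG_{name}=y" in lines:  # exact line match, so CONFIG_SLUB_DEBUG=y is ignored
            return name
    return "unknown"


def running_kernel_allocator() -> str:
    """Check /boot/config-<uname -r> for the running kernel (path is an assumption)."""
    path = f"/boot/config-{platform.release()}"
    if not os.path.exists(path):
        return "unknown"
    with open(path) as f:
        return slab_allocator(f.read())
```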

Comment 4 Jillian Morgan 2016-02-13 01:20:55 UTC
I am trying to get some traction on this bug, open for 6 months with no responses.

I have attempted to remove some variables from the equation to see what factors are potentially contributing to this kernel BUG.

First test:

I have replicated the issue on a host that does NOT run glusterfsd and therefore only consumes a VM image from a separate server, eliminating any potential conflict from having both the glusterfs server and client on the same node.

Also, the original hosts used when this bug was first reported were Supermicro Avoton (Atom C2750/C2758) systems. This new reproduction of the fault is on an older Dell PowerEdge 2950 (Xeon E54xx), so the specific hardware does not seem to be a factor in the bug.

Reproduction steps:

- Fresh install of Fedora Server 22, minimal package set, with online updates.
- Install the oVirt release package: rpm -Uvh http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
- Add this node as a new host via oVirt WebAdmin.
- Start a VM on this new node, using a disk image that resides on a glusterfs storage domain.
- Boom!


kernel-4.3.4-200.fc22.x86_64
glusterfs-fuse-3.7.6-1.fc22.x86_64
fuse-2.9.4-3.fc22.x86_64
fuse-libs-2.9.4-3.fc22.x86_64

[  316.458148] ------------[ cut here ]------------
[  316.459052] kernel BUG at mm/slub.c:3517!
[  316.459052] invalid opcode: 0000 [#1] SMP 
[  316.459052] Modules linked in: vhost_net vhost macvtap macvlan ebt_arp ebtable_nat tun nfsv3 nfs fscache fuse ebtable_filter ebtables ip6table_filter ip6_tables scsi_transport_iscsi xt_physdev br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack dm_service_time nf_conntrack coretemp kvm_intel iTCO_wdt ipmi_ssif iTCO_vendor_support gpio_ich kvm ipmi_devintf dcdbas bnx2 lpc_ich ipmi_si i5000_edac edac_core ipmi_msghandler i5k_amb shpchp fjes acpi_cpufreq tpm_tis tpm nfsd 8021q auth_rpcgss garp mrp bridge nfs_acl lockd stp grace llc sunrpc bonding dm_multipath amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper ttm drm ata_generic serio_raw pata_acpi megaraid_sas
[  316.515263] CPU: 2 PID: 3055 Comm: qemu-system-x86 Not tainted 4.3.4-200.fc22.x86_64 #1
[  316.515263] Hardware name: Dell Inc. PowerEdge 2950/0M332H, BIOS 2.7.0 10/30/2010
[  316.515263] task: ffff88041cbbb980 ti: ffff880418e94000 task.ti: ffff880418e94000
[  316.515263] RIP: 0010:[<ffffffff81203edc>]  [<ffffffff81203edc>] kfree+0x12c/0x130
[  316.515263] RSP: 0018:ffff880418e97cc8  EFLAGS: 00010246
[  316.515263] RAX: 003ffff800000000 RBX: ffff88002a43fea0 RCX: dead000000000200
[  316.515263] RDX: 000077ff80000000 RSI: ffff88041cbbb980 RDI: ffff88002a43fea0
[  316.515263] RBP: ffff880418e97ce0 R08: ffff880418e97ca8 R09: ffffea0000a90fc0
[  316.515263] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000006e30400
[  316.515263] R13: ffffffffa054c60e R14: ffff88042b32e400 R15: ffff880418e97dc8
[  316.515263] FS:  00007f1c4e3ff700(0000) GS:ffff88043fc80000(0000) knlGS:0000000000000000
[  316.515263] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  316.515263] CR2: 0000000000000000 CR3: 0000000418c96000 CR4: 00000000000026e0
[  316.515263] Stack:
[  316.515263]  ffff88002a43fea0 0000000006e30400 ffff880418e97e60 ffff880418e97d68
[  316.515263]  ffffffffa054c60e 0000000000007c00 ffff88041c944c00 0000000280000000
[  316.515263]  0000000000007c00 0000000006e38000 0000000000000000 0000000000000000
[  316.515263] Call Trace:
[  316.515263]  [<ffffffffa054c60e>] fuse_direct_IO+0x1ee/0x310 [fuse]
[  316.515263]  [<ffffffff811a791b>] generic_file_read_iter+0x47b/0x5c0
[  316.515263]  [<ffffffffa054910c>] fuse_file_read_iter+0x4c/0x70 [fuse]
[  316.515263]  [<ffffffff81223346>] __vfs_read+0xc6/0x100
[  316.515263]  [<ffffffff81223d73>] vfs_read+0x83/0x130
[  316.515263]  [<ffffffff81224c85>] SyS_pread64+0x95/0xb0
[  316.515263]  [<ffffffff8178182e>] entry_SYSCALL_64_fastpath+0x12/0x71
[  316.515263] Code: 2a 49 8b 01 31 f6 f6 c4 40 74 04 41 8b 71 68 4c 89 cf e8 58 a2 fa ff eb a0 4c 89 d1 48 89 da 4c 89 ce e8 78 fa ff ff eb 90 0f 0b <0f> 0b 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 56 41 55 41 54 
[  316.515263] RIP  [<ffffffff81203edc>] kfree+0x12c/0x130
[  316.515263]  RSP <ffff880418e97cc8>
[  316.904456] ---[ end trace a63508bc8d44e7be ]---
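Both oopses in this report (4.1.4 on the Supermicro node, 4.3.4 on the Dell) die in kfree() called from fuse_direct_IO(); only the addresses and offsets differ between kernel builds. Each Call Trace line has the form [<address>] symbol+offset/size [module]. A small sketch (the function name is hypothetical) that strips addresses and offsets so two oopses can be compared by code path alone:

```python
import re
from typing import List

# One Call Trace frame, e.g. " [<ffffffffa02e5c8c>] fuse_direct_IO+0x20c/0x340 [fuse]":
# address, symbol, +offset/size within the symbol, optional module name.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def call_path(oops: str) -> List[str]:
    """Return the Call Trace of an oops as 'module:symbol' or 'symbol' entries."""
    frames: List[str] = []
    in_trace = False
    for line in oops.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME.search(line)
        if m is None:
            break  # the first non-frame line (usually "Code:") ends the trace
        symbol, module = m.group(1), m.group(2)
        frames.append(f"{module}:{symbol}" if module else symbol)
    return frames
```

Applied to the two traces here, the frames agree from fuse:fuse_direct_IO down to SyS_pread64 and differ only in the final syscall entry stub (system_call_fastpath on 4.1, entry_SYSCALL_64_fastpath on 4.3), consistent with a single code path on both machines.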


Second test:

Same as above, but with a fresh install of CentOS 7.2 (1511).
Result: No bug triggered. The VM runs just fine.

kernel-3.10.0-327.4.5.el7.x86_64
glusterfs-fuse-3.7.6-1.el7.x86_64
fuse-2.9.2-6.el7.x86_64
fuse-libs-2.9.2-6.el7.x86_64

Comment 5 Miklos Szeredi 2016-03-16 13:57:08 UTC
Created attachment 1137049 [details]
proposed patch #1

Comment 6 Miklos Szeredi 2016-03-16 13:58:38 UTC
Created attachment 1137050 [details]
proposed patch #2

Could you please test with these two patches?

Comment 7 Jillian Morgan 2016-03-16 14:19:11 UTC
Miklos,

Those patches look promising. I will endeavour to test them ASAP. If not today, then by the end of the week.

In the interest of not introducing any additional variables into the tests at this point, I will switch my current in-production kernel (kernel-4.2.5-201.fc22.x86_64 recompiled to use SLAB) back to the default/broken SLUB-based allocator, with your two patches applied and test that.

Whether that works or not, I will then apply the patches against the latest kernel-4.4.4-200.fc22 and test that as well.

Thank you for your work on this. I am very pleased to see this bug finally get some attention.

Comment 8 Jillian Morgan 2016-03-16 17:16:32 UTC
Miklos,

Yahoo! The above two patches have allowed me to return to the SLUB allocator without fuse crashing. VMs started up with no problem, just as they should. This is with kernel 4.2.5.

With only one patched test node running VMs, and only for a few minutes so far, I am tentatively calling this one a success. I will try the patches on 4.4.4 shortly, but I expect that to work as well.

What are the odds that the Fedora kernel team will incorporate these patches without waiting for it to hit mainline/stable upstream first?

Comment 9 Jillian Morgan 2016-05-08 01:55:06 UTC
Confirmed fixed on all nodes of my production cluster with the FUSE patches included in kernel-4.4.8-200.fc22.x86_64.