Bug 1144128

Summary: FUSE: Scheduling while atomic OOPSes when using inval_entry
Product: Red Hat Enterprise Linux 6
Component: kernel
kernel sub component: Other
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Version: 6.7
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Reporter: Richard Sharpe <realrichardsharpe>
Assignee: Brian Foster <bfoster>
QA Contact: Zorro Lang <zlang>
CC: bfoster, cmaiolin, eguan, esandeen, jharriga, realrichardsharpe, rwheeler, swhiteho
Fixed In Version: kernel-2.6.32-556.el6
Doc Type: Bug Fix
Last Closed: 2015-07-22 08:18:03 UTC
Type: Bug
Bug Blocks: 1155771, 1164931
Attachments:
The patch to avoid scheduling while atomic in fuse.ko (flags: none)

Description Richard Sharpe 2014-09-18 18:14:49 UTC
Created attachment 939006
The patch to avoid scheduling while atomic in fuse.ko

Description of problem:

I am seeing this OOPS reasonably frequently after I added a call to
fuse_lowlevel_notify_inval_entry:

Jun 23 11:53:24 localhost kernel: BUG: scheduling while atomic: fuse.hf/13976/0x10000001
Jun 23 11:53:24 localhost kernel: Modules linked in: nls_utf8 fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 8021q garp stp llc vboxpci(U) vboxnetadp(U) vboxnetflt(U) vboxdrv(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw r8169 mii i2c_i801 sg lpc_ich mfd_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci xhci_hcd wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 23 11:53:24 localhost kernel: Pid: 13976, comm: fuse.hf Not tainted 2.6.32-431.20.3.el6.x86_64 #1
Jun 23 11:53:24 localhost kernel: Call Trace:
Jun 23 11:53:24 localhost kernel: [<ffffffff8105da66>] ? __schedule_bug+0x66/0x70
Jun 23 11:53:24 localhost kernel: [<ffffffff81528e30>] ? thread_return+0x640/0x760
Jun 23 11:53:24 localhost kernel: [<ffffffff810695fa>] ? __cond_resched+0x2a/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff81529220>] ? _cond_resched+0x30/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff8116f405>] ? kmem_cache_alloc_trace+0xe5/0x1b0
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d095e>] ? fuse_notify+0x9e/0x2a0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d1580>] ? fuse_dev_write+0x0/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d0593>] ? fuse_copy_one+0x53/0x70 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d16ea>] ? fuse_dev_write+0x16a/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffff8105a323>] ? enqueue_pushable_task+0x73/0x90
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d1580>] ? fuse_dev_write+0x0/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffff811889bb>] ? do_sync_readv_writev+0xfb/0x140
Jun 23 11:53:24 localhost kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff810b0f10>] ? do_futex+0x100/0xb50
Jun 23 11:53:24 localhost kernel: [<ffffffff81226c06>] ? security_file_permission+0x16/0x20
Jun 23 11:53:24 localhost kernel: [<ffffffff81189946>] ? do_readv_writev+0xd6/0x1f0
Jun 23 11:53:24 localhost kernel: [<ffffffff81189aa6>] ? vfs_writev+0x46/0x60
Jun 23 11:53:24 localhost kernel: [<ffffffff81189bd1>] ? sys_writev+0x51/0xb0
Jun 23 11:53:24 localhost kernel: [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
Jun 23 11:53:24 localhost kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
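
The first line of the report packs three fields into fuse.hf/13976/0x10000001: the task name, its PID, and the preempt count in hex. The following is a minimal sketch for pulling these apart when scanning /var/log/messages. The helper name decode_sched_bug is hypothetical, and the two bit masks are assumptions based on the 2.6.32 x86_64 layout (low 8 bits as the preempt nesting depth, 0x10000000 as PREEMPT_ACTIVE); other kernels may lay the count out differently.

```python
import re

# Assumed bit layout for the RHEL 6 (2.6.32) x86_64 kernel; not universal.
PREEMPT_MASK = 0x000000FF    # nesting depth of preempt-disabled sections
PREEMPT_ACTIVE = 0x10000000  # flag bit added on the reschedule path

# \s+ after the colon tolerates the message being wrapped onto a second line.
BUG_RE = re.compile(
    r"BUG: scheduling while atomic:\s+"
    r"(?P<comm>\S+)/(?P<pid>\d+)/(?P<count>0x[0-9a-fA-F]+)"
)


def decode_sched_bug(line):
    """Return the fields of a 'scheduling while atomic' line, or None."""
    m = BUG_RE.search(line)
    if m is None:
        return None
    count = int(m.group("count"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "preempt_count": count,
        "preempt_depth": count & PREEMPT_MASK,
        "preempt_active": bool(count & PREEMPT_ACTIVE),
    }
```

Under those assumptions, the value in this report decodes to a nesting depth of 1 with PREEMPT_ACTIVE set, which would be consistent with a single atomic section (such as one kmap_atomic mapping) being open when the allocation tried to reschedule.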

Version-Release number of selected component (if applicable):

Seems to exist in all versions of RHEL 6.x, because fuse_notify does a kzalloc while a kmap_atomic mapping is still held. If memory is low, the allocation can sleep, and we schedule while atomic.

How reproducible:

Add code to your fuse file system to call fuse_lowlevel_notify_inval_entry.

Steps to Reproduce:
1. As above
2. Consume lots of memory
3. Create lots of files and exercise the code path where fuse_lowlevel_notify_inval_entry is called.
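
Step 2 can be approximated with a small helper that maps anonymous memory and touches every page so that it is actually resident. This is only a sketch: the function name consume_memory and the 64 MiB chunk size are assumptions, and how much memory is needed to trigger the problem is not stated in this report.

```python
import mmap

PAGE = mmap.PAGESIZE


def consume_memory(total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Map total_bytes of anonymous memory and dirty every page.

    The returned mappings must be kept alive for as long as the
    memory pressure is wanted; close them to release the memory.
    """
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        buf = mmap.mmap(-1, size)      # -1 requests an anonymous mapping
        for off in range(0, size, PAGE):
            buf[off] = 0xA5            # a write forces the page to be backed
        chunks.append(buf)
        remaining -= size
    return chunks
```

Holding the returned list while the file-creation workload runs keeps the pages allocated; calling close() on each mapping releases them.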

Actual results:

Lots of OOPS messages in /var/log/messages. These seem to be benign but they scare admins.

Expected results:

No such messages.

Additional info:

I have a patch which I will attach but there is a better one at:

https://lkml.org/lkml/2014/7/4/361

However, this will need a little massaging for 2.6.32.

Comment 2 Ric Wheeler 2014-09-19 20:48:58 UTC
Brian, Eric, 

Richard has hit this on all versions of RHEL6.x without the attached patch.

Can we get this nominated for a RHEL6.x kernel build?

We might eventually see this in Red Hat Storage.

Thanks!

Comment 3 Brian Foster 2014-09-27 13:55:29 UTC
I did some brief legwork on this a few days ago and managed to pull enough back to compile successfully on latest rhel6. I'll need to get back to it, review it more carefully and test, but I think we have plenty of time to get this fixed for 6.7. Set devel ack.

Comment 5 RHEL Program Management 2014-11-10 23:10:31 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 6 Kurt Stutsman 2015-04-27 16:14:32 UTC
Patch(es) available on kernel-2.6.32-556.el6

Comment 9 Zorro Lang 2015-06-11 10:26:42 UTC
Hi Richard,

I can't reproduce this bug easily. Could you give more detail about what "lots of memory" and "lots of files" mean?
At a minimum, how much memory should I consume, and how many files should I create?

1) I mounted a glusterfs volume.
2) I mmap'ed 2G of memory and kept writing 2G of random data to it. This consumes nearly 100% of the memory on my machine.
3) I created 10000 files in the glusterfs mount and invalidated their inodes one by one (setfattr -n 'inode-invalidate' testfile${num}), but I still cannot reproduce the problem. Is there something I missed?

Thanks,
Zorro

Comment 10 Richard Sharpe 2015-06-11 18:35:01 UTC
I no longer work on that stuff, but here is my memory of what we were doing.

1. A long create run where we were creating anywhere between 4M and 20M files.

2. The create run used 10 threads, so at least 10 sub-directories, but could be more (like 100).

3. File sizes varied from 8kiB to 64kiB of random data.

However, the problem might also only occur because the FUSE file system I was working on was not properly cleaning up inodes etc.
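
The create run described above can be sketched roughly as follows. The thread count and the 8 KiB to 64 KiB size range come from the description; the function names, the directory and file naming scheme, and the default of 1000 files per thread are assumptions for illustration only.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor


def create_files(directory, count, min_size=8 * 1024, max_size=64 * 1024):
    """Create `count` files of random data, each min_size..max_size bytes."""
    os.makedirs(directory, exist_ok=True)
    rng = random.Random()
    for i in range(count):
        size = rng.randint(min_size, max_size)  # inclusive on both ends
        with open(os.path.join(directory, f"file{i:08d}"), "wb") as f:
            f.write(os.urandom(size))
    return count


def create_run(root, threads=10, files_per_thread=1000,
               min_size=8 * 1024, max_size=64 * 1024):
    """Run one creator per thread, each in its own sub-directory of root."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(create_files, os.path.join(root, f"dir{t:03d}"),
                        files_per_thread, min_size, max_size)
            for t in range(threads)
        ]
        # result() re-raises any exception hit inside a worker thread.
        return sum(f.result() for f in futures)
```

To approach the 4M to 20M files mentioned above, root would point at the FUSE mount under test and files_per_thread would need to be raised to several hundred thousand or more.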

Comment 11 Zorro Lang 2015-06-17 03:20:17 UTC
Ran the FUSE regression tests on kernel 567 with no regression failures. This bug is still hard for me to reproduce, so I will mark it SanityOnly for now.

Comment 13 errata-xmlrpc 2015-07-22 08:18:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html