Bug 1144128

Summary:

FUSE: Scheduling while atomic OOPSes when using inval_entry

Product:

Red Hat Enterprise Linux 6

Reporter:

Richard Sharpe <realrichardsharpe>

Component:

kernel

Assignee:

Brian Foster <bfoster>

kernel sub component:

Other

QA Contact:

Zorro Lang <zlang>

Status:

CLOSED ERRATA

Docs Contact:

Severity:

unspecified

Priority:

unspecified

CC:

bfoster, cmaiolin, eguan, esandeen, jharriga, realrichardsharpe, rwheeler, swhiteho

Version:

6.7

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

kernel-2.6.32-556.el6

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Clones:

1155771

Environment:

Last Closed:

2015-07-22 08:18:03 UTC

Type:

Bug

Bug Blocks:

1155771, 1164931

Attachments:

Description	Flags
The patch to avoid scheduling while atomic in fuse.ko	none

Description Richard Sharpe 2014-09-18 18:14:49 UTC

Created attachment 939006
The patch to avoid scheduling while atomic in fuse.ko

Description of problem:

I am seeing this OOPS reasonably frequently after adding a call to
fuse_lowlevel_notify_inval_entry:

Jun 23 11:53:24 localhost kernel: BUG: scheduling while atomic: fuse.hf/13976/0x10000001
Jun 23 11:53:24 localhost kernel: Modules linked in: nls_utf8 fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 8021q garp stp llc vboxpci(U) vboxnetadp(U) vboxnetflt(U) vboxdrv(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw r8169 mii i2c_i801 sg lpc_ich mfd_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci xhci_hcd wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 23 11:53:24 localhost kernel: Pid: 13976, comm: fuse.hf Not tainted 2.6.32-431.20.3.el6.x86_64 #1
Jun 23 11:53:24 localhost kernel: Call Trace:
Jun 23 11:53:24 localhost kernel: [<ffffffff8105da66>] ? __schedule_bug+0x66/0x70
Jun 23 11:53:24 localhost kernel: [<ffffffff81528e30>] ? thread_return+0x640/0x760
Jun 23 11:53:24 localhost kernel: [<ffffffff810695fa>] ? __cond_resched+0x2a/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff81529220>] ? _cond_resched+0x30/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff8116f405>] ? kmem_cache_alloc_trace+0xe5/0x1b0
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d095e>] ? fuse_notify+0x9e/0x2a0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d1580>] ? fuse_dev_write+0x0/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d0593>] ? fuse_copy_one+0x53/0x70 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d16ea>] ? fuse_dev_write+0x16a/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffff8105a323>] ? enqueue_pushable_task+0x73/0x90
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d1580>] ? fuse_dev_write+0x0/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffff811889bb>] ? do_sync_readv_writev+0xfb/0x140
Jun 23 11:53:24 localhost kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff810b0f10>] ? do_futex+0x100/0xb50
Jun 23 11:53:24 localhost kernel: [<ffffffff81226c06>] ? security_file_permission+0x16/0x20
Jun 23 11:53:24 localhost kernel: [<ffffffff81189946>] ? do_readv_writev+0xd6/0x1f0
Jun 23 11:53:24 localhost kernel: [<ffffffff81189aa6>] ? vfs_writev+0x46/0x60
Jun 23 11:53:24 localhost kernel: [<ffffffff81189bd1>] ? sys_writev+0x51/0xb0
Jun 23 11:53:24 localhost kernel: [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
Jun 23 11:53:24 localhost kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
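
The first line of the trace ends in comm/pid/preempt_count. As a rough aid for reading it, the sketch below splits that field and decodes the count. The bit layout (low 8 bits for preempt depth, next 8 bits for softirq nesting, 0x10000000 for PREEMPT_ACTIVE) is an assumption based on 2.6.32-era x86 headers and should be checked against the kernel in question. The function name is made up for this example.

```python
import re

# Assumed layout of preempt_count on 2.6.32-era x86 kernels (verify locally).
PREEMPT_DEPTH_MASK = 0x000000FF   # nesting depth of preempt-disabled sections
SOFTIRQ_MASK = 0x0000FF00         # softirq nesting, shifted left by 8
PREEMPT_ACTIVE = 0x10000000       # flag set while the scheduler preempts a task

BUG_RE = re.compile(r"BUG: scheduling while atomic:\s*(.+)/(\d+)/(0x[0-9a-fA-F]+)")


def parse_sched_while_atomic(line):
    """Return the decoded fields of a 'scheduling while atomic' line, or None."""
    match = BUG_RE.search(line)
    if match is None:
        return None
    comm, pid, raw = match.group(1), int(match.group(2)), int(match.group(3), 16)
    return {
        "comm": comm,
        "pid": pid,
        "preempt_count": raw,
        "preempt_depth": raw & PREEMPT_DEPTH_MASK,
        "softirq_count": (raw & SOFTIRQ_MASK) >> 8,
        "preempt_active": bool(raw & PREEMPT_ACTIVE),
    }
```

Under those assumptions, 0x10000001 decodes to a preempt depth of 1 with no softirq nesting, which is what one would expect if a single atomic mapping was held when the allocation tried to reschedule.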

Version-Release number of selected component (if applicable):

This seems to exist in all versions of RHEL 6.x: there is a kzalloc after a kmap_atomic call, and if memory is low, the allocation can schedule while atomic.
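
To make the ordering problem concrete, here is a toy userspace model, not kernel code. The names Cpu, atomic and alloc_may_sleep are hypothetical stand-ins for the preempt counter, the kmap_atomic/kunmap_atomic pair, and a sleeping allocation such as kzalloc with GFP_KERNEL. Unlike the real kernel, which only complains when the allocation actually tries to reschedule, the model fails on every allocation made inside the atomic section.

```python
from contextlib import contextmanager


class AtomicContextError(RuntimeError):
    """Raised when a possibly-sleeping call runs inside an atomic section."""


class Cpu:
    def __init__(self):
        self.preempt_count = 0

    @contextmanager
    def atomic(self):
        # Stand-in for kmap_atomic() ... kunmap_atomic(): no sleeping inside.
        self.preempt_count += 1
        try:
            yield
        finally:
            self.preempt_count -= 1

    def alloc_may_sleep(self, size):
        # Stand-in for a sleeping allocation; the model always checks.
        if self.preempt_count:
            raise AtomicContextError("scheduling while atomic")
        return bytearray(size)


def notify_buggy(cpu, payload):
    # Allocation happens while the atomic mapping is still held.
    with cpu.atomic():
        buf = cpu.alloc_may_sleep(len(payload))
        buf[:] = payload
    return bytes(buf)


def notify_reordered(cpu, payload):
    # Allocate first, then hold the atomic section only for the copy.
    buf = cpu.alloc_may_sleep(len(payload))
    with cpu.atomic():
        buf[:] = payload
    return bytes(buf)
```

The reordered variant shows one general way to avoid the warning, keeping the atomic section limited to the copy itself. Whether the attached patch or the linked upstream patch takes exactly this shape should be checked against the patches themselves.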

How reproducible:

Add code to your fuse file system to call fuse_lowlevel_notify_inval_entry.

Steps to Reproduce:
1. As above
2. Consume lots of memory
3. Create lots of files and exercise the code path where fuse_lowlevel_notify_inval_entry is called.

Actual results:

Lots of OOPS messages in /var/log/messages. These seem to be benign but they scare admins.

Expected results:

No such messages.

Additional info:

I have a patch which I will attach but there is a better one at:

https://lkml.org/lkml/2014/7/4/361

However, this will need a little massaging for 2.6.32.

Comment 2 Ric Wheeler 2014-09-19 20:48:58 UTC

Brian, Eric, 

Richard has hit this on all versions of RHEL6.x without the attached patch.

Can we get this nominated for a RHEL6.x kernel build?

We might eventually see this in Red Hat Storage.

Thanks!

Comment 3 Brian Foster 2014-09-27 13:55:29 UTC

I did some brief legwork on this a few days ago and managed to pull enough back to compile successfully on latest rhel6. I'll need to get back to it, review it more carefully and test, but I think we have plenty of time to get this fixed for 6.7. Set devel ack.

Comment 5 RHEL Program Management 2014-11-10 23:10:31 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 6 Kurt Stutsman 2015-04-27 16:14:32 UTC

Patch(es) available on kernel-2.6.32-556.el6

Comment 9 Zorro Lang 2015-06-11 10:26:42 UTC

Hi Richard,

I can't reproduce this bug easily. Could you give more detail about "lots of memory" and "lots of files"?
At a minimum, how much memory should I consume, and how many files should I create?

Here is what I tried:

1) I mounted a glusterfs volume.
2) I mmapped 2G of memory and kept writing 2G of random data to it. This consumes nearly 100% of the memory on my machine.
3) I created 10000 files in the glusterfs volume and invalidated their inodes one by one (setfattr -n 'inode-invalidate' testfile${num}). I still cannot reproduce the bug. Is there something I missed?

Thanks,
Zorro

Comment 10 Richard Sharpe 2015-06-11 18:35:01 UTC

I no longer work on that stuff, but here is my memory of what we were doing.

1. A long create run where we were creating anywhere between 4M and 20M files.

2. The create run used 10 threads, so at least 10 sub-directories, but could be more (like 100).

3. File sizes varied from 8 KiB to 64 KiB of random data.

However, the problem might also only occur because the FUSE file system I was working on was not properly cleaning up inodes etc.
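
For anyone trying to reproduce this, the create run described above could be approximated with a sketch like the following. The function name, directory naming, and default counts are made up for this example; the thread count and file-size range follow the description, and the file counts are scaled far down from the original 4M to 20M.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor


def create_files(root, threads=10, files_per_thread=100,
                 min_size=8 * 1024, max_size=64 * 1024):
    """Create files of random data under root, one sub-directory per thread.

    Returns the total number of files written.
    """
    def worker(index):
        subdir = os.path.join(root, "dir%03d" % index)
        os.makedirs(subdir, exist_ok=True)
        for n in range(files_per_thread):
            size = random.randint(min_size, max_size)  # inclusive bounds
            path = os.path.join(subdir, "file%08d" % n)
            with open(path, "wb") as handle:
                handle.write(os.urandom(size))
        return files_per_thread

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(worker, range(threads)))
```

Calling create_files on a directory inside the FUSE mount, with files_per_thread raised into the hundreds of thousands, would come closer to the original scale. This only generates the create load; the file system under test still has to call fuse_lowlevel_notify_inval_entry on that path, and the memory pressure from the reproduction steps has to be produced separately.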

Comment 11 Zorro Lang 2015-06-17 03:20:17 UTC

Ran the FUSE regression tests on kernel 567. No regression failures. This bug is still hard for me to reproduce, so I will mark it SanityOnly for now.

Comment 13 errata-xmlrpc 2015-07-22 08:18:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html