1144128 – FUSE: Scheduling while atomic OOPSes when using inval_entry

Summary: FUSE: Scheduling while atomic OOPSes when using inval_entry

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.7
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Brian Foster
QA Contact:	Zorro Lang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1155771 1164931

Reported:	2014-09-18 18:14 UTC by Richard Sharpe
Modified:	2015-07-22 08:18 UTC
CC List:	8 users
Fixed In Version:	kernel-2.6.32-556.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1155771
Environment:
Last Closed:	2015-07-22 08:18:03 UTC
Target Upstream Version:
Embargoed:

Attachments
The patch to avoid scheduling while atomic in fuse.ko (813 bytes, text/plain) 2014-09-18 18:14 UTC, Richard Sharpe

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1272	0	normal	SHIPPED_LIVE	Moderate: kernel security, bug fix, and enhancement update	2015-07-22 11:56:25 UTC

Description Richard Sharpe 2014-09-18 18:14:49 UTC

Created attachment 939006
The patch to avoid scheduling while atomic in fuse.ko

Description of problem:

I am seeing this OOPS reasonably frequently after I added a call to
fuse_lowlevel_notify_inval_entry:

Jun 23 11:53:24 localhost kernel: BUG: scheduling while atomic: fuse.hf/13976/0x10000001
Jun 23 11:53:24 localhost kernel: Modules linked in: nls_utf8 fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 8021q garp stp llc vboxpci(U) vboxnetadp(U) vboxnetflt(U) vboxdrv(U) cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw r8169 mii i2c_i801 sg lpc_ich mfd_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci xhci_hcd wmi radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun 23 11:53:24 localhost kernel: Pid: 13976, comm: fuse.hf Not tainted 2.6.32-431.20.3.el6.x86_64 #1
Jun 23 11:53:24 localhost kernel: Call Trace:
Jun 23 11:53:24 localhost kernel: [<ffffffff8105da66>] ? __schedule_bug+0x66/0x70
Jun 23 11:53:24 localhost kernel: [<ffffffff81528e30>] ? thread_return+0x640/0x760
Jun 23 11:53:24 localhost kernel: [<ffffffff810695fa>] ? __cond_resched+0x2a/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff81529220>] ? _cond_resched+0x30/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff8116f405>] ? kmem_cache_alloc_trace+0xe5/0x1b0
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d095e>] ? fuse_notify+0x9e/0x2a0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d1580>] ? fuse_dev_write+0x0/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d0593>] ? fuse_copy_one+0x53/0x70 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d16ea>] ? fuse_dev_write+0x16a/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffff8105a323>] ? enqueue_pushable_task+0x73/0x90
Jun 23 11:53:24 localhost kernel: [<ffffffffa03d1580>] ? fuse_dev_write+0x0/0x3e0 [fuse]
Jun 23 11:53:24 localhost kernel: [<ffffffff811889bb>] ? do_sync_readv_writev+0xfb/0x140
Jun 23 11:53:24 localhost kernel: [<ffffffff8109afa0>] ? autoremove_wake_function+0x0/0x40
Jun 23 11:53:24 localhost kernel: [<ffffffff810b0f10>] ? do_futex+0x100/0xb50
Jun 23 11:53:24 localhost kernel: [<ffffffff81226c06>] ? security_file_permission+0x16/0x20
Jun 23 11:53:24 localhost kernel: [<ffffffff81189946>] ? do_readv_writev+0xd6/0x1f0
Jun 23 11:53:24 localhost kernel: [<ffffffff81189aa6>] ? vfs_writev+0x46/0x60
Jun 23 11:53:24 localhost kernel: [<ffffffff81189bd1>] ? sys_writev+0x51/0xb0
Jun 23 11:53:24 localhost kernel: [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
Jun 23 11:53:24 localhost kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
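
For readers triaging similar logs: the first line of the trace ends in a comm/pid/preempt_count triple (fuse.hf/13976/0x10000001). The sketch below decodes that triple in user space. The bit layout it uses is an assumption based on 2.6.32-era x86 kernels and is not stated anywhere in this report; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* ASSUMPTION: preempt_count bit layout of 2.6.32-era x86 kernels.
 * This layout is not taken from the bug report. */
#define PREEMPT_DEPTH_MASK  0x000000ffu /* bits 0-7: preemption-disable nesting */
#define SOFTIRQ_COUNT_MASK  0x0000ff00u /* bits 8-15: softirq nesting */
#define PREEMPT_ACTIVE_FLAG 0x10000000u /* bit 28: PREEMPT_ACTIVE */

/* Hypothetical container for the fields of the BUG line. */
struct sched_bug {
    char comm[32];
    int pid;
    unsigned int preempt_count;
};

/* Parse the "comm/pid/0xcount" triple printed after
 * "BUG: scheduling while atomic:". Returns 0 on success, -1 otherwise. */
int parse_sched_bug(const char *triple, struct sched_bug *out)
{
    assert(triple != NULL && out != NULL);
    if (sscanf(triple, "%31[^/]/%d/%x",
               out->comm, &out->pid, &out->preempt_count) != 3)
        return -1;
    return 0;
}

/* Nesting depth of preemption-disabled sections. */
unsigned int preempt_depth(const struct sched_bug *b)
{
    return b->preempt_count & PREEMPT_DEPTH_MASK;
}

/* Nesting depth of softirq processing. */
unsigned int softirq_depth(const struct sched_bug *b)
{
    return (b->preempt_count & SOFTIRQ_COUNT_MASK) >> 8;
}

/* Non-zero when the PREEMPT_ACTIVE bit is set. */
int preempt_active(const struct sched_bug *b)
{
    return (b->preempt_count & PREEMPT_ACTIVE_FLAG) != 0;
}
```

Under the assumed layout, the value in this trace has a preemption depth of 1 with PREEMPT_ACTIVE set and no softirq nesting. That would be consistent with a single outstanding atomic section at the time of the reschedule, though the decode alone does not prove which call created it.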

Version-Release number of selected component (if applicable):

This seems to exist in all versions of RHEL 6.x: there is a kzalloc after a kmap_atomic call, so when memory is low the allocation can sleep and we schedule while atomic.
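
To make the ordering problem concrete, here is a rough user-space model of the pattern described above. Everything prefixed model_ is a hypothetical stand-in for a kernel primitive, not the real API, and the reordered variant is only one possible way to avoid the problem; it is not necessarily what the attached patch or the upstream patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of this is kernel code. */
int model_preempt_depth; /* > 0 means "atomic": sleeping is not allowed */
int model_sched_bugs;    /* would-be "scheduling while atomic" reports */

/* Stand-in for kmap_atomic(): enters an atomic section. */
void model_kmap_atomic(void)
{
    model_preempt_depth++;
}

/* Stand-in for kunmap_atomic(): leaves the atomic section. */
void model_kunmap_atomic(void)
{
    assert(model_preempt_depth > 0);
    model_preempt_depth--;
}

/* Stand-in for a zeroing allocation that may sleep under memory pressure.
 * If called while atomic, record the event the kernel would complain about. */
void *model_kzalloc_may_sleep(size_t size)
{
    if (model_preempt_depth > 0)
        model_sched_bugs++;
    return calloc(1, size);
}

/* Pattern from the report: allocate while the atomic mapping is held. */
int notify_alloc_inside_atomic(size_t namelen)
{
    model_kmap_atomic();
    char *buf = model_kzalloc_may_sleep(namelen + 1);
    model_kunmap_atomic();
    if (buf == NULL)
        return -1;
    free(buf);
    return 0;
}

/* One possible reordering: allocate first, then enter the atomic section. */
int notify_alloc_before_atomic(size_t namelen)
{
    char *buf = model_kzalloc_may_sleep(namelen + 1);
    if (buf == NULL)
        return -1;
    model_kmap_atomic();
    /* the copy into buf would happen here */
    model_kunmap_atomic();
    free(buf);
    return 0;
}
```

In this model the first function increments model_sched_bugs on every call, while the second never does. In the real kernel the message only appears when the allocation actually has to reschedule, which matches the observation that low memory is needed to trigger it.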

How reproducible:

Add code to your fuse file system to call fuse_lowlevel_notify_inval_entry.

Steps to Reproduce:
1. As above
2. Consume lots of memory
3. Create lots of files and exercise the code path where fuse_lowlevel_notify_inval_entry is called.

Actual results:

Lots of OOPS messages in /var/log/messages. These seem to be benign but they scare admins.

Expected results:

No such messages.

Additional info:

I have a patch which I will attach but there is a better one at:

https://lkml.org/lkml/2014/7/4/361

However, this will need a little massaging for 2.6.32.

Comment 2 Ric Wheeler 2014-09-19 20:48:58 UTC

Brian, Eric, 

Richard has hit this on all versions of RHEL6.x without the attached patch.

Can we get this nominated for a RHEL6.x kernel build?

We might eventually see this in Red Hat Storage.

Thanks!

Comment 3 Brian Foster 2014-09-27 13:55:29 UTC

I did some brief legwork on this a few days ago and managed to pull enough back to compile successfully on latest rhel6. I'll need to get back to it, review it more carefully and test, but I think we have plenty of time to get this fixed for 6.7. Set devel ack.

Comment 5 RHEL Program Management 2014-11-10 23:10:31 UTC

This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 6 Kurt Stutsman 2015-04-27 16:14:32 UTC

Patch(es) available on kernel-2.6.32-556.el6

Comment 9 Zorro Lang 2015-06-11 10:26:42 UTC

Hi Richard,

I can't reproduce this bug easily. Could you give some more detail about "lots of memory" and "lots of files"?
At a minimum, how much memory should I consume, and how many files should I create?

1) I mounted a glusterfs file system.
2) I mmapped 2G of memory and kept writing 2G of random data to it. This consumes nearly 100% of my machine's memory.
3) I created 10000 files in the glusterfs and invalidated their inodes (setfattr -n 'inode-invalidate' testfile${num}) one by one, but I still cannot reproduce the bug. Is there something I missed?

Thanks,
Zorro
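
For anyone repeating step 2 above, a minimal sketch of one way to hold memory with an anonymous mapping filled with pseudo-random data follows. The function names, the seed parameter, and the use of rand() are illustrative assumptions; the actual test program used in this comment is not shown in the report.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Map `bytes` of anonymous memory and write pseudo-random data to every
 * byte so that each page is touched. Returns NULL if the mapping fails.
 * The caller keeps the mapping for as long as pressure is wanted. */
unsigned char *hold_memory(size_t bytes, unsigned int seed)
{
    assert(bytes > 0);
    unsigned char *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    srand(seed);
    for (size_t i = 0; i < bytes; i++)
        p[i] = (unsigned char)rand();
    return p;
}

/* Drop the mapping created by hold_memory(). Returns 0 on success. */
int release_memory(unsigned char *p, size_t bytes)
{
    return munmap(p, bytes);
}
```

A caller would pass something like 2G for bytes and rewrite the region in a loop to keep the pages hot while the file-creation workload runs. Whether this produces enough pressure to make allocations sleep depends on the machine's RAM and swap configuration.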

Comment 10 Richard Sharpe 2015-06-11 18:35:01 UTC

I no longer work on that stuff, but here is my memory of what we were doing.

1. A long create run where we were creating anywhere between 4M and 20M files.

2. The create run used 10 threads, so at least 10 sub-directories, but could be more (like 100).

3. File sizes varied from 8 KiB to 64 KiB of random data.

However, the problem might also only occur because the FUSE file system I was working on was not properly cleaning up inodes etc.

Comment 11 Zorro Lang 2015-06-17 03:20:17 UTC

Ran the FUSE regression tests on kernel 567 with no regression failures. This bug is still hard for me to reproduce, so I will mark it SanityOnly for now.

Comment 13 errata-xmlrpc 2015-07-22 08:18:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1272.html
