Bug 683836

Summary: ext4 crash and umount race condition
Product: [Fedora] Fedora
Reporter: Albert Strasheim <fullung>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 15
CC: esandeen, fullung, gansalmon, itamar, jonathan, kernel-maint, lczerner, madhu.chinakonda
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-10 09:06:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description     Flags
mount script    none
screenshot      none
screenshot 1    none
screenshot 2    none
screenshot 3    none
kernel crash    none

Description Albert Strasheim 2011-03-10 13:22:41 UTC
Created attachment 483433 [details]
mount script

Description of problem:

Doing multiple umounts in parallel of a bunch of ext4 file systems causes umount to return "not mounted" for some file systems, even if /etc/mtab is a symlink to /proc/mounts.

Also, running the mount script a few times causes the kernel to crash. One crash dump attached.

Version-Release number of selected component (if applicable):

kernel-2.6.38-0.rc8.git0.2.fc14.x86_64

How reproducible:

Always

Steps to Reproduce:
1. boot with loop.max_loop=256
2. sudo ./testext4.py
3. cat /proc/mounts | grep /dev/loop | cut -d " " -f 1 | perl -ne 's/^/sudo umount /; s/$/&/; print' | sh -x
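
For reference, step 3 builds one "sudo umount <device>&" line per loop device, so every umount runs in the background at the same time. A minimal Python sketch of the same idea follows. It is not the attached testext4.py; the function names and the worker count are illustrative assumptions, and it must run as root because it calls umount directly instead of going through sudo.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def loop_devices(mounts_text):
    """Return the /dev/loop* devices listed in /proc/mounts-style text."""
    devices = []
    for line in mounts_text.splitlines():
        fields = line.split(" ")
        # The first field of each /proc/mounts line is the mounted device.
        if fields[0].startswith("/dev/loop"):
            devices.append(fields[0])
    return devices


def umount(device):
    """Run umount on one device and return (device, exit code, stderr)."""
    proc = subprocess.run(["umount", device], capture_output=True, text=True)
    return device, proc.returncode, proc.stderr.strip()


def umount_all_in_parallel(devices, workers=32):
    """Issue many umounts concurrently, similar to backgrounding each with '&'."""
    if not devices:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(umount, devices))
```

To use it, read /proc/mounts, pass the text to loop_devices(), and hand the result to umount_all_in_parallel(); any tuple with a non-zero exit code is a failed umount.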
  
Actual results:

A few of the umounts fail with "not mounted". If you check /proc/mounts, the file systems are in fact mounted and another attempt to umount them succeeds.
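
The check described above (a umount fails with "not mounted", yet the device is still listed in /proc/mounts) can be sketched as below. This is only an illustration; the function name and the sample data are assumptions, not part of the attached script.

```python
def still_mounted(failed_devices, mounts_text):
    """Return the devices whose umount failed but which still appear as
    mounted in the given /proc/mounts-style text."""
    # The first space-separated field of each line is the device.
    mounted = {line.split(" ")[0] for line in mounts_text.splitlines() if line}
    return [dev for dev in failed_devices if dev in mounted]


# Example with made-up data: loop3 failed and is still mounted,
# loop7 failed but is no longer listed.
example_mounts = (
    "/dev/loop3 /mnt/t3 ext4 rw,relatime 0 0\n"
    "proc /proc proc rw 0 0\n"
)
suspicious = still_mounted(["/dev/loop3", "/dev/loop7"], example_mounts)
```

Any device returned here matches the behaviour reported in this bug: umount said "not mounted" while the file system was in fact mounted.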

Also, running ./testext4.py a few times with umounts in between causes the kernel to crash. One screenshot attached.

I am running on a machine with 24 cores.

Comment 1 Albert Strasheim 2011-03-10 13:23:04 UTC
Created attachment 483435 [details]
screenshot

Comment 2 Albert Strasheim 2011-03-10 13:24:09 UTC
Created attachment 483436 [details]
screenshot 1

Comment 3 Albert Strasheim 2011-03-10 13:24:42 UTC
Created attachment 483437 [details]
screenshot 2

Comment 4 Albert Strasheim 2011-03-10 13:25:29 UTC
Created attachment 483438 [details]
screenshot 3

Comment 5 Albert Strasheim 2011-03-10 13:36:12 UTC
Created attachment 483442 [details]
kernel crash

Comment 6 Eric Sandeen 2011-03-15 16:21:45 UTC
Comment #5 does indeed look like an unmount race: completion is being called on something which has already gone away.

Might be worth testing with a completely different filesystem (xfs, perhaps) to see whether some of this might be more of a vfs issue.

Comment 7 Albert Strasheim 2011-03-25 08:53:50 UTC
Same umount race with XFS.

Comment 8 Albert Strasheim 2011-03-30 05:05:56 UTC
This probably needs to be reported upstream?

Comment 9 Eric Sandeen 2011-03-30 18:28:15 UTC
(In reply to comment #8)
> This probably needs to be reported upstream?

Actually yes, that would be good.  Upstream has more bandwidth for bugs than I do alone.  :)

Thanks,
-Eric

Comment 10 Albert Strasheim 2011-03-31 04:56:24 UTC
Reported here:

https://bugzilla.kernel.org/show_bug.cgi?id=32312

Comment 11 Albert Strasheim 2011-05-07 20:02:40 UTC
According to Jan Kara, this bug should have been fixed by commit 0aeea18964173715a1037034ef6838198f319319 by lczerner, which went into 2.6.39-rc1.

Comment 12 Chuck Ebbert 2011-05-09 11:18:54 UTC
(In reply to comment #11)
> According to Jan Kara, this bug should have been fixed by commit
> 0aeea18964173715a1037034ef6838198f319319 by lczerner, which went
> into 2.6.39-rc1.

That patch actually went into 2.6.38, just before release.
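
One way to check which release contains a commit, assuming a local clone of the mainline kernel tree with its release tags (the tag name v2.6.38 and the repository path are assumptions here), is git merge-base --is-ancestor, which exits with 0 if the commit is an ancestor of the given ref and 1 if it is not. A rough sketch:

```python
import subprocess

# Commit hash quoted in comment #11.
COMMIT = "0aeea18964173715a1037034ef6838198f319319"


def is_ancestor_cmd(commit, ref):
    """Build the git command that tests whether commit is an ancestor of ref."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def commit_in_release(repo_dir, commit, ref):
    """Return True if commit is contained in ref, False if it is not.

    repo_dir is assumed to be a local kernel git checkout (hypothetical path).
    """
    result = subprocess.run(is_ancestor_cmd(commit, ref), cwd=repo_dir)
    if result.returncode not in (0, 1):
        # Any other exit code means git itself failed (bad ref, not a repo, ...).
        raise RuntimeError("git merge-base failed with %d" % result.returncode)
    return result.returncode == 0
```

For example, commit_in_release("/path/to/linux", COMMIT, "v2.6.38") would answer the question for the mainline tag. It says nothing about a distribution kernel, which may carry additional patches.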

Comment 13 Albert Strasheim 2011-05-09 11:29:01 UTC
I have retested with kernel-2.6.38.4-20.fc15.x86_64, and the bug is still there.

So if 0aeea18964173715a1037034ef6838198f319319 is in 2.6.38, it doesn't fix this bug.

Comment 14 Lukáš Czerner 2011-05-10 08:18:50 UTC
(In reply to comment #13)
> I have retested with kernel-2.6.38.4-20.fc15.x86_64, and the bug is still
> there.

Did the crash appear on kernel-2.6.38.4-20.fc15.x86_64 as well? Or just the "not mounted" error?

Comment 15 Albert Strasheim 2011-05-10 08:43:00 UTC
Hello

I did a few tests and it seems the crash is fixed in kernel-2.6.38.4-20.fc15.x86_64, but the "not mounted" error still appears.

Regards

Albert

Comment 16 Albert Strasheim 2011-05-10 08:43:21 UTC
Hello

I did a few tests and it seems the crash is fixed in kernel-2.6.38.4-20.fc15.x86_64, but the "not mounted" error still appears.

Regards

Albert

Comment 17 Lukáš Czerner 2011-05-10 09:06:39 UTC
Good, that's what I thought. So Jan was right: the crash has been fixed by that commit. The "not mounted" (EINVAL) error, however, is a completely different problem and should have its own bz entry. So if you do not mind, I will close this one and you can open a new bz for the "not mounted" error (please cc me).

Thanks Albert!
-Lukas