Bug 835019 - trying to mount an empty 1K partition causes a hang in ext4 driver, using 100% CPU
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Duplicates: 835084
Depends On:
Blocks: 834896
Reported: 2012-06-25 05:18 EDT by Richard W.M. Jones
Modified: 2012-08-28 07:05 EDT

Doc Type: Bug Fix
Clone Of: 834896
Last Closed: 2012-08-28 07:05:13 EDT
Type: Bug


Attachments
test1.img.xz (15.19 KB, application/x-xz)
2012-06-25 05:32 EDT, Richard W.M. Jones

Description Richard W.M. Jones 2012-06-25 05:18:42 EDT
+++ This bug was initially created as a clone of Bug #834896 +++

Previously, trying to mount an extended partition directly
gave an error (which seems like the correct thing to do,
since an extended partition is never a filesystem, and so
cannot be mounted).

However, with recent kernels the mount goes into an infinite
loop using 100% of CPU.

Here is a simple reproducer using libguestfs.

guestfish -x <<EOF
  sparse test1.img 100M
  run
  part-init /dev/sda mbr
  part-add /dev/sda p 32 127
  part-add /dev/sda e 128 -32
  part-add /dev/sda l 140 499
  part-add /dev/sda l 501 -64
  part-list /dev/sda
  mount /dev/sda2 /
EOF

It hangs at the last (mount) line where it's trying to mount
the extended partition.

You can get additional debug information by adding the '-v'
flag to the guestfish command line.

guestfsd is just executing this command:

  mount -o "" /dev/vda2 /sysroot/

So it appears to be a kernel bug.

Affected systems:

Distro      Kernel                          Affected?

Fedora 16   3.1.0-7.fc16.x86_64             No
Fedora 16   3.4.2-1.fc16.x86_64             Yes
Fedora 17   3.4.0-1.fc17.x86_64             Yes
Rawhide     3.5.0-0.rc2.git0.1.fc18.x86_64  Yes
Rawhide     3.5.0-0.rc3.git0.2.fc18.x86_64  Yes
RHEL 6      2.6.32-221.el6.x86_64           No

So it appears to be a bug that was introduced into the
kernel between 3.1.0 and 3.4.0, the earliest affected
version tested (unfortunately a rather large range of
versions!)
Comment 1 Richard W.M. Jones 2012-06-25 05:32:27 EDT
Created attachment 594138 [details]
test1.img.xz

Here is a way to reproduce this without libguestfs, using
a virtual machine.

Take the attached disk image and uncompress it.

Then add it as an extra disk to a virtual machine.

Boot the virtual machine, and inside run the following
command (assumes that you added the disk image as /dev/vdb):

  mkdir /tmp/mnt
  mount -o '' /dev/vdb2 /tmp/mnt

The mount command will spin in a loop using 100% of CPU,
apparently forever (or at least for many minutes).

Also the mount command is unkillable, even with -9.
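
To tell a genuine hang from a slow mount, it can help to run the
command in the background and look at it again after a few seconds.
The following is only a sketch: check_hang is a hypothetical helper
name (not part of libguestfs or util-linux), and it assumes a Linux
/proc filesystem.

```shell
#!/bin/sh
# check_hang SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of any package): run COMMAND in the
# background, wait SECONDS, then report whether it is still running.
check_hang() {
    secs=$1
    shift
    "$@" &
    pid=$!
    sleep "$secs"
    if kill -0 "$pid" 2>/dev/null; then
        # Still alive: read the scheduler state letter from /proc
        # (R = running, D = uninterruptible sleep, S = sleeping, ...).
        state=$(awk '/^State:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
        echo "pid $pid still running after ${secs}s, state=${state:-unknown}"
        return 1
    fi
    # Already finished: collect and report its exit status.
    wait "$pid"
    echo "finished with exit status $?"
}
```

For the case described here, the call would be something like
check_hang 10 mount -o '' /dev/vdb2 /tmp/mnt (run as root inside the
virtual machine). A state of R would be consistent with the 100% CPU
use reported above, but that has not been verified with this helper.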
Comment 2 Richard W.M. Jones 2012-06-25 05:57:38 EDT
Possibly this bug?
http://www.spinics.net/lists/linux-ext4/msg32567.html
Comment 3 Richard W.M. Jones 2012-06-25 07:25:41 EDT
Stack trace from 'mount' command (captured using sysrq + t):

[    8.073005] mount           R  running task        0   134    133 0x00000000
[    8.073005]  ffff88001d6e3aa8 0000000000000082 ffff88001d768000 ffff88001d6e3fd8
[    8.073005]  ffff88001d6e3fd8 ffff88001d6e3fd8 ffff88001d769700 ffff88001d768000
[    8.073005]  0000000000000000 ffff88001d6e2000 0000000000000000 ffff88001dc26c60
[    8.073005] Call Trace:
[    8.073005]  [<ffffffff8108671a>] __cond_resched+0x2a/0x40
[    8.073005]  [<ffffffff815ef820>] _cond_resched+0x30/0x40
[    8.073005]  [<ffffffff8111d2eb>] find_lock_page+0x3b/0x80
[    8.073005]  [<ffffffff8111d9df>] find_or_create_page+0x3f/0xb0
[    8.073005]  [<ffffffff811acf12>] __getblk+0xf2/0x2a0
[    8.073005]  [<ffffffff811ad113>] __bread+0x13/0xb0
[    8.073005]  [<ffffffff8121b4e7>] ext4_fill_super+0x207/0x2a50
[    8.073005]  [<ffffffff8118055b>] mount_bdev+0x1cb/0x210
[    8.073005]  [<ffffffff8121b2e0>] ? ext4_remount+0x5d0/0x5d0
[    8.073005]  [<ffffffff8116b611>] ? __kmalloc_track_caller+0x51/0x180
[    8.073005]  [<ffffffff8120a7f5>] ext4_mount+0x15/0x20
[    8.073005]  [<ffffffff81181063>] mount_fs+0x43/0x1b0
[    8.073005]  [<ffffffff8113de80>] ? __alloc_percpu+0x10/0x20
[    8.073005]  [<ffffffff81199bc7>] vfs_kern_mount+0x67/0xf0
[    8.073005]  [<ffffffff8119a6e4>] do_kern_mount+0x54/0x110
[    8.073005]  [<ffffffff8119bf4a>] do_mount+0x26a/0x840
[    8.073005]  [<ffffffff8113832b>] ? strndup_user+0x5b/0x80
[    8.073005]  [<ffffffff8119c65d>] sys_mount+0x8d/0xe0
[    8.073005]  [<ffffffff815f8ae9>] system_call_fastpath+0x16/0x1b
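
For reference, a trace like the one above can be captured roughly as
follows. This is a minimal sketch, not the exact commands used for
this report: dump_tasks and task_trace are hypothetical helper names,
dump_tasks must run as root, and the awk filter assumes the log layout
shown above (a task header line of the form: name, one-letter state,
and so on).

```shell
#!/bin/sh
# Hypothetical helpers, named here for illustration only.

# Ask the kernel to log the state and stack of every task, then print
# the kernel log. Needs root; uses the standard /proc sysrq interface.
dump_tasks() {
    echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions
    echo t > /proc/sysrq-trigger      # same as pressing sysrq + t
    dmesg
}

# Read kernel log text on stdin and print only the block for task NAME.
# Assumes each block starts with "NAME <state letter> ..." after the
# "[timestamp]" prefix, as in the traces in this report.
task_trace() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^\[[ 0-9.]+\] */, "", line)   # drop "[    8.073005] "
        }
        line ~ /^[^ ]+ +[A-Z] / {              # task header line
            split(line, f, " ")
            show = (f[1] == name)
        }
        show { print }
    '
}
```

Typical use would be: dump_tasks | task_trace mount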
Comment 4 Richard W.M. Jones 2012-06-25 07:31:00 EDT
Here's an even simpler way to reproduce the bug.  Simply
create a 1024 byte device (empty) and try to mount it:

guestfish -x -v <<EOF
  sparse test1.img 1024
  run
  mount /dev/sda /
EOF

The stack trace from this one is substantially the same:

[    7.476010] mount           R  running task        0   109    108 0x00000000
[    7.476010]  ffff88001d783aa8 0000000000000082 ffff88001d6cc500 ffff88001d783fd8
[    7.476010]  ffff88001d783fd8 ffff88001d783fd8 ffff88001d430000 ffff88001d6cc500
[    7.476010]  ffffea0000722ddc ffff88001d782000 0000000000000000 ffff88001dc248a0
[    7.476010] Call Trace:
[    7.476010]  [<ffffffff8108671a>] __cond_resched+0x2a/0x40
[    7.476010]  [<ffffffff8111d2f2>] ? find_lock_page+0x42/0x80
[    7.476010]  [<ffffffff815ef820>] _cond_resched+0x30/0x40
[    7.476010]  [<ffffffff8111d2eb>] find_lock_page+0x3b/0x80
[    7.476010]  [<ffffffff8111d9df>] find_or_create_page+0x3f/0xb0
[    7.476010]  [<ffffffff811acf12>] __getblk+0xf2/0x2a0
[    7.476010]  [<ffffffff811ad113>] __bread+0x13/0xb0
[    7.476010]  [<ffffffff8121b4e7>] ext4_fill_super+0x207/0x2a50
[    7.476010]  [<ffffffff8118055b>] mount_bdev+0x1cb/0x210
[    7.476010]  [<ffffffff8121b2e0>] ? ext4_remount+0x5d0/0x5d0
[    7.476010]  [<ffffffff8116b611>] ? __kmalloc_track_caller+0x51/0x180
[    7.476010]  [<ffffffff8120a7f5>] ext4_mount+0x15/0x20
[    7.476010]  [<ffffffff81181063>] mount_fs+0x43/0x1b0
[    7.476010]  [<ffffffff8113de80>] ? __alloc_percpu+0x10/0x20
[    7.476010]  [<ffffffff81199bc7>] vfs_kern_mount+0x67/0xf0
[    7.476010]  [<ffffffff8119a6e4>] do_kern_mount+0x54/0x110
[    7.476010]  [<ffffffff8119bf4a>] do_mount+0x26a/0x840
[    7.476010]  [<ffffffff8113832b>] ? strndup_user+0x5b/0x80
[    7.476010]  [<ffffffff8119c65d>] sys_mount+0x8d/0xe0
[    7.476010]  [<ffffffff815f8ae9>] system_call_fastpath+0x16/0x1b
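
One plausible reading of these traces is that ext4 is trying to read
its superblock from beyond the end of the device: the primary ext4
superblock starts at byte offset 1024 and is 1024 bytes long, so a
1024-byte device ends exactly where the superblock begins. The sketch
below only illustrates that arithmetic and creates the same kind of
image without libguestfs. superblock_fits and make_image are
hypothetical helper names, and the commented-out loop-device steps are
an untested assumption (the reproducers in this report used a virtual
machine disk).

```shell
#!/bin/sh
# Standard ext2/3/4 layout: primary superblock at byte 1024, 1024 bytes long.
SB_OFFSET=1024
SB_SIZE=1024

# Hypothetical helper: succeed if a device of SIZE bytes can hold the
# whole primary superblock.
superblock_fits() {
    [ "$1" -ge $((SB_OFFSET + SB_SIZE)) ]
}

# Hypothetical helper: create a sparse image FILE of SIZE bytes.
make_image() {
    truncate -s "$2" "$1"
}

make_image test1.img 1024
size=$(wc -c < test1.img)
if superblock_fits "$size"; then
    echo "test1.img ($size bytes) has room for a superblock"
else
    echo "test1.img ($size bytes) ends before the superblock does"
fi

# Untested assumption: attaching the image to a loop device may reproduce
# the hang on an affected kernel. Only try this in a throwaway VM, as root,
# because the mount process cannot be killed once it starts spinning:
#
#   loop=$(losetup --find --show test1.img)
#   mkdir -p /tmp/mnt
#   mount -t ext4 "$loop" /tmp/mnt
```

For the 1024-byte image the check takes the else branch, since the
superblock would need the device to be at least 2048 bytes long.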
Comment 5 Richard W.M. Jones 2012-06-25 15:38:08 EDT
Thanks to Jeff Moyer who suggested the following patch:

https://lkml.org/lkml/2012/6/25/306

which fixes this bug.
Comment 6 Josh Boyer 2012-06-25 16:42:45 EDT
*** Bug 835084 has been marked as a duplicate of this bug. ***
Comment 7 Josh Boyer 2012-06-26 11:38:58 EDT
Patch committed to Fedora git.  Will be in the next build.
Comment 8 Fedora Update System 2012-06-26 20:08:53 EDT
kernel-3.4.4-3.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/kernel-3.4.4-3.fc17
Comment 9 Fedora Update System 2012-06-26 20:11:25 EDT
kernel-3.4.4-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.4.4-3.fc16
Comment 10 Fedora Update System 2012-06-27 23:28:00 EDT
Package kernel-3.4.4-3.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.4.4-3.fc17'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9988/kernel-3.4.4-3.fc17
then log in and leave karma (feedback).
Comment 11 Fedora Update System 2012-06-30 17:59:33 EDT
kernel-3.4.4-3.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 12 Fedora Update System 2012-07-05 19:50:32 EDT
kernel-3.4.4-4.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.4.4-4.fc16
Comment 13 Fedora Update System 2012-07-08 16:51:42 EDT
kernel-3.4.4-4.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 14 Richard W.M. Jones 2012-08-22 06:09:56 EDT
Reopening on the basis of this email:

https://lkml.org/lkml/2012/8/21/692
"[PATCH] block: replace __getblk_slow misfix by grow_dev_page fix"

I am now testing the alternate fix proposed there.
Comment 15 Richard W.M. Jones 2012-08-28 07:05:13 EDT
The first patch (comment 5) caused a regression (see comment 14).

A second version of the patch went upstream and is already
included in kernel-3.6.0-0.rc3.git2.1.fc18.x86_64.rpm.
I wasn't able to test this until now.  However I have
just tested it, and the regression has gone.  Therefore
I am closing this bug again.
