Bug 730998 - possible circular locking dependency on sb->s_type->i_mutex_key
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Duplicates: 735206
Reported: 2011-08-16 10:03 EDT by Mikko Tiihonen
Modified: 2012-09-07 12:09 EDT
CC List: 8 users

Fixed In Version: kernel-3.1.0-0.rc4.git0.0.fc16
Doc Type: Bug Fix
Last Closed: 2012-09-07 12:09:10 EDT


Attachments
Unit test that triggers warning (2.02 MB, application/octet-stream), attached 2011-09-12 11:49 EDT by Albert Strasheim
strace log (8.69 KB, application/octet-stream), attached 2011-09-12 11:49 EDT by Albert Strasheim

Description Mikko Tiihonen 2011-08-16 10:03:19 EDT
Description of problem:
[ INFO: possible circular locking dependency detected ]
3.0.1-3.fc16.x86_64 #1
-------------------------------------------------------
find/645 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

but task is already holding lock:
 (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
       [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
       [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
       [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
       [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
       [<ffffffff81111557>] mmap_region+0x258/0x432
       [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
       [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
       [<ffffffff8100c858>] sys_mmap+0x22/0x24
       [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

-> #0 (&mm->mmap_sem){++++++}:
       [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
       [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
       [<ffffffff81109541>] might_fault+0x89/0xac
       [<ffffffff81149cff>] filldir+0x6f/0xc7
       [<ffffffff811586ea>] dcache_readdir+0x67/0x205
       [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
       [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
       [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key);
                               lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by find/645:
 #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4

stack backtrace:
Pid: 645, comm: find Not tainted 3.0.1-3.fc16.x86_64 #1
Call Trace:
 [<ffffffff814d3571>] print_circular_bug+0x1f8/0x209
 [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
 [<ffffffff81087ae3>] ? register_lock_class+0x1e/0x2d3
 [<ffffffff8100e9fd>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff8100e9fd>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff81109514>] ? might_fault+0x5c/0xac
 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
 [<ffffffff81109514>] ? might_fault+0x5c/0xac
 [<ffffffff8108aefc>] ? mark_held_locks+0x4b/0x6d
 [<ffffffff81109541>] might_fault+0x89/0xac
 [<ffffffff81109514>] ? might_fault+0x5c/0xac
 [<ffffffff81149cff>] filldir+0x6f/0xc7
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff811586ea>] dcache_readdir+0x67/0x205
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b


filesystem is btrfs
Comment 1 Josh Boyer 2011-08-22 11:38:25 EDT
This was reported here:

https://lkml.org/lkml/2011/4/15/272

and the thread kind of died. I've poked upstream again, and maybe we'll get some more focus.
Comment 2 Josh Boyer 2011-08-25 09:50:37 EDT
I've posted a patch for this upstream:

https://lkml.org/lkml/2011/8/25/144
Comment 3 Josh Boyer 2011-08-26 08:47:11 EDT
This will be fixed with the next F16 build.
Comment 4 Fedora Update System 2011-08-30 07:08:58 EDT
kernel-3.1.0-0.rc4.git0.0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc4.git0.0.fc16
Comment 5 bsfmig 2011-08-30 08:08:55 EDT
Hopefully the next mirror sync will fix the issue. Seen on a 3.1.0-0.rc3 kernel.
[   60.496540] 
[   60.496542] =======================================================
[   60.496607] [ INFO: possible circular locking dependency detected ]
[   60.496645] 3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[   60.496672] -------------------------------------------------------
[   60.496710] dconf-service/1546 is trying to acquire lock:
[   60.496743]  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
[   60.496830] 
[   60.496831] but task is already holding lock:
[   60.496870]  (&mm->mmap_sem){++++++}, at: [<ffffffff81116f9a>] sys_munmap+0x3b/0x60
[   60.496936] 
[   60.496936] which lock already depends on the new lock.
[   60.496937] 
[   60.496991] 
[   60.496991] the existing dependency chain (in reverse order) is:
[   60.497039] 
[   60.497039] -> #1 (&mm->mmap_sem){++++++}:
[   60.497095]        [<ffffffff8108eff1>] lock_acquire+0xf3/0x13e
[   60.497137]        [<ffffffff8110fbd7>] might_fault+0x89/0xac
[   60.497185]        [<ffffffff81151bfb>] filldir+0x6f/0xc7
[   60.497225]        [<ffffffff811a5347>] call_filldir+0x96/0xc0
[   60.497267]        [<ffffffff811a5680>] ext4_readdir+0x1bd/0x548
[   60.497309]        [<ffffffff81151e50>] vfs_readdir+0x7b/0xb4
[   60.497349]        [<ffffffff81151f6f>] sys_getdents+0x7e/0xd1
[   60.497389]        [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
[   60.497436] 
[   60.497436] -> #0 (&sb->s_type->i_mutex_key#13){+.+.+.}:
[   60.497504]        [<ffffffff8108e81e>] __lock_acquire+0xa1a/0xcf7
[   60.497545]        [<ffffffff8108eff1>] lock_acquire+0xf3/0x13e
[   60.497586]        [<ffffffff815025db>] __mutex_lock_common+0x5d/0x39a
[   60.497631]        [<ffffffff81502a27>] mutex_lock_nested+0x40/0x45
[   60.497674]        [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
[   60.497715]        [<ffffffff81157cf2>] evict+0x99/0x153
[   60.497754]        [<ffffffff81157f3d>] iput+0x191/0x19a
[   60.497791]        [<ffffffff81154bf5>] dentry_kill+0x123/0x145
[   60.497832]        [<ffffffff81155004>] dput+0xf7/0x107
[   60.497868]        [<ffffffff811440db>] fput+0x1dd/0x1f5
[   60.497908]        [<ffffffff811158ee>] remove_vma+0x56/0x87
[   60.497947]        [<ffffffff81116afd>] do_munmap+0x2f2/0x30b
[   60.497987]        [<ffffffff81116fa8>] sys_munmap+0x49/0x60
[   60.498026]        [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
[   60.498070] 
[   60.498070] other info that might help us debug this:
[   60.498071] 
[   60.498124]  Possible unsafe locking scenario:
[   60.498125] 
[   60.498166]        CPU0                    CPU1
[   60.498194]        ----                    ----
[   60.499322]   lock(&mm->mmap_sem);
[   60.500440]                                lock(&sb->s_type->i_mutex_key);
[   60.501599]                                lock(&mm->mmap_sem);
[   60.502714]   lock(&sb->s_type->i_mutex_key);
[   60.503820] 
[   60.503820]  *** DEADLOCK ***
[   60.503821] 
[   60.508404] 1 lock held by dconf-service/1546:
[   60.509729]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81116f9a>] sys_munmap+0x3b/0x60
[   60.511027] 
[   60.511029] stack backtrace:
[   60.513393] Pid: 1546, comm: dconf-service Not tainted 3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[   60.513396] Call Trace:
[   60.513406]  [<ffffffff814f9b74>] print_circular_bug+0x1f8/0x209
[   60.513412]  [<ffffffff8108e81e>] __lock_acquire+0xa1a/0xcf7
[   60.513416]  [<ffffffff8108be17>] ? register_lock_class+0x1e/0x2d3
[   60.513422]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513425]  [<ffffffff8108eff1>] lock_acquire+0xf3/0x13e
[   60.513429]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513432]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513438]  [<ffffffff815025db>] __mutex_lock_common+0x5d/0x39a
[   60.513441]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513446]  [<ffffffff810152af>] ? native_sched_clock+0x34/0x36
[   60.513450]  [<ffffffff810152ba>] ? sched_clock+0x9/0xd
[   60.513453]  [<ffffffff8108b885>] ? trace_hardirqs_off+0xd/0xf
[   60.513457]  [<ffffffff8108bdf0>] ? lock_release_holdtime.part.9+0x59/0x62
[   60.513461]  [<ffffffff81502a27>] mutex_lock_nested+0x40/0x45
[   60.513464]  [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
[   60.513469]  [<ffffffff81157cf2>] evict+0x99/0x153
[   60.513472]  [<ffffffff81157f3d>] iput+0x191/0x19a
[   60.513477]  [<ffffffff81154bf5>] dentry_kill+0x123/0x145
[   60.513481]  [<ffffffff81155004>] dput+0xf7/0x107
[   60.513486]  [<ffffffff811440db>] fput+0x1dd/0x1f5
[   60.513491]  [<ffffffff811158ee>] remove_vma+0x56/0x87
[   60.513494]  [<ffffffff81116afd>] do_munmap+0x2f2/0x30b
[   60.513498]  [<ffffffff81116fa8>] sys_munmap+0x49/0x60
[   60.513503]  [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
Comment 6 Josh Boyer 2011-08-30 08:29:05 EDT
(In reply to comment #5)
> Hope next mirror sync can fix the issue. Seen on a 3.1.0-0.rc3 kernel.
> [   60.496607] [ INFO: possible circular locking dependency detected ]
> [   60.496710] dconf-service/1546 is trying to acquire lock:
> [   60.496743]  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
> [snip]

That's bug 732572, not this one.
Comment 7 Fedora Update System 2011-08-30 16:40:04 EDT
Package kernel-3.1.0-0.rc4.git0.0.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.1.0-0.rc4.git0.0.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc4.git0.0.fc16
then log in and leave karma (feedback).
Comment 8 Josh Boyer 2011-09-01 15:51:51 EDT
*** Bug 735206 has been marked as a duplicate of this bug. ***
Comment 9 Fedora Update System 2011-09-09 13:09:52 EDT
kernel-3.1.0-0.rc4.git0.0.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 10 Albert Strasheim 2011-09-12 07:48:35 EDT
Doesn't seem to be fixed:

[   40.584440] =======================================================
[   40.592303] [ INFO: possible circular locking dependency detected ]
[   40.598613] 3.1.0-0.rc4.git0.0.fc16.x86_64 #1
[   40.603012] -------------------------------------------------------
[   40.609323] foo/2369 is trying to acquire lock:
[   40.614239]  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff811e5e0c>] hugetlbfs_file_mmap+0x82/0x10a
[   40.624291]
[   40.624291] but task is already holding lock:
[   40.630230]  (&mm->mmap_sem){++++++}, at: [<ffffffff811185f5>] sys_mmap_pgoff+0xf8/0x16a
[   40.638561]
[   40.638561] which lock already depends on the new lock.
[   40.638562]
[   40.646903]
[   40.646903] the existing dependency chain (in reverse order) is:
[   40.654933]
[   40.654933] -> #1 (&mm->mmap_sem){++++++}:
[   40.661251]        [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
[   40.667501]        [<ffffffff8110fffb>] might_fault+0x89/0xac
[   40.673576]        [<ffffffff811520fe>] filldir64+0x7f/0xcd
[   40.679480]        [<ffffffff81160a75>] dcache_readdir+0x64/0x1fc
[   40.685901]        [<ffffffff8115227c>] vfs_readdir+0x7b/0xb4
[   40.691988]        [<ffffffff8115246c>] sys_getdents64+0x7e/0xca
[   40.698324]        [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b
[   40.705179]
[   40.705179] -> #0 (&sb->s_type->i_mutex_key#15){+.+.+.}:
[   40.712774]        [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
[   40.719285]        [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
[   40.725532]        [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
[   40.732387]        [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
[   40.738981]        [<ffffffff811e5e0c>] hugetlbfs_file_mmap+0x82/0x10a
[   40.745837]        [<ffffffff81118000>] mmap_region+0x274/0x46b
[   40.752086]        [<ffffffff811184a3>] do_mmap_pgoff+0x2ac/0x306
[   40.758507]        [<ffffffff81118615>] sys_mmap_pgoff+0x118/0x16a
[   40.765017]        [<ffffffff81012888>] sys_mmap+0x22/0x24
[   40.770825]        [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b
[   40.777680]
[   40.777680] other info that might help us debug this:
[   40.777681]
[   40.786521]  Possible unsafe locking scenario:
[   40.786521]
[   40.792994]        CPU0                    CPU1
[   40.797793]        ----                    ----
[   40.802590]   lock(&mm->mmap_sem);
[   40.806393]                                lock(&sb->s_type->i_mutex_key);
[   40.813653]                                lock(&mm->mmap_sem);
[   40.819971]   lock(&sb->s_type->i_mutex_key);
[   40.824716]
[   40.824716]  *** DEADLOCK ***
[   40.824717]
[   40.831485] 1 lock held by puttups/2369:
[   40.835678]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811185f5>] sys_mmap_pgoff+0xf8/0x16a
[   40.844951]
[   40.844952] stack backtrace:
[   40.849862] Pid: 2369, comm: foo Tainted: G        W   3.1.0-0.rc4.git0.0.fc16.x86_64 #1
[   40.858781] Call Trace:
[   40.861500]  [<ffffffff814fa254>] print_circular_bug+0x1f8/0x209
[   40.867773]  [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
[   40.873859]  [<ffffffff811314b5>] ? deactivate_slab+0x28f/0x2b5
[   40.880047]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.886484]  [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
[   40.892146]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.898594]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.905034]  [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
[   40.911300]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.917748]  [<ffffffff81117f35>] ? mmap_region+0x1a9/0x46b
[   40.923583]  [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
[   40.929589]  [<ffffffff811e5e0c>] hugetlbfs_file_mmap+0x82/0x10a
[   40.935855]  [<ffffffff81118000>] mmap_region+0x274/0x46b
[   40.941517]  [<ffffffff811184a3>] do_mmap_pgoff+0x2ac/0x306
[   40.947353]  [<ffffffff81118615>] sys_mmap_pgoff+0x118/0x16a
[   40.953273]  [<ffffffff8108f58b>] ? trace_hardirqs_on_caller+0x121/0x158
[   40.960241]  [<ffffffff81253b5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   40.966946]  [<ffffffff81012888>] sys_mmap+0x22/0x24
[   40.972184]  [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b
Comment 11 Josh Boyer 2011-09-12 09:53:26 EDT
(In reply to comment #10)
> Doesn't seem to be fixed:
> 
> [   40.584440] =======================================================
> [   40.592303] [ INFO: possible circular locking dependency detected ]
> [   40.598613] 3.1.0-0.rc4.git0.0.fc16.x86_64 #1
> [   40.603012] -------------------------------------------------------
> [   40.609323] foo/2369 is trying to acquire lock:
> [   40.614239]  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff811e5e0c>]
> hugetlbfs_file_mmap+0x82/0x10a

Huh.  Odd.  What is the foo process?  Is it something that can be used to easily recreate this?
Comment 12 Albert Strasheim 2011-09-12 11:37:56 EDT
Unfortunately foo is internal code that I can't easily release.

It's doing something along the following lines when the error occurs:

1. open a file in /dev/hugepages
2. unlink the file
3. mmap the file descriptor with a 64 MiB length parameter

It does this for two buffers of the same size.

There is a small unit test for this code that might trigger the bug. I'll try to run it after the system has booted and send the binary and an strace if it works.
Comment 13 Albert Strasheim 2011-09-12 11:49:26 EDT
Created attachment 522726 [details]
Unit test that triggers warning
Comment 14 Albert Strasheim 2011-09-12 11:49:51 EDT
Created attachment 522727 [details]
strace log
Comment 15 Albert Strasheim 2011-09-12 11:53:57 EDT
# cat /etc/sysctl.d/vm.conf
vm.nr_hugepages = 8192
vm.overcommit_ratio = 90
vm.overcommit_memory = 2
vm.max_map_count = 131072

# cat /proc/meminfo
MemTotal:       165056552 kB
HugePages_Total:    8192
HugePages_Free:     8192

# cat /etc/systemd/system/dev-hugepages.mount
[Unit]
Description=Huge Pages File System
DefaultDependencies=no

[Mount]
What=hugetlbfs
Where=/dev/hugepages
Type=hugetlbfs
Options=mode=1777,uid=0,gid=0
Comment 16 Albert Strasheim 2011-09-13 04:24:19 EDT
I've reproduced this on multiple machines using the 6.out I provided.
Comment 17 Albert Strasheim 2012-03-07 05:43:09 EST
This is broken (again?) with 3.2.7-1.

https://lkml.org/lkml/2012/2/16/498
Comment 18 Albert Strasheim 2012-03-19 04:17:07 EDT
More discussion:

https://lkml.org/lkml/2012/3/1/77
https://lkml.org/lkml/2012/3/8/83
Comment 19 Dave Jones 2012-03-22 12:55:59 EDT
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.
Comment 22 Mikko Tiihonen 2012-03-23 06:40:10 EDT
The latest kernel in which I have seen this is below; most likely it is still present in the final 3.3 kernel too.

[ INFO: possible circular locking dependency detected ]
3.3.0-0.rc7.git0.3.fc17.x86_64 #1 Not tainted
-------------------------------------------------------
java/1043 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#13){+.+...}, at: [<ffffffff8129808a>] hugetlbfs_file_mmap+0x8a/0x120

but task is already holding lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff8117e4c3>] sys_mmap_pgoff+0x1f3/0x270

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){++++++}:
       [<ffffffff810cc841>] lock_acquire+0xa1/0x1e0
       [<ffffffff811740e9>] might_fault+0x89/0xb0
       [<ffffffff811d0976>] filldir+0x76/0xf0
       [<ffffffff811e4109>] dcache_readdir+0x69/0x240
       [<ffffffff811d0c78>] vfs_readdir+0xb8/0xf0
       [<ffffffff811d0daa>] sys_getdents+0x8a/0x100
       [<ffffffff816a6b69>] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#13){+.+...}:
       [<ffffffff810cb9f2>] __lock_acquire+0x1432/0x1bb0
       [<ffffffff810cc841>] lock_acquire+0xa1/0x1e0
       [<ffffffff81699f06>] mutex_lock_nested+0x76/0x3a0
       [<ffffffff8129808a>] hugetlbfs_file_mmap+0x8a/0x120
       [<ffffffff8117dd6a>] mmap_region+0x3ca/0x5a0
       [<ffffffff8117e285>] do_mmap_pgoff+0x345/0x390
       [<ffffffff8117e4e6>] sys_mmap_pgoff+0x216/0x270
       [<ffffffff8101dc72>] sys_mmap+0x22/0x30
       [<ffffffff816a6b69>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#13);
                               lock(&mm->mmap_sem);
  lock(&sb->s_type->i_mutex_key#13);

 *** DEADLOCK ***

1 lock held by java/1043:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8117e4c3>] sys_mmap_pgoff+0x1f3/0x270

stack backtrace:
Pid: 1043, comm: java Not tainted 3.3.0-0.rc7.git0.3.fc17.x86_64 #1
Call Trace:
 [<ffffffff8169256d>] print_circular_bug+0x1fb/0x20c
 [<ffffffff810cb9f2>] __lock_acquire+0x1432/0x1bb0
 [<ffffffff811a099f>] ? deactivate_slab+0x5bf/0x6c0
 [<ffffffff8117dca4>] ? mmap_region+0x304/0x5a0
 [<ffffffff810cc841>] lock_acquire+0xa1/0x1e0
 [<ffffffff8129808a>] ? hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff81699f06>] mutex_lock_nested+0x76/0x3a0
 [<ffffffff8129808a>] ? hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff8129808a>] ? hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff8117dca4>] ? mmap_region+0x304/0x5a0
 [<ffffffff8129808a>] hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff8117dd6a>] mmap_region+0x3ca/0x5a0
 [<ffffffff81297ded>] ? hugetlb_get_unmapped_area+0x14d/0x360
 [<ffffffff8117e285>] do_mmap_pgoff+0x345/0x390
 [<ffffffff8117e4c3>] ? sys_mmap_pgoff+0x1f3/0x270
 [<ffffffff8117e4e6>] sys_mmap_pgoff+0x216/0x270
 [<ffffffff8101dc72>] sys_mmap+0x22/0x30
Comment 23 Dave Jones 2012-03-23 10:50:51 EDT
Ah, yes. That is still showing up in Rawhide too. It's still being worked on, possibly for 3.4.
Comment 24 Josh Boyer 2012-09-07 12:09:10 EDT
I believe this was finally fixed in 3.4 or 3.5.  F17 is on 3.5 and F16 will be getting a backport hopefully soon, so closing this out.
