Bug 730998

Summary:

possible circular locking dependency on sb->s_type->i_mutex_key

Product:

[Fedora] Fedora

Reporter:

Mikko Tiihonen <mikko.tiihonen>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED NEXTRELEASE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

CC:

bigslowfat, fullung, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mads

Target Milestone:

---

Keywords:

Reopened

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

kernel-3.1.0-0.rc4.git0.0.fc16

Doc Type:

Bug Fix

Last Closed:

2012-09-07 16:09:10 UTC

Attachments:

Description	Flags
Unit test that triggers warning	none
strace log	none

Description Mikko Tiihonen 2011-08-16 14:03:19 UTC

Description of problem:
[ INFO: possible circular locking dependency detected ]
3.0.1-3.fc16.x86_64 #1
-------------------------------------------------------
find/645 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

but task is already holding lock:
 (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
       [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
       [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
       [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
       [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
       [<ffffffff81111557>] mmap_region+0x258/0x432
       [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
       [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
       [<ffffffff8100c858>] sys_mmap+0x22/0x24
       [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

-> #0 (&mm->mmap_sem){++++++}:
       [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
       [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
       [<ffffffff81109541>] might_fault+0x89/0xac
       [<ffffffff81149cff>] filldir+0x6f/0xc7
       [<ffffffff811586ea>] dcache_readdir+0x67/0x205
       [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
       [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
       [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key);
                               lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by find/645:
 #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4

stack backtrace:
Pid: 645, comm: find Not tainted 3.0.1-3.fc16.x86_64 #1
Call Trace:
 [<ffffffff814d3571>] print_circular_bug+0x1f8/0x209
 [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
 [<ffffffff81087ae3>] ? register_lock_class+0x1e/0x2d3
 [<ffffffff8100e9fd>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff8100e9fd>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff81109514>] ? might_fault+0x5c/0xac
 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
 [<ffffffff81109514>] ? might_fault+0x5c/0xac
 [<ffffffff8108aefc>] ? mark_held_locks+0x4b/0x6d
 [<ffffffff81109541>] might_fault+0x89/0xac
 [<ffffffff81109514>] ? might_fault+0x5c/0xac
 [<ffffffff81149cff>] filldir+0x6f/0xc7
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff811586ea>] dcache_readdir+0x67/0x205
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff81149c90>] ? sys_ioctl+0x7b/0x7b
 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b


filesystem is btrfs
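
For readers unfamiliar with this class of warning, the report above boils down to two code paths taking the same pair of locks in opposite order. The following is a minimal user-space sketch of that idea, not the kernel's lockdep implementation: it records "B was taken while A was held" edges and flags an edge that closes a cycle. The names `lock_class`, `record_order` and `reaches` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks in the report. */
enum lock_class { MMAP_SEM, I_MUTEX, NR_LOCK_CLASSES };

/* order[a][b] is true once some path has taken b while holding a. */
static bool order[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
bool reaches(enum lock_class from, enum lock_class to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_LOCK_CLASSES; next++) {
        if (order[from][next] && !seen[next] &&
            reaches((enum lock_class)next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record that `acquired` was taken while `held` was held.
 * Returns true if `held` was already reachable from `acquired`,
 * i.e. the new edge closes a cycle (a possible ABBA deadlock).
 */
bool record_order(enum lock_class held, enum lock_class acquired)
{
    bool seen[NR_LOCK_CLASSES];
    memset(seen, 0, sizeof(seen));
    bool cycle = reaches(acquired, held, seen);
    order[held][acquired] = true;
    return cycle;
}
```

In terms of the trace, the mmap path (hugetlbfs_file_mmap called under mmap_sem) corresponds to `record_order(MMAP_SEM, I_MUTEX)`, and the readdir path (filldir faulting under i_mutex) corresponds to `record_order(I_MUTEX, MMAP_SEM)`, which is the call that would report the cycle.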

Comment 1 Josh Boyer 2011-08-22 15:38:25 UTC

This was reported here:

https://lkml.org/lkml/2011/4/15/272

and the thread kinda died.  I've poked upstream again and maybe we'll get some more focus.

Comment 2 Josh Boyer 2011-08-25 13:50:37 UTC

I've posted a patch for this upstream:

https://lkml.org/lkml/2011/8/25/144

Comment 3 Josh Boyer 2011-08-26 12:47:11 UTC

This will be fixed in the next F16 build.

Comment 4 Fedora Update System 2011-08-30 11:08:58 UTC

kernel-3.1.0-0.rc4.git0.0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc4.git0.0.fc16

Comment 5 bsfmig 2011-08-30 12:08:55 UTC

Hope next mirror sync can fix the issue. Seen on a 3.1.0-0.rc3 kernel.
[   60.496540] 
[   60.496542] =======================================================
[   60.496607] [ INFO: possible circular locking dependency detected ]
[   60.496645] 3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[   60.496672] -------------------------------------------------------
[   60.496710] dconf-service/1546 is trying to acquire lock:
[   60.496743]  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
[   60.496830] 
[   60.496831] but task is already holding lock:
[   60.496870]  (&mm->mmap_sem){++++++}, at: [<ffffffff81116f9a>] sys_munmap+0x3b/0x60
[   60.496936] 
[   60.496936] which lock already depends on the new lock.
[   60.496937] 
[   60.496991] 
[   60.496991] the existing dependency chain (in reverse order) is:
[   60.497039] 
[   60.497039] -> #1 (&mm->mmap_sem){++++++}:
[   60.497095]        [<ffffffff8108eff1>] lock_acquire+0xf3/0x13e
[   60.497137]        [<ffffffff8110fbd7>] might_fault+0x89/0xac
[   60.497185]        [<ffffffff81151bfb>] filldir+0x6f/0xc7
[   60.497225]        [<ffffffff811a5347>] call_filldir+0x96/0xc0
[   60.497267]        [<ffffffff811a5680>] ext4_readdir+0x1bd/0x548
[   60.497309]        [<ffffffff81151e50>] vfs_readdir+0x7b/0xb4
[   60.497349]        [<ffffffff81151f6f>] sys_getdents+0x7e/0xd1
[   60.497389]        [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
[   60.497436] 
[   60.497436] -> #0 (&sb->s_type->i_mutex_key#13){+.+.+.}:
[   60.497504]        [<ffffffff8108e81e>] __lock_acquire+0xa1a/0xcf7
[   60.497545]        [<ffffffff8108eff1>] lock_acquire+0xf3/0x13e
[   60.497586]        [<ffffffff815025db>] __mutex_lock_common+0x5d/0x39a
[   60.497631]        [<ffffffff81502a27>] mutex_lock_nested+0x40/0x45
[   60.497674]        [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
[   60.497715]        [<ffffffff81157cf2>] evict+0x99/0x153
[   60.497754]        [<ffffffff81157f3d>] iput+0x191/0x19a
[   60.497791]        [<ffffffff81154bf5>] dentry_kill+0x123/0x145
[   60.497832]        [<ffffffff81155004>] dput+0xf7/0x107
[   60.497868]        [<ffffffff811440db>] fput+0x1dd/0x1f5
[   60.497908]        [<ffffffff811158ee>] remove_vma+0x56/0x87
[   60.497947]        [<ffffffff81116afd>] do_munmap+0x2f2/0x30b
[   60.497987]        [<ffffffff81116fa8>] sys_munmap+0x49/0x60
[   60.498026]        [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b
[   60.498070] 
[   60.498070] other info that might help us debug this:
[   60.498071] 
[   60.498124]  Possible unsafe locking scenario:
[   60.498125] 
[   60.498166]        CPU0                    CPU1
[   60.498194]        ----                    ----
[   60.499322]   lock(&mm->mmap_sem);
[   60.500440]                                lock(&sb->s_type->i_mutex_key);
[   60.501599]                                lock(&mm->mmap_sem);
[   60.502714]   lock(&sb->s_type->i_mutex_key);
[   60.503820] 
[   60.503820]  *** DEADLOCK ***
[   60.503821] 
[   60.508404] 1 lock held by dconf-service/1546:
[   60.509729]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81116f9a>] sys_munmap+0x3b/0x60
[   60.511027] 
[   60.511029] stack backtrace:
[   60.513393] Pid: 1546, comm: dconf-service Not tainted 3.1.0-0.rc3.git0.0.fc16.x86_64 #1
[   60.513396] Call Trace:
[   60.513406]  [<ffffffff814f9b74>] print_circular_bug+0x1f8/0x209
[   60.513412]  [<ffffffff8108e81e>] __lock_acquire+0xa1a/0xcf7
[   60.513416]  [<ffffffff8108be17>] ? register_lock_class+0x1e/0x2d3
[   60.513422]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513425]  [<ffffffff8108eff1>] lock_acquire+0xf3/0x13e
[   60.513429]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513432]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513438]  [<ffffffff815025db>] __mutex_lock_common+0x5d/0x39a
[   60.513441]  [<ffffffff811ae562>] ? ext4_evict_inode+0x76/0x33c
[   60.513446]  [<ffffffff810152af>] ? native_sched_clock+0x34/0x36
[   60.513450]  [<ffffffff810152ba>] ? sched_clock+0x9/0xd
[   60.513453]  [<ffffffff8108b885>] ? trace_hardirqs_off+0xd/0xf
[   60.513457]  [<ffffffff8108bdf0>] ? lock_release_holdtime.part.9+0x59/0x62
[   60.513461]  [<ffffffff81502a27>] mutex_lock_nested+0x40/0x45
[   60.513464]  [<ffffffff811ae562>] ext4_evict_inode+0x76/0x33c
[   60.513469]  [<ffffffff81157cf2>] evict+0x99/0x153
[   60.513472]  [<ffffffff81157f3d>] iput+0x191/0x19a
[   60.513477]  [<ffffffff81154bf5>] dentry_kill+0x123/0x145
[   60.513481]  [<ffffffff81155004>] dput+0xf7/0x107
[   60.513486]  [<ffffffff811440db>] fput+0x1dd/0x1f5
[   60.513491]  [<ffffffff811158ee>] remove_vma+0x56/0x87
[   60.513494]  [<ffffffff81116afd>] do_munmap+0x2f2/0x30b
[   60.513498]  [<ffffffff81116fa8>] sys_munmap+0x49/0x60
[   60.513503]  [<ffffffff8150b082>] system_call_fastpath+0x16/0x1b

Comment 6 Josh Boyer 2011-08-30 12:29:05 UTC

(In reply to comment #5)

That's bug 732572, not this one.

Comment 7 Fedora Update System 2011-08-30 20:40:04 UTC

Package kernel-3.1.0-0.rc4.git0.0.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.1.0-0.rc4.git0.0.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-3.1.0-0.rc4.git0.0.fc16
then log in and leave karma (feedback).

Comment 8 Josh Boyer 2011-09-01 19:51:51 UTC

*** Bug 735206 has been marked as a duplicate of this bug. ***

Comment 9 Fedora Update System 2011-09-09 17:09:52 UTC

kernel-3.1.0-0.rc4.git0.0.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Albert Strasheim 2011-09-12 11:48:35 UTC

Doesn't seem to be fixed:

[   40.584440] =======================================================
[   40.592303] [ INFO: possible circular locking dependency detected ]
[   40.598613] 3.1.0-0.rc4.git0.0.fc16.x86_64 #1
[   40.603012] -------------------------------------------------------
[   40.609323] foo/2369 is trying to acquire lock:
[   40.614239]  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff811e5e0c>] hugetlbfs_file_mmap+0x82/0x10a
[   40.624291]
[   40.624291] but task is already holding lock:
[   40.630230]  (&mm->mmap_sem){++++++}, at: [<ffffffff811185f5>] sys_mmap_pgoff+0xf8/0x16a
[   40.638561]
[   40.638561] which lock already depends on the new lock.
[   40.638562]
[   40.646903]
[   40.646903] the existing dependency chain (in reverse order) is:
[   40.654933]
[   40.654933] -> #1 (&mm->mmap_sem){++++++}:
[   40.661251]        [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
[   40.667501]        [<ffffffff8110fffb>] might_fault+0x89/0xac
[   40.673576]        [<ffffffff811520fe>] filldir64+0x7f/0xcd
[   40.679480]        [<ffffffff81160a75>] dcache_readdir+0x64/0x1fc
[   40.685901]        [<ffffffff8115227c>] vfs_readdir+0x7b/0xb4
[   40.691988]        [<ffffffff8115246c>] sys_getdents64+0x7e/0xca
[   40.698324]        [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b
[   40.705179]
[   40.705179] -> #0 (&sb->s_type->i_mutex_key#15){+.+.+.}:
[   40.712774]        [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
[   40.719285]        [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
[   40.725532]        [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
[   40.732387]        [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
[   40.738981]        [<ffffffff811e5e0c>] hugetlbfs_file_mmap+0x82/0x10a
[   40.745837]        [<ffffffff81118000>] mmap_region+0x274/0x46b
[   40.752086]        [<ffffffff811184a3>] do_mmap_pgoff+0x2ac/0x306
[   40.758507]        [<ffffffff81118615>] sys_mmap_pgoff+0x118/0x16a
[   40.765017]        [<ffffffff81012888>] sys_mmap+0x22/0x24
[   40.770825]        [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b
[   40.777680]
[   40.777680] other info that might help us debug this:
[   40.777681]
[   40.786521]  Possible unsafe locking scenario:
[   40.786521]
[   40.792994]        CPU0                    CPU1
[   40.797793]        ----                    ----
[   40.802590]   lock(&mm->mmap_sem);
[   40.806393]                                lock(&sb->s_type->i_mutex_key);
[   40.813653]                                lock(&mm->mmap_sem);
[   40.819971]   lock(&sb->s_type->i_mutex_key);
[   40.824716]
[   40.824716]  *** DEADLOCK ***
[   40.824717]
[   40.831485] 1 lock held by puttups/2369:
[   40.835678]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff811185f5>] sys_mmap_pgoff+0xf8/0x16a
[   40.844951]
[   40.844952] stack backtrace:
[   40.849862] Pid: 2369, comm: foo Tainted: G        W   3.1.0-0.rc4.git0.0.fc16.x86_64 #1
[   40.858781] Call Trace:
[   40.861500]  [<ffffffff814fa254>] print_circular_bug+0x1f8/0x209
[   40.867773]  [<ffffffff8108e963>] __lock_acquire+0xa2f/0xd0c
[   40.873859]  [<ffffffff811314b5>] ? deactivate_slab+0x28f/0x2b5
[   40.880047]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.886484]  [<ffffffff8108f143>] lock_acquire+0xf3/0x13e
[   40.892146]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.898594]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.905034]  [<ffffffff81502cb3>] __mutex_lock_common+0x5d/0x39a
[   40.911300]  [<ffffffff811e5e0c>] ? hugetlbfs_file_mmap+0x82/0x10a
[   40.917748]  [<ffffffff81117f35>] ? mmap_region+0x1a9/0x46b
[   40.923583]  [<ffffffff815030ff>] mutex_lock_nested+0x40/0x45
[   40.929589]  [<ffffffff811e5e0c>] hugetlbfs_file_mmap+0x82/0x10a
[   40.935855]  [<ffffffff81118000>] mmap_region+0x274/0x46b
[   40.941517]  [<ffffffff811184a3>] do_mmap_pgoff+0x2ac/0x306
[   40.947353]  [<ffffffff81118615>] sys_mmap_pgoff+0x118/0x16a
[   40.953273]  [<ffffffff8108f58b>] ? trace_hardirqs_on_caller+0x121/0x158
[   40.960241]  [<ffffffff81253b5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   40.966946]  [<ffffffff81012888>] sys_mmap+0x22/0x24
[   40.972184]  [<ffffffff8150b742>] system_call_fastpath+0x16/0x1b

Comment 11 Josh Boyer 2011-09-12 13:53:26 UTC

(In reply to comment #10)
> Doesn't seem to be fixed:
> 
> [   40.584440] =======================================================
> [   40.592303] [ INFO: possible circular locking dependency detected ]
> [   40.598613] 3.1.0-0.rc4.git0.0.fc16.x86_64 #1
> [   40.603012] -------------------------------------------------------
> [   40.609323] foo/2369 is trying to acquire lock:
> [   40.614239]  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff811e5e0c>]
> hugetlbfs_file_mmap+0x82/0x10a

Huh.  Odd.  What is the foo process?  Is it something that can be used to easily recreate this?

Comment 12 Albert Strasheim 2011-09-12 15:37:56 UTC

Unfortunately foo is internal code that I can't easily release.

It's doing something along the following lines when the error occurs:

1. open a file in /dev/hugepages
2. unlink the file
3. mmap the file descriptor with a 64 MiB length parameter

It does this for two buffers of the same size.
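
The three steps above can be sketched roughly as follows. This is not the internal code and not the attached unit test; the open and mmap flags (`O_CREAT | O_RDWR`, `PROT_READ | PROT_WRITE`, `MAP_SHARED`) and the helper name `map_unlinked_file` are assumptions made for illustration only.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create `path`, unlink it, then map `length` bytes of the still-open
 * descriptor. Returns the mapping, or MAP_FAILED on any error.
 * All flags below are assumptions, not taken from the original program.
 */
void *map_unlinked_file(const char *path, size_t length)
{
    /* Step 1: open (create) the file. */
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return MAP_FAILED;

    /* Step 2: unlink it; the open descriptor keeps the inode alive. */
    if (unlink(path) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* Step 3: map the descriptor. */
    void *buf = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping, if any, holds its own reference to the file. */
    close(fd);
    return buf;
}
```

A caller mirroring the description would invoke this twice with a 64 MiB length (`64u << 20`) and two different file names under /dev/hugepages; the file names themselves are not given in this report.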

There is a small unit test for this code that might trigger the bug. I'll try to run it after the system has booted and send the binary and an strace if it works.

Comment 13 Albert Strasheim 2011-09-12 15:49:26 UTC

Created attachment 522726 [details]
Unit test that triggers warning

Comment 14 Albert Strasheim 2011-09-12 15:49:51 UTC

Created attachment 522727 [details]
strace log

Comment 15 Albert Strasheim 2011-09-12 15:53:57 UTC

# cat /etc/sysctl.d/vm.conf
vm.nr_hugepages = 8192
vm.overcommit_ratio = 90
vm.overcommit_memory = 2
vm.max_map_count = 131072

# cat /proc/meminfo
MemTotal:       165056552 kB
HugePages_Total:    8192
HugePages_Free:     8192

# cat /etc/systemd/system/dev-hugepages.mount
[Unit]
Description=Huge Pages File System
DefaultDependencies=no

[Mount]
What=hugetlbfs
Where=/dev/hugepages
Type=hugetlbfs
Options=mode=1777,uid=0,gid=0

Comment 16 Albert Strasheim 2011-09-13 08:24:19 UTC

I've reproduced this on multiple machines using the 6.out I provided.

Comment 17 Albert Strasheim 2012-03-07 10:43:09 UTC

This is broken (again?) with 3.2.7-1.

https://lkml.org/lkml/2012/2/16/498

Comment 18 Albert Strasheim 2012-03-19 08:17:07 UTC

More discussion:

https://lkml.org/lkml/2012/3/1/77
https://lkml.org/lkml/2012/3/8/83

Comment 19 Dave Jones 2012-03-22 16:55:59 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 22 Mikko Tiihonen 2012-03-23 10:40:10 UTC

The most recent occurrence I have seen is the one below; most likely it is still present in the final 3.3 kernel too.

[ INFO: possible circular locking dependency detected ]
3.3.0-0.rc7.git0.3.fc17.x86_64 #1 Not tainted
-------------------------------------------------------
java/1043 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#13){+.+...}, at: [<ffffffff8129808a>] hugetlbfs_file_mmap+0x8a/0x120

but task is already holding lock:
 (&mm->mmap_sem){++++++}, at: [<ffffffff8117e4c3>] sys_mmap_pgoff+0x1f3/0x270

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&mm->mmap_sem){++++++}:
       [<ffffffff810cc841>] lock_acquire+0xa1/0x1e0
       [<ffffffff811740e9>] might_fault+0x89/0xb0
       [<ffffffff811d0976>] filldir+0x76/0xf0
       [<ffffffff811e4109>] dcache_readdir+0x69/0x240
       [<ffffffff811d0c78>] vfs_readdir+0xb8/0xf0
       [<ffffffff811d0daa>] sys_getdents+0x8a/0x100
       [<ffffffff816a6b69>] system_call_fastpath+0x16/0x1b

-> #0 (&sb->s_type->i_mutex_key#13){+.+...}:
       [<ffffffff810cb9f2>] __lock_acquire+0x1432/0x1bb0
       [<ffffffff810cc841>] lock_acquire+0xa1/0x1e0
       [<ffffffff81699f06>] mutex_lock_nested+0x76/0x3a0
       [<ffffffff8129808a>] hugetlbfs_file_mmap+0x8a/0x120
       [<ffffffff8117dd6a>] mmap_region+0x3ca/0x5a0
       [<ffffffff8117e285>] do_mmap_pgoff+0x345/0x390
       [<ffffffff8117e4e6>] sys_mmap_pgoff+0x216/0x270
       [<ffffffff8101dc72>] sys_mmap+0x22/0x30
       [<ffffffff816a6b69>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#13);
                               lock(&mm->mmap_sem);
  lock(&sb->s_type->i_mutex_key#13);

 *** DEADLOCK ***

1 lock held by java/1043:
 #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8117e4c3>] sys_mmap_pgoff+0x1f3/0x270

stack backtrace:
Pid: 1043, comm: java Not tainted 3.3.0-0.rc7.git0.3.fc17.x86_64 #1
Call Trace:
 [<ffffffff8169256d>] print_circular_bug+0x1fb/0x20c
 [<ffffffff810cb9f2>] __lock_acquire+0x1432/0x1bb0
 [<ffffffff811a099f>] ? deactivate_slab+0x5bf/0x6c0
 [<ffffffff8117dca4>] ? mmap_region+0x304/0x5a0
 [<ffffffff810cc841>] lock_acquire+0xa1/0x1e0
 [<ffffffff8129808a>] ? hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff81699f06>] mutex_lock_nested+0x76/0x3a0
 [<ffffffff8129808a>] ? hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff8129808a>] ? hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff8117dca4>] ? mmap_region+0x304/0x5a0
 [<ffffffff8129808a>] hugetlbfs_file_mmap+0x8a/0x120
 [<ffffffff8117dd6a>] mmap_region+0x3ca/0x5a0
 [<ffffffff81297ded>] ? hugetlb_get_unmapped_area+0x14d/0x360
 [<ffffffff8117e285>] do_mmap_pgoff+0x345/0x390
 [<ffffffff8117e4c3>] ? sys_mmap_pgoff+0x1f3/0x270
 [<ffffffff8117e4e6>] sys_mmap_pgoff+0x216/0x270
 [<ffffffff8101dc72>] sys_mmap+0x22/0x30

Comment 23 Dave Jones 2012-03-23 14:50:51 UTC

Ah, yes. That is still showing up in rawhide too. It's still being worked on, possibly for 3.4.

Comment 24 Josh Boyer 2012-09-07 16:09:10 UTC

I believe this was finally fixed in 3.4 or 3.5.  F17 is on 3.5 and F16 will be getting a backport hopefully soon, so closing this out.