Bug 518226 - INFO: possible circular locking dependency detected - IBM Power 5
Summary: INFO: possible circular locking dependency detected - IBM Power 5
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: ppc64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-08-19 15:00 UTC by James Laska
Modified: 2013-09-02 06:39 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-05 06:35:17 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
failure.log (5.76 KB, text/plain)
2009-09-03 17:53 UTC, James Laska

Description James Laska 2009-08-19 15:00:27 UTC
Description of problem:

After completing a DVD install on an IBM Power5 system, a kernel "possible circular locking dependency" message appears on the console while the system is rebooting.

Version-Release number of selected component (if applicable):

 * 2.6.31-0.125.4.2.rc5.git2.fc12.ppc64

How reproducible:


Steps to Reproduce:
1. Perform a DVD install of Fedora-12-Alpha on an IBM Power5 system.

  
Actual results:

Running anaconda 12.15, the Fedora system installer - please wait.
14:21:29 Starting VNC...
14:21:30 The VNC server is now running.
14:21:31 

WARNING!!! VNC server running with NO PASSWORD!
You can use the vncpassword=<password> boot option
if you would like to secure the server.


14:21:31 Please manually connect your vnc client to ibm-505-lp1.test.redhat.com:1 (10.10.9.2) to begin the install.
//usr/bin/vncconfig: unable to open display ":1"
//usr/bin/vncconfig: unable to open display ":1"
14:21:37 Starting graphical installation.
XKB extension not present on :1
XKB extension not present on :1
disabling swap...
	/dev/mapper/vg_ibm505lp1-lv_swap
unmounting filesystems...
	/mnt/runtime done
	disabling /dev/loop0 LOOP_CLR_FD failed: 16
	/proc done
	/dev/pts done
	/sys done
	/mnt/stage2 done
	/selinux done
	/mnt/sysimage/boot done
	/mnt/sysimage/dev/pts done
	/mnt/sysimage/dev/shm done
	/mnt/sysimage/dev done
	/mnt/sysimage/proc done
	/mnt/sysimage/sys done
	/mnt/sysimage/selinux done
	/mnt/sysimage done
sending termination signals...done

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-0.125.4.2.rc5.git2.fc12.ppc64 #1
-------------------------------------------------------
sh/282 is trying to acquire lock:
 (&type->s_umount_key#25){++++..}, at: [<c0000000001d4580>] .deactivate_super+0xbc/0x11c

but task is already holding lock:
 (&bdev->bd_mutex){+.+.+.}, at: [<c00000000020b83c>] .__blkdev_put+0x5c/0x1d8

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&bdev->bd_mutex){+.+.+.}:
       [<c0000000000fc70c>] .__lock_acquire+0x13d8/0x1804
       [<c0000000000fcc48>] .lock_acquire+0x110/0x160
       [<c0000000006ec540>] .mutex_lock_nested+0xa4/0x4a8
       [<c00000000020b83c>] .__blkdev_put+0x5c/0x1d8
       [<c00000000020b9ec>] .blkdev_put+0x34/0x48
       [<c00000000020baac>] .close_bdev_exclusive+0x3c/0x54
       [<c0000000001d42b4>] .get_sb_bdev+0x10c/0x208
       [<c0000000002d2b44>] .isofs_get_sb+0x58/0x78
       [<c0000000001d3d10>] .vfs_kern_mount+0xe4/0x1b4
       [<c0000000001d3ea0>] .do_kern_mount+0x6c/0x148
       [<c0000000001f4584>] .do_mount+0x8cc/0x980
       [<c0000000002242d0>] .compat_sys_mount+0x214/0x288
       [<c0000000000085f0>] syscall_exit+0x0/0x40

-> #0 (&type->s_umount_key#25){++++..}:
       [<c0000000000fc3d8>] .__lock_acquire+0x10a4/0x1804
       [<c0000000000fcc48>] .lock_acquire+0x110/0x160
       [<c0000000006ed294>] .down_write+0x80/0x120
       [<c0000000001d4580>] .deactivate_super+0xbc/0x11c
       [<c0000000001f2364>] .mntput_no_expire+0x10c/0x180
       [<c0000000001d1f70>] .__fput+0x29c/0x2d4
       [<c0000000001d1ff4>] .fput+0x4c/0x60
       [<c0000000004ae2f4>] .loop_clr_fd+0x1bc/0x1f4
       [<c0000000004aee34>] .lo_release+0x70/0xc0
       [<c00000000020b8c0>] .__blkdev_put+0xe0/0x1d8
       [<c00000000020b9ec>] .blkdev_put+0x34/0x48
       [<c00000000020baac>] .close_bdev_exclusive+0x3c/0x54
       [<c0000000001d392c>] .kill_block_super+0x58/0x78
       [<c0000000001d45a4>] .deactivate_super+0xe0/0x11c
       [<c0000000001f2364>] .mntput_no_expire+0x10c/0x180
       [<c0000000001d1f70>] .__fput+0x29c/0x2d4
       [<c0000000001d1ff4>] .fput+0x4c/0x60
       [<c00000000019f18c>] .remove_vma+0x94/0xfc
       [<c00000000019f39c>] .exit_mmap+0x1a8/0x1e4
       [<c0000000000b9ccc>] .mmput+0xa0/0x154
       [<c0000000000bf778>] .exit_mm+0x180/0x1a8
       [<c0000000000c1ce8>] .do_exit+0x234/0x824
       [<c0000000000c237c>] .do_group_exit+0xa4/0xd8
       [<c0000000000d4bb4>] .get_signal_to_deliver+0x42c/0x490
       [<c000000000015cd4>] .do_signal+0x7c/0x350
       [<c000000000008c58>] do_work+0x24/0x28

other info that might help us debug this:

1 lock held by sh/282:
 #0:  (&bdev->bd_mutex){+.+.+.}, at: [<c00000000020b83c>] .__blkdev_put+0x5c/0x1d8

stack backtrace:
Call Trace:
[c0000000e9f1aa50] [c00000000001319c] .show_stack+0x98/0x188 (unreliable)
[c0000000e9f1ab00] [c0000000006f4560] .dump_stack+0x28/0x3c
[c0000000e9f1ab80] [c0000000000fada4] .print_circular_bug_tail+0xe0/0x108
[c0000000e9f1ac50] [c0000000000fc3d8] .__lock_acquire+0x10a4/0x1804
[c0000000e9f1ad80] [c0000000000fcc48] .lock_acquire+0x110/0x160
[c0000000e9f1ae50] [c0000000006ed294] .down_write+0x80/0x120
[c0000000e9f1aef0] [c0000000001d4580] .deactivate_super+0xbc/0x11c
[c0000000e9f1af90] [c0000000001f2364] .mntput_no_expire+0x10c/0x180
[c0000000e9f1b040] [c0000000001d1f70] .__fput+0x29c/0x2d4
[c0000000e9f1b100] [c0000000001d1ff4] .fput+0x4c/0x60
[c0000000e9f1b190] [c0000000004ae2f4] .loop_clr_fd+0x1bc/0x1f4
[c0000000e9f1b240] [c0000000004aee34] .lo_release+0x70/0xc0
[c0000000e9f1b2d0] [c00000000020b8c0] .__blkdev_put+0xe0/0x1d8
[c0000000e9f1b380] [c00000000020b9ec] .blkdev_put+0x34/0x48
[c0000000e9f1b410] [c00000000020baac] .close_bdev_exclusive+0x3c/0x54
[c0000000e9f1b4b0] [c0000000001d392c] .kill_block_super+0x58/0x78
[c0000000e9f1b540] [c0000000001d45a4] .deactivate_super+0xe0/0x11c
[c0000000e9f1b5e0] [c0000000001f2364] .mntput_no_expire+0x10c/0x180
[c0000000e9f1b690] [c0000000001d1f70] .__fput+0x29c/0x2d4
[c0000000e9f1b750] [c0000000001d1ff4] .fput+0x4c/0x60
[c0000000e9f1b7e0] [c00000000019f18c] .remove_vma+0x94/0xfc
[c0000000e9f1b870] [c00000000019f39c] .exit_mmap+0x1a8/0x1e4
[c0000000e9f1b920] [c0000000000b9ccc] .mmput+0xa0/0x154
[c0000000e9f1b9c0] [c0000000000bf778] .exit_mm+0x180/0x1a8
[c0000000e9f1ba70] [c0000000000c1ce8] .do_exit+0x234/0x824
[c0000000e9f1bb50] [c0000000000c237c] .do_group_exit+0xa4/0xd8
[c0000000e9f1bbf0] [c0000000000d4bb4] .get_signal_to_deliver+0x42c/0x490
[c0000000e9f1bce0] [c000000000015cd4] .do_signal+0x7c/0x350
[c0000000e9f1be30] [c000000000008c58] do_work+0x24/0x28
sending kill signals...done
rebooting system
md: stopping all md devices.
ipr 0000:d0:01.0: restoring config space at offset 0xf (was 0x100, writing 0x135)
ipr 0000:d0:01.0: restoring config space at offset 0xc (was 0x0, writing 0xc0800000)
ipr 0000:d0:01.0: restoring config space at offset 0x6 (was 0xc, writing 0xc000000c)
ipr 0000:d0:01.0: restoring config space at offset 0x4 (was 0x4, writing 0xc0900004)
ipr 0000:d0:01.0: restoring config space at offset 0x3 (was 0x80000000, writing 0x80009020)
ipr 0000:d0:01.0: restoring config space at offset 0x1 (was 0x4300000, writing 0x4300146)
Restarting system.

Expected results:

 * No circular locking dependency message


Additional info:

Comment 1 Pavan Naregundi 2009-08-26 08:56:25 UTC
I could reproduce this issue on Power6 (model: IBM,9117-MMA) during the DVD install.

Comment 2 James Laska 2009-09-03 17:53:05 UTC
Created attachment 359709
failure.log

Seeing this again on Power6 (Model#9117-MMA SN#10DD39C) with rawhide-2009-09-03

Comment 3 James Laska 2009-09-11 15:34:29 UTC
Reviewed at the F-12-Beta blocker bug meeting.  Unclear how to proceed; need impact guidance from the kernel team.  Updating priority from low->medium.

Comment 4 James Laska 2009-09-11 15:36:05 UTC
Oops, resetting priority to initial value.  The priority field is reserved for developer/maintainer use only.

Comment 5 John Poelstra 2009-09-11 15:38:57 UTC
This bug was reviewed at the Fedora 12 Beta Blocker bug meeting on 2009-09-11.  Still waiting for feedback from the maintainer to determine severity.  Will review at next week's meeting and hope for a maintainer response by then.

Comment 6 John Poelstra 2009-09-18 15:33:47 UTC
This bug was reviewed at the Fedora 12 Beta Blocker bug meeting on 2009-09-18.
Still waiting for feedback from the maintainer to determine severity.

Comment 7 Chuck Ebbert 2009-09-25 08:13:37 UTC
This should not be a real problem, as the locking dependency is in the loopback driver and the filesystem (the install DVD) is mounted read-only.

Comment 8 Adam Williamson 2009-09-25 15:15:18 UTC
Thanks, Chuck. As agreed at the 20090925 blocker review meeting, this is no longer a blocker in that case.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 9 Alexander Viro 2009-09-25 17:52:36 UTC
Cute...  It's a false positive, but lockdep annotations to deal with it will be interesting.  What's happening here:

1) if /dev/loopN has been marked autoclear (by mount -o loop without specifying which loop device to use; mount(8) then picks an unused one, sets it up and marks it for teardown on final close), closing one block device (/dev/loopN) can, of course, trigger closing another (/dev/something_real).  AFAICT, lockdep is told to shut up and not complain about bd_mutex deadlocks there: loop devices are taken out into a class of their own, the underlying device can't be a loop device, so everything's happy.

2) However, there's *another* false positive to deal with in that scenario.  Namely, if the block device node in question has lived on a filesystem that got lazy-unmounted and is only waiting for that device to get finally closed, we get

* umount of that -o loop mount, holding s_umount
* ... closing the block device we are using (/dev/loopN)
* ... triggering loop_clr_fd() to tear the /dev/loopN down
* ... closing the opened underlying device
* ... dropping the last reference that used to keep the lazy-umounted fs alive
* ... grabbing s_umount on that lazy-umounted fs

It *is* a false positive, but teaching lockdep about that might be interesting.

Comment 10 Alexander Viro 2009-09-25 18:22:55 UTC
Actually, there's another false positive in the same area: even the "closing a block device triggered by closing a block device is OK in this case" machinery is not foolproof.  Look:
1) create two ext3 images, say /tmp/big and /tmp/small
2) losetup /dev/loop0 /tmp/big
3) mount /dev/loop0 /mnt/1
4) cp /tmp/small /mnt/1/small
5) mount -o loop /mnt/1/small /mnt/2 (uses e.g. /dev/loop1)
6) umount -l /mnt/1
7) umount /mnt/2
After (5) we have
* loop0 attached to /tmp/big
* loop1 attached to a copy of /tmp/small sitting inside /tmp/big
(6) takes the filesystem loop-mounted from /tmp/big and removes it from the mount tree.  It's still busy (we have loop1 attached to a file on it), but the only thing that keeps it from shutdown is loop1.
(7) unmounts /mnt/2 and closes /dev/loop1.  Since loop1 has been marked tear-down-on-close, we get the following chain
* close /dev/loop1
* ... loop_clr_fd() to tear it down
* ... close the underlying file
* ... thus giving up the only remaining reference to fs mounted from /dev/loop0
* ... which gets it finally shut down
* ... closing /dev/loop0 in process
... and lockdep will scream bloody murder over the close of a loopback device calling a close of another loopback device.  The existing trickery ("the underlying device has got to be a non-loop one, so we are fine") doesn't work, since here we have an underlying *file*, which lives on another loop device and happens to be the only thing keeping that other loop device from being closed.

Comment 12 Pavan Naregundi 2009-10-21 09:12:54 UTC
Still seeing this issue on the F12 Beta release.

Any updates on this?

Comment 13 Vedran Miletić 2009-10-26 14:57:26 UTC
Changed summary to a much more general one.

Comment 14 Bug Zapper 2009-11-16 11:33:00 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Bug Zapper 2010-11-04 10:27:14 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 16 Bug Zapper 2010-12-05 06:35:17 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 17 Fedora Update System 2011-08-16 22:11:11 UTC
python-shapely-1.2.12-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/python-shapely-1.2.12-1.fc16

Comment 18 Fedora Update System 2011-08-16 22:11:26 UTC
python-shapely-1.2.12-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/python-shapely-1.2.12-1.fc15

Comment 19 Volker Fröhlich 2011-08-16 22:18:12 UTC
Sorry, please ignore the last comments!

