Bug 485230 - INFO: possible circular locking dependency detected - when disconnecting from iscsi nodes
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-02-12 09:32 EST by James Laska
Modified: 2013-09-02 02:31 EDT
CC: 5 users

Last Closed: 2009-02-27 08:53:35 EST

Attachments: None
Description James Laska 2009-02-12 09:32:45 EST
Description of problem:

A kernel lockdep warning ("possible circular locking dependency detected") is issued when disconnecting from iSCSI nodes.
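
For context: lockdep emits this class of report whenever two lock classes have ever been acquired in opposite orders on any two code paths, whether or not those paths can actually run concurrently. A minimal sketch of the pattern, using two hypothetical mutexes rather than the actual iSCSI locks:

#include <linux/mutex.h>

static DEFINE_MUTEX(a);
static DEFINE_MUTEX(b);

static void path_one(void)
{
	mutex_lock(&a);
	mutex_lock(&b);		/* lockdep records the ordering a -> b */
	mutex_unlock(&b);
	mutex_unlock(&a);
}

static void path_two(void)
{
	mutex_lock(&b);
	mutex_lock(&a);		/* b -> a closes the cycle: lockdep warns */
	mutex_unlock(&a);
	mutex_unlock(&b);
}

The report below has the same shape, except the cycle is built up through a longer chain of locks (#0 through #5) ending at the host's scan workqueue.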

Version-Release number of selected component (if applicable):
2.6.29-0.110.rc4.git3.fc11.x86_64

How reproducible:


Steps to Reproduce:
1. Connect to an iSCSI target device
2. Disconnect all connected iSCSI devices
  
Actual results:

sh-4.0# iscsiadm -m node -U all
Logging out of session [sid: 1, target: iqn.2009-01.com.example:for.all, portal: 192.168.33.2,3260]
Logging out of session [sid: 2, target: iqn.2009-02.com.example:for.all, portal: 192.168.33.2,3260]
Logging out of session [sid: 3, target: iqn.2009-03.com.example:for.all, portal: 192.168.33.2,3260]

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-0.110.rc4.git3.fc11.x86_64 #1
-------------------------------------------------------
iscsid/1470 is trying to acquire lock:
 ((ihost->scan_workq_name)){--..}, at: [<ffffffff810595bf>] cleanup_workqueue_thread+0x16/0x7f

but task is already holding lock:
 (cpu_add_remove_lock){--..}, at: [<ffffffff81049ada>] cpu_maps_update_begin+0x12/0x14

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #5 (cpu_add_remove_lock){--..}:
       [<ffffffff8106c7ff>] __lock_acquire+0xa8d/0xc0d
       [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
       [<ffffffff8137a1ea>] __mutex_lock_common+0xfa/0x394
       [<ffffffff8137a533>] mutex_lock_nested+0x37/0x3c
       [<ffffffff81049ada>] cpu_maps_update_begin+0x12/0x14
       [<ffffffff81059663>] destroy_workqueue+0x2c/0x95
       [<ffffffffa017347b>] iscsi_remove_host+0x1f/0x23 [scsi_transport_iscsi]
       [<ffffffff8122c5c8>] transport_remove_classdev+0x22/0x57
       [<ffffffff8122c1b3>] attribute_container_device_trigger+0x7a/0xba
       [<ffffffff8122c580>] transport_remove_device+0x10/0x12
       [<ffffffff8123762a>] scsi_remove_host+0xe8/0x11a
       [<ffffffffa0182fdf>] iscsi_host_remove+0xf7/0x104 [libiscsi]
       [<ffffffffa0190d2a>] iscsi_sw_tcp_session_destroy+0x4f/0x5c [iscsi_tcp]
       [<ffffffffa0175424>] iscsi_if_rx+0x151/0x6d9 [scsi_transport_iscsi]
       [<ffffffff812fcc4e>] netlink_unicast+0x11f/0x188
       [<ffffffff812fcf38>] netlink_sendmsg+0x281/0x290
       [<ffffffff812d4f9c>] __sock_sendmsg+0x5c/0x67
       [<ffffffff812d58d3>] sock_sendmsg+0xc7/0xe0
       [<ffffffff812d5ac6>] sys_sendmsg+0x1da/0x241
       [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #4 (attribute_container_mutex){--..}:
       [<ffffffff8106c7ff>] __lock_acquire+0xa8d/0xc0d
       [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
       [<ffffffff8137a1ea>] __mutex_lock_common+0xfa/0x394
       [<ffffffff8137a533>] mutex_lock_nested+0x37/0x3c
       [<ffffffff8122c308>] attribute_container_add_device+0x2c/0x136
       [<ffffffff8122c644>] transport_setup_device+0x10/0x12
       [<ffffffff812406ba>] scsi_sysfs_device_initialize+0xd5/0x123
       [<ffffffff8123dd1e>] scsi_alloc_sdev+0x1cf/0x230
       [<ffffffff8123df31>] scsi_probe_and_add_lun+0x11f/0xb8b
       [<ffffffff8123f858>] __scsi_add_device+0xb8/0xe4
       [<ffffffff8125530b>] ata_scsi_scan_host+0x74/0x16e
       [<ffffffff81253800>] async_port_probe+0xab/0xb7
       [<ffffffff8106284f>] __async_schedule+0x82/0x106
       [<ffffffff810628ee>] async_schedule+0x10/0x12
       [<ffffffff8124f2fd>] ata_host_register+0x1ee/0x20b
       [<ffffffff8125b640>] ata_pci_sff_activate_host+0x1a0/0x1d6
       [<ffffffff8136e5b9>] piix_init_one+0x6d2/0x74b
       [<ffffffff8119e4cf>] local_pci_probe+0x12/0x16
       [<ffffffff81059072>] do_work_for_cpu+0x13/0x1b
       [<ffffffff81059237>] run_workqueue+0xf8/0x1f8
       [<ffffffff81059421>] worker_thread+0xea/0xfb
       [<ffffffff8105cc2e>] kthread+0x48/0x73
       [<ffffffff810125aa>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #3 (&shost->scan_mutex){--..}:
       [<ffffffff8106c7ff>] __lock_acquire+0xa8d/0xc0d
       [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
       [<ffffffff8137a1ea>] __mutex_lock_common+0xfa/0x394
       [<ffffffff8137a533>] mutex_lock_nested+0x37/0x3c
       [<ffffffff8123f75b>] scsi_scan_target+0x56/0x9b
       [<ffffffffa017435e>] iscsi_user_scan_session+0xb7/0xce [scsi_transport_iscsi]
       [<ffffffff81226f23>] device_for_each_child+0x2c/0x61
       [<ffffffffa017428c>] iscsi_user_scan+0x28/0x2a [scsi_transport_iscsi]
       [<ffffffff812400bf>] store_scan+0x9b/0xc6
       [<ffffffff8122669f>] dev_attr_store+0x1e/0x20
       [<ffffffff81128b7b>] sysfs_write_file+0xf6/0x132
       [<ffffffff810da8e2>] vfs_write+0xa9/0x106
       [<ffffffff810da9f5>] sys_write+0x45/0x69
       [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #2 (&ihost->mutex){--..}:
       [<ffffffff8106c7ff>] __lock_acquire+0xa8d/0xc0d
       [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
       [<ffffffff8137a1ea>] __mutex_lock_common+0xfa/0x394
       [<ffffffff8137a533>] mutex_lock_nested+0x37/0x3c
       [<ffffffffa0174f85>] __iscsi_unbind_session+0x3a/0x9c [scsi_transport_iscsi]
       [<ffffffff81059237>] run_workqueue+0xf8/0x1f8
       [<ffffffff81059421>] worker_thread+0xea/0xfb
       [<ffffffff8105cc2e>] kthread+0x48/0x73
       [<ffffffff810125aa>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&session->unbind_work){--..}:
       [<ffffffff8106c7ff>] __lock_acquire+0xa8d/0xc0d
       [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
       [<ffffffff81059231>] run_workqueue+0xf2/0x1f8
       [<ffffffff81059421>] worker_thread+0xea/0xfb
       [<ffffffff8105cc2e>] kthread+0x48/0x73
       [<ffffffff810125aa>] child_rip+0xa/0x20
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 ((ihost->scan_workq_name)){--..}:
       [<ffffffff8106c68a>] __lock_acquire+0x918/0xc0d
       [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
       [<ffffffff810595e6>] cleanup_workqueue_thread+0x3d/0x7f
       [<ffffffff81059698>] destroy_workqueue+0x61/0x95
       [<ffffffffa017347b>] iscsi_remove_host+0x1f/0x23 [scsi_transport_iscsi]
       [<ffffffff8122c5c8>] transport_remove_classdev+0x22/0x57
       [<ffffffff8122c1b3>] attribute_container_device_trigger+0x7a/0xba
       [<ffffffff8122c580>] transport_remove_device+0x10/0x12
       [<ffffffff8123762a>] scsi_remove_host+0xe8/0x11a
       [<ffffffffa0182fdf>] iscsi_host_remove+0xf7/0x104 [libiscsi]
       [<ffffffffa0190d2a>] iscsi_sw_tcp_session_destroy+0x4f/0x5c [iscsi_tcp]
       [<ffffffffa0175424>] iscsi_if_rx+0x151/0x6d9 [scsi_transport_iscsi]
       [<ffffffff812fcc4e>] netlink_unicast+0x11f/0x188
       [<ffffffff812fcf38>] netlink_sendmsg+0x281/0x290
       [<ffffffff812d4f9c>] __sock_sendmsg+0x5c/0x67
       [<ffffffff812d58d3>] sock_sendmsg+0xc7/0xe0
       [<ffffffff812d5ac6>] sys_sendmsg+0x1da/0x241
       [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

3 locks held by iscsid/1470:
 #0:  (rx_queue_mutex){--..}, at: [<ffffffffa01752f6>] iscsi_if_rx+0x23/0x6d9 [scsi_transport_iscsi]
 #1:  (attribute_container_mutex){--..}, at: [<ffffffff8122c160>] attribute_container_device_trigger+0x27/0xba
 #2:  (cpu_add_remove_lock){--..}, at: [<ffffffff81049ada>] cpu_maps_update_begin+0x12/0x14

stack backtrace:
Pid: 1470, comm: iscsid Not tainted 2.6.29-0.110.rc4.git3.fc11.x86_64 #1
Call Trace:
 [<ffffffff8106bacb>] print_circular_bug_tail+0x71/0x7c
 [<ffffffff8106c68a>] __lock_acquire+0x918/0xc0d
 [<ffffffff8106ca11>] lock_acquire+0x92/0xc0
 [<ffffffff810595bf>] ? cleanup_workqueue_thread+0x16/0x7f
 [<ffffffff8122c5a6>] ? transport_remove_classdev+0x0/0x57
 [<ffffffff810595e6>] cleanup_workqueue_thread+0x3d/0x7f
 [<ffffffff810595bf>] ? cleanup_workqueue_thread+0x16/0x7f
 [<ffffffff8118928c>] ? __next_cpu_nr+0x21/0x2a
 [<ffffffff81059698>] destroy_workqueue+0x61/0x95
 [<ffffffffa017347b>] iscsi_remove_host+0x1f/0x23 [scsi_transport_iscsi]
 [<ffffffff8122c5c8>] transport_remove_classdev+0x22/0x57
 [<ffffffff8122c1b3>] attribute_container_device_trigger+0x7a/0xba
 [<ffffffff8122c580>] transport_remove_device+0x10/0x12
 [<ffffffff8123762a>] scsi_remove_host+0xe8/0x11a
 [<ffffffffa0182fdf>] iscsi_host_remove+0xf7/0x104 [libiscsi]
 [<ffffffffa0182ea1>] ? iscsi_session_teardown+0x9b/0xa4 [libiscsi]
 [<ffffffffa0190d2a>] iscsi_sw_tcp_session_destroy+0x4f/0x5c [iscsi_tcp]
 [<ffffffffa0175424>] iscsi_if_rx+0x151/0x6d9 [scsi_transport_iscsi]
 [<ffffffff8137b8d2>] ? _read_unlock+0x26/0x2a
 [<ffffffff812fcc4e>] netlink_unicast+0x11f/0x188
 [<ffffffff8115df7b>] ? task_sid+0x0/0x61
 [<ffffffff812fcf38>] netlink_sendmsg+0x281/0x290
 [<ffffffff812d4f9c>] __sock_sendmsg+0x5c/0x67
 [<ffffffff812d58d3>] sock_sendmsg+0xc7/0xe0
 [<ffffffff81016d65>] ? sched_clock+0x9/0xc
 [<ffffffff8105cfab>] ? autoremove_wake_function+0x0/0x34
 [<ffffffff810b6880>] ? might_fault+0x5d/0xb1
 [<ffffffff812d636d>] ? move_addr_to_kernel+0x43/0x48
 [<ffffffff812ddc3b>] ? verify_iovec+0x4c/0x89
 [<ffffffff812d5ac6>] sys_sendmsg+0x1da/0x241
 [<ffffffff810eee58>] ? mntput_no_expire+0x2a/0x149
 [<ffffffff810dd681>] ? cp_new_stat+0xde/0xeb
 [<ffffffff8106b5ee>] ? trace_hardirqs_on_caller+0x12f/0x153
 [<ffffffff8108f6fb>] ? audit_syscall_entry+0x119/0x145
 [<ffffffff8137b533>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
Logout of [sid: 1, target: iqn.2009-01.com.example:for.all, portal: 192.168.33.2,3260]: successful
Logout of [sid: 2, target: iqn.2009-02.com.example:for.all, portal: 192.168.33.2,3260]: successful
Logout of [sid: 3, target: iqn.2009-03.com.example:for.all, portal: 192.168.33.2,3260]: successful



Expected results:

Clean logout from all sessions, with no lockdep warning.

Additional info:
Comment 1 Mike Christie 2009-02-12 15:21:53 EST
The deadlock cannot actually happen because of how we end up tearing down the structs: for example, we never run work from the scan workqueue_struct while another context is in the container code adding or removing a device.

Hans filed another bugzilla for this; I will find it. I did a kernel patch to make this go away, but it makes the iscsi API a little awkward, so I have not sent it upstream (still hoping to find a better way).
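
To make that concrete, here is a condensed sketch of the shape lockdep sees in the trace above, with the whole #1-#5 chain collapsed into a single hypothetical mutex. destroy_workqueue() waits for any pending work, so lockdep links the workqueue to every lock its work items take:

#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(teardown_lock);		/* stands in for the #1-#5 chain */
static struct workqueue_struct *scan_wq;	/* stands in for ihost->scan_workq */

static void scan_work_fn(struct work_struct *work)
{
	mutex_lock(&teardown_lock);	/* work item -> lock dependency */
	/* ... scan/unbind work ... */
	mutex_unlock(&teardown_lock);
}

static void remove_host(void)
{
	mutex_lock(&teardown_lock);
	destroy_workqueue(scan_wq);	/* lock -> work item: lockdep sees a cycle */
	mutex_unlock(&teardown_lock);
}

The point above is that by the time the removal path runs, no work that takes these locks can still be queued, so the interleaving lockdep warns about cannot actually occur.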
Comment 2 Kyle McMartin 2009-02-27 08:53:35 EST
OK, I'm tempted to mark this NOTABUG then... It will go away in release kernels when we disable lockdep (it will continue to show up in -debug kernels, however).

(As long as you're sure it's not a real issue, I'll defer to you. :)

cheers, Kyle
Comment 3 Mike Christie 2009-02-27 13:00:59 EST
(In reply to comment #2)
> Ok, I'm tempted to mark this NOTABUG then... It will go away in release when we
> disable lockdep (it will continue to show up in -debug kernels then, however.)
> 
> (As long as you're sure it's not a real issue, I'll defer to you. :)
> 

I could not take this report anymore (we get it every so often in Fedora and SUSE), so I rearranged the code so that lockdep knows for sure this cannot happen.


It is in the iscsi git tree
http://git.kernel.org/?p=linux/kernel/git/mnc/linux-2.6-iscsi.git;a=commit;h=77b57d82aaa585ac77f87db843ead609d390eb26

I am going to push it to the SCSI maintainer (he pushes to Linus) for the next feature window, which at this time would be 2.6.30.
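
For readers following the fix: one generic way to break such a chain (a sketch reusing the hypothetical names from the sketch under comment 1, not necessarily what the commit above does) is to drain and destroy the workqueue before any lock in the chain is taken:

static void remove_host_fixed(void)
{
	/* No lock in the reported chain is held yet, so it is safe to
	 * wait here for pending scan work to finish. */
	destroy_workqueue(scan_wq);
	scan_wq = NULL;

	mutex_lock(&teardown_lock);
	/* ... rest of teardown; no scan work left to wait on ... */
	mutex_unlock(&teardown_lock);
}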
