Bug 1281840

Summary:	dlm lockdep issue (deadlock)
Product:	[Fedora] Fedora	Reporter:	Andrew Price <anprice>
Component:	kernel	Assignee:	David Teigland <teigland>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	rawhide	CC:	gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, rpeterso
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-12-02 22:19:45 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Andrew Price 2015-11-13 15:01:34 UTC

Description of problem:

DLM lockdep warning with recent kernels.

CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y

Version-Release number of selected component (if applicable):

commit f6d07dfcb15aad199d7351d3122eabd506968daf

How reproducible:

100% or near enough.

Steps to Reproduce:
1. Configure a cluster with dlm and clvmd as in the RHEL7 docs (no gfs2 necessary)
2. Start the cluster on 2 nodes (my cluster actually has 3 but I started 2 and 'pcs stonith confirm'ed the other one.)
3. Stop the cluster

Actual results:

See the lockdep splat below.

Expected results:

No lockdep issues.

Additional info:

[root@rawhide2 ~]# pcs cluster stop --all
[  137.082070] dlm: clvmd: leaving the lockspace group...
[  137.088229] dlm: clvmd: group event done 0 0
[  137.090430] dlm: clvmd: release_lockspace final free
[  137.096723] 
[  137.097030] ======================================================
[  137.097954] [ INFO: possible circular locking dependency detected ]
[  137.098896] 4.3.0-11742-gf6d07df #26 Not tainted
[  137.099601] -------------------------------------------------------
[  137.100549] clvmd/1188 is trying to acquire lock:
[  137.101266]  ((&con->swork)){+.+.+.}, at: [<ffffffff810a2d95>] flush_work+0x5/0x2c0
[  137.102678] 
[  137.102678] but task is already holding lock:
[  137.103468]  (connections_lock){+.+...}, at: [<ffffffff812c05cf>] dlm_lowcomms_stop+0x5f/0xa0
[  137.104911] 
[  137.104911] which lock already depends on the new lock.
[  137.104911] 
[  137.107316] 
[  137.107316] the existing dependency chain (in reverse order) is:
[  137.109314] 
-> #2 (connections_lock){+.+...}:
[  137.110682]        [<ffffffff810de4de>] lock_acquire+0xce/0x1c0
[  137.112353]        [<ffffffff817c7cd9>] mutex_lock_nested+0x79/0x3d0
[  137.114141]        [<ffffffff812bde38>] nodeid2con+0x28/0x240
[  137.115752]        [<ffffffff812be875>] tcp_accept_from_sock+0x1d5/0x580
[  137.117622]        [<ffffffff812bdc34>] process_recv_sockets+0x24/0x30
[  137.119429]        [<ffffffff810a1706>] process_one_work+0x1f6/0x640
[  137.121207]        [<ffffffff810a1b9e>] worker_thread+0x4e/0x450
[  137.122886]        [<ffffffff810a8e1e>] kthread+0xfe/0x120
[  137.124430]        [<ffffffff817ccd5f>] ret_from_fork+0x3f/0x70
[  137.126086] 
-> #1 (&con->sock_mutex){+.+.+.}:
[  137.127448]        [<ffffffff810de4de>] lock_acquire+0xce/0x1c0
[  137.129119]        [<ffffffff817c7cd9>] mutex_lock_nested+0x79/0x3d0
[  137.130893]        [<ffffffff812bfab7>] tcp_connect_to_sock+0x57/0x3b0
[  137.132720]        [<ffffffff812bfe67>] process_send_sockets+0x57/0x2a0
[  137.134566]        [<ffffffff810a1706>] process_one_work+0x1f6/0x640
[  137.136327]        [<ffffffff810a1b9e>] worker_thread+0x4e/0x450
[  137.138009]        [<ffffffff810a8e1e>] kthread+0xfe/0x120
[  137.139541]        [<ffffffff817ccd5f>] ret_from_fork+0x3f/0x70
[  137.141190] 
-> #0 ((&con->swork)){+.+.+.}:
[  137.142477]        [<ffffffff810dd922>] __lock_acquire+0x1a52/0x1a60
[  137.144251]        [<ffffffff810de4de>] lock_acquire+0xce/0x1c0
[  137.145907]        [<ffffffff810a2ddc>] flush_work+0x4c/0x2c0
[  137.146632]        [<ffffffff810a317f>] __cancel_work_timer+0x12f/0x1f0
[  137.147412]        [<ffffffff810a3250>] cancel_work_sync+0x10/0x20
[  137.148139]        [<ffffffff812bdde0>] close_connection+0xf0/0x120
[  137.148864]        [<ffffffff812be121>] free_conn+0x21/0x90
[  137.149518]        [<ffffffff812bdb22>] foreach_conn+0x42/0x60
[  137.150203]        [<ffffffff812c05e7>] dlm_lowcomms_stop+0x77/0xa0
[  137.150922]        [<ffffffff812bc02b>] dlm_release_lockspace+0xfb/0x3e0
[  137.152292]        [<ffffffff812c5753>] device_write+0x353/0x810
[  137.153529]        [<ffffffff8121bf17>] __vfs_write+0x37/0x110
[  137.154766]        [<ffffffff8121c189>] vfs_write+0xa9/0x1a0
[  137.155929]        [<ffffffff8121c3a8>] SyS_write+0x58/0xd0
[  137.157096]        [<ffffffff817cc9f2>] entry_SYSCALL_64_fastpath+0x12/0x76
[  137.158549] 
[  137.158549] other info that might help us debug this:
[  137.158549] 
[  137.160200] Chain exists of:
  (&con->swork) --> &con->sock_mutex --> connections_lock

[  137.162030]  Possible unsafe locking scenario:
[  137.162030] 
[  137.163257]        CPU0                    CPU1
[  137.164193]        ----                    ----
[  137.165091]   lock(connections_lock);
[  137.165895]                                lock(&con->sock_mutex);
[  137.167204]                                lock(connections_lock);
[  137.168518]   lock((&con->swork));
[  137.169272] 
[  137.169272]  *** DEADLOCK ***
[  137.169272] 
[  137.170493] 2 locks held by clvmd/1188:
[  137.171291]  #0:  (&ls_lock){+.+.+.}, at: [<ffffffff812bbf87>] dlm_release_lockspace+0x57/0x3e0
[  137.173099]  #1:  (connections_lock){+.+...}, at: [<ffffffff812c05cf>] dlm_lowcomms_stop+0x5f/0xa0
[  137.174919] 
[  137.174919] stack backtrace:
[  137.175823] CPU: 0 PID: 1188 Comm: clvmd Not tainted 4.3.0-11742-gf6d07df #26
[  137.177276] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  137.178518]  0000000000000000 000000004e1ea547 ffff88001f48fa20 ffffffff8147c609
[  137.180130]  ffffffff826c32f0 ffff88001f48fa60 ffffffff810d90d3 ffff88001f48fad0
[  137.181728]  0000000000000001 ffff88001f479bf0 ffff88001f4794c0 ffff88001f479bc8
[  137.183325] Call Trace:
[  137.183850]  [<ffffffff8147c609>] dump_stack+0x4b/0x72
[  137.184879]  [<ffffffff810d90d3>] print_circular_bug+0x1e3/0x250
[  137.186130]  [<ffffffff810dd922>] __lock_acquire+0x1a52/0x1a60
[  137.187314]  [<ffffffff8114eb5b>] ? is_ftrace_trampoline+0x4b/0x70
[  137.188538]  [<ffffffff810a7ac9>] ? __kernel_text_address+0x39/0x70
[  137.189861]  [<ffffffff8100721a>] ? dump_trace+0x12a/0x320
[  137.191013]  [<ffffffff810de4de>] lock_acquire+0xce/0x1c0
[  137.192137]  [<ffffffff810a2d95>] ? flush_work+0x5/0x2c0
[  137.193243]  [<ffffffff810a2ddc>] flush_work+0x4c/0x2c0
[  137.194300]  [<ffffffff810a2d95>] ? flush_work+0x5/0x2c0
[  137.195392]  [<ffffffff810da58f>] ? mark_held_locks+0x6f/0xa0
[  137.196568]  [<ffffffff817cc1ac>] ? _raw_spin_unlock_irq+0x2c/0x40
[  137.197844]  [<ffffffff810da58f>] ? mark_held_locks+0x6f/0xa0
[  137.199027]  [<ffffffff810a316c>] ? __cancel_work_timer+0x11c/0x1f0
[  137.200323]  [<ffffffff810a317f>] __cancel_work_timer+0x12f/0x1f0
[  137.201583]  [<ffffffff812bdc99>] ? clean_one_writequeue+0x29/0x80
[  137.202860]  [<ffffffff812be100>] ? lowcomms_data_ready+0x40/0x40
[  137.203942]  [<ffffffff810a3250>] cancel_work_sync+0x10/0x20
[  137.205047]  [<ffffffff812bdde0>] close_connection+0xf0/0x120
[  137.206122]  [<ffffffff812be100>] ? lowcomms_data_ready+0x40/0x40
[  137.207296]  [<ffffffff812be121>] free_conn+0x21/0x90
[  137.208334]  [<ffffffff812bdb22>] foreach_conn+0x42/0x60
[  137.209424]  [<ffffffff812c05e7>] dlm_lowcomms_stop+0x77/0xa0
[  137.210613]  [<ffffffff812bc02b>] dlm_release_lockspace+0xfb/0x3e0
[  137.211880]  [<ffffffff812c5753>] device_write+0x353/0x810
[  137.213031]  [<ffffffff810f880d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[  137.214393]  [<ffffffff8121bf17>] __vfs_write+0x37/0x110
[  137.215494]  [<ffffffff8123b8cf>] ? __fget+0x11f/0x200
[  137.216554]  [<ffffffff8123b7b5>] ? __fget+0x5/0x200
[  137.217582]  [<ffffffff8121c189>] vfs_write+0xa9/0x1a0
[  137.218643]  [<ffffffff8123b9da>] ? __fget_light+0x2a/0x90
[  137.219766]  [<ffffffff8121c3a8>] SyS_write+0x58/0xd0
[  137.220806]  [<ffffffff817cc9f2>] entry_SYSCALL_64_fastpath+0x12/0x76
[  141.825107] dlm: closing connection to node 1
[  141.827022] dlm: closing connection to node 2
rawhide2: Stopping Cluster (pacemaker)...
...

Comment 1 David Teigland 2015-11-13 15:26:25 UTC

This reminds me of the problem with Bob's patch that I'd forgotten about.  Could you check if this patch is the issue?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm?id=b3a5bbfd780d9e9291f5f257be06e9ad6db11657

The other problem with that patch was a null pointer or something like that.  I'm not sure where to find the details of that report.  The conclusion was that we shouldn't call nodeid_to_addr() from lowcomms_error_report().

Comment 2 Andrew Price 2015-11-13 15:47:07 UTC

(In reply to David Teigland from comment #1)
> This reminds me of the problem with Bob's patch that I'd forgotten about. 
> Could you check if this patch is the issue?
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/
> dlm?id=b3a5bbfd780d9e9291f5f257be06e9ad6db11657

I still get the lockdep warning after reverting that patch.

Comment 3 David Teigland 2015-11-13 16:05:46 UTC

I finally remembered this case, and found a record of it in a private email discussion.  Copying from email:

> > It's how nodeid2con() takes connections_lock while already having the
> > sock_mutex, and dlm_lowcomms_stop() actually does it reverse: take
> > connections_lock so it can foreach() on it and close the connections,
> > but for that it will take sock_lock too.
> > 
> > Should I open a bz for this one?
> 
> I'm not certain, but this is vaguely familiar, and I think that the first
> parts of dlm_lowcomms_stop that shut things down and do some initial clean
> up would prevent anything from running nodeid2con while the last foreach
> is running.

I thought it could be something like this, like it's impossible to
actually trigger the deadlock as other workers would be stopped by then
but still, it causes that log polution. And if everyone else is stopped,
we wouldn't need to take that look for traversing the list.. in theory..

Comment 4 David Teigland 2020-12-02 22:19:45 UTC

this lowcomms code is quite different now and will have even more significant change in the next two releases, so this has likely disappeared.