Bug 1281840
| Summary: | dlm lockdep issue (deadlock) | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Andrew Price <anprice> |
| Component: | kernel | Assignee: | David Teigland <teigland> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Priority: | unspecified |
| Version: | rawhide | CC: | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, rpeterso |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | Bug Fix | Type: | Bug |
| Last Closed: | 2020-12-02 22:19:45 UTC | | |
Description
Andrew Price
2015-11-13 15:01:34 UTC
This reminds me of the problem with Bob's patch that I'd forgotten about. Could you check if this patch is the issue?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm?id=b3a5bbfd780d9e9291f5f257be06e9ad6db11657

The other problem with that patch was a null pointer or something like that. I'm not sure where to find the details of that report. The conclusion was that we shouldn't call nodeid_to_addr() from lowcomms_error_report().

(In reply to David Teigland from comment #1)
> This reminds me of the problem with Bob's patch that I'd forgotten about.
> Could you check if this patch is the issue?
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/dlm?id=b3a5bbfd780d9e9291f5f257be06e9ad6db11657

I still get the lockdep warning after reverting that patch.

I finally remembered this case, and found a record of it in a private email discussion. Copying from email:

> > It's how nodeid2con() takes connections_lock while already having the
> > sock_mutex, and dlm_lowcomms_stop() actually does it reverse: take
> > connections_lock so it can foreach() on it and close the connections,
> > but for that it will take sock_lock too.
> >
> > Should I open a bz for this one?
>
> I'm not certain, but this is vaguely familiar, and I think that the first
> parts of dlm_lowcomms_stop that shut things down and do some initial clean
> up would prevent anything from running nodeid2con while the last foreach
> is running.
I thought it could be something like this, like it's impossible to
actually trigger the deadlock as other workers would be stopped by then,
but still, it causes that log pollution. And if everyone else is stopped,
we wouldn't need to take that lock for traversing the list.. in theory..
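
The inversion described in the email can be illustrated with a toy user-space model. This is only a sketch, not kernel code and not how lockdep is actually implemented: the `LockOrderChecker` class and the `send_path` / `stop_path` functions are hypothetical names made up for illustration, and it assumes that the "sock_mutex" and "sock_lock" mentioned in the email refer to the same per-connection lock. The two paths run one after the other on a single thread, so nothing can deadlock, yet the checker still reports the conflicting order. That is the situation discussed above: a warning in the log without the deadlock necessarily being reachable.

```python
class LockOrderChecker:
    """Toy lock-order tracker (much simpler than the kernel's lockdep)."""

    def __init__(self):
        self.held = []      # locks currently held, in acquisition order
        self.edges = set()  # (outer, inner) orderings observed so far
        self.warnings = []  # (outer, inner) pairs that invert an earlier order

    def acquire(self, name):
        for outer in self.held:
            reverse_seen = (name, outer) in self.edges
            already_known = (outer, name) in self.edges
            if reverse_seen and not already_known:
                # The opposite order was observed earlier: possible ABBA deadlock.
                self.warnings.append((outer, name))
            self.edges.add((outer, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def send_path(checker):
    # Models the first path from the email: sock_mutex is already held
    # when nodeid2con() takes connections_lock.
    checker.acquire("sock_mutex")
    checker.acquire("connections_lock")
    checker.release("connections_lock")
    checker.release("sock_mutex")


def stop_path(checker):
    # Models dlm_lowcomms_stop(): connections_lock is taken to walk the
    # connection list, then the per-connection sock lock to close each one.
    checker.acquire("connections_lock")
    checker.acquire("sock_mutex")
    checker.release("sock_mutex")
    checker.release("connections_lock")


checker = LockOrderChecker()
send_path(checker)   # records sock_mutex -> connections_lock
stop_path(checker)   # records the reverse order, which is flagged
print(checker.warnings)
```

In this model the warning goes away either by making both paths take the locks in the same order, or by not taking connections_lock in the stop path at all, which is the "in theory" option raised in the email if every other worker really is stopped by then.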

This lowcomms code is quite different now and will have even more significant changes in the next two releases, so this has likely disappeared.