Bug 2008541
Summary: | gfs2: schedule while atomic | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Alexander Aring <aahringo> |
Component: | kernel | Assignee: | Andreas Gruenbacher <agruenba> |
kernel sub component: | GFS-GFS2 | QA Contact: | cluster-qe <cluster-qe> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | adas, agruenba, gfs2-maint |
Version: | 9.0 | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kernel-5.14.0-59.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-05-17 15:40:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Alexander Aring
2021-09-28 13:59:04 UTC
Is https://listman.redhat.com/archives/cluster-devel/2021-September/msg00082.html a possible solution for it?

Alex, I'm having difficulties following your problem description and what the actual call stack is; I don't see thaw_glock there at all. In general, the glock code is pretty careful not to drop the final reference to a glock while holding rcu_read_lock or a spin lock; instead, whenever it needs to drop a reference that might be the final one in such a context, it delegates that to glock_work_func (see glock_work_func, gfs2_glock_queue_put, __gfs2_glock_queue_work). Do you have any more data?

Hmm, I see now that thaw_glock calls gfs2_glock_put when it hits a glock that doesn't need thawing. So when gfs2_control_func calls gfs2_glock_thaw, that gfs2_glock_put can indeed lead to the bug you describe; it's only the stack trace that's been confusing me. A quick fix is to call gfs2_glock_queue_put instead of gfs2_glock_put in thaw_glock. Taking glock references unnecessarily during glock_hash_walk has always been ugly though, so maybe we should get rid of that instead.

Clearing needinfo; I think it's not necessary anymore.

Regression tests passed with kernel-5.14.0-54.mr242_220204_1900.el9.x86_64

Regression tests passed with kernel-5.14.0-70.2.1.el9_0.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: kernel), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3907
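To illustrate why swapping gfs2_glock_put for gfs2_glock_queue_put in thaw_glock avoids the problem, here is a rough user-space model in C. It is only a sketch and not kernel code: every name in it (model_glock, model_put, model_queue_put, model_worker, model_thaw_walk, the in_atomic flag) is hypothetical, and the "other holder drops its reference during the walk" step is an assumption made to force the walker's put to be the final one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's types or APIs. */
struct model_glock {
    int  refcount;    /* starts at 1: a reference held by some other holder */
    bool needs_thaw;  /* stands in for "this glock needs thawing" */
    bool put_queued;  /* a reference drop has been handed to the worker */
    bool freed;
};

static bool in_atomic;         /* models holding rcu_read_lock or a spin lock */
static int  atomic_sleep_bugs; /* counts modelled "schedule while atomic" hits */

/* Final teardown may sleep, so it must not run in atomic context. */
static void model_free(struct model_glock *gl)
{
    if (in_atomic)
        atomic_sleep_bugs++;
    gl->freed = true;
}

/* Direct put: frees immediately if this was the last reference. */
static void model_put(struct model_glock *gl)
{
    if (--gl->refcount == 0)
        model_free(gl);
}

/* Deferred put: only records that the worker should drop the reference. */
static void model_queue_put(struct model_glock *gl)
{
    gl->put_queued = true;
}

/* Worker: runs later in process context, where sleeping is allowed. */
static void model_worker(struct model_glock *gl)
{
    if (gl->put_queued) {
        gl->put_queued = false;
        model_put(gl);
    }
}

/* Walk all glocks in "atomic" context, then run the worker afterwards.
 * Returns the number of modelled sleep-in-atomic bugs. */
static int model_thaw_walk(struct model_glock *gls, size_t n, bool use_queue_put)
{
    atomic_sleep_bugs = 0;

    in_atomic = true;
    for (size_t i = 0; i < n; i++) {
        struct model_glock *gl = &gls[i];

        gl->refcount++;  /* the walk takes its own reference */
        gl->refcount--;  /* assumption: the other holder drops out meanwhile */

        if (!gl->needs_thaw) {
            /* Nothing to thaw: drop the walk's reference. */
            if (use_queue_put)
                model_queue_put(gl);
            else
                model_put(gl);   /* may be the final put, in atomic context */
        } else {
            /* In this model the thaw path hands its reference to the worker. */
            model_queue_put(gl);
        }
    }
    in_atomic = false;

    for (size_t i = 0; i < n; i++)
        model_worker(&gls[i]);

    return atomic_sleep_bugs;
}
```

Calling model_thaw_walk with use_queue_put set to false reports one modelled bug for a glock that does not need thawing, while the deferred variant reports none and still frees every glock once the worker has run.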