Bug 2161481
| Summary: | mds: md_log_replay thread (replay thread) can remain blocked | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Venky Shankar <vshankar> | |
| Component: | CephFS | Assignee: | Venky Shankar <vshankar> | |
| Status: | CLOSED ERRATA | QA Contact: | Hemanth Kumar <hyelloji> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 5.2 | CC: | ceph-eng-bugs, cephqe-warriors, hyelloji, tserlin, vereddy | |
| Target Milestone: | --- | Flags: | hyelloji: needinfo- | |
| Target Release: | 5.3z1 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | ceph-16.2.10-105.el8cp | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2161483 (view as bug list) | Environment: | ||
| Last Closed: | 2023-02-28 10:06:24 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2161483 | |||
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 5.3 Bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:0980
(copied from upstream tracker)

In a production environment, we hit a problem: the md_log_replay thread of one standby-replay MDS hangs.

1. The reason:

    line1: while (!journaler->is_readable() &&
    line2:        journaler->get_read_pos() < journaler->get_write_pos() &&
    line3:        !journaler->get_error()) {
    line4:   C_SaferCond readable_waiter;
    line5:   journaler->wait_for_readable(&readable_waiter);
    line6:   r = readable_waiter.wait();
    line7: }

This code is from void MDLog::_replay_thread().

(1) The code enters the while loop, and the md_log_replay thread is switched out in favor of the MR_Finisher thread between line3 and line5. (At this point, journaler->get_read_pos() < journaler->get_write_pos().)

(2) The MR_Finisher thread then runs Journaler::C_Read::finish(), which calls ls->_finish_read() -> _assimilate_prefetch().

  a) In _assimilate_prefetch(), journaler->get_write_pos() may be set equal to journaler->get_read_pos().

  b) Because the variable on_readable is 0, f->complete() is not called:

    if (on_readable) {
      C_OnFinisher *f = on_readable;
      on_readable = 0;
      f->complete(0);
    }

(3) When the MR_Finisher thread is switched back to the md_log_replay thread, that thread blocks on line6 forever.
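
The interleaving in steps (1) to (3) can be replayed deterministically on a single thread with a toy model. This is only a sketch: ToyJournaler, should_wait(), finisher_step() and wait_for_readable_checked() are hypothetical names made up for illustration and are not Ceph's Journaler API. The real code runs on two threads and uses Journaler's own locking. The "checked" variant shows one generic way to avoid a lost wakeup (re-checking the predicate at registration time). It is not necessarily the change shipped in ceph-16.2.10-105.el8cp.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Toy stand-in for the journaler state discussed above. NOT Ceph's Journaler.
struct ToyJournaler {
  uint64_t read_pos = 0;
  uint64_t write_pos = 8;
  bool readable = false;
  std::function<void(int)> on_readable;  // empty means "no waiter registered"

  // Mirrors the loop condition on line1-line3 (error flag omitted).
  bool should_wait() const { return !readable && read_pos < write_pos; }

  // Models line5: only stores the waiter, no re-check of the positions.
  void wait_for_readable(std::function<void(int)> cb) {
    on_readable = std::move(cb);
  }

  // Models step (2): the positions converge, and a waiter is completed
  // only if one has already been registered.
  void finisher_step() {
    write_pos = read_pos;
    if (on_readable) {
      auto f = std::move(on_readable);
      on_readable = nullptr;
      f(0);
    }
  }

  // Hypothetical remedy: refuse to register when there is nothing to wait for.
  bool wait_for_readable_checked(std::function<void(int)> cb) {
    if (!should_wait()) return false;
    on_readable = std::move(cb);
    return true;
  }
};

// Replays steps (1)-(3). Returns true if the waiter was ever completed.
bool racy_interleaving_wakes_waiter() {
  ToyJournaler j;
  bool woken = false;
  bool entered = j.should_wait();   // replay thread: condition is true
  j.finisher_step();                // finisher runs before line5, no waiter yet
  if (entered) j.wait_for_readable([&](int) { woken = true; });  // line5
  return woken;                     // false: line6 would never return
}

// Same interleaving with the checked registration.
// Returns true if the replay thread would still end up blocked.
bool checked_interleaving_would_block() {
  ToyJournaler j;
  bool entered = j.should_wait();
  j.finisher_step();
  return entered && j.wait_for_readable_checked([](int) {});
}
```

In this model, racy_interleaving_wakes_waiter() returns false, which corresponds to the hang on line6, while checked_interleaving_would_block() also returns false because the registration is skipped once the positions have converged.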