Bug 2269347
| Summary: | osdc/Journaler: better handle ENOENT during replay as up:standby-replay | |||
|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Venky Shankar <vshankar> | |
| Component: | CephFS | Assignee: | Venky Shankar <vshankar> | |
| Status: | CLOSED ERRATA | QA Contact: | Hemanth Kumar <hyelloji> | |
| Severity: | medium | Docs Contact: | Akash Raj <akraj> | |
| Priority: | unspecified | |||
| Version: | 7.0 | CC: | akraj, ceph-eng-bugs, cephqe-warriors, hyelloji, tserlin, vereddy | |
| Target Milestone: | --- | |||
| Target Release: | 7.1 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | ceph-18.2.1-84.el9cp | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2269348 | Environment: | ||
| Last Closed: | 2024-06-13 14:29:22 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2267614, 2269348, 2298578, 2298579 | |||
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:3925
-15> 2022-07-29T13:23:34.738+0000 7f1ee3d5a700 1 mds.21387370.journaler.mdlog(ro) recover start
-14> 2022-07-29T13:23:34.738+0000 7f1ee3d5a700 1 mds.21387370.journaler.mdlog(ro) read_head
-13> 2022-07-29T13:23:34.738+0000 7f1ee3d5a700 4 mds.0.log Waiting for journal 0x200 to recover...
-12> 2022-07-29T13:23:34.742+0000 7f1ee455b700 1 mds.21387370.journaler.mdlog(ro) _finish_read_head loghead(trim 7788696698880, expire 7788721262080, write 7789114335530, stream_format 1). probing for end of log (from 7789114335530)...
-11> 2022-07-29T13:23:34.742+0000 7f1ee455b700 1 mds.21387370.journaler.mdlog(ro) probing for end of the log
-10> 2022-07-29T13:23:34.742+0000 7f1eed56d700 10 monclient: get_auth_request con 0x55c5deab7000 auth_method 0
-9> 2022-07-29T13:23:34.755+0000 7f1ee455b700 1 mds.21387370.journaler.mdlog(ro) _finish_probe_end write_pos = 7789125276317 (header had 7789114335530). recovered.
-8> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700 4 mds.0.log Journal 0x200 recovered.
-7> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700 4 mds.0.log Recovered journal 0x200 in format 1
-6> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700 2 mds.0.0 Booting: 1: loading/discovering base inodes
-5> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700 0 mds.0.cache creating system inode with ino:0x100
-4> 2022-07-29T13:23:34.755+0000 7f1ee3d5a700 0 mds.0.cache creating system inode with ino:0x1
-3> 2022-07-29T13:23:34.757+0000 7f1ee455b700 2 mds.0.0 Booting: 2: replaying mds log
-2> 2022-07-29T13:23:34.798+0000 7f1ee455b700 0 mds.21387370.journaler.mdlog(ro) _finish_read got error -2
-1> 2022-07-29T13:23:34.800+0000 7f1ee2d58700 -1 /builddir/build/BUILD/ceph-16.2.0/src/mds/MDLog.cc: In function 'void MDLog::_replay_thread()' thread 7f1ee2d58700 time 2022-07-29T13:23:34.799865+0000
/builddir/build/BUILD/ceph-16.2.0/src/mds/MDLog.cc: 1383: FAILED ceph_assert(journaler->is_readable() || mds->is_daemon_stopping())
ceph version 16.2.0-152.el8cp (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f1ef2fc2b60]
2: /usr/lib64/ceph/libceph-common.so.2(+0x274d7a) [0x7f1ef2fc2d7a]
3: (MDLog::_replay_thread()+0x1d7c) [0x55c5dc72e2ec]
4: (MDLog::ReplayThread::entry()+0x11) [0x55c5dc430101]
5: /lib64/libpthread.so.0(+0x817a) [0x7f1ef1d6317a]
6: clone()
0> 2022-07-29T13:23:34.801+0000 7f1ee2d58700 -1 *** Caught signal (Aborted) **
in thread 7f1ee2d58700 thread_name:md_log_replay
ceph version 16.2.0-152.el8cp (e456e8b705cb2f4a779689a0d80b122bcb0d67c9) pacific (stable)
1: /lib64/libpthread.so.0(+0x12c20) [0x7f1ef1d6dc20]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f1ef2fc2bb1]
5: /usr/lib64/ceph/libceph-common.so.2(+0x274d7a) [0x7f1ef2fc2d7a]
6: (MDLog::_replay_thread()+0x1d7c) [0x55c5dc72e2ec]
7: (MDLog::ReplayThread::entry()+0x11) [0x55c5dc430101]
8: /lib64/libpthread.so.0(+0x817a) [0x7f1ef1d6317a]
9: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Journaler should not cause the MDS to assert in this situation. We should handle this more gracefully.
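
To illustrate what "more gracefully" could mean, here is a minimal, self-contained sketch of a replay decision that branches on the read error instead of asserting that the journal is readable. It is not the actual fix and does not use Ceph's API: `MdsState`, `ReplayAction`, `JournalReadState`, and `decide_replay_action` are hypothetical names invented for this example. The choice to respawn on `-ENOENT` in standby-replay is likewise an assumption, based on the idea that a read-only follower can be asked to read journal objects that no longer exist.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical MDS states for this sketch (not Ceph's real enum).
enum class MdsState { Replay, StandbyReplay };

// Hypothetical outcomes of one pass through the replay loop.
enum class ReplayAction { ReplayEntry, WaitForMore, Respawn, MarkDamaged };

// Hypothetical snapshot of the journal reader: whether an entry is ready,
// and the last read error as 0 or a negative errno (-2 is -ENOENT).
struct JournalReadState {
  bool readable;
  int error;
};

// Decide what the replay loop should do next without aborting the daemon.
ReplayAction decide_replay_action(const JournalReadState& js, MdsState state) {
  // Caller contract for this sketch: errors are zero or negative errno values.
  assert(js.error <= 0);

  if (js.error == 0) {
    // No error: replay the next entry, or wait until one becomes readable.
    return js.readable ? ReplayAction::ReplayEntry : ReplayAction::WaitForMore;
  }
  if (js.error == -ENOENT && state == MdsState::StandbyReplay) {
    // Assumed policy: a missing journal object on a standby-replay daemon is
    // recoverable by restarting the daemon and re-reading the journal head.
    return ReplayAction::Respawn;
  }
  // Any other error (or ENOENT outside standby-replay) is reported as
  // damage rather than tripping an assertion.
  return ReplayAction::MarkDamaged;
}
```

For the log above, where `_finish_read got error -2` arrives on a read-only `(ro)` journaler, `decide_replay_action(JournalReadState{false, -ENOENT}, MdsState::StandbyReplay)` returns `ReplayAction::Respawn`, so the failure becomes a restart rather than an abort.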