Bug 2110797
| Summary: | MGR - libcephsqlite: crash caused by regex treating '-' as a range operator | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Vikhyat Umrao <vumrao> |
| Component: | RADOS | Assignee: | Neha Ojha <nojha> |
| Status: | CLOSED ERRATA | QA Contact: | Pawan <pdhiran> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 5.1 | CC: | akupczyk, amathuri, bhubbard, ceph-eng-bugs, cephqe-warriors, choffman, ekristov, kdreyer, ksirivad, lflores, nojha, pdhange, pdhiran, rfriedma, rzarzyns, sseshasa, vereddy, vumrao |
| Target Milestone: | --- | Keywords: | CodeChange |
| Target Release: | 6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ceph-17.2.3-1.el9cp | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-03-20 18:57:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Vikhyat Umrao
2022-07-25 22:47:48 UTC
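As background for the summary line: inside a bracket expression, a '-' placed between two characters is a range operator, and only a leading or trailing '-' is a literal hyphen. A hedged sketch of that behavior using Python's `re` module (an illustration only — libcephsqlite's actual pattern is a C++ `std::regex`, fixed in the PR linked below):

```python
import re

# Between two characters inside [...], '-' is a range operator:
assert re.fullmatch(r"[0-9]+", "2022")

# Placed first or last in the class, '-' is an ordinary literal:
assert re.fullmatch(r"[a-z-]+", "foo-bar")

# Mid-class, '-' silently forms a range; if the resulting range is
# invalid, the engine rejects the pattern at compile time instead of
# matching -- the same failure mode that crashed the MGR here:
try:
    re.compile(r"[z-a]")           # 'z'..'a' is not a valid range
except re.error as exc:
    print("regex rejected:", exc)
```

Which position the '-' occupies in the class is therefore the whole difference between "matches a hyphen" and "throws at pattern-compile time".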
Quincy backport - https://github.com/ceph/ceph/pull/47270

(In reply to Vikhyat Umrao from comment #1)
> Quincy backport - https://github.com/ceph/ceph/pull/47270

This one is now merged upstream.

Ken and Thomas - this bug needs your attention. The issue was found upstream in 17.2.2, when upstream users updated their clusters to 17.2.2 and the MGR started crashing. The root cause is not in the Ceph code; it is a platform issue. Upstream 17.2.2 was built on CentOS 8 Stream after CentOS 8 Stream moved to GCC 8.5.0-14, which triggers the issue, while 17.2.1 was built with GCC 8.5.0-13.

Here is the RHEL bug - https://bugzilla.redhat.com/show_bug.cgi?id=2001788
For more details please check https://tracker.ceph.com/issues/55304#note-15

We need to verify this downstream. If I remember correctly we are using RHEL 9 for RHCS 6, so the issue might be present in GCC 8.5.0-14 and above, and RHCS built on top of RHEL 9 might see this.

QE team - heads up: can you please check the RHEL 9 GCC version inside the MGR pod?

    cephadm shell --name <mgr daemon name>
    rpm -qa | grep gcc

Example from a 17.2.1 upstream cluster:

    # cephadm shell --name mgr.f28-h28-000-r630.rdu2.scalelab.redhat.com.ffcshc
    Inferring fsid f0d9ede0-0e82-11ed-b543-000af7995756
    Inferring config /var/lib/ceph/f0d9ede0-0e82-11ed-b543-000af7995756/mgr.f28-h28-000-r630.rdu2.scalelab.redhat.com.ffcshc/config
    Using ceph image with id 'e5af760fa1c1' and tag 'v17.2.1' created on 2022-06-23 19:49:45 +0000 UTC
    quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b
    [ceph: root@f28-h28-000-r630 /]# rpm -qa | grep gcc
    libgcc-8.5.0-13.el8.x86_64
    [ceph: root@f28-h28-000-r630 /]# cat /etc/redhat-release
    CentOS Stream release 8

Hey Vikhyat, here is the gcc version being used. Collected the info by logging into the pod via podman exec.
    # podman exec -it 0a2ac3ebeb5c /bin/bash
    [root@ceph-pdhiran-7g0lbr-node1-installer /]# rpm -qa | grep gcc
    libgcc-11.2.1-9.4.el9.x86_64
    [root@ceph-pdhiran-7g0lbr-node1-installer /]# cat /etc/redhat-release
    Red Hat Enterprise Linux release 9.0 (Plow)

When tried via cephadm, hit the error:

    # cephadm shell --name ceph-pdhiran-7g0lbr-node1-installer.vjunfz
    Inferring fsid a3952e4c-0eef-11ed-82e2-fa163e862102
    Inferring config /var/lib/ceph/a3952e4c-0eef-11ed-82e2-fa163e862102/ceph-pdhiran-7g0lbr-node1-installer.vjunfz/config
    Using ceph image with id '0cdbd0431cdd' and tag 'ceph-6.0-rhel-9-containers-candidate-12601-20220727135926' created on 2022-07-27 14:01:47 +0000 UTC
    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:11f49ce9958cf4ef4967920d12344ff047556dc2a047cf25604ef1f0d4e00bbb
    Error: statfs /var/lib/ceph/a3952e4c-0eef-11ed-82e2-fa163e862102/ceph-pdhiran-7g0lbr-node1-installer.vjunfz/config: no such file or directory
    [root@ceph-pdhiran-7g0lbr-node1-installer cephuser]# cephadm shell --name ceph-a3952e4c-0eef-11ed-82e2-fa163e862102-mgr-ceph-pdhiran-7g0lbr-node1-installer-vjunfz
    Inferring fsid a3952e4c-0eef-11ed-82e2-fa163e862102
    Traceback (most recent call last):
      File "/sbin/cephadm", line 9281, in <module>
        main()
      File "/sbin/cephadm", line 9269, in main
        r = ctx.func(ctx)
      File "/sbin/cephadm", line 2034, in _infer_config
        return func(ctx)
      File "/sbin/cephadm", line 1979, in _infer_fsid
        return func(ctx)
      File "/sbin/cephadm", line 2022, in _infer_config
        ctx.config = config_path(name.split('.', 1)[0], name.split('.', 1)[1])
    IndexError: list index out of range

(In reply to Pawan from comment #3)
> Hey Vikhyat, here is the gcc version being used.
> Collected the info by logging into the pod via podman exec.
> When tried via cephadm, hit the error:
> [...]
> IndexError: list index out of range

The command is wrong! It should have the `mgr.` prefix:

    cephadm shell --name mgr.ceph-pdhiran-7g0lbr-node1-installer.vjunfz

Since this was due to a compiler change, will this affect versions of Ceph below Quincy, like RHCS 4 (Nautilus) or 5 (Pacific)?
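(Side note on the cephadm traceback above: `_infer_config` does `name.split('.', 1)[1]`, which assumes the `--name` argument has the `<daemon_type>.<daemon_id>` form. A minimal sketch with a hypothetical helper, not cephadm's real function, showing why the dash-only container-style name raises:)

```python
def split_daemon_name(name: str):
    # cephadm effectively indexes name.split('.', 1)[1]; with no '.'
    # in the name, the list has one element and [1] raises
    # "IndexError: list index out of range".
    parts = name.split('.', 1)
    if len(parts) < 2:
        raise ValueError(f"expected '<daemon_type>.<daemon_id>', got {name!r}")
    return parts[0], parts[1]

# The mgr.-prefixed form splits cleanly on the first '.':
print(split_daemon_name("mgr.ceph-pdhiran-7g0lbr-node1-installer.vjunfz"))
# -> ('mgr', 'ceph-pdhiran-7g0lbr-node1-installer.vjunfz')

# A dash-only name (hypothetical, mirroring the failing attempt) does not:
try:
    split_daemon_name("ceph-fsid-mgr-host-vjunfz")
except ValueError as exc:
    print("rejected:", exc)
```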
(In reply to Ken Dreyer (Red Hat) from comment #5)
> Since this was due to a compiler change, will this affect versions of Ceph
> below quincy? Like RHCS 4 (nautilus) or 5 (pacific)?

Ken - per the internal discussion with Neha, the issue affects Pacific as well. The only difference is that the devicehealth module does not use libcephsqlite there, so Pacific is less likely to hit this issue, and we have not seen any reports from Nautilus or Pacific. I will have Neha check on this and share her feedback.

Just to be clear - the version of gcc is only important at *build* time. We don't have any evidence indicating that the runtime gcc version matters.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.0 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1360