Bug 2110797 - MGR - libcephsqlite: crash cause of regex treating '-' as a range operator
Summary: MGR - libcephsqlite: crash cause of regex treating '-' as a range operator
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.0
Assignee: Neha Ojha
QA Contact: Pawan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-25 22:47 UTC by Vikhyat Umrao
Modified: 2022-10-05 13:50 UTC (History)

Fixed In Version: ceph-17.2.3-1.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 55304 0 None None None 2022-07-28 22:37:31 UTC
Ceph Project Bug Tracker 56702 0 None None None 2022-07-25 22:47:47 UTC
Github ceph ceph pull 47270 0 None open quincy: libcephsqlite: ceph-mgr crashes when compiled with gcc12 2022-07-25 22:58:28 UTC
Red Hat Issue Tracker RHCEPH-4915 0 None None None 2022-07-25 22:49:58 UTC

Description Vikhyat Umrao 2022-07-25 22:47:48 UTC
Description of problem:
MGR - libcephsqlite: crash cause of regex treating '-' as a range operator

Original report - https://tracker.ceph.com/issues/55304
Rook report - https://tracker.ceph.com/issues/56700


Quincy backport - https://tracker.ceph.com/issues/56702
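
The failure class named in the summary can be illustrated with a minimal sketch. This is not the libcephsqlite code (which is C++); it uses Python's `re` module, a different regex engine, and the patterns and the sample name are made up for illustration. It only shows how an unescaped '-' placed after a character class inside brackets is read as a range operator, and two spellings that make it a literal hyphen.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Hypothetical patterns, not taken from the Ceph sources.
AMBIGUOUS = r"[\w-_.]+"   # '-' sits between a class and '_': parsed as a range
ESCAPED = r"[\w\-_.]+"    # '-' escaped: always a literal hyphen
TRAILING = r"[\w_.-]+"    # '-' last inside the brackets: also a literal hyphen

for pattern in (AMBIGUOUS, ESCAPED, TRAILING):
    print(pattern, "accepted" if compiles(pattern) else "rejected")

# Both safe spellings match a name that contains a hyphen.
sample = "device-health.db"  # made-up name
print(bool(re.fullmatch(ESCAPED, sample)), bool(re.fullmatch(TRAILING, sample)))
```

Python rejects the first pattern at compile time as a bad character range. How a given C++ standard library treats the same construct may differ between versions, so escaping the hyphen or placing it last avoids relying on that behavior.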

Version-Release number of selected component (if applicable):
Upstream quincy
RHCS 6

Comment 1 Vikhyat Umrao 2022-07-25 22:58:29 UTC
Quincy backport - https://github.com/ceph/ceph/pull/47270

Comment 2 Vikhyat Umrao 2022-07-28 22:34:41 UTC
(In reply to Vikhyat Umrao from comment #1)
> Quincy backport - https://github.com/ceph/ceph/pull/47270

This one is now merged upstream.

Ken and Thomas - this bug needs your attention. The issue was found upstream in 17.2.2: when upstream users updated their clusters to 17.2.2, the MGR started crashing. The root cause is not the Ceph code; it is a platform issue. 17.2.2 was built upstream on CentOS 8 Stream after it had moved to GCC 8.5.0-14, and that build started hitting this issue, whereas 17.2.1 was built with GCC 8.5.0-13.

Here is the RHEL bug - https://bugzilla.redhat.com/show_bug.cgi?id=2001788

For more details please check - https://tracker.ceph.com/issues/55304#note-15.

We need to verify this downstream. If I remember correctly, we are using RHEL 9 for RHCS 6, so it looks like this issue might be present in GCC 8.5.0-14 and above, and RHCS built on top of RHEL 9 might see it.

QE team - heads up: can you please check which RHEL 9 GCC version is inside the MGR pod?

cephadm shell --name <mgr daemon name>

rpm -qa | grep gcc


Example from 17.2.1 upstream cluster


# cephadm shell --name mgr.f28-h28-000-r630.rdu2.scalelab.redhat.com.ffcshc
Inferring fsid f0d9ede0-0e82-11ed-b543-000af7995756
Inferring config /var/lib/ceph/f0d9ede0-0e82-11ed-b543-000af7995756/mgr.f28-h28-000-r630.rdu2.scalelab.redhat.com.ffcshc/config
Using ceph image with id 'e5af760fa1c1' and tag 'v17.2.1' created on 2022-06-23 19:49:45 +0000 UTC
quay.io/ceph/ceph@sha256:d3f3e1b59a304a280a3a81641ca730982da141dad41e942631e4c5d88711a66b

[ceph: root@f28-h28-000-r630 /]# rpm -qa | grep gcc 
libgcc-8.5.0-13.el8.x86_64

[ceph: root@f28-h28-000-r630 /]# cat /etc/redhat-release 
CentOS Stream release 8
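
A rough sketch of how the `rpm -qa | grep gcc` output could be compared against the 8.5.0-14 boundary mentioned above. The helper names are hypothetical and the comparison is a naive numeric one, not RPM's real version-ordering algorithm. Note also that the installed libgcc package is only a proxy: as comment 8 points out, what matters is the gcc used at build time.

```python
import re

def parse_rpm_nvra(nvra: str):
    """Split 'name-version-release.arch' into its four parts."""
    rest, arch = nvra.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch

def numeric_key(version: str, release: str):
    """Naive sort key: the dotted numbers of the version, then the leading
    dotted numbers of the release (dist tags such as 'el8' are ignored)."""
    lead = re.match(r"\d+(?:\.\d+)*", release)
    parts = version.split(".") + (lead.group(0).split(".") if lead else [])
    return tuple(int(p) for p in parts)

# Boundary taken from the comment above ("8.5.0-14 and above").
SUSPECT_FROM = (8, 5, 0, 14)

# Package strings as reported in this bug.
for pkg in ("libgcc-8.5.0-13.el8.x86_64", "libgcc-11.2.1-9.4.el9.x86_64"):
    name, version, release, arch = parse_rpm_nvra(pkg)
    suspect = numeric_key(version, release) >= SUSPECT_FROM
    print(f"{name} {version}-{release}: {'needs checking' if suspect else 'older than 8.5.0-14'}")
```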

Comment 3 Pawan 2022-07-29 16:09:12 UTC
Hey Vikhyat, here is the gcc version being used.
Collected the info by logging into the pod via podman exec.
# podman exec -it 0a2ac3ebeb5c /bin/bash
[root@ceph-pdhiran-7g0lbr-node1-installer /]#  rpm -qa | grep gcc
libgcc-11.2.1-9.4.el9.x86_64
[root@ceph-pdhiran-7g0lbr-node1-installer /]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.0 (Plow)

When I tried via cephadm, I hit this error:
# cephadm shell --name ceph-pdhiran-7g0lbr-node1-installer.vjunfz
Inferring fsid a3952e4c-0eef-11ed-82e2-fa163e862102
Inferring config /var/lib/ceph/a3952e4c-0eef-11ed-82e2-fa163e862102/ceph-pdhiran-7g0lbr-node1-installer.vjunfz/config
Using ceph image with id '0cdbd0431cdd' and tag 'ceph-6.0-rhel-9-containers-candidate-12601-20220727135926' created on 2022-07-27 14:01:47 +0000 UTC
registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:11f49ce9958cf4ef4967920d12344ff047556dc2a047cf25604ef1f0d4e00bbb
Error: statfs /var/lib/ceph/a3952e4c-0eef-11ed-82e2-fa163e862102/ceph-pdhiran-7g0lbr-node1-installer.vjunfz/config: no such file or directory
[root@ceph-pdhiran-7g0lbr-node1-installer cephuser]# cephadm shell --name ceph-a3952e4c-0eef-11ed-82e2-fa163e862102-mgr-ceph-pdhiran-7g0lbr-node1-installer-vjunfz
Inferring fsid a3952e4c-0eef-11ed-82e2-fa163e862102
Traceback (most recent call last):
  File "/sbin/cephadm", line 9281, in <module>
    main()
  File "/sbin/cephadm", line 9269, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 2034, in _infer_config
    return func(ctx)
  File "/sbin/cephadm", line 1979, in _infer_fsid
    return func(ctx)
  File "/sbin/cephadm", line 2022, in _infer_config
    ctx.config = config_path(name.split('.', 1)[0], name.split('.', 1)[1])
IndexError: list index out of range

Comment 4 Vikhyat Umrao 2022-07-29 16:13:59 UTC
(In reply to Pawan from comment #3)
> Hey Vikhyat, here is the gcc version being used.
> Collected the info by logging into the pod via podman exec.
> # podman exec -it 0a2ac3ebeb5c /bin/bash
> [root@ceph-pdhiran-7g0lbr-node1-installer /]#  rpm -qa | grep gcc
> libgcc-11.2.1-9.4.el9.x86_64
> [root@ceph-pdhiran-7g0lbr-node1-installer /]# cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.0 (Plow)
> 
> When tried via cephadm, hit the error:
> # cephadm shell --name ceph-pdhiran-7g0lbr-node1-installer.vjunfz
> Inferring fsid a3952e4c-0eef-11ed-82e2-fa163e862102
> Inferring config
> /var/lib/ceph/a3952e4c-0eef-11ed-82e2-fa163e862102/ceph-pdhiran-7g0lbr-node1-
> installer.vjunfz/config
> Using ceph image with id '0cdbd0431cdd' and tag
> 'ceph-6.0-rhel-9-containers-candidate-12601-20220727135926' created on
> 2022-07-27 14:01:47 +0000 UTC
> registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:
> 11f49ce9958cf4ef4967920d12344ff047556dc2a047cf25604ef1f0d4e00bbb
> Error: statfs
> /var/lib/ceph/a3952e4c-0eef-11ed-82e2-fa163e862102/ceph-pdhiran-7g0lbr-node1-
> installer.vjunfz/config: no such file or directory
> [root@ceph-pdhiran-7g0lbr-node1-installer cephuser]# cephadm shell --name
> ceph-a3952e4c-0eef-11ed-82e2-fa163e862102-mgr-ceph-pdhiran-7g0lbr-node1-
> installer-vjunfz
> Inferring fsid a3952e4c-0eef-11ed-82e2-fa163e862102
> Traceback (most recent call last):
>   File "/sbin/cephadm", line 9281, in <module>
>     main()
>   File "/sbin/cephadm", line 9269, in main
>     r = ctx.func(ctx)
>   File "/sbin/cephadm", line 2034, in _infer_config
>     return func(ctx)
>   File "/sbin/cephadm", line 1979, in _infer_fsid
>     return func(ctx)
>   File "/sbin/cephadm", line 2022, in _infer_config
>     ctx.config = config_path(name.split('.', 1)[0], name.split('.', 1)[1])
> IndexError: list index out of range

The command is wrong! The daemon name should have the `mgr.` prefix:

cephadm shell --name mgr.ceph-pdhiran-7g0lbr-node1-installer.vjunfz
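
The two failures in comment 3 follow from how the name is split on its first dot, as the traceback line `name.split('.', 1)` shows. Below is a minimal sketch of that split with a clearer error; `split_daemon_name` is a hypothetical helper for illustration, not a cephadm function.

```python
def split_daemon_name(name: str):
    """Split '<type>.<id>' on the first dot, as the traceback line does,
    but raise a descriptive error when the dot or the id is missing."""
    daemon_type, sep, daemon_id = name.partition(".")
    if not sep or not daemon_id:
        raise ValueError(f"expected '<type>.<id>', e.g. 'mgr.<id>', got {name!r}")
    return daemon_type, daemon_id

# Correct form: the daemon type is 'mgr', the rest is the daemon id.
good = "mgr.ceph-pdhiran-7g0lbr-node1-installer.vjunfz"
print(split_daemon_name(good))

# Missing 'mgr.' prefix: the split succeeds, but the hostname part ends up
# where the daemon type should be, so the inferred config path does not exist.
no_prefix = "ceph-pdhiran-7g0lbr-node1-installer.vjunfz"
print(split_daemon_name(no_prefix))

# Container-style name without any dot: the unguarded split('.', 1)[1]
# raises IndexError; the helper reports what was expected instead.
no_dot = "ceph-a3952e4c-0eef-11ed-82e2-fa163e862102-mgr-ceph-pdhiran-7g0lbr-node1-installer-vjunfz"
try:
    split_daemon_name(no_dot)
except ValueError as err:
    print(err)
```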

Comment 5 Ken Dreyer (Red Hat) 2022-07-29 19:18:48 UTC
Since this was due to a compiler change, will this affect versions of Ceph below quincy? Like RHCS 4 (nautilus) or 5 (pacific)?

Comment 7 Vikhyat Umrao 2022-07-29 19:54:40 UTC
(In reply to Ken Dreyer (Red Hat) from comment #5)
> Since this was due to a compiler change, will this affect versions of Ceph
> below quincy? Like RHCS 4 (nautilus) or 5 (pacific)?

Ken - per Neha in the internal discussion, the issue affects pacific as well. The only difference is that the devicehealth module does not use libcephsqlite there, so pacific is less likely to hit this issue, and we have not seen any reports from nautilus or pacific.

But I will have Neha check on this and let us know her feedback.

Comment 8 David Galloway 2022-07-29 21:56:17 UTC
Just to be clear - the version of gcc is only important at *build* time.  We don't have any evidence indicating runtime gcc version matters.

