Bug 2302748 - [RHCS 8.0] On a freshly configured ceph cluster, health status is in "HEALTH_WARN" due to "mgr module smb crashed"
Summary: [RHCS 8.0] On a freshly configured ceph cluster, health status is in "HEALTH_WARN" due to "mgr module smb crashed"
Keywords:
Status: CLOSED DUPLICATE of bug 2300005
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: smb
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 8.0
Assignee: John Mulligan
QA Contact: Mohit Bisht
URL:
Whiteboard:
Depends On:
Blocks: 2300005
 
Reported: 2024-08-04 21:42 UTC by Manisha Saini
Modified: 2024-08-05 14:53 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-08-05 14:53:41 UTC
Embargoed:



Description Manisha Saini 2024-08-04 21:42:49 UTC
Description of problem:
===========

Deploy a ceph cluster with the latest squid build. On a freshly installed cluster, the cluster is in "HEALTH_WARN" state with "mgr module smb crashed" warnings.


============
# ceph -s
  cluster:
    id:     04f38a26-52a5-11ef-bdb7-fa163e483f37
    health: HEALTH_WARN
            4 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum ceph-auto-cluster-pd1jue-node1-installer,ceph-auto-cluster-pd1jue-node3,ceph-auto-cluster-pd1jue-node2 (age 12m)
    mgr: ceph-auto-cluster-pd1jue-node1-installer.jgndkz(active, since 13m), standbys: ceph-auto-cluster-pd1jue-node3.aqfzak
    mds: 1/1 daemons up, 1 standby
    osd: 18 osds: 18 up (since 10m), 18 in (since 11m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 689 pgs
    objects: 219 objects, 456 KiB
    usage:   1.2 GiB used, 269 GiB / 270 GiB avail
    pgs:     689 active+clean

  io:
    client:   43 KiB/s rd, 0 B/s wr, 43 op/s rd, 28 op/s wr

========


# ceph crash ls
ID                                                                ENTITY                                               NEW
2024-08-04T21:04:06.584355Z_0bb9a6ae-fd3f-44a8-b5ec-435621ba7d1b  mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz   *
2024-08-04T21:04:15.436980Z_000f9207-cb75-463f-ac8a-a72d162ef258  mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz   *
2024-08-04T21:04:35.105611Z_3f2dc513-65e8-4c6a-868b-b4842812f990  mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz   *
2024-08-04T21:06:12.053317Z_8aee909f-653a-445a-82a0-26f0e6c45b9d  mgr.ceph-auto-cluster-pd1jue-node3.aqfzak             *

===========



# ceph crash info 2024-08-04T21:04:06.584355Z_0bb9a6ae-fd3f-44a8-b5ec-435621ba7d1b
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/smb/__init__.py\", line 7, in <module>\n    from .module import Module",
        "  File \"/usr/share/ceph/mgr/smb/module.py\", line 7, in <module>\n    from mgr_module import MgrModule, Option, OptionLevel",
        "ImportError: cannot import name 'OptionLevel' from 'mgr_module' (/usr/share/ceph/mgr/mgr_module.py)"
    ],
    "ceph_version": "19.1.0-15.el9cp",
    "crash_id": "2024-08-04T21:04:06.584355Z_0bb9a6ae-fd3f-44a8-b5ec-435621ba7d1b",
    "entity_name": "mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz",
    "mgr_module": "smb",
    "mgr_module_caller": "PyModule::load_subclass_of",
    "mgr_python_exception": "ImportError",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "9.4 (Plow)",
    "os_version_id": "9.4",
    "process_name": "ceph-mgr",
    "stack_sig": "d774555289991228caf1ae9fbdc3c0882773e3938936c6bb7acc1a585701360e",
    "timestamp": "2024-08-04T21:04:06.584355Z",
    "utsname_hostname": "ceph-auto-cluster-pd1jue-node1-installer",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-427.28.1.el9_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Fri Jul 19 14:40:47 EDT 2024"
}
============
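
The backtrace points to a build mismatch: smb/module.py (from a newer build) imports OptionLevel, which the installed mgr_module.py does not define, so the module crashes at load time. Below is a minimal sketch of the kind of guarded import that would avoid the load crash; this is an illustration only, not the actual fix (which may instead ship a matching mgr_module.py), and the OptionLevel values are assumed rather than taken from the source.

# Hypothetical guarded import for smb/module.py (illustrative sketch only).
from enum import IntEnum

try:
    from mgr_module import MgrModule, Option, OptionLevel
except ImportError:
    # Older mgr_module.py builds do not define OptionLevel; fall back to a
    # stand-in so the module can still load.
    from mgr_module import MgrModule, Option

    class OptionLevel(IntEnum):
        # Stand-in mirroring the enum added in newer mgr_module.py builds;
        # values here are assumed for illustration.
        BASIC = 0
        ADVANCED = 1
        DEV = 2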


# ceph health detail
HEALTH_WARN 4 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 4 mgr modules have recently crashed
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz on host ceph-auto-cluster-pd1jue-node1-installer at 2024-08-04T21:04:06.584355Z
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz on host ceph-auto-cluster-pd1jue-node1-installer at 2024-08-04T21:04:15.436980Z
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz on host ceph-auto-cluster-pd1jue-node1-installer at 2024-08-04T21:04:35.105611Z
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node3.aqfzak on host ceph-auto-cluster-pd1jue-node3 at 2024-08-04T21:06:12.053317Z

=========

# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager               ?:9093,9094      1/1  6m ago     24m  count:1
ceph-exporter                               3/3  6m ago     24m  *
crash                                       3/3  6m ago     24m  *
grafana                    ?:3000           1/1  6m ago     24m  count:1
mds.cephfs                                  2/2  4m ago     18m  label:mds
mgr                                         2/2  6m ago     24m  count:2
mon                                         3/5  6m ago     24m  count:5
node-exporter              ?:9100           3/3  6m ago     24m  *
osd.all-available-devices                    18  6m ago     21m  *
prometheus                 ?:9095           1/1  6m ago     24m  count:1
rgw.rgw.1                  ?:80             2/2  4m ago     20m  label:rgw


Version-Release number of selected component (if applicable):
==================
# ceph --version
ceph version 19.1.0-15.el9cp (f552c890eaaac66497a15d2c04b4fc4cab52f209) squid (rc)


How reproducible:
=============
1/1


Steps to Reproduce:
============
1. Configure ceph cluster


Actual results:
==========
On a freshly installed ceph cluster, ceph status is in "HEALTH_WARN" state due to the smb mgr module crash.

Expected results:
==========
No crashes should be observed, and the cluster should be in a healthy state.



Additional info:

Comment 1 Storage PM bot 2024-08-04 21:42:59 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 John Mulligan 2024-08-05 14:20:36 UTC
That's odd because I would assume the smb mgr module is disabled by default.

Can you please get the output of: ceph mgr module ls

In the meantime I will investigate the traceback shown in the ceph crash info output.
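
For reference, here is a quick way to check whether the installed mgr_module.py defines OptionLevel without importing it (mgr_module pulls in the ceph_module binding, which is only available inside a running ceph-mgr). This is a diagnostic sketch; the path is taken from the traceback above.

# Diagnostic sketch: scan the installed mgr_module.py for an OptionLevel
# definition. Run on the mgr host or inside the mgr container.
from pathlib import Path

src = Path("/usr/share/ceph/mgr/mgr_module.py").read_text()
print("OptionLevel defined" if "class OptionLevel" in src else "OptionLevel missing")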

Comment 5 John Mulligan 2024-08-05 14:44:31 UTC
Hi Mohit,
do you folks want me to put a fix under this new bz number,
or do you want to close this as a dupe of #2300005, treating that one as incomplete?

I have a WIP fix that I'm going to test, so if you make a decision soon I can ideally put the fix in later today (my afternoon).

