Description of problem:
===========
Deploy ceph cluster with squid latest build. On a freshly installed cluster, cluster is in "HEALTH_WARN" state with "mgr module smb crashed" warnings
============
# ceph -s
  cluster:
    id:     04f38a26-52a5-11ef-bdb7-fa163e483f37
    health: HEALTH_WARN
            4 mgr modules have recently crashed

  services:
    mon: 3 daemons, quorum ceph-auto-cluster-pd1jue-node1-installer,ceph-auto-cluster-pd1jue-node3,ceph-auto-cluster-pd1jue-node2 (age 12m)
    mgr: ceph-auto-cluster-pd1jue-node1-installer.jgndkz(active, since 13m), standbys: ceph-auto-cluster-pd1jue-node3.aqfzak
    mds: 1/1 daemons up, 1 standby
    osd: 18 osds: 18 up (since 10m), 18 in (since 11m)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 689 pgs
    objects: 219 objects, 456 KiB
    usage:   1.2 GiB used, 269 GiB / 270 GiB avail
    pgs:     689 active+clean

  io:
    client:   43 KiB/s rd, 0 B/s wr, 43 op/s rd, 28 op/s wr
========
# ceph crash ls
ID                                                                ENTITY                                               NEW
2024-08-04T21:04:06.584355Z_0bb9a6ae-fd3f-44a8-b5ec-435621ba7d1b  mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz   *
2024-08-04T21:04:15.436980Z_000f9207-cb75-463f-ac8a-a72d162ef258  mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz   *
2024-08-04T21:04:35.105611Z_3f2dc513-65e8-4c6a-868b-b4842812f990  mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz   *
2024-08-04T21:06:12.053317Z_8aee909f-653a-445a-82a0-26f0e6c45b9d  mgr.ceph-auto-cluster-pd1jue-node3.aqfzak             *
===========
# ceph crash info 2024-08-04T21:04:06.584355Z_0bb9a6ae-fd3f-44a8-b5ec-435621ba7d1b
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/smb/__init__.py\", line 7, in <module>\n    from .module import Module",
        "  File \"/usr/share/ceph/mgr/smb/module.py\", line 7, in <module>\n    from mgr_module import MgrModule, Option, OptionLevel",
        "ImportError: cannot import name 'OptionLevel' from 'mgr_module' (/usr/share/ceph/mgr/mgr_module.py)"
    ],
    "ceph_version": "19.1.0-15.el9cp",
    "crash_id": "2024-08-04T21:04:06.584355Z_0bb9a6ae-fd3f-44a8-b5ec-435621ba7d1b",
    "entity_name": "mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz",
    "mgr_module": "smb",
    "mgr_module_caller": "PyModule::load_subclass_of",
    "mgr_python_exception": "ImportError",
    "os_id": "rhel",
    "os_name": "Red Hat Enterprise Linux",
    "os_version": "9.4 (Plow)",
    "os_version_id": "9.4",
    "process_name": "ceph-mgr",
    "stack_sig": "d774555289991228caf1ae9fbdc3c0882773e3938936c6bb7acc1a585701360e",
    "timestamp": "2024-08-04T21:04:06.584355Z",
    "utsname_hostname": "ceph-auto-cluster-pd1jue-node1-installer",
    "utsname_machine": "x86_64",
    "utsname_release": "5.14.0-427.28.1.el9_4.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC Fri Jul 19 14:40:47 EDT 2024"
}
============
# ceph health detail
HEALTH_WARN 4 mgr modules have recently crashed
[WRN] RECENT_MGR_MODULE_CRASH: 4 mgr modules have recently crashed
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz on host ceph-auto-cluster-pd1jue-node1-installer at 2024-08-04T21:04:06.584355Z
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz on host ceph-auto-cluster-pd1jue-node1-installer at 2024-08-04T21:04:15.436980Z
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz on host ceph-auto-cluster-pd1jue-node1-installer at 2024-08-04T21:04:35.105611Z
    mgr module smb crashed in daemon mgr.ceph-auto-cluster-pd1jue-node3.aqfzak on host ceph-auto-cluster-pd1jue-node3 at 2024-08-04T21:06:12.053317Z
=========
# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager               ?:9093,9094      1/1  6m ago     24m  count:1
ceph-exporter                               3/3  6m ago     24m  *
crash                                       3/3  6m ago     24m  *
grafana                    ?:3000           1/1  6m ago     24m  count:1
mds.cephfs                                  2/2  4m ago     18m  label:mds
mgr                                         2/2  6m ago     24m  count:2
mon                                         3/5  6m ago     24m  count:5
node-exporter              ?:9100           3/3  6m ago     24m  *
osd.all-available-devices                    18  6m ago     21m  *
prometheus                 ?:9095           1/1  6m ago     24m  count:1
rgw.rgw.1                  ?:80             2/2  4m ago     20m  label:rgw

Version-Release number of selected component (if applicable):
==================
# ceph --version
ceph version 19.1.0-15.el9cp (f552c890eaaac66497a15d2c04b4fc4cab52f209) squid (rc)

How reproducible:
=============
1/1

Steps to Reproduce:
============
1. Configure ceph cluster

Actual results:
==========
On fresh install ceph cluster, ceph status is in "HEALTH_WARN" state due to smb mgr module crash

Expected results:
==========
No crashes should be observed and cluster should be in healthy state

Additional info:
Please specify the severity of this bug. Severity is defined here: https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.
That's odd because I would assume the smb mgr module is disabled by default. Can you please get the output of:

  ceph mgr module ls

In the meantime I will investigate the traceback shown in the ceph crash info output.
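
For anyone triaging similar reports, here is a minimal Python sketch of the two checks discussed above: whether a module is enabled, and what the crash record says. It works on already-parsed JSON. The helper names (module_state, summarize_crash) are hypothetical, and the key names always_on_modules, enabled_modules and disabled_modules are an assumption about what the module listing looks like in JSON form (for example from "ceph mgr module ls --format json"); they should be checked against the actual output. The crash fields used (backtrace, mgr_module, entity_name) are the ones visible in the ceph crash info output in this report.

```python
from typing import Any


def module_state(module_ls: dict[str, Any], name: str) -> str:
    """Classify a mgr module as always_on, enabled, disabled, or unknown.

    Assumes the JSON layout described in the lead-in; the key names are an
    assumption and are not taken from this bug report.
    """
    if name in module_ls.get("always_on_modules", []):
        return "always_on"
    if name in module_ls.get("enabled_modules", []):
        return "enabled"
    for entry in module_ls.get("disabled_modules", []):
        # Accept either plain names or objects carrying a "name" key.
        entry_name = entry.get("name") if isinstance(entry, dict) else entry
        if entry_name == name:
            return "disabled"
    return "unknown"


def summarize_crash(crash_info: dict[str, Any]) -> str:
    """Build a one-line summary from a parsed crash record."""
    backtrace = crash_info.get("backtrace") or ["<no backtrace>"]
    module = crash_info.get("mgr_module", "<unknown module>")
    entity = crash_info.get("entity_name", "<unknown entity>")
    # The last backtrace entry holds the exception message in this report.
    return f"{module} on {entity}: {backtrace[-1].strip()}"


# Sample data: the crash fields are copied from this report; the module
# listing is made up purely for illustration.
sample_crash = {
    "backtrace": [
        "ImportError: cannot import name 'OptionLevel' from 'mgr_module' "
        "(/usr/share/ceph/mgr/mgr_module.py)"
    ],
    "entity_name": "mgr.ceph-auto-cluster-pd1jue-node1-installer.jgndkz",
    "mgr_module": "smb",
}
sample_module_ls = {
    "always_on_modules": ["balancer"],
    "enabled_modules": ["cephadm"],
    "disabled_modules": [{"name": "smb"}],
}

print(module_state(sample_module_ls, "smb"))
print(summarize_crash(sample_crash))
```

If the module shows up as disabled while the crash is still recorded, that would suggest the import failure happens when the mgr loads the module's Python code rather than when the module is run, which is what the PyModule::load_subclass_of caller in the crash record points at.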
Hi Mohit, do you folks want me to put a fix under this new bz number? Or do you want to close this as a dupe of #2300005, calling that one incomplete? I have a WIP fix that I'm going to test, so ideally, if you make a decision soon, I can put in the fix today, during my afternoon.