+++ This bug was initially created as a clone of Bug #2219093 +++

Description of problem:

Opening another BZ quickly, since the case attached to this BZ is hot. The active MGR process on bmwcefesp000001 hung, causing the mgr to fail over to the standby daemon on bmwcefesp000003.

Version-Release number of selected component (if applicable):
External ODF RHCS 5.1

Additional info:
- Core dumps and "ceph -s" output are incoming. A perf dump was taken while the mgr process was hung. It's in supportshell at: 0700-mgr-bmwcefesp000001-perf-top.txt

--- Additional comment from on 2023-07-01 23:20:01 UTC ---

Unfortunately we hit https://bugzilla.redhat.com/show_bug.cgi?id=2170213 so we weren't able to capture a coredump.

-Brandon

--- Additional comment from on 2023-07-01 23:28:07 UTC ---

[ansautomat@bmwcefesp000001 ~]$ sudo podman exec -it ceph-mon-$(hostname -s) ceph -s
  cluster:
    id:     8d2323ac-0820-49ad-b917-059ba00794f2
    health: HEALTH_WARN
            1 daemons have recently crashed

  services:
    mon: 3 daemons, quorum bmwcefesp000003,bmwcefesp000002,bmwcefesp000001 (age 2d)
    mgr: bmwcefesp000003(active, since 63m), standbys: bmwcefesp000002, bmwcefesp000001
    mds: 2/2 daemons up, 1 standby
    osd: 324 osds: 324 up (since 23h), 324 in (since 23h)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 10625 pgs
    objects: 106.42M objects, 41 TiB
    usage:   131 TiB used, 485 TiB / 616 TiB avail
    pgs:     10625 active+clean

  io:
    client:   16 MiB/s rd, 94 MiB/s wr, 865 op/s rd, 1.87k op/s wr

[ansautomat@bmwcefesp000001 ~]$ sudo podman exec -it ceph-mon-$(hostname -s) ceph health detail
HEALTH_WARN 1 daemons have recently crashed
[WRN] RECENT_CRASH: 1 daemons have recently crashed
    mgr.bmwcefesp000001 crashed on host bmwcefesp000001.advancedp.org at 2023-07-01T23:00:55.691802Z

--- Additional comment from on 2023-07-01 23:44:12 UTC ---

^^The above "ceph -s" output was taken after we SIGKILLED the hung mgr on bmwcefesp000001.^^

- Mgr logs located in supportshell: 0710-ceph-mgr.bmwcefesp000001.log
- As stated, a core dump could not be generated due to https://bugzilla.redhat.com/show_bug.cgi?id=2170213:

[ansautomat@bmwcefesp000001 ~]$ sudo coredumpctl dump -o mgr-bmwcefesp000001-coredump.txt
           PID: 861251 (ceph-mgr)
           UID: 167 (167)
           GID: 167 (167)
        Signal: 6 (ABRT)
     Timestamp: Sat 2023-07-01 23:00:56 UTC (21min ago)
  Command Line: /usr/bin/ceph-mgr --cluster ceph --setuser ceph --setgroup ceph --default-log-to-stderr=true --err-to-stderr=true --default-log-to-file=false --foreground -i bmwcefesp000001
    Executable: /usr/bin/ceph-mgr
 Control Group: /machine.slice/libpod-d030bd2c8bdfe2219c8bf0f888b0ff4df66f96c31d6eb5257c65b43fbc5fbfa0.scope
          Unit: libpod-d030bd2c8bdfe2219c8bf0f888b0ff4df66f96c31d6eb5257c65b43fbc5fbfa0.scope
         Slice: machine.slice
       Boot ID: 00792aa77e3a47fba7e8d900139eacf7
    Machine ID: 096034123a2b40fd9421f3196736558d
      Hostname: bmwcefesp000001.advancedp.org
       Storage: none
       Message: Process 861251 (ceph-mgr) of user 167 dumped core.
Coredump entry has no core attached (neither internally in the journal nor externally on disk).

- Brandon

--- Additional comment from Vikhyat Umrao on 2023-07-02 02:42:00 UTC ---

This is from the DXC/BMW CephFS escalation - https://bugzilla.redhat.com/show_bug.cgi?id=2218688. Greg and I were called into the customer remote session when we were informed that they do not see standby MGRs in ceph -s.
When we joined the remote session these were the events:
- MGR03 was the active MGR, and 01 and 02 were not visible as standbys in ceph -s
- We checked MGR01 and 02; both have processes up and running
- Restarting 02 made it join as a standby, with 03 as the active mgr
- Before restarting 01 we decided to capture perf top, MGR logs, and a coredump
  + perf top and MGR logs are available
  + but the coredump capture failed, as explained above
- To capture the coredump we did `kill -SIGABRT <01 PID>`; this made MGR 01 restart and it joined 02 as a standby, with 03 remaining active.

--- Additional comment from Brad Hubbard on 2023-07-02 04:11:20 UTC ---

It may be easier to get cpu usage per thread and then a couple of thread dumps to see what the thread with high cpu usage is actually doing. If we do get a core, make sure we identify which thread has high cpu usage to make it easier to identify where we should focus.

--- Additional comment from Vikhyat Umrao on 2023-07-02 04:15:13 UTC ---

(In reply to Brad Hubbard from comment #5)
> It may be easier to get cpu usage per thread and then a couple of thread dumps
> to see what the thread with high cpu usage is actually doing. If we do get a
> core make sure we identify which thread has high cpu usage to make it easier
> to identify where we should focus.

Slack link where the wallclock profiler and the poor man's profiler got discussed - https://redhat-internal.slack.com/archives/C05EQ6XHL78/p1688270148381289

--- Additional comment from Brad Hubbard on 2023-07-02 04:15:52 UTC ---

Please also get output showing whether the implicated thread is in D-state or not, and also wchan details (some incantation of ps should get you these fields).

--- Additional comment from Brad Hubbard on 2023-07-02 04:26:59 UTC ---

If a coredump is gathered, please also provide full details of the exact container image in use when the coredump happened, to expedite setting up an environment to analyse the core.

--- Additional comment from Ashish Singh on 2023-07-02 05:33:26 UTC ---

We could be hitting the bug https://bugzilla.redhat.com/show_bug.cgi?id=2106031. For now we're capturing the following:
- Debug logs (debug_mgr 20, debug_monc 20, debug_ms 5)
- Threads for the MGR daemon
- Thread dumps for threads with high CPU
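For reference, the per-thread data requested above (per-thread CPU, thread state/wchan, and full thread dumps) can be gathered with standard procps and gdb invocations along these lines. This is a hedged sketch only; `<mgr-pid>` is a placeholder for the ceph-mgr PID on the affected host, not a value taken from the case record:

$ top -b -H -n 1 -p <mgr-pid>                                   # one-shot listing of per-thread CPU usage
$ ps -T -p <mgr-pid> -o tid,stat,wchan:32,pcpu,comm             # per-thread state (R/S/D) and wchan
$ sudo gdb -p <mgr-pid> --batch -ex 'thread apply all bt' > mgr-thread-dump.txt   # thread dump of all threads

Note that attaching gdb briefly stops the process, so it is best taken after the CPU and wchan snapshots.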
--- Additional comment from Karun Josy on 2023-07-02 10:24:20 UTC ---

Hello everyone,

In the remote session, we noticed only 1 active MGR and no standby again. I restarted one to have at least 1 standby, and decided to collect the logs from the third one, which is missing in #ceph -s.

*** MGR03 ---> bmwcefesp000003 is the one that is currently running but missing in #ceph -s ****

Logs collected (in supportshell):
- Debug logs from MGR03 (debug_mgr 20, debug_monc 20, debug_ms 5) -----> /cases/03544845/0760-mgr03_journal_20230702_06:19:36.txt
- netstat output ----> /cases/03544845/0770-ntstat_bmwcefesp000003_06:11:04.txt
- pstack text: /cases/03544845/0800-core-pid-dump-2023-07-02.tar.gz/core-pid-dump-2023-07-02
- gcore pid: /cases/03544845/0800-core-pid-dump-2023-07-02.tar.gz/core.768914
- container version: advantage-ceph-images-rhceph_rhceph-5-rhel8:5-235
- # rpm -qa | grep glibc and # rpm -qa | grep ceph from within the container
- top -H -p 768914
- sudo cat /proc/768914/stack
- sudo cat /proc/768914/status

-----------------------

~]$ top -H -p 768914
top - 05:50:36 up 2 days, 14:46,  1 user,  load average: 1.74, 1.78, 1.81
Threads: 163 total,   1 running, 162 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.1 us,  2.5 sy,  0.0 ni, 91.1 id,  0.1 wa,  0.5 hi,  0.7 si,  0.0 st
MiB Mem : 386352.3 total, 281340.1 free,  93432.1 used,  11580.0 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used. 290653.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 822152 ceph      20   0 6995340   1.0g  53016 R  16.3   0.3  35:53.05 msgr-worker-2
 822151 ceph      20   0 6995340   1.0g  53016 S  14.6   0.3  45:11.16 msgr-worker-1
 834736 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   1:20.85 crash
 834775 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   1:18.67 telemetry
 834776 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   1:19.98 mgr-fin
 834787 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   1:18.12 mgr-fin
 834788 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   9:32.67 mgr-fin
 834793 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   1:20.85 mgr-fin
 834796 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   9:23.91 mgr-fin
 834798 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   1:21.94 mgr-fin
 834831 ceph      20   0 6995340   1.0g  53016 S   0.7   0.3   2:41.94 dashboard
 834735 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   2:10.11 balancer
 834738 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:18.10 devicehealth
 834741 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   2:10.28 pg_autoscaler
 834742 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:21.20 progress
 834754 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3  13:07.54 prometheus
 834755 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   2:41.95 prometheus
 834766 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:31.76 mgr-fin
 834767 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:23.17 mgr-fin
 834768 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:22.24 mgr-fin
 834772 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:30.30 mgr-fin
 834792 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   1:19.69 mgr-fin
 834794 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   8:15.70 mgr-fin
 834797 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3  11:26.84 mgr-fin
 834899 ceph      20   0 6995340   1.0g  53016 S   0.3   0.3   0:05.19 mgr-fin
 768914 ceph      20   0 6995340   1.0g  53016 S   0.0   0.3   0:07.32 ceph-mgr
 822148 ceph      20   0 6995340   1.0g  53016 S   0.0   0.3  12:45.56 log

$ sudo cat /proc/768914/status
Name:   ceph-mgr
Umask:  0022
State:  S (sleeping)
Tgid:   768914
Ngid:   768914
Pid:    768914
PPid:   768853
TracerPid:      0
Uid:    167     167     167     167
Gid:    167     167     167     167
FDSize: 4096
Groups:
NStgid: 768914  55
NSpid:  768914  55
NSpgid: 768914  55
NSsid:  768853  1
VmPeak:  7006864 kB
VmSize:  6995340 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:   1095708 kB
VmRSS:   1070588 kB
RssAnon: 1017572 kB
RssFile:   53016 kB
RssShmem:      0 kB
VmData:  6638364 kB
VmStk:       968 kB
VmExe:      4892 kB
VmLib:    129984 kB
VmPTE:      3492 kB
VmSwap:        0 kB
HugetlbPages:  0 kB
CoreDumping:    0
THP_enabled:    0
Threads:        163
SigQ:   0/1545256
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000001000
SigIgn: 0000000000000000
SigCgt: 00000001c18044eb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000800425fb
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        2
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ffffffff
Cpus_allowed_list:      0-31
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        1633
nonvoluntary_ctxt_switches:     76

$ sudo cat /proc/768914/stack
[<0>] futex_wait_queue_me+0xa3/0x100
[<0>] futex_wait+0x11f/0x210
[<0>] do_futex+0x2f9/0x4d0
[<0>] __x64_sys_futex+0x145/0x1f0
[<0>] do_syscall_64+0x5b/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

[ansautomat@bmwcefesp000003 ~]$ # rpm -qa | grep ceph
python3-cephfs-16.2.7-126.el8cp.x86_64
ceph-base-16.2.7-126.el8cp.x86_64
ceph-prometheus-alerts-16.2.7-126.el8cp.noarch
ceph-mgr-diskprediction-local-16.2.7-126.el8cp.noarch
ceph-mgr-rook-16.2.7-126.el8cp.noarch
ceph-osd-16.2.7-126.el8cp.x86_64
libcephfs2-16.2.7-126.el8cp.x86_64
python3-ceph-argparse-16.2.7-126.el8cp.x86_64
ceph-common-16.2.7-126.el8cp.x86_64
ceph-selinux-16.2.7-126.el8cp.x86_64
ceph-grafana-dashboards-16.2.7-126.el8cp.noarch
ceph-mgr-dashboard-16.2.7-126.el8cp.noarch
ceph-mgr-k8sevents-16.2.7-126.el8cp.noarch
ceph-mgr-16.2.7-126.el8cp.x86_64
ceph-iscsi-3.5-3.el8cp.noarch
ceph-mon-16.2.7-126.el8cp.x86_64
ceph-immutable-object-cache-16.2.7-126.el8cp.x86_64
cephfs-mirror-16.2.7-126.el8cp.x86_64
nfs-ganesha-ceph-3.5-1.el8cp.x86_64
libcephsqlite-16.2.7-126.el8cp.x86_64
cephadm-16.2.7-126.el8cp.noarch
ceph-mgr-cephadm-16.2.7-126.el8cp.noarch
ceph-mgr-modules-core-16.2.7-126.el8cp.noarch
ceph-radosgw-16.2.7-126.el8cp.x86_64
ceph-mds-16.2.7-126.el8cp.x86_64
python3-ceph-common-16.2.7-126.el8cp.x86_64

# rpm -qa | grep glibc
glibc-minimal-langpack-2.28-189.5.el8_6.x86_64
glibc-common-2.28-189.5.el8_6.x86_64
glibc-2.28-189.5.el8_6.x86_64

37905aa113d2  bmwinfesv000031.advantagedp.org/advantage-ceph-images-rhceph_rhceph-5-rhel8:5-235  26 hours ago  Up 26 hours ago  ceph-mgr-bmwcefesp000003

--- Additional comment from Karun Josy on 2023-07-02 10:31:47 UTC ---

Current status:

We have 1 MGR active and 1 MGR on standby.
MGR03 ---> bmwcefesp000003 is currently running but still missing in #ceph -s. The debug logs and coredump are from this node, MGR03.

We decided not to restart the service and to keep it as it is for the time being, for further debugging if required, and asked the customer to monitor the cluster continuously to make sure we always have at least 1 standby. We will need to restart MGR03 as well eventually. Please confirm that we have all the required logs before we go ahead.

###############

Note: We collected proc/stack and status from a working standby MGR for comparison. Adding it here, hope it helps.

output from node 2 (standby mgr MGR02)
---------------------------------------

$ sudo ps aux | grep mgr
root     3174559  0.0  0.0  70152  2476 ?
Ss 04:16 0:00 /usr/bin/conmon --api-version 1 -c b2bc3b7d9a72a7a0734c6b9b2b22048ecd96193d914a457dcc1e9b83ddf1cf58 -u b2bc3b7d9a72a7a0734c6b9b2b22048ecd96193d914a457dcc1e9b83ddf1cf58 -r /usr/bin/runc -b /var/lib/containers/storage/overlay-containers/b2bc3b7d9a72a7a0734c6b9b2b22048ecd96193d914a457dcc1e9b83ddf1cf58/userdata -p /run/containers/storage/overlay-containers/b2bc3b7d9a72a7a0734c6b9b2b22048ecd96193d914a457dcc1e9b83ddf1cf58/userdata/pidfile -n ceph-mgr-bmwcefesp000002 --exit-dir /run/libpod/exits --full-attach -s -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/b2bc3b7d9a72a7a0734c6b9b2b22048ecd96193d914a457dcc1e9b83ddf1cf58/userdata/oci-log --conmon-pidfile //run/ceph-mgr --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg cni --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg b2bc3b7d9a72a7a0734c6b9b2b22048ecd96193d914a457dcc1e9b83ddf1cf58 167 3174634 1.5 0.1 5305704 434428 ? 
Sl   04:16   1:45 /usr/bin/ceph-mgr --cluster ceph --setuser ceph --setgroup ceph --default-log-to-stderr=true --err-to-stderr=true --default-log-to-file=false --foreground -i bmwcefesp000002
ansauto+ 3342669  0.0  0.0 221940  1196 pts/3    S+   06:12   0:00 grep --color=auto mgr

$ sudo cat /proc/3174559/stack
[<0>] do_sys_poll+0x4b8/0x570
[<0>] __x64_sys_poll+0x37/0x130
[<0>] do_syscall_64+0x5b/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

$ sudo cat /proc/3174559/status
Name:   conmon
Umask:  0022
State:  S (sleeping)
Tgid:   3174559
Ngid:   0
Pid:    3174559
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups:
NStgid: 3174559
NSpid:  3174559
NSpgid: 3174559
NSsid:  3174559
VmPeak:    70152 kB
VmSize:    70152 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      2476 kB
VmRSS:      2476 kB
RssAnon:     748 kB
RssFile:    1728 kB
RssShmem:      0 kB
VmData:      660 kB
VmStk:       132 kB
VmExe:        80 kB
VmLib:     13712 kB
VmPTE:       160 kB
VmSwap:        0 kB
HugetlbPages:  0 kB
CoreDumping:    0
THP_enabled:    1
Threads:        1
SigQ:   0/1545256
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000010200
SigIgn: 0000000000001000
SigCgt: 0000000180004006
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs:     0
Seccomp:        0
Speculation_Store_Bypass:       thread vulnerable
Cpus_allowed:   ffffffff
Cpus_allowed_list:      0-31
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list:      0-1
voluntary_ctxt_switches:        1725
nonvoluntary_ctxt_switches:     1

[ansautomat@bmwcefesp000002 ~]$ sudo cat /proc/3174559/stack
[<0>] do_sys_poll+0x4b8/0x570
[<0>] __x64_sys_poll+0x37/0x130
[<0>] do_syscall_64+0x5b/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
[ansautomat@bmwcefesp000002 ~]$
[ansautomat@bmwcefesp000002 ~]$
[ansautomat@bmwcefesp000002 ~]$ sudo cat /proc/3174559/stack
[<0>] do_sys_poll+0x4b8/0x570
[<0>] __x64_sys_poll+0x37/0x130
[<0>] do_syscall_64+0x5b/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

--- Additional comment from Vikhyat Umrao on 2023-07-02 14:50:25 UTC ---

From Slack:

Ranjith Rajaram 6:58 AM
Currently they are running with two managers (one active and one standby). The other one is in a hung state (003). We have asked the customer to keep monitoring the ceph status. If they find any of the existing mgrs not listed in the output, we have asked them to restart it. 003 is reserved for data collection (edited)

6:59
we have performed two mgr restarts so far

pdonnell 7:08 AM
my initial analysis is that it appears the mgr is stuck on the DaemonServer::lock

7:08
which is acquired in some questionable places like DaemonServer::ms_handle_authentication

7:08
that appears to have completely blocked the messenger thread reading messages on the wire

7:09
which is why it stopped doing anything

--- Additional comment from Vikhyat Umrao on 2023-07-02 14:52:08 UTC ---

This one might be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2106031, and Radek is actively looking into it.

--- Additional comment from Radoslaw Zarzynski on 2023-07-02 15:27:53 UTC ---

Hi Karun! Thanks for the mgr03 log. Let's take a look.

This instance booted up yesterday.
```
[rzarzyns@supportshell-1 ~]$ less /cases/03544845/0760-mgr03_journal_20230702_06:19:36.txt
Jul 01 19:06:45 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-01T19:06:45.827+0000 7fef0db2a500  0 ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable), process ceph-mgr, pid 55
```

The last 2 minutes of the log are flooded *solely* with entries from `_check_auth_rotating()`.

```
Jul 02 06:18:46 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:18:45.999+0000 7feefe18b700  5 --2- 192.78.200.27:0/55 >> [v2:192.78.200.26:3300/0,v1:192.78.200.26:6789/0] conn(0x561104c79000 0x5610fd6f3e00 secure :-1 s=THROTTLE_DONE pgs=43442 cs=0 l=1 rev1=1 rx=0x5611042fd050 tx=0x5611046280c0).handle_message got 24 + 0 + 0 byte message. envelope type=54 src mon.1 off 0
Jul 02 06:18:46 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:18:46.844+0000 7feef9181700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-02T05:18:46.845689+0000)
...
Jul 02 06:20:06 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:20:06.862+0000 7feef9181700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-02T05:20:06.862975+0000)
Jul 02 06:20:07 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:20:07.862+0000 7feef9181700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-02T05:20:07.863107+0000)
Jul 02 06:20:08 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:20:08.862+0000 7feef9181700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-02T05:20:08.863246+0000)
Jul 02 06:20:09 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:20:09.862+0000 7feef9181700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-02T05:20:09.863376+0000)
Jul 02 06:20:10 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:20:10.862+0000 7feef9181700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-02T05:20:10.863496+0000)
```

These *lderr* entries are generated by the following code:

```
  ldout(cct, 10) << "_check_auth_rotating renewing rotating keys (they expired before " << cutoff << ")" << dendl;
  if (!rotating_secrets->need_new_secrets() &&
      rotating_secrets->need_new_secrets(issued_at_lower_bound)) {
    // the key has expired before it has been issued?
    lderr(cct) << __func__ << " possible clock skew, rotating keys expired way too early"
               << " (before " << issued_at_lower_bound << ")" << dendl;
  }
```

We know this ceph-mgr instance is running with `debug_monc=20/20`, therefore I would expect to see:

1. the `ldout(10)` entry just before the `if`;
2. the unconditional `ldout(10)` from `MonClient::tick()` or `MonClient::_finish_auth()`, as they are the sole callers of `_check_auth_rotating()`:

```
int MonClient::_check_auth_tickets()
{
  ceph_assert(ceph_mutex_is_locked(monc_lock));
  if (active_con && auth) {
    // ...
    _check_auth_rotating();
  }
  return 0;
}

void MonClient::_finish_auth(int auth_err)
{
  ldout(cct,10) << __func__ << " " << auth_err << dendl;
  // ...
  if (!auth_err && active_con) {
    ceph_assert(auth);
    _check_auth_tickets();
  }
```

```
void MonClient::tick()
{
  ldout(cct, 10) << __func__ << dendl;

  utime_t now = ceph_clock_now();

  auto reschedule_tick = make_scope_guard([this] {
    schedule_tick();
  });

  _check_auth_tickets();
```

To exemplify the expected content, let me show entries just a couple of seconds older:

```
Jul 02 06:18:42 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:18:42.591+0000 7fedb55fe700 10 monclient: tick
Jul 02 06:18:42 bmwcefesp000003.advancedp.org ceph-mgr-bmwcefesp000003[768841]: 2023-07-02T06:18:42.591+0000 7fedb55fe700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-07-02T06:18:12.592768+0000)
```

Is a 2-minute glitch in logging something you have ever encountered? May we have a fresh log from mgr03?

Regards,
Radek

--- Additional comment from Ranjith Rajaram on 2023-07-02 15:38:00 UTC ---

Quick question (posting it to the BZ). Please nack if you do not think it is related.

We have a hung ceph manager; it may be hung due to a different reason. We took the coredump of the hung ceph-mgr. The hung ceph manager seems to be trying to delete an entry from _deleting before going into a hung state:

Path '/volumes/_deleting/4a00a3ea-c792-4b29-9746-fb840f5a790b/0d13f4b6-fa25-4461-9356-d149d9634676/kpi_orchestration_11/write-dag-run-config/2023-06-12T12:03:02.937843+00:00'

(gdb) bt 77
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7fedc060eb88) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5611073bd568, cond=0x7fedc060eb60) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x7fedc060eb60, mutex=0x5611073bd568) at pthread_cond_wait.c:655
#3  0x00007fef027bc8f0 in ?? () at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:118 from /lib64/libstdc++.so.6
#4  0x00007fedc060eb30 in ?? ()
#5  0x00007feef1e9b5ab in std::condition_variable::wait<Client::make_request(MetaRequest*, const UserPerm&, InodeRef*, bool*, mds_rank_t, ceph::bufferlist*)::<lambda()> > (__p=..., __lock=..., this=0x7fedc060eb60) at /usr/src/debug/ceph-16.2.7-126.el8cp.x86_64/src/log/Entry.h:35
#6  Client::make_request (this=0x5611073bd400, request=0x561104b65200, perms=..., ptarget=0x0, pcreated=0x0, use_mds=<optimized out>, pdirbl=0x0) at /usr/src/debug/ceph-16.2.7-126.el8cp.x86_64/src/client/Client.cc:1867
#7  0x00007feef1ebbea6 in Client::_rmdir (this=0x5611073bd400, dir=0x56110b85b000, name=<optimized out>, perms=...) at /usr/src/debug/ceph-16.2.7-126.el8cp.x86_64/src/client/Client.cc:13637
#8  0x00007feef1ebc705 in Client::unlinkat (this=0x5611073bd400, dirfd=-100, relpath=<optimized out>, flags=<optimized out>, perm=...)
    at /usr/src/debug/ceph-16.2.7-126.el8cp.x86_64/src/client/Client.cc:7197
#9  0x00007feef2274a87 in __pyx_pf_6cephfs_9LibCephFS_94rmdir (__pyx_v_path=b'/volumes/_deleting/4a00a3ea-c792-4b29-9746-fb840f5a790b/0d13f4b6-fa25-4461-9356-d149d9634676/kpi_orchestration_11/write-dag-run-config/2023-06-12T12:03:02.937843+00:00', __pyx_v_self=0x7fee30464160) at /usr/src/debug/ceph-16.2.7-126.el8cp.x86_64/build/src/pybind/cephfs/cephfs.c:19171
#10 __pyx_pw_6cephfs_9LibCephFS_95rmdir (__pyx_v_self=<cephfs.LibCephFS at remote 0x7fee30464160>, __pyx_v_path=b'/volumes/_deleting/4a00a3ea-c792-4b29-9746-fb840f5a790b/0d13f4b6-fa25-4461-9356-d149d9634676/kpi_orchestration_11/write-dag-run-config/2023-06-12T12:03:02.937843+00:00') at /usr/src/debug/ceph-16.2.7-126.el8cp.x86_64/build/src/pybind/cephfs/cephfs.c:19059
#11 0x00007fef0d56c05d in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/python3-3.6.8-51.el8.x86_64/Python/ceval.c:1303

We still have the following two subvolumes in _deleting:

drwxrwxrwx. 3 root root 2 Aug  8  2022 4a00a3ea-c792-4b29-9746-fb840f5a790b
drwxrwxrwx. 3 root root 1 Jun 23 13:32 fcf8112c-d8f4-49a1-a807-1e5222e672cf

If the hung manager is holding any lock and we know it is not proceeding further, will the other active manager be able to clear up the deleted volume?

--- Additional comment from Patrick Donnelly on 2023-07-02 15:47:51 UTC ---

The cephfs call stacks are irrelevant. See:

Thread 12 (Thread 0x7feefb185700 (LWP 822162)):
#0  0x00007fef0339081d in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x00007fef03389ac9 in pthread_mutex_lock () from target:/lib64/libpthread.so.0
#2  0x00005610fa9bd6cf in ClusterState::set_service_map(ServiceMap const&) ()
#3  0x00005610faa271ea in Mgr::handle_service_map(boost::intrusive_ptr<MServiceMap>) ()
#4  0x00005610faa29ec4 in Mgr::ms_dispatch2(boost::intrusive_ptr<Message> const&) ()
#5  0x00005610faa33e95 in MgrStandby::ms_dispatch2(boost::intrusive_ptr<Message> const&) ()
#6  0x00007fef047a92aa in DispatchQueue::entry() () from target:/usr/lib64/ceph/libceph-common.so.2
#7  0x00007fef0485af91 in DispatchQueue::DispatchThread::entry() () from target:/usr/lib64/ceph/libceph-common.so.2
#8  0x00007fef033871cf in start_thread () from target:/lib64/libpthread.so.0
#9  0x00007fef01ddadd3 in clone () from target:/lib64/libc.so.6

has DaemonServer::lock; wants ClusterState::lock

Thread 22 (Thread 0x7fedd2e39700 (LWP 834730)):
#0  0x00007fef0338d838 in pthread_cond_timedwait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x00007fef0d4c7edc in take_gil () from target:/lib64/libpython3.6m.so.1.0
#2  0x00007fef0d4c80fd in PyEval_RestoreThread () from target:/lib64/libpython3.6m.so.1.0
#3  0x00005610faa1e993 in with_gil_t::with_gil_t(without_gil_t&) ()
#4  0x00005610fa976226 in ActivePyModules::get_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()

Probably has ClusterState::lock; wants GIL

Thread 3 (Thread 0x7feeff18d700 (LWP 822150)):
#0  0x00007fef0339081d in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x00007fef03389ac9 in pthread_mutex_lock () from target:/lib64/libpthread.so.0
#2  0x00005610fa986d17 in std::mutex::lock() ()
#3  0x00005610fa9dee2c in DaemonServer::ms_handle_authentication(Connection*) ()
#4  0x00007fef04906e55 in MonClient::handle_auth_request(Connection*, AuthConnectionMeta*, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) () from target:/usr/lib64/ceph/libceph-common.so.2
#5  0x00007fef0489165f in ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) () from target:/usr/lib64/ceph/libceph-common.so.2
#6  0x00007fef0489261e in ProtocolV2::handle_auth_request_more(ceph::buffer::v15_2_0::list&) () from target:/usr/lib64/ceph/libceph-common.so.2
#7  0x00007fef0489b0c3 in ProtocolV2::handle_frame_payload() () from target:/usr/lib64/ceph/libceph-common.so.2
#8  0x00007fef0489b380 in ProtocolV2::handle_read_frame_dispatch() () from target:/usr/lib64/ceph/libceph-common.so.2
#9  0x00007fef0489b575 in ProtocolV2::_handle_read_frame_epilogue_main() () from target:/usr/lib64/ceph/libceph-common.so.2
#10 0x00007fef0489b622 in ProtocolV2::_handle_read_frame_segment() () from target:/usr/lib64/ceph/libceph-common.so.2
#11 0x00007fef0489c781 in ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) () from target:/usr/lib64/ceph/libceph-common.so.2
#12 0x00007fef04884eec in ProtocolV2::run_continuation(Ct<ProtocolV2>&) () from target:/usr/lib64/ceph/libceph-common.so.2
#13 0x00007fef0484d3f9 in AsyncConnection::process() () from target:/usr/lib64/ceph/libceph-common.so.2
#14 0x00007fef048a7507 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from target:/usr/lib64/ceph/libceph-common.so.2
#15 0x00007fef048ada1c in std::_Function_handler<void (), NetworkStack::add_thread(unsigned int)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from target:/usr/lib64/ceph/libceph-common.so.2
#16 0x00007fef027c2ba3 in execute_native_thread_routine () from target:/lib64/libstdc++.so.6
#17 0x00007fef033871cf in start_thread () from target:/lib64/libpthread.so.0
#18 0x00007fef01ddadd3 in clone () from target:/lib64/libc.so.6

wants DaemonServer::lock; blocking reads on the AsyncConnection center? (No more messages read off the wire)

Question is: who has the GIL?

--- Additional comment from Greg Farnum on 2023-07-02 15:48:51 UTC ---

Ranjith: failed manager gets blocklisted, so it won’t be holding any locks or able to prevent progress by the others. :)

--- Additional comment from Patrick Donnelly on 2023-07-02 16:00:39 UTC ---

Thread 77 (Thread 0x7fedc0614700 (LWP 834795)):
#0  0x00007fef0338d44c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x00007fef027bc8f0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from target:/lib64/libstdc++.so.6
#2  0x00007feef1e9b5ab in Client::make_request(MetaRequest*, UserPerm const&, boost::intrusive_ptr<Inode>*, bool*, int, ceph::buffer::v15_2_0::list*) () from target:/lib64/libcephfs.so.2
#3  0x00007feef1ebbea6 in Client::_rmdir(Inode*, char const*, UserPerm const&) () from target:/lib64/libcephfs.so.2
#4  0x00007feef1ebc705 in Client::unlinkat(int, char const*, int, UserPerm const&) () from target:/lib64/libcephfs.so.2
#5  0x00007feef2274a87 in __pyx_pw_6cephfs_9LibCephFS_95rmdir () from target:/lib64/python3.6/site-packages/cephfs.cpython-36m-x86_64-linux-gnu.so

has the GIL!!

https://github.com/ceph/ceph/blob/c42efbf5874de8454e4c7cb3c22bd41bcc0e71f5/src/pybind/cephfs/cephfs.pyx#L1356

this probably explains soooo many performance problems with the mgr. It's often blocking WITH THE GIL waiting for rmdir response from the mds!!!
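To make the failure mode above concrete, here is a minimal Cython-style sketch of the binding pattern in question. It is simplified and hypothetical, not the verbatim cephfs.pyx source; only `ceph_rmdir()`, the opaque `ceph_mount_info` type, and the `LibCephFS` class name come from the real libcephfs binding, while the two method names are illustrative only.

```
# Hedged sketch, not the actual cephfs.pyx code.
cdef extern from "cephfs/libcephfs.h" nogil:
    cdef struct ceph_mount_info:
        pass
    int ceph_rmdir(ceph_mount_info *cmount, const char *path)

cdef class LibCephFS:
    cdef ceph_mount_info *cluster

    def rmdir_holding_gil(self, path):
        # If the binding calls into libcephfs like this, the GIL stays held
        # for the entire round trip to the MDS. Every other Python thread in
        # the mgr (all the module threads) is frozen for that long, and via
        # the lock chain above the messenger ends up blocked as well.
        cdef char *_path = path  # path is assumed to already be bytes
        return ceph_rmdir(self.cluster, _path)

    def rmdir_releasing_gil(self, path):
        # Dropping the GIL around the blocking C call lets the rest of the
        # mgr keep running while this thread waits for the MDS reply.
        cdef char *_path = path  # path is assumed to already be bytes
        cdef int ret
        with nogil:
            ret = ceph_rmdir(self.cluster, _path)
        return ret
```

Releasing the GIL around blocking extension calls, as in `rmdir_releasing_gil()`, is the standard CPython remedy for this pattern; the exact upstream fix is tracked in the ticket linked in the next comment and may differ in detail.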
--- Additional comment from Patrick Donnelly on 2023-07-02 16:02:19 UTC ---

(In reply to Patrick Donnelly from comment #18)
> Thread 77 (Thread 0x7fedc0614700 (LWP 834795)):
> #0  0x00007fef0338d44c in pthread_cond_wait@@GLIBC_2.3.2 () from
> target:/lib64/libpthread.so.0
> #1  0x00007fef027bc8f0 in
> std::condition_variable::wait(std::unique_lock<std::mutex>&) () from
> target:/lib64/libstdc++.so.6
> #2  0x00007feef1e9b5ab in Client::make_request(MetaRequest*, UserPerm
> const&, boost::intrusive_ptr<Inode>*, bool*, int,
> ceph::buffer::v15_2_0::list*) () from target:/lib64/libcephfs.so.2
> #3  0x00007feef1ebbea6 in Client::_rmdir(Inode*, char const*, UserPerm
> const&) () from target:/lib64/libcephfs.so.2
> #4  0x00007feef1ebc705 in Client::unlinkat(int, char const*, int, UserPerm
> const&) () from target:/lib64/libcephfs.so.2
> #5  0x00007feef2274a87 in __pyx_pw_6cephfs_9LibCephFS_95rmdir () from
> target:/lib64/python3.6/site-packages/cephfs.cpython-36m-x86_64-linux-gnu.so
>
> has the GIL!!
>
> https://github.com/ceph/ceph/blob/c42efbf5874de8454e4c7cb3c22bd41bcc0e71f5/
> src/pybind/cephfs/cephfs.pyx#L1356
>
> this probably explains soooo many performance problems with the mgr. It's
> often blocking WITH THE GIL waiting for rmdir response from the mds!!!

https://tracker.ceph.com/issues/61869

--- Additional comment from on 2023-07-02 16:34:23 UTC ---

Fresh logs from the hung mgr process are available in supportshell at: 0850-ceph-mgr.bmwcefesp000003.log

--- Additional comment from Brad Hubbard on 2023-07-02 23:13:08 UTC ---

Jumped the gun on POST, rolling back.

--- Additional comment from Kentaro Nakai on 2023-07-02 23:43:51 UTC ---

///
- Enabled the customer escalation flag to YES along with severity/priority set as urgent, with the escalation raised as ACE EN-63511 & Watch-list Entry 7727
///
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:4473