Description of problem:

The NFS server (nfs-ganesha) crashes and the daemon moves to the error state. We are using CephFS as the backend storage.

Steps followed:

We encountered this while running our automation script. It fails while running:

for n in {1..20}; do dd if=/dev/urandom of=/mnt/nfs_VA6JX/volumes/_nogroup/subvolume2/6c7638e9-bb94-49a2-a832-703807b264e1/file$(printf %03d $n) bs=500k count=1000; done

NFS logs:
Automation script logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-S0SH7I/cephfs_nfs_snapshot_clone_operations_0.log
NFS server logs: http://magna002.ceph.redhat.com/ceph-qe-logs/amar/BZ_NFS_logs.txt

On the NFS host, the ganesha container had just restarted (up 2 seconds), podman logs for it was empty, and a second podman ps shows it gone:

[root@ceph-amk-test-o56vqd-node6 edf01e48-21a9-11ee-becc-fa163e6d5609]# podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
41a4a13349a8  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.2 -f --set...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-2
8c8a2ae00d8d  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.8 -f --set...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-8
2e4f25e306fb  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.11 -f --se...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-11
ea356097720c  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.5 -f --set...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-5
e4bc52d0a37d  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n mds.cephfs.cep...  59 minutes ago  Up 59 minutes  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-mds-cephfs-ceph-amk-test-o56vqd-node6-ibqcai
272e4942bc8d  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -F -L STDERR -N N...  2 seconds ago  Up 2 seconds  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-nfs-cephfs-nfs-0-0-ceph-amk-test-o56vqd-node6-uppiec
[root@ceph-amk-test-o56vqd-node6 edf01e48-21a9-11ee-becc-fa163e6d5609]# podman logs 272e4942bc8d
[root@ceph-amk-test-o56vqd-node6 edf01e48-21a9-11ee-becc-fa163e6d5609]# podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
41a4a13349a8  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.2 -f --set...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-2
8c8a2ae00d8d  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.8 -f --set...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-8
2e4f25e306fb  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.11 -f --se...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-11
ea356097720c  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n osd.5 -f --set...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-osd-5
e4bc52d0a37d  registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:cf8710ef94bf3dcb65b998f90ce0c0ecf80bee8541fe6034f95d252500046cfd  -n mds.cephfs.cep...  About an hour ago  Up About an hour  ceph-edf01e48-21a9-11ee-becc-fa163e6d5609-mds-cephfs-ceph-amk-test-o56vqd-node6-ibqcai

From the admin node, ceph orch ps was run twice: the first run still shows the NFS daemon running (Ganesha 5.1); the second shows it in the error state:

[root@ceph-amk-test-o56vqd-node9 ~]# ceph orch ps
NAME  HOST  PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID  CONTAINER ID
mds.cephfs.ceph-amk-test-o56vqd-node3.bjjbwo  ceph-amk-test-o56vqd-node3  running (63m)  3m ago  18h  17.4M  -  17.2.6-96.el9cp  85cb9476225e  870a32dca594
mds.cephfs.ceph-amk-test-o56vqd-node4.xcfydx  ceph-amk-test-o56vqd-node4  running (63m)  3m ago  18h  65.8M  -  17.2.6-96.el9cp  85cb9476225e  618abd182541
mds.cephfs.ceph-amk-test-o56vqd-node5.asejol  ceph-amk-test-o56vqd-node5  running (63m)  3m ago  18h  89.8M  -  17.2.6-96.el9cp  85cb9476225e  bc88ffc160c2
mds.cephfs.ceph-amk-test-o56vqd-node6.ibqcai  ceph-amk-test-o56vqd-node6  running (63m)  9m ago  18h  69.7M  -  17.2.6-96.el9cp  85cb9476225e  e4bc52d0a37d
mds.cephfs.ceph-amk-test-o56vqd-node7.mumwju  ceph-amk-test-o56vqd-node7  running (63m)  6m ago  18h  70.3M  -  17.2.6-96.el9cp  85cb9476225e  b6251c0a6620
mgr.ceph-amk-test-o56vqd-node1-installer.hcqtsu  ceph-amk-test-o56vqd-node1-installer  *:9283  running (18h)  6m ago  18h  408M  -  17.2.6-96.el9cp  85cb9476225e  c9e2cb37570e
mgr.ceph-amk-test-o56vqd-node2.bldjxz  ceph-amk-test-o56vqd-node2  *:8443  running (18h)  6m ago  18h  500M  -  17.2.6-96.el9cp  85cb9476225e  41358a1086f1
mon.ceph-amk-test-o56vqd-node1-installer  ceph-amk-test-o56vqd-node1-installer  running (63m)  6m ago  18h  109M  2048M  17.2.6-96.el9cp  85cb9476225e  f458e2e97bed
mon.ceph-amk-test-o56vqd-node2  ceph-amk-test-o56vqd-node2  running (63m)  6m ago  18h  100M  2048M  17.2.6-96.el9cp  85cb9476225e  c2c25a66ac31
mon.ceph-amk-test-o56vqd-node3  ceph-amk-test-o56vqd-node3  running (63m)  3m ago  18h  104M  2048M  17.2.6-96.el9cp  85cb9476225e  8580c3977a8e
nfs.cephfs-nfs.0.0.ceph-amk-test-o56vqd-node6.uppiec  ceph-amk-test-o56vqd-node6  *:2049  running (9m)  9m ago  72m  15.0M  -  5.1  85cb9476225e  53c4ca2ce000
osd.0  ceph-amk-test-o56vqd-node5  running (70m)  3m ago  18h  492M  4096M  17.2.6-96.el9cp  85cb9476225e  875d3b758333
osd.1  ceph-amk-test-o56vqd-node4  running (72m)  3m ago  18h  470M  4096M  17.2.6-96.el9cp  85cb9476225e  86984e2016f1
osd.2  ceph-amk-test-o56vqd-node6  running (67m)  9m ago  18h  446M  4096M  17.2.6-96.el9cp  85cb9476225e  41a4a13349a8
osd.3  ceph-amk-test-o56vqd-node5  running (69m)  3m ago  18h  459M  4096M  17.2.6-96.el9cp  85cb9476225e  f330e2de7e64
osd.4  ceph-amk-test-o56vqd-node4  running (71m)  3m ago  18h  524M  4096M  17.2.6-96.el9cp  85cb9476225e  946f3fdee22e
osd.5  ceph-amk-test-o56vqd-node6  running (67m)  9m ago  18h  516M  4096M  17.2.6-96.el9cp  85cb9476225e  ea356097720c
osd.6  ceph-amk-test-o56vqd-node5  running (69m)  3m ago  18h  499M  4096M  17.2.6-96.el9cp  85cb9476225e  e98cc06ac9c3
osd.7  ceph-amk-test-o56vqd-node4  running (71m)  3m ago  18h  480M  4096M  17.2.6-96.el9cp  85cb9476225e  01031abcea23
osd.8  ceph-amk-test-o56vqd-node6  running (67m)  9m ago  18h  421M  4096M  17.2.6-96.el9cp  85cb9476225e  8c8a2ae00d8d
osd.9  ceph-amk-test-o56vqd-node5  running (69m)  3m ago  18h  415M  4096M  17.2.6-96.el9cp  85cb9476225e  eb569f668458
osd.10  ceph-amk-test-o56vqd-node4  running (72m)  3m ago  18h  446M  4096M  17.2.6-96.el9cp  85cb9476225e  19513209737c
osd.11  ceph-amk-test-o56vqd-node6  running (67m)  9m ago  18h  385M  4096M  17.2.6-96.el9cp  85cb9476225e  2e4f25e306fb
[root@ceph-amk-test-o56vqd-node9 ~]# ceph orch ps
NAME  HOST  PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID  CONTAINER ID
mds.cephfs.ceph-amk-test-o56vqd-node3.bjjbwo  ceph-amk-test-o56vqd-node3  running (65m)  4m ago  18h  17.4M  -  17.2.6-96.el9cp  85cb9476225e  870a32dca594
mds.cephfs.ceph-amk-test-o56vqd-node4.xcfydx  ceph-amk-test-o56vqd-node4  running (64m)  4m ago  18h  65.8M  -  17.2.6-96.el9cp  85cb9476225e  618abd182541
mds.cephfs.ceph-amk-test-o56vqd-node5.asejol  ceph-amk-test-o56vqd-node5  running (64m)  4m ago  18h  89.8M  -  17.2.6-96.el9cp  85cb9476225e  bc88ffc160c2
mds.cephfs.ceph-amk-test-o56vqd-node6.ibqcai  ceph-amk-test-o56vqd-node6  running (64m)  15s ago  18h  75.3M  -  17.2.6-96.el9cp  85cb9476225e  e4bc52d0a37d
mds.cephfs.ceph-amk-test-o56vqd-node7.mumwju  ceph-amk-test-o56vqd-node7  running (64m)  7m ago  18h  70.3M  -  17.2.6-96.el9cp  85cb9476225e  b6251c0a6620
mgr.ceph-amk-test-o56vqd-node1-installer.hcqtsu  ceph-amk-test-o56vqd-node1-installer  *:9283  running (18h)  7m ago  18h  408M  -  17.2.6-96.el9cp  85cb9476225e  c9e2cb37570e
mgr.ceph-amk-test-o56vqd-node2.bldjxz  ceph-amk-test-o56vqd-node2  *:8443  running (18h)  7m ago  18h  500M  -  17.2.6-96.el9cp  85cb9476225e  41358a1086f1
mon.ceph-amk-test-o56vqd-node1-installer  ceph-amk-test-o56vqd-node1-installer  running (65m)  7m ago  18h  109M  2048M  17.2.6-96.el9cp  85cb9476225e  f458e2e97bed
mon.ceph-amk-test-o56vqd-node2  ceph-amk-test-o56vqd-node2  running (65m)  7m ago  18h  100M  2048M  17.2.6-96.el9cp  85cb9476225e  c2c25a66ac31
mon.ceph-amk-test-o56vqd-node3  ceph-amk-test-o56vqd-node3  running (65m)  4m ago  18h  104M  2048M  17.2.6-96.el9cp  85cb9476225e  8580c3977a8e
nfs.cephfs-nfs.0.0.ceph-amk-test-o56vqd-node6.uppiec  ceph-amk-test-o56vqd-node6  *:2049  error  15s ago  74m  -  -  <unknown>  <unknown>  <unknown>
osd.0  ceph-amk-test-o56vqd-node5  running (71m)  4m ago  18h  492M  4096M  17.2.6-96.el9cp  85cb9476225e  875d3b758333
osd.1  ceph-amk-test-o56vqd-node4  running (73m)  4m ago  18h  470M  4096M  17.2.6-96.el9cp  85cb9476225e  86984e2016f1
osd.2  ceph-amk-test-o56vqd-node6  running (69m)  15s ago  18h  486M  4096M  17.2.6-96.el9cp  85cb9476225e  41a4a13349a8
osd.3  ceph-amk-test-o56vqd-node5  running (70m)  4m ago  18h  459M  4096M  17.2.6-96.el9cp  85cb9476225e  f330e2de7e64
osd.4  ceph-amk-test-o56vqd-node4  running (72m)  4m ago  18h  524M  4096M  17.2.6-96.el9cp  85cb9476225e  946f3fdee22e
osd.5  ceph-amk-test-o56vqd-node6  running (68m)  15s ago  18h  540M  4096M  17.2.6-96.el9cp  85cb9476225e  ea356097720c
osd.6  ceph-amk-test-o56vqd-node5  running (70m)  4m ago  18h  499M  4096M  17.2.6-96.el9cp  85cb9476225e  e98cc06ac9c3
osd.7  ceph-amk-test-o56vqd-node4  running (73m)  4m ago  18h  480M  4096M  17.2.6-96.el9cp  85cb9476225e  01031abcea23
osd.8  ceph-amk-test-o56vqd-node6  running (68m)  15s ago  18h  436M  4096M  17.2.6-96.el9cp  85cb9476225e  8c8a2ae00d8d
osd.9  ceph-amk-test-o56vqd-node5  running (71m)  4m ago  18h  415M  4096M  17.2.6-96.el9cp  85cb9476225e  eb569f668458
osd.10  ceph-amk-test-o56vqd-node4  running (73m)  4m ago  18h  446M  4096M  17.2.6-96.el9cp  85cb9476225e  19513209737c
osd.11  ceph-amk-test-o56vqd-node6  running (68m)  15s ago  18h  406M  4096M  17.2.6-96.el9cp  85cb9476225e  2e4f25e306fb
[root@ceph-amk-test-o56vqd-node9 ~]#

Version-Release number of selected component (if applicable):

How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
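For anyone reproducing this outside the automation framework, the following is a minimal sketch of the failing step followed by a daemon health check. The mount path and subvolume UUID are specific to this run; adjust them for your cluster.

# Run on the NFS client: write twenty ~500 MB files over the mount.
# This is the loop from the automation script that triggered the crash.
for n in {1..20}; do
    dd if=/dev/urandom \
       of=/mnt/nfs_VA6JX/volumes/_nogroup/subvolume2/6c7638e9-bb94-49a2-a832-703807b264e1/file$(printf %03d $n) \
       bs=500k count=1000
done

# Run on a cluster admin node once dd starts failing: check whether the
# ganesha daemon is still up. In this run it moved from "running" to "error".
ceph orch ps --daemon-type nfs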
Missed the window for 6.1 z1. Retargeting to 6.1 z2.
What version of Ganesha is running? Do you have a stack backtrace from the crash?
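In case it helps with the backtrace request: assuming systemd-coredump is capturing cores on the host running the NFS container, something along these lines should recover a stack trace (the process name to match is an assumption; inside the cephadm container the daemon binary is ganesha.nfsd).

# List any cores captured for the ganesha process
coredumpctl list ganesha.nfsd

# Open the most recent core in gdb, then dump backtraces for all threads
coredumpctl gdb ganesha.nfsd
(gdb) thread apply all bt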
I'm going to assume it's the known crash fixed in V5.2. We should have the latest version (V5.4) available soon.
@amk - Can you retest on the latest version and update the BZ? I see a new build is available with nfs-ganesha v5.5 (https://bugzilla.redhat.com/show_bug.cgi?id=2232674#c4).
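For reference, the running Ganesha version shows up in the VERSION column of ceph orch ps (5.1 in the failing run above), and should also be printable from inside the container. The container ID below is a placeholder, not from this run; substitute the one reported by podman ps on the NFS host.

# Show just the NFS daemon, including its VERSION column
ceph orch ps --daemon-type nfs

# Or ask the binary directly inside the running container
podman exec <nfs-container-id> ganesha.nfsd -v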
Hi Hemanth,

On the latest build we are no longer hitting the issue.
NFS-Ganesha version: 5.5
Logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-OOSH4J/

[root@ceph-nfs-reg-oosh4j-node8 ~]# ceph orch ps
NAME  HOST  PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID  CONTAINER ID
mds.cephfs.ceph-nfs-reg-oosh4j-node3.yswqxh  ceph-nfs-reg-oosh4j-node3  running (21m)  -  21m  17.9M  -  18.2.0-17.el9cp  468120a40641  a059459d1478
mds.cephfs.ceph-nfs-reg-oosh4j-node4.fccokk  ceph-nfs-reg-oosh4j-node4  running (21m)  -  21m  64.4M  -  18.2.0-17.el9cp  468120a40641  e7ac0645cd36
mds.cephfs.ceph-nfs-reg-oosh4j-node5.rkuxuc  ceph-nfs-reg-oosh4j-node5  running (21m)  -  21m  18.3M  -  18.2.0-17.el9cp  468120a40641  5da8bfb9a4bd
mds.cephfs.ceph-nfs-reg-oosh4j-node6.lopvxu  ceph-nfs-reg-oosh4j-node6  running (21m)  -  21m  16.5M  -  18.2.0-17.el9cp  468120a40641  ac15caa841dd
mds.cephfs.ceph-nfs-reg-oosh4j-node7.cxfpyi  ceph-nfs-reg-oosh4j-node7  running (21m)  -  21m  61.8M  -  18.2.0-17.el9cp  468120a40641  eae04546b0ef
mgr.ceph-nfs-reg-oosh4j-node1-installer.qsermm  ceph-nfs-reg-oosh4j-node1-installer  *:9283,8765  running (29m)  -  29m  942M  -  18.2.0-17.el9cp  468120a40641  4fa9fe22c3bd
mgr.ceph-nfs-reg-oosh4j-node2.pzxfsr  ceph-nfs-reg-oosh4j-node2  *:8443,8765  running (27m)  -  27m  433M  -  18.2.0-17.el9cp  468120a40641  cda90e9dfd6d
mon.ceph-nfs-reg-oosh4j-node1-installer  ceph-nfs-reg-oosh4j-node1-installer  running (29m)  -  29m  52.3M  2048M  18.2.0-17.el9cp  468120a40641  04a9c3f696ff
mon.ceph-nfs-reg-oosh4j-node2  ceph-nfs-reg-oosh4j-node2  running (25m)  -  25m  47.9M  2048M  18.2.0-17.el9cp  468120a40641  86f5940d2962
mon.ceph-nfs-reg-oosh4j-node3  ceph-nfs-reg-oosh4j-node3  running (25m)  -  25m  46.7M  2048M  18.2.0-17.el9cp  468120a40641  b5463561eb58
nfs.cephfs-nfs.0.0.ceph-nfs-reg-oosh4j-node6.hfuigq  ceph-nfs-reg-oosh4j-node6  *:2049  running (13m)  -  13m  845M  -  5.5  468120a40641  3956df735221
osd.0  ceph-nfs-reg-oosh4j-node4  running (21m)  -  21m  191M  4096M  18.2.0-17.el9cp  468120a40641  0af25e2cc84e
osd.1  ceph-nfs-reg-oosh4j-node6  running (21m)  -  21m  195M  4096M  18.2.0-17.el9cp  468120a40641  c42b26410fa4
osd.2  ceph-nfs-reg-oosh4j-node5  running (21m)  -  21m  249M  4096M  18.2.0-17.el9cp  468120a40641  bcc99a3137de
osd.3  ceph-nfs-reg-oosh4j-node4  running (21m)  -  21m  274M  4096M  18.2.0-17.el9cp  468120a40641  7b2bf51933ff
osd.4  ceph-nfs-reg-oosh4j-node6  running (21m)  -  21m  234M  4096M  18.2.0-17.el9cp  468120a40641  cb05d9df3d7d
osd.5  ceph-nfs-reg-oosh4j-node5  running (21m)  -  21m  255M  4096M  18.2.0-17.el9cp  468120a40641  fd04084d8b7d
osd.6  ceph-nfs-reg-oosh4j-node6  running (21m)  -  21m  220M  4096M  18.2.0-17.el9cp  468120a40641  1c3d0be85a4c
osd.7  ceph-nfs-reg-oosh4j-node4  running (21m)  -  21m  290M  4096M  18.2.0-17.el9cp  468120a40641  762f16cb0347
osd.8  ceph-nfs-reg-oosh4j-node5  running (21m)  -  21m  217M  4096M  18.2.0-17.el9cp  468120a40641  30008d3daa8e
osd.9  ceph-nfs-reg-oosh4j-node6  running (21m)  -  21m  174M  4096M  18.2.0-17.el9cp  468120a40641  1d287b0b11ec
osd.10  ceph-nfs-reg-oosh4j-node4  running (21m)  -  21m  198M  4096M  18.2.0-17.el9cp  468120a40641  ecac226d6a48
osd.11  ceph-nfs-reg-oosh4j-node5  running (21m)  -  21m  236M  4096M  18.2.0-17.el9cp  468120a40641  3217c23755fd

Regards,
Amarnath
Hi Amarnath, can you mark the BZ as verified based on your last test result?
Since this bug was introduced by the async/non-blocking work, I don't think it requires doc text. Please advise on how to proceed.
I think that takes care of the needinfo.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780