Description of problem (please be as detailed as possible and provide log snippets):

CephNFS is not reaching the Ready state on IBM Power, and the NFS Ganesha server is also not Running.

Version of all relevant components (if applicable):
OCP version: 4.11.0-0.nightly-ppc64le-2022-05-23-232055
ODF version: 4.11.0-80
Ceph version: 16.2.8-5.el8cp (0974c9ff5a69f17f3843a7c1a568daa2b4559e2d) pacific (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy ODF 4.11 on OCP 4.11 on IBM Power.
2. Patch the StorageCluster to enable the NFS feature, using the following command:
   oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": true}}}' --type merge
3. Wait for the `rook-ceph-nfs-ocs-storagecluster-cephnfs-a-*` pod to reach the Running state.

Actual results:
The `rook-ceph-nfs-ocs-storagecluster-cephnfs-a-*` pod is in CrashLoopBackOff state and the CephNFS status shows Failed.

Expected results:
The `rook-ceph-nfs-ocs-storagecluster-cephnfs-a-*` pod should be in Running state and the ocs-storagecluster-ceph-nfs StorageClass should be created.

Additional info:
[root@rdr-aar411-sao01-bastion-0 ~]# oc get pods |grep nfs
rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f   1/2   CrashLoopBackOff   30 (3m15s ago)   132m

Described pod:

[root@rdr-aar411-sao01-bastion-0 ~]# oc describe pod rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f
Name:                 rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f
Namespace:            openshift-storage
Priority:             1000000000
Priority Class Name:  openshift-user-critical
Node:                 sao01-worker-0.rdr-aar411.ibm.com/192.168.0.157
Start Time:           Tue, 31 May 2022 05:07:39 -0400
Labels:               app=rook-ceph-nfs
                      app.kubernetes.io/component=cephnfses.ceph.rook.io
                      app.kubernetes.io/created-by=rook-ceph-operator
                      app.kubernetes.io/instance=ocs-storagecluster-cephnfs-a
                      app.kubernetes.io/managed-by=rook-ceph-operator
                      app.kubernetes.io/name=ceph-nfs
                      app.kubernetes.io/part-of=ocs-storagecluster-cephnfs
                      ceph_daemon_id=ocs-storagecluster-cephnfs-a
                      ceph_daemon_type=nfs
                      ceph_nfs=ocs-storagecluster-cephnfs
                      instance=a
                      nfs=ocs-storagecluster-cephnfs-a
                      pod-template-hash=d97c557c8
                      rook.io/operator-namespace=openshift-storage
                      rook_cluster=openshift-storage
Annotations:          config-hash: 5e0395e0b68b7bbbf8c216b865198a03
                      k8s.v1.cni.cncf.io/network-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.129.2.54" ], "default": true, "dns": {} }]
                      k8s.v1.cni.cncf.io/networks-status: [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.129.2.54" ], "default": true, "dns": {} }]
                      openshift.io/scc: rook-ceph
Status:               Running
IP:                   10.129.2.54
IPs:
  IP:           10.129.2.54
Controlled By:  ReplicaSet/rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8
Init Containers:
  generate-minimal-ceph-conf:
    Container ID:  cri-o://85040c6564def25f5d93407572e3a99b1141e330ce90130dfbe50d01fe625be6
    Image:         quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91
    Image ID:      quay.io/rhceph-dev/rhceph@sha256:917ac58f7f5dd3c78c30ff19cab7dbc2d31545a2e7f6f109f2c1bbb6d00a4dd6
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      set -xEeuo pipefail
      cat << EOF > /etc/ceph/ceph.conf
      [global]
      mon_host = $(ROOK_CEPH_MON_HOST)
      [client.nfs-ganesha.ocs-storagecluster-cephnfs.a]
      keyring = /etc/ceph/keyring-store/keyring
      EOF
      chmod 444 /etc/ceph/ceph.conf
      cat /etc/ceph/ceph.conf
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 31 May 2022 05:07:41 -0400
      Finished:     Tue, 31 May 2022 05:07:41 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     3
      memory:  8Gi
    Requests:
      cpu:     3
      memory:  8Gi
    Environment:
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>  Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
    Mounts:
      /etc/ceph from etc-ceph (rw)
      /etc/ceph/keyring-store/ from rook-ceph-nfs-ocs-storagecluster-cephnfs-a-keyring (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6qfwc (ro)
Containers:
  nfs-ganesha:
    Container ID:  cri-o://2562e74073374b4774f2c3d82506549e34b03829a9e698ff575d67c9fcb560dd
    Image:         quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91
    Image ID:      quay.io/rhceph-dev/rhceph@sha256:917ac58f7f5dd3c78c30ff19cab7dbc2d31545a2e7f6f109f2c1bbb6d00a4dd6
    Port:          <none>
    Host Port:     <none>
    Command:
      ganesha.nfsd
    Args:
      -F
      -L
      STDERR
      -p
      /var/run/ganesha/ganesha.pid
      -N
      NIV_INFO
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 31 May 2022 07:16:36 -0400
      Finished:     Tue, 31 May 2022 07:16:37 -0400
    Ready:          False
    Restart Count:  30
    Limits:
      cpu:     3
      memory:  8Gi
    Requests:
      cpu:     3
      memory:  8Gi
    Environment:
      CONTAINER_IMAGE:                quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91
      POD_NAME:                       rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f (v1:metadata.name)
      POD_NAMESPACE:                  openshift-storage (v1:metadata.namespace)
      NODE_NAME:                      (v1:spec.nodeName)
      POD_MEMORY_LIMIT:               8589934592 (limits.memory)
      POD_MEMORY_REQUEST:             8589934592 (requests.memory)
      POD_CPU_LIMIT:                  3 (limits.cpu)
      POD_CPU_REQUEST:                3 (requests.cpu)
      ROOK_CEPH_MON_HOST:             <set to the key 'mon_host' in secret 'rook-ceph-config'>  Optional: false
      ROOK_CEPH_MON_INITIAL_MEMBERS:  <set to the key 'mon_initial_members' in secret 'rook-ceph-config'>  Optional: false
    Mounts:
      /etc/ceph from etc-ceph (rw)
      /etc/ceph/keyring-store/ from rook-ceph-nfs-ocs-storagecluster-cephnfs-a-keyring (ro)
      /etc/ganesha from ganesha-config (rw)
      /run/dbus from run-dbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6qfwc (ro)
  dbus-daemon:
    Container ID:  cri-o://324a2881fcbda852ef7960ec05f4b5eb49e471d52c76529d1d090c7356a7ed4f
    Image:         quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91
    Image ID:      quay.io/rhceph-dev/rhceph@sha256:917ac58f7f5dd3c78c30ff19cab7dbc2d31545a2e7f6f109f2c1bbb6d00a4dd6
    Port:          <none>
    Host Port:     <none>
    Command:
      dbus-daemon
    Args:
      --nofork
      --system
      --nopidfile
    State:          Running
      Started:      Tue, 31 May 2022 05:07:43 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     3
      memory:  8Gi
    Requests:
      cpu:     3
      memory:  8Gi
    Environment:
      CONTAINER_IMAGE:     quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91
      POD_NAME:            rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f (v1:metadata.name)
      POD_NAMESPACE:       openshift-storage (v1:metadata.namespace)
      NODE_NAME:           (v1:spec.nodeName)
      POD_MEMORY_LIMIT:    8589934592 (limits.memory)
      POD_MEMORY_REQUEST:  8589934592 (requests.memory)
      POD_CPU_LIMIT:       3 (limits.cpu)
      POD_CPU_REQUEST:     3 (requests.cpu)
    Mounts:
      /run/dbus from run-dbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6qfwc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  etc-ceph:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  rook-ceph-nfs-ocs-storagecluster-cephnfs-a-keyring:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rook-ceph-nfs-ocs-storagecluster-cephnfs-a-keyring
    Optional:    false
  ganesha-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-ceph-nfs-ocs-storagecluster-cephnfs-a
    Optional:  false
  run-dbus:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-6qfwc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Guaranteed
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 5s
                             node.ocs.openshift.io/storage=true:NoSchedule
Events:
  Type     Reason          Age                     From               Message
  ----     ------          ----                    ----               -------
  Normal   Scheduled       132m                    default-scheduler  Successfully assigned openshift-storage/rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f to sao01-worker-0.rdr-aar411.ibm.com by sao01-master-1.rdr-aar411.ibm.com
  Normal   AddedInterface  132m                    multus             Add eth0 [10.129.2.54/23] from openshift-sdn
  Normal   Pulled          132m                    kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91" already present on machine
  Normal   Created         132m                    kubelet            Created container generate-minimal-ceph-conf
  Normal   Started         132m                    kubelet            Started container generate-minimal-ceph-conf
  Normal   Pulled          132m                    kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91" already present on machine
  Normal   Created         132m                    kubelet            Created container dbus-daemon
  Normal   Started         132m                    kubelet            Started container dbus-daemon
  Normal   Pulled          132m (x4 over 132m)     kubelet            Container image "quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91" already present on machine
  Normal   Created         132m (x4 over 132m)     kubelet            Created container nfs-ganesha
  Normal   Started         132m (x4 over 132m)     kubelet            Started container nfs-ganesha
  Warning  BackOff         2m49s (x602 over 132m)  kubelet            Back-off restarting failed container

Logs of the pod:

[root@rdr-aar411-sao01-bastion-0 ~]# oc logs pod/rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f -c nfs-ganesha
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] main :MAIN :EVENT :nfs-ganesha Starting: Ganesha Version 3.5
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] init_fds_limit :INODE LRU :INFO :Setting the system-imposed limit on FDs to 1048576.
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] init_server_pkgs :NFS STARTUP :INFO :State lock layer successfully initialized
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] init_server_pkgs :NFS STARTUP :INFO :IP/name cache successfully initialized
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] nfs4_recovery_init :CLIENT ID :INFO :Recovery Backend Init for rados_cluster
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] rados_cluster_init :CLIENT ID :EVENT :Cluster membership check failed: -2
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] main :NFS STARTUP :CRIT :Recovery backend initialization failed!
31/05/2022 09:13:36 : epoch 6295dc40 : rook-ceph-nfs-ocs-storagecluster-cephnfs-a-d97c557c8-sdw4f : nfs-ganesha-1[main] main :NFS STARTUP :FATAL :Fatal errors. Server exiting...
Described CephNFS:

[root@rdr-aar411-sao01-bastion-0 ~]# oc describe cephnfs ocs-storagecluster-cephnfs
Name:         ocs-storagecluster-cephnfs
Namespace:    openshift-storage
Labels:       <none>
Annotations:  <none>
API Version:  ceph.rook.io/v1
Kind:         CephNFS
Metadata:
  Creation Timestamp:  2022-05-31T09:07:30Z
  Finalizers:
    cephnfs.ceph.rook.io
  Generation:  1
  Managed Fields:
    API Version:  ceph.rook.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"d96624a6-f3d8-41ca-bb02-7301d78abb03"}:
      f:spec:
        .:
        f:rados:
        f:server:
          .:
          f:active:
          f:placement:
            .:
            f:nodeAffinity:
              .:
              f:requiredDuringSchedulingIgnoredDuringExecution:
                .:
                f:nodeSelectorTerms:
            f:podAntiAffinity:
              .:
              f:requiredDuringSchedulingIgnoredDuringExecution:
            f:tolerations:
          f:priorityClassName:
          f:resources:
            .:
            f:limits:
              .:
              f:cpu:
              f:memory:
            f:requests:
              .:
              f:cpu:
              f:memory:
    Manager:      ocs-operator
    Operation:    Update
    Time:         2022-05-31T09:07:30Z
    API Version:  ceph.rook.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"cephnfs.ceph.rook.io":
    Manager:      rook
    Operation:    Update
    Time:         2022-05-31T09:07:30Z
    API Version:  ceph.rook.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:phase:
    Manager:      rook
    Operation:    Update
    Subresource:  status
    Time:         2022-05-31T09:07:39Z
  Owner References:
    API Version:           ocs.openshift.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  StorageCluster
    Name:                  ocs-storagecluster
    UID:                   d96624a6-f3d8-41ca-bb02-7301d78abb03
  Resource Version:        12370216
  UID:                     1d21b144-ab15-4d67-81ee-fddb08fb7ace
Spec:
  Rados:
  Server:
    Active:  1
    Placement:
      Node Affinity:
        Required During Scheduling Ignored During Execution:
          Node Selector Terms:
            Match Expressions:
              Key:       cluster.ocs.openshift.io/openshift-storage
              Operator:  Exists
      Pod Anti Affinity:
        Required During Scheduling Ignored During Execution:
          Label Selector:
            Match Expressions:
              Key:       app
              Operator:  In
              Values:
                rook-ceph-nfs
          Topology Key:  kubernetes.io/hostname
      Tolerations:
        Effect:    NoSchedule
        Key:       node.ocs.openshift.io/storage
        Operator:  Equal
        Value:     true
    Priority Class Name:  openshift-user-critical
    Resources:
      Limits:
        Cpu:     3
        Memory:  8Gi
      Requests:
        Cpu:     3
        Memory:  8Gi
Status:
  Phase:  Failed
Events:
  Type     Reason           Age                  From                      Message
  ----     ------           ----                 ----                      -------
  Warning  ReconcileFailed  4s (x14 over 2m29s)  rook-ceph-nfs-controller  failed to reconcile CephNFS "openshift-storage/ocs-storagecluster-cephnfs". failed to create ceph nfs deployments: failed to update ceph nfs "ocs-storagecluster-cephnfs": failed to add server "a" to database: failed to add "a" to grace db: exit status 1

Rook-ceph-operator logs:

2022-05-31 11:29:24.670365 I | ceph-spec: parsing mon endpoints: a=172.30.245.151:6789,b=172.30.25.76:6789,c=172.30.64.206:6789
2022-05-31 11:29:24.670439 I | ceph-spec: detecting the ceph image version for image quay.io/rhceph-dev/rhceph@sha256:abc274cd8cbaaf4abc213c1b7949a2063b63d501329e0f692d93b2e546ae8b91...
2022-05-31 11:29:29.380569 I | ceph-spec: detected ceph image version: "16.2.8-5 pacific"
2022-05-31 11:29:30.402032 I | ceph-nfs-controller: configuring pool ".nfs" for nfs
2022-05-31 11:29:32.993018 I | ceph-nfs-controller: set pool ".nfs" for the application nfs
2022-05-31 11:29:33.005783 I | ceph-nfs-controller: updating ceph nfs "ocs-storagecluster-cephnfs"
2022-05-31 11:29:33.094323 I | cephclient: getting or creating ceph auth key "client.nfs-ganesha.ocs-storagecluster-cephnfs.a"
2022-05-31 11:29:34.036599 I | ceph-nfs-controller: ceph nfs deployment "rook-ceph-nfs-ocs-storagecluster-cephnfs-a" already exists. updating if needed
2022-05-31 11:29:34.049749 I | op-k8sutil: deployment "rook-ceph-nfs-ocs-storagecluster-cephnfs-a" did not change, nothing to update
2022-05-31 11:29:34.067508 I | ceph-nfs-controller: ceph nfs service already created
2022-05-31 11:29:34.067523 I | ceph-nfs-controller: adding ganesha "a" to grace db
2022-05-31 11:29:34.092006 E | ceph-nfs-controller: failed to reconcile CephNFS "openshift-storage/ocs-storagecluster-cephnfs". failed to create ceph nfs deployments: failed to update ceph nfs "ocs-storagecluster-cephnfs": failed to add server "a" to database: failed to add "a" to grace db: exit status 1
The `ganesha-rados-grace` `add` command is failing; the relevant Rook code is here: https://github.com/rook/rook/blob/f7930455c4af01fdb4962a06af6e4ee49178d6e5/pkg/operator/ceph/nfs/nfs.go#L172-L179
Attaching must-gather logs: https://drive.google.com/file/d/1khBOLMFEcsXRYp3LfuyIgY6hzTCrZpLA/view?usp=sharing
I got SSH access to the cluster and enabled debug mode for the rook operator. I see some additional information about the command Rook is executing that is failing to add the NFS server to the grace database (copied at the bottom). Ceph is responding as though the command has invalid usage.

The command is not erroring due to an unknown flag. Otherwise, the error message would include something like "ganesha-rados-grace: unrecognized option '--poool'" (from a deliberate attempt to cause the error). This means that the flags themselves are not bad.

When I execute the command manually from the operator pod, the return code is 1. The return code is the same if I remove both the --pool and --ns flags. The error code doesn't seem to be useful for identifying where in the ganesha-rados-grace utility the error is occurring. Both "ganesha-rados-grace --pool '.nfs' --ns ocs-storagecluster-cephnfs dump" and "ganesha-rados-grace --pool '.nfs' --ns ocs-storagecluster-cephnfs dump ocs-storagecluster-cephnfs.a" result in the same error. This suggests to me that it is not merely the "add" subcommand that is buggy. Adding the --cephconf option to any of the commands I have tried doesn't change the behavior.

Based on my debugging so far, I am inclined to think this is a bug in the ganesha-rados-grace utility. This is possibly a build issue limited to the ppc architecture, or it could affect any non-x86 architecture generically. I don't know enough about the ganesha-rados-grace utility or how it's built to provide better feedback.

-----

2022-06-06 17:14:38.207620 I | ceph-nfs-controller: adding ganesha "a" to grace db
2022-06-06 17:14:38.207641 D | exec: Running command: ganesha-rados-grace --pool .nfs --ns ocs-storagecluster-cephnfs add ocs-storagecluster-cephnfs.a
2022-06-06 17:14:38.230578 D | exec: Usage:
2022-06-06 17:14:38.230611 D | exec: ganesha-rados-grace [ --userid ceph_user ] [ --cephconf /path/to/ceph.conf ] [ --ns namespace ] [ --oid obj_id ] [ --pool pool_id ] dump|add|start|join|lift|remove|enforce|noenforce|member [ nodeid ... ]
2022-06-06 17:14:38.240908 D | ceph-nfs-controller: nfs "openshift-storage/ocs-storagecluster-cephnfs" status updated to "Failed"
2022-06-06 17:14:38.240958 E | ceph-nfs-controller: failed to reconcile CephNFS "openshift-storage/ocs-storagecluster-cephnfs". failed to create ceph nfs deployments: failed to update ceph nfs "ocs-storagecluster-cephnfs": failed to add server "a" to database: failed to add "a" to grace db: exit status 1
It seems we need https://github.com/nfs-ganesha/nfs-ganesha/commit/3db6bc0cb75fa85ffcebeda1276d195915b84579 for this to work on ppc64le as well.
Created attachment 1889712 [details]
minimal reproducer for "char c" vs "int c"

This is a small C program based on tools/ganesha-rados-grace.c in the NFS-Ganesha sources. On ppc64le, getopt_long() returns c=255 instead of c=-1:

[root@ibm-p9b-25 ~]# gdb --args ganesha-rados-grace -p .nfs -n ocs-storagecluster-cephnfs add ocs-storagecluster-cephnfs.a
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-18.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ganesha-rados-grace...Reading symbols from /usr/lib/debug/usr/bin/ganesha-rados-grace-3.5-1.el8cp.ppc64le.debug...done.
done.
(gdb) b usage
Breakpoint 1 at 0x1718: file /usr/include/bits/stdio2.h, line 100.
(gdb) r
Starting program: /usr/bin/ganesha-rados-grace -p .nfs -n ocs-storagecluster-cephnfs add ocs-storagecluster-cephnfs.a
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/glibc-hwcaps/power9/libthread_db-1.0.so".

Breakpoint 1, usage (argv=0x7fffffffee48) at /usr/include/bits/stdio2.h:100
warning: Source file is more recent than executable.
100       return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
Missing separate debuginfos, use: yum debuginfo-install libntirpc-3.4-1.el8cp.ppc64le librados2-16.2.8-43.el8cp.ppc64le
(gdb) bt
#0  usage (argv=0x7fffffffee48) at /usr/include/bits/stdio2.h:100
#1  main (argc=<optimized out>, argv=0x7fffffffee48) at /usr/src/debug/nfs-ganesha-3.5-1.el8cp.ppc64le/src/tools/ganesha-rados-grace.c:134
(gdb) f 1
#1  main (argc=<optimized out>, argv=0x7fffffffee48) at /usr/src/debug/nfs-ganesha-3.5-1.el8cp.ppc64le/src/tools/ganesha-rados-grace.c:134
134     /usr/src/debug/nfs-ganesha-3.5-1.el8cp.ppc64le/src/tools/ganesha-rados-grace.c: No such file or directory.
(gdb) p c
$1 = 255 '\377'
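For readers without access to attachment 1889712, here is a minimal, hypothetical sketch of what such a reproducer can look like (modeled on the option-parsing loop style of tools/ganesha-rados-grace.c as described above; it is not the actual attachment or the upstream source). On targets where plain char is unsigned, such as ppc64le, getopt_long()'s end-of-options return value of -1 is truncated to 255 when stored in a char, so the loop's termination test (c != -1) never becomes false; instead the switch sees c == 255, hits its default case, and takes the usage/exit path. That also explains why every subcommand and flag combination tried in the earlier comment failed identically with exit status 1.

/* getopt-char-vs-int.c -- hypothetical minimal reproducer, not the actual
 * attachment. Demonstrates the "char c" vs "int c" getopt_long() problem
 * on platforms where plain char is unsigned (e.g. ppc64le).
 *
 * Build: gcc -Wall -o getopt-char-vs-int getopt-char-vs-int.c
 * Run:   ./getopt-char-vs-int --pool .nfs --ns ocs-storagecluster-cephnfs add node-a
 */
#include <getopt.h>
#include <stdio.h>

static const struct option long_options[] = {
	{"pool", required_argument, NULL, 'p'},
	{"ns",   required_argument, NULL, 'n'},
	{NULL,   0,                 NULL,  0 },
};

int main(int argc, char *argv[])
{
	/*
	 * Buggy declaration: getopt_long() returns an int. When it returns -1
	 * ("no more options") and the result is stored in an unsigned char,
	 * the value becomes 255, so (c != -1) stays true and control falls
	 * into the default/usage branch below. The fix referenced above is
	 * to declare this as "int c".
	 */
	char c;
	const char *pool = NULL, *ns = NULL;

	while ((c = getopt_long(argc, argv, "p:n:", long_options, NULL)) != -1) {
		switch (c) {
		case 'p':
			pool = optarg;
			break;
		case 'n':
			ns = optarg;
			break;
		default:
			/* On ppc64le we land here with c == 255 ('\377'),
			 * the same value seen in the gdb session above. */
			fprintf(stderr, "hit usage path: c=%d\n", (int)c);
			return 1;
		}
	}

	printf("options parsed normally: pool=%s ns=%s\n",
	       pool ? pool : "(none)", ns ? ns : "(none)");
	return 0;
}

On x86_64, where plain char is signed, the same loop terminates normally, which matches the observation that the failure only shows up on ppc64le (and could potentially affect other targets where plain char is unsigned).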
Bug 2096882 has been reported against RHCS. Once that is addressed, the rook-ceph container image needs to be rebuilt so that it uses the fixed `ganesha-rados-grace` command.
Ceph BZ is ON_QA
With the latest ODF version (4.11.0-101), the NFS pod reached the Running state and the CephNFS reached the Ready state.

[root@rdr-odf-nfs-sao01-bastion-0 ~]# oc get csv -n openshift-storage -o json ocs-operator.v4.11.0 | jq '.metadata.labels["full_version"]'
"4.11.0-101"

[root@rdr-odf-nfs-sao01-bastion-0 ~]# oc get pods |grep rook-ceph-nfs
rook-ceph-nfs-ocs-storagecluster-cephnfs-a-768c45bb78-w6zvh   2/2   Running   0   7m17s

[root@rdr-odf-nfs-sao01-bastion-0 ~]# oc get sc |grep nfs
ocs-storagecluster-ceph-nfs   openshift-storage.nfs.csi.ceph.com   Delete   Immediate   false   7m56s

[root@rdr-odf-nfs-sao01-bastion-0 ~]# oc get cephnfs
NAME                         AGE
ocs-storagecluster-cephnfs   2m39s
[root@rdr-odf-nfs-sao01-bastion-0 ~]#
[root@rdr-odf-nfs-sao01-bastion-0 ~]# oc get cephnfs -o yaml
apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
  kind: CephNFS
  metadata:
    creationTimestamp: "2022-06-22T13:37:23Z"
    finalizers:
    - cephnfs.ceph.rook.io
    generation: 1
    name: ocs-storagecluster-cephnfs
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ocs.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: StorageCluster
      name: ocs-storagecluster
      uid: 4ebdb866-56d5-4145-b4c5-e73d003ffa50
    resourceVersion: "695162"
    uid: ad5d9eb8-1973-4ff3-a544-61f707ea2063
  spec:
    rados: {}
    server:
      active: 1
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cluster.ocs.openshift.io/openshift-storage
                operator: Exists
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - rook-ceph-nfs
            topologyKey: kubernetes.io/hostname
        tolerations:
        - effect: NoSchedule
          key: node.ocs.openshift.io/storage
          operator: Equal
          value: "true"
      priorityClassName: openshift-user-critical
      resources:
        limits:
          cpu: "3"
          memory: 8Gi
        requests:
          cpu: "3"
          memory: 8Gi
  status:
    observedGeneration: 1
    phase: Ready
kind: List
metadata:
  resourceVersion: ""
Moving to VERIFIED based on Aaruni's comment #14
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6156