Bug 2235311
| Summary: | [DOC] Restoring ceph-monitor quorum procedure, The bad mons cannot be deleted from the monmap because permission issue | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Oded <oviner> |
| Component: | documentation | Assignee: | Anjana Suparna Sriram <asriram> |
| Status: | NEW --- | QA Contact: | Neha Berry <nberry> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13 | CC: | asriram, hnallurv, kelwhite, kmanohar, odf-bz-bot |
| Target Milestone: | --- | Flags: | hnallurv: needinfo? (asriram), kelwhite: needinfo? (asriram), kelwhite: needinfo? (asriram) |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
I hit the same issue on an RDR environment when trying to recover mons that were out of quorum.
Permission error
----------------
monmaptool ${monmap_path} --rm b
monmaptool: monmap file /tmp/monmap
monmaptool: removing b
monmaptool: writing epoch 5 to /tmp/monmap (2 monitors)
bufferlist::write_file(/tmp/monmap): failed to open file: (13) Permission denied
monmaptool: error writing to '/tmp/monmap': (13) Permission denied
Followed the below URL:
-----------------------
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.13/html/troubleshooting_openshift_data_foundation/restoring-ceph-monitor-quorum-in-openshift-data-foundation_rhodf
Product version:
ODF - 4.13-219
OCP - 4.13.0-0.nightly-2023-08-11-101506
Platform: vSphere
Hi Anjana,

This bug needs to be fixed in the 4.13, 4.14 and 4.15 docs. Can you please prioritize this BZ and provide a fix soon?

I was doing some testing, and managed to get it to work:

sh-5.1# monmaptool --rm a /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: removing a
monmaptool: writing epoch 3 to /tmp/monmap (2 monitors)
bufferlist::write_file(/tmp/monmap): failed to open file: (13) Permission denied
monmaptool: error writing to '/tmp/monmap': (13) Permission denied

sh-5.1# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 3
fsid 32deba18-6e88-4f41-8401-dede5787e344
last_changed 2024-09-02T02:22:52.521379+0000
created 2024-09-02T02:22:19.434678+0000
min_mon_release 18 (reef)
election_strategy: 1
0: v2:172.30.195.83:3300/0 mon.a
1: v2:172.30.163.106:3300/0 mon.b
2: v2:172.30.193.96:3300/0 mon.c

sh-5.1# monmaptool --add d 1.1.1.1:567 monmap
monmaptool: monmap file monmap
monmaptool: writing epoch 3 to monmap (4 monitors)

sh-5.1# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 3
fsid 32deba18-6e88-4f41-8401-dede5787e344
last_changed 2024-09-02T02:22:52.521379+0000
created 2024-09-02T02:22:19.434678+0000
min_mon_release 18 (reef)
election_strategy: 1
0: v2:172.30.195.83:3300/0 mon.a
1: v2:172.30.163.106:3300/0 mon.b
2: v2:172.30.193.96:3300/0 mon.c
3: v2:1.1.1.1:567/0 mon.d

sh-5.1# monmaptool rm a monmap
monmaptool: too many arguments
monmaptool -h for usage

sh-5.1# monmaptool --rm a monmap
monmaptool: monmap file monmap
monmaptool: removing a
monmaptool: writing epoch 3 to monmap (3 monitors)

I don't know why, but as soon as I added to the mon map, I was able to remove from the mon map... seems like a bug to me?
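One detail in the session above may matter: the failing command wrote to /tmp/monmap, while the later commands that succeeded used the relative path monmap, so they may have been operating on a different file. A minimal sketch for testing that idea is to copy the monmap to a fresh path created by the current user and edit the copy. The function name copy_for_edit and the availability of mktemp in the mon container are assumptions, and this is not the documented procedure.

```shell
# Sketch only: copy a file into a new private directory created by the current
# user, so later writes do not depend on who owns the original file.
# copy_for_edit is a hypothetical helper name, not part of Ceph or Rook.
copy_for_edit() {
    src=$1
    # mktemp -d creates a directory owned by the calling user.
    workdir=$(mktemp -d) || return 1
    cp "$src" "$workdir/monmap" || return 1
    # Print the path of the editable copy for the caller to capture.
    printf '%s\n' "$workdir/monmap"
}
```

It could be used as edit_path=$(copy_for_edit /tmp/monmap) followed by monmaptool --rm a "$edit_path". Any later step that injects the monmap would then have to point at the edited copy rather than at /tmp/monmap.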
Describe the issue:
The procedure for restoring ceph-monitor quorum is not correct. The bad mons cannot be deleted from the monmap because of a permission issue.

Describe the task you were trying to accomplish:

Test Procedure:

1. Stop 2 worker nodes.

oviner:auth$ oc get nodes
NAME              STATUS     ROLES                  AGE    VERSION
compute-0         NotReady   worker                 3d     v1.27.4+deb2c60
compute-1         NotReady   worker                 3d     v1.27.4+deb2c60
compute-2         Ready      worker                 3d     v1.27.4+deb2c60
control-plane-0   Ready      control-plane,master   3d1h   v1.27.4+deb2c60
control-plane-1   Ready      control-plane,master   3d1h   v1.27.4+deb2c60
control-plane-2   Ready      control-plane,master   3d1h   v1.27.4+deb2c60

oviner:auth$ oc get pods -l app=rook-ceph-mon
NAME                               READY   STATUS        RESTARTS      AGE
rook-ceph-mon-a-576dc56947-l2cqx   0/2     Pending       0             20h
rook-ceph-mon-b-569d6c5877-fvxf2   2/2     Terminating   0             21h
rook-ceph-mon-b-569d6c5877-hclhg   0/2     Pending       0             20h
rook-ceph-mon-c-6646b847ff-r9m4j   2/2     Running       1 (12h ago)   3d

2. Stop the rook-ceph-operator so that the mons are not failed over when you are modifying the monmap.

$ oc -n openshift-storage scale deployment rook-ceph-operator --replicas=0
deployment.apps/rook-ceph-operator scaled

3. Open the YAML file and copy the command and arguments from the mon container.

$ oc -n openshift-storage get deployment rook-ceph-mon-c -o yaml > rook-ceph-mon-c-deployment.yaml

4. Clean up the copied command and args fields to form a pastable command as follows:

ceph-mon \
    --fsid=8b24e1e2-00f9-4d81-a721-4ee4095fba99 \
    --keyring=/etc/ceph/keyring-store/keyring \
    --default-log-to-stderr=true \
    --default-err-to-stderr=true \
    --default-mon-cluster-log-to-stderr=true \
    --default-log-stderr-prefix=debug \
    --default-log-to-file=false \
    --default-mon-cluster-log-to-file=false \
    --mon-host=$(ROOK_CEPH_MON_HOST) \
    --mon-initial-members=$(ROOK_CEPH_MON_INITIAL_MEMBERS) \
    --id=c \
    --setuser=ceph \
    --setgroup=ceph \
    --foreground \
    --public-addr=172.30.53.157 \
    --setuser-match-path=/var/lib/ceph/mon/ceph-c/store.db \
    --public-bind-addr=$(ROOK_POD_IP) \
    --extract-monmap=${monmap_path}

5. Patch the rook-ceph-mon-c Deployment to stop this mon from running without deleting the mon pod.

$ oc -n openshift-storage patch deployment rook-ceph-mon-c --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
$ oc -n openshift-storage patch deployment rook-ceph-mon-c -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'

6. Connect to the pod of a healthy mon [mon-c]:

$ oc -n openshift-storage exec -it rook-ceph-mon-c-765cbb446f-4xgzw bash
[root@rook-ceph-mon-c-765cbb446f-4xgzw ceph]# monmap_path=/tmp/monmap

7. Review the contents of the monmap.

[root@rook-ceph-mon-c-765cbb446f-4xgzw ceph]# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 3
fsid 8b24e1e2-00f9-4d81-a721-4ee4095fba99
last_changed 2023-08-21T10:15:51.349720+0000
created 2023-08-21T10:13:54.902037+0000
min_mon_release 17 (quincy)
election_strategy: 1
0: v2:172.30.122.31:3300/0 mon.a
1: v2:172.30.85.192:3300/0 mon.b
2: v2:172.30.53.157:3300/0 mon.c

8. Remove the bad mons from the monmap [Failed]

[root@rook-ceph-mon-c-765cbb446f-4xgzw ceph]# monmaptool ${monmap_path} --rm a
monmaptool: monmap file /tmp/monmap
monmaptool: removing a
monmaptool: writing epoch 3 to /tmp/monmap (2 monitors)
bufferlist::write_file(/tmp/monmap): failed to open file: (13) Permission denied
monmaptool: error writing to '/tmp/monmap': (13) Permission denied

Suggestions for improvement:
We need to find the correct procedure for restoring ceph-monitor quorum.

Document URL:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.13/html/troubleshooting_openshift_data_foundation/restoring-ceph-monitor-quorum-in-openshift-data-foundation_rhodf#doc-wrapper

Chapter/Section Number and Title:
Chapter 12. Restoring ceph-monitor quorum in OpenShift Data Foundation

Product Version:
ODF Version: odf-operator.v4.14.0-111.stable
OCP Version: 4.14.0-0.nightly-2023-08-11-055332
Platform: vSphere

Environment Details:

Any other versions of this document that also need this update:

Additional information:
For more info: https://docs.google.com/document/d/1Xu6L4ibi-0PWD9Y8ezeXRQ-TsHRPtnHH-eRaw0pDRec/edit
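The failure in step 8 is an open error with errno 13 (EACCES) on /tmp/monmap. The extract command in step 4 runs with --setuser=ceph, so the file may have been created by a different user than the one running monmaptool; that is an unconfirmed possibility, not a diagnosis. A minimal sketch for collecting the relevant facts inside the mon pod follows. The function name describe_access is hypothetical, and it assumes id and ls are available in the container.

```shell
# Diagnostic sketch: print the numeric uid/gid of the current shell, then the
# numeric owner, group and mode of a file and of its parent directory.
# describe_access is a hypothetical helper name.
describe_access() {
    path=$1
    echo "uid: $(id -u) gid: $(id -g)"
    # -n shows numeric owner and group so they can be compared with id -u.
    # -d lists the directory entry itself rather than its contents.
    ls -ldn "$path" "$(dirname "$path")"
}
```

Running describe_access /tmp/monmap before step 8 would show whether the owner of the file matches the uid of the shell and what mode bits are set on /tmp.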