Description of problem (please be as detailed as possible and provide log snippets): Documentation for bug https://bugzilla.redhat.com/show_bug.cgi?id=1821219 is needed and can be found here: https://github.com/red-hat-storage/ocs-training/pull/155/files
Note that this doc was just updated so that the new steps use "oc process" instead of running the ceph commands in the ocs operator. A separate commit in that PR shows what changed. https://github.com/red-hat-storage/ocs-training/pull/155/files Annette, can you comment here on whether any other changes are needed or whether the steps are working for you? Thanks
@anjana @raz this BZ is a must-include for OCS 4.4 but still doesn't have the acks. Could you please provide them?
Hi everyone, I tested the new template.

Scenario 1 - 3 OSDs in the cluster, 1 per failure domain - AWS based

When executing the template, this is what I get:

    oc logs rook-ceph-toolbox-job-0-lntkk
    marked out osd.0.
    Error EAGAIN: OSD(s) 0 have no reported stats, and not all PGs are active+clean; we cannot draw any conclusions. You can proceed by passing --force, but be warned that this will likely mean real, permanent data loss.

Note that the job retries 6 times and fails to complete every time because the PGs aren't active+clean.

Scenario 2 - 6 OSDs in the cluster, 2 per failure domain - AWS based

Note that I waited for the OSD to be marked out and for the cluster to rebalance. On my empty cluster that meant a wait of just over 10 minutes.

    oc logs pod/rook-ceph-toolbox-job-0-rj5c5
    osd.0 is already out.
    purged osd.0

So the question is whether we want to add the --force option to the purge command for the following very specific scenarios:
- Single OSD per failure domain deployments
- Complete failure domain failure, whatever the number of OSDs deployed per failure domain

For the very specific 3 OSD deployment scenario, we could ship the template with a little refinement so that the only special case remaining is the complete failure domain failure, e.g.

    ceph osd out osd.${FAILED_OSD_ID};num=$(ceph osd stat | cut -f1 -d' ');if (( num == 3 )); then forceopt='--force';else forceopt=''; fi;ceph osd purge osd.${FAILED_OSD_ID} $forceopt

Let me know if you would like me to run any additional testing @travis
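For readability, the refinement proposed above can be pulled into a small function and checked without a cluster. This is only a sketch: the helper name force_opt_for is hypothetical, and the sample lines passed to it are an assumed shape for `ceph osd stat` output (OSD count as the first space-separated field, as the `cut -f1 -d' '` in the one-liner implies), not output captured from a real cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the shipped template): given the text
# printed by `ceph osd stat`, print '--force' when the cluster has exactly
# 3 OSDs and print nothing otherwise.
force_opt_for() {
  local stat_line=$1
  local num
  # Assumption: the first space-separated field is the total OSD count.
  num=$(printf '%s\n' "$stat_line" | cut -f1 -d' ')
  # Guard against non-numeric input before the arithmetic comparison.
  if [[ "$num" =~ ^[0-9]+$ ]] && (( num == 3 )); then
    printf '%s\n' '--force'
  fi
}

# Sample inputs in an assumed format, for illustration only.
force_opt_for '3 osds: 3 up, 3 in'   # 3 OSDs: prints --force
force_opt_for '6 osds: 6 up, 6 in'   # 6 OSDs: prints nothing
```

Inside the job this would presumably be wired in as forceopt=$(force_opt_for "$(ceph osd stat)") ahead of the existing purge command. The complete failure domain case is not covered by this check and would still need separate handling.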
We really need the --force flag inside the job. Otherwise, we will have to document the ceph commands for the user to work around this issue. I've opened this BZ as a proposed blocker for 4.4. https://bugzilla.redhat.com/show_bug.cgi?id=1827978
Thanks Travis
With the merge of https://github.com/openshift/ocs-operator/pull/490, please update the docs with the name of the pod whose logs should be checked to verify that the job succeeded. https://github.com/red-hat-storage/ocs-training/pull/155/files#diff-f3d8f11d485ecb01b8aeb2702a073b06R39
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days