Bug 1985074
| Summary: | must-gather is skipping the ceph collection when there are two must-gather-helper pods | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Oded <oviner> |
| Component: | must-gather | Assignee: | Rewant <resoni> |
| Status: | CLOSED ERRATA | QA Contact: | Oded <oviner> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | ebenahar, godas, kramdoss, muagarwa, nberry, ocs-bugs, odf-bz-bot, resoni, sabose |
| Target Milestone: | --- | Keywords: | Automation |
| Target Release: | ODF 4.9.0 | Flags: | resoni: needinfo- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-13 17:44:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Oded
2021-07-22 18:48:30 UTC
Agree, the cleanup part needs improvement. Not a 4.8 blocker; it is a day-one issue.

Bug reproduced on OCS 4.9.

SetUp:
OCP Version: 4.8.0-0.nightly-2021-08-12-174317
OCS Version: ocs-operator.v4.9.0-105.ci
LSO Version: 4.8

Test Process:
1. Run the must-gather command a first time:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.9
2. Run the must-gather command a second time:
$ oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.9
3. Check the pod status [there are two must-gather-helper pods in the openshift-storage project]:
$ oc get pods | grep must
must-gather-9j56r-helper 1/1 Running 0 2m7s
4. Check the content of the must-gather directory:
a. On the first must-gather, the ceph files were not collected. Exception: Files don't exist: ['ceph_auth_list', 'ceph_balancer_status', 'ceph_config-key_ls', 'ceph_config_dump', 'ceph_crash_stat', 'ceph_device_ls', 'ceph_fs_dump', 'ceph_fs_ls', 'ceph_fs_status', 'ceph_fs_subvolumegroup_ls_ocs-storagecluster-cephfilesystem', 'ceph_health_detail', 'ceph_mds_stat', 'ceph_mgr_dump', 'ceph_mgr_module_ls', 'ceph_mgr_services', 'ceph_mon_dump', 'ceph_mon_stat', 'ceph_osd_blocked-by', 'ceph_osd_crush_class_ls', 'ceph_osd_crush_dump', 'ceph_osd_crush_rule_dump', 'ceph_osd_crush_rule_ls', 'ceph_osd_crush_show-tunables', 'ceph_osd_crush_weight-set_dump', 'ceph_osd_df', 'ceph_osd_df_tree', 'ceph_osd_dump', 'ceph_osd_getmaxosd', 'ceph_osd_lspools', 'ceph_osd_numa-status', 'ceph_osd_perf', 'ceph_osd_pool_ls_detail', 'ceph_osd_stat', 'ceph_osd_tree', 'ceph_osd_utilization', 'ceph_pg_dump', 'ceph_pg_stat', 'ceph_quorum_status', 'ceph_report', 'ceph_service_dump', 'ceph_status', 'ceph_time-sync-status', 'ceph_versions', 'ceph_df_detail']
b. On the second must-gather, the ceph files were collected.

This is what we experienced when a must-gather helper pod is already running (somehow it was not cleaned up, or it was run with the "keep" flag) and must-gather is run again (two helper pods in total): the ceph command execution fails. So we need to implement it as I mentioned in Comment#4. I don't think we ever need two must-gather instances running at the same time.

So somehow the older helper pod is not being deleted as per the fix, PR: https://github.com/openshift/ocs-operator/pull/1280. @Rewant can you please take a look?

As per our discussion with Oded over gchat, the 2nd run happened within a minute after the 1st run, so the 1st helper pod terminated before reaching the ceph command execution. That's why the 1st run is missing the ceph output. This is the intention of the fix, so if the 2nd run is able to collect the ceph command output then it's working as expected.

Bug moved to verified state based on https://bugzilla.redhat.com/show_bug.cgi?id=1985074#c11

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086
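
For reference, the pod-listing step above can be turned into a pre-flight check that removes any leftover helper pod before a new collection starts. This is a minimal sketch, not the ocs-operator fix itself; it assumes the `must-gather-<id>-helper` naming shown in the pod listing and the `openshift-storage` namespace used above.

```bash
#!/usr/bin/env bash
# Sketch: make sure no stale must-gather helper pod is running in
# openshift-storage before starting a fresh collection, so two helper
# pods never coexist. Pod name pattern is an assumption based on the
# "must-gather-9j56r-helper" output shown in this report.
set -euo pipefail

NAMESPACE=openshift-storage

leftover=$(oc get pods -n "$NAMESPACE" --no-headers 2>/dev/null \
            | awk '/must-gather-.*-helper/ {print $1}')

if [ -n "$leftover" ]; then
    echo "Deleting leftover helper pod(s): $leftover"
    # shellcheck disable=SC2086  # word-split intentionally across pod names
    oc delete pod $leftover -n "$NAMESPACE" --wait=true
fi

# Start a new collection once no helper pod is left.
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.9
```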
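
Similarly, the "Files don't exist" exception in step 4a comes from an automated check of the gathered directory. A minimal sketch of such a check is shown below; the directory layout is an assumption (pass the path that holds the ceph command dumps), and the expected names are a subset of the list in step 4a.

```bash
#!/usr/bin/env bash
# Sketch: verify that the expected ceph command outputs exist in a
# must-gather result directory, in the spirit of the QA check quoted above.
set -euo pipefail

CEPH_DIR=${1:?usage: $0 <path-to-ceph-output-dir>}

# A few expected command outputs taken from the list in step 4a; extend as needed.
expected=(
    ceph_status
    ceph_health_detail
    ceph_osd_tree
    ceph_versions
    ceph_df_detail
)

missing=()
for f in "${expected[@]}"; do
    [ -e "$CEPH_DIR/$f" ] || missing+=("$f")
done

if [ "${#missing[@]}" -gt 0 ]; then
    echo "Files don't exist: ${missing[*]}"
    exit 1
fi
echo "All expected ceph command outputs are present."
```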