Bug 1955042
| Summary: | [IBM Z] OCS-CI tier4b tests fail due to timeout during pod deletion | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Abdul Kandathil (IBM) <akandath> |
| Component: | csi-driver | Assignee: | Humble Chirammal <hchiramm> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | CC: | madam, mschaefe, muagarwa, ocs-bugs, odf-bz-bot, sostapov |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | s390x | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-09-21 13:26:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Didn't get a chance to look into this, but it mostly looks like a sync issue. It doesn't look like a 4.8 blocker at this point in time, so moving it out. Will pull it back if required.

I have *not* seen these failures anymore when running tier4b on OCS 4.8.0-175.ci, OCP 4.8.2 - candidate-4.8, ocs-ci stable-ocs-4.8-202107251413.

(In reply to Michael Schaefer from comment #3)
> I have *not* seen these failures when running tier4b on OCS 4.8.0-175.ci,
> OCP 4.8.2 - candidate-4.8, ocs-ci stable-ocs-4.8-202107251413 anymore.

Michael, thanks for the verification. In that case, can we close this Bugzilla?

No, please keep this open, as the issue occurred again when running the tier4b suite from ocs-ci once more.

Environment:
OCS 4.8.0-175.ci
OCP 4.8.2 - candidate-4.8
ocs-ci stable-ocs-4.8-202107251413

All tests following tests/manage/pv_services/test_ceph_daemon_kill_during_resource_creation.py::TestDaemonKillDuringResourceCreation::test_ceph_daemon_kill_during_resource_creation[CephFileSystem-create_pvc-mgr] either fail or are skipped because of Ceph health warnings. Find the must_gather logs for that test case here: https://drive.google.com/file/d/1jfBqg4UaDvFPoroR7viqvEpdDJB6qxtJ/view?usp=sharing

The sequence of the executed tests is (result @duration [parametrization], grouped by test):

../ocs-ci-4.8-stable/tests/manage/pv_services/test_ceph_daemon_kill_during_resource_creation.py::TestDaemonKillDuringResourceCreation::test_ceph_daemon_kill_during_resource_creation
  1  PASS @1033.733s [CephBlockPool-create_pvc-mgr]
  2  PASS @977.894s  [CephBlockPool-create_pod-mgr]
  3  PASS @1000.305s [CephBlockPool-run_io-mgr]
  4  PASS @1010.164s [CephBlockPool-create_pvc-mon]
  5  PASS @974.211s  [CephBlockPool-create_pod-mon]
  6  PASS @997.931s  [CephBlockPool-run_io-mon]
  7  PASS @978.874s  [CephBlockPool-create_pvc-osd]
  8  PASS @991.756s  [CephBlockPool-create_pod-osd]
  9  PASS @978.594s  [CephBlockPool-run_io-osd]
  10 FAIL @767.205s  [CephFileSystem-create_pvc-mgr]
  11 ERR  @1509.976s [CephFileSystem-create_pvc-mgr]
  12 SKIP @1.439s    [CephFileSystem-create_pod-mgr]
  13 SKIP @1.643s    [CephFileSystem-run_io-mgr]
  14 SKIP @1.447s    [CephFileSystem-create_pvc-mon]
  15 SKIP @1.578s    [CephFileSystem-create_pod-mon]
  16 SKIP @1.607s    [CephFileSystem-run_io-mon]
  17 SKIP @1.572s    [CephFileSystem-create_pvc-osd]
  18 SKIP @1.458s    [CephFileSystem-create_pod-osd]
  19 SKIP @1.788s    [CephFileSystem-run_io-osd]
  20 SKIP @1.471s    [CephFileSystem-create_pvc-mds]
  21 SKIP @1.439s    [CephFileSystem-create_pod-mds]
  22 SKIP @25.557s   [CephFileSystem-run_io-mds]

../ocs-ci-4.8-stable/tests/manage/pv_services/test_ceph_daemon_kill_during_resource_deletion.py::TestDaemonKillDuringPodPvcDeletion::test_ceph_daemon_kill_during_pod_pvc_deletion
  23 SKIP @26.362s   [CephBlockPool-delete_pvcs-mgr]
  24 SKIP @1.667s    [CephBlockPool-delete_pods-mgr]
  25 SKIP @1.640s    [CephBlockPool-delete_pvcs-mon]
  26 SKIP @1.593s    [CephBlockPool-delete_pods-mon]
  27 SKIP @1.534s    [CephBlockPool-delete_pvcs-osd]
  28 SKIP @1.637s    [CephBlockPool-delete_pods-osd]
  29 SKIP @1.537s    [CephFileSystem-delete_pvcs-mgr]
  30 SKIP @1.723s    [CephFileSystem-delete_pods-mgr]
  31 SKIP @1.639s    [CephFileSystem-delete_pvcs-mon]
  32 SKIP @1.365s    [CephFileSystem-delete_pods-mon]
  33 SKIP @1.404s    [CephFileSystem-delete_pvcs-osd]
  34 SKIP @1.746s    [CephFileSystem-delete_pods-osd]
  35 SKIP @1.335s    [CephFileSystem-delete_pvcs-mds]
  36 SKIP @25.536s   [CephFileSystem-delete_pods-mds]

../ocs-ci-4.8-stable/tests/manage/pv_services/test_daemon_kill_during_pvc_pod_deletion_and_io.py::TestDaemonKillDuringMultipleDeleteOperations::test_daemon_kill_during_pvc_pod_deletion_and_io
  37 SKIP @25.219s   [CephBlockPool-mgr]
  38 SKIP @1.581s    [CephBlockPool-mon]
  39 SKIP @1.537s    [CephBlockPool-osd]
  40 SKIP @1.428s    [CephFileSystem-mgr]
  41 SKIP @1.499s    [CephFileSystem-mon]
  42 SKIP @1.460s    [CephFileSystem-osd]
  43 SKIP @24.250s   [CephFileSystem-mds]

../ocs-ci-4.8-stable/tests/manage/pv_services/test_rwo_pvc_fencing_unfencing.py::TestRwoPVCFencingUnfencing::test_rwo_pvc_fencing_node_prolonged_network_failure
  44 SKIP @0.001s    [dedicated-2-1-False]
  45 SKIP @0.001s    [dedicated-4-3-True]
  46 SKIP @0.001s    [colocated-4-1-False]
  47 SKIP @0.001s    [colocated-6-3-True]

../ocs-ci-4.8-stable/tests/manage/pv_services/pvc_clone/test_node_restart_during_pvc_clone.py::TestNodeRestartDuringPvcClone::test_worker_node_restart_during_pvc_clone
  48 SKIP @1.892s

../ocs-ci-4.8-stable/tests/manage/pv_services/pvc_clone/test_resource_deletion_during_pvc_clone.py::TestResourceDeletionDuringPvcClone::test_resource_deletion_during_pvc_clone
  49 SKIP @50.336s

../ocs-ci-4.8-stable/tests/manage/pv_services/pvc_resize/test_node_restart_during_pvc_expansion.py::TestNodeRestartDuringPvcExpansion::test_worker_node_restart_during_pvc_expansion
  50 SKIP @1.665s

../ocs-ci-4.8-stable/tests/manage/pv_services/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion
  51 SKIP @26.249s   [mgr]
  52 SKIP @1.558s    [osd]
  53 SKIP @1.365s    [rbdplugin]
  54 SKIP @1.485s    [cephfsplugin]
  55 SKIP @1.660s    [rbdplugin_provisioner]
  56 SKIP @25.314s   [cephfsplugin_provisioner]

../ocs-ci-4.8-stable/tests/manage/pv_services/pvc_snapshot/test_resource_deletion_during_snapshot_restore.py::TestResourceDeletionDuringSnapshotRestore::test_resource_deletion_during_snapshot_restore
  57 SKIP @53.131s

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_automated_recovery_from_failed_nodes_proactive_IPI.py::TestAutomatedRecoveryFromFailedNodes::test_automated_recovery_from_failed_nodes_IPI_proactive
  58 SKIP @0.002s    [rbd]
  59 SKIP @0.001s    [cephfs]

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_automated_recovery_from_failed_nodes_reactive_IPI.py::TestAutomatedRecoveryFromFailedNodes::test_automated_recovery_from_failed_nodes_IPI_reactive
  60 SKIP @0.001s    [rbd-shutdown]
  61 SKIP @0.001s    [rbd-terminate]
  62 SKIP @0.000s    [cephfs-shutdown]
  63 SKIP @0.001s    [cephfs-terminate]

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_az_failure.py::TestAvailabilityZones::test_availability_zone_failure
  64 SKIP @0.000s

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_disk_failures.py::TestDiskFailures
  65 SKIP @0.001s    test_detach_attach_worker_volume
  66 SKIP @0.001s    test_detach_attach_2_data_volumes
  67 SKIP @0.001s    test_recovery_from_volume_deletion

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_node_replacement_proactive.py::TestNodeReplacementTwice::test_nodereplacement_twice
  68 SKIP @1.832s

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_node_replacement_reactive_aws_ipi.py::TestNodeReplacement::test_node_replacement_reactive_aws_ipi
  69 SKIP @0.001s    [rbd-power off]
  70 SKIP @0.000s    [rbd-network failure]
  71 SKIP @0.001s    [cephfs-power off]
  72 SKIP @0.000s    [cephfs-network failure]

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_nodes_maintenance.py::TestNodesMaintenance::test_node_maintenance_restart_activate
  73 SKIP @6.839s    [worker]
  74 SKIP @1.510s    [master]

../ocs-ci-4.8-stable/tests/manage/z_cluster/nodes/test_nodes_maintenance.py::TestNodesMaintenance::test_simultaneous_drain_of_two_ocs_nodes
  75 SKIP @0.001s    [rbd]
  76 SKIP @20.104s   [cephfs]

Cluster status after test:

$ oc get node
NAME                               STATUS     ROLES    AGE     VERSION
bootstrap-0.m1301015ocs.lnxne.boe  Ready      worker   16h     v1.21.1+051ac4f
master-0.m1301015ocs.lnxne.boe     Ready      master   4d17h   v1.21.1+051ac4f
master-1.m1301015ocs.lnxne.boe     Ready      master   4d17h   v1.21.1+051ac4f
master-2.m1301015ocs.lnxne.boe     Ready      master   4d17h   v1.21.1+051ac4f
worker-0.m1301015ocs.lnxne.boe     NotReady   worker   4d17h   v1.21.1+051ac4f
worker-1.m1301015ocs.lnxne.boe     Ready      worker   4d17h   v1.21.1+051ac4f
worker-2.m1301015ocs.lnxne.boe     Ready      worker   4d17h   v1.21.1+051ac4f

Ceph status:

$ ./ceph-tool.sh status ceph health
  cluster:
    id:     79c11403-4fe0-4275-b0f9-1f53ba99fd9a
    health: HEALTH_WARN
            Long heartbeat ping times on back interface seen, longest is 5896.639 msec
            Long heartbeat ping times on front interface seen, longest is 5827.613 msec

  services:
    mon: 3 daemons, quorum a,b,c (age 10s)
    mgr: a(active, since 12h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 6 osds: 6 up (since 8h), 6 in (since 4d)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)

  data:
    pools:   10 pools, 368 pgs
    objects: 12.03k objects, 44 GiB
    usage:   101 GiB used, 5.9 TiB / 6 TiB avail
    pgs:     368 active+clean

  io:
    client: 170 B/s rd, 681 B/s wr, 0 op/s rd, 0 op/s wr

Status of the "Not Ready" worker:

$ oc describe node/worker-0.m1301015ocs.lnxne.boe
Name:               worker-0.m1301015ocs.lnxne.boe
Roles:              worker
Labels:             beta.kubernetes.io/arch=s390x
                    beta.kubernetes.io/os=linux
                    cluster.ocs.openshift.io/openshift-storage=
                    kubernetes.io/arch=s390x
                    kubernetes.io/hostname=worker-0.m1301015ocs.lnxne.boe
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
                    topology.rook.io/rack=rack2
Annotations:        csi.volume.kubernetes.io/nodeid: {"openshift-storage.cephfs.csi.ceph.com":"worker-0.m1301015ocs.lnxne.boe","openshift-storage.rbd.csi.ceph.com":"worker-0.m1301015ocs.lnxne...
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-408f748963da0fca1911c061a0fd93f6
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-408f748963da0fca1911c061a0fd93f6
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 29 Jul 2021 15:45:38 +0200
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  worker-0.m1301015ocs.lnxne.boe
  AcquireTime:     <unset>
  RenewTime:       Mon, 02 Aug 2021 19:48:43 +0200
Conditions:
  Type             Status   LastHeartbeatTime                 LastTransitionTime                Reason             Message
  ----             ------   -----------------                 ------------------                ------             -------
  MemoryPressure   Unknown  Mon, 02 Aug 2021 19:46:19 +0200   Mon, 02 Aug 2021 19:49:23 +0200   NodeStatusUnknown  Kubelet stopped posting node status.
  DiskPressure     Unknown  Mon, 02 Aug 2021 19:46:19 +0200   Mon, 02 Aug 2021 19:49:23 +0200   NodeStatusUnknown  Kubelet stopped posting node status.
  PIDPressure      Unknown  Mon, 02 Aug 2021 19:46:19 +0200   Mon, 02 Aug 2021 19:49:23 +0200   NodeStatusUnknown  Kubelet stopped posting node status.
  Ready            Unknown  Mon, 02 Aug 2021 19:46:19 +0200   Mon, 02 Aug 2021 19:49:23 +0200   NodeStatusUnknown  Kubelet stopped posting node status.
Addresses:
  InternalIP:  10.13.1.20
  Hostname:    worker-0.m1301015ocs.lnxne.boe
Capacity:
  cpu:                16
  ephemeral-storage:  125424620Ki
  hugepages-1Mi:      0
  memory:             66026156Ki
  pods:               250
Allocatable:
  cpu:                15500m
  ephemeral-storage:  115591329601
  hugepages-1Mi:      0
  memory:             64875180Ki
  pods:               250
System Info:
  Machine ID:                  8d32a0fb44ed45889f8bdb847dd6adee
  System UUID:                 8d32a0fb44ed45889f8bdb847dd6adee
  Boot ID:                     5eb8580b-18f4-4a7a-b638-3a7806c12ccf
  Kernel Version:              4.18.0-305.10.2.el8_4.s390x
  OS Image:                    Red Hat Enterprise Linux CoreOS 48.84.202107242219-0 (Ootpa)
  Operating System:            linux
  Architecture:                s390x
  Container Runtime Version:   cri-o://1.21.2-6.rhaos4.8.git54a5889.el8
  Kubelet Version:             v1.21.1+051ac4f
  Kube-Proxy Version:          v1.21.1+051ac4f
Non-terminated Pods:  (28 in total)
  Namespace                                 Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                 ----                                      ------------  ----------  ---------------  -------------  ---
  default                                   pod-test-rbd                              0 (0%)   0 (0%)  0 (0%)     0 (0%)  3d22h
  namespace-test-9e8497ecc4f74fe0a51114a39  pod-test-cephfs-3b2052c93fbb47cfa072ab34  0 (0%)   0 (0%)  0 (0%)     0 (0%)  13h
  namespace-test-9e8497ecc4f74fe0a51114a39  pod-test-cephfs-715aa69f608d49878928082e  0 (0%)   0 (0%)  0 (0%)     0 (0%)  13h
  namespace-test-9e8497ecc4f74fe0a51114a39  pod-test-cephfs-bb7d4801cdb34e2f88ec629b  0 (0%)   0 (0%)  0 (0%)     0 (0%)  13h
  openshift-cluster-node-tuning-operator    tuned-qjpz7                               10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  4d17h
  openshift-dns                             dns-default-qctj9                         60m (0%)  0 (0%)  110Mi (0%) 0 (0%)  4d17h
  openshift-dns                             node-resolver-p7vnh                       5m (0%)   0 (0%)  21Mi (0%)  0 (0%)  4d17h
  openshift-image-registry                  node-ca-pggrx                             10m (0%)  0 (0%)  10Mi (0%)  0 (0%)  4d17h
  openshift-ingress-canary                  ingress-canary-tzdx4                      10m (0%)  0 (0%)  20Mi (0%)  0 (0%)  4d17h
  openshift-kube-storage-version-migrator   migrator-5c458875b5-zhrst                 10m (0%)  0 (0%)  200Mi (0%) 0 (0%)  4d6h
  openshift-local-storage                   diskmaker-manager-n77rp                   0 (0%)    0 (0%)  0 (0%)     0 (0%)  4d17h
  openshift-machine-config-operator         machine-config-daemon-tlncf               40m (0%)  0 (0%)  100Mi (0%) 0 (0%)  4d17h
  openshift-marketplace                     certified-operators-qswdf                 10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  14h
  openshift-marketplace                     certified-operators-sltrb                 10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  13h
  openshift-marketplace                     community-operators-dltkw                 10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  3d16h
  openshift-marketplace                     community-operators-x524g                 10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  13h
  openshift-marketplace                     redhat-marketplace-tkp2s                  10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  13h
  openshift-marketplace                     redhat-marketplace-zwkqb                  10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  14h
  openshift-marketplace                     redhat-operators-tdlwx                    10m (0%)  0 (0%)  50Mi (0%)  0 (0%)  18h
  openshift-monitoring                      node-exporter-fx7hv                       9m (0%)   0 (0%)  47Mi (0%)  0 (0%)  4d17h
  openshift-multus                          multus-additional-cni-plugins-s5rxd       10m (0%)  0 (0%)  10Mi (0%)  0 (0%)  4d17h
  openshift-multus                          multus-phwrr                              10m (0%)  0 (0%)  65Mi (0%)  0 (0%)  4d17h
  openshift-multus                          network-metrics-daemon-khqrs              20m (0%)  0 (0%)  120Mi (0%) 0 (0%)  4d17h
  openshift-network-diagnostics             network-check-target-fbgnn                10m (0%)  0 (0%)  15Mi (0%)  0 (0%)  4d17h
  openshift-sdn                             sdn-vm8qv                                 110m (0%) 0 (0%)  220Mi (0%) 0 (0%)  4d17h
  openshift-storage                         csi-cephfsplugin-ks2g7                    0 (0%)    0 (0%)  0 (0%)     0 (0%)  4d17h
  openshift-storage                         csi-rbdplugin-v92v5                       0 (0%)    0 (0%)  0 (0%)     0 (0%)  4d17h
  openshift-storage                         must-gather-p484h-helper                  0 (0%)    0 (0%)  0 (0%)     0 (0%)  13h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                384m (2%)    0 (0%)
  memory             1338Mi (2%)  0 (0%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Mi      0 (0%)       0 (0%)
Events:              <none>

The issue mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1955042#c5 is different. It happened because the Ceph health was in a WARN state, and the reason for that is clearly a resource/environment issue: as can be seen in the logs pasted in comment 5, one of the worker nodes is also down, so this failure is expected.

For the original issue, we recently made a fix (https://github.com/ceph/ceph-csi/pull/2136) in this area (available in 4.9) which should avoid this problem. Closing the bug; please reopen if this is seen on the latest 4.9 builds. If this is frequent on 4.8 builds, we might have to backport https://github.com/ceph/ceph-csi/pull/2136.
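For context on the failing check: the affected tests delete pods and then wait, with a timeout, for the pods to disappear while a Ceph daemon is being killed; the reported bug is that this wait expires. Below is a minimal, hedged sketch of that wait pattern using the upstream kubernetes Python client rather than the actual ocs-ci helpers; the function name, namespace, pod name, and timeout value are illustrative assumptions, not taken from the test code.

```python
import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException


def wait_for_pod_deletion(namespace, pod_name, timeout=300, interval=5):
    """Delete a pod and poll until it is gone or the timeout expires.

    Illustrative only: ocs-ci uses its own OCP/Pod helpers; this sketch uses
    the plain kubernetes client to show the timeout behaviour described above.
    """
    config.load_kube_config()  # relies on the usual KUBECONFIG
    core_v1 = client.CoreV1Api()

    core_v1.delete_namespaced_pod(name=pod_name, namespace=namespace)

    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            core_v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        except ApiException as err:
            if err.status == 404:  # pod object is gone -> deletion finished
                return True
            raise
        time.sleep(interval)

    # This is the kind of point at which the tier4b tests report the failure:
    # the pod is still present (often stuck terminating) after the timeout.
    raise TimeoutError(
        f"Pod {namespace}/{pod_name} was not deleted within {timeout} seconds"
    )
```

If the CSI unmount/unmap on the node hangs, the pod can remain in Terminating state and a wait like this expires, which matches the reported symptom.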
Description of problem (please be as detailed as possible and provide log snippets):
The tests below from tier4b fail due to a timeout during pod deletion:
- tests/manage/pv_services/test_ceph_daemon_kill_during_resource_deletion.py::TestDaemonKillDuringPodPvcDeletion::test_ceph_daemon_kill_during_pod_pvc_deletion[CephFileSystem-delete_pvcs-mon]
- tests/manage/pv_services/test_ceph_daemon_kill_during_resource_deletion.py::TestDaemonKillDuringPodPvcDeletion::test_ceph_daemon_kill_during_pod_pvc_deletion[CephFileSystem-delete_pods-osd]
- tests/manage/pv_services/test_ceph_daemon_kill_during_resource_deletion.py::TestDaemonKillDuringPodPvcDeletion::test_ceph_daemon_kill_during_pod_pvc_deletion[CephFileSystem-delete_pvcs-mds]
- tests/manage/pv_services/test_ceph_daemon_kill_during_resource_deletion.py::TestDaemonKillDuringPodPvcDeletion::test_ceph_daemon_kill_during_pod_pvc_deletion[CephFileSystem-delete_pods-mds]

Version of all relevant components (if applicable):
OCP 4.7.3, LSO 4.7.0-202104142050.p0, OCS 4.7.0-364.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP + OCS + LSO
2. Execute the above tests using ocs-ci (see the example invocation sketched below)
3.

Actual results:
Tests fail due to a timeout during pod deletion.

Expected results:

Additional info:
- The same tests were passing with OCS 4.6.2 on OCP 4.6.
- Test logs and must-gather logs are available on Google Drive due to size restrictions: https://drive.google.com/file/d/1akBK9DdMgDhxjmDXFPGipUP7r7pg5Gep/view?usp=sharing
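For step 2 above, the failing cases can also be targeted individually instead of running the whole tier4b suite. The snippet below is a hedged sketch of a direct pytest invocation from an ocs-ci checkout; in practice ocs-ci drives this through its own entry point and cluster configuration options, which are omitted here and would need to be supplied.

```python
import pytest

# Illustrative reproduction helper: run one of the failing parametrized tests
# directly. The test ID is taken from the list above; everything else (bare
# pytest.main, no ocs-ci cluster options) is an assumption for brevity.
failing_tests = [
    "tests/manage/pv_services/test_ceph_daemon_kill_during_resource_deletion.py"
    "::TestDaemonKillDuringPodPvcDeletion"
    "::test_ceph_daemon_kill_during_pod_pvc_deletion[CephFileSystem-delete_pvcs-mon]",
]

if __name__ == "__main__":
    # -v for verbose output; the exit code is non-zero when the deletion timeout hits.
    raise SystemExit(pytest.main(["-v", *failing_tests]))
```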