Bug 1973180
| Summary: | Mounting of one PVC failed when trying to add capacity to the cluster | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Itzhak <ikave> |
| Component: | ceph | Assignee: | Scott Ostapovicz <sostapov> |
| Status: | CLOSED WORKSFORME | QA Contact: | Raz Tamir <ratamir> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | bniver, madam, muagarwa, ocs-bugs, odf-bz-bot, sostapov |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-01 07:43:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Itzhak 2021-06-17 11:21:30 UTC
Additional info:

Link to the Jenkins job: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/1094/

Here is the pod output where we can see that mounting of the PVC failed: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j006vu1cs33s-uma/j006vu1cs33s-uma_20210611T143334/logs/failed_testcase_ocs_logs_1623785811/test_add_capacity_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-382e0c32f1b53ed43e2fbcfd0d9b20a0d77166a28f24c51369800c8b7961d6c4/namespaces/openshift-storage/oc_output/pods

Events:

    Type     Reason                    Age                   From                     Message
    ----     ------                    ----                  ----                     -------
    Warning  FailedScheduling          68m                   default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
    Warning  FailedScheduling          68m                   default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
    Normal   Scheduled                 67m                   default-scheduler        Successfully assigned openshift-storage/rook-ceph-osd-prepare-ocs-deviceset-2-data-1fv5rc-knt2f to compute-2
    Normal   SuccessfulAttachVolume    67m                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-f94e955f-1078-4426-8f51-b092bac794dc"
    Normal   SuccessfulMountVolume     67m                   kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-f94e955f-1078-4426-8f51-b092bac794dc" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/vsphere-volume/volumeDevices/[vsanDatastore] d3718f5f-4047-3574-7837-e4434bd7dee2/j006vu1cs33s-uma-td6x9-dynamic-pvc-f94e955f-1078-4426-8f51-b092bac794dc.vmdk"
    Normal   SuccessfulMountVolume     67m                   kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-f94e955f-1078-4426-8f51-b092bac794dc" volumeMapPath "/var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumeDevices/kubernetes.io~vsphere-volume"
    Warning  FailedMount               66m (x8 over 67m)     kubelet                  MountVolume.SetUp failed for volume "ocs-deviceset-2-data-1fv5rc-bridge" : mount failed: exit status 1
             Mounting command: systemd-run
             Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~empty-dir/ocs-deviceset-2-data-1fv5rc-bridge --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~empty-dir/ocs-deviceset-2-data-1fv5rc-bridge
             Output: Failed to start transient scope unit: Argument list too long
    Warning  FailedCreatePodContainer  62m (x25 over 67m)    kubelet                  unable to ensure pod container exists: failed to create container for [kubepods besteffort podadfe0df1-944f-4bf8-ba6c-9012ffe6141d] : Argument list too long
    Warning  FailedMount               51m                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[rook-ceph-osd-token-t4vf4 ocs-deviceset-2-data-1fv5rc-bridge], unattached volumes=[rook-binaries ceph-conf-emptydir rook-ceph-log rook-data rook-ceph-osd-token-t4vf4 ocs-deviceset-2-data-1fv5rc-bridge rook-ceph-crash devices udev ocs-deviceset-2-data-1fv5rc]: timed out waiting for the condition
    Warning  FailedMount               42m                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-2-data-1fv5rc-bridge rook-ceph-osd-token-t4vf4], unattached volumes=[ocs-deviceset-2-data-1fv5rc-bridge ceph-conf-emptydir rook-ceph-osd-token-t4vf4 rook-data rook-ceph-crash rook-binaries rook-ceph-log devices udev ocs-deviceset-2-data-1fv5rc]: timed out waiting for the condition
    Warning  FailedMount               6m47s (x36 over 67m)  kubelet                  MountVolume.SetUp failed for volume "rook-ceph-osd-token-t4vf4" : mount failed: exit status 1
             Mounting command: systemd-run
             Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~secret/rook-ceph-osd-token-t4vf4 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~secret/rook-ceph-osd-token-t4vf4
             Output: Failed to start transient scope unit: Argument list too long
    Warning  FailedMount               2m43s (x25 over 40m)  kubelet                  (combined from similar events): MountVolume.SetUp failed for volume "rook-ceph-osd-token-t4vf4" : mount failed: exit status 1
             Mounting command: systemd-run
             Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~secret/rook-ceph-osd-token-t4vf4 --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~secret/rook-ceph-osd-token-t4vf4
             Output: Failed to start transient scope unit: Argument list too long

From what I understand from Jilju Joy, this error also happened in the test tests/manage/pv_services/test_raw_block_pv.py::TestRawBlockPV::test_raw_block_pv[Retain].

When Petr retriggered the job, the add_capacity test passed, as you can see here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/1149/testReport/

Link to the relevant thread: https://mail.google.com/chat/u/0/#chat/space/AAAAREGEba8/5mVX223n6Vg

Happened once on a 4.6 setup; not a 4.8 blocker. Moving out. Can we close this bug?
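The repeated failure signature in the events is "Failed to start transient scope unit: Argument list too long", i.e. the error is reported by systemd while the kubelet is starting the transient scope for the tmpfs mount, not by the mount of the Ceph device itself. A minimal triage sketch along those lines is below; it assumes debug access to the affected node (compute-2, from the Scheduled event), and the idea that an accumulation of transient scope units on the node is the trigger is an assumption, not something the must-gather confirms.

```sh
# Hypothetical triage commands, not taken from the job logs. The node name (compute-2)
# and the volume path come from the pod events above; everything else is an assumption.

# Rough signal for whether the node's systemd has accumulated a large number of
# transient scope units (a guess at the trigger, not a confirmed root cause):
oc debug node/compute-2 -- chroot /host sh -c \
  'systemctl list-units --type=scope --all --no-legend | wc -l'

# The mount the kubelet attempted, reconstructed from the FailedMount event text
# (to be run on the node itself, as root):
systemd-run \
  --description="Kubernetes transient mount for /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~secret/rook-ceph-osd-token-t4vf4" \
  --scope -- \
  mount -t tmpfs tmpfs \
  /var/lib/kubelet/pods/adfe0df1-944f-4bf8-ba6c-9012ffe6141d/volumes/kubernetes.io~secret/rook-ceph-osd-token-t4vf4
```

If a rerun on the same node reproduced the error while other nodes did not, that would support a node-local systemd condition; the retriggered job passing is at least consistent with that, though it does not prove it.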