Bug 1811537
| Summary: | pv attach failure on the openstack environment for ppc64le platform | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | mkumatag |
| Component: | Multi-Arch | Assignee: | Prashanth Sundararaman <psundara> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Barry Donahue <bdonahue> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.3.z | CC: | aos-bugs, bbennett, bbreard, danili, dustymabe, imcleod, jligon, jsafrane, lmcfadde, manokuma, miabbott, nstielau, psundara, smilner |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | ppc64le | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-01-04 15:13:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (mkumatag, 2020-03-09 07:26:27 UTC)
logs - http://file.rdu.redhat.com/~mkumatag/logs/BZ1811537/must-gather.local.7959666193381648547.tar.gz [got from the reporter on Slack]

$ lsblk on the node:
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                          252:0    0   30G  0 disk
|-vda1                       252:1    0  384M  0 part /boot
|-vda2                       252:2    0    4M  0 part
`-vda4                       252:4    0 29.6G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 29.6G  0 dm   /sysroot

$ ls -la /dev/disk/by-id
total 0
drwxr-xr-x. 2 root root  60 Mar  6 22:28 .
drwxr-xr-x. 8 root root 160 Mar  6 22:28 ..
lrwxrwxrwx. 1 root root  10 Mar  9 06:58 dm-name-coreos-luks-root-nocrypt -> ../../dm-0

There is no Cinder volume attached to the node, or at least the node does not know about it: its device is missing. OpenStack itself thinks the volume is attached:

(overcloud) [root@director-ci07 ~]# openstack volume list
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| ID                                   | Name                                                        | Status | Size | Attached to                                     |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| 86003647-ced9-4874-9211-e812c2ef3384 | test-znnmj-dynamic-pvc-6d354e6a-4795-47fe-896d-d72a595e87a3 | in-use |    1 | Attached to test-znnmj-worker-wtktq on /dev/vdb |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+

But there is no /dev/vdb on the node.

One less headache: we have seen that the volume is getting attached to the KVM VM (worker) properly, but it is not discovered inside the worker VM.

(overcloud) [root@director-ci07 ~]# openstack volume list
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| ID                                   | Name                                                        | Status | Size | Attached to                                     |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
| 2e8d1e4f-3277-4038-ad70-bcfcf934574b | test-znnmj-dynamic-pvc-91efefcf-e02b-4205-b850-6155c42ed5bc | in-use |    1 | Attached to test-znnmj-worker-wtktq on /dev/vdb |
+--------------------------------------+-------------------------------------------------------------+--------+------+-------------------------------------------------+
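As an aside, a quick way to cross-check what Cinder reports against what the guest actually sees is to look for the volume ID among the virtio serial links under /dev/disk/by-id. The sketch below is hedged: the volume ID is taken from this bug, and the `virtio-<serial>` link naming (with the serial truncated to 20 characters) is the usual udev convention for virtio-blk devices, but it may differ depending on the udev rules shipped in the image. The domain XML captured from the compute host below confirms libvirt did attach the disk on its side.

```bash
# Cross-check: Cinder's view vs. the guest's view of the same volume.
VOLUME_ID=86003647-ced9-4874-9211-e812c2ef3384   # volume from this bug; substitute your own

# What OpenStack thinks (status and attachment record):
openstack volume show "$VOLUME_ID" -c status -c attachments

# What the guest actually sees (run on the worker node).
# Nova sets the virtio-blk serial to the volume UUID, so a matching
# /dev/disk/by-id/virtio-<first 20 chars of the UUID> link should exist.
ls -l /dev/disk/by-id/ | grep "${VOLUME_ID:0:20}" || echo "volume not visible in the guest"
lsblk
```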
[root@overcloud-novacomputeppc64le-2 ~]# virsh dumpxml 9
<domain type='kvm' id='9'>
  <name>instance-00000031</name>
  <uuid>ac9ebeda-e4bd-4af7-8a89-7fabcd400389</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="19.0.3-0.20190814170534.a8e19af.el8ost"/>
      <nova:name>test-znnmj-worker-wtktq</nova:name>
      <nova:creationTime>2020-03-06 22:22:50</nova:creationTime>
      <nova:flavor name="m1.medium">
        <nova:memory>16384</nova:memory>
        <nova:disk>30</nova:disk>
        <nova:swap>0</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>4</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="a2f24e0d09d443c7b17152ab9f724a25">admin</nova:user>
        <nova:project uuid="b94e5d3223794b4e946e2f910a656930">admin</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="b1f6daf6-3fce-47a3-9208-7f5f3553b3bf"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <shares>4096</shares>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64le' machine='pseries-rhel7.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='forbid'/>
    <topology sockets='4' cores='1' threads='1'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/ac9ebeda-e4bd-4af7-8a89-7fabcd400389/disk'/>
      <backingStore type='file' index='1'>
        <format type='raw'/>
        <source file='/var/lib/nova/instances/_base/9af59a19bbce080fb61dd6ae35b6d124e309fb59'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/sdb'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>2e8d1e4f-3277-4038-ad70-bcfcf934574b</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:49:3d:20'/>
      <source bridge='qbr6daf1aa4-11'/>
      <target dev='tap6daf1aa4-11'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='512'/>
      <mtu size='1450'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <log file='/var/lib/nova/instances/ac9ebeda-e4bd-4af7-8a89-7fabcd400389/console.log' append='off'/>
      <target type='spapr-vio-serial' port='0'>
        <model name='spapr-vty'/>
      </target>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <log file='/var/lib/nova/instances/ac9ebeda-e4bd-4af7-8a89-7fabcd400389/console.log' append='off'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30000000'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='keyboard' bus='usb'>
      <alias name='input1'/>
      <address type='usb' bus='0' port='2'/>
    </input>
    <input type='mouse' bus='usb'>
      <alias name='input2'/>
      <address type='usb' bus='0' port='3'/>
    </input>
    <graphics type='vnc' port='5901' autoport='yes' listen='192.168.85.53'>
      <listen type='address' address='192.168.85.53'/>
    </graphics>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
    <panic model='pseries'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

[root@overcloud-novacomputeppc64le-2 ~]# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  1.8T  0 disk
├─sda1   8:1    0    8M  0 part
├─sda2   8:2    0  1.8T  0 part /
└─sda3   8:3    0   64M  0 part
sdb      8:16   0    1G  0 disk

We need to debug why the disks are not getting detected in the guest dynamically!
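A minimal triage sequence for this situation might look like the following sketch. The host-side check is grounded in the domain XML above (domain instance-00000031, target vdb backed by /dev/sdb); the in-guest checks reflect the fact that PCI hotplug on pseries guests is driven by RTAS events handled by rtas_errd/drmgr, which turned out to be missing from the image.

```bash
# 1) On the compute host: confirm libvirt really attached the disk to the domain.
virsh domblklist instance-00000031      # should list vdb -> /dev/sdb

# 2) In the guest: check whether the RTAS hotplug machinery is present at all.
rpm -q librtas powerpc-utils ppc64-diag # missing on the affected RHCOS image
systemctl status rtas_errd              # daemon that reacts to RTAS hotplug events
dmesg | grep -iE 'rtas|virtio-pci'      # no hotplug messages appear when it is missing
```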
Got the information from the libvirt expert that we need the following packages and service to be running to detect disk hotplugs:

Packages: librtas, powerpc-utils, ppc64-diag
Service: rtas_errd

Based on that request, @psundara built a new CoreOS image which includes the above packages and service. Unfortunately, I don't have a complete OCP setup to test with, so instead I mimicked the environment by booting the CoreOS image in one of the OpenStack environments, and I was able to see the disks getting attached at runtime.

Kernel messages for the hotplug:

[Wed Mar 11 07:07:59 2020] RTAS: event: 1, Type: Unknown, Severity: 1
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: [1af4:1001] type 00 class 0x010000
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: reg 0x10: [io 0x10000-0x1003f]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: reg 0x14: [mem 0x00000000-0x00000fff]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: No hypervisor support for SR-IOV on this device, IOV BARs disabled.
[Wed Mar 11 07:08:00 2020] iommu: Adding device 0000:00:06.0 to group 0
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: BAR 4: assigned [mem 0x210000010000-0x210000013fff 64bit pref]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: BAR 1: assigned [mem 0x200081000000-0x200081000fff]
[Wed Mar 11 07:08:00 2020] pci 0000:00:06.0: BAR 0: assigned [io 0x100c0-0x100ff]
[Wed Mar 11 07:08:00 2020] virtio-pci 0000:00:06.0: enabling device (0000 -> 0003)
[Wed Mar 11 07:08:00 2020] virtio-pci 0000:00:06.0: Using 64-bit direct DMA at offset 800000000000000
[Wed Mar 11 07:08:00 2020] virtio_blk virtio3: [vdb] 2097152 512-byte logical blocks (1.07 GB/1.00 GiB)
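As an aside, when repeating this attach test one can watch the hotplug land in real time from inside the guest with the sketch below; the server and volume names are placeholders and the attach is triggered from the OpenStack side.

```bash
# Run the monitors in the guest, then attach the volume from OpenStack.
journalctl -kf &                                    # kernel log: RTAS / virtio-pci messages
udevadm monitor --kernel --subsystem-match=block &  # block device add/remove events

# From a host with OpenStack CLI credentials (placeholder names):
openstack server add volume <server-name> <volume-name-or-id>
```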
lspci:

[root@mkumatag-new-coreos-test core]# lspci
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:04.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device   <================ New interface

lsblk:

[root@mkumatag-new-coreos-test core]# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda                          252:0    0   160G  0 disk
|-vda1                       252:1    0     4M  0 part
|-vda2                       252:2    0   384M  0 part /boot
`-vda4                       252:4    0 159.6G  0 part
  `-coreos-luks-root-nocrypt 253:0    0 159.6G  0 dm   /sysroot
vdb                          252:16   0     1G  0 disk   <================ New disk attached via cinder

[root@mkumatag-new-coreos-test core]# rpm -qa | grep diag
ppc64-diag-2.7.5-2.el8.ppc64le
[root@mkumatag-new-coreos-test core]# rpm -qa | grep powerpc-utils
powerpc-utils-1.3.6-4.el8.ppc64le
powerpc-utils-core-1.3.6-4.el8.ppc64le
[root@mkumatag-new-coreos-test core]# systemctl status rtas_errd
● rtas_errd.service - ppc64-diag rtas_errd (platform error handling) Service
   Loaded: loaded (/usr/lib/systemd/system/rtas_errd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-03-11 07:05:50 UTC; 21min ago
  Process: 6760 ExecStart=/usr/sbin/rtas_errd (code=exited, status=0/SUCCESS)
 Main PID: 6768 (rtas_errd)
    Tasks: 1 (limit: 26213)
   Memory: 15.3M
   CGroup: /system.slice/rtas_errd.service
           └─6768 /usr/sbin/rtas_errd

Mar 11 07:05:50 mkumatag-new-coreos-test.openstacklocal systemd[1]: Starting ppc64-diag rtas_errd (platform error h>
Mar 11 07:05:50 mkumatag-new-coreos-test.openstacklocal systemd[1]: Started ppc64-diag rtas_errd (platform error ha>
Mar 11 07:08:00 mkumatag-new-coreos-test drmgr[6998]: drmgr: -c pci -a -s 0x40000030 -n -d4 -V

I'm moving this issue to CoreOS to get these packages and this service included in the CoreOS 4.3 builds.

Chatted with Dennis on this bug. I don't believe that OSP on OCP is supported at the moment; is this bug something that an IBM engineer could investigate? If not, I would like to close this bug and track it via BZ 1659152.

(In reply to Dan Li from comment #13)
> Chatted with Dennis on this bug, I don't believe that OSP on OCP is
> supported at the moment, is this bug something that an IBM engineer could
> investigate into? If not, I would like to close this bug and track it via BZ
> 1659152.

@manokuma @lmcfadde ^^^

Adding "UpcomingSprint" as this bug is unlikely to be fixed before the end of the sprint on August 1st. The (in/out of) scope status is still being discussed.

So the history is that we did have a fix for this: https://github.com/coreos/fedora-coreos-config/pull/296. The CoreOS team expressed concerns over adding these packages, as they pull in undesired dependencies like Perl, so https://bugzilla.redhat.com/show_bug.cgi?id=1814335 was worked on to exclude Perl from these packages. Unfortunately, the packages are only available in 8.3, so we will have to wait for RHCOS to move to 8.3 for this test to work in CI. Meanwhile, I think Manju has already disabled these tests in CI.

Hi Prashanth, will this bug be looked at before the end of this sprint this week (I assume probably not, based on your Comment 16)? If not, I would like to add the "UpcomingSprint" tag.

Hi @Prashanth and @Manju, since this bug was reported on 4.3.z and 4.3 will go end of maintenance support after 4.6 releases next week, do you still see this bug as needed? If not, we should close this bug. If needed, we should re-target it to a later release.

Adding UpcomingSprint as this bug is unlikely to be resolved before the end of the current sprint.

The latest 4.7 builds have the powerpc-utils-core and ppc64-diag packages, and rtas_errd is enabled. Manju, could you test it and let me know if it works now?
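A quick way to confirm the prerequisites actually landed on a running 4.7 cluster before re-testing the attach is sketched below; the package names and service come from the verification output in this bug, while the node label selector and the use of `oc debug` are assumptions about the test cluster.

```bash
# Check every ppc64le node for the hotplug prerequisites shipped in the 4.7 RHCOS builds.
for node in $(oc get nodes -l kubernetes.io/arch=ppc64le -o name); do
  echo "== $node =="
  oc debug "$node" -- chroot /host sh -c \
    'rpm -q powerpc-utils-core ppc64-diag-rtas; systemctl is-active rtas_errd'
done
```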
Thanks Prashanth.

I didn't have an entire OCP-with-OpenStack setup to verify the bug, but I was able to verify it with a minimal single-node VM in an OpenStack environment. Here are the details.

Now I'm able to see the new packages installed and the service running, as below:

[root@mkumatag-rhcos-47 ~]# rpm -qa | grep power
powerpc-utils-core-1.3.6-11.el8.ppc64le
[root@mkumatag-rhcos-47 ~]# rpm -qa | grep diag
ppc64-diag-rtas-2.7.6-2.el8.ppc64le
[root@mkumatag-rhcos-47 ~]# systemctl status rtas_errd
● rtas_errd.service - ppc64-diag rtas_errd (platform error handling) Service
   Loaded: loaded (/usr/lib/systemd/system/rtas_errd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-12-24 00:52:20 UTC; 1min 13s ago
  Process: 1359 ExecStart=/usr/sbin/rtas_errd (code=exited, status=0/SUCCESS)
 Main PID: 1374 (rtas_errd)
    Tasks: 1 (limit: 103532)
   Memory: 9.1M
   CGroup: /system.slice/rtas_errd.service
           └─1374 /usr/sbin/rtas_errd

Dec 24 00:52:20 mkumatag-rhcos-47 systemd[1]: Starting ppc64-diag rtas_errd (platform error handling) Service...
Dec 24 00:52:20 mkumatag-rhcos-47 systemd[1]: Started ppc64-diag rtas_errd (platform error handling) Service.

Before volume attach:

[root@mkumatag-rhcos-47 ~]# lspci
00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:02.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:04.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 VGA compatible controller: Device 1234:1111 (rev 02)
[root@mkumatag-rhcos-47 ~]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda    252:0    0   160G  0 disk
|-vda1 252:1    0     4M  0 part
|-vda3 252:3    0   384M  0 part /boot
`-vda4 252:4    0 159.6G  0 part /sysroot

After volume attach:

[root@mkumatag-rhcos-47 ~]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
vda    252:0    0   160G  0 disk
|-vda1 252:1    0     4M  0 part
|-vda3 252:3    0   384M  0 part /boot
`-vda4 252:4    0 159.6G  0 part /sysroot
vdb    252:16   0     3G  0 disk   <==================== NEW volume got attached here
[root@mkumatag-rhcos-47 ~]# ls /dev/vdb
/dev/vdb

RHCOS version:

[root@mkumatag-rhcos-47 ~]# uname -a
Linux mkumatag-rhcos-47 4.18.0-240.8.1.el8_3.ppc64le #1 SMP Fri Dec 4 12:02:23 EST 2020 ppc64le ppc64le ppc64le GNU/Linux
[root@mkumatag-rhcos-47 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="47.83.202012231312-0"
VERSION_ID="4.7"
OPENSHIFT_VERSION="4.7"
RHEL_VERSION="8.3"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 47.83.202012231312-0 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.7"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.7"
OSTREE_VERSION='47.83.202012231312-0'

So this issue has been fixed and everything works properly without any issues.

Per Manju's Comment 30, I'm closing this bug as "Currentrelease". Please re-open if needed.
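For anyone re-verifying this later on a full OCP-on-OpenStack cluster rather than a standalone VM, a minimal PVC attach check might look like the sketch below. The storage class name, container image, and object names are assumptions; the idea is simply to provision a Cinder-backed PVC, mount it in a pod, and confirm the new block device shows up on the ppc64le worker.

```bash
# Create a 1Gi PVC and a pod that mounts it (placeholder storage class "standard").
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cinder-attach-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard
---
apiVersion: v1
kind: Pod
metadata:
  name: cinder-attach-test
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: cinder-attach-test
EOF

# Wait for the pod, confirm the claim bound, then check the node sees the new vdX device.
oc wait pod/cinder-attach-test --for=condition=Ready --timeout=5m
oc get pvc cinder-attach-test
oc debug node/$(oc get pod cinder-attach-test -o jsonpath='{.spec.nodeName}') \
  -- chroot /host lsblk
```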