Bug 1862523
Summary: | Implement migration of Manila operator from OLM to CSO during 4.5->4.6 upgrade | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jan Safranek <jsafrane> |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
Storage sub component: | Operators | QA Contact: | Qin Ping <piqin> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | aos-bugs |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Release Note | |
Doc Text: |
There already is a release note for migrating Manila driver from OLM to CVO.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:21:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Bug Depends On: | 1867152 | ||
Bug Blocks: |
Description
Jan Safranek
2020-07-31 15:52:33 UTC
Looking around the cluster, the CSI driver on node piqin-0807-zlcd6-worker-xtvlr is not working:

$ oc -n openshift-manila-csi-driver logs csi-nodeplugin-nfsplugin-2kj9l
I0807 06:07:09.462191 1 nfs.go:49] Driver: nfs.csi.k8s.io version: 2.0.0
I0807 06:07:09.462292 1 nfs.go:99] Enabling volume access mode: SINGLE_NODE_WRITER
I0807 06:07:09.462297 1 nfs.go:99] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0807 06:07:09.462300 1 nfs.go:99] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0807 06:07:09.462302 1 nfs.go:99] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0807 06:07:09.462305 1 nfs.go:99] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0807 06:07:09.462312 1 nfs.go:110] Enabling controller service capability: UNKNOWN
I0807 06:07:09.466682 1 server.go:92] Listening for connections on address: &net.UnixAddr{Name:"/plugin/csi.sock", Net:"unix"}
E0807 07:04:55.834619 1 utils.go:50] GRPC error: rpc error: code = Internal desc = stat /var/lib/kubelet/pods/6c9305ff-4f2d-4151-af88-a615317ed519/volumes/kubernetes.io~csi/pvc-0caf4758-7215-4893-8e8e-4554d562a4b9/mount: input/output error
E0807 07:04:55.835217 1 utils.go:50] GRPC error: rpc error: code = Internal desc = stat /var/lib/kubelet/pods/6c9305ff-4f2d-4151-af88-a615317ed519/volumes/kubernetes.io~csi/pvc-0caf4758-7215-4893-8e8e-4554d562a4b9/mount: input/output error
[the error is repeated forever]

$ oc -n openshift-manila-csi-driver logs openstack-manila-csi-nodeplugin-7q2dh csi-driver
I0807 06:07:28.361733 1 driver.go:124] Driver: manila.csi.openstack.org
I0807 06:07:28.361835 1 driver.go:125] Driver version: 0.9.0@
I0807 06:07:28.361839 1 driver.go:126] CSI spec version: 1.2.0
I0807 06:07:28.361843 1 driver.go:129] Operating on NFS shares
I0807 06:07:28.361848 1 driver.go:134] Topology awareness disabled
I0807 06:07:28.361858 1 driver.go:197] Enabling controller service capability: CREATE_DELETE_VOLUME
I0807 06:07:28.361861 1 driver.go:197] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I0807 06:07:28.361865 1 driver.go:216] Enabling volume access mode: MULTI_NODE_MULTI_WRITER
I0807 06:07:28.361868 1 driver.go:216] Enabling volume access mode: MULTI_NODE_SINGLE_WRITER
I0807 06:07:28.361871 1 driver.go:216] Enabling volume access mode: MULTI_NODE_READER_ONLY
I0807 06:07:28.361873 1 driver.go:216] Enabling volume access mode: SINGLE_NODE_WRITER
I0807 06:07:28.361875 1 driver.go:216] Enabling volume access mode: SINGLE_NODE_READER_ONLY
I0807 06:07:28.363694 1 connection.go:261] Probing CSI driver for readiness
I0807 06:07:28.366036 1 driver.go:262] proxying CSI driver nfs.csi.k8s.io version 2.0.0
I0807 06:07:28.366578 1 driver.go:227] Enabling node service capability: UNKNOWN
I0807 06:07:28.366926 1 driver.go:326] listening for connections on &net.UnixAddr{Name:"/var/lib/kubelet/plugins/manila.csi.openstack.org/csi.sock", Net:"unix"}
I0807 06:32:54.562756 1 builder.go:44] [ID:4] FWD GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E0807 06:32:54.563194 1 driver.go:313] [ID:18] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I0807 06:34:55.082383 1 builder.go:44] [ID:5] FWD GRPC error: rpc error: code = Canceled desc = context canceled
E0807 06:34:55.082837 1 driver.go:313] [ID:20] GRPC error: rpc error: code = Canceled desc = context canceled
[Again, the error is repeated forever]

What's interesting about this node is that it still reports Kubernetes 1.18 (= OCP 4.5) and not 4.6:

$ oc get node
piqin-0807-zlcd6-master-0 Ready master 7h11m v0.0.0-master+$Format:%h$
piqin-0807-zlcd6-master-1 Ready master 7h11m v0.0.0-master+$Format:%h$
piqin-0807-zlcd6-master-2 Ready master 7h11m v0.0.0-master+$Format:%h$
piqin-0807-zlcd6-worker-h4xs4 Ready worker 7h1m v0.0.0-master+$Format:%h$
piqin-0807-zlcd6-worker-kxhzx Ready worker 6h59m v0.0.0-master+$Format:%h$
piqin-0807-zlcd6-worker-xtvlr Ready,SchedulingDisabled worker 6h57m v1.18.3+08c38ef

Looks like `clientaddr=10.129.2.6` is the IP of the NFS plugin pod. During the upgrade the CSI Manila operator is deleted, and the NFS plugin pods are deleted too, so the umount hangs there.

It seems that the NFS driver tries to unmount the volume, but it times out. On the node, dmesg says:

[26637.477459] nfs: server 172.16.32.1 not responding, still trying
[26655.396484] nfs: server 172.16.32.1 not responding, timed out
[26667.172003] nfs: server 172.16.32.1 not responding, timed out
[26689.187229] nfs: server 172.16.32.1 not responding, timed out

It could be related to `clientaddr=10.129.2.6` used in the mount options of the 4.5 version of the driver: the driver pod with this IP address no longer exists.

I filed 1867152 for the unmount bug, because it also affects 4.5 without an upgrade to 4.6, and we may need to fix it there too.

Not sure what the right status of *this* bug is, though... Do you agree that the *operator* was correctly removed from OLM and adopted by CVO? The upgrade failed, but we track that in bug #1867152.

Verified with: 4.5.9 -> 4.6.0-0.nightly-2020-09-14-225526

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:4196
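
The comments above suspect that the hung unmount is tied to an NFS mount whose `clientaddr` option still points at a driver pod that has since been deleted. A minimal sketch of how that suspicion could be checked on a node follows. It assumes the mount table is available as `/proc/mounts`-style text and that the set of currently live pod IPs has been collected separately; the function names, the sample share path, and the sample pod IPs are hypothetical and do not come from the driver or from this cluster. It only flags candidate mounts and does not prove that a stale `clientaddr` causes the hang.

```python
import re


def nfs_client_addrs(mounts_text):
    """Return (mount_point, clientaddr) pairs for NFS entries.

    Expects /proc/mounts-style lines:
    device mount_point fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not nfs/nfs4.
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        # Match the clientaddr option only, not the plain addr option.
        match = re.search(r"(?:^|,)clientaddr=([^,]+)", fields[3])
        if match:
            result.append((fields[1], match.group(1)))
    return result


def stale_mounts(mounts_text, live_pod_ips):
    """Return NFS mounts whose clientaddr is not among the live pod IPs."""
    return [
        (mount_point, addr)
        for mount_point, addr in nfs_client_addrs(mounts_text)
        if addr not in live_pod_ips
    ]


if __name__ == "__main__":
    # Hypothetical sample entry: only the server address and the
    # clientaddr value are taken from the bug report.
    sample = (
        "172.16.32.1:/share_example /var/lib/kubelet/pods/example/mount "
        "nfs4 rw,relatime,vers=4.1,clientaddr=10.129.2.6,addr=172.16.32.1 0 0\n"
    )
    # Hypothetical set of IPs of the pods running after the upgrade.
    for mount_point, addr in stale_mounts(sample, {"10.129.2.7"}):
        print(f"{mount_point}: clientaddr {addr} has no live pod")
```

On a real node the first argument would be the contents of `/proc/mounts`, and the set of live IPs would have to be gathered from the cluster by whatever means is at hand.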