Bug 1596388 - Flexvolume migration when upgrading to 3.10 is broken
Summary: Flexvolume migration when upgrading to 3.10 is broken
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Hemant Kumar
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-28 19:22 UTC by Hemant Kumar
Modified: 2018-08-31 06:18 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-31 06:18:10 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: openshift/openshift-ansible pull 8964 (last updated 2018-06-29 13:23:51 UTC)
- Red Hat Product Errata: RHBA-2018:2376 (last updated 2018-08-31 06:18:51 UTC)

Description Hemant Kumar 2018-06-28 19:22:07 UTC
When a user running OpenShift on non-containerized hosts upgrades to 3.10, the old plugin paths described in the docs - https://docs.openshift.com/container-platform/3.9/install_config/persistent_storage/persistent_storage_flex_volume.html - will no longer be available inside the controller-manager, because the controller-manager now runs as a static pod.

I have a PR, https://github.com/openshift/openshift-ansible/pull/8964, that makes sure "/etc/origin/kubelet-plugins" is bind-mounted inside the controller-manager pod. This ensures the flexvolume plugins are available inside the controller-manager pod.
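
For illustration, a minimal sketch of the kind of hostPath volume and volumeMount such a change would add to the controller-manager static pod spec; the names and indentation here are assumptions, not the literal diff:

```
    volumeMounts:
    - mountPath: /etc/origin/kubelet-plugins
      # HostToContainer propagation lets driver directories added on the
      # host after pod start become visible inside the pod
      mountPropagation: HostToContainer
      name: kubelet-plugins
    volumes:
    - hostPath:
        path: /etc/origin/kubelet-plugins
        type: ""
      name: kubelet-plugins
```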

But there is still the question of providing a migration path to users, because "/usr/libexec/xxxx" was the old path and the new path is "/etc/origin/kubelet-plugins".
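
As a sketch of a possible manual migration step (the vendor~driver directory name is a placeholder; by flexvolume convention, drivers live under a volume/exec subdirectory of the plugin path):

```
# Assumption: drivers are laid out as <plugin-dir>/volume/exec/<vendor>~<driver>/
mkdir -p /etc/origin/kubelet-plugins/volume/exec
cp -a /usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor>~<driver> \
      /etc/origin/kubelet-plugins/volume/exec/
```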

Comment 1 Hemant Kumar 2018-06-29 13:48:11 UTC
Ansible fix https://github.com/openshift/openshift-ansible/pull/8964 has been merged and will be available.

Comment 3 Jianwei Hou 2018-07-02 11:07:05 UTC
Verified the fix in openshift-ansible using non-containerized hosts.

Installer:
After installation, the master-controllers pod has a hostPath volume that mounts '/usr/libexec/kubernetes/kubelet-plugins' from the host to '/usr/libexec/kubernetes/kubelet-plugins' in the pod:

```
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins
      mountPropagation: HostToContainer
      name: kubelet-plugins
...
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins
      type: ""
    name: kubelet-plugins
```

A flexvolume driver installed at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ works.
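
For reference, a hypothetical smoke test of an installed driver (<vendor>~<driver> is a placeholder; per the flexvolume calling convention, a healthy driver answers the init call with a Success status):

```
cd /usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor>~<driver>
./<driver> init
# expected output, roughly: {"status": "Success", "capabilities": {"attach": false}}
```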

Upgrade: after upgrading from 3.9 to 3.10, the master-controllers pod is deployed as desired and flex volume is functional.

I'll test again using a system container before moving this to verified.

Comment 4 Bradley Childs 2018-07-05 15:26:41 UTC
Also this: https://github.com/kubernetes/kubernetes/pull/65549

Comment 5 Jianwei Hou 2018-07-06 09:07:49 UTC
(In reply to Bradley Childs from comment #4)
> Also this: https://github.com/kubernetes/kubernetes/pull/65549

I tested this fix on Kubernetes; flexvolume works with a containerized kubelet.

Comment 6 Jianwei Hou 2018-07-09 06:05:05 UTC
The upgrade from 3.9 to 3.10 with the system container on Atomic Host is not successful. The node service could not start:

Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service holdoff time over, scheduling restart.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Starting atomic-openshift-node.service...
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 atomic-openshift-node[101317]: container_linux.go:348: starting container process caused "process_linux.go:399: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/origin/kubelet-plugins\\\" to rootfs \\\"/var/lib/containers/atomic/atomic-openshift-node.0/rootfs\\\" at \\\"/etc/origin/kubelet-plugins\\\" caused \\\"stat /etc/origin/kubelet-plugins: no such file or directory\\\"\""
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 atomic-openshift-node[101336]: container "atomic-openshift-node" does not exist
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service: control process exited, code=exited status=1
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Failed to start atomic-openshift-node.service.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service failed.

Comment 7 Jianwei Hou 2018-07-09 07:35:34 UTC
The upgrade was tested with OpenShift v3.9.31 on Atomic Host 7.5, upgrading to v3.10.15.

Comment 8 Hemant Kumar 2018-07-09 14:11:15 UTC
Hmm, I am not super familiar with how Ansible handles upgrades, but if you had the latest build of openshift-ansible for 3.9, then you should have https://github.com/openshift/openshift-ansible/pull/8773, which already creates the /etc/origin/kubelet-plugins directory on the node.

And then when we upgrade to 3.10, the same directory is created by the Ansible script again. Can you confirm whether the "/etc/origin/kubelet-plugins" directory exists on the node?
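
For context, a minimal sketch of the kind of Ansible task that would create the directory, assuming the shape of the fix in PR 8773 (the task name and mode are assumptions, not the literal change):

```
# Hypothetical task; PR 8773 is the authoritative change
- name: Ensure flexvolume plugin directory exists
  file:
    path: /etc/origin/kubelet-plugins
    state: directory
    mode: '0755'
```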

Comment 9 Hemant Kumar 2018-07-09 14:12:10 UTC
Isn't it a prerequisite to have the latest z-stream version of all packages before the upgrade to the next major version happens?

Comment 10 liujia 2018-07-10 01:58:26 UTC
This blocks the system container upgrade test.

Comment 11 Jianwei Hou 2018-07-10 13:59:08 UTC
I think the upgrade should go through the latest z-stream before upgrading to the next major version. There is no problem when upgrading with this strategy.

I've tried it twice today using our Jenkins job and both runs were successful. PR https://github.com/kubernetes/kubernetes/pull/65549 will be available in the next build; I'll test it tomorrow. Liujia found the upgrade issue, and I'll sync with her to better confirm the Ansible fix.

Comment 13 liujia 2018-07-11 04:15:55 UTC
Confirmed:
1) Regarding comment 10: the upgrade path v3.9.31 to v3.10.15 with installer 3.10.15 will hit the upgrade failure.

2) After communicating with jhou/xiaoli: this will be fixed in v3.9.33 (a fresh install will create the missing directory), so the upgrade path v3.9.33 to v3.10.15 or later, with installer 3.10.15 or later, should not hit the upgrade issue from 1). (jhou verified this in comment 11.)

3) For the other upgrade path, v3.9.31 to the latest z-stream v3.9.33 with installer 3.9.33, I tried it today; the upgrade works well, with the required directory created during the upgrade.
before upgrade:
[root@qe-jliu-c39-master-etcd-1 ~]# ls /etc/origin/
ansible-service-broker/ generated-configs/      master/                 sdn/                    
cloudprovider/          hosted/                 node/                   service-catalog/        
examples/               kubeconfig              openvswitch/            

after upgrade:
[root@qe-jliu-c39-master-etcd-1 ~]# ls -la /etc/origin/kubelet-plugins/
total 0
drwxr-xr-x.  2 root root   6 Jul 11 03:46 .
drwx------. 13 root root 232 Jul 11 03:46 ..

Based on the above, the path v3.9.31 -> v3.9.33 -> v3.10+ works well, so I am removing testblocker.

Comment 17 Jianwei Hou 2018-08-22 02:28:30 UTC
Verified this according to previous comments.

Comment 19 errata-xmlrpc 2018-08-31 06:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2376

