Bug 1596388 - Flexvolume migration when upgrading to 3.10 is broken
Summary: Flexvolume migration when upgrading to 3.10 is broken
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.z
Assignee: Hemant Kumar
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-28 19:22 UTC by Hemant Kumar
Modified: 2018-08-31 06:18 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-31 06:18:10 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: openshift/openshift-ansible pull 8964 (last updated 2018-06-29 13:23:51 UTC)
- Red Hat Product Errata: RHBA-2018:2376 (last updated 2018-08-31 06:18:51 UTC)

Description Hemant Kumar 2018-06-28 19:22:07 UTC
When a user running OpenShift on non-containerized hosts upgrades to 3.10, the old plugin paths described in the docs - https://docs.openshift.com/container-platform/3.9/install_config/persistent_storage/persistent_storage_flex_volume.html - will no longer be available inside the controller-manager, because the controller-manager now runs as a static pod.

I have a PR, https://github.com/openshift/openshift-ansible/pull/8964, that makes sure "/etc/origin/kubelet-plugins" is bind-mounted inside the controller-manager pod. This ensures the flexvolume plugins are available inside the controller-manager pod.
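
For illustration, a minimal sketch of the kind of hostPath volume and volumeMount such a change would add to the controller-manager static pod spec; the names and indentation here are assumptions, not the literal diff:

```
    volumeMounts:
    - mountPath: /etc/origin/kubelet-plugins
      # HostToContainer propagation lets driver directories added on the
      # host after pod start become visible inside the pod
      mountPropagation: HostToContainer
      name: kubelet-plugins
    volumes:
    - hostPath:
        path: /etc/origin/kubelet-plugins
        type: ""
      name: kubelet-plugins
```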

But there is still the question of providing a migration path to users, because "/usr/libexec/xxxx" was the old path and the new path is "/etc/origin/kubelet-plugins".
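
As a sketch of a possible manual migration step (the vendor~driver directory name is a placeholder; by flexvolume convention, drivers live under a volume/exec subdirectory of the plugin path):

```
# Assumption: drivers are laid out as <plugin-dir>/volume/exec/<vendor>~<driver>/
mkdir -p /etc/origin/kubelet-plugins/volume/exec
cp -a /usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor>~<driver> \
      /etc/origin/kubelet-plugins/volume/exec/
```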

Comment 1 Hemant Kumar 2018-06-29 13:48:11 UTC
Ansible fix https://github.com/openshift/openshift-ansible/pull/8964 has been merged and will be available.

Comment 3 Jianwei Hou 2018-07-02 11:07:05 UTC
Verified the fix in openshift-ansible using non-containerized hosts.

Installer:
After installation, the master-controllers pod has a hostPath volume that mounts '/usr/libexec/kubernetes/kubelet-plugins' from the host to '/usr/libexec/kubernetes/kubelet-plugins' in the pod:

```
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins
      mountPropagation: HostToContainer
      name: kubelet-plugins
...
  - hostPath:
      path: /usr/libexec/kubernetes/kubelet-plugins
      type: ""
    name: kubelet-plugins
```

A flexvolume driver installed at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ works.
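
For reference, a hypothetical smoke test of an installed driver (<vendor>~<driver> is a placeholder; per the flexvolume calling convention, a healthy driver answers the init call with a Success status):

```
cd /usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor>~<driver>
./<driver> init
# expected output, roughly: {"status": "Success", "capabilities": {"attach": false}}
```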

Upgrade: after upgrading from 3.9 to 3.10, the master-controllers pod is deployed as desired and flex volume is functional.

I'll test again using a system container before moving this to verified.

Comment 4 Bradley Childs 2018-07-05 15:26:41 UTC
Also this: https://github.com/kubernetes/kubernetes/pull/65549

Comment 5 Jianwei Hou 2018-07-06 09:07:49 UTC
(In reply to Bradley Childs from comment #4)
> Also this: https://github.com/kubernetes/kubernetes/pull/65549

I tested this fix on Kubernetes; flexvolume works with a containerized kubelet.

Comment 6 Jianwei Hou 2018-07-09 06:05:05 UTC
The upgrade from 3.9 to 3.10 with the system container on Atomic Host is not successful. The node service could not start:

Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service holdoff time over, scheduling restart.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Starting atomic-openshift-node.service...
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 atomic-openshift-node[101317]: container_linux.go:348: starting container process caused "process_linux.go:399: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/origin/kubelet-plugins\\\" to rootfs \\\"/var/lib/containers/atomic/atomic-openshift-node.0/rootfs\\\" at \\\"/etc/origin/kubelet-plugins\\\" caused \\\"stat /etc/origin/kubelet-plugins: no such file or directory\\\"\""
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 atomic-openshift-node[101336]: container "atomic-openshift-node" does not exist
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service: control process exited, code=exited status=1
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Failed to start atomic-openshift-node.service.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service failed.

Comment 7 Jianwei Hou 2018-07-09 07:35:34 UTC
The upgrade was tested with OpenShift v3.9.31 on Atomic Host 7.5, upgrading to v3.10.15.

Comment 8 Hemant Kumar 2018-07-09 14:11:15 UTC
Hmm, I am not super familiar with how Ansible handles upgrades, but if you had the latest build of openshift-ansible for 3.9, then you should have https://github.com/openshift/openshift-ansible/pull/8773, which already creates the /etc/origin/kubelet-plugins directory on the node.

And then when we upgrade to 3.10, the same directory is created by the Ansible script again. Can you confirm whether the "/etc/origin/kubelet-plugins" directory exists on the node?
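
For context, a minimal sketch of the kind of Ansible task that would create the directory, assuming the shape of the fix in PR 8773 (the task name and mode are assumptions, not the literal change):

```
# Hypothetical task; PR 8773 is the authoritative change
- name: Ensure flexvolume plugin directory exists
  file:
    path: /etc/origin/kubelet-plugins
    state: directory
    mode: '0755'
```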

Comment 9 Hemant Kumar 2018-07-09 14:12:10 UTC
Isn't it a prerequisite to have the latest z-stream version of all packages before the upgrade to the next major version happens?

Comment 10 liujia 2018-07-10 01:58:26 UTC
This blocks the system container upgrade test.

Comment 11 Jianwei Hou 2018-07-10 13:59:08 UTC
I think the upgrade should go through the latest z-stream before upgrading to the next major version. There is no problem when upgrading with this strategy.

I've tried it twice today using our Jenkins job and both runs were successful. PR https://github.com/kubernetes/kubernetes/pull/65549 will be available in the next build; I'll test it tomorrow. Liujia found the upgrade issue, and I'll sync with her to better confirm the Ansible fix.

Comment 13 liujia 2018-07-11 04:15:55 UTC
Confirmed:
1) Regarding comment 10: the upgrade path v3.9.31 to v3.10.15 with installer 3.10.15 will hit the upgrade failure.

2) After communicating with jhou/xiaoli: this will be fixed in v3.9.33 (a fresh install will create the missing directory), so the upgrade path v3.9.33 to v3.10.15 or later, with installer 3.10.15 or later, should not hit the upgrade issue from 1). (jhou verified this in comment 11.)

3) For the other upgrade path, v3.9.31 to the latest z-stream v3.9.33 with installer 3.9.33, I tried it today; the upgrade works well, with the required directory created during the upgrade.
before upgrade:
[root@qe-jliu-c39-master-etcd-1 ~]# ls /etc/origin/
ansible-service-broker/ generated-configs/      master/                 sdn/                    
cloudprovider/          hosted/                 node/                   service-catalog/        
examples/               kubeconfig              openvswitch/            

after upgrade:
[root@qe-jliu-c39-master-etcd-1 ~]# ls -la /etc/origin/kubelet-plugins/
total 0
drwxr-xr-x.  2 root root   6 Jul 11 03:46 .
drwx------. 13 root root 232 Jul 11 03:46 ..

Based on the above, the path v3.9.31 -> v3.9.33 -> v3.10+ works well, so I am removing testblocker.

Comment 17 Jianwei Hou 2018-08-22 02:28:30 UTC
Verified this according to previous comments.

Comment 19 errata-xmlrpc 2018-08-31 06:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2376

