When a user running OpenShift on non-containerized hosts upgrades to 3.10, the plugin paths given in the old docs - https://docs.openshift.com/container-platform/3.9/install_config/persistent_storage/persistent_storage_flex_volume.html - will no longer be available inside the controller-manager, because the controller-manager now runs as a static pod. I have a PR - https://github.com/openshift/openshift-ansible/pull/8964 - that makes sure "/etc/origin/kubelet-plugins" is bind mounted inside the controller-manager pod, which ensures the flexvolume plugins are available there. But there is still the question of providing a migration path to users, because the old path was "/usr/libexec/xxxx" and the new path is "/etc/origin/kubelet-plugins".
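For reference, the kind of bind mount that PR adds to the controller-manager static pod spec would look roughly like the fragment below. This is only a sketch, not the literal content of the merged PR; the volume name and the propagation mode are assumptions.

```
# Sketch only (assumed names): hostPath bind mount of the new plugin directory
# into the controller-manager static pod.
  volumeMounts:
  - mountPath: /etc/origin/kubelet-plugins
    mountPropagation: HostToContainer   # so plugins installed after pod start remain visible
    name: kubelet-plugins
...
volumes:
- hostPath:
    path: /etc/origin/kubelet-plugins
    type: ""
  name: kubelet-plugins
```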
The Ansible fix https://github.com/openshift/openshift-ansible/pull/8964 has been merged and will be available in an upcoming build.
Verified the fix in openshift-ansible using non-containerized hosts.

Installer: After installation, the pod has a hostPath volume '/usr/libexec/kubernetes/kubelet-plugins' mounted at '/usr/libexec/kubernetes/kubelet-plugins':

```
    - mountPath: /usr/libexec/kubernetes/kubelet-plugins
      mountPropagation: HostToContainer
      name: kubelet-plugins
....
    - hostPath:
        path: /usr/libexec/kubernetes/kubelet-plugins
        type: ""
      name: kubelet-plugins
```

The flex volume installed at /usr/libexec/kubernetes/kubelet-plugins/volume/exec/ works.

Upgrade: After upgrading from 3.9 to 3.10, the master-controllers pod is deployed as desired and the flex volume is functional.

I'll test again using the system container before moving this to verified.
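For context, what lives under that exec directory is an executable at <plugin-dir>/<vendor>~<driver>/<driver> implementing the flexvolume call convention. A minimal sketch is shown below; the "example.com~dummy" vendor/driver name is made up for illustration and is not the driver used in this verification.

```
#!/bin/bash
# Sketch of a minimal flexvolume driver, for illustration only. It would be
# installed as, e.g.,
#   /usr/libexec/kubernetes/kubelet-plugins/volume/exec/example.com~dummy/dummy
# (the "example.com~dummy" name is hypothetical). The kubelet or
# controller-manager invokes the executable with an operation name plus JSON
# options and expects a JSON reply on stdout.

op=$1

case "$op" in
  init)
    # Advertise that this driver does not implement attach/detach.
    echo '{"status": "Success", "capabilities": {"attach": false}}'
    ;;
  mount)
    # Called as: mount <mount dir> <json options>. A real driver would mount
    # its backing storage at "$2" here.
    echo '{"status": "Success"}'
    ;;
  unmount)
    # Called as: unmount <mount dir>.
    echo '{"status": "Success"}'
    ;;
  *)
    echo '{"status": "Not supported"}'
    ;;
esac

exit 0
```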
Also this: https://github.com/kubernetes/kubernetes/pull/65549
(In reply to Bradley Childs from comment #4)
> Also this: https://github.com/kubernetes/kubernetes/pull/65549

I tested this fix on Kubernetes; flexvolume works with the containerized kubelet.
The upgrade from 3.9 to 3.10 with system container on atomic host is not successful. Node service could not start:

```
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service holdoff time over, scheduling restart.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Starting atomic-openshift-node.service...
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 atomic-openshift-node[101317]: container_linux.go:348: starting container process caused "process_linux.go:399: container init caused \"rootfs_linux.go:58: mounting \\\"/etc/origin/kubelet-plugins\\\" to rootfs \\\"/var/lib/containers/atomic/atomic-openshift-node.0/rootfs\\\" at \\\"/etc/origin/kubelet-plugins\\\" caused \\\"stat /etc/origin/kubelet-plugins: no such file or directory\\\"\""
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 atomic-openshift-node[101336]: container "atomic-openshift-node" does not exist
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service: control process exited, code=exited status=1
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Failed to start atomic-openshift-node.service.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Jul 09 05:55:32 qe-jliu-jzws-master-etcd-1 systemd[1]: atomic-openshift-node.service failed.
```
The upgrade was tested with OpenShift v3.9.31 on Atomic Host 7.5, targeting an upgrade to v3.10.15.
Hmm, I am not super familiar with how Ansible handles upgrades, but if you had the latest build of openshift-ansible for 3.9, then you should have https://github.com/openshift/openshift-ansible/pull/8773, which already creates the /etc/origin/kubelet-plugins directory on the node. When we then upgrade to 3.10, the same directory is created by the Ansible script again. Can you confirm whether the "/etc/origin/kubelet-plugins" directory exists on the node?
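A quick way to check on the node, plus a manual workaround sketch if the directory is indeed missing; the mkdir/restart is only an assumed hand-applied workaround, not the Ansible fix itself:

```
# Does the directory the node/controller containers expect to bind mount exist?
ls -ld /etc/origin/kubelet-plugins

# Manual workaround sketch (assumption): create it and restart the node service.
mkdir -p /etc/origin/kubelet-plugins
systemctl restart atomic-openshift-node
```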
Isn't it a prerequisite to have the latest z-stream version of all packages before upgrading to the next major version?
This blocks the system container upgrade test.
I think the upgrade should use the latest z-stream before moving to the next major version. There is no problem when upgrading with this strategy; I tried it twice today using our Jenkins job and both runs were successful. PR https://github.com/kubernetes/kubernetes/pull/65549 will be available in the next build, and I'll test it tomorrow. Liujia found the upgrade issue and I'll sync with her to confirm the Ansible fix.
Confirmed:

1) About comment 10: the upgrade path v3.9.31 to v3.10.15 with installer 3.10.15 will hit the upgrade failure.

2) After communicating with jhou/xiaoli: this will be fixed in v3.9.33 (a fresh install will create the missing directory), so the upgrade path v3.9.33 to v3.10.15 or later with installer 3.10.15 or later should not hit the upgrade issue from 1). (@jhou has verified this in comment 11.)

3) For the other upgrade path, v3.9.31 to the latest z-stream v3.9.33 with installer 3.9.33, I tried it today; the upgrade works well, with the required directory created during the upgrade.

Before upgrade:
```
[root@qe-jliu-c39-master-etcd-1 ~]# ls /etc/origin/
ansible-service-broker/  generated-configs/  master/  sdn/
cloudprovider/           hosted/             node/    service-catalog/
examples/                kubeconfig          openvswitch/
```

After upgrade:
```
[root@qe-jliu-c39-master-etcd-1 ~]# ls -la /etc/origin/kubelet-plugins/
total 0
drwxr-xr-x.  2 root root   6 Jul 11 03:46 .
drwx------. 13 root root 232 Jul 11 03:46 ..
```

According to the above, the path v3.9.31 -> v3.9.33 -> v3.10+ works well, so I'm removing the testblocker.
Verified this according to previous comments.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2376