Bug 1552511
Summary: | device-plugin socket should be exposed on the host in container env | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | DeShuai Ma <dma> |
Component: | Node | Assignee: | Vikas Choudhary <vichoudh> |
Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.9.0 | CC: | aos-bugs, jeder, jokerman, mhepburn, mmccomas, sjenning |
Target Milestone: | --- | ||
Target Release: | 3.10.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | aos-scalability-39 | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-07-30 19:10:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
DeShuai Ma
2018-03-07 09:00:50 UTC
Device plugin is alpha in 3.9 and shouldn't block the release, but this should be fixed in 3.10 and potentially 3.9.z. Vikas, can you look at this after we handle https://bugzilla.redhat.com/show_bug.cgi?id=1548358 ?

Origin 3.9 PR to openshift-ansible: https://github.com/openshift/openshift-ansible/pull/7900

FYI, there is no supported containerized install for 3.10. There is a system container install that will only be supported on Atomic Host. The change needed in that case is in this PR: https://github.com/openshift/origin/pull/19308#event-1577516652

Verified in a node system container environment: the node exposes the socket on the host instance.

[root@ip-172-18-15-219 ~]# runc list
ID PID STATUS BUNDLE CREATED OWNER
atomic-openshift-node 18523 running /var/lib/containers/atomic/atomic-openshift-node.0 2018-04-28T08:03:01.053400639Z root
[root@ip-172-18-15-219 ~]# oc version
oc v3.10.0-0.30.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-219.ec2.internal:443
openshift v3.10.0-0.30.0
kubernetes v1.10.0+b81c8f8
[root@ip-172-18-15-219 ~]# ls /var/lib/kubelet/device-plugins/kubelet.sock
/var/lib/kubelet/device-plugins/kubelet.sock

This also appears to be a problem for 'oc cluster up' scenarios. Here the socket is visible inside the origin container:

# docker ps
686064f1374f registry.access.redhat.com/openshift3/ose:v3.9.25 "/usr/bin/openshift …" 44 seconds ago Up 43 seconds origin
# docker exec -it 686064f1374f /bin/bash
[root@virt origin]# ls -lart /var/lib/kubelet/device-plugins/
total 4
srwxr-xr-x. 1 root root 0 May 11 23:20 kubelet.sock
drwxr-xr-x. 3 root root 28 May 11 23:20 ..
drwxr-xr-x. 2 root root 61 May 11 23:21 .
-rw-r--r--. 1 root root 48 May 11 23:21 kubelet_internal_checkpoint
[root@virt origin]# exit
# oc version
oc v3.9.25
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.9.25
kubernetes v1.9.1+a0ce1bc657

Will this be fixed/supported at all? The use case is running OpenShift locally and connecting to a GPU, for example.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
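
The host-side check used in the verification comment can be scripted. The following is a minimal sketch, not part of the fix: `socket_state` is a hypothetical helper name, and the default path is simply the one shown in the `ls` output in this report. It distinguishes a real Unix socket from a missing path or a stray regular file, which a plain `ls` does not.

```shell
#!/bin/sh
# Print the state of a path: "socket", "not-a-socket", or "missing".
# socket_state is a hypothetical helper for illustration, not an OpenShift tool.
socket_state() {
    if [ -S "$1" ]; then
        # Path exists and is a Unix domain socket
        echo "socket"
    elif [ -e "$1" ]; then
        # Path exists but is some other file type
        echo "not-a-socket"
    else
        echo "missing"
    fi
}

# Default path taken from the transcript in this report; pass another path as $1.
SOCK="${1:-/var/lib/kubelet/device-plugins/kubelet.sock}"
echo "$SOCK: $(socket_state "$SOCK")"
```

Run on the host (not inside the node container), a result of `socket` would match the verified 3.10 behaviour, while `missing` would match the behaviour originally reported.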