Bug 1552511 - device-plugin socket should be exposed on the host in container env
Summary: device-plugin socket should be exposed on the host in container env
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.0
Assignee: Vikas Choudhary
QA Contact: DeShuai Ma
URL:
Whiteboard: aos-scalability-39
Depends On:
Blocks:
Reported: 2018-03-07 09:00 UTC by DeShuai Ma
Modified: 2018-07-30 19:10 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:10:04 UTC
Target Upstream Version:


Attachments


Links
System                   ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2018:1816  None      None    None     2018-07-30 19:10:30 UTC

Description DeShuai Ma 2018-03-07 09:00:50 UTC
Description of problem:
In a containerized environment, the device-plugin socket "/var/lib/kubelet/device-plugins/kubelet.sock" is created inside the node container, so other containers cannot access it. The path needs to be mounted from the host.
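Device plugins typically reach the kubelet through a hostPath mount of this directory, so one way to observe the symptom from another container is to try connecting to the socket path. The following is a minimal Python sketch; `can_connect` is a hypothetical diagnostic helper, not part of OpenShift or Kubernetes. It only tests whether something is listening on the Unix socket and does not perform the device-plugin gRPC registration. In a container whose mount does not contain the kubelet's socket, it would be expected to return False.

```python
import socket

# Socket path taken from this report.
KUBELET_SOCK = "/var/lib/kubelet/device-plugins/kubelet.sock"


def can_connect(path: str, timeout: float = 1.0) -> bool:
    """Return True if a process is accepting connections on the Unix socket at `path`.

    Hypothetical diagnostic helper: it opens and closes a raw AF_UNIX stream
    connection and does not speak the device-plugin gRPC protocol.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # FileNotFoundError: the path is not visible in this mount namespace.
        # ConnectionRefusedError: the file exists but nothing is listening.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print(can_connect(KUBELET_SOCK))
```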

Version-Release number of selected component (if applicable):
openshift v3.9.3
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Enable the DevicePlugins feature gate on the node, then restart the node service:

# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
...
  feature-gates:
  - DevicePlugins=true

# systemctl restart atomic-openshift-node

2. Check whether the device-plugin socket is created on the host:
# ls -l /var/lib/kubelet/device-plugins/kubelet.sock
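Step 2 can also be scripted. This is a minimal sketch, assuming Python 3 is available on the host; `socket_status` is a hypothetical helper that uses `os.stat` and `stat.S_ISSOCK` to distinguish a missing path, a non-socket file, and a Unix domain socket. On an affected node, running it on the host would be expected to report "missing".

```python
import os
import stat

# Socket path taken from this report.
KUBELET_SOCK = "/var/lib/kubelet/device-plugins/kubelet.sock"


def socket_status(path: str = KUBELET_SOCK) -> str:
    """Classify `path` as 'missing', 'not-a-socket' or 'socket'.

    os.stat follows symlinks, so a dangling link is reported as 'missing'.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    return "socket" if stat.S_ISSOCK(mode) else "not-a-socket"


if __name__ == "__main__":
    print(socket_status())
```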


Actual results:
2. The socket is created inside the node container, not on the host.

Expected results:
2. The socket should be exposed on the host.

Additional info:

Comment 1 Seth Jennings 2018-03-07 15:11:17 UTC
Device plugin support is alpha in 3.9 and shouldn't block the release, but this should be fixed in 3.10 and potentially in 3.9.z.

Vikas, can you look at this after we handle https://bugzilla.redhat.com/show_bug.cgi?id=1548358 ?

Comment 2 Seth Jennings 2018-04-17 01:19:06 UTC
Origin 3.9 PR to openshift-ansible:
https://github.com/openshift/openshift-ansible/pull/7900

Comment 3 Seth Jennings 2018-04-17 01:20:34 UTC
FYI, there is no supported containerized install for 3.10. There is a system container install that will only be supported on Atomic Host. The change needed in that case is in this PR:
https://github.com/openshift/origin/pull/19308#event-1577516652

Comment 4 DeShuai Ma 2018-04-28 09:24:31 UTC
Verified in a node system-container environment: the node exposes the socket on the host instance.


[root@ip-172-18-15-219 ~]# runc list
ID                      PID         STATUS      BUNDLE                                               CREATED                          OWNER
atomic-openshift-node   18523       running     /var/lib/containers/atomic/atomic-openshift-node.0   2018-04-28T08:03:01.053400639Z   root
[root@ip-172-18-15-219 ~]# oc version
oc v3.10.0-0.30.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-219.ec2.internal:443
openshift v3.10.0-0.30.0
kubernetes v1.10.0+b81c8f8
[root@ip-172-18-15-219 ~]# ls /var/lib/kubelet/device-plugins/kubelet.sock
/var/lib/kubelet/device-plugins/kubelet.sock

Comment 5 Mike Hepburn 2018-05-11 23:28:49 UTC
This also appears to be a problem in 'oc cluster up' scenarios:

# docker ps

686064f1374f        registry.access.redhat.com/openshift3/ose:v3.9.25       "/usr/bin/openshift …"   44 seconds ago      Up 43 seconds                           origin


# docker exec -it 686064f1374f /bin/bash
[root@virt origin]# ls -lart /var/lib/kubelet/device-plugins/
total 4
srwxr-xr-x. 1 root root  0 May 11 23:20 kubelet.sock
drwxr-xr-x. 3 root root 28 May 11 23:20 ..
drwxr-xr-x. 2 root root 61 May 11 23:21 .
-rw-r--r--. 1 root root 48 May 11 23:21 kubelet_internal_checkpoint
[root@virt origin]# exit

# oc version

oc v3.9.25
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.9.25
kubernetes v1.9.1+a0ce1bc657

Will this be fixed/supported at all? The use case is running OpenShift locally with access to, for example, a GPU.
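In the `ls -lart` output in Comment 5, the leading `s` in the mode string `srwxr-xr-x.` marks kubelet.sock as a Unix domain socket, and that listing was taken inside the `origin` container. When comparing saved listings from the host and from the container, a small parser like the one below can pick out socket entries. `sockets_in_listing` is a hypothetical helper; it assumes GNU `ls -l` formatting and file names without spaces.

```python
from typing import List


def sockets_in_listing(listing: str) -> List[str]:
    """Return the names of entries whose `ls -l` mode string starts with 's'.

    Assumes GNU `ls -l` output and file names free of spaces; the name is
    taken as the last whitespace-separated field.
    """
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "total N" header and blank lines.
        if len(fields) < 9:
            continue
        if fields[0].startswith("s"):
            names.append(fields[-1])
    return names


# Listing copied from Comment 5 (inside the origin container).
LISTING = """\
total 4
srwxr-xr-x. 1 root root  0 May 11 23:20 kubelet.sock
drwxr-xr-x. 3 root root 28 May 11 23:20 ..
drwxr-xr-x. 2 root root 61 May 11 23:21 .
-rw-r--r--. 1 root root 48 May 11 23:21 kubelet_internal_checkpoint
"""

print(sockets_in_listing(LISTING))  # ['kubelet.sock']
```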

Comment 7 errata-xmlrpc 2018-07-30 19:10:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

