Bug 1552511

Summary: device-plugin socket should be exposed on the host in container env
Product: OpenShift Container Platform
Reporter: DeShuai Ma <dma>
Component: Node
Assignee: Vikas Choudhary <vichoudh>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.9.0
CC: aos-bugs, jeder, jokerman, mhepburn, mmccomas, sjenning
Target Milestone: ---
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: aos-scalability-39
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-30 19:10:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description DeShuai Ma 2018-03-07 09:00:50 UTC
Description of problem:
In a containerized environment, the device-plugin socket "/var/lib/kubelet/device-plugins/kubelet.sock" is created inside the node container and cannot be accessed by other containers. We need to mount a host path for it.
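
To illustrate the kind of mount being requested, here is a minimal sketch that builds a bind-mount flag sharing the device-plugin directory between host and container. The helper name device_plugin_volume_flag is hypothetical, and the use of docker's -v option is an assumption for illustration; the actual fix may differ.

```shell
# Hypothetical helper (not part of OpenShift): prints a docker-style
# bind-mount flag that maps the device-plugin directory to the same
# path on the host and in the container.
device_plugin_volume_flag() {
    # Default to the kubelet device-plugin directory from this report.
    dir="${1:-/var/lib/kubelet/device-plugins}"
    printf '%s %s:%s\n' "-v" "$dir" "$dir"
}
```

The printed flag would be added to whatever command starts the node container, so that kubelet.sock created by the containerized kubelet also appears on the host.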

Version-Release number of selected component (if applicable):
openshift v3.9.3
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Enable the DevicePlugins feature gate on the node, then restart the node service.

# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
...
  feature-gates:
  - DevicePlugins=true

# systemctl restart atomic-openshift-node

2. Check that the device-plugin socket is created on the host.
# ls -l /var/lib/kubelet/device-plugins/kubelet.sock
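
The check in step 2 can also be scripted. The following is a minimal sketch; the function name check_socket is an assumption introduced here. It uses the standard `-S` test, which is true only when the path exists and is a socket.

```shell
# Report whether a path exists and is a Unix domain socket.
# Returns 0 when the socket is present, 1 otherwise.
check_socket() {
    if [ -S "$1" ]; then
        echo "socket present: $1"
    else
        echo "socket missing: $1"
        return 1
    fi
}
```

Running `check_socket /var/lib/kubelet/device-plugins/kubelet.sock` on the node host (not inside the node container) distinguishes the actual result from the expected one in this report.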


Actual results:
2. The socket exists only inside the container.

Expected results:
2. The socket should be exposed on the host.

Additional info:

Comment 1 Seth Jennings 2018-03-07 15:11:17 UTC
Device plugin is alpha in 3.9 and shouldn't block the release but this should be fixed in 3.10 and potentially 3.9.z.

Vikas, can you look at this after we handle https://bugzilla.redhat.com/show_bug.cgi?id=1548358 ?

Comment 2 Seth Jennings 2018-04-17 01:19:06 UTC
Origin 3.9 PR to openshift-ansible:
https://github.com/openshift/openshift-ansible/pull/7900

Comment 3 Seth Jennings 2018-04-17 01:20:34 UTC
FYI, there is no supported containerized install for 3.10. There is a system container install that will only be supported on Atomic Host. The change needed in that case is in this PR:
https://github.com/openshift/origin/pull/19308#event-1577516652

Comment 4 DeShuai Ma 2018-04-28 09:24:31 UTC
Verified in a node system container environment: the node exposes the socket on the host.


[root@ip-172-18-15-219 ~]# runc list
ID                      PID         STATUS      BUNDLE                                               CREATED                          OWNER
atomic-openshift-node   18523       running     /var/lib/containers/atomic/atomic-openshift-node.0   2018-04-28T08:03:01.053400639Z   root
[root@ip-172-18-15-219 ~]# oc version
oc v3.10.0-0.30.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-219.ec2.internal:443
openshift v3.10.0-0.30.0
kubernetes v1.10.0+b81c8f8
[root@ip-172-18-15-219 ~]# ls /var/lib/kubelet/device-plugins/kubelet.sock
/var/lib/kubelet/device-plugins/kubelet.sock

Comment 5 Mike Hepburn 2018-05-11 23:28:49 UTC
This also appears to be a problem for 'oc cluster up' scenarios.

# docker ps

686064f1374f        registry.access.redhat.com/openshift3/ose:v3.9.25       "/usr/bin/openshift …"   44 seconds ago      Up 43 seconds                           origin


# docker exec -it 686064f1374f /bin/bash
[root@virt origin]# ls -lart /var/lib/kubelet/device-plugins/
total 4
srwxr-xr-x. 1 root root  0 May 11 23:20 kubelet.sock
drwxr-xr-x. 3 root root 28 May 11 23:20 ..
drwxr-xr-x. 2 root root 61 May 11 23:21 .
-rw-r--r--. 1 root root 48 May 11 23:21 kubelet_internal_checkpoint
[root@virt origin]# exit

# oc version

oc v3.9.25
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.9.25
kubernetes v1.9.1+a0ce1bc657

Will this be fixed or supported at all? The use case is running OpenShift locally with access to a GPU, for example.

Comment 7 errata-xmlrpc 2018-07-30 19:10:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816