Bug 1450286

Summary: docker-registry and router deployments failed due to serviceaccount not found while using docker system container in containerized installation
Product: OpenShift Container Platform Reporter: Gan Huang <ghuang>
Component: Installer Assignee: Steve Milner <smilner>
Status: CLOSED ERRATA QA Contact: Gan Huang <ghuang>
Severity: high Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, gscrivan, jokerman, mmccomas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-10 05:23:08 UTC Type: Bug
Bug Depends On: 1450307    
Bug Blocks:    

Description Gan Huang 2017-05-12 07:34:17 UTC
Description of problem:
The docker-registry and router deployments fail because the service account token is not found when the docker system container is used in a containerized installation:

# oc logs -f dc/router
error: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
# oc logs -f dc/docker-registry
error: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory


Version-Release number of selected component (if applicable):
openshift-ansible master branch (last commit id is 593ef65)

# openshift version
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

container-engine image id: 74bcfa1d95732d05b3aec19577e8fa00f215bf3735f0e488fe7cda8eee8123f2

How reproducible:
always

Steps to Reproduce:
1. Trigger a containerized installation that uses the docker system container, with the following inventory variables:
containerized=true
openshift_docker_use_system_container=true
openshift_docker_systemcontainer_image_registry_override=brew.xxxx.xxx.redhat.com/rhel7


Actual results:
# oc logs -f dc/router
error: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
# oc logs -f dc/docker-registry
error: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

Expected results:
docker-registry and router are deployed successfully, and apps can be created as well.

Additional info:
The issue does not occur in RPM installs.

Comment 1 Steve Milner 2017-05-12 16:19:57 UTC
Adding gscrivan as this may be related to the underlying container.

Comment 2 Steve Milner 2017-05-12 19:15:57 UTC
PR for adding /var/run into the system container: https://github.com/projectatomic/atomic-system-containers/pull/67

Handing this over to Giuseppe for merging and verification.

Comment 3 Steve Milner 2017-05-12 19:28:43 UTC
To clarify, the container-engine service doesn't cause this problem when the rest of the install is not containerized, correct? (i.e., containerized=false)

Comment 4 Giuseppe Scrivano 2017-05-12 21:57:16 UTC
I am trying to replicate it here. In the meantime, could you try replacing "-v /run:/run" in the systemd unit file for the node container with "-v /run:/run -v /var/run/secrets:/var/run/secrets:rbind", restart the service, and see if it works?

Comment 5 Steve Milner 2017-05-12 22:02:57 UTC
To give some more background, Giuseppe believes that this is an issue with the openshift.docker.node.service file. It currently mounts "/run:/run". The /var/run/secrets directory is mounted as a tmpfs and is not propagated, which could be the issue.

The belief is that by specifying "-v /var/run/secrets:/var/run/secrets:rbind" the file system should become available.
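
As background on the propagation point: the kernel reports each mount's propagation state through optional fields in /proc/<pid>/mountinfo (`shared:N`, `master:N`, `unbindable`; no tag means private). The following is a minimal sketch, not part of the fix, of how those tags could be read to check whether a mount would receive or pass on sub-mounts. The sample lines and their paths are hypothetical and were not captured from the affected host.

```python
def propagation(mountinfo_line):
    """Classify one /proc/<pid>/mountinfo line by its propagation tags."""
    fields = mountinfo_line.split()
    # Fields 0-5 are fixed; optional fields follow until the "-" separator.
    separator = fields.index("-", 6)
    tags = {field.split(":")[0] for field in fields[6:separator]}
    if "unbindable" in tags:
        return "unbindable"
    if "shared" in tags and "master" in tags:
        return "shared,slave"
    if "shared" in tags:
        return "shared"
    if "master" in tags:
        return "slave"
    return "private"


def propagation_by_mount_point(mountinfo_text):
    """Map each mount point (field index 4) to its propagation type."""
    result = {}
    for line in mountinfo_text.splitlines():
        if line.strip():
            result[line.split()[4]] = propagation(line)
    return result


# Hypothetical sample lines, for illustration only.
SAMPLE = """\
21 1 253:0 / / rw,relatime shared:1 - xfs /dev/mapper/root rw
88 21 0:40 / /var/lib/origin rw,relatime master:1 - xfs /dev/mapper/root rw
97 88 0:52 / /var/lib/origin/secret rw,relatime - tmpfs tmpfs rw
"""

if __name__ == "__main__":
    for mount_point, kind in propagation_by_mount_point(SAMPLE).items():
        print(mount_point, kind)
```

To inspect a real system, the text of /proc/self/mountinfo read from inside the node container could be passed to `propagation_by_mount_point` in place of `SAMPLE`.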

Comment 6 Giuseppe Scrivano 2017-05-13 19:30:57 UTC
I was able to reproduce it here, and I can see that `openshift.docker.gte_1_10` is not properly set when using the Docker container.

As a result, `:rslave` is dropped from "-v {{ openshift.common.data_dir }}:{{ openshift.common.data_dir }}{{ ':rslave' if openshift.docker.gte_1_10 | default(False) | bool else '' }}" in the roles/openshift_node/templates/openshift.docker.node.service file.

I manually set ':rslave', and that resolves the reported issue.
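
The effect of the missing fact can be illustrated outside of Ansible. The sketch below mimics the template expression in plain Python; `to_bool` is only a rough stand-in for Ansible's `bool` filter, `data_dir_volume` is a hypothetical helper name, and `/var/lib/origin` is an assumed example value for `openshift.common.data_dir`.

```python
def to_bool(value):
    """Rough stand-in for Ansible's `bool` filter (assumption: common truthy spellings only)."""
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes", "on")
    return bool(value)


def data_dir_volume(data_dir, docker_facts):
    """Build the -v argument following the logic of the template expression."""
    # `default(False)` applies when the gte_1_10 fact is missing entirely.
    gte_1_10 = to_bool(docker_facts.get("gte_1_10", False))
    suffix = ":rslave" if gte_1_10 else ""
    return "-v {0}:{0}{1}".format(data_dir, suffix)


if __name__ == "__main__":
    # Fact not set, as observed with the Docker container: no propagation suffix.
    print(data_dir_volume("/var/lib/origin", {}))
    # Fact set: the mount is made rslave.
    print(data_dir_volume("/var/lib/origin", {"gte_1_10": True}))
```

With the fact unset, the default of `False` silently selects the empty suffix, so the data directory is mounted without `:rslave` and no error is raised at install time.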

Comment 7 Steve Milner 2017-05-14 15:58:31 UTC
Giuseppe:

PTAL https://github.com/openshift/openshift-ansible/pull/4184

Comment 8 Steve Milner 2017-05-15 13:54:39 UTC
PR merged.

Comment 10 Gan Huang 2017-06-12 07:04:10 UTC
Verified with openshift-ansible-3.6.98-1.git.0.e651d65.el7.noarch.rpm

atomic-1.17.2-4.git2760e30.el7.x86_64
runc-1.0.0-6.gite800860.el7.x86_64

# atomic -v
1.17.1
# runc -v
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-31.git3a6eaeb.el7.x86_64
 Go version:      go1.7.6
 Git commit:      3a6eaeb/1.12.6
 Built:           Tue Jun  6 12:45:07 2017
 OS/Arch:         linux/amd64

Comment 12 errata-xmlrpc 2017-08-10 05:23:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716