Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1448384

Summary: The generated /etc/docker/daemon.json was not compatible with using the docker system container
Product: OpenShift Container Platform Reporter: Gan Huang <ghuang>
Component: Installer Assignee: Steve Milner <smilner>
Status: CLOSED ERRATA QA Contact: Gan Huang <ghuang>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.0 CC: aos-bugs, ghuang, jhonce, jokerman, mmccomas, sdodson, smilner
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-10 05:23:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1448372    
Bug Blocks:    

Description Gan Huang 2017-05-05 10:16:16 UTC
Description of problem:
The generated /etc/docker/daemon.json is not valid when installing the docker system container.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.53-1.git.0.03f33da.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Trigger an installation using the docker system container:

#cat inventory_hosts
<--snip-->
openshift_docker_use_system_container=True
openshift_docker_systemcontainer_image_registry_override=test.registry.xxx.com/rhel7/

2. Check the container-engine status.

Actual results:

Check the status of container-engine
#journalctl -u container-engine
<--snip-->
level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character 'u' looking for beginning of value\n"
<--snip-->


#cat /etc/docker/daemon.json
{
    "api-cors-header": "",
    "authorization-plugins": ["rhel-push-plugin"],
    "bip": "",
    "bridge": "",
    "cgroup-parent": "",
    "cluster-store": "",
    "cluster-store-opts": {},
    "cluster-advertise": "",
    "debug": true,
    "default-gateway": "",
    "default-gateway-v6": "",
    "default-runtime": "oci",
    "containerd": "/var/run/containerd.sock",
    "default-ulimits": {},
    "disable-legacy-registry": false,
    "dns": [],
    "dns-opts": [],
    "dns-search": [],
    "exec-opts": ["native.cgroupdriver=systemd"],
    "exec-root": "",
    "fixed-cidr": "",
    "fixed-cidr-v6": "",
    "graph": "",
    "group": "",
    "hosts": [],
    "icc": false,
    "insecure-registries": [u'test.registry.com:8888', u'registry.ops.openshift.com'],
    "ip": "0.0.0.0",
    "iptables": false,
    "ipv6": false,
    "ip-forward": false,
    "ip-masq": false,
    "labels": [],
    "live-restore": true,
    "log-level": "",
    "log-opts": {},
    "max-concurrent-downloads": 3,
    "max-concurrent-uploads": 5,
    "mtu": 0,
    "oom-score-adjust": -500,
    "pidfile": "",
    "raw-logs": false,
    "registry-mirrors": [],
    "runtimes": {
	"oci": {
	    "path": "/usr/libexec/docker/docker-runc-current"
	}
    },
    "selinux-enabled": True,
    "storage-driver": "",
    "storage-opts": [],
    "tls": true,
    "tlscacert": "",
    "tlscert": "",
    "tlskey": "",
    "tlsverify": true,
    "userns-remap": "",
    "add-registry": [u'test.registry.com:8888', u'registry.access.redhat.com'],
    "blocked-registries": [u'registry.hacker.com'],
    "userland-proxy-path": "/usr/libexec/docker/docker-proxy-current"
}


Expected results:
No errors

Additional info:
After removing the "u" prefixes from the registry entries, restarting container-engine still hits other issues:

May 05 05:39:51 host-8-175-193.host.centralci.eng.rdu2.redhat.com runc[51741]: time="2017-05-05T05:39:51-04:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character 'T' looking for beginning of value\n"

May 05 05:47:29 host-8-175-193.host.centralci.eng.rdu2.redhat.com runc[62757]: time="2017-05-05T05:47:29-04:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: runtimes: (from flag: [oci], from file: map[oci:map[path:/usr/libexec/docker/docker-runc-current]]), authorization-plugins: (from flag: [rhel-push-plugin], from file: [rhel-push-plugin]), containerd: (from flag: /run/containerd.sock, from file: /run/containerd.sock), default-runtime: (from flag: oci, from file: oci), exec-opts: (from flag: [native.cgroupdriver=systemd], from file: [native.cgroupdriver=systemd])\n"


May 05 05:47:49 host-8-175-193.host.centralci.eng.rdu2.redhat.com runc[63646]: time="2017-05-05T05:47:49-04:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '}' looking for beginning of object key string\n"

Comment 1 Steve Milner 2017-05-05 16:43:16 UTC
I see the issue. The strings are being turned into unicode instances inside the template. I'll put something together.
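The failure mode described above can be reproduced outside Ansible. The following is a minimal Python 3 sketch, not the installer's template code: it assumes the values reach the template as native Python objects, and the registry names are example data taken from this report. Substituting the objects' Python representation yields single-quoted strings (with `u''` prefixes on Python 2) and a capitalized `True`, both of which `dockerd` rejects; serializing each value with `json.dumps` yields valid JSON.

```python
import json

# Example values as they might reach the template (assumed; data from this report).
insecure_registries = ["test.registry.com:8888", "registry.ops.openshift.com"]
selinux_enabled = True

# Interpolating the objects directly uses their Python repr, not JSON:
# single-quoted strings (u'' prefixed on Python 2) and a capitalized True.
broken = '{"insecure-registries": %s, "selinux-enabled": %s}' % (
    insecure_registries, selinux_enabled)

# Serializing each value with json.dumps produces JSON literals instead.
fixed = '{"insecure-registries": %s, "selinux-enabled": %s}' % (
    json.dumps(insecure_registries), json.dumps(selinux_enabled))


def is_valid_json(text):
    """Return True if text parses as JSON, which dockerd requires."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(broken)
print(is_valid_json(broken))  # prints False
print(fixed)
print(is_valid_json(fixed))   # prints True
```

In an Ansible Jinja2 template, the analogous step is to pass such values through the `to_json` filter instead of interpolating them directly.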

Comment 2 Steve Milner 2017-05-05 18:32:49 UTC
Created https://github.com/openshift/openshift-ansible/pull/4106 and added Gan as a reviewer.

Comment 3 Gan Huang 2017-05-08 07:55:47 UTC
After deleting the invalid parameters one by one, the daemon.json below is the only one I found that lets container-engine start successfully. Also, `/var/run/docker.pid` has to be deleted manually even though docker has been stopped.

# cat /etc/docker/daemon.json
{
    "api-cors-header": "",
    "bip": "",
    "bridge": "",
    "cgroup-parent": "",
    "cluster-store": "",
    "cluster-store-opts": {},
    "cluster-advertise": "",
    "debug": true,
    "default-gateway": "",
    "default-gateway-v6": "",
    "default-ulimits": {},
    "disable-legacy-registry": false,
    "dns": [],
    "dns-opts": [],
    "dns-search": [],
    "exec-root": "",
    "fixed-cidr": "",
    "fixed-cidr-v6": "",
    "graph": "",
    "group": "",
    "hosts": [],
    "icc": false,
    "insecure-registries": [u'test.registry.com:8888', u'registry.ops.openshift.com'],
    "ip": "0.0.0.0",
    "iptables": true,
    "ipv6": false,
    "ip-forward": false,
    "ip-masq": false,
    "labels": [],
    "live-restore": true,
    "log-level": "",
    "log-opts": null,
    "max-concurrent-downloads": 3,
    "max-concurrent-uploads": 5,
    "mtu": 0,
    "oom-score-adjust": -500,
    "raw-logs": false,
    "registry-mirrors": [],
    "userns-remap": ""
}
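Before deleting a leftover PID file such as `/var/run/docker.pid` by hand, one way to confirm it is really stale is to check whether the recorded PID still exists. This is a generic Python sketch; the helper name `pidfile_is_stale` is hypothetical and it does not reproduce Docker's own pidfile logic. A PID can also be reused by an unrelated process, so a `False` result does not prove that docker is running.

```python
import os
import tempfile


def pidfile_is_stale(path):
    """Return True if `path` is a PID file whose process no longer exists.

    Hypothetical helper for illustration; it only checks PID existence.
    """
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no PID file at all, so nothing to clean up
    except ValueError:
        return True  # contents are not a PID, so the file is unusable
    if pid <= 0:
        return True  # 0 and negative values address process groups, not a daemon
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return True  # no such process: the file is stale
    except PermissionError:
        return False  # process exists but is owned by another user
    return False  # process exists (possibly a reused PID)


if __name__ == "__main__":
    # Demo with a temporary PID file that points at this very process.
    with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as tmp:
        tmp.write(str(os.getpid()))
    print(pidfile_is_stale(tmp.name))  # prints False: this process is alive
    os.remove(tmp.name)
```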

Comment 4 Steve Milner 2017-05-08 18:20:15 UTC
Added https://github.com/projectatomic/atomic-system-containers/pull/65 to remove /etc/sysconfig/docker usage from the system container.

Comment 6 Steve Milner 2017-05-09 19:52:55 UTC
Both are now merged.

Comment 7 Gan Huang 2017-05-10 08:01:16 UTC
Tested with the latest openshift-ansible master branch and the latest "container-engine" (IMAGE ID: edd29b7740cd).

Some issues are still not addressed:
1)
> May 05 05:39:51 host-8-175-193.host.centralci.eng.rdu2.redhat.com runc[51741]: time="2017-05-05T05:39:51-04:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character 'T' looking for beginning of value\n"

It's caused by the capital "T", which the Docker daemon can't parse:
# grep T /etc/docker/daemon.json 
    "selinux-enabled": True,
2)
The problem persists after fixing the issue above; it seems "blocked-registries" is not supported in /etc/docker/daemon.json:

> May 10 01:41:53 qe-ghuang-master-nfs-1.localdomain runc[23807]: time="2017-05-10T05:41:53Z" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives don't match any configuration option: blocked-registries\n"

3)
There are still many duplicated settings that I couldn't figure out:

> May 10 01:42:37 qe-ghuang-master-nfs-1.localdomain runc[24688]: time="2017-05-10T05:42:37Z" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: add-registry: (from flag: [registry.access.redhat.com], from file: [registry.access.redhat.com]), runtimes: (from flag: [oci], from file: map[oci:map[path:/usr/libexec/docker/docker-runc-current]]), authorization-plugins: (from flag: [rhel-push-plugin], from file: [rhel-push-plugin]), containerd: (from flag: /run/containerd.sock, from file: /run/containerd.sock), default-runtime: (from flag: oci, from file: oci), exec-opts: (from flag: [native.cgroupdriver=systemd], from file: [native.cgroupdriver=systemd]), storage-driver: (from flag: devicemapper, from file: ), selinux-enabled: (from flag: true, from file: true), storage-opts: (from flag: [dm.fs=xfs dm.thinpooldev=/dev/mapper/rhel-docker--pool dm.use_deferred_removal=true], from file: []), userland-proxy-path: (from flag: /usr/libexec/docker/docker-proxy-current, from file: /usr/libexec/docker/docker-proxy-current)\n"

...

Comment 8 Steve Milner 2017-05-10 13:19:54 UTC
(In reply to Gan Huang from comment #7)
> Test with latest openshift-ansible master branch and latest
> "container-engine" (IMAGE ID: edd29b7740cd)
> 
> Still have some issues not addressed.
> 1)
> > May 05 05:39:51 host-8-175-193.host.centralci.eng.rdu2.redhat.com runc[51741]: time="2017-05-05T05:39:51-04:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character 'T' looking for beginning of value\n"
> 
> It's caused by capital "T" which can't be recognized by Docker daemon
> # grep T /etc/docker/daemon.json 
>     "selinux-enabled": True,

Ah, I see. Will fix. I should have noticed that...

> 2)
> Problem still persists after fixing the issue above, seems
> "blocked-registries" not supported in /etc/docker/daemon.json
>

Interesting. The code seems to indicate it is supported but I'll check again.

> > May 10 01:41:53 qe-ghuang-master-nfs-1.localdomain runc[23807]: time="2017-05-10T05:41:53Z" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives don't match any configuration option: blocked-registries\n"
> 
> 3)
> There still many duplicated settings that I couldn't figure out:
> 
> > May 10 01:42:37 qe-ghuang-master-nfs-1.localdomain runc[24688]: time="2017-05-10T05:42:37Z" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: add-registry: (from flag: [registry.access.redhat.com], from file: [registry.access.redhat.com]), runtimes: (from flag: [oci], from file: map[oci:map[path:/usr/libexec/docker/docker-runc-current]]), authorization-plugins: (from flag: [rhel-push-plugin], from file: [rhel-push-plugin]), containerd: (from flag: /run/containerd.sock, from file: /run/containerd.sock), default-runtime: (from flag: oci, from file: oci), exec-opts: (from flag: [native.cgroupdriver=systemd], from file: [native.cgroupdriver=systemd]), storage-driver: (from flag: devicemapper, from file: ), selinux-enabled: (from flag: true, from file: true), storage-opts: (from flag: [dm.fs=xfs dm.thinpooldev=/dev/mapper/rhel-docker--pool dm.use_deferred_removal=true], from file: []), userland-proxy-path: (from flag: /usr/libexec/docker/docker-proxy-current, from file: /usr/libexec/docker/docker-proxy-current)\n"

The duplicated settings should be fixed with https://bugzilla.redhat.com/show_bug.cgi?id=1448372 as it removes the use of /etc/sysconfig/docker from the system container.
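The "specified both as a flag and in the configuration file" errors come from `dockerd` refusing any directive that is set in both places, even when the two values agree (for example `containerd: (from flag: /run/containerd.sock, from file: /run/containerd.sock)`). The Python sketch below shows how such overlaps could be spotted ahead of time by comparing an `OPTIONS`-style flag string, as might be read from `/etc/sysconfig/docker`, with the JSON keys. It is an illustration under stated assumptions: `FLAG_TO_KEY` is a partial, hand-written mapping for list-valued flags whose JSON key is plural, and the inputs are example data modelled on the log above.

```python
import json
import shlex

# Partial, hand-written mapping (assumption) from dockerd flag names to
# daemon.json keys where the JSON key differs; not exhaustive.
FLAG_TO_KEY = {
    "exec-opt": "exec-opts",
    "storage-opt": "storage-opts",
    "authorization-plugin": "authorization-plugins",
    "add-runtime": "runtimes",
}


def flag_keys(options):
    """Map a shell-style OPTIONS string to the daemon.json keys it sets."""
    keys = set()
    for token in shlex.split(options):
        if token.startswith("--"):
            # "--storage-driver=devicemapper" -> "storage-driver"
            name = token[2:].split("=", 1)[0]
            keys.add(FLAG_TO_KEY.get(name, name))
    return keys


def conflicts(options, daemon_json):
    """Return sorted directives set both as a flag and in the config file."""
    return sorted(flag_keys(options) & set(json.loads(daemon_json)))


# Hypothetical inputs modelled on the log in comment 7.
opts = ("--selinux-enabled --exec-opt native.cgroupdriver=systemd "
        "--userland-proxy-path=/usr/libexec/docker/docker-proxy-current")
cfg = json.dumps({
    "selinux-enabled": True,
    "exec-opts": ["native.cgroupdriver=systemd"],
    "userland-proxy-path": "/usr/libexec/docker/docker-proxy-current",
    "debug": True,
})
print(conflicts(opts, cfg))
# prints ['exec-opts', 'selinux-enabled', 'userland-proxy-path']
```

Moving every setting into one place, as the linked bug does by dropping `/etc/sysconfig/docker`, makes this intersection empty by construction.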

Comment 10 Steve Milner 2017-05-10 14:36:14 UTC
Added https://github.com/openshift/openshift-ansible/pull/4147 to lowercase the "T" and rename "blocked" to "block".
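The "don't match any configuration option" error could also be caught before restarting the daemon by diffing the file's top-level keys against the options the installed `dockerd` accepts. In this Python sketch, `unknown_directives` is a hypothetical helper and `SUPPORTED` is an illustrative subset built from keys mentioned in this report, not the daemon's real option list.

```python
import json


def unknown_directives(daemon_json, supported):
    """Return sorted top-level keys of daemon_json that are not in supported."""
    return sorted(set(json.loads(daemon_json)) - set(supported))


# Illustrative subset only (assumption): keys that appear in this report.
# The authoritative list is whatever the installed dockerd build accepts.
SUPPORTED = {
    "add-registry",
    "block-registry",
    "insecure-registries",
    "selinux-enabled",
}

cfg = json.dumps({
    "add-registry": ["registry.access.redhat.com"],
    "blocked-registries": ["registry.hacker.com"],
})
print(unknown_directives(cfg, SUPPORTED))  # prints ['blocked-registries']
```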

Comment 11 Steve Milner 2017-05-11 01:05:57 UTC
Merged

Comment 12 Gan Huang 2017-05-11 12:03:59 UTC
Tested with the latest openshift-ansible including PR 4147: selinux-enabled is still set to "True", and it seems `block-registries` should be renamed to `docker-registry`.

Proposed fix: https://github.com/openshift/openshift-ansible/pull/4158

Comment 13 Steve Milner 2017-05-11 15:04:01 UTC
These changes worked for me and have been merged.

Comment 14 Gan Huang 2017-05-12 02:23:11 UTC
Tested with the latest openshift-ansible; container-engine still fails to start:

May 11 21:55:28 qe-ghuang-master-nfs-1.localdomain runc[13140]: time="2017-05-11T21:55:28-04:00" level=fatal msg="unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: userland-proxy-path: (from flag: /usr/libexec/docker/docker-proxy-current, from file: /usr/libexec/docker/docker-proxy-current)\n"


container-engine restarts successfully after removing "userland-proxy-path": "/usr/libexec/docker/docker-proxy-current" from /etc/docker/daemon.json.

Comment 15 Steve Milner 2017-05-12 16:30:24 UTC
Found it. init.sh was still setting that option outside of the JSON file.

PR: https://github.com/openshift/openshift-ansible/pull/4174

Comment 17 Gan Huang 2017-05-15 06:23:28 UTC
Verified with openshift-ansible-3.6.68-1.git.0.9cbe2b7.el7.noarch.rpm

container-engine can be started properly.

Comment 18 Gan Huang 2017-06-05 08:16:09 UTC
Reopening this issue:

It seems `add-registry`, `block-registry`, and `insecure-registries` in /etc/docker/daemon.json don't work as expected; the installer failed at:

TASK [openshift_version : Set precise containerized version to configure if openshift_release specified] ***
Monday 05 June 2017  06:17:15 +0000 (0:00:00.114)       0:03:37.158 *********** 

fatal: [openshift-140.lab.sjc.redhat.com]: FAILED! => {
    "changed": true, 
    "cmd": [
        "docker", 
        "run", 
        "--rm", 
        "openshift3/ose:v3.6", 
        "version"
    ], 
    "delta": "0:00:02.489766", 
    "end": "2017-06-05 02:17:18.754778", 
    "failed": true, 
    "rc": 125, 
    "start": "2017-06-05 02:17:16.265012", 
    "warnings": []
}

STDERR:

Unable to find image 'openshift3/ose:v3.6' locally
Trying to pull repository docker.io/openshift3/ose ... 
/usr/bin/docker-current: unauthorized: authentication required.
See '/usr/bin/docker-current run --help'.

fatal: [openshift-125.lab.sjc.redhat.com]: FAILED! => {
    "changed": true, 
    "cmd": [
        "docker", 
        "run", 
        "--rm", 
        "openshift3/ose:v3.6", 
        "version"
    ], 
    "delta": "0:00:02.515127", 
    "end": "2017-06-05 02:17:17.958299", 
    "failed": true, 
    "rc": 125, 
    "start": "2017-06-05 02:17:15.443172", 
    "warnings": []
}

STDERR:

Unable to find image 'openshift3/ose:v3.6' locally
Trying to pull repository docker.io/openshift3/ose ... 
/usr/bin/docker-current: unauthorized: authentication required.
See '/usr/bin/docker-current run --help'.

Comment 21 Steve Milner 2017-06-05 14:58:09 UTC
With the same config I can't replicate the issue (docker 1.12.6):

$ sudo docker pull openshift3/ose:v3.6
Trying to pull repository brew-pulp.xxxxxxxxxx:8888/openshift3/ose ... 
sha256:0c8ae9030a2479d5a9407d5adb90476ee055bac8c16c2253750b515d4c0fa6b6: Pulling from brew-pulp.xxxxxxxxxx:8888/openshift3/ose


- What's the version of the docker command that is present?
- Can you provide the log of the run?
- Is this only happening when doing containerized installs or is there failure with rpm install as well?

Comment 24 Steve Milner 2017-06-05 16:21:40 UTC
It looks like in the latest build Jhon moved the daemon.json file to container-daemon.json, so the changes we are making to daemon.json are not being picked up. Once I copied container-daemon.json to daemon.json, it started to work again.

I'm going to update the installer to modify container-daemon.json instead.

Comment 25 Steve Milner 2017-06-05 16:31:06 UTC
*container-daemon.json

Comment 26 Steve Milner 2017-06-05 19:25:47 UTC
PR: https://github.com/openshift/openshift-ansible/pull/4370

Comment 27 Steve Milner 2017-06-07 18:11:23 UTC
Merged.

Comment 28 Gan Huang 2017-06-12 02:51:27 UTC
Tested against openshift-ansible-3.6.98-1.git.0.e651d65.el7.noarch.rpm

/etc/docker/container-daemon.json is generated and docker system container is running well.

atomic-1.17.2-4.git2760e30.el7.x86_64
runc-1.0.0-6.gite800860.el7.x86_64

# atomic -v
1.17.1
# runc -v
runc version 1.0.0-rc3
commit: cafb8d8755dc2b990fc73fbf7bff62f534da9219-dirty
spec: 1.0.0-rc5

# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-31.git3a6eaeb.el7.x86_64
 Go version:      go1.7.6
 Git commit:      3a6eaeb/1.12.6
 Built:           Tue Jun  6 12:45:07 2017
 OS/Arch:         linux/amd64

# atomic images list|grep container-engine
   xxx/rhel7/container-engine   latest   7d4eccca7dfc   2017-06-11 21:58                  ostree

Comment 30 errata-xmlrpc 2017-08-10 05:23:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716