Bug 1467824

Summary:	master api/controllers service in containerized install can not be restarted with docker-1.12.6-40.
Product:	OpenShift Container Platform	Reporter:	Johnny Liu <jialiu>
Component:	Installer	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Johnny Liu <jialiu>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.6.0	CC:	abutcher, akostadi, amurdaca, aos-bugs, dwalsh, imcleod, jialiu, jligon, jokerman, mifiedle, mmccomas, sdodson, sjenning, slopezpa, vgoyal
Target Milestone:	---
Target Release:	3.6.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1470261 1470389 (view as bug list)		Environment:
Last Closed:	2017-08-14 15:41:13 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1468244
Bug Blocks:	1470389

Description Johnny Liu 2017-07-05 08:56:12 UTC

Description of problem:
Set up a containerized HA cluster on a RHEL 7.4 system, then:
1. restart atomic-openshift-master-api, succeed

2. restart atomic-openshift-master-controllers, failed with the following message:
Jul 05 04:14:14 qe-jialiu-master-etcd-zone1-1 systemd[1]: Starting Atomic OpenShift Master Controllers...
Jul 05 04:14:15 qe-jialiu-master-etcd-zone1-1 atomic-openshift-master-controllers[5476]: Error response from daemon: Driver devicemapper failed to remove root filesystem 40d9fee19049c710e1b44cf10ecbae0c49485b8237b3a63fd673a2689bc44579: remove /var/lib/docker/devicemapper/mnt/209fa04d9a38f1f5592749f92c9a3a964a65abb2cbb636a12f698253223f3fb0: device or resource busy
Jul 05 04:14:15 qe-jialiu-master-etcd-zone1-1 atomic-openshift-master-controllers[5483]: /usr/bin/docker-current: Error response from daemon: Conflict. The name "/atomic-openshift-master-controllers" is already in use by container 40d9fee19049c710e1b44cf10ecbae0c49485b8237b3a63fd673a2689bc44579. You have to remove (or rename) that container to be able to reuse that name..
Jul 05 04:14:15 qe-jialiu-master-etcd-zone1-1 atomic-openshift-master-controllers[5483]: See '/usr/bin/docker-current run --help'.
Jul 05 04:14:15 qe-jialiu-master-etcd-zone1-1 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=125/n/a

3. restart atomic-openshift-master-api, succeed

4. restart docker, succeed, and atomic-openshift-master-api and atomic-openshift-master-controllers come back to running state.

5. restart atomic-openshift-master-api, failed with the following message:
Jul 05 04:50:23 qe-jialiu-master-etcd-zone1-1 systemd[1]: Starting Atomic OpenShift Master API...
Jul 05 04:50:24 qe-jialiu-master-etcd-zone1-1 atomic-openshift-master-api[18716]: Error response from daemon: Driver devicemapper failed to remove root filesystem f5496ec837f5d72d4fd229d32e7a586276258091af7ab65ed70326717d30fd56: remove /var/lib/docker/devicemapper/mnt/5051b7a2bf73af34da91e6714d53acc0e793280b9394e56b3e5e7e55da12c96c: device or resource busy
Jul 05 04:50:24 qe-jialiu-master-etcd-zone1-1 atomic-openshift-master-api[18724]: /usr/bin/docker-current: Error response from daemon: Conflict. The name "/atomic-openshift-master-api" is already in use by container f5496ec837f5d72d4fd229d32e7a586276258091af7ab65ed70326717d30fd56. You have to remove (or rename) that container to be able to reuse that name..
Jul 05 04:50:24 qe-jialiu-master-etcd-zone1-1 atomic-openshift-master-api[18724]: See '/usr/bin/docker-current run --help'.
Jul 05 04:50:24 qe-jialiu-master-etcd-zone1-1 systemd[1]: atomic-openshift-master-api.service: main process exited, code=exited, status=125/n/a

6. atomic-openshift-node could be restarted successfully.

7. After downgrading docker to docker-1.12.6-32.git88a4867.el7.x86_64, everything works well.


# docker inspect atomic-openshift-master-api
[
    {
        "Id": "f3f1ae57c1192d3136e58cbd14fc0e3c431a0bf8ef4f412d1b5fb95d3870f14e",
        "Created": "2017-07-05T08:53:40.761550206Z",
        "Path": "/usr/bin/openshift",
        "Args": [
            "start",
            "master",
            "api",
            "--config=/etc/origin/master/master-config.yaml",
            "--loglevel=5",
            "--listen=https://0.0.0.0:443",
            "--master=https://qe-jialiu-master-etcd-zone1-1"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 22034,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2017-07-05T08:53:41.383685926Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:261f654db23fa14e08519660b341443b931573cfb1bb0879657a2e81b04bc3c4",
        "ResolvConfPath": "/var/lib/docker/containers/f3f1ae57c1192d3136e58cbd14fc0e3c431a0bf8ef4f412d1b5fb95d3870f14e/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/f3f1ae57c1192d3136e58cbd14fc0e3c431a0bf8ef4f412d1b5fb95d3870f14e/hostname",
        "HostsPath": "/var/lib/docker/containers/f3f1ae57c1192d3136e58cbd14fc0e3c431a0bf8ef4f412d1b5fb95d3870f14e/hosts",
        "LogPath": "",
        "Name": "/atomic-openshift-master-api",
        "RestartCount": 0,
        "Driver": "devicemapper",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/var/run/docker.sock:/var/run/docker.sock",
                "/etc/origin:/etc/origin",
                "/etc/origin/cloudprovider:/etc/origin/cloudprovider",
                "/etc/pki:/etc/pki:ro",
                "/var/lib/origin:/var/lib/origin",
                "/var/log:/var/log"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "journald",
                "Config": {}
            },
            "NetworkMode": "host",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "label=disable"
            ],
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "docker-runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "devicemapper",
            "Data": {
                "DeviceId": "157",
                "DeviceName": "docker-253:0-67183517-edda68727e6fb53a8b909df73a517cbcdce4b901e08a28b9bf3dc29f5173a5e2",
                "DeviceSize": "10737418240"
            }
        },
        "Mounts": [
            {
                "Source": "/var/log",
                "Destination": "/var/log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/etc/origin",
                "Destination": "/etc/origin",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/etc/origin/cloudprovider",
                "Destination": "/etc/origin/cloudprovider",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/etc/pki",
                "Destination": "/etc/pki",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/lib/origin",
                "Destination": "/var/lib/origin",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "qe-jialiu-master-etcd-zone1-1",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "53/tcp": {},
                "8443/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "OPTIONS=--loglevel=5 --listen=https://0.0.0.0:443 --master=https://qe-jialiu-master-etcd-zone1-1",
                "CONFIG_FILE=/etc/origin/master/master-config.yaml",
                "OPENSHIFT_DEFAULT_REGISTRY=docker-registry.default.svc:5000",
                "IMAGE_VERSION=v3.6.133",
                "NO_PROXY=.cluster.local,qe-jialiu-master-etcd-zone1-1,172.31.0.0/16,11.0.0.0/16",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "container=docker",
                "HOME=/root",
                "OPENSHIFT_CONTAINERIZED=true",
                "KUBECONFIG=/var/lib/origin/openshift.local.config/master/admin.kubeconfig"
            ],
            "Cmd": [
                "start",
                "master",
                "api",
                "--config=/etc/origin/master/master-config.yaml",
                "--loglevel=5",
                "--listen=https://0.0.0.0:443",
                "--master=https://qe-jialiu-master-etcd-zone1-1"
            ],
            "Image": "openshift3/ose:v3.6.133",
            "Volumes": null,
            "WorkingDir": "/var/lib/origin",
            "Entrypoint": [
                "/usr/bin/openshift"
            ],
            "OnBuild": null,
            "Labels": {
                "BZComponent": "openshift-enterprise-docker",
                "Component": "openshift-enterprise-base-docker",
                "Name": "openshift3/ose",
                "Release": "1",
                "Version": "v3.6.133",
                "architecture": "x86_64",
                "authoritative-source-url": "registry.access.redhat.com",
                "build-date": "2017-07-04T06:10:21.671111",
                "com.redhat.build-host": "ip-10-29-120-57.ec2.internal",
                "com.redhat.component": "openshift-enterprise-docker",
                "description": "OpenShift Container Platform is a platform for developing, building, and deploying containerized applications.",
                "distribution-scope": "public",
                "io.k8s.description": "OpenShift Container Platform is a platform for developing, building, and deploying containerized applications.",
                "io.k8s.display-name": "OpenShift Container Platform Application Platform",
                "io.openshift.tags": "openshift,core",
                "name": "openshift3/ose",
                "release": "1",
                "summary": "Provides the latest release of Red Hat Enterprise Linux 7 in a fully featured and supported base image.",
                "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/ose/images/v3.6.133-1",
                "vcs-ref": "01c5f1bd6443ea404f405bcaebf4f875d7fed258",
                "vcs-type": "git",
                "vendor": "Red Hat, Inc.",
                "version": "v3.6.133"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "e4cc9937d218f879d1861b01fa18c14fb4e9247a1462c57f26fc5f19178ca2c1",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "00527de350ecda56cecf283a1f163f8662d30bd0c01e900a4d92e85b3d9f2dfa",
                    "EndpointID": "49ff48557b5295ce077848c909e9e27a95d5feb6c9e71420cfa69e8da766beeb",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": ""
                }
            }
        }
    }
]



# docker inspect atomic-openshift-master-controllers
[
    {
        "Id": "b70af2ed83d56e248b1b1b319dfa922ee8d047100e32b35c9a6e340b17ceb71c",
        "Created": "2017-07-05T08:53:56.877986963Z",
        "Path": "/usr/bin/openshift",
        "Args": [
            "start",
            "master",
            "controllers",
            "--config=/etc/origin/master/master-config.yaml",
            "--loglevel=5",
            "--listen=https://0.0.0.0:8444"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 22246,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2017-07-05T08:53:57.527522741Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:261f654db23fa14e08519660b341443b931573cfb1bb0879657a2e81b04bc3c4",
        "ResolvConfPath": "/var/lib/docker/containers/b70af2ed83d56e248b1b1b319dfa922ee8d047100e32b35c9a6e340b17ceb71c/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/b70af2ed83d56e248b1b1b319dfa922ee8d047100e32b35c9a6e340b17ceb71c/hostname",
        "HostsPath": "/var/lib/docker/containers/b70af2ed83d56e248b1b1b319dfa922ee8d047100e32b35c9a6e340b17ceb71c/hosts",
        "LogPath": "",
        "Name": "/atomic-openshift-master-controllers",
        "RestartCount": 0,
        "Driver": "devicemapper",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/var/run/docker.sock:/var/run/docker.sock",
                "/etc/origin:/etc/origin",
                "/etc/origin/cloudprovider:/etc/origin/cloudprovider",
                "/etc/pki:/etc/pki:ro",
                "/var/lib/origin:/var/lib/origin"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "journald",
                "Config": {}
            },
            "NetworkMode": "host",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "label=disable"
            ],
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "docker-runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "devicemapper",
            "Data": {
                "DeviceId": "159",
                "DeviceName": "docker-253:0-67183517-5a030cd7ff2a3501b4158e164713135f664d7429f9109d9bc9cd088d0c0bcedd",
                "DeviceSize": "10737418240"
            }
        },
        "Mounts": [
            {
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/etc/origin",
                "Destination": "/etc/origin",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/etc/origin/cloudprovider",
                "Destination": "/etc/origin/cloudprovider",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/etc/pki",
                "Destination": "/etc/pki",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/lib/origin",
                "Destination": "/var/lib/origin",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "qe-jialiu-master-etcd-zone1-1",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "ExposedPorts": {
                "53/tcp": {},
                "8443/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "OPTIONS=--loglevel=5 --listen=https://0.0.0.0:8444",
                "CONFIG_FILE=/etc/origin/master/master-config.yaml",
                "OPENSHIFT_DEFAULT_REGISTRY=docker-registry.default.svc:5000",
                "IMAGE_VERSION=v3.6.133",
                "NO_PROXY=.cluster.local,qe-jialiu-master-etcd-zone1-1,172.31.0.0/16,11.0.0.0/16",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "container=docker",
                "HOME=/root",
                "OPENSHIFT_CONTAINERIZED=true",
                "KUBECONFIG=/var/lib/origin/openshift.local.config/master/admin.kubeconfig"
            ],
            "Cmd": [
                "start",
                "master",
                "controllers",
                "--config=/etc/origin/master/master-config.yaml",
                "--loglevel=5",
                "--listen=https://0.0.0.0:8444"
            ],
            "Image": "openshift3/ose:v3.6.133",
            "Volumes": null,
            "WorkingDir": "/var/lib/origin",
            "Entrypoint": [
                "/usr/bin/openshift"
            ],
            "OnBuild": null,
            "Labels": {
                "BZComponent": "openshift-enterprise-docker",
                "Component": "openshift-enterprise-base-docker",
                "Name": "openshift3/ose",
                "Release": "1",
                "Version": "v3.6.133",
                "architecture": "x86_64",
                "authoritative-source-url": "registry.access.redhat.com",
                "build-date": "2017-07-04T06:10:21.671111",
                "com.redhat.build-host": "ip-10-29-120-57.ec2.internal",
                "com.redhat.component": "openshift-enterprise-docker",
                "description": "OpenShift Container Platform is a platform for developing, building, and deploying containerized applications.",
                "distribution-scope": "public",
                "io.k8s.description": "OpenShift Container Platform is a platform for developing, building, and deploying containerized applications.",
                "io.k8s.display-name": "OpenShift Container Platform Application Platform",
                "io.openshift.tags": "openshift,core",
                "name": "openshift3/ose",
                "release": "1",
                "summary": "Provides the latest release of Red Hat Enterprise Linux 7 in a fully featured and supported base image.",
                "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/openshift3/ose/images/v3.6.133-1",
                "vcs-ref": "01c5f1bd6443ea404f405bcaebf4f875d7fed258",
                "vcs-type": "git",
                "vendor": "Red Hat, Inc.",
                "version": "v3.6.133"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "4e12f33cb16a9eadb9dda42a61aa981170d5c2d9f0713cd9db00534940b87d85",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "00527de350ecda56cecf283a1f163f8662d30bd0c01e900a4d92e85b3d9f2dfa",
                    "EndpointID": "27ba4af5679ec47b2dea243d9234a5a83d14d745401ee9b38d6c186ae5dea26d",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": ""
                }
            }
        }
    }
]


Version-Release number of selected component (if applicable):
RHEL74
kernel-3.10.0-685.el7.x86_64
docker-1.12.6-40.1.gitf55a118.el7.x86_64
openshift-ansible-3.6.133-1.git.0.950bb48.el7

How reproducible:
Always

Additional info:
If this bug should be moved to the docker component, please go ahead.

Comment 2 Vivek Goyal 2017-07-05 17:59:16 UTC

So a container removal failed because the device is busy in some other mount namespace.

And I think the system then tried to create another instance of the container with the same name, which failed because the container name already exists.

So openshift first needs to make sure the previous container got deleted. And if deletion failed because the device is busy, figure out where the mount point is leaking and how.

You can try running the following script to find out where the container's mount point is still mounted.



./find-busy-mnt.sh 209fa04d9a38f1

You can find this script here.

https://github.com/rhvgoyal/misc/blob/master/find-busy-mnt.sh
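For readers who cannot fetch the script, the general idea can be sketched in Python. This is not the linked find-busy-mnt.sh, only an assumed approximation of the technique: scan every /proc/<pid>/mountinfo for a mount or device ID fragment and report which processes, and which mount namespaces, still reference it. The function names matching_mounts and find_busy_pids are hypothetical, and the scan generally needs root to read other processes' entries.

```python
import glob
import os


def matching_mounts(mountinfo_text, needle):
    """Return the mountinfo lines that mention needle (an ID fragment)."""
    return [line for line in mountinfo_text.splitlines() if needle in line]


def find_busy_pids(needle, proc_root="/proc"):
    """Map PID -> (mount namespace link, matching mountinfo lines).

    Hypothetical helper: walks <proc_root>/<pid>/mountinfo for every
    process and keeps the ones whose mount table mentions needle.
    """
    hits = {}
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "mountinfo")):
        pid_dir = os.path.dirname(path)
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                lines = matching_mounts(fh.read(), needle)
            if lines:
                # The ns/mnt link reads like "mnt:[4026532316]".
                mntns = os.readlink(os.path.join(pid_dir, "ns", "mnt"))
                hits[int(os.path.basename(pid_dir))] = (mntns, lines)
        except OSError:
            continue  # process exited meanwhile, or access was denied
    return hits


if __name__ == "__main__":
    # ID fragment taken from the "device or resource busy" error above.
    for pid, (mntns, lines) in sorted(find_busy_pids("209fa04d9a38f1").items()):
        print(pid, mntns, len(lines))
```

Any PID reported from a mount namespace other than the docker daemon's is a candidate for the leaked mount point.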

Comment 3 Vivek Goyal 2017-07-05 18:00:47 UTC

BTW, I think this problem has been around for a while. In the past, forced container removal would remove the container anyway, even if the graph driver failed to remove it, and that resulted in leaked storage.

Now upstream has changed the behavior, and container removal fails if the graph driver fails to remove the container. That means the container name cannot be reused if container removal failed.

And so this problem became visible.

Comment 4 Vivek Goyal 2017-07-05 18:04:41 UTC

QE, can you give me access to a system that is in this state? I want to look around a bit.

Comment 5 Vivek Goyal 2017-07-05 18:11:16 UTC

As usual, I need an engineer from the openshift team to break it down for me: what these two services do, how container creation happens, and what options they are run with. That might help determine how mount points are leaking.

atomic-openshift-master-api
atomic-openshift-master-controllers


Scott?

Comment 6 Vivek Goyal 2017-07-05 18:24:06 UTC

This issue has most likely surfaced due to the following commit, which was recently backported from upstream.

I think this is the right thing to do. If there is an error in container removal, it should be reported to the caller instead of masking the error and leaving all sorts of bad state/resources behind in docker, which will create problems in the future.

commit afdec061eb47e6bd602654cc1e996e674949a9c3
Author: Sergio Lopez <slp>
Date:   Thu Jun 22 15:25:40 2017 +0200

    BACKPORT: Do not remove containers from memory on error
    
    Upstream: https://github.com/moby/moby/commit/54dcbab25ea4771da303fa95e0c26f2d39487b49
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1463534
              https://bugzilla.redhat.com/show_bug.cgi?id=1460728

Comment 7 Scott Dodson 2017-07-05 18:39:04 UTC

Vivek,

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_master/templates/docker-cluster/atomic-openshift-master-api.service.j2

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_master/templates/docker-cluster/atomic-openshift-master-controllers.service.j2

Are the service definitions which lack the container cleanup as you've noted. I'm left wondering how this ever worked.

--
Scott

Comment 8 Vivek Goyal 2017-07-05 18:50:07 UTC

Scott, 

They seem to be using "force" removal of the container, which was supposed to be a developer option for debugging only, not something for production. If you start using it in production, you leave all sorts of resources behind that will never be freed.

Enough people misused this option and then complained about the thin pool being full that docker has now removed the ability to force remove a container.

A container will now be removed only if it can be removed cleanly. Otherwise the user will have to debug why the container can't be removed.

Comment 9 Scott Dodson 2017-07-05 18:53:36 UTC

Yes, sorry, I missed the ExecStartPre. So what's the suggested remedy? we're removing it via force to handle scenarios where it may not have been shut down cleanly. So we need to remove it, how do we make sure that's more successful?

Comment 10 Vivek Goyal 2017-07-05 18:57:23 UTC

For example, ExecStartPre does a forced removal of the container (notice the -f) to try to clean it up.

ExecStartPre=-/usr/bin/docker rm -f {{ openshift.common.service_type}}-master-api

And then it launches the container with the "--rm" option, which uses "-f" internally.

ExecStart=/usr/bin/docker run --rm --privileged --net=host .......

It worked in the past (and left a mess behind in docker and unreclaimable space in the thin pool) but will not work going forward.

So the real issue here is to figure out why container deletion failed and who is keeping the device busy, and then fix that.

In the past we ignored it and moved on. Can't ignore it any more.
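One way to stop ignoring the failure is a pre-start check that removes a stale container without -f and refuses to continue when removal fails. The Python sketch below is only an illustration under stated assumptions, not the openshift-ansible fix: ensure_removed and the docker wrapper are hypothetical names, and the only CLI calls used are "docker inspect --type container NAME" and "docker rm NAME". A non-zero exit from inspect is treated as "no such container", which would also hide an unreachable daemon.

```python
import subprocess


def docker(*args):
    """Run the docker CLI and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(["docker", *args], stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def ensure_removed(name, run=docker):
    """Return (ok, message); ok is True only if no container `name` remains.

    Hypothetical helper. It never forces removal, so a failure such as
    "device or resource busy" is surfaced to the caller instead of masked.
    """
    code, _ = run("inspect", "--type", "container", name)
    if code != 0:
        # Assumption: inspect fails only because the container is absent.
        return True, "no existing container"
    code, out = run("rm", name)
    if code == 0:
        return True, "stale container removed"
    return False, out.strip()


if __name__ == "__main__":
    ok, message = ensure_removed("atomic-openshift-master-api")
    print(message)
    raise SystemExit(0 if ok else 1)
```

The run parameter exists so the logic can be exercised without a docker daemon. A unit would still need the underlying busy mount fixed; the check only makes the failure explicit before "docker run" hits the name conflict.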

Comment 11 Vivek Goyal 2017-07-05 18:58:33 UTC

(In reply to Scott Dodson from comment #9)
> Yes, sorry, I missed the ExecStartPre. So what's the suggested remedy? we're
> removing it via force to handle scenarios where it may not have been shut
> down cleanly. So we need to remove it, how do we make sure that's more
> successful?

The error message suggests that something is keeping the container device/mount point busy. We need to figure out who is keeping it busy and why.

I suggested my script find-busy-mnt.sh as a starting point.

Comment 12 Vivek Goyal 2017-07-05 19:03:11 UTC

If I can get access to a system that is experiencing this issue, I would like to have a look.

Can somebody please also provide the "docker info" output?

Comment 13 Andrew Butcher 2017-07-05 19:58:49 UTC

Reproduced with a controllers restart.

Jul 05 15:48:51 master1.abutcher.com systemd[1]: Starting Atomic OpenShift Master Controllers...
Jul 05 15:48:52 master1.abutcher.com atomic-openshift-master-controllers[4412]: Error response from daemon: Driver devicemapper failed to remove root filesystem d0301573d9c467191de9a9927fcb1b3b7be911726c49ab535a77b1d6e076b277: remove /var/lib/docker/devicemapper/mnt/25c1debb83af5d9
Jul 05 15:48:52 master1.abutcher.com atomic-openshift-master-controllers[4420]: /usr/bin/docker-current: Error response from daemon: Conflict. The name "/atomic-openshift-master-controllers" is already in use by container d0301573d9c467191de9a9927fcb1b3b7be911726c49ab535a77b1d6e076
Jul 05 15:48:52 master1.abutcher.com atomic-openshift-master-controllers[4420]: See '/usr/bin/docker-current run --help'.
Jul 05 15:48:52 master1.abutcher.com systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=125/n/a

[root@master1 ~]# docker ps -a
CONTAINER ID        IMAGE                                   COMMAND                  CREATED             STATUS                         PORTS               NAMES
d0301573d9c4        openshift3/ose:v3.6.134                 "/usr/bin/openshift s"   14 minutes ago      Dead                                               atomic-openshift-master-controllers

[root@master1 ~]# docker info
Containers: 9
 Running: 4
 Paused: 0
 Stopped: 5
Images: 6
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-253:0-22057-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 4.767 GB
 Data Space Total: 107.4 GB
 Data Space Available: 4.53 GB
 Metadata Space Used: 5.267 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.142 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: null bridge overlay host
 Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-514.16.1.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 1
Total Memory: 1.796 GiB
Name: master1.abutcher.com
ID: QY6W:IQ7I:YQGM:DQMG:F2D6:YWQW:ZGRD:AFQZ:PPGL:N7YO:XMF3:CZ4X
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
No Proxy: .cluster.local,.svc,master1.abutcher.com
Registry: https://brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888
 127.0.0.0/8
Registries: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888 (insecure), registry.access.redhat.com (secure), registry.access.redhat.com (secure), docker.io (secure)

[root@master1 ~]# ./find-busy-mnt.sh d0301573d9c4
PID     NAME            MNTNS
2222    openshift               mnt:[4026532316]
2222    openshift               mnt:[4026532316]
2222    openshift               mnt:[4026532316]
2275    journalctl              mnt:[4026532316]
2275    journalctl              mnt:[4026532316]
2275    journalctl              mnt:[4026532316]

2222: /usr/bin/openshift start node --config=/etc/origin/node/node-config.yaml --loglevel=2
2275: journalctl -k -f
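
The find-busy-mnt.sh script itself is not shown in this bug. As a rough sketch only, assuming it works by searching each process's /proc/<pid>/mountinfo for the identifier it is given, the same idea can be written in Python. The function names and the sample mountinfo line in the usage note are hypothetical.

```python
import os


def mount_points_matching(mountinfo_text, needle):
    """Return the mount points of mountinfo lines that mention `needle`."""
    hits = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # The fifth field (index 4) of a mountinfo line is the mount point.
        if needle in line and len(fields) > 4:
            hits.append(fields[4])
    return hits


def busy_pids(needle, proc_root="/proc"):
    """Map PID -> matching mount points for every readable process."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "mountinfo")) as fh:
                hits = mount_points_matching(fh.read(), needle)
        except OSError:
            # The process exited or is not readable; skip it.
            continue
        if hits:
            result[int(entry)] = hits
    return result
```

Calling busy_pids("<id>") as root would list processes that still hold a matching mount, which is the kind of information the output above reports for PIDs 2222 and 2275.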

Comment 14 Vivek Goyal 2017-07-05 20:39:23 UTC

Andrew, can you give me access to the system where you see the issue? I want to look at a lot of things. A few follow-up questions:

- I assume this is a RHEL 7.4 kernel?
- Is /proc/sys/fs/may_detach_mounts set to 1 or 0?
- Is the docker daemon running in the host mount namespace or in a slave mount namespace (MountFlags=slave in docker.service)?
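
The last two questions can be answered from /proc. The following is a minimal sketch, not a tool used in this bug: it reads the may_detach_mounts sysctl and compares a process's mount namespace link (the "mnt:[...]" text also printed by find-busy-mnt.sh) with that of PID 1. The helper names are hypothetical, and reading another process's namespace link generally requires root.

```python
import os


def ns_inode(link_text):
    """Extract the inode number from text such as 'mnt:[4026532316]'."""
    kind, _, rest = link_text.partition(":[")
    if kind != "mnt" or not rest.endswith("]"):
        raise ValueError("not a mount namespace link: %r" % link_text)
    return int(rest[:-1])


def shares_host_mount_ns(pid, proc_root="/proc"):
    """True if `pid` is in the same mount namespace as PID 1."""
    host = os.readlink(os.path.join(proc_root, "1", "ns", "mnt"))
    other = os.readlink(os.path.join(proc_root, str(pid), "ns", "mnt"))
    return ns_inode(host) == ns_inode(other)


def may_detach_mounts(path="/proc/sys/fs/may_detach_mounts"):
    """Return the sysctl as an int, or None if the kernel does not expose it."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except FileNotFoundError:
        return None
```

If shares_host_mount_ns(<dockerd pid>) returns False, the daemon is running in its own mount namespace, which is what MountFlags=slave would produce.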

I am assuming that these two processes, "openshift" and "journalctl", are part of atomic-openshift-master-controllers. Can somebody confirm that?

If yes, the question is why these two processes are still running. Why did container stop not stop them, so that they release the container rootfs and the container can be removed?

Can somebody try stopping the master controllers service and see whether it actually stops the container and its processes?

Comment 16 Johnny Liu 2017-07-06 12:30:53 UTC

Stopping the controllers service and then starting it succeeds with no such issue, but restarting it directly hits the issue.

[root@openshift-137 ~]# docker ps -a|grep controllers
b21fd5639f19        openshift3/ose:v3.6.133                 "/usr/bin/openshift s"   19 seconds ago      Up 17 seconds                           atomic-openshift-master-controllers
[root@openshift-137 ~]# sh find-busy-mnt.sh b21fd5639f19
PID	NAME		MNTNS
24542	dockerd-current		mnt:[4026532134]
24549	docker-containe		mnt:[4026532134]
24743	docker-containe		mnt:[4026532134]
24842	docker-containe		mnt:[4026532134]
24877	docker-containe		mnt:[4026532134]
25083	docker-containe		mnt:[4026532134]
25116	docker-containe		mnt:[4026532134]
[root@openshift-137 ~]# service atomic-openshift-master-controllers stop
Redirecting to /bin/systemctl stop atomic-openshift-master-controllers.service
[root@openshift-137 ~]# sh find-busy-mnt.sh b21fd5639f19
No pids found
[root@openshift-137 ~]# docker ps -a|grep controllers
[root@openshift-137 ~]# service atomic-openshift-master-controllers start
Redirecting to /bin/systemctl start atomic-openshift-master-controllers.service
[root@openshift-137 ~]# docker ps -a|grep controllers
04962992cc5a        openshift3/ose:v3.6.133                 "/usr/bin/openshift s"   16 seconds ago       Up 15 seconds                           atomic-openshift-master-controllers
[root@openshift-137 ~]# sh find-busy-mnt.sh 04962992cc5a
PID	NAME		MNTNS
24542	dockerd-current		mnt:[4026532134]
24549	docker-containe		mnt:[4026532134]
24743	docker-containe		mnt:[4026532134]
24842	docker-containe		mnt:[4026532134]
24877	docker-containe		mnt:[4026532134]
25083	docker-containe		mnt:[4026532134]
25774	docker-containe		mnt:[4026532134]

Comment 17 Vivek Goyal 2017-07-06 14:00:13 UTC

I think this issue is caused by mount points leaking through the "-v /:/rootfs" option. For example, atomic-openshift-node uses it, so it sees other containers' rootfs mounts.

oci-umount should fix it, but it is not installed by default. I have opened a bug to install oci-umount by default.

https://bugzilla.redhat.com/show_bug.cgi?id=1468244

Even with it installed, oci-umount somehow does not work on this system. I see the following error message.

Jul 06 09:51:58 openshift-137.lab.sjc.redhat.com oci-umount[9745]: umounthook <info>: Could not find mapping for mount [/var/lib/docker/devicemapper] from host to conatiner. Skipping.

It can't figure out that /var/lib/docker/devicemapper on the host is mounted on <container-root>/rootfs/var/lib/docker inside the container.

I need to debug why that's the case.

Comment 18 Vivek Goyal 2017-07-06 15:23:16 UTC

I built the latest oci-umount from upstream, and it seems to work in the sense that it can figure out that /var/lib/docker/devicemapper maps to /rootfs/var/lib/docker/devicemapper inside the container.

So we will need to rebuild the docker package with the latest oci-umount from upstream. I have already pinged lokesh about it.

It still does not work for the atomic-openshift-node container, though. I think the reason is that the additional volume mount (-v /var/lib/docker:/var/lib/docker) keeps the mount point busy.

IOW, if I do.

docker run -ti -v /:/rootfs fedora bash

oci-umount is working.

But if I do

docker run -ti -v /:/rootfs -v /var/lib/docker:/var/lib/docker fedora bash

it is not working.

Maybe it is getting confused because /var/lib/docker/devicemapper is now visible at two places inside the container: /rootfs/var/lib/docker/devicemapper and /var/lib/docker/devicemapper.

I will look into it.

BTW, why do we need to volume mount /var/lib/docker inside the container?

CC eparis.

Comment 19 Scott Dodson 2017-07-06 15:51:55 UTC

I'm not certain, perhaps to monitor storage usage? It's been in there for the past 1.5yrs though. We could test without it.

Comment 20 Scott Dodson 2017-07-06 18:08:36 UTC

The node needs to mount /var/lib/docker in order to calculate container storage usage.

Comment 21 Vivek Goyal 2017-07-06 18:12:05 UTC

Scott, can you give some more details? Which files is the node looking at? oci-umount will unmount /var/lib/docker/devicemapper and /var/lib/docker/containers; will that break it? The rootfs of other containers is leaking into the node container, and that makes removal of those containers fail.

I think that whatever data you need, you will have to get it through the docker API (docker info or docker inspect).

You can't expect /var/lib/docker/devicemapper or /var/lib/docker/containers to be mounted inside your container.

Comment 22 Vivek Goyal 2017-07-06 18:18:54 UTC

Did we try asking the docker API for the required data (instead of poking at docker internal metadata directly)?

Comment 23 Seth Jennings 2017-07-06 18:24:33 UTC

Vivek,

Kubernetes uses cadvisor to gather disk usage stats on a root/image filesystem basis and the writable layer on a per-container basis.  It does not use the docker API to figure out the size of the writable layer.  Does the docker API provide that information?

Comment 25 Vivek Goyal 2017-07-06 18:42:36 UTC

(In reply to Seth Jennings from comment #23)
> Vivek,
> 
> Kubernetes uses cadvisor to gather disk usage stats on a root/image
> filesystem basis and the writable layer on a per-container basis.  It does
> not use the docker API to figure out the size of the writable layer.  Does
> the docker API provide that information?

Seth, 

I believe "docker ps -s" gives the layer size. But there might not be a way to request it for a specific container, and that's probably why it is very slow.

So what size are you trying to look at? The layer size (changes made by the container)? I believe that's what "-s" provides.

And how do you determine that just by looking at container metadata? You are probably running some tool (df) on the container mountpoint, and that itself is racy w.r.t. container removal.

Comment 26 Vivek Goyal 2017-07-06 18:47:15 UTC

Anyway, another (less preferred) option is to rely on new kernel functionality that forcibly removes mounts from other mount namespaces when the mountpoint directory is removed.

This is how it will work.

- It needs a 7.4 kernel.
- It requires /proc/sys/fs/may_detach_mounts to be 1.
- It requires deferred device removal and deferred device deletion to be turned on.


Now when a container is removed, its device will be deferred deleted (despite being busy). Then we unmount the container rootfs on the host and remove that directory, which removes the leaked mount points as well (except when a container process is inside the mount point being removed).

This will not work on a 7.3 kernel though.

So using oci-umount is more generic and will work on both 7.3 and 7.4 kernels, as long as we can figure out how to avoid poking at docker metadata directly.
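
The prerequisites for the kernel-based option can be checked from `docker info` output and the sysctl value. The sketch below is an illustration under assumptions, not part of any fix: it parses the "Deferred Removal Enabled" and "Deferred Deletion Enabled" lines in the format shown in comment 13, takes the may_detach_mounts value as an argument, and leaves the kernel version check to the reader. The function names are hypothetical.

```python
def parse_docker_info(text):
    """Turn 'Key: value' lines of `docker info` output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def forced_detach_prereqs(docker_info_text, may_detach_mounts):
    """Report which prerequisites (other than the kernel version) are met."""
    info = parse_docker_info(docker_info_text)
    return {
        "may_detach_mounts": may_detach_mounts == 1,
        "deferred_removal": info.get("Deferred Removal Enabled") == "true",
        "deferred_deletion": info.get("Deferred Deletion Enabled") == "true",
    }
```

All three entries must be True, on a 7.4 kernel, for the approach described in this comment to apply.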

Comment 27 Vivek Goyal 2017-07-06 19:55:25 UTC

Ian, Seth, Derek and I had some conversations about this issue. oci-umount takes the /var/lib/docker/devicemapper, /var/lib/docker/overlay2 and /var/lib/docker/containers mount points away from the container, and there was concern that the cadvisor disk stats feature might break.

Seth and Derek said that the disk stats feature is currently supposed to work only with the overlay graph driver. It looks like that is also broken in a containerized environment at the moment, because mount points under /var/lib/docker/overlay2 don't propagate. A cadvisor container only sees the mount points that exist when it starts, not the mount points of containers launched later. IOW, in a containerized environment the disk stats feature is probably broken on overlay2 as well.

So the idea was to not volume mount /var/lib/docker inside the container (-v /var/lib/docker:/var/lib/docker); this most likely should be fine. They could not remember anything else depending on it.

Can we test the latest docker (docker-1.12.6-41.1.gitf55a118.el7) with the /var/lib/docker/ volume mount removed from the /etc/systemd/system/atomic-openshift-node.service file and see if the problem is fixed?

I am not sure who owns the /etc/systemd/system/atomic-openshift-node.service file. They will have to make the appropriate changes if this does fix the issue.

Comment 28 Vivek Goyal 2017-07-06 20:23:44 UTC

QE, please do the following steps.

- Install/upgrade to docker -41 (docker-1.12.6-41.1.gitf55a118.el7)
- Edit /etc/systemd/system/atomic-openshift-node.service file and remove string "-v /var/lib/docker:/var/lib/docker"
- systemctl daemon-reload
- systemctl start atomic-openshift-node
- systemctl restart atomic-openshift-master-api

If this works, then we are in good shape.
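
The edit step above can be scripted. This is a minimal sketch that removes the exact string "-v /var/lib/docker:/var/lib/docker" from unit file text while leaving longer paths alone; the function name and the ExecStart line in the usage note are hypothetical, since the real unit file is not shown in this bug.

```python
import re

FLAG = "-v /var/lib/docker:/var/lib/docker"


def drop_volume_flag(unit_text, flag=FLAG):
    """Remove `flag` (and the whitespace before it) from unit file text."""
    # Match only when the flag is followed by whitespace or the end of the
    # text, so a longer mount such as /var/lib/docker/network is untouched.
    pattern = re.compile(r"[ \t]*" + re.escape(flag) + r"(?=\s|$)")
    return pattern.sub("", unit_text)
```

For example, a line such as "ExecStart=/usr/bin/docker run -v /:/rootfs -v /var/lib/docker:/var/lib/docker image" would come back as "ExecStart=/usr/bin/docker run -v /:/rootfs image". The result still has to be written back to the file, followed by systemctl daemon-reload as in the steps above.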

Comment 29 Johnny Liu 2017-07-07 08:29:49 UTC

(In reply to Vivek Goyal from comment #28)
> QE, please do following steps.
> 
> - Install/upgrade to docker -41 (docker-1.12.6-41.1.gitf55a118.el7)
> - Edit /etc/systemd/system/atomic-openshift-node.service file and remove
> string "-v /var/lib/docker:/var/lib/docker"
> - systemctl daemon-reload
> - systemctl start atomic-openshift-node
> - systemctl restart atomic-openshift-master-api
> 
> If this works, then we are in good shape.

As you mentioned in your comments above, after removing the string "-v /var/lib/docker:/var/lib/docker", the master api/controllers services restart successfully even with docker -40.

So my questions are:
1). If we remove the string "-v /var/lib/docker:/var/lib/docker", why do we need to update the docker version?
2). If we remove the string "-v /var/lib/docker:/var/lib/docker", how do we address comment 23? It relates to OpenShift functionality.


And I have a new finding:
if /proc/sys/fs/may_detach_mounts is set to 1, the master services can be restarted successfully even with docker -40 and without removing "-v /var/lib/docker:/var/lib/docker" from the node service.

Comment 30 Vivek Goyal 2017-07-07 12:38:32 UTC

(In reply to Johnny Liu from comment #29)
> (In reply to Vivek Goyal from comment #28)
> > QE, please do following steps.
> > 
> > - Install/upgrade to docker -41 (docker-1.12.6-41.1.gitf55a118.el7)
> > - Edit /etc/systemd/system/atomic-openshift-node.service file and remove
> > string "-v /var/lib/docker:/var/lib/docker"
> > - systemctl daemon-reload
> > - systemctl start atomic-openshift-node
> > - systemctl restart atomic-openshift-master-api
> > 
> > If this works, then we are in good shape.
> 
> Just like what you mentioned in your above comments, after removing string
> "-v /var/lib/docker:/var/lib/docker", master api/controllers service is
> restarted successfully even with docker -40.
> 
> So my question is:
> 1). if removing string "-v /var/lib/docker:/var/lib/docker", why need update
> docker version.

Did you test on the same node that you gave me for testing, or on a different node? I had replaced /usr/libexec/oci/hooks.d/oci-umount on that node with the upstream version, which is included in -41. In my testing, the oci-umount included with -40 was not working for some reason.


> 2). if remove string "-v /var/lib/docker:/var/lib/docker", how to resolve
> comment #c23, it is related to openshift functionality.
> 

We talked to Seth, and cadvisor should not be affected too negatively. See the details in comment 27.

> 
> And I have a new finding:
> if /proc/sys/fs/may_detach_mounts is set to 1, even with docker -40, not
> removing "-v /var/lib/docker:/var/lib/docker" from node service, master
> service could be restarted successfully.

Right. I mentioned this option in comment 26. But this will only work with 7.4 kernel and not with 7.3 kernel. I am trying to find a solution which works with 7.3 kernel as well.

Comment 31 Vivek Goyal 2017-07-07 18:02:41 UTC

Ok, I have queued a PR for oci-umount to be able to handle multiple mappings of same source at multiple destinations inside container and be able to unmount all of them.

https://github.com/projectatomic/oci-umount/pull/10

With this PR, there is no need to remove "-v /var/lib/docker/:/var/lib/docker" from atomic-openshift-node.service file.

oci-umount will recognize that /var/lib/docker/devicemapper is mounted at two places inside container and unmount both of these. That way container mount points will not be busy inside node container and restart of master-api and master-controllers should work fine.

Comment 32 Vivek Goyal 2017-07-07 18:07:21 UTC

For example, if a container is run with the following:

docker run -ti -v /:/rootfs -v /var/lib/docker/:/var/lib/docker fedora bash

After the above PR, oci-umount will see /var/lib/docker/devicemapper on the host mounted at two places inside the container.

/var/lib/docker/devicemapper
/rootfs/var/lib/docker/devicemapper

And it will unmount both, which removes the leak of other containers' rootfs mounts in the system.
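
Why one host directory shows up twice can be modelled with a few lines of path arithmetic. The sketch below is a simplified illustration, not oci-umount's actual algorithm: it ignores mount propagation and only maps a host path through each -v source:destination pair. The function name is hypothetical.

```python
import posixpath


def container_views(host_path, volumes):
    """List the container paths at which `host_path` is visible.

    `volumes` is a list of (host_source, container_destination) pairs,
    one per -v option.
    """
    host_path = posixpath.normpath(host_path)
    views = []
    for src, dst in volumes:
        src = posixpath.normpath(src)
        if src == "/":
            rel = host_path.lstrip("/")
        elif host_path == src:
            rel = ""
        elif host_path.startswith(src + "/"):
            rel = host_path[len(src) + 1:]
        else:
            # This volume does not cover the host path at all.
            continue
        views.append(posixpath.normpath(posixpath.join(dst, rel)))
    return views
```

With the two -v options from the docker run example in this comment, container_views("/var/lib/docker/devicemapper", [("/", "/rootfs"), ("/var/lib/docker/", "/var/lib/docker")]) yields both paths listed above, and each one has to be unmounted before the mount stops being busy.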

Comment 33 Johnny Liu 2017-07-11 07:32:12 UTC

(In reply to Vivek Goyal from comment #30)
> (In reply to Johnny Liu from comment #29)
> > (In reply to Vivek Goyal from comment #28)
> > So my question is:
> > 1). if removing string "-v /var/lib/docker:/var/lib/docker", why need update
> > docker version.
> 
> Did you test on same node which you gave me for testing or on a different
> node. I had replaced /usr/libexec/oci/hooks.d/oci-umount on this node with
> upstream version. That's included in -41. In my testing oci-umount included
> with -40 was not working for some reason.
Good to know. 

Today I tried the same steps on a new install: with -40 and without removing the string "-v /var/lib/docker:/var/lib/docker", I still hit the issue.

After I updated docker to -41, still without removing the string "-v /var/lib/docker:/var/lib/docker", the issue disappeared.

I also tried -40 without removing the string "-v /var/lib/docker:/var/lib/docker" plus echo 1 >/proc/sys/fs/may_detach_mounts, and did not hit the issue.

As I understand it, our final resolution is:
For the RHEL74 kernel: keep the string "-v /var/lib/docker:/var/lib/docker" and use -41 docker.

For RHEL73, our plan is also to keep the string "-v /var/lib/docker:/var/lib/docker" and use -41 docker, while still trying to find a resolution.

Am I right?

Comment 34 Vivek Goyal 2017-07-11 11:19:54 UTC

I think we should upgrade to -41 and remove "-v /var/lib/docker:/var/lib/docker"; that should work on both 7.3 and 7.4 kernels.

Comment 35 Scott Dodson 2017-07-11 12:43:43 UTC

Seth, is it safe to remove the /var/lib/docker mount in all versions back to 3.4?

Comment 36 Seth Jennings 2017-07-11 14:26:11 UTC

Scott, from what I'm hearing on the list, it doesn't seem like it would be an issue.  I guess I'm as sure it is safe as I am sure it is a problem already, i.e. the orphaned thin devices eating the storage pool.

Comment 40 Scott Dodson 2017-07-13 00:12:37 UTC

*** Bug 1470389 has been marked as a duplicate of this bug. ***

Comment 41 Scott Dodson 2017-07-13 00:38:54 UTC

https://github.com/openshift/openshift-ansible/pull/4748 removes /var/lib/docker mount and installs oci-umount and runc

Comment 42 Johnny Liu 2017-07-13 06:58:18 UTC

Today I tried the following scenarios on RHEL7.3 (kernel-3.10.0-514.25.2.el7.x86_64):
1). docker -40 + no oci-umount+runc installed + not removing "/var/lib/docker", FAIL
2). docker -40 + oci-umount+runc installed + not removing "/var/lib/docker", FAIL
3). docker -40 + oci-umount+runc installed + removing "/var/lib/docker", PASS
4). docker -45 + no runc installed + not removing "/var/lib/docker", PASS.


So according to my test results above, docker -45 makes both rhel73 and rhel74 work without needing to remove "/var/lib/docker".

Could anyone confirm? If so, I think we do not need to remove "/var/lib/docker" from the node; a minimal code change reduces the chance of introducing a new regression.

Comment 43 Scott Dodson 2017-07-19 22:18:24 UTC

The CI jobs have been failing on my PR to remove /var/lib/docker because the node expects to be able to write to /var/lib/docker/network path.

Given that docker-1.12.6-40 was an interim build and that QE says docker-1.12.6-45 works without removing /var/lib/docker I'd like to defer making the change to remove /var/lib/docker from the node. 

Marking ON_QA for QE to test with docker-1.12.6-47.git0fdc778.el7 which is the latest build attached to the 7.4 errata.

Vivek, do you think it's critical that we remove /var/lib/docker right now?

Comment 44 Johnny Liu 2017-07-20 07:08:38 UTC

Re-tested docker-1.12.6-47.git0fdc778.el7.x86_64 on both RHEL74 (kernel-3.10.0-693.el7.x86_64) and RHEL73 (kernel-3.10.0-514.26.1.el7.x86_64) without removing /var/lib/docker from the node; both work well.


Moving back to "ASSIGNED" pending a final decision.

Comment 45 Vivek Goyal 2017-07-20 12:09:57 UTC

(In reply to Scott Dodson from comment #43)
> The CI jobs have been failing on my PR to remove /var/lib/docker because the
> node expects to be able to write to /var/lib/docker/network path.
> 
> Given that docker-1.12.6-40 was an interim build and that QE says
> docker-1.12.6-45 works without removing /var/lib/docker I'd like to defer
> making the change to remove /var/lib/docker from the node. 
> 
> Marking ON_QA for QE to test with docker-1.12.6-47.git0fdc778.el7 which is
> the latest build attached to the 7.4 errata.
> 
> Vivek, do you think it's critical that we remove /var/lib/docker right now?

Scott,

oci-umount can now remove multiple mounts inside a container, so removing the /var/lib/docker/ volume mount is not strictly necessary (though it would be nice).

We found a bug in oci-umount that made it crash under certain conditions. It has now been fixed in build docker-2:1.12.6-48.git0fdc778.

So please use docker -48 for all your future testing and deployments.
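
The build strings quoted in this bug can be compared by release number to tell whether a host has at least the -48 build. This is a hedged sketch: the regular expression only covers the docker-[epoch:]version-release forms that appear in this thread, the threshold of 48 is taken from this comment, and the function names are hypothetical.

```python
import re

# Matches e.g. "docker-1.12.6-41.1.gitf55a118.el7" or
# "docker-2:1.12.6-48.git0fdc778" (optional epoch before the version).
_NVR = re.compile(r"^docker-(?:\d+:)?(?P<version>\d+(?:\.\d+)*)-(?P<release>\d+)")


def docker_release(nvr):
    """Return (version, release number) parsed from a docker build string."""
    match = _NVR.match(nvr)
    if match is None:
        raise ValueError("unrecognized docker build string: %r" % nvr)
    return match.group("version"), int(match.group("release"))


def at_least_release(nvr, minimum_release=48, version="1.12.6"):
    """True if `nvr` is the given version at or above `minimum_release`."""
    found_version, release = docker_release(nvr)
    return found_version == version and release >= minimum_release
```

The comparison is only meaningful between builds of the same docker version, which is why the version is checked for equality rather than ordered.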

Comment 46 Scott Dodson 2017-07-20 12:54:50 UTC

Moving this to 3.6.1, clearing regression and testblocker flags. I'll try to coordinate with networking and other teams whether or not we can remove /var/lib/docker from the mounts.

Comment 47 Scott Dodson 2017-08-14 15:41:13 UTC

We cannot remove /var/lib/docker volume today and the problem no longer exists in docker-1.12.6-48 so closing this.

Comment 48 Johnny Liu 2017-08-15 06:19:47 UTC

This issue was reported against docker -40, so it should not be closed as "NOTABUG"; changing to "CURRENTRELEASE".