Bug 1436731

Summary: Few builds failing due to ContainerCannotRun Error
Product: Red Hat Enterprise Linux 7 Reporter: Vikas Laad <vlaad>
Component: dockerAssignee: Mrunal Patel <mpatel>
Status: CLOSED ERRATA QA Contact: DeShuai Ma <dma>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0CC: accounts+fedora, amurdaca, aos-bugs, jhonce, jokerman, lsm5, lsu, mifiedle, mmccomas, smahajan, vgoyal, vlaad
Target Milestone: rcKeywords: Extras
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 00:11:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
desc pod details
none
docker info none

Description Vikas Laad 2017-03-28 14:06:54 UTC
Description of problem:
While running concurrent builds test I am seeing few failures with following error.

root@ip-172-31-25-76: ~/svt/openshift_performance/ci/scripts # oc get pods --all-namespaces | grep ContainerCannotRun
proj23      cakephp-mysql-example-2-build   0/1       ContainerCannotRun   0          8m
proj25      cakephp-mysql-example-2-build   0/1       ContainerCannotRun   0          7m
proj27      cakephp-mysql-example-3-build   0/1       ContainerCannotRun   0          1m
proj5       cakephp-mysql-example-1-build   0/1       ContainerCannotRun   0          11m

oc describe pod has following error in events
  5m            5m              1       {kubelet ip-172-31-34-20.us-west-2.compute.internal}                                 Warning          FailedSync      Error syncing pod, skipping: failed to "StartContainer" for "sti-build" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"linux runtime spec devices: lstat /dev/dm-22: no such file or directory\"}"


Version-Release number of selected component (if applicable):
openshift v3.5.3
kubernetes v1.5.2+43a9be4
etcd 3.1.0


Steps to Reproduce:
1. Create cluster with 1 master 1 infra and 2 worker nodes
2. Run 24 concurrent cakephp builds 

So far I have seen 5 failure out of 165 builds.

Actual results:
ContainerCannotRun

Expected results:
Build should complete.

Additional info:
See full describe pod details attached.

Comment 1 Vikas Laad 2017-03-28 14:08:51 UTC
Created attachment 1266965 [details]
desc pod details

Comment 2 Jhon Honce 2017-03-28 21:47:45 UTC
How was your storage configured?  [docker|container]-storage-setup or openshift-ansible?

Please attach:

- /etc/sysconfig/*-storage-setup
- docker log

Please attach results from:

# docker info
# lvs
# vgs
# dmsetup table
# dmsetup status
# systemctl status docker.service

Thanks

Comment 3 Vikas Laad 2017-03-29 14:42:06 UTC
Storage was configured using openshift-ansible

/etc/sysconfig/docker-storage-setup has following line
VG=docker_vg

docker logs attached
docker info attached

root@ip-172-31-34-20: ~ # lvs
  LV          VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  docker-pool docker_vg twi-aot--- 29.70g             85.39  46.64                           

root@ip-172-31-34-20: ~ # vgs
  VG        #PV #LV #SN Attr   VSize  VFree  
  docker_vg   1   1   0 wz--n- 30.00g 236.00m


root@ip-172-31-34-20: ~ # dmsetup table
docker_vg-docker--pool_tdata: 0 25026560 linear 202:16 67584
docker_vg-docker--pool_tdata: 25026560 37265408 linear 202:16 25159680
docker-202:2-14037750-a94dc897c6727356a0369871ae12903c0ae6254c8fa47aab6f237bcc660f7acb: 0 20971520 thin 253:2 31
docker_vg-docker--pool_tmeta: 0 65536 linear 202:16 25094144
docker-202:2-14037750-6e62818c646dc993c4178c4b0a5ad3afb0ed60c9c418b3bc3f0723266e4dda2b: 0 20971520 thin 253:2 34
docker_vg-docker--pool: 0 62291968 thin-pool 253:0 253:1 1024 24332 1 skip_block_zeroing 

root@ip-172-31-34-20: ~ # dmsetup status
docker_vg-docker--pool_tdata: 0 25026560 linear 
docker_vg-docker--pool_tdata: 25026560 37265408 linear 
docker-202:2-14037750-a94dc897c6727356a0369871ae12903c0ae6254c8fa47aab6f237bcc660f7acb: 0 20971520 thin 496640 20971519
docker_vg-docker--pool_tmeta: 0 65536 linear 
docker-202:2-14037750-6e62818c646dc993c4178c4b0a5ad3afb0ed60c9c418b3bc3f0723266e4dda2b: 0 20971520 thin 946176 20971519
docker_vg-docker--pool: 0 62291968 thin-pool 126045 3821/8192 51947/60832 - rw no_discard_passdown queue_if_no_space - 


root@ip-172-31-34-20: ~ # systemctl status docker.service
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-03-27 12:06:00 EDT; 1 day 22h ago
     Docs: http://docs.docker.com
 Main PID: 14089 (dockerd-current)
   Memory: 379.3M
   CGroup: /system.slice/docker.service
           ├─14089 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --authorization-plugin=rhel-push-plugin --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libe...
           ├─14096 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --runtime ...
           ├─16588 /usr/bin/docker-containerd-shim-current e7db9efaaf3b6a2ba1b945bd2dd3e448c0a64f03ce6fc558cabf1a85be0b3b32 /var/run/docker/libcontainerd/e7db9efaaf3b6a2ba1b945bd2dd3e448c0a64f03ce6fc558cabf1a85be0b3b32 /usr/libexec/docker/docker-runc...
           └─17092 /usr/bin/docker-containerd-shim-current d3c1bade7012e8634b47e6e7e3eac4544215c5d9222ea716b2ecdaf94b3b2330 /var/run/docker/libcontainerd/d3c1bade7012e8634b47e6e7e3eac4544215c5d9222ea716b2ecdaf94b3b2330 /usr/libexec/docker/docker-runc...

Mar 29 09:54:31 ip-172-31-34-20.us-west-2.compute.internal oci-systemd-hook[17776]: systemdhook <debug>: Skipping as container command is /usr/bin/openshift-sti-build, not init or systemd
Mar 29 09:54:33 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:54:33.024339234-04:00" level=info msg="{Action=stop, LoginUID=4294967295, PID=62198}"
Mar 29 09:54:33 ip-172-31-34-20.us-west-2.compute.internal oci-register-machine[17839]: 2017/03/29 09:54:33 Register machine: poststop af81974cfba51e7727a582c6b10a923f7a208f8e1c81d2366d5594e59acb0936 0 /var/lib/docker/devicemapper/mnt/5...eb89f72/rootfs
Mar 29 09:54:33 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:54:33.430639015-04:00" level=info msg="{Action=stop, LoginUID=4294967295, PID=62198}"
Mar 29 09:54:33 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:54:33.434910013-04:00" level=error msg="Handler for POST /v1.24/containers/af81974cfba51e7727a582c6b10a923f7a208f8e1c81d2366d5594e59a...ready stopped"
Mar 29 09:54:33 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:54:33.435028249-04:00" level=error msg="Handler for POST /v1.24/containers/af81974cfba51e7727a582c6b10a923f7a208f8e1c81d2366d5594e59a...ready stopped"
Mar 29 09:55:07 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:55:07.432215546-04:00" level=info msg="{Action=remove, LoginUID=4294967295, PID=62198}"
Mar 29 09:55:07 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:55:07.475223495-04:00" level=info msg="{Action=remove, LoginUID=4294967295, PID=62198}"
Mar 29 09:58:02 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:58:02.518478998-04:00" level=info msg="{Action=remove, LoginUID=4294967295, PID=62198}"
Mar 29 09:58:02 ip-172-31-34-20.us-west-2.compute.internal dockerd-current[14089]: time="2017-03-29T09:58:02.548314189-04:00" level=info msg="{Action=remove, LoginUID=4294967295, PID=62198}"
Hint: Some lines were ellipsized, use -l to show in full.

Comment 4 Vikas Laad 2017-03-29 14:44:56 UTC
Created attachment 1267313 [details]
docker info

Comment 6 Andrew Gunnerson 2017-04-17 20:06:06 UTC
This race condition was fixed upstream with this commit: https://github.com/opencontainers/runc/commit/8f55948aa5c97472c4a8ef6664b10ed662ce1df5

Can this patch be included in the docker package? (It would need to patch the vendored runc dependency)

Comment 8 Vikas Laad 2017-06-14 19:27:15 UTC
Vivek,

If you think lstat error is fixed this bug needs to be updated.

Comment 9 Vivek Goyal 2017-06-14 19:36:14 UTC
Yes, lstat error is fixed in docker version docker-2:1.12.6-18.git29d6f69

Comment 10 Vikas Laad 2017-06-16 13:54:47 UTC
Completed 100 cycle of concurrent builds on following version of docker, did not see this error.

root@ip-172-31-31-51: ~ # openshift version
openshift v3.6.106
kubernetes v1.6.1+5115d708d7
etcd 3.2.0
root@ip-172-31-31-51: ~ # docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-28.git1398f24.el7.x86_64
 Go version:      go1.7.4
 Git commit:      1398f24/1.12.6
 Built:           Wed May 17 01:16:44 2017
 OS/Arch:         linux/amd64

Comment 12 Luwen Su 2017-07-22 15:42:06 UTC
Per comment#10 , move to verified.

Comment 14 errata-xmlrpc 2017-08-02 00:11:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2344