Bug 1146877

Summary: [origin_devexp_341][origin_devexp_339] docker/sti builds fail due to "/var/run/docker.sock" issue
Product: OKD
Reporter: Lei Zhang <lzhang>
Component: Containers
Assignee: Ben Parees <bparees>
Status: CLOSED CURRENTRELEASE
QA Contact: libra bugs <libra-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.x
CC: bparees, chunchen, lzhang, mmccomas, wzheng, xtian
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-18 16:49:25 UTC
Type: Bug
Attachments:
Description     Flags
openshift.log   none

Description Lei Zhang 2014-09-26 09:51:43 UTC
Description of problem:
While executing /data/src/github.com/openshift/origin/examples/simple-ruby-app/run.sh step by step, I found that triggering the build fails with "Docker socket missing at /var/run/docker.sock". I checked that the docker service was still running when this issue happened.


Note: This happens at step 9 of https://github.com/openshift/origin/blob/master/examples/simple-ruby-app/README.md

[root@ip-10-142-186-14 simple-ruby-app]# $openshift kube list buildConfigs
ID                  Type                SourceURI
----------          ----------          ----------
build100            docker              git://github.com/openshift/ruby-hello-world.git

[root@ip-10-142-186-14 simple-ruby-app]# $openshift kube list builds
ID                                     Status              Pod ID
----------                             ----------          ----------
ca246e0e-4543-11e4-b8df-22000b0f8494   failed              build-docker-ca246e0e-4543-11e4-b8df-22000b0f8494

[root@ip-10-142-186-14 simple-ruby-app]# docker ps -a | grep docker-builder
9753ddf1a4a7        openshift/docker-builder:latest   "/tmp/build.sh"        29 minutes ago      Exited (1) 29 minutes ago                        k8s--docker_-_build.20e55dad--build_-_docker_-_ca246e0e_-_4543_-_11e4_-_b8df_-_22000b0f8494.etcd--d308380c_-_4543_-_11e4_-_b8df_-_22000b0f8494--d30a5045   
[root@ip-10-142-186-14 simple-ruby-app]# docker logs 9753ddf1a4a7
Docker socket missing at /var/run/docker.sock

Version-Release number of selected component (if applicable):
Origin V3
devenv-fedora_203 images

How reproducible:
always

Steps to Reproduce:
1. Launch the OpenShift all-in-one server
# openshift="../../_output/go/bin/openshift"
# $openshift start --listenAddr="0.0.0.0:8080" &> logs/openshift.log &
2. Create a private Docker registry running in OpenShift and make sure the Docker registry service starts
# $openshift kube apply -c registry_config/registry_config.json
# $openshift kube list pods | grep registryPod | grep Running
# $openshift kube list services | grep registryPod
3. Define and show a build config
# $openshift kube create buildConfigs -c buildcfg/buildcfg.json
# $openshift kube list buildConfigs
4. Trigger a build via the REST API
# curl -s -A "GitHub-Hookshot/github" -H "Content-Type:application/json" -H "X-Github-Event:push" -d @buildinvoke/pushevent.json http://$OSHOST:8080/osapi/v1beta1/buildConfigHooks/build100/secret101/github
5. Check the build
# docker ps -a | grep docker-builder
# docker logs 9753ddf1a4a7
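
Step 5 can be scripted so the container ID does not have to be copied by hand. The following is a minimal sketch, assuming the `docker ps -a` column layout shown in the description (container ID in the first column). The function name `builder_container_ids` is hypothetical and not part of OpenShift or Docker.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker ps -a` output on stdin and prints the
# container ID (first column) of every line that mentions docker-builder.
builder_container_ids() {
    awk '/docker-builder/ { print $1 }'
}

# Example usage (needs a running Docker daemon, so it is left commented out):
# docker ps -a | builder_container_ids | while read -r id; do docker logs "$id"; done
```

Keeping the parsing in a function that reads stdin means it can be tried against saved `docker ps -a` output as well as a live daemon.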

Actual results:
The build fails with "Docker socket missing at /var/run/docker.sock".

Expected results:
The build should complete successfully.

Additional info:
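To narrow down the error, it may help to check what actually exists at the socket path on the host. The following is a minimal sketch using only POSIX `test` operators. The function name `socket_state` is hypothetical, and the idea that the path might exist as something other than a socket is an assumption, not something confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper: report what exists at a given path.
# Prints one of: socket, directory, missing, other.
# The default path is the one named in the error message.
socket_state() {
    path="${1:-/var/run/docker.sock}"
    if [ -S "$path" ]; then
        echo "socket"
    elif [ -d "$path" ]; then
        echo "directory"
    elif [ ! -e "$path" ]; then
        echo "missing"
    else
        echo "other"
    fi
}
```

Calling `socket_state` with no argument inspects /var/run/docker.sock; anything other than "socket" would be consistent with the error above.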

Comment 1 Lei Zhang 2014-09-26 09:52:44 UTC
Created attachment 941512 [details]
openshift.log

Comment 2 chunchen 2014-09-29 03:18:19 UTC
A similar issue occurs for STI builds. Please refer to the testing steps below:

Version-Release number of selected component:
devenv_fedora_210
openshift version 0.1, build ab07903
kubernetes v0.3-dev

Steps to Reproduce:
1. SSH into the instance

2. Start openshift service
$ USE_HOST_DOCKER_SOCKET=true openshift start --listenAddr="0.0.0.0:8080"

3. Create a build config JSON file with the following content:
cat ./stibuild.json

{
  "kind": "Build",
  "apiVersion": "v1beta1",
  "input": {
    "type": "sti",
    "sourceURI": "git://github.com/pmorie/simple-ruby",
    "builderImage": "openshift/ruby-19-centos",
    "imageTag": "cewong/stiruby"
  }
}

4. Trigger the build (use another terminal)
$ openshift kube -c ./stibuild.json create builds 

5. Check the log output of the server started in step 2


Log output at step 5:

<-----snip----->
I0928 06:51:12.428896 01184 etcd.go:81] Received state from etcd watch: [{Namespace: Name:build-sti-76f34a84-46d9-11e4-a2ad-22000a4f9760 Manifest:{Version:v1beta1 ID:build-sti-76f34a84-46d9-11e4-a2ad-22000a4f9760 UUID:80e0a1d2-46d9-11e4-a2ad-22000a4f9760 Volumes:[{Name:tmp Source:0xc210094030} {Name:docker-socket Source:0xc210094040}] Containers:[{Name:sti-build Image:openshift/sti-builder Command:[] WorkingDir: Ports:[] Env:[{Name:BUILD_TAG Value:cewong/stiruby} {Name:DOCKER_REGISTRY Value:} {Name:SOURCE_URI Value:git://github.com/pmorie/simple-ruby} {Name:SOURCE_REF Value:} {Name:BUILDER_IMAGE Value:openshift/ruby-19-centos} {Name:TEMP_DIR Value:/tmp/stibuild156235636} {Name:SERVICE_HOST Value:127.0.0.1}] Memory:0 CPU:0 VolumeMounts:[{Name:tmp ReadOnly:false MountPath:/tmp/stibuild156235636} {Name:docker-socket ReadOnly:false MountPath:/var/run/docker.sock}] LivenessProbe:<nil> Lifecycle:<nil> Privileged:false}] RestartPolicy:{Always:<nil> OnFailure:<nil> Never:0x14af0a0}}}]
I0928 06:51:12.429160 01184 config.go:209] Setting pods for source etcd : {[{ build-sti-76f34a84-46d9-11e4-a2ad-22000a4f9760 {v1beta1 build-sti-76f34a84-46d9-11e4-a2ad-22000a4f9760 80e0a1d2-46d9-11e4-a2ad-22000a4f9760 [{tmp 0xc210094030} {docker-socket 0xc210094040}] [{sti-build openshift/sti-builder []  [] [{BUILD_TAG cewong/stiruby} {DOCKER_REGISTRY } {SOURCE_URI git://github.com/pmorie/simple-ruby} {SOURCE_REF } {BUILDER_IMAGE openshift/ruby-19-centos} {TEMP_DIR /tmp/stibuild156235636} {SERVICE_HOST 127.0.0.1}] 0 0 [{tmp false /tmp/stibuild156235636} {docker-socket false /var/run/docker.sock}] <nil> <nil> false}] {<nil> <nil> 0x14af0a0}}}] 0}
I0928 06:51:12.429439 01184 kubelet.go:706] Containers changed [127.0.0.1]

<------snip------->
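
The log lines above show that the pod manifest requests a `docker-socket` volume mounted at /var/run/docker.sock. The following is a small sketch for checking this in a saved log, assuming the `MountPath:` formatting seen in the etcd.go line. The function name `requests_docker_socket` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads an OpenShift server log on stdin and exits with
# status 0 if any line mounts a volume at /var/run/docker.sock.
requests_docker_socket() {
    grep -q 'MountPath:/var/run/docker\.sock'
}

# Example usage:
# requests_docker_socket < logs/openshift.log && echo "socket mount requested"
```

A match only shows that the mount was requested in the manifest; it says nothing about whether a usable socket was present on the host at that path.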

Comment 3 Ben Parees 2014-09-30 19:55:42 UTC
Not sure how you're running docker, but I've found I need to take the following steps lately:

(as root)
1) systemctl stop docker  #stop the docker daemon
2) rm /var/run/docker.sock
3) docker -d   # start the docker daemon

Can you try those steps and see if it helps?
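
The three steps can be collected into one script. This is only a sketch: the `DRY_RUN` switch and the function names `run` and `reset_docker_socket` are hypothetical, `rm -f` is used instead of plain `rm` so that a missing socket does not abort the sequence, and `docker -d` is the daemon invocation quoted above, which stays in the foreground.

```shell
#!/bin/sh
# Sketch of the workaround steps. Must be run as root when DRY_RUN=0.
# DRY_RUN is a hypothetical switch; the default of 1 only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the daemon, remove the stale socket path, then start the daemon again.
# Each step runs only if the previous one succeeded.
reset_docker_socket() {
    run systemctl stop docker &&
    run rm -f /var/run/docker.sock &&
    run docker -d
}
```

Calling `reset_docker_socket` with the default `DRY_RUN=1` lists the commands without touching the system; setting `DRY_RUN=0` as root performs them.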

Comment 4 chunchen 2014-10-08 03:32:36 UTC
It works well for me after executing the steps in comment 3. The main steps are as follows:

>> Start openshift service
1) systemctl stop docker  #stop the docker daemon
2) rm /var/run/docker.sock
3) docker -d   # start the docker daemon
4) USE_HOST_DOCKER_SOCKET=true openshift start

Thanks

Comment 5 Ben Parees 2014-10-08 13:32:26 UTC
I have also learned that "systemctl restart docker.socket" might resolve it.

Comment 6 Wenjing Zheng 2014-10-10 10:34:36 UTC
Thanks, Ben! We tried the solution you provided, and the build is successful now, so I am moving this bug to verified.

Comment 7 Lei Zhang 2014-12-30 01:55:27 UTC
Yes, this issue can no longer be reproduced, so I have verified it.