Bug 1468681 - [DOCKER] stat /usr/bin/openshift-deploy: no such file or directory
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Tom Sweeney
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks: 1186913
 
Reported: 2017-07-07 16:08 UTC by Javier Ramirez
Modified: 2021-06-10 12:33 UTC
CC: 17 users

Fixed In Version: openshift v3.9.0-0.20.0
Doc Type: Known Issue
Doc Text:
Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1518519 for any necessary Doc Text.
Clone Of:
Environment:
Last Closed: 2018-06-18 18:12:33 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Bugzilla   ID: 1509227   Priority: urgent   Status: CLOSED
Summary: [DOCKER] JBoss image build on OCP randomly fails due to a "no such file or directory".
Last Updated: 2021-03-11 16:11:43 UTC

Internal Links: 1509227

Description Javier Ramirez 2017-07-07 16:08:24 UTC
Description of problem:
Failed to "StartContainer" for "sti-build" with RunContainerError. stat /usr/bin/openshift-sti-build: no such file or directory

Version-Release number of selected component (if applicable):
v3.4.1.24  


How reproducible:
Not on every node and not for every image.


Steps to Reproduce:
1. Start a build, or use "docker run", on one of the faulty node and image combinations

Actual results:
Builds fail with:
Jun  8 18:52:56 shift-node-7kgh00o8 atomic-openshift-node: E0608 18:52:56.327484   17907 pod_workers.go:184] Error syncing pod e7dee8f4-4c6a-11e7-a1ea-fa163e52ca24, skipping: failed to "StartContainer" for "sti-build" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"/usr/bin/openshift-sti-build\\\\\\\\\\\\\\\": stat /usr/bin/openshift-sti-build: no such file or directory\\\\\\\"\\\\n\\\"\"}"

And docker run with:

+ docker run -it openshift3/ose-deployer:v3.4.1.24
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/usr/bin/openshift-deploy\\\": stat /usr/bin/openshift-deploy: no such file or directory\"\n".


Expected results:
Builds to finish and docker run to work

Additional info:
We already checked the following (a sketch of these checks appears below):
- package versions are the same on the faulty and working nodes
- SELinux context
- the docker process is running with the same parameters
- docker inspect works and shows all layers
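
For reference, a minimal sketch of the kind of checks listed above, to be run on both a faulty and a working node (the image reference is illustrative):

# rpm -q docker docker-common                # compare package versions across nodes
# getenforce; ls -dZ /var/lib/docker         # compare SELinux mode and context
# ps aux | grep -v grep | grep docker        # compare daemon command-line parameters
# docker inspect openshift3/ose-deployer:v3.4.1.24   # layer list should be intact and identical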

Comment 4 Javier Ramirez 2017-07-07 16:38:17 UTC
More examples of failed runs:

container_linux.go:247: starting container process caused "exec: \"/usr/bin/openshift-deploy\": stat /usr/bin/openshift-deploy: no such file or directory"
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/usr/bin/openshift-deploy\\\": stat /usr/bin/openshift-deploy: no such file or directory\"\n".


The error does not happen with all tags: for instance, docker run works with openshift3/ose-deployer:latest but not with openshift3/ose-deployer:v3.4.1.24.

It also happens with non-s2i images:
[root@shift-node-ww5m32us ~]# docker run -ti --rm rhscl/mysql-56-rhel7
/usr/bin/run-mysqld: line 3: /usr/share/container-scripts/mysql/common.sh: No such file or directory
/usr/bin/run-mysqld: line 6: export_setting_variables: command not found

If I try to dump the image, I get this error:

[root@shift-node-ww5m32us cloud-user]# docker save registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6 -o mysql-5.6-rhel7.tar
Error response from daemon: open /var/lib/docker/devicemapper/mnt/f9c3760aac67f0e8dc79a696a67692340eab0bc1f5b8c3dccb0e6757f5d7a79e/rootfs/usr/share/container-scripts/mysql/README.md: no such file or directory
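
One way to confirm whether the files are genuinely missing from the image's rootfs (rather than the runtime misbehaving) is to bypass the entrypoint and list the path directly, and to compare layer IDs against a working node; the commands below are an illustrative sketch:

# docker run --rm --entrypoint /bin/ls rhscl/mysql-56-rhel7:5.6 -l /usr/share/container-scripts/mysql/
# docker history registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6    # layer IDs; diff against a working node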

Comment 11 Daniel Walsh 2017-07-25 18:51:34 UTC
I am not surprised.  This looks very strange.  I would like to see if we can recreate this on a newer version of docker.

The containers that this happens on, can you execute /bin/sh for the entrypoint or does this fail also?

Comment 12 Javier Ramirez 2017-07-26 17:03:20 UTC
(In reply to Daniel Walsh from comment #11)
> I am not surprised.  This looks very strange.  I would like to see if we can
> recreate this on a newer version of docker.
> 
> The containers that this happens on, can you execute /bin/sh for the
> entrypoint or does this fail also?

This is what customer reported:

--- docker run -it rhscl/mysql-56-rhel7:5.6 ---
the input device is not a TTY
--- /docker run -it rhscl/mysql-56-rhel7:5.6 ---
--- docker save registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6 -o mysql-5.6-rhel7.tar ---
Error response from daemon: open /var/lib/docker/devicemapper/mnt/f9c3760aac67f0e8dc79a696a67692340eab0bc1f5b8c3dccb0e6757f5d7a79e/rootfs/usr/share/container-scripts/mysql/README.md: no such file or directory
--- /docker save registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6 -o mysql-5.6-rhel7.tar ---
--- docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/container-scripts/mysql/common.sh' ---
the input device is not a TTY
--- /docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/container-scripts/mysql/common.sh' ---
--- docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/ | grep container-scripts' ---
the input device is not a TTY
--- /docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/ | grep container-scripts' ---

Comment 13 Daniel Walsh 2017-07-26 17:33:07 UTC
Seems like something very bad has happened to the device.

Vivek, have you seen something like this?

Comment 14 Daniel Walsh 2017-07-26 17:33:55 UTC
I know that OpenShift used to do force removals. Could this be one that failed and then left the image/container in a weird state?
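
A hedged way to look for leftover devicemapper state from a failed removal (pool and mount names vary by host; this is a sketch, not a confirmed diagnostic for this bug):

# docker info | grep -A 10 'Storage Driver'        # devicemapper pool status and space usage
# dmsetup ls | grep docker                         # device-mapper entries left behind
# ls /var/lib/docker/devicemapper/mnt | wc -l      # thin-device mounts vs. expected containers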

Comment 18 Daniel Walsh 2017-08-14 13:47:12 UTC
Jon, this looks like it is not a container runtime problem but the actual container image, which does not have the proper content.

Comment 19 Ben Parees 2017-08-14 15:52:30 UTC
The image looks fine to me (v3.4.1.44, as referenced in comment 17):

$ docker run --entrypoint=/bin/sh -it registry.access.redhat.com/openshift3/ose-sti-builder:v3.4.1.44 

# ls -l /usr/bin/openshift-sti-build 
lrwxrwxrwx. 1 root root 9 Jun 29 13:53 /usr/bin/openshift-sti-build -> openshift

# ls -l /usr/bin/openshift
-rwxr-xr-x. 1 root root 213359736 Jun 23 18:47 /usr/bin/openshift

$ docker run -it registry.access.redhat.com/openshift3/ose-sti-builder:v3.4.1.44 
The Build "" is invalid: 
* metadata.name: Required value: name or generateName is required
* metadata.namespace: Required value
* spec.strategy: Invalid value: {"DockerStrategy":null,"SourceStrategy":null,"CustomStrategy":null,"JenkinsPipelineStrategy":null}: must provide a value for exactly one of sourceStrategy, customStrategy, dockerStrategy, or jenkinsPipelineStrategy



Note that the original bug report also reported problems for two different images:

Builds fail with:
Jun  8 18:52:56 shift-node-7kgh00o8 atomic-openshift-node: E0608 18:52:56.327484   17907 pod_workers.go:184] Error syncing pod e7dee8f4-4c6a-11e7-a1ea-fa163e52ca24, skipping: failed to "StartContainer" for "sti-build" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"/usr/bin/openshift-sti-build\\\\\\\\\\\\\\\": stat /usr/bin/openshift-sti-build: no such file or directory\\\\\\\"\\\\n\\\"\"}"

And docker run with:

+ docker run -it openshift3/ose-deployer:v3.4.1.24
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/usr/bin/openshift-deploy\\\": stat /usr/bin/openshift-deploy: no such file or directory\"\n".


Neither image could run its binary, and things worked on some nodes but not others (presumably with the same images). So this seems much more likely to be an issue with either the container runtime, or bad images somehow being pulled to some nodes but not others. Either way, that seems like a container issue to me.
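
If bad images were somehow pulled to some nodes but not others, the image IDs and digests should differ between nodes; a sketch of how one might check, run on each node (tag is illustrative):

# docker images --digests openshift3/ose-deployer                  # digests should match across nodes
# docker inspect -f '{{.Id}}' openshift3/ose-deployer:v3.4.1.24    # image ID should match across nodes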

Comment 23 Benjamin Schmaus 2017-09-05 11:49:06 UTC
Still looking for an update and potential work around on this issue.

Comment 24 Daniel Walsh 2017-09-05 15:07:06 UTC
Ben, no one knows what is going on here. :^(

Comment 30 Tom Sweeney 2017-10-20 21:26:31 UTC
Benjamin - I had some initial trouble setting up my test system that was not at all related to this issue.  After having done so, I ran a number of tests and was not able to recreate the issue that Rackspace is seeing.

If you could, can you grab the docker and RHEL versions from the Rackspace system, please?  Is it at all possible to upgrade docker/RHEL there to the latest and greatest?  I know there were some changes on both sides that may or may not have helped this situation.

FWIW, my versions are:

# uname -a
Linux rhelbz.localdomain 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 13 10:46:25 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

# docker -v
Docker version 1.12.6, build 85d7426/1.12.6

Also, if possible, when this error occurs there, could you grab the portion of the journalctl output from a few minutes before it until a minute or so after?
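
For that capture, something along these lines should work (the unit name may be docker or atomic-openshift-node depending on where the error surfaces; timestamps are illustrative):

# journalctl -u docker --since "2017-06-08 18:45:00" --until "2017-06-08 19:00:00" > docker-journal.txt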

From looking at the issues and browsing through stuff, I think something might be corrupted down in their /var/lib/docker directories.

Vivek - would it make sense to recommend deleting that directory structure based on the errors that are going on in these reports?
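
If that recommendation is followed, a rough outline of the procedure (destructive: it wipes all local images and containers, which will have to be re-pulled; the node should be drained first):

# systemctl stop atomic-openshift-node docker
# rm -rf /var/lib/docker
# systemctl start docker atomic-openshift-node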

Comment 49 Jhon Honce 2017-12-04 16:04:04 UTC
Please re-open if the issue is reproduced.

Comment 51 Miheer Salunke 2017-12-22 02:18:11 UTC
Their version of OpenShift:
# rpm -qa | grep openshift
atomic-openshift-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-sdn-ovs-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
tuned-profiles-atomic-openshift-node-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-clients-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-node-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-excluder-3.5.5.31-1.git.0.b6f55a2.el7.noarch
atomic-openshift-docker-excluder-3.5.5.31-1.git.0.b6f55a2.el7.noarch


If you think upgrading the docker version will help, what version should I upgrade to, given the OpenShift version mentioned above?

Comment 52 Daniel Walsh 2017-12-23 11:01:20 UTC
Yes, I would get them upgraded.
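
A rough outline of upgrading docker on a node, assuming the node is drained first and the atomic-openshift-docker-excluder permits the update (<node> is a placeholder):

# oc adm drain <node> --ignore-daemonsets
# yum update docker
# systemctl restart docker atomic-openshift-node
# oc adm uncordon <node>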

Comment 62 Jhon Honce 2018-01-18 18:33:39 UTC
We're tracking latest issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1518519

Comment 63 Wang Haoran 2018-01-19 02:52:49 UTC
Verified this with:
docker run -it openshift3/ose-deployer:v3.9.0-0.20.0

openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-68.gitec8512b.el7.x86_64
 Go version:      go1.8.3
 Git commit:      ec8512b/1.12.6
 Built:           Thu Nov 16 15:19:17 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-1.12.6-68.gitec8512b.el7.x86_64
 Go version:      go1.8.3
 Git commit:      ec8512b/1.12.6
 Built:           Thu Nov 16 15:19:17 2017
 OS/Arch:         linux/amd64

