Bug 1468681 - stat /usr/bin/openshift-deploy: no such file or directory [NEEDINFO]
Status: NEW
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Jhon Honce
QA Contact: DeShuai Ma
Reported: 2017-07-07 12:08 EDT by Javier Ramirez
Modified: 2017-09-15 08:35 EDT
CC: 12 users

Type: Bug
Flags: bschmaus: needinfo?

Attachments: None
Description Javier Ramirez 2017-07-07 12:08:24 EDT
Description of problem:
Failed to "StartContainer" for "sti-build" with RunContainerError: stat /usr/bin/openshift-sti-build: no such file or directory

Version-Release number of selected component (if applicable):
v3.4.1.24  


How reproducible:
Not on every node and not with every image.


Steps to Reproduce:
1. On one of the faulty node+image combinations, start a build or run the image directly with "docker run" (see the example below).
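For example, using the deployer tag from the failing runs reported below:

# docker run -it openshift3/ose-deployer:v3.4.1.24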

Actual results:
Builds fail with:
Jun  8 18:52:56 shift-node-7kgh00o8 atomic-openshift-node: E0608 18:52:56.327484   17907 pod_workers.go:184] Error syncing pod e7dee8f4-4c6a-11e7-a1ea-fa163e52ca24, skipping: failed to "StartContainer" for "sti-build" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"/usr/bin/openshift-sti-build\\\\\\\\\\\\\\\": stat /usr/bin/openshift-sti-build: no such file or directory\\\\\\\"\\\\n\\\"\"}"

And docker run with:

+ docker run -it openshift3/ose-deployer:v3.4.1.24
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/usr/bin/openshift-deploy\\\": stat /usr/bin/openshift-deploy: no such file or directory\"\n".


Expected results:
Builds finish and docker run works.

Additional info:
We already checked (see the example commands below):
- package versions are the same on faulty and working nodes
- SELinux contexts
- the docker process is running with the same parameters
- docker inspect works and shows all layers
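A sketch of the kind of commands behind those checks (the exact invocations used are an assumption), run on both a faulty and a working node:

# rpm -q docker atomic-openshift-node                 (package versions)
# ls -lZ /var/lib/docker                              (SELinux contexts)
# ps -ef | grep docker                                (daemon parameters)
# docker inspect openshift3/ose-deployer:v3.4.1.24    (layers)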
Comment 4 Javier Ramirez 2017-07-07 12:38:17 EDT
More examples of failed runs:

container_linux.go:247: starting container process caused "exec: \"/usr/bin/openshift-deploy\": stat /usr/bin/openshift-deploy: no such file or directory"
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/usr/bin/openshift-deploy\\\": stat /usr/bin/openshift-deploy: no such file or directory\"\n".


The error does not happen with all tags: for instance, docker run works with openshift3/ose-deployer:latest but not with openshift3/ose-deployer:v3.4.1.24.

And also happens with non-s2i images:
[root@shift-node-ww5m32us ~]# docker run -ti --rm rhscl/mysql-56-rhel7
/usr/bin/run-mysqld: line 3: /usr/share/container-scripts/mysql/common.sh: No such file or directory
/usr/bin/run-mysqld: line 6: export_setting_variables: command not found

If I try to save the image, I get this error:

[root@shift-node-ww5m32us cloud-user]# docker save registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6 -o mysql-5.6-rhel7.tar
Error response from daemon: open /var/lib/docker/devicemapper/mnt/f9c3760aac67f0e8dc79a696a67692340eab0bc1f5b8c3dccb0e6757f5d7a79e/rootfs/usr/share/container-scripts/mysql/README.md: no such file or directory
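If the layer content is really gone from disk, listing the devicemapper mount from the error message should confirm it (path copied from the error above; note the directory is only populated while the device is mounted):

# ls /var/lib/docker/devicemapper/mnt/f9c3760aac67f0e8dc79a696a67692340eab0bc1f5b8c3dccb0e6757f5d7a79e/rootfs/usr/share/container-scripts/mysql/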
Comment 11 Daniel Walsh 2017-07-25 14:51:34 EDT
I am not surprised. This looks very strange. I would like to see if we can recreate this on a newer version of docker.

For the containers this happens on, can you execute /bin/sh as the entrypoint, or does that fail as well?
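For example, with the failing deployer tag from the report:

# docker run -it --entrypoint /bin/sh openshift3/ose-deployer:v3.4.1.24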
Comment 12 Javier Ramirez 2017-07-26 13:03:20 EDT
(In reply to Daniel Walsh from comment #11)
> I am not surprised. This looks very strange. I would like to see if we can
> recreate this on a newer version of docker.
> 
> For the containers this happens on, can you execute /bin/sh as the
> entrypoint, or does that fail as well?

This is what customer reported:

--- docker run -it rhscl/mysql-56-rhel7:5.6 ---
the input device is not a TTY
--- /docker run -it rhscl/mysql-56-rhel7:5.6 ---
--- docker save registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6 -o mysql-5.6-rhel7.tar ---
Error response from daemon: open /var/lib/docker/devicemapper/mnt/f9c3760aac67f0e8dc79a696a67692340eab0bc1f5b8c3dccb0e6757f5d7a79e/rootfs/usr/share/container-scripts/mysql/README.md: no such file or directory
--- /docker save registry.access.redhat.com/rhscl/mysql-56-rhel7:5.6 -o mysql-5.6-rhel7.tar ---
--- docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/container-scripts/mysql/common.sh' ---
the input device is not a TTY
--- /docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/container-scripts/mysql/common.sh' ---
--- docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/ | grep container-scripts' ---
the input device is not a TTY
--- /docker run -it --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -ltr /usr/share/ | grep container-scripts' ---
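Note that "the input device is not a TTY" just means -t was passed without an attached terminal (e.g. through a script), so none of the ls output above was actually captured. Rerunning without -it should show whether the file exists, e.g.:

# docker run --rm --entrypoint /bin/sh rhscl/mysql-56-rhel7:5.6 -c 'ls -l /usr/share/container-scripts/mysql/common.sh'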
Comment 13 Daniel Walsh 2017-07-26 13:33:07 EDT
Seems like something very bad has happened to the device.

Vivek, have you seen something like this?
Comment 14 Daniel Walsh 2017-07-26 13:33:55 EDT
I know that OpenShift used to do force removals. Could this be one that failed and then left the image/container in a weird state?
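A forced removal of that kind would be something like this (the image name here is just an example):

# docker rmi -f openshift3/ose-deployer:v3.4.1.24

If that failed part-way, the image could be left recorded in the metadata while some of its layer content is missing on disk.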
Comment 18 Daniel Walsh 2017-08-14 09:47:12 EDT
Jon, this looks like it is not a container runtime problem but a problem with the actual container image, which does not have the proper content.
Comment 19 Ben Parees 2017-08-14 11:52:30 EDT
the image looks fine to me (v3.4.1.44, as referenced in comment 17):

$ docker run --entrypoint=/bin/sh -it registry.access.redhat.com/openshift3/ose-sti-builder:v3.4.1.44 

# ls -l /usr/bin/openshift-sti-build 
lrwxrwxrwx. 1 root root 9 Jun 29 13:53 /usr/bin/openshift-sti-build -> openshift

# ls -l /usr/bin/openshift
-rwxr-xr-x. 1 root root 213359736 Jun 23 18:47 /usr/bin/openshift

$ docker run -it registry.access.redhat.com/openshift3/ose-sti-builder:v3.4.1.44 
The Build "" is invalid: 
* metadata.name: Required value: name or generateName is required
* metadata.namespace: Required value
* spec.strategy: Invalid value: {"DockerStrategy":null,"SourceStrategy":null,"CustomStrategy":null,"JenkinsPipelineStrategy":null}: must provide a value for exactly one of sourceStrategy, customStrategy, dockerStrategy, or jenkinsPipelineStrategy



Note that the original bug report also reported problems for two different images:

Builds fail with:
Jun  8 18:52:56 shift-node-7kgh00o8 atomic-openshift-node: E0608 18:52:56.327484   17907 pod_workers.go:184] Error syncing pod e7dee8f4-4c6a-11e7-a1ea-fa163e52ca24, skipping: failed to "StartContainer" for "sti-build" with RunContainerError: "runContainer: Error response from daemon: {\"message\":\"invalid header field value \\\"oci runtime error: container_linux.go:247: starting container process caused \\\\\\\"exec: \\\\\\\\\\\\\\\"/usr/bin/openshift-sti-build\\\\\\\\\\\\\\\": stat /usr/bin/openshift-sti-build: no such file or directory\\\\\\\"\\\\n\\\"\"}"

And docker run with:

+ docker run -it openshift3/ose-deployer:v3.4.1.24
/usr/bin/docker-current: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"exec: \\\"/usr/bin/openshift-deploy\\\": stat /usr/bin/openshift-deploy: no such file or directory\"\n".


Neither of these images could run its binary. And things worked on some nodes but not others (presumably with the same images). So this seems much more likely to be an issue with either the container runtime, or bad images somehow being pulled to some nodes but not others. Either way, that seems like a container issue to me.
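One way to test the bad-image theory would be to compare the image ID and digest for the same tag on a working node and a failing node, e.g.:

# docker images --digests registry.access.redhat.com/openshift3/ose-deployer
# docker inspect --format '{{.Id}}' openshift3/ose-deployer:v3.4.1.24

If the IDs match but the behavior differs, that would point back at the runtime/storage on the failing node rather than at the pulled image.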
Comment 23 Benjamin Schmaus 2017-09-05 07:49:06 EDT
Still looking for an update and a potential workaround for this issue.
Comment 24 Daniel Walsh 2017-09-05 11:07:06 EDT
Ben, no one knows what is going on here. :^(
