Bug 1550644
| Summary: | oc cp intermittently fails to finish copying files | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jason Montleon <jmontleo> |
| Component: | oc | Assignee: | Juan Vallejo <jvallejo> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xingxing Xia <xxia> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.9.0 | CC: | aos-bugs, chezhang, jmatthew, jmontleo, jokerman, mmccomas, smunilla, wmeng, zitang |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 3.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | OCP 3.9.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1544606 | Environment: | |
| Last Closed: | 2018-06-18 17:42:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1552670 | ||
| Bug Blocks: | 1544606 | ||
Description
Jason Montleon
2018-03-01 16:56:34 UTC
I cannot reproduce this locally. Does this happen only with certain pods, or with all pods? Is there an error message, or does the command just hang? Can I get --loglevel=8 output for the command, and `oc logs` for the pods as well, please? Is there an environment I can use to reproduce this?

It seems to be any pod in the environment. I tried:

oc cp /tmp/db/db.dump -n default docker-registry-1-zrt6s:/tmp/db.dump

and got the same intermittent failures. Other DB pods that were provisioned are doing the same.

I believe they're seeing this in a docker environment.

Sorry, hit enter too soon and I didn't mean to cancel needinfo. I am trying to get them to provide a host where I can reproduce this to confirm it's with docker. If true, I agree it looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1549259, but I don't think it can be the same if the fix is in cri-o.

I am seeing this with a system using docker as well. I'm not sure if this is the same issue as BZ#1549259. It looks like it was suggested that that was due to a cri-o bug, and if that's the case I don't see how they can be related.

However, it is possible they are the same and it's not a cri-o (or not just a cri-o) issue, as it seems I now can't exec into the pod that oc cp occasionally hangs with.

'oc exec -it -n default docker-registry-1-j2t8j /bin/bash' works now but 'oc exec -n default docker-registry-1-j2t8j /bin/bash' does not.

I do see several tar and bash processes running inside, from trying to cp and exec.

$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1000000+     1     0  0 08:42 ?        00:00:15 /usr/bin/dockerregistry /etc/registry/config.yml
1000000+    24     0  0 13:25 ?        00:00:00 tar xf - -C /tmp
1000000+    64     0  0 13:25 ?        00:00:00 tar xf - -C /tmp
1000000+    68     0  0 13:29 ?        00:00:00 /bin/bash
1000000+    72     0  0 13:30 ?        00:00:00 /bin/bash
1000000+    76     0  0 13:30 ?        00:00:00 /bin/bash
1000000+    84     0  0 13:30 ?        00:00:00 tar xf - -C /tmp
1000000+   104     0  0 13:30 ?        00:00:00 /bin/bash
1000000+   108     0  0 13:30 ?        00:00:00 /bin/bash
1000000+   112     0  0 13:30 ?        00:00:00 /bin/bash
1000000+   116     0  0 13:31 ?        00:00:00 /bin/bash
1000000+   120     0  0 13:32 ?        00:00:00 /bin/bash
1000000+   124     0  0 13:33 ?        00:00:00 /bin/bash
1000000+   131   124  0 13:33 ?        00:00:00 ps -ef

(In reply to Jason Montleon from comment #10)
> 'oc exec -it -n default docker-registry-1-j2t8j /bin/bash' works now but 'oc
> exec -n default docker-registry-1-j2t8j /bin/bash' does not.

Based on this information, this bug does appear to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1549259

Closing this one as a duplicate. Thanks

*** This bug has been marked as a duplicate of bug 1549259 ***

Same PR as 1552670.

Waiting for the PR to land in OCP, then will check.

First, I tried to confirm a way to reproduce the issue: in an on-hand old OCP 3.9.1 environment with docker 1.12, I repeated command [1] and it did not hang. After upgrading docker from 1.12 to 1.13 and restarting docker/master/node, I tried the command again and it hung often.

Second, I installed a new OCP 3.9.4 environment with docker 1.13 and tried the same command; it did not hang. So moving to VERIFIED.

[1] The steps tried:

$ oc new-app mysql-ephemeral
# wait for the pod to be running
$ for i in $(seq 1 500)
do
  echo "$i testing ... `date '+%H:%M:%S'`"
  oc cp pod.yaml mysql-1-b9nb7:/tmp/pod-$i.yaml
  echo "$i ended `date '+%H:%M:%S'`"   # check timestamp difference
  echo "-----------------"
done
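The loop in [1] relies on comparing timestamps by eye, and a hung oc cp stalls the whole run. A possible variant, sketched here under assumptions, wraps each attempt in GNU coreutils timeout and counts how many attempts hit the limit. The helper name count_hangs and the 30-second limit are hypothetical and not taken from this bug; the pod name is the one from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: count_hangs is a hypothetical helper, not part of oc.
# Usage: count_hangs ITERATIONS LIMIT_SECONDS COMMAND [ARGS...]
# Prints how many iterations were cut off by timeout(1).
count_hangs() {
  local iterations=$1 limit=$2
  shift 2
  local hangs=0 i status
  for i in $(seq 1 "$iterations"); do
    status=0
    # GNU timeout exits with status 124 when the time limit is reached.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
  done
  echo "$hangs"
}

# Example against a cluster (assumed limit of 30 seconds per copy):
#   count_hangs 500 30 oc cp pod.yaml mysql-1-b9nb7:/tmp/pod.yaml
```

A non-zero result on the docker 1.13 environment and zero on the fixed build would match the verification described above. Unlike [1], this sketch copies to the same destination path on every iteration.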