Bug 1561375
Summary: | pods in terminating status for more than 50 mins, sometimes | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Weihua Meng <wmeng> |
Component: | Node | Assignee: | Seth Jennings <sjenning> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | DeShuai Ma <dma> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.9.0 | CC: | aos-bugs, jokerman, mmccomas, sjenning, wjiang, wmeng |
Target Milestone: | --- | ||
Target Release: | 3.9.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-04-11 01:54:18 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Weihua Meng
2018-03-28 08:52:17 UTC

Is this still happening? If so, use the path in the TearDown error message as an argument to "lsof +D" on the node to figure out what process is holding the mount point open.

What should I do when there is no such command on Atomic Host? Thanks.

    # which lsof
    /usr/bin/which: no lsof in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/.local/bin:/root/bin)

Operating System: Red Hat Enterprise Linux Atomic Host 7.4.5

Sorry. I tried upgrading OCP 3.7.42 (latest 3.7) to 3.9.19 (latest 3.9), on AH 7.4.5 and AH 7.5.0, and did not hit this issue. I did keep an environment around for debugging when I hit this; unfortunately it is gone now, since 10+ days have passed. May I have your suggestion on what configuration to try next, and how long the environment needs to be kept for debugging if it occurs again?

I don't have a suggestion on how it could be recreated, as it shouldn't happen in normal operation. Without a recreation environment/procedure or logs, there isn't sufficient information to make progress on this bug. There could be a number of reasons why the mount point was busy, but there is no way to tell without more information from an environment where the situation is happening. If this is recreated, please reopen and provide access to the host that is experiencing the issue. The key will be finding the process that has a file open on the mount point.

Can you take a look at BZ 1566150? It seems related. Thanks.

"We have a fluentd pod that won't terminate. We issued an "oc delete pod logging-fluentd-dn9xt", and then just see the pod stay in the "Terminating" state. This is on OCP 3.9 and one of the starter clusters."
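Since lsof is not available on Atomic Host, one possible workaround (a sketch, not something from this bug report) is to scan /proc directly: every process exposes its open file descriptors as symlinks under /proc/PID/fd, so resolving those links and matching them against the mount-point path finds the holder without extra packages. The MOUNTPOINT value below is a placeholder; substitute the path from the TearDown error message:

```shell
#!/bin/sh
# Sketch: find processes holding files open under a mount point,
# using only /proc (no lsof needed, works on Atomic Host).
# MOUNTPOINT is a hypothetical example path; replace it with the
# path from the TearDown error message.
MOUNTPOINT="${1:-/var/lib/origin/openshift.local.volumes}"

# Walk every process's open file descriptors; report any fd whose
# symlink resolves to a path under the mount point.
for pid in /proc/[0-9]*; do
    for fd in "$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$MOUNTPOINT"*)
                echo "pid=${pid#/proc/} fd=${fd##*/} file=$target"
                ;;
        esac
    done
done
```

Note this only catches open file descriptors; a process can also pin a mount via its current working directory (/proc/PID/cwd) or a memory-mapped file (/proc/PID/maps), which the same readlink/grep approach can be extended to cover.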