Bug 1506448
| Summary: | [aws][ded-stg-aws] build is pending for hours and all pods are containercreating when creating app | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | wewang <wewang> |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> |
| Node sub component: | CRI-O | QA Contact: | Xiaoli Tian <xtian> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | aos-bugs, decarr, eparis, jeder, jhonce, jokerman, jupierce, mmccomas, pportant, pruan, yufchang |
| Version: | unspecified | Keywords: | OnlineDedicated, OpsBlocker, TestBlocker |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-17 16:57:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Pods are stuck in ContainerCreating, so sending to the containers team.

I debugged these nodes yesterday, and the issue appeared to be the recurring iptables-restore problem, which kept preventing containers from actually starting.

I'm going to move this back to pod and restate the purpose. This BZ is now being used to track the need for an event when the system is unable to pull the pause/ose-pod image.

```
docker pull registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.6.173.0.49
Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/ose-pod ...
manifest unknown: manifest unknown
```

The docker pull for the pause image always fails (probably because it isn't in the registry, but that's a different issue not to be solved in this BZ).

oc describe on a pod that is unable to start because of this looks like:

```
Events:
  FirstSeen  LastSeen  Count  From                                   SubObjectPath  Type     Reason      Message
  ---------  --------  -----  ----                                   -------------  -------- ------      -------
  26m        26m       1      default-scheduler                                     Normal   Scheduled   Successfully assigned frontend-3v3gk to ip-172-31-55-154.ec2.internal
  26m        1m        112    kubelet, ip-172-31-55-154.ec2.internal                Warning  FailedSync  Error syncing pod
[root@ded-stage-aws-master-89f93 eparis]# oc get node ip-172-31-55-154.ec2.internal -L hostname
```

It is not possible to see from these events that the failure was caused by an inability to pull pause/ose-pod.

The .49 images have now been pushed.

Apps can do builds now.

@jeder The bug status was VERIFIED before; should I do anything else?

Closed as stale; please re-open if this issue is still active.
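To illustrate the kind of message the requested event could carry, here is a minimal sketch in Python. It is not how the kubelet works and is not taken from this bug; the function name `explain_pull_failure` and the wording of the returned message are hypothetical. It only assumes that the pull output contains the `manifest unknown` text quoted in the comments above.

```python
from typing import Optional

# Error text as it appears in the `docker pull` output quoted in this bug.
MANIFEST_UNKNOWN = "manifest unknown"


def explain_pull_failure(image: str, pull_output: str) -> Optional[str]:
    """Return a readable reason if the pull output reports a missing manifest.

    Hypothetical helper for illustration only: it inspects captured text and
    does not contact a registry or a container runtime.
    """
    if MANIFEST_UNKNOWN in pull_output.lower():
        return f"Failed to pull pod infrastructure image {image}: manifest unknown"
    return None


if __name__ == "__main__":
    image = "registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.6.173.0.49"
    output = (
        "Trying to pull repository "
        "registry.reg-aws.openshift.com:443/openshift3/ose-pod ...\n"
        "manifest unknown: manifest unknown\n"
    )
    print(explain_pull_failure(image, output))
```

A message of this shape, attached to the pod in place of the generic "Error syncing pod", would let someone reading `oc describe` see the cause directly.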
Description of problem:
build is pending with error "FailedSync kubelet, ip-172-31-55-154.ec2.internal Error syncing pod"

Version-Release number of selected component (if applicable):
OpenShift Master: v3.6.173.0.49

How reproducible:
always

Steps to Reproduce:
1. Create ruby app
2. Check the build
# oc get builds
NAME        TYPE     FROM         STATUS    STARTED   DURATION
rubyapp-1   Source   Git@master   Pending
3. Check the log
# oc get event
LASTSEEN    FIRSTSEEN   COUNT   NAME              KIND          SUBOBJECT   TYPE      REASON                         SOURCE                                   MESSAGE
1h          1h          1       rubyapp-1-build   Pod                       Normal    Scheduled                      default-scheduler                        Successfully assigned rubyapp-1-build to ip-172-31-55-154.ec2.internal
<invalid>   1h          510     rubyapp-1-build   Pod                       Warning   FailedSync                     kubelet, ip-172-31-55-154.ec2.internal   Error syncing pod
1h          1h          1       rubyapp           BuildConfig               Warning   BuildConfigInstantiateFailed   buildconfig-controller                   gave up on Build for BuildConfig wewang4/rubyapp (0) due to fatal error: the LastVersion(1) on build config wewang4/rubyapp does not match the build request LastVersion(0)

Actual results:
build is pending

Expected results:
build is complete and pod is running

Additional info: