Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1506448

Summary: [aws][ded-stg-aws] Build stays Pending for hours and all pods are stuck in ContainerCreating when creating an app
Product: OpenShift Container Platform
Component: Node
Node sub component: CRI-O
Version: unspecified
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: high
Keywords: OnlineDedicated, OpsBlocker, TestBlocker
Type: Bug
Reporter: wewang <wewang>
Assignee: Ryan Phillips <rphillips>
QA Contact: Xiaoli Tian <xtian>
CC: aos-bugs, decarr, eparis, jeder, jhonce, jokerman, jupierce, mmccomas, pportant, pruan, yufchang
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-08-17 16:57:11 UTC

Description wewang 2017-10-26 05:56:03 UTC
Description of problem:
The build stays Pending, and the kubelet on ip-172-31-55-154.ec2.internal repeatedly reports the event "FailedSync: Error syncing pod" for the build pod.

Version-Release number of selected component (if applicable):
OpenShift Master:
    v3.6.173.0.49

How reproducible:
Always

Steps to Reproduce:
1. Create a Ruby app.
2. Check the build:
```
# oc get builds
NAME        TYPE      FROM         STATUS    STARTED   DURATION
rubyapp-1   Source    Git@master   Pending
```
3. Check the events:
```
# oc get event
LASTSEEN    FIRSTSEEN   COUNT     NAME              KIND          SUBOBJECT   TYPE      REASON                         SOURCE                                   MESSAGE
1h          1h          1         rubyapp-1-build   Pod                       Normal    Scheduled                      default-scheduler                        Successfully assigned rubyapp-1-build to ip-172-31-55-154.ec2.internal
<invalid>   1h          510       rubyapp-1-build   Pod                       Warning   FailedSync                     kubelet, ip-172-31-55-154.ec2.internal   Error syncing pod
1h          1h          1         rubyapp           BuildConfig               Warning   BuildConfigInstantiateFailed   buildconfig-controller                   gave up on Build for BuildConfig wewang4/rubyapp (0) due to fatal error: the LastVersion(1) on build config wewang4/rubyapp does not match the build request LastVersion(0)
```


Actual results:
The build stays Pending.

Expected results:
The build completes and the pod is running.


Additional info:

Comment 1 Ben Parees 2017-10-26 07:47:49 UTC
Pods are stuck in ContainerCreating, so sending this to the containers team.

Comment 2 Derek Carr 2017-10-26 15:26:41 UTC
I debugged these nodes yesterday; the problem appeared to be that the iptables-restore issue kept recurring and prevented containers from actually starting.

Comment 6 Eric Paris 2017-10-26 17:36:09 UTC
I'm going to move this back to the Pod component and restate its purpose.

This BZ is now being used to track the need for an event when the system is unable to pull the pause (ose-pod) image.
```
docker pull registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.6.173.0.49
Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/ose-pod ... 
manifest unknown: manifest unknown
```

The docker pull of the pause image always fails (probably because the image isn't in the registry, but that is a separate issue and won't be solved in this BZ).

Running oc describe on a pod that cannot start because of this shows:
```
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----					-------------	--------	------		-------
  26m		26m		1	default-scheduler					Normal		Scheduled	Successfully assigned frontend-3v3gk to ip-172-31-55-154.ec2.internal
  26m		1m		112	kubelet, ip-172-31-55-154.ec2.internal			Warning		FailedSync	Error syncing pod
```

Nothing in this output shows that the failure was caused by an inability to pull the pause (ose-pod) image.

Comment 11 Justin Pierce 2017-10-26 19:39:27 UTC
The v3.6.173.0.49 images have now been pushed.

Comment 12 Peter Ruan 2017-10-26 21:48:22 UTC
Apps can run builds now.

Comment 15 wewang 2020-03-10 02:38:42 UTC
@jeder The bug status was VERIFIED before; should I do anything else?

Comment 16 Jhon Honce 2022-08-17 16:57:11 UTC
Closed as stale, please re-open if this issue is still active.