Description of problem:
Deployments for the app fail when I create it from scratch with the default templates. I did this about a month ago while migrating an app I had in OpenShift v2, and it worked then. The cluster is starter-us-east-1.

Version-Release number of selected component (if applicable):

How reproducible:
Always, when creating a new Node.js + MongoDB (Persistent) app.

Steps to Reproduce:
1. Create a project.
2. Select Node.js + MongoDB (Persistent) from the catalog.
3. The deployment starts automatically.

Actual results:
Two deployments, called mongodb and nodejs-mongo-persistent, are created and both fail.

Expected results:
Deployments should be created successfully.

Additional info:
Log from the mongodb deployment:

--> Scaling mongodb-1 to 1
--> Waiting up to 10m0s for pods in rc mongodb-1 to become ready
W1101 09:12:40.101770       1 reflector.go:323] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:509: watch of *api.Pod ended with: too old resource version: 1727711998 (1727733124)
error: update acceptor rejected mongodb-1: pods for rc "mongodb-1" took longer than 600 seconds to become ready

Failed events:
7:46:17 PM  nodejs-mongo-persistent  Deployment Config  Failed  Deployer pod "nodejs-mongo-persistent-1-deploy" has gone missing
7:47:26 PM  nodejs-mongo-persistent-1-deploy  Pod  Failed sync  Error syncing pod
8:09:36 PM  mongodb-1-z95n5  Pod  Failed mount  Unable to mount volumes for pod "mongodb-1-z95n5_wifi-player(1904f3e0-bee4-11e7-9157-1250f17a13c8)": timeout expired waiting for volumes to attach/mount for pod "wifi-player"/"mongodb-1-z95n5". list of unattached/unmounted volumes=[mongodb-data]
8:17:45 PM  mongodb-1-deploy  Pod  Failed sync  Error syncing pod
8:19:30 PM  mongodb-1-z95n5  Pod  Failed sync  Error syncing pod
We have implemented a generic recovery mechanism in OpenShift 3.9 that detects volumes stuck on another instance (provided no pod on that instance is actively using the volume) and detaches them if necessary.

One easy way to reproduce this problem (before 3.9):
1. Create a standalone pod (no deployments, replication controllers, etc.) with volumes.
2. Shut down the node.
3. Wait for the pod on the node to be deleted.
4. Once the pod is deleted (spam "kubectl get pods"), but before the controller-manager can detach the volume (there is a minimum delay of 6 minutes), restart the controller-manager.
5. This causes the volume-attachment information to be wiped from the controller-manager.
6. Now try to attach the same PVC in another pod (possibly scheduled on a different node). The pod will be stuck in the "ContainerCreating" state on 3.7, but not on 3.9.

There are a few other ways to reproduce this error, but this is perhaps the easiest.
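The standalone pod with a volume from step 1 above can be sketched roughly as follows. This is only an illustrative assumption, not taken from the report: the names test-pvc, mypod, and data, the size, and the image are all made up, and any RWO dynamic provisioner on the cluster would do.

```yaml
# Hypothetical reproduction manifest (names, size, and image are assumptions).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce          # single-node attach, the case that can get stuck
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod                # standalone pod: no deployment or rc managing it
spec:
  containers:
    - name: app
      image: registry.access.redhat.com/rhel7
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc  # reattaching this PVC in a new pod triggers the bug
```

After the node shutdown and controller-manager restart described above, a second pod referencing the same claimName is what ends up stuck in "ContainerCreating" on 3.7.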
Verified on:

oc v3.9.84
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-202.ec2.internal:8443
openshift v3.9.84
kubernetes v1.9.1+a0ce1bc657

1. Create a PVC and a standalone pod:
[root@ip-172-18-15-202 ~]# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
mypod     1/1       Running   1          5m
2. Shut down the node server.
3. The pod is deleted:
[root@ip-172-18-15-202 ~]# oc get pods
No resources found.
4. Restart the controller service:
[root@ip-172-18-15-202 ~]# systemctl restart atomic-openshift-master-controllers.service
5. Recreate a new pod with the above PVC.
6. The pod is running.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1642