Bug 1508378

Summary: Can't deploy Node.js + MongoDB app
Product: OpenShift Container Platform
Reporter: Agustin <atamagno.test>
Component: Storage
Assignee: Hemant Kumar <hekumar>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: urgent
Priority: unspecified
Version: unspecified
CC: aos-bugs, aos-storage-staff, trankin
Target Release: 3.9.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-07-05 06:58:57 UTC
Type: Bug

Description Agustin 2017-11-01 10:33:07 UTC
Description of problem:

Deployments for the app fail when I create it from scratch with the default templates. I did the same thing about a month ago, while migrating an app I had on OpenShift v2, and it worked then. The cluster is starter-us-east-1.

Version-Release number of selected component (if applicable):

How reproducible:

Creating a new Node.js + MongoDB (Persistent) app.

Steps to Reproduce:
1. Create project.
2. Select Node.js + MongoDB (Persistent) from the catalog.
3. Deployment starts automatically.
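
For reference, roughly the same flow can be driven from the CLI. A minimal sketch, assuming the quickstart template is registered as nodejs-mongo-persistent in the openshift namespace and using the project name that appears in the events below:

# Sketch only; the template and project names are assumptions
oc new-project wifi-player
oc new-app --template=nodejs-mongo-persistent
# The deployments start automatically; watch their progress
oc get dc
oc get pods -w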

Actual results:

Two deployments, called mongodb and nodejs-mongo-persistent, are created and both fail.

Expected results:

Deployments should be created successfully.

Additional info:

Log from mongodb deployment:

--> Scaling mongodb-1 to 1
--> Waiting up to 10m0s for pods in rc mongodb-1 to become ready
W1101 09:12:40.101770       1 reflector.go:323] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:509: watch of *api.Pod ended with: too old resource version: 1727711998 (1727733124)
error: update acceptor rejected mongodb-1: pods for rc "mongodb-1" took longer than 600 seconds to become ready

Failed events:

7:46:17 PM	nodejs-mongo-persistent	Deployment Config Failed Deployer pod "nodejs-mongo-persistent-1-deploy" has gone missing

7:47:26 PM	nodejs-mongo-persistent-1-deploy Pod Failed sync Error syncing pod

8:09:36 PM	mongodb-1-z95n5	Pod Failed mount Unable to mount volumes for pod "mongodb-1-z95n5_wifi-player(1904f3e0-bee4-11e7-9157-1250f17a13c8)": timeout expired waiting for volumes to attach/mount for pod "wifi-player"/"mongodb-1-z95n5". list of unattached/unmounted volumes=[mongodb-data]

8:17:45 PM	mongodb-1-deploy Pod Failed sync Error syncing pod

8:19:30 PM	mongodb-1-z95n5	Pod Failed sync Error syncing pod
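
For a pod stuck on "timeout expired waiting for volumes to attach/mount", the usual next step is to look at the pod's events and the state of the claim. A sketch, using the pod and namespace names from the events above:

oc describe pod mongodb-1-z95n5 -n wifi-player    # shows the FailedMount / FailedAttachVolume events
oc get pvc -n wifi-player                         # confirm the claim is Bound
oc get events -n wifi-player --sort-by=.lastTimestamp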

Comment 2 Hemant Kumar 2018-01-16 21:50:42 UTC
We have implemented a generic recovery mechanism in OpenShift 3.9 that detects volumes stuck on another instance and, if no pod on that instance is actively using the volume, detaches them as necessary.
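
The stuck state can be inspected on the node objects themselves: node status carries volumesAttached (written by the attach/detach controller) and volumesInUse (reported by the kubelet). A sketch for checking them (the node name is a placeholder):

oc get node <node-name> -o jsonpath='{.status.volumesAttached}'
oc get node <node-name> -o jsonpath='{.status.volumesInUse}'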

One easy way to reproduce this problem (before 3.9) is:

1. Create a standalone pod (no deployments, RCs, etc.) with volumes (see the sketch after these steps).
2. Shut down the node.
3. Wait for the pod on the node to be deleted.
4. Once the pod is deleted (poll kubectl get pods), but before the controller-manager can detach the volume (there is a minimum delay of 6 minutes), restart the controller-manager.
5. The above action will cause the volume information to be wiped from the controller-manager.
6. Now try to attach the same PVC in another pod (which may be scheduled on a different node). The pod will get stuck in the "ContainerCreating" state on 3.7, but not on 3.9.

There are a few other ways to reproduce this error, but this is perhaps the easiest.
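
A minimal sketch of such a standalone PVC/pod for step 1; all names, the storage size, and the image are placeholders, and the claim is assumed to bind to a dynamically provisioned volume:

oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/rhel7
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-claim
EOF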

Comment 4 Chao Yang 2019-06-27 07:00:47 UTC
It passed on:
oc v3.9.84
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-15-202.ec2.internal:8443
openshift v3.9.84
kubernetes v1.9.1+a0ce1bc657

1. Create a standalone PVC/pod, as in the reproducer from comment 2:
[root@ip-172-18-15-202 ~]# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
mypod     1/1       Running   1          5m
2. Shut down the node.
3. The pod is deleted:
[root@ip-172-18-15-202 ~]# oc get pods
No resources found.
4. Restart the controller service:
[root@ip-172-18-15-202 ~]# systemctl restart atomic-openshift-master-controllers.service
5. Recreate a new pod with the above PVC (a sketch follows these steps).
6. The pod is running.
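
A minimal sketch of the replacement pod from step 5, reusing the same claim as the sketch in comment 2 (names and image are placeholders); on 3.9 it should reach Running instead of sticking in ContainerCreating:

oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mypod-2
spec:
  # nodeName can be set here to pin the pod to a node other than the one that was shut down
  containers:
  - name: app
    image: registry.access.redhat.com/rhel7
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-claim
EOF
oc get pod mypod-2 -w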

Comment 6 errata-xmlrpc 2019-07-05 06:58:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1642