1436324 – [preview][prod]deploying Postgresql Database

Summary: [preview][prod]deploying Postgresql Database

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Online
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	3.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Hemant Kumar
QA Contact:	Wenqi He
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1435424
Depends On:
Blocks:

Reported:	2017-03-27 16:31 UTC by dbb2000
Modified:	2017-04-20 19:52 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-04-20 19:52:48 UTC
Target Upstream Version:
Embargoed:

Attachments
print screen showing too long deployment duration (42.77 KB, image/png) 2017-03-27 16:31 UTC, dbb2000
trying to create a pod for more than a hour (43.70 KB, image/png) 2017-03-29 15:29 UTC, dbb2000
no pod log available information (61.54 KB, image/png) 2017-03-29 15:30 UTC, dbb2000
kibana screen (46.38 KB, image/png) 2017-03-30 15:38 UTC, dbb2000

Description dbb2000 2017-03-27 16:31:58 UTC

Created attachment 1266713
print screen showing too long deployment duration

It has been quite difficult to deploy a PostgreSQL database instance in my project. OpenShift has been trying to deploy it for more than 40 minutes without success. I can't even retrieve any logs from the current process because none are shown to me. I'm sending a screenshot as evidence. Could anybody help me, please? Thanks.

Comment 1 dbb2000 2017-03-27 21:47:44 UTC

Here is additional log info from the pod:

--> Scaling postgresql-2 to 1
--> Waiting up to 10m0s for pods in deployment postgresql-2 to become ready
W0327 19:58:46.367143       1 reflector.go:330] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:468: watch of *api.Pod ended with: too old resource version: 1047728707 (1047752717)
error: update acceptor rejected postgresql-2: pods for deployment "postgresql-2" took longer than 600 seconds to become ready
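
A minimal sketch, not part of the original report, of how the timeout line above could be picked out of a deployer log automatically. The function name `find_deploy_timeout` and the regular expression are assumptions written for this illustration; the sample text is copied from the log in this comment.

```python
import re

# Pattern for the deployer's timeout message, modeled on the log line above.
# Hypothetical helper: this is not part of OpenShift or the `oc` client.
TIMEOUT_RE = re.compile(
    r'pods for deployment "(?P<name>[^"]+)" took longer than '
    r'(?P<seconds>\d+) seconds to become ready'
)


def find_deploy_timeout(log_text):
    """Return (deployment_name, timeout_seconds), or None if no timeout is logged."""
    match = TIMEOUT_RE.search(log_text)
    if match is None:
        return None
    return match.group("name"), int(match.group("seconds"))


sample = (
    "--> Scaling postgresql-2 to 1\n"
    "--> Waiting up to 10m0s for pods in deployment postgresql-2 to become ready\n"
    'error: update acceptor rejected postgresql-2: pods for deployment '
    '"postgresql-2" took longer than 600 seconds to become ready\n'
)

print(find_deploy_timeout(sample))
```

A match only says that the pod never became ready within the deployer's window; the reason has to come from the pod itself, for example from its events.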

Comment 2 Michal Fojtik 2017-03-29 14:31:29 UTC

Can you check whether the postgresql-2 deployment created a pod and, if so, check the logs from that pod? This is to make sure postgresql is not stuck initializing.

Comment 3 dbb2000 2017-03-29 15:29:29 UTC

Created attachment 1267320
trying to create a pod for more than a hour

Comment 4 dbb2000 2017-03-29 15:30:35 UTC

Created attachment 1267322
no pod log available information

Comment 5 dbb2000 2017-03-29 15:33:34 UTC

I'm currently on my 6th postgresql deployment attempt, with no success. The pod cannot be scaled up, and because of this there is no log available. I'm sending some screenshots as evidence because the Console doesn't give me much information.

Comment 6 Michal Fojtik 2017-03-29 15:36:20 UTC

Thanks! I'm going to ask the Online team to investigate whether there are any capacity issues. Also, do you have any other deployments that are failing in a similar way?

I can see the pod is in the "pending" state. Can you please check the "events" for that pod and see why it is pending?

Comment 10 dbb2000 2017-03-29 17:21:27 UTC

Here they are:


1:56:29 PM	Warning	Failed mount 	Failed to attach volume "pvc-0b930f1d-1303-11e7-b5d1-0ebeb1070c7f" on node "ip-172-31-2-63.ec2.internal" with: Timeout waiting for volume state: actual=detached, desired=attached



1:24:24 PM	Warning	Failed mount 	Failed to attach volume "pvc-0b930f1d-1303-11e7-b5d1-0ebeb1070c7f" on node "ip-172-31-2-63.ec2.internal" with: Error attaching EBS volume: VolumeInUse: vol-0399f530d8c354c39 is already attached to an instance status code: 400, request id:
63 times in the last 2 hours
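
A minimal sketch, not part of the original report, of how the two "Failed mount" messages above could be told apart when scanning many events. The function `classify_attach_failure`, the `kind` labels, and the regular expressions are assumptions made for this illustration and only cover the two message shapes quoted in this comment.

```python
import re

# Patterns modeled on the two event messages quoted above (hypothetical helper,
# not an OpenShift or Kubernetes API).
ATTACH_RE = re.compile(
    r'Failed to attach volume "(?P<pv>[^"]+)" on node "(?P<node>[^"]+)"'
)
VOLUME_IN_USE_RE = re.compile(
    r"VolumeInUse: (?P<ebs>vol-[0-9a-f]+) is already attached"
)
ATTACH_TIMEOUT_RE = re.compile(
    r"Timeout waiting for volume state: "
    r"actual=(?P<actual>\w+), desired=(?P<desired>\w+)"
)


def classify_attach_failure(message):
    """Describe an attach-failure event message, or return None if it is not one."""
    head = ATTACH_RE.search(message)
    if head is None:
        return None
    info = {"pv": head.group("pv"), "node": head.group("node"), "kind": "other"}
    in_use = VOLUME_IN_USE_RE.search(message)
    timeout = ATTACH_TIMEOUT_RE.search(message)
    if in_use:
        # AWS refused the attach because it still sees the EBS volume attached.
        info["kind"] = "volume_in_use"
        info["ebs_volume"] = in_use.group("ebs")
    elif timeout:
        # The attach request never reached the desired state in time.
        info["kind"] = "attach_timeout"
        info["actual"] = timeout.group("actual")
        info["desired"] = timeout.group("desired")
    return info


events = [
    'Failed to attach volume "pvc-0b930f1d-1303-11e7-b5d1-0ebeb1070c7f" on node '
    '"ip-172-31-2-63.ec2.internal" with: Timeout waiting for volume state: '
    "actual=detached, desired=attached",
    'Failed to attach volume "pvc-0b930f1d-1303-11e7-b5d1-0ebeb1070c7f" on node '
    '"ip-172-31-2-63.ec2.internal" with: Error attaching EBS volume: '
    "VolumeInUse: vol-0399f530d8c354c39 is already attached to an instance",
]

for event in events:
    print(classify_attach_failure(event))
```

Both messages name the same persistent volume and node, which suggests a storage attach problem for that one volume rather than a general capacity problem.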

Comment 23 dbb2000 2017-03-29 23:59:59 UTC

I removed all postgresql configuration and installed it once again, but the problem persists. Here is the latest deployment log:


--> Scaling postgresql-1 to 1
--> Waiting up to 10m0s for pods in deployment postgresql-1 to become ready
W0329 23:56:48.276495       1 reflector.go:330] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:468: watch of *api.Pod ended with: too old resource version: 1056886168 (1056911807)
error: update acceptor rejected postgresql-1: pods for deployment "postgresql-1" took longer than 600 seconds to become ready

Comment 24 Hemant Kumar 2017-03-30 01:10:52 UTC

Thank you - I am working with our team to get relevant logs and see what happened.

Comment 27 dbb2000 2017-03-30 15:37:55 UTC

I'm sending another screenshot, this time from Kibana. I don't know exactly what it means, but the elasticsearch plugin is unavailable. There is a link on the Deployment log screen that points to the Kibana status screen. I believe it is worth a look.

Comment 28 dbb2000 2017-03-30 15:38:29 UTC

Created attachment 1267611
kibana screen

Comment 29 dbb2000 2017-03-31 11:27:20 UTC

I found more log info:

--> Scaling postgresql-5 to 1
--> Waiting up to 10m0s for pods in deployment postgresql-5 to become ready
W0331 11:07:34.314016       1 reflector.go:330] github.com/openshift/origin/pkg/deploy/strategy/support/lifecycle.go:468: watch of *api.Pod ended with: too old resource version: 1063095846 (1063112715)
error: update acceptor rejected postgresql-5: pods for deployment "postgresql-5" took longer than 600 seconds to become ready

Comment 30 Hemant Kumar 2017-03-31 18:50:47 UTC

We have identified the problem and are working on deploying a fix.

The problem stems from the fact that some volumes in AWS were force-detached. This was done as a last resort to fix some volumes that were stuck. However, after a volume is force-detached, the node must be rebooted to update the kernel's view of its devices.

Comment 31 dbb2000 2017-03-31 19:43:26 UTC

Thanks for your effort. Do you know roughly how long it will take to fix?

Comment 32 Hemant Kumar 2017-04-03 17:57:16 UTC

*** Bug 1435424 has been marked as a duplicate of this bug. ***

Comment 34 Bradley Childs 2017-04-20 14:55:07 UTC

We have identified some problems similar to the one reported. Do you still have an account, db2000? We went looking for logs but couldn't find your username.

Comment 35 dbb2000 2017-04-20 19:44:32 UTC

After my last post, the problem was solved and the deployment service was working perfectly. But now my registration has been cancelled and I'm on a waiting list for new access.

Cheers,

Davi
