857132 – Scale-Up operation fails with JBoss and Postgres

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OKD
Classification:	Red Hat
Component:	Containers
Sub Component:
Version:	1.x
Hardware:	Unspecified
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Bill DeCoste
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-13 16:14 UTC by Skye Book
Modified:	2012-09-13 23:48 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-09-13 23:48:22 UTC
Target Upstream Version:
Embargoed:

Attachments
scale_events.log (1.54 MB, text/plain) 2012-09-13 17:42 UTC, Skye Book	no flags
Redeploying New Code Now Fails (2.46 KB, application/octet-stream) 2012-09-13 19:36 UTC, Skye Book	no flags
Local Gear's server.log (145.30 KB, application/octet-stream) 2012-09-13 19:46 UTC, Skye Book	no flags
Gear2 log (40.33 KB, text/plain) 2012-09-13 19:54 UTC, Skye Book	no flags

Description Skye Book 2012-09-13 16:14:19 UTC

Description of problem:
On a scaled JBoss application using Postgres, attempts to scale up a gear fail consistently.  According to the logs, JBoss is unable to connect to the Postgres data store, and the new gear shows L7STS/404 under the LastChk column on the haproxy_status page.  My local/home gear continues to function normally, but the new gear never starts fully.

On recommendation from mmcgrath in IRC, I SSH'd into the failed slave gear and executed 'ctl_all restart'.  On completion, the gear was responding normally in haproxy_status and began to serve requests (and was connected successfully to Postgres).


Version-Release number of selected component (if applicable):


How reproducible:
Consistently.

Steps to Reproduce:
1. Run scaled application with Postgres database
2. Execute 'haproxy_ctld -u'
3. Monitor APP_URL/haproxy_status.  Before the haproxy_ctld command completes, the new gear will show as DOWN
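Step 3 can also be checked programmatically. The following is a minimal sketch, assuming the status page exposes HAProxy's standard CSV export (typically reached by appending ";csv" to the stats URL; whether that applies to APP_URL/haproxy_status on this platform is an assumption). The function name `down_servers` and the sample server names are hypothetical.

```python
import csv
import io


def down_servers(csv_text):
    """Return (svname, status, check_status, check_code) for servers reported DOWN.

    csv_text is the body of HAProxy's CSV stats export. How it is fetched
    (for example, the status URL with ";csv" appended) is an assumption
    and is left to the caller.
    """
    lines = csv_text.strip().splitlines()
    if not lines:
        return []
    # HAProxy prefixes the CSV header row with "# ".
    lines[0] = lines[0].lstrip("# ")
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    failing = []
    for row in reader:
        # Skip the aggregate FRONTEND/BACKEND rows; only per-server rows matter.
        if row.get("svname") in ("FRONTEND", "BACKEND"):
            continue
        status = row.get("status") or ""
        if status.startswith("DOWN"):
            failing.append(
                (
                    row.get("svname"),
                    status,
                    row.get("check_status") or "",
                    row.get("check_code") or "",
                )
            )
    return failing
```

A gear in the state described above would appear in the result with check_status "L7STS" and check_code "404".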
  
Actual results:


Expected results:


Additional info:

Comment 1 Bill DeCoste 2012-09-13 17:39:12 UTC

Hi Skye,

Could you please attach the scale_events.log file from your haproxy gear?

Thanks -Bill

Comment 2 Skye Book 2012-09-13 17:42:53 UTC

Created attachment 612529
scale_events.log

Attached is the scale_events log from my shared haproxy/app gear.

Comment 3 Skye Book 2012-09-13 19:36:52 UTC

Created attachment 612590
Redeploying New Code Now Fails

Now, when pushing to the repository, the app fails to deploy.  It hangs at the point where the Maven build should start.

rhc app restart fails.

rhc app force-stop succeeds.

rhc app start fails with cartridge error 121.

More in the attached log.

Comment 4 Skye Book 2012-09-13 19:39:03 UTC

Correction to my description: the initial "rhc app restart" reports success but does not actually work :)

Comment 5 Skye Book 2012-09-13 19:46:51 UTC

Created attachment 612610
Local Gear's server.log

Here is the server.log from the last hour or so on the local gear.

Comment 6 Skye Book 2012-09-13 19:54:03 UTC

Created attachment 612611
Gear2 log

Log from gear 2

Comment 7 Bill DeCoste 2012-09-13 21:07:41 UTC

The error and related stack traces below are a known issue: database connections between the application and database gears get dropped or invalidated. JBoss should recover and create a new connection. The warnings and stack traces are just (annoying) noise.


2012/09/13 11:43:26,714 WARN  [org.jboss.jca.adapters.jdbc.local.LocalManagedConnectionFactory] (http-127.8.4.129-127.8.4.129-8080-2) Destroying connection that is not valid, due to the following exception: org.postgresql.jdbc4.Jdbc4Connection@23bd2a5d: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
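The recovery behaviour described here (discard a pooled connection that fails validation, then open a fresh one) can be sketched roughly as below. This is a generic illustration in Python, not JBoss's actual implementation; `ValidatingPool`, `connect`, and `is_valid` are hypothetical names.

```python
class ValidatingPool:
    """Toy connection pool that validates idle connections before handing them out."""

    def __init__(self, connect, is_valid):
        self._connect = connect      # callable that opens a new connection
        self._is_valid = is_valid    # callable that checks a connection is still usable
        self._idle = []
        self.destroyed = 0           # count of invalid connections discarded

    def acquire(self):
        # Try idle connections first, discarding any that fail validation.
        while self._idle:
            conn = self._idle.pop()
            if self._is_valid(conn):
                return conn
            # Analogous to the "Destroying connection that is not valid" warning.
            self.destroyed += 1
        # No usable idle connection is left, so open a new one.
        return self._connect()

    def release(self, conn):
        # Return a connection to the idle list for later reuse.
        self._idle.append(conn)
```

In this sketch a dropped connection costs one discarded pool entry and one warning-level event, which matches the comment's point that the warning by itself does not indicate a failed request.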

Comment 8 Bill DeCoste 2012-09-13 22:37:40 UTC

Skye,

I'd like to close this bug and open another one for your second issue (rhc app restart failing) if that's alright?

Thanks -Bill

Comment 9 Skye Book 2012-09-13 23:44:50 UTC

(In reply to comment #8)
> Skye,
> 
> I'd like to close this bug and open another one for your second issue (rhc
> app restart failing) if that's alright?
> 
> Thanks -Bill

By all means, please do.  If you could just CC me on the new bug or something that would be great.
