1553223 – when database connectivity is lost, webui ping feature first reports success, then fails after some time

Bug 1553223 - when database connectivity is lost, webui ping feature first reports success, then fails after some time

Summary: when database connectivity is lost, webui ping feature first reports success,...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	UI - OPS
Sub Component:
Version:	5.9.0
Hardware:	All
OS:	All
Priority:	high
Severity:	medium
Target Milestone:	GA
Target Release:	5.10.0
Assignee:	Dávid Halász
QA Contact:	Ievgen Zapolskyi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1561222
TreeView+	depends on / blocked

Reported:	2018-03-08 14:31 UTC by Felix Dewaleyne
Modified:	2021-06-10 15:10 UTC (History)
CC List:	11 users (show)
Fixed In Version:	5.10.0.0
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1561222 (view as bug list)
Environment:
Last Closed:	2019-02-11 14:05:45 UTC
Category:	---
Cloudforms Team:	---
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Description Felix Dewaleyne 2018-03-08 14:31:05 UTC

Description of problem:
after stopping the postgresql service, the api ping call responds with a pong while the webui correctly raises an error

Version-Release number of selected component (if applicable):
5.9.0

How reproducible:
all the time

Steps to Reproduce:
1. stop postgres on the db / break database connectivity
2. connect to the appliance's ping page (https://appliance/ping)
3. connect to the appliance's api ping (https://appliance/api/ping)

Actual results:
the two pages do not behave the same way

Expected results:
the api ping will behave the same way as the older webui ping page

Additional info:

Comment 2 Joe Rafaniello 2018-03-08 22:24:14 UTC

The /ping endpoint hits a cache for the server timezone and when that cache expires after 5 minutes or if it was never cached, it tries to fetch the miq_servers row, which fails because the database is down.  See [1], [2], [3]

I believe both /api/ping and /ping should return PONG if they're accessible regardless of the database connectivity.  Therefore, before_actions must work for the ping endpoint even if the db is down.  /ping is a "I'm here" probe and 

[1] https://github.com/ManageIQ/manageiq-ui-classic/blob/0b11ca455aca9806cb772d464bba3cb6e99289db/app/controllers/application_controller.rb#L80

[2]
https://github.com/ManageIQ/manageiq-ui-classic/blob/0b11ca455aca9806cb772d464bba3cb6e99289db/app/controllers/application_controller.rb#L889

[3]
https://github.com/ManageIQ/manageiq-ui-classic/blob/0b11ca455aca9806cb772d464bba3cb6e99289db/app/controllers/application_controller/timezone.rb#L32


[----] I, [2018-03-08T17:00:19.014556 #2494:3f8740da1e14]  INFO -- : Processing by PingController#index as */*
[----] F, [2018-03-08T17:00:19.017215 #2494:3f8740da1e14] FATAL -- : Error caught: [PG::Error] invalid encoding name: utf8
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/postgresql_adapter.rb:685:in `set_client_encoding'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/postgresql_adapter.rb:685:in `configure_connection'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/postgresql_adapter.rb:255:in `reconnect!'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/abstract_adapter.rb:423:in `verify!'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:778:in `block in checkout_and_verify'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:126:in `call'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:506:in `block (2 levels) in compile'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:455:in `call'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:101:in `__run_callbacks__'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:750:in `_run_checkout_callbacks'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:777:in `checkout_and_verify'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:490:in `checkout'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:364:in `connection'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_adapters/abstract/connection_pool.rb:883:in `retrieve_connection'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_handling.rb:128:in `retrieve_connection'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/connection_handling.rb:91:in `connection'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/core.rb:283:in `cached_find_by_statement'
/usr/local/lib/ruby/gems/2.3.0/gems/activerecord-5.0.6/lib/active_record/core.rb:196:in `find_by'
/var/www/miq/vmdb/app/models/miq_server.rb:547:in `block in <class:MiqServer>'
/usr/local/lib/ruby/gems/2.3.0/gems/more_core_extensions-3.5.0/lib/more_core_extensions/core_ext/module/cache_with_timeout.rb:61:in `block (2 levels) in cache_with_timeout'
/usr/local/lib/ruby/2.3.0/sync.rb:234:in `block in sync_synchronize'
/usr/local/lib/ruby/2.3.0/sync.rb:231:in `handle_interrupt'
/usr/local/lib/ruby/2.3.0/sync.rb:231:in `sync_synchronize'
/usr/local/lib/ruby/gems/2.3.0/gems/more_core_extensions-3.5.0/lib/more_core_extensions/core_ext/module/cache_with_timeout.rb:51:in `block in cache_with_timeout'
/usr/local/lib/ruby/gems/2.3.0/bundler/gems/manageiq-ui-classic-0928245eca8f/app/controllers/application_controller/timezone.rb:32:in `server_timezone'
/usr/local/lib/ruby/gems/2.3.0/bundler/gems/manageiq-ui-classic-0928245eca8f/app/controllers/application_controller.rb:888:in `set_user_time_zone'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:382:in `block in make_lambda'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:169:in `block (2 levels) in halting'
/usr/local/lib/ruby/gems/2.3.0/gems/actionpack-5.0.6/lib/abstract_controller/callbacks.rb:12:in `block (2 levels) in <module:Callbacks>'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:170:in `block in halting'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:454:in `block in call'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:454:in `each'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:454:in `call'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:101:in `__run_callbacks__'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:750:in `_run_process_action_callbacks'
/usr/local/lib/ruby/gems/2.3.0/gems/activesupport-5.0.6/lib/active_support/callbacks.rb:90:in `run_callbacks'
/usr/local/lib/ruby/gems/2.3.0/gems/actionpack-5.0.6/lib/abstract_controller/callbacks.rb:19:in `process_action'
/usr/local/lib/ruby/gems/2.3.0/gems/actionpack-5.0.6/lib/action_controller/metal/rescue.rb:20:in `process_action'
/usr/local/lib/ruby/gems/2.3.0/gems/actionpack-5.0.6/lib/action_controller/metal/instrumentation.rb:32:in `block in process_action'

Comment 3 Joe Rafaniello 2018-03-08 22:25:17 UTC

Continued...

/ping is a "I'm here" probe and shouldn't directly or indirectly testing database connectivity.  We can expose an endpoint for that if desired.

Comment 4 Joe Rafaniello 2018-03-08 22:26:24 UTC

Reassigned to UI group to allow /ping to work without database connectivity

Comment 6 Martin Povolny 2018-03-12 17:36:02 UTC

So the loadbalancers use the /ping to check if the application is ready to server requests for the user, right?

And the UI is not able to serve any requests then it does not have the database. Right?

If the loadbalancer checks the /ping and it gets the response it starts directing requests to the app. Right?

But if the database is offline we cannot serve any requests. Can we?

My conclusion is that the current behavior is the usefull one and changing it will actually make in not usefull.

I don't want to make things not useful to match behavior of the API. That does not make any sense to me.

Is there a usecase where the behavior that JoeR suggest would actually be useful?

Is there any docs on this?

p.s. Think of a ticket office that has a sign "Open". A customer comes and wants to buy a ticket. The clerk answers: We are open but we do not provide any services today. We do not sell tickets, we do not return money. But right, as the sign reads: We are "Open".

Comment 7 Joe Rafaniello 2018-03-12 18:44:39 UTC

I can see the usefulness of the current implementation and agree we can decide on a finer grain ping endpoint for subsystems when we need them.

Even if /ping and /api/ping differ in their responses in this situation, the problem is that /ping checks the database "sometimes" and ignores it others.

As it is now, within the first X minutes you can hit the /ping endpoint (with the db down) and it will succeed with pong until the cache_with_timeout on MiqServer.my_server expires and it tries to issue a query.  This is why I assumed it wasn't intentionally testing the database.  If need be, I can open a new BZ since the current behavior led me down the wrong path.  We can leave this BZ as:

/ping to test ui availability (also checks database connectivity)
/api/ping to test api worker is accepting connections (does not check db)

But anyway, here's how I tested it:
1) bin/rails s
2) curl -k http:/127.0.0.1:3000/ping => pong
3) stop postgresql
4) curl -k http:/127.0.0.1:3000/ping => pong
5) After 5 minutes or so: 
curl -k http:/127.0.0.1:3000/ping => 

throw "error";
$("#center_div").html("\u003cdiv id='content' style='overflow: scroll; height: 100%; padding-left: 50px;'\u003e\n\u003ch1 style='color: #c90813'\u003e\nUnexpected error encountered\n\u003c/h1\u003e\n\u003cbr\u003e\n\u003ch2\u003e\nErrors in Management Engine can be caused by:\n\u003c/h2\u003e\n\u003cbr\u003e\n\u003cfieldset\u003e\n\u003cfont style='font-size: 116%;color: black'\u003e\u0026#x2460;\u003c/font\u003e\nAccessing Management Engine from multiple tabs
...

Comment 8 Martin Povolny 2018-03-23 09:12:42 UTC

David is taking this one right after he's back from Bath. I will help if needed. The plan is to get this resolved by the start of the next week.

Comment 9 CFME Bot 2018-03-26 12:22:10 UTC

https://github.com/ManageIQ/manageiq-ui-classic/pull/3680

Comment 10 CFME Bot 2018-03-27 15:53:21 UTC

New commit detected on ManageIQ/manageiq-ui-classic/master:

https://github.com/ManageIQ/manageiq-ui-classic/commit/5a7fb6e883331e86b93625c2eb8ae01fd136f79e
commit 5a7fb6e883331e86b93625c2eb8ae01fd136f79e
Author:     Dávid Halász <dhalasz>
AuthorDate: Mon Mar 26 08:10:44 2018 -0400
Commit:     Dávid Halász <dhalasz>
CommitDate: Mon Mar 26 08:10:44 2018 -0400

    Fail with an error on /ping immediately if DB is not accessible

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1553223

 app/controllers/ping_controller.rb | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comment 11 Joe Rafaniello 2018-03-27 17:52:32 UTC

Note, for clarification, we should document that the rest api and ui both have ping endpoints for different use cases:

rest api:  /api/ping
ui:        /ping

The rest api ping, /api/ping, is only testing if the rest api will accept connections.

The ui ping, /ping, will also test database connectivity in addition to a simple normal UI request.  This endpoint should be used for testing if the UI is "up".

Comment 15 Joe Rafaniello 2019-01-22 15:13:01 UTC

Hi Ievgen,

Sorry for the lack of clarity.

I have updated the title of this bugzilla to reflect the change fixed in this bugzilla: Due to caching, the UI /ping endpoint is inconsistent:  first passes, then fails after 5 minutes... while the database is down.

In comment #11, there is still a difference between the rest api and ui ping endpoint as they have different use cases and we are NOT making these two endpoints (ui and api's ping) the same. See that comment for details.

For testing this fix made in this bugzilla:
From comment #7:

"As it is now, within the first X minutes you can hit the /ping endpoint (with the db down) and it will succeed with pong until the cache_with_timeout on MiqServer.my_server expires and it tries to issue a query"

Here's how I tested it:
1) bin/rails s
2) curl -k http:/127.0.0.1:3000/ping => pong
3) stop postgresql
4) curl -k http:/127.0.0.1:3000/ping => pong
5) After 5 minutes or so: 
curl -k http:/127.0.0.1:3000/ping => 

throw "error";
$("#center_div").html("\u003cdiv id='content' style='overflow: scroll; height: 100%; padding-left: 50px;'\u003e\n\u003ch1 style='color: #c90813'\u003e\nUnexpected error encountered\n\u003c/h1\u003e\n\u003cbr\u003e\n\u003ch2\u003e\nErrors in Management Engine can be caused by:\n\u003c/h2\u003e\n\u003cbr\u003e\n\u003cfieldset\u003e\n\u003cfont style='font-size: 116%;color: black'\u003e\u0026#x2460;\u003c/font\u003e\nAccessing Management Engine from multiple tabs
...
"

Comment 16 Ievgen Zapolskyi 2019-01-22 15:15:21 UTC

Hi Joe, 

Thanks for update.

Verified in 5.10.0.32

Note You need to log in before you can comment on or make changes to this bug.