Thanks, Gellert. We can reproduce this and are in the process of determining the correct fix.
Here's the first part: https://github.com/ManageIQ/activerecord-virtual_attributes/pull/21
https://github.com/ManageIQ/manageiq/pull/18798
The second PR was opened to leverage the first one. We're still reviewing it and working to get it merged.
New commit detected on ManageIQ/manageiq/master: https://github.com/ManageIQ/manageiq/commit/168f3ad530627603d1478aa6c3cff306e3f8b610

commit 168f3ad530627603d1478aa6c3cff306e3f8b610
Author:     Keenan Brock <keenan>
AuthorDate: Thu May 9 16:01:12 2019 -0400
Commit:     Keenan Brock <keenan>
CommitDate: Thu May 9 16:01:12 2019 -0400

    specify virtual_delegate types to avoid deadlock

    deriving the attribute type for delegates required the target class
    to be loaded. This forced a cascade of load_schema calls that end up
    introducing a race condition. This PR explicitly declares the
    attribute type so the target class no longer needs to be loaded and
    the race condition (and subsequent deadlocks) are avoided.

    https://bugzilla.redhat.com/show_bug.cgi?id=1700378

 Gemfile                                   |  2 +-
 app/models/ems_cluster.rb                 |  2 +-
 app/models/entitlement.rb                 |  2 +-
 app/models/host.rb                        | 14 +-
 app/models/miq_group.rb                   |  2 +-
 app/models/miq_product_feature.rb         |  2 +-
 app/models/miq_report_result.rb           |  2 +-
 app/models/miq_server.rb                  |  2 +-
 app/models/miq_widget.rb                  |  4 +-
 app/models/mixins/authentication_mixin.rb |  1 +
 app/models/mixins/compliance_mixin.rb     |  2 +
 app/models/mixins/drift_state_mixin.rb    |  4 +-
 app/models/mixins/ownership_mixin.rb      |  4 +-
 app/models/vm_or_template.rb              | 24 +-
 14 files changed, 35 insertions(+), 32 deletions(-)
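The shape of the fix can be sketched with a toy model: when a virtual delegate's type is derived, the target class's schema has to be loaded first (the racy path), but when the type is declared up front, that load never happens. The names below (load_schema, delegate_type) are illustrative stand-ins, not the actual activerecord-virtual_attributes API.

```ruby
SCHEMA_LOADS = []

# Stand-in for ActiveRecord's schema loading; in the real bug, concurrent
# calls into this path from multiple threads raced and could deadlock.
def load_schema(klass_name)
  SCHEMA_LOADS << klass_name
  { name: :string } # pretend column types for the target class
end

# Deriving the delegate's type forces the target's schema to load;
# declaring the type explicitly (the fix) skips that lookup entirely.
def delegate_type(target:, column:, type: nil)
  return type if type
  load_schema(target)[column]
end

derived  = delegate_type(target: "Host", column: :name)                # loads "Host"
explicit = delegate_type(target: "Host", column: :name, type: :string) # no load
```

This is why the PR touches so many model files: each virtual_delegate call site gains an explicit type declaration so no target class has to be loaded at definition time.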
I think I can move this to post now? I guess I'll find out pretty quick if I'm wrong...
Hi Parthvi,

To recreate this on a version that doesn't have this fix, you'll need to repeatedly try to get two requests to hit the web service worker at exactly the right time. If the requests complete successfully, the timing wasn't right and you'll need to kill the web service worker and try again.

Before you run this, you'll want to decrease the web service worker count to 1. Then, tail -f log/production.log and run the script below. You'll be looking for a log message that shows up after a request "hangs" for 30+ seconds:

    "Long running http(s) request"

Examples of this log line can be found in the PR that added it: https://github.com/ManageIQ/manageiq/pull/17842

If you don't get this message after you run the script below and wait 30+ seconds, you'll need to kill the web service worker and try again.

# Note: the URLs are quoted so the shell spawned by the backticks doesn't
# treat "&" as a background operator, and the threads are joined so the
# script doesn't exit before the requests complete.
threads = []
2.times do
  threads << Thread.new { `curl -L "https://admin:smartvm@localhost/api/vms"` }
  threads << Thread.new { `curl -L "https://admin:smartvm@localhost/api/notifications?expand=resources&attributes=details&sort_by=id&sort_order=desc&limit=100"` }
end
threads.each(&:join)
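If you'd rather fire the two requests without Ruby, a shell equivalent is sketched below. It assumes the same default admin:smartvm credentials and a self-signed appliance certificate (hence -k); note the second URL must be quoted so the shell doesn't treat "&" as a background operator.

```shell
#!/bin/sh
# Fire both API requests concurrently; -k accepts a self-signed
# certificate, -m 30 bounds each request at 30 seconds.
curl -skL -m 30 "https://admin:smartvm@localhost/api/vms" > /dev/null &
curl -skL -m 30 "https://admin:smartvm@localhost/api/notifications?expand=resources&attributes=details&sort_by=id&sort_order=desc&limit=100" > /dev/null &
wait # block until both background requests return (or time out)
echo "both requests returned"
```

If the deadlock is triggered, the "both requests returned" line won't print until the requests time out, and the "Long running http(s) request" message should appear in production.log.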
FIXED. Verified on 5.11.0.8.