1406489 – 500 Error: Remote Execution fails upon re-run of an existing job

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Remote Execution
Sub Component:
Version:	6.2.6
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	Unspecified
Assignee:	Shimon Shtein
QA Contact:	Ivan Necas
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1441119
Depends On:
Blocks:

Reported:	2016-12-20 16:55 UTC by Marc Richter
Modified:	2021-03-11 14:52 UTC
CC List:	11 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-02-21 16:54:37 UTC
Target Upstream Version:
Embargoed:

Attachments
Screenshot after verification (40.67 KB, image/png) 2017-08-16 08:42 UTC, Ivan Necas

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	18316	0	Normal	Closed	Rerun job with failed hosts fails with "Stack level too deep" in the log.	2020-11-17 13:38:21 UTC

Description Marc Richter 2016-12-20 16:55:36 UTC

Created attachment 1233935
Status of remote execution jobs before attempting re-run

Description of problem: A remote execution job for the rhsmcertd service ran on 835 hosts. When the customer tried to re-run the failed jobs, an error page appeared with the message "Oops, we're sorry but something went wrong" and the detail "stack level too deep".


Version-Release number of selected component (if applicable): Satellite 6.2.6. Was occurring with 6.2.4 as well.


How reproducible: Attempt to re-run failed remote execution jobs with more than a handful of hosts



Actual results:
Error page

Expected results:
Jobs re-run

Additional info:

Comment 1 Marc Richter 2016-12-20 16:58:56 UTC

Additional info from customer - error seems to happen when there are more than 240 failed jobs that need to be re-run.

Comment 3 Shimon Shtein 2017-01-31 09:29:08 UTC

Couldn't reproduce this bug.
Is it possible to attach foreman-debug output?

Comment 4 Marc Richter 2017-01-31 14:59:30 UTC

What build are you trying to reproduce on? One of the errata notes in 6.2.7 led me to believe that this may be fixed already.

Comment 5 Shimon Shtein 2017-01-31 16:16:37 UTC

On latest snap. What did you see there?

Comment 6 Marc Richter 2017-01-31 16:20:09 UTC

From the 6.2.7 notes:

* Remote Execution against many hosts was causing errors to appear. This case is now handled correctly. (BZ#1367606, BZ#1372708)

Customer is upgrading to 6.2.7 this Friday. I'm curious to see if the errors go away.

Comment 7 Shimon Shtein 2017-01-31 16:30:47 UTC

Sounds promising.
Let's wait for Friday and see if it helps.
If it doesn't, please attach foreman-debug to this BZ.

Thanks!

Comment 8 Marc Richter 2017-01-31 16:32:32 UTC

Yep, that was my Evil Plan. ;-)

Comment 12 Shimon Shtein 2017-04-06 06:25:19 UTC

Connecting redmine issue http://projects.theforeman.org/issues/18316 from this bug

Comment 13 Adam Ruzicka 2017-04-11 10:01:06 UTC

*** Bug 1441119 has been marked as a duplicate of this bug. ***

Comment 19 Ivan Necas 2017-08-16 08:41:54 UTC


Verification steps:

1. Prepare a large number of fake hosts:
cat <<END | bundle exec rails console
# Set the current user for this console session.
User.current = User.first
# Reuse or create the host group that holds all fake hosts.
group = Hostgroup.unscoped.find_or_create_by(:name => 'fakes')
group.save
location = Location.first
organization = Organization.first
group.locations << location
group.organizations << organization
# Create 1000 hosts named host-21.sat.test through host-1020.sat.test.
1000.times do |i|
  i = i+11
  puts i
  h = Host.new(:name => "host-#{i+10}.sat.test")
  h.hostgroup_id = group.id
  h.organization = organization
  h.location = location
  h.save!
end
END
2. Run any job against the fake hosts (search query `~ test`).
3. Wait until it fails.
4. Use 'Rerun failed'.
5. The form is displayed properly and running the job works (after selecting the type of query, which we track in https://bugzilla.redhat.com/show_bug.cgi?id=1481981).
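
For reference, the naming arithmetic in step 1 can be checked without a Satellite instance. The sketch below is plain Ruby and only mirrors the two offsets used in the loop (`i+11`, then `i+10` inside the name) to show which host names the script ends up creating. `fake_host_names` is a hypothetical helper introduced here for illustration; it is not part of Foreman or Satellite.

```ruby
# Pure-Ruby check of the host names produced by the loop in step 1.
# No Rails or Foreman classes are used; fake_host_names is a hypothetical helper.
def fake_host_names(count = 1000)
  count.times.map do |i|
    i = i + 11                 # same first offset as the console script
    "host-#{i + 10}.sat.test"  # second offset is applied only to the name
  end
end

names = fake_host_names
puts names.first  # host-21.sat.test
puts names.last   # host-1020.sat.test
puts names.size   # 1000
```

All generated names end in `.sat.test`, which is why a search on `test` in step 2 is enough to select the whole set.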

Comment 20 Ivan Necas 2017-08-16 08:42:22 UTC

Created attachment 1314000
Screenshot after verification

Comment 21 Satellite Program 2018-02-21 16:54:37 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336
