Bug 1965218

Summary: pulp3: satellite-maintain content prepare failed when run as nohup job
Product: Red Hat Satellite Reporter: ir. Jan Gerrit Kootstra <jangerrit.kootstra>
Component: Satellite Maintain Assignee: Anurag Patel <apatel>
Status: CLOSED ERRATA QA Contact: Gaurav Talreja <gtalreja>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.10.0 CC: apatel, aupadhye, ekohlvan, jjeffers, jsherril, kgaikwad, osousa, peter.vreman
Target Milestone: 6.9.7 Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rubygem-foreman_maintain-0.7.14,tfm-rubygem-katello-3.18.1.45-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-10 16:20:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1957813    
Attachments:
Description Flags
the script used to run the required command none

Description ir. Jan Gerrit Kootstra 2021-05-27 08:12:41 UTC
Created attachment 1787519
the script used to run the required command

Description of problem:

nohup ./satellite-maintain-content-prepare.sh > content-prepare.log 2>&1
fails with error:

Failed executing foreman-rake katello:pulp3_migration, exit status 256

content of the script:

satellite-maintain content prepare

Version-Release number of selected component (if applicable):

6.9.2

How reproducible:


Steps to Reproduce:
1. create the script and make it executable
2. run the nohup command
3. tail -f logfile

Actual results:

Failed executing foreman-rake katello:pulp3_migration, exit status 256

Expected results:

No errors, since the command does not require input.

Additional info:

Comment 1 ir. Jan Gerrit Kootstra 2021-05-27 08:20:30 UTC
Running the command in the foreground:

satellite-maintain content prepare

There seems to be no issue.

Comment 2 Peter Vreman 2021-06-17 07:47:38 UTC
A severity of Urgent would better match the impact on users starting the migration process.
I also hit this issue out of the box when trying the 'satellite-maintain content prepare' step.

It is really confusing for users to work out what is happening when using nohup or any other standard bash background process.


I made it work with the following patches:
- replace 'script' with 'bash' piped to 'tee'
- add a flush of stdout to the rake task (otherwise it buffers ~8K)
- add preserve_output to the rake command so that it writes multiple lines instead of overwriting the same line

~~~
--- /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.29/lib/katello/tasks/pulp3_migration.rake.210617-1      2021-06-03 15:05:18.000000000 +0000
+++ /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.18.1.29/lib/katello/tasks/pulp3_migration.rake       2021-06-17 07:22:45.320625830 +0000
@@ -24,6 +24,7 @@
         message = "#{Time.now.to_s}: #{task.humanized[:output]}"
         clear_count = message.length + 1
         $stdout.print(message)
+        $stdout.flush

         sleep(10)
         task = ForemanTasks::Task.find(task.id)
--- /usr/share/gems/gems/foreman_maintain-0.7.8/definitions/procedures/content/prepare.rb
+++ /usr/share/gems/gems/foreman_maintain-0.7.8/definitions/procedures/content/prepare.rb
@@ -7,7 +7,7 @@

     def run
       # use interactive to get realtime output
-      puts execute!('foreman-rake katello:pulp3_migration', :interactive => true)
+      puts execute!('foreman-rake katello:pulp3_migration preserve_output=true', :interactive => true)
     end
   end
 end
--- /usr/share/gems/gems/foreman_maintain-0.7.8/foreman_maintain/utils/command_runner.rb
+++ /usr/share/gems/gems/foreman_maintain-0.7.8/lib/foreman_maintain/utils/command_runner.rb
@@ -64,7 +64,7 @@
         # running interactively
         log_file = Tempfile.open('captured-output')
         exit_file = Tempfile.open('captured-exit-code')
-        Kernel.system("script -qc '#{full_command}; echo $? > #{exit_file.path}' #{log_file.path}")
+        Kernel.system("bash -c '#{full_command}; echo $? > #{exit_file.path}' | tee -i #{log_file.path}")
         File.open(log_file.path) { |f| @output = f.read }
         File.open(exit_file.path) do |f|
           exit_status = f.read.strip
~~~
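
The flush and preserve_output changes in the patch above can be illustrated with a small standalone sketch. `report_progress` is a hypothetical helper, not Katello's code, and the carriage-return overwrite is an assumption made for illustration; the real rake task may clear the line differently.

```ruby
require 'stringio'

# Hypothetical helper (not Katello's actual code): prints one progress message.
# With preserve_output each message goes on its own line; otherwise the
# previous message is overwritten in place (assumed here: a carriage return).
def report_progress(io, message, preserve_output: false)
  if preserve_output
    io.puts(message)
  else
    io.print("\r#{message}")
  end
  # When io is a pipe or a file rather than a terminal, writes may sit in a
  # buffer; flushing lets a reader such as `tail -f` see each message promptly.
  io.flush
end

buffer = StringIO.new
report_progress(buffer, 'step 1 of 2', preserve_output: true)
report_progress(buffer, 'step 2 of 2', preserve_output: true)
puts buffer.string
```

Writing one line per message is what keeps a redirected log file readable, since a log cannot be overwritten in place the way a terminal line can.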

Comment 3 James Jeffers 2021-07-29 16:02:05 UTC
Created redmine issue https://projects.theforeman.org/issues/33183 from this bug

Comment 4 Peter Vreman 2021-08-09 08:55:00 UTC
James,

The problem is that 'script' does not work well together with nohup (or any other standard backgrounding from the shell): when going into the background, 'script' changes the session ('setsid') and detaches from the parent process, so 'nohup' no longer has a subprocess to follow.

Sadly, the PR https://github.com/theforeman/foreman_maintain/pull/513 implements only half of the proposed patch. It is missing the important part: replacing 'script' with standard 'bash'. Because 'bash' does not log to files, 'tee' was needed.
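
A minimal sketch of the 'bash' plus 'tee' approach, modelled on the proposed patch. `run_and_capture` is a hypothetical name that does not exist in foreman_maintain, and the sketch assumes the command contains no single quotes.

```ruby
require 'tempfile'

# Hypothetical helper (not part of foreman_maintain): runs a shell command
# through bash, mirrors its output into a log file with tee, and recovers the
# command's exit status from a side file.
def run_and_capture(full_command)
  log_file = Tempfile.new('captured-output')
  exit_file = Tempfile.new('captured-exit-code')
  # Assumption: full_command contains no single quotes, since it is embedded
  # in a single-quoted bash -c argument.
  Kernel.system("bash -c '#{full_command}; echo $? > #{exit_file.path}' | tee -i #{log_file.path}")
  output = File.read(log_file.path)
  status = File.read(exit_file.path).strip.to_i
  [output, status]
ensure
  # Remove the temporary files whether or not the command succeeded.
  log_file&.close!
  exit_file&.close!
end

output, status = run_and_capture('echo migrating; (exit 3)')
puts "captured #{output.inspect} with exit status #{status}"
```

The exit status goes through a side file because the status of the pipeline itself would be that of 'tee', not of the command being run.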



Peter

Comment 6 Bryan Kearney 2021-08-25 20:04:50 UTC
Upstream bug assigned to apatel

Comment 7 Bryan Kearney 2021-08-25 20:04:52 UTC
Upstream bug assigned to apatel

Comment 8 Bryan Kearney 2021-09-01 20:05:02 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/33183 has been resolved.

Comment 12 James Jeffers 2021-10-11 14:02:59 UTC
I see where I am confused. The Katello change associated with this was not picked in.

Comment 13 Brad Buckingham 2021-10-11 14:35:57 UTC
Clearing needinfos based upon comment 12

Comment 15 Gaurav Talreja 2021-11-02 13:59:48 UTC
Verified.

Tested on Satellite 6.9.7 Snap 3.0
Version: rubygem-foreman_maintain-0.7.14-1.el7sat.noarch

Steps:
1. echo "satellite-maintain content prepare" > content_prepare.sh
2. nohup ./content_prepare.sh > content-prepare.log 2>&1

Observation:
The content prepare command works without any issues when run as a nohup job; it also works in the foreground.

The change in the f-m interactive command runner from `script` to `bash` started causing BZ 2013630, so I also tested the nohup job with the new command runner, `stdbuf`, and it works fine.

Comment 18 errata-xmlrpc 2021-11-10 16:20:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite Maintenance 6.9.7 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4611