Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1687190 - sporadic timeouts in opening TCP connection prevents Satellite upgrades
Summary: sporadic timeouts in opening TCP connection prevents Satellite upgrades
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Upgrades
Version: 6.4.2
Hardware: x86_64
OS: Linux
high
high
Target Milestone: 6.5.0
Assignee: satellite6-bugs
QA Contact: Nikhil Kathole
URL:
Whiteboard:
Depends On:
Blocks: 1688317
TreeView+ depends on / blocked
 
Reported: 2019-03-10 19:18 UTC by Pavel Moravec
Modified: 2023-10-06 18:10 UTC (History)
5 users (show)

Fixed In Version: foreman-1.20.1.25-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1688317 (view as bug list)
Environment:
Last Closed: 2019-05-14 12:40:24 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1549088 0 medium CLOSED Various Action:: Tasks types stop with warning: "no such file or directory" for file in foreman cache hierarchy 2022-03-13 14:43:51 UTC
Red Hat Product Errata RHSA-2019:1222 0 None None None 2019-05-14 12:40:32 UTC

Description Pavel Moravec 2019-03-10 19:18:00 UTC
Description of problem:
Upgrading Satellite (e.g. from 6.3 to 6.4) can fail at random rake script with backtrace like:

RestClient::Exceptions::OpenTimeout: Timed out connecting to server
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:731:in `rescue in transmit'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:642:in `transmit'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:145:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:52:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/resource.rb:67:in `post'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/base.rb:92:in `get_response'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/base.rb:72:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/resources/repository.rb:147:in `unit_search'
/opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/extensions/repository.rb:199:in `distributions'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:620:in `import_distribution_data'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:615:in `block in import_distributions'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'
/opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each'

(now it depends per use case / rake script failed, e.g. few times:

/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:614:in `import_distributions'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/lib/katello/tasks/upgrades/3.7/make_all_ks_repos_bootable.rake:7:in `block (4 levels) in <top (required)>'
/opt/rh/rh-ruby24/root/usr/share/gems/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'

)

Usually it happens in a rake script that calls REST requests in a tight loop. Very random occurrences, more often on 6.5.


inecas++ ++ found out a fix:

diff --git a/app/models/setting.rb b/app/models/setting.rb
index 72a094024aa135fa41e0f6454d07c1d79fa3ac0a..e19ad55b24afb74c83f9e3033db2756541830fab 100644
--- a/app/models/setting.rb
+++ b/app/models/setting.rb
@@ -93,12 +93,8 @@ class Setting < ApplicationRecord
   def self.[](name)
     name = name.to_s
     cache_value = Setting.cache.read(name)
-    if cache_value.nil?
-      value = find_by(:name => name).try(:value)
-      Setting.cache.write(name, value)
-      return value
-    else
-      return cache_value
+    cache.fetch(name) do
+      find_by(:name => name).try(:value)
     end
   end

that really works.

This BZ is rather a tracker to ensure we fix it in downstream / to decide where all to fix it.


Version-Release number of selected component (if applicable):
Sat 6.4.2


How reproducible:
very sporadicaly during upgrade. Bit artificial reproducer found by Ivan is provided below.


Steps to Reproduce:
1. Decrease katello rest timeout to see the timeouts much sooner than after 1 hour:
/etc/foreman/plugins/katello.yaml :

  :rest_client_timeout: 30

2. Run either (or all) scripts that fires REST requests in a tight loop:

a) requests to 127.0.0.1/pub :
mkdir -p ~/repro
cat <<EOF | foreman-rake console | tee ~/repro/pub-$(date -Isec).log
   conf.return_format = ""
   log = Logger.new(STDOUT)
   log.info "=== process pid #{\$\$}"
   url = URI.parse('http://127.0.0.1/pub/')
   threads = 1.times.map do |i|
     Thread.new do
       log.info "Start thread no. #{i+1}"
         iteration = 0
         cycle = 0
         while true
         begin
           cycle += 1
           if cycle == 1000
             iteration += 1
             cycle = 0
             log.info "=== Thread #{i+1}: iteration #{iteration}k"
           end
           Setting['http_cache']
           Net::HTTP.get_response(url)
         rescue Exception => e
           log.info "Thread #{i+1}: error at iteration #{iteration * 1000 + cycle}"
           log.info "#{e.class} #{e.message}"
           log.info e.backtrace.join("\n")
           sleep(1)
         end
       end
     end
   end
   threads.each(&:join)
EOF

b) requests to pulp repo details:
cat <<EOF | foreman-rake console | tee ~/repro/pulp-$(date -Isec).log
   conf.return_format = ""
   log = Logger.new(STDOUT)
   pulp_id = Katello::Repository.first.pulp_id
   Katello.pulp_server.extensions.repository
   threads = 4.times.map do |i|
     Thread.new do
       log.info "Start thread no. #{i+1}"
       iteration = 0
       cycle = 0
       while true        
         begin
           cycle += 1
           if cycle == 1000
             iteration += 1
             cycle = 0
             log.info "=== iteration #{iteration}k"
           end
           Katello.pulp_server.extensions.repository.distributions(pulp_id)
         rescue Exception => e
           log.info "Error at iteration #{iteration * 1000 + cycle}"
           log.info "#{e.class} #{e.message}"
           log.info e.backtrace.join("\n")
         end
       end
     end
   end
   threads.each(&:join)
EOF

c) optionally, run requests against candlepin like (in rake console):

  4.times do
    Thread.new do
      begin
      org = Organization.first
      label = org.label
      iteration = 0
      cycle = 0
      while true        
        cycle += 1
        if cycle == 1000
          iteration += 1
          cycle = 0
          puts "=== iteration #{iteration}k"
        end
        Katello::Resources::Candlepin::Owner.get_ueber_cert(label)
      end
      rescue Exception => e
        puts e.message
        puts e.backtrace.join("\n")
        raise e
      end
    end
  end

3. Wait few hours and count number of timeouts


Actual results:
3. VERY depending on system, few to many timeouts happen


Expected results:
3. no timeouts


Additional info:

Comment 10 Nikhil Kathole 2019-03-29 08:37:59 UTC
VERIFIED

Version tested: 
Satellite 6.5 snap 21

Able to reproduce with 6.4 but could not see timeout on 6.5 snap 21 with above reproducer steps.

I, [2019-03-29T04:18:54.060583 #5203]  INFO -- : === Thread 1: iteration 163k
I, [2019-03-29T04:18:56.548624 #5203]  INFO -- : === Thread 1: iteration 164k
I, [2019-03-29T04:18:59.073265 #5203]  INFO -- : === Thread 1: iteration 165k
I, [2019-03-29T04:19:01.414366 #5203]  INFO -- : === Thread 1: iteration 166k
I, [2019-03-29T04:19:03.754766 #5203]  INFO -- : === Thread 1: iteration 167k

I, [2019-03-29T04:35:39.720092 #12961]  INFO -- : === iteration 19k
I, [2019-03-29T04:35:40.208662 #12961]  INFO -- : === iteration 19k
I, [2019-03-29T04:35:41.687506 #12961]  INFO -- : === iteration 19k
I, [2019-03-29T04:36:22.321864 #12961]  INFO -- : === iteration 20k
I, [2019-03-29T04:36:22.376873 #12961]  INFO -- : === iteration 20k
I, [2019-03-29T04:36:23.768408 #12961]  INFO -- : === iteration 20k
I, [2019-03-29T04:36:25.250421 #12961]  INFO -- : === iteration 20k

Comment 13 errata-xmlrpc 2019-05-14 12:40:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222


Note You need to log in before you can comment on or make changes to this bug.