Bug 1687190
| Summary: | sporadic timeouts in opening TCP connection prevents Satellite upgrades | |||
|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | |
| Component: | Upgrades | Assignee: | satellite6-bugs <satellite6-bugs> | |
| Status: | CLOSED ERRATA | QA Contact: | Nikhil Kathole <nkathole> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 6.4.2 | CC: | egolov, inecas, ktordeur, mbacovsk, sthirugn | |
| Target Milestone: | 6.5.0 | Keywords: | FieldEngineering, PrioBumpField, Triaged | |
| Target Release: | Unused | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | foreman-1.20.1.25-1 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1688317 (view as bug list) | Environment: | ||
| Last Closed: | 2019-05-14 12:40:24 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1688317 | |||
VERIFIED Version tested: Satellite 6.5 snap 21 Able to reproduce with 6.4 but could not see timeout on 6.5 snap 21 with above reproducer steps. I, [2019-03-29T04:18:54.060583 #5203] INFO -- : === Thread 1: iteration 163k I, [2019-03-29T04:18:56.548624 #5203] INFO -- : === Thread 1: iteration 164k I, [2019-03-29T04:18:59.073265 #5203] INFO -- : === Thread 1: iteration 165k I, [2019-03-29T04:19:01.414366 #5203] INFO -- : === Thread 1: iteration 166k I, [2019-03-29T04:19:03.754766 #5203] INFO -- : === Thread 1: iteration 167k I, [2019-03-29T04:35:39.720092 #12961] INFO -- : === iteration 19k I, [2019-03-29T04:35:40.208662 #12961] INFO -- : === iteration 19k I, [2019-03-29T04:35:41.687506 #12961] INFO -- : === iteration 19k I, [2019-03-29T04:36:22.321864 #12961] INFO -- : === iteration 20k I, [2019-03-29T04:36:22.376873 #12961] INFO -- : === iteration 20k I, [2019-03-29T04:36:23.768408 #12961] INFO -- : === iteration 20k I, [2019-03-29T04:36:25.250421 #12961] INFO -- : === iteration 20k Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:1222 |
Description of problem: Upgrading Satellite (e.g. from 6.3 to 6.4) can fail at random rake script with backtrace like: RestClient::Exceptions::OpenTimeout: Timed out connecting to server /opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:731:in `rescue in transmit' /opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:642:in `transmit' /opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:145:in `execute' /opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/request.rb:52:in `execute' /opt/theforeman/tfm/root/usr/share/gems/gems/rest-client-2.0.1/lib/restclient/resource.rb:67:in `post' /opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/base.rb:92:in `get_response' /opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/base.rb:72:in `call' /opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/resources/repository.rb:147:in `unit_search' /opt/theforeman/tfm/root/usr/share/gems/gems/runcible-2.8.1/lib/runcible/extensions/repository.rb:199:in `distributions' /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:620:in `import_distribution_data' /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:615:in `block in import_distributions' /opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each' /opt/theforeman/tfm-ror51/root/usr/share/gems/gems/activerecord-5.1.6/lib/active_record/relation/delegation.rb:39:in `each' (now it depends per use case / rake script failed, e.g. few times: /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/app/models/katello/repository.rb:614:in `import_distributions' /opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.7.0.46/lib/katello/tasks/upgrades/3.7/make_all_ks_repos_bootable.rake:7:in `block (4 levels) in <top (required)>' /opt/rh/rh-ruby24/root/usr/share/gems/gems/rake-12.0.0/exe/rake:27:in `<top (required)>' ) Usually it happens in a rake script that calls REST requests in a tight loop. Very random occurrences, more often on 6.5. inecas++ ++ found out a fix: diff --git a/app/models/setting.rb b/app/models/setting.rb index 72a094024aa135fa41e0f6454d07c1d79fa3ac0a..e19ad55b24afb74c83f9e3033db2756541830fab 100644 --- a/app/models/setting.rb +++ b/app/models/setting.rb @@ -93,12 +93,8 @@ class Setting < ApplicationRecord def self.[](name) name = name.to_s cache_value = Setting.cache.read(name) - if cache_value.nil? - value = find_by(:name => name).try(:value) - Setting.cache.write(name, value) - return value - else - return cache_value + cache.fetch(name) do + find_by(:name => name).try(:value) end end that really works. This BZ is rather a tracker to ensure we fix it in downstream / to decide where all to fix it. Version-Release number of selected component (if applicable): Sat 6.4.2 How reproducible: very sporadicaly during upgrade. Bit artificial reproducer found by Ivan is provided below. Steps to Reproduce: 1. Decrease katello rest timeout to see the timeouts much sooner than after 1 hour: /etc/foreman/plugins/katello.yaml : :rest_client_timeout: 30 2. Run either (or all) scripts that fires REST requests in a tight loop: a) requests to 127.0.0.1/pub : mkdir -p ~/repro cat <<EOF | foreman-rake console | tee ~/repro/pub-$(date -Isec).log conf.return_format = "" log = Logger.new(STDOUT) log.info "=== process pid #{\$\$}" url = URI.parse('http://127.0.0.1/pub/') threads = 1.times.map do |i| Thread.new do log.info "Start thread no. #{i+1}" iteration = 0 cycle = 0 while true begin cycle += 1 if cycle == 1000 iteration += 1 cycle = 0 log.info "=== Thread #{i+1}: iteration #{iteration}k" end Setting['http_cache'] Net::HTTP.get_response(url) rescue Exception => e log.info "Thread #{i+1}: error at iteration #{iteration * 1000 + cycle}" log.info "#{e.class} #{e.message}" log.info e.backtrace.join("\n") sleep(1) end end end end threads.each(&:join) EOF b) requests to pulp repo details: cat <<EOF | foreman-rake console | tee ~/repro/pulp-$(date -Isec).log conf.return_format = "" log = Logger.new(STDOUT) pulp_id = Katello::Repository.first.pulp_id Katello.pulp_server.extensions.repository threads = 4.times.map do |i| Thread.new do log.info "Start thread no. #{i+1}" iteration = 0 cycle = 0 while true begin cycle += 1 if cycle == 1000 iteration += 1 cycle = 0 log.info "=== iteration #{iteration}k" end Katello.pulp_server.extensions.repository.distributions(pulp_id) rescue Exception => e log.info "Error at iteration #{iteration * 1000 + cycle}" log.info "#{e.class} #{e.message}" log.info e.backtrace.join("\n") end end end end threads.each(&:join) EOF c) optionally, run requests against candlepin like (in rake console): 4.times do Thread.new do begin org = Organization.first label = org.label iteration = 0 cycle = 0 while true cycle += 1 if cycle == 1000 iteration += 1 cycle = 0 puts "=== iteration #{iteration}k" end Katello::Resources::Candlepin::Owner.get_ueber_cert(label) end rescue Exception => e puts e.message puts e.backtrace.join("\n") raise e end end end 3. Wait few hours and count number of timeouts Actual results: 3. VERY depending on system, few to many timeouts happen Expected results: 3. no timeouts Additional info: