Bug 654456 - subscription manager caches failed connection
Summary: subscription manager caches failed connection
Alias: None
Product: Candlepin
Classification: Community
Component: candlepin
Version: 0.5
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Chris Duryee
QA Contact: spandey
Depends On:
Blocks: Entitlement-Beta
Reported: 2010-11-17 21:52 UTC by Eric Blake
Modified: 2015-05-14 15:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-02-02 19:04:27 UTC

Attachments
error rhsm.log (141.73 KB, text/x-log)
2011-01-21 09:47 UTC, spandey
strace.out (8.57 KB, text/plain)
2011-02-02 13:02 UTC, spandey

Description Eric Blake 2010-11-17 21:52:53 UTC
Description of problem:
The subscription manager remembers if it failed to connect to the entitlement server, and even if steps are taken to allow that connection, it requires restarting the subscription manager before a new connection attempt can take place.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. take down the vpn connection
2. start subscription-manager-gui
3. try to register a system
4. restore the vpn connection
5. try to register a system
Actual results:
At step 3, the attempt fails with 'network error, unable to connect to server', which is expected.  But at step 5, even though 'ping candlepin.redhat.com' succeeds, the manager still gives the same network error failure.  In other words, it is caching the failed connection attempt and not retrying.

Expected results:
Each attempt to connect should start from scratch, in case I've brought up the vpn in the meantime.

Exit the subscription manager gui, bring up the network, then restart the subscription manager; at that point, the connection attempt will succeed.

Additional info:

Comment 1 Devan Goodwin 2010-11-18 18:40:17 UTC
I think this is ok in the new UI.

Tested by shutting down the tomcat6 server hosting candlepin; attempts to register and such will pop up a network error dialog. I can then bring the server back up and re-attempt (without restarting the RHSM gui), and the operation succeeds. This is likely because of how we manage connections now vs the globals used before.
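The difference between per-attempt connections and a connection held in a global can be sketched roughly as below. This is only an illustration, not the actual subscription-manager code: `try_register`, `NetworkError`, and the injectable `connect` parameter are hypothetical names, and the registration request itself is left out.

```python
import socket


class NetworkError(Exception):
    """Raised when the server cannot be reached (hypothetical name)."""


def try_register(host, port, connect=socket.create_connection, timeout=5.0):
    """Open a new connection for this attempt only.

    Nothing is stored at module level, so a failure in one attempt
    cannot influence the next attempt.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError as exc:
        # socket.gaierror (name lookup failure) is a subclass of OSError.
        raise NetworkError("unable to connect to %s:%s" % (host, port)) from exc
    try:
        # A real client would send the registration request here.
        return True
    finally:
        conn.close()
```

With this shape, clicking "register" a second time after the network comes back simply calls `try_register` again, and the earlier failure leaves no state behind in the client code. It does not, on its own, refresh any state kept inside the C library's resolver.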

Comment 2 spandey 2011-01-21 09:47:14 UTC
I am able to reproduce the issue again.

Prerequisites : 


Steps to verify : 

1) Disable client network
2) Invoke Subscription manager gui 
3) try to register client to candlepin 
4) Enable client network 
5) try to register to candlepin 

Expected Result :  
Client registration process should fail at step 3 
Client should register to candlepin at step 5 

Actual Result : 
Client registration fails at both step 3 and step 5.
The client can register to candlepin only after the subscription manager is restarted.

Comment 3 spandey 2011-01-21 09:47:46 UTC
Created attachment 474605 [details]
error rhsm.log

Comment 4 Chris Duryee 2011-01-27 19:30:54 UTC
I'm not able to reproduce this. Can you do me a favor, and run the following during step #2?

strace -f -s 128 -e network -o /tmp/strace.out subscription-manager-gui

After that, attach strace.out to this bug and I'll take a look at what's going on.

Comment 5 spandey 2011-02-02 13:00:46 UTC
I am able to reproduce this again.

Attached strace.out

Comment 6 spandey 2011-02-02 13:02:10 UTC
Created attachment 476563 [details]
strace.out

Comment 7 Chris Duryee 2011-02-02 19:04:27 UTC
Here is what's going on:

The issue at hand is related to dnsmasq, a dns caching utility that runs on most systems.

When you start up sm-gui, glibc will determine the best network interface to make DNS calls on. In this case, it will be the loopback interface, since that's the only one with a running DNS "server" (dnsmasq). All calls to this will return "host not found", which is correct behavior since you're not connected to anything and the cache is empty.

Once you restart your network, dnsmasq will still return "host not found" for subscriptions.webqa.redhat.com, since it uses a fancy algorithm that doesn't prefer company-specific DNS servers over public servers (see http://www.thekelleys.org.uk/dnsmasq/docs/FAQ for more detail). This usually isn't an issue, since glibc will try eth0 for the DNS query and it will work. However, if the process is still only using the loopback interface, this is an issue and you get the "host not found" error.
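Since restarting the process is what clears the problem, one possible workaround is to do the name lookup in a short-lived child process, which sets up its own resolver state when it starts. The sketch below assumes that approach; it is not something subscription-manager is known to do, and `resolve_in_child` is a hypothetical helper name.

```python
import subprocess
import sys

# Code run by the child: resolve argv[1] and print the IPv4 address.
_RESOLVE_SNIPPET = "import socket, sys; print(socket.gethostbyname(sys.argv[1]))"


def resolve_in_child(hostname, timeout=10.0):
    """Resolve hostname in a fresh Python process.

    Returns the address as a string, or None if the lookup fails or
    takes longer than `timeout` seconds.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _RESOLVE_SNIPPET, hostname],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The child raised (for example socket.gaierror: host not found).
        return None
    return result.stdout.strip() or None
```

A caller could retry `resolve_in_child("subscriptions.webqa.redhat.com")` after the network comes back and then connect to the returned address. The cost is one extra process per lookup, so this only makes sense for infrequent operations such as registration.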

I'm going to mark this as NOTABUG, since it's a design decision with glibc and dnsmasq. Let me know if you disagree and we can look into it more.
