Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 906795

Summary: engine doesn't poll vdsm host status after network error.
Product: [Retired] oVirt
Reporter: Mark Wu <wudxw>
Component: ovirt-engine-core
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED NOTABUG
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.2
CC: acathrow, iheim, jkt, wudxw
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-28 07:18:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  engine.log (flags: none)
  refreshing host - engine log (flags: none)

Description Mark Wu 2013-02-01 13:58:56 UTC
Created attachment 691590 [details]
engine.log

Description of problem:
If the host becomes non-responsive because of a network error, it never gets a chance to come back up. The setup includes just one data center, one cluster, and one host. The problematic host is the SPM, and I can't perform any operation to recover it. I can run 'vdsClient -s 0 getVdsCaps' on the host and log in to the vdsm host from the engine, but I can't capture any network traffic targeting port 54321 on the vdsm host.
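
To separate "the engine sends nothing" from "the network path is still broken", a plain TCP probe of port 54321 can be run from the engine machine. The following is only a minimal sketch: the helper name can_connect and its 3-second default timeout are assumptions made for illustration, and it checks TCP reachability only, not the SSL handshake or any vdsm call.

```python
import socket


def can_connect(host: str, port: int = 54321, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: 54321 is the vdsm port named in this report;
    the timeout value is an arbitrary choice.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling can_connect with the vdsm host's address while a packet capture runs on the host should show at least the TCP handshake if the path between the two machines is open.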

Version-Release number of selected component (if applicable):
ovirt-engine-3.2.0-2.fc18

How reproducible:
Sometimes.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Mark Wu 2013-02-04 01:04:58 UTC
*** Bug 906796 has been marked as a duplicate of this bug. ***

Comment 2 Yaniv Bronhaim 2013-02-04 12:12:38 UTC
Your host doesn't respond to refreshVdsRunTimeInfo requests due to the network errors. This might be because you use a secure connection on the host. Try turning SSL off and check again:

vdsm.conf: ssl=False
libvirtd.conf: listen_tcp=1, auth_tcp="none"
qemu.conf: spice_tls=0

The engine keeps sending getCaps to the host until a response is returned. Until then, the UI shows the host as non-responsive.
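
The retry behaviour described here can be pictured with a generic polling loop. This is an illustrative sketch only and not the engine's actual code: the function poll_until_responsive, its parameters, and the 3-second interval are all assumptions, and probe stands in for whatever request (for example getCaps) is used to test the host.

```python
import time
from typing import Callable


def poll_until_responsive(
    probe: Callable[[], bool],
    interval_s: float = 3.0,
    max_attempts: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() repeatedly until it returns True.

    Returns the number of attempts made. max_attempts=0 means retry
    forever; otherwise TimeoutError is raised once the limit is reached.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            if probe():
                return attempts
        except OSError:
            # A network error counts as "still non-responsive"; keep polling.
            pass
        if max_attempts and attempts >= max_attempts:
            raise TimeoutError(f"no response after {attempts} attempts")
        sleep(interval_s)
```

With a loop like this, every attempt produces outgoing traffic, so a packet capture on the host side would be expected to show something once the network path recovers.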

Comment 3 Mark Wu 2013-02-05 08:52:26 UTC
Yes, but shouldn't the engine keep polling the status? The problem is that network connectivity came back, but the host was still non-responsive, and I couldn't see any packets sent to the vdsm host. I did use the default secure network connection.
In any case, I should be able to see the packet headers if a request was sent to the host, right?
So why doesn't the engine send the refreshVdsRunTimeInfo request to the host?

I am sorry, but the test environment has already been destroyed because I couldn't do anything in the admin UI to recover it.

Comment 4 Yaniv Bronhaim 2013-02-18 10:05:48 UTC
Created attachment 698802 [details]
refreshing host - engine log

I added a host via the webadmin, blocked all incoming packets from the host's address, and waited until the host became non-operational.
Then I disabled the firewall and checked that the connection between the two worked. After a minute or so the host came up again.
I attached the log with my comments in it; let me know if you did something different.

From briefly reading your log, it seems you had another error with your host's installation that isn't related to this issue. I couldn't find the network exception in it.

Please check the log you attached and let me know if the scenario you describe is part of it.

Comment 5 Yaniv Bronhaim 2013-02-18 10:33:29 UTC
Sorry, I checked a different log. In your log I can see that the network errors started at 2013-02-01 17:02:56,547, that since 2013-02-01 17:03:56,630 the engine has been trying to get storage pool info, and that when you added a host again at 2013-02-01 18:14:39,695 you still had network problems. Please try to reproduce it and tell me what you find.
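
For reference, the gaps between the three timestamps quoted above can be worked out with a few lines of Python. This is just a convenience sketch: the variable names are made up for illustration, and the format string assumes the "YYYY-MM-DD HH:MM:SS,mmm" layout exactly as quoted in this comment.

```python
from datetime import datetime

# Timestamp layout as quoted in this comment: "YYYY-MM-DD HH:MM:SS,mmm".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text: str) -> datetime:
    """Parse one timestamp in the layout above."""
    return datetime.strptime(text, LOG_TS_FORMAT)


first_network_error = parse_ts("2013-02-01 17:02:56,547")
storage_pool_polling = parse_ts("2013-02-01 17:03:56,630")
host_added_again = parse_ts("2013-02-01 18:14:39,695")

# Time from the first network error to the storage pool requests.
error_to_polling = storage_pool_polling - first_network_error
# Time from the storage pool requests to the host being added again.
polling_to_readd = host_added_again - storage_pool_polling

print(error_to_polling)
print(polling_to_readd)
```

The first gap is roughly one minute and the second is a little over 70 minutes, so the network problems were still present more than an hour after the first error.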

Comment 6 Yaniv Bronhaim 2013-08-27 17:57:49 UTC
Is this bug still relevant? Was it an iptables problem or some other configuration issue? It's been a while since it was raised, so please update so we can move forward with it.

Thanks.

Comment 7 Mark Wu 2013-08-28 07:18:33 UTC
Yaniv,
I can't reproduce this problem with my relatively new setup, so I am going to close this bug.

Thanks.