Bug 906795
| Summary: | engine doesn't poll vdsm host status after network error. | | |
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Mark Wu <wudxw> |
| Component: | ovirt-engine-core | Assignee: | Yaniv Bronhaim <ybronhei> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.2 | CC: | acathrow, iheim, jkt, wudxw |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-08-28 07:18:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

*** Bug 906796 has been marked as a duplicate of this bug. ***

Your host doesn't respond to refreshVdsRunTimeInfo requests because of the network errors. This might be because you use a secure connection on the host. Try turning SSL off and check again:

vdsm.conf: ssl=False
libvirtd.conf: listen_tcp=1, auth_tcp="none"
qemu.conf: spice_tls=0

The engine keeps sending getCaps to the host until a response is returned. Until then, the UI shows the host as non-responsive.

Yes, but should the engine keep polling the status? The problem is that network connectivity comes back, but the host stays non-responsive, and I can't see any packets being sent to the vdsm host. I did use the default secure network connection. But even so, shouldn't I be able to see the packet headers if a request was sent to the host? So why does the engine not send the refreshVdsRunTimeInfo request to the host? I am sorry that the test environment has already been destroyed, because I couldn't do anything in the admin UI to recover it.

Created attachment 698802 [details]
refreshing host - engine log
I added a host via the webadmin, blocked all incoming packets from the host's address and waited until the host became non-operational.
Then I disabled the firewall and checked that the connection between the two worked. After a minute or so the host came up again.
I attached the log with my comments in it; let me know if you did something different.
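
The behaviour described in this thread (keep probing the host, show it as non-responsive while probes fail, mark it up again once a probe succeeds) can be illustrated with a small simulation. This is only a sketch and not the engine's actual code: `poll_host`, `make_flaky_probe` and the status strings are hypothetical names chosen for the example, and the "outage" is simulated rather than a real firewall rule.

```python
from typing import Callable, List


def poll_host(probe: Callable[[], bool], attempts: int) -> List[str]:
    """Call probe() `attempts` times and record the status after each call.

    A failed probe marks the host "NonResponsive"; a later successful probe
    marks it "Up" again, so an earlier failure never stops the polling.
    """
    statuses = []
    for _ in range(attempts):
        try:
            ok = probe()
        except OSError:
            # A network error counts as a failed probe; polling continues.
            ok = False
        statuses.append("Up" if ok else "NonResponsive")
    return statuses


def make_flaky_probe(outage: range) -> Callable[[], bool]:
    """Hypothetical probe that raises OSError on the call indices in `outage`."""
    calls = {"n": 0}

    def probe() -> bool:
        index = calls["n"]
        calls["n"] += 1
        if index in outage:
            raise OSError("network unreachable")
        return True

    return probe


# Simulate a host that is unreachable for the second and third probes.
history = poll_host(make_flaky_probe(range(1, 3)), attempts=5)
print(history)
```

In this simulation the host returns to "Up" on the first probe after the outage ends, which is the recovery the reproduction above observed once the firewall was disabled.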
From a brief read of your log, it seems you had another error with your host's installation that isn't related to this issue; I couldn't find the network exception in it.
Please check the log you attached and let me know if the scenario you describe is part of it.

Sorry, I checked a different log. In your log I can see that the network errors started at 2013-02-01 17:02:56,547, and that since 2013-02-01 17:03:56,630 the engine has been trying to get storage pool info. When you added a host again at 2013-02-01 18:14:39,695, you still had network problems. Please try to reproduce it and tell me what you find.

Is this bug still relevant? Was it an iptables problem or some other configuration issue? It's been a while since it was raised, so please update it so that we can move forward. Thanks.

Yaniv, I can't reproduce this problem with my relatively new setup, so I am going to close this bug. Thanks.
Created attachment 691590 [details]
engine.log

Description of problem:
If the host becomes non-responsive because of a network error, it never gets a chance to come back up. The setup includes just one data center, one cluster and one host. The problematic host is the SPM. I can't perform any operation to recover it. I can run 'vdsClient -s 0 getVdsCaps' on the host and log in to the vdsm host from the engine, but I can't capture any network traffic targeting the vdsm host's port 54321.

Version-Release number of selected component (if applicable):
ovirt-engine-3.2.0-2.fc18

How reproducible:
Sometimes.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
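
As a complement to capturing traffic, a plain TCP connection attempt from the engine machine to port 54321 on the host (the port named in the description) can help separate "the network path is blocked" from "the engine is not sending requests". The sketch below is an illustration only: `port_reachable` is a hypothetical helper, the host name in the usage note is a placeholder, and a successful TCP connect says nothing about whether the SSL handshake or the vdsm API call would succeed.

```python
import socket


def port_reachable(host: str, port: int = 54321, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, `port_reachable("vdsm-host.example.com")` run on the engine machine would return False while a firewall drops the packets and True once the path is open again.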