Bug 1263512

Summary:

virt-who can't detect the connect to vcenter was timeout

Product:

Red Hat Enterprise Linux 7

Reporter:

Liushihui <shihliu>

Component:

virt-who

Assignee:

Radek Novacek <rnovacek>

Status:

CLOSED ERRATA

QA Contact:

Eko <hsun>

Severity:

medium

Docs Contact:

Priority:

unspecified

Version:

7.2

CC:

hsun, ldai, ovasik, owwang, sgao, shihliu, wpinheir

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

virt-who-0.17-3.el7

Doc Type:

No Doc Update

Doc Text:

undefined

Story Points:

---

Clone Of:

Environment:

Last Closed:

2016-11-04 05:05:59 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

1291737

Bug Blocks:

Attachments:

Description	Flags
virt-who's log	none

Description Liushihui 2015-09-16 03:29:17 UTC

Description of problem:
Block the connection between vcenter and virt-who when virt-who send host/guest mapping to server every interval,virt-who hasn't shown any error info after 15min

Version-Release number of selected component (if applicable):
python-rhsm-1.15.4-4.el7.x86_64
subscription-manager-1.15.9-7.el7.x86_64
virt-who-0.14-7.el7.noarch
SAM-1.4.1-RHEL-6-20141113.0

How reproducible:
Always

Steps to Reproduce:
1. Register system to SAM
2. Configure virt-who run at esx mode and refresh interval is 5s.
[root@hp-z220-05 ~]# cat /etc/sysconfig/virt-who  | grep -v ^# | grep -v ^$
VIRTWHO_DEBUG=1
VIRTWHO_INTERVAL=5
VIRTWHO_ESX=1
VIRTWHO_ESX_OWNER=ACME_Corporation
VIRTWHO_ESX_ENV=Library
VIRTWHO_ESX_SERVER=10.66.78.2
VIRTWHO_ESX_USERNAME=Administrator
VIRTWHO_ESX_PASSWORD=qwer1234P!
3. Restart virt-who service, virt-who send host/guest mapping to server every 5s
4. In vcenter, Set firewall to block the request from virt-who
5. Check the virt-who's status's and log
[root@hp-z220-05 ~]# ps -ef|grep virt-who
root      5433     1  0 09:56 ?        00:00:00 /usr/bin/python /usr/share/virt-who/virtwho.py
root      5478  5433  0 09:56 ?        00:00:00 /usr/bin/python /usr/share/virt-who/virtwho.py
root      6124  8324  0 10:08 pts/0    00:00:00 grep --color=auto virt-who 

Actual results:
After 15min ,virt-who hasn't shown any error info to remind "Connection timed out", please see the virt-who log in attachment.

Expected results:
virt-who should show error info when failed to connect vcenter

Additional info:
If the connection between vcenter and virt-who has blocked before start virt-who,virt-who can throw error info when start virt-who service.

Comment 1 Liushihui 2015-09-16 03:30:40 UTC

Created attachment 1073887 [details]
virt-who's log

Comment 2 Radek Novacek 2015-09-17 07:54:27 UTC

I don't see any reasonable way how to solve this. We would need some kind of heartbeat mechanism to poll if the connection to ESX is alive. But that would defeat the purpose of listening to ESX events.

virt-who now retries the connection to ESX after 30 minutes of inactivity. I think this interval is quite fine for this kind of unexpected event that shouldn't occur very often.

What do you think?

Comment 3 Liushihui 2015-09-17 08:28:39 UTC

I think it's ok to retries the connection to ESX after 30min. However, I also consider if we can show some remind info to customers about what happened to the connection between virt-who and vcenter. maybe it needn't to run like heartbeat mechanism, it just remind one time. Eg: "Failed to connect with vcenter, we'll try the connection again after 30min". is that acceptable?

Comment 4 Radek Novacek 2015-09-17 08:33:06 UTC

The problem is that virt-who can't detect that the connection is not working any more. From virt-who perspective, the connection is still fine without any communication from ESX. Without some check (heartbeat) there is no way to distinguish between just waiting for event and broken connection, both of them mean no communication.

Comment 5 Liushihui 2015-09-17 09:27:02 UTC

Radek, I understand virt-who can't detect the connection is not working when the refresh interval is too long since virt-who hasn't the heart-beat mechanism to detect it. However,I wonder why virt-who still can't detect that the connection is not working when the refresh interval is 5s. In my opinion, virt-who should get the host/guest mapping info from vcenter every 5s. If it can't get response from vcenter, it should detect the connection is not working. In this situation, it should show some error info. Maybe it needn't to send the error info every 5s, it just need to remind one time.

Comment 6 Radek Novacek 2015-09-22 06:42:48 UTC

I agree that virt-who behavior is not optimal. I'll look into it for next version.

Comment 7 Radek Novacek 2015-12-15 14:32:57 UTC

This issue has been resolved upstream. This will fixed by virt-who rebase.

Comment 8 Radek Novacek 2016-05-17 13:03:46 UTC

Fixed in virt-who-0.17-1.el7.

Comment 10 Liushihui 2016-06-03 00:15:46 UTC

Reopen it on virt-who-0.17-2.el7.noarch since virt-who can't generate any info if it failed to connect vcenter during monitor process.

Checked version:
subscription-manager-1.17.6-1.el7.x86_64
python-rhsm-1.17.2-1.el7.x86_64
python-rhsm-1.17.2-1.el7.x86_64

Checked process:
1. Register system to Satellite6.2
2. Configure virt-who run at esx mode and refresh interval is 60s.
3. Restart virt-who service, virt-who send host/guest mapping to server every 60s
4. In vcenter, Set firewall to block the connection from virt-who's machine
5. Check the virt-who's status's and log
[root@localhost ~]# ps -ef|grep virt-who
root     11254     1  0 03:34 ?        00:00:00 /usr/bin/python2 /usr/bin/virt-who
root     11261 11254  0 03:34 ?        00:00:00 /usr/bin/python2 /usr/bin/virt-who
root     20567 10354  0 03:46 pts/0    00:00:00 grep --color=auto virt-who

Result:
After 1 hour ,virt-who still hasn't generate any info.

Comment 11 Liushihui 2016-06-03 01:46:57 UTC

When it occurred this problem, virt-who still run normally. However, if restore the connection between vcenter and virt-who, virt-who still can't detect the connection(still no any info show on the log), unregister system, it will show error info as the following:
"
2016-06-02 21:41:34,664 [virtwho.main WARNING] MainProcess(11254):MainThread @executor.py:reload:326 - virt-who reload
2016-06-02 21:41:34,696 [virtwho.test-esx1 INFO] Esx-1(11261):MainThread @esx.py:logout:346 - Can't log out from ESX: Server raised fault: 'The session is not authenticated.'"

Comment 12 Radek Novacek 2016-06-09 10:36:49 UTC

Fixed upstream:

https://github.com/virt-who/virt-who/commit/300a7c7dd2aa638e154b896891cc9c093df810ed

Comment 13 Radek Novacek 2016-06-23 14:39:42 UTC

Fixed in virt-who-0.17-3.el7.

Comment 14 Liushihui 2016-07-27 05:59:13 UTC

Verified it on virt-who-0.17-6.el7 since virt-who can detect the disconnect of vcenter and it also can recover the connection after vcenter can reachable. Therefore, verify it.

Verified version:
virt-who-0.17-6.el7.noarch
subscription-manager-1.17.9-1.el7.x86_64 
python-rhsm-1.17.5-1.el7.x86_64

Verified process:
1. Register system to Satellite6.2
2. Configure virt-who run at esx mode and refresh interval is 60s.
3. Restart virt-who service, virt-who send host/guest mapping to server every 60s
4. In vcenter, Set firewall to block the connection from virt-who's machine
5. After 60s, check virt-who's log, virt-who will throw error info in the log
# tail -f /var/log/rhsm/rhsm.log
2016-07-27 01:52:58,899 [virtwho.env_cmdline ERROR] Esx-1(50919):MainThread @virt.py:run:377 - Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/virtwho/virt/virt.py", line 372, in run
    self._run()
  File "/usr/lib/python2.7/site-packages/virtwho/virt/esx/esx.py", line 175, in _run
    options=options)
  File "/usr/lib/python2.7/site-packages/suds/client.py", line 542, in __call__
    return client.invoke(args, kwargs)
  File "/usr/lib/python2.7/site-packages/suds/client.py", line 602, in invoke
    result = self.send(soapenv)
  File "/usr/lib/python2.7/site-packages/suds/client.py", line 641, in send
    reply = transport.send(request)
  File "/usr/lib/python2.7/site-packages/virtwho/virt/esx/esx.py", line 96, in send
    verify=False
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 507, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 433, in send
    raise ReadTimeout(e, request=request)
ReadTimeout: HTTPSConnectionPool(host='10.73.2.95', port=443): Read timed out. (read timeout=64)
2016-07-27 01:52:58,900 [virtwho.env_cmdline INFO] Esx-1(50919):MainThread @virt.py:run:390 - Waiting 60 seconds before retrying backend 'env/cmdline'

6. In vcenter, delete the firewall to make virt-who connect vcenter normally.
7. After 60s, check virt-who's log, virt-who can throw normally info in the log.
# tail -f /var/log/rhsm/rhsm.log
2016-07-27 01:53:58,963 [virtwho.env_cmdline DEBUG] Esx-1(50919):MainThread @esx.py:_prepare:128 - Log into ESX
2016-07-27 01:54:00,581 [virtwho.env_cmdline DEBUG] Esx-1(50919):MainThread @esx.py:_prepare:131 - Creating ESX event filter
2016-07-27 01:54:01,263 [virtwho.env_cmdline DEBUG] Esx-1(50919):MainThread @virt.py:enqueue:357 - Report for config "env/cmdline" gathered, putting to queue for sending
2016-07-27 01:54:01,264 [virtwho.main INFO] MainProcess(50912):MainThread @executor.py:run:250 - Report for config "env/cmdline" hasn't changed, not sending

Comment 17 errata-xmlrpc 2016-11-04 05:05:59 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2387.html