Bug 1261255 - Failed to run "virt-who -o -d --vdsm " as "500 Internal Server Error"
Summary: Failed to run "virt-who -o -d --vdsm " as "500 Internal Server Error"
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: ---
Assignee: Piotr Kliczewski
QA Contact: Eko
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-09-09 05:17 UTC by Liushihui
Modified: 2022-06-30 13:02 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-10 12:06:45 UTC
oVirt Team: Infra
Embargoed:
sbonazzo: ovirt-4.1-


Attachments (Terms of Use)
vdsm.log file (49.70 KB, text/plain)
2015-09-10 09:04 UTC, Radek Novacek
no flags Details
vdsm_libvirt.log (113.62 KB, application/x-gzip)
2015-09-17 08:56 UTC, Liushihui
no flags Details
new_vdsm_libvirt_log (505.85 KB, application/x-gzip)
2015-09-17 09:21 UTC, Liushihui
no flags Details
vdsm-4.17.9-1.log (931.55 KB, text/plain)
2015-10-22 09:16 UTC, Liushihui
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-46847 0 None None None 2022-06-30 13:02:16 UTC
oVirt gerrit 47637 0 'None' 'ABANDONED' 'xmlrpc: retry when flushing socket' 2019-12-02 10:08:26 UTC

Description Liushihui 2015-09-09 05:17:11 UTC
Description of problem:
Sometimes failed to run "virt-who -o -d --vdsm" as "ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 Internal Server Error>"

Version-Release number of selected component (if applicable):
virt-who-0.14-7.el7.noarch
subscription-manager-1.15.9-7.el7.x86_64
python-rhsm-1.15.4-4.el7.x86_64

How reproducible:
50%

Steps to Reproduce:
1.Register system to SAM 
2.Run "virt-who -o -d --vdsm" two or three times
[root@hp-z220-05 ~]# virt-who -o -d --vdsm 
2015-09-09 13:10:37,243 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-09-09 13:10:37,342 DEBUG: Authenticating with certificate: /etc/pki/consumer/cert.pem
2015-09-09 13:10:37,364 INFO: Sending domain info: [
    {
        "guestId": "4f1f93b8-c4c5-424d-89d8-1822367d2274", 
        "state": 1, 
        "attributes": {
            "active": 1, 
            "virtWhoType": "vdsm", 
            "hypervisorType": "qemu"
        }
    }
]
2015-09-09 13:10:37,980 INFO: virt-who guest list update successful
2015-09-09 13:10:37,981 DEBUG: Association for config env/cmdline gathered
2015-09-09 13:10:37,981 DEBUG: virt-who shut down started
[root@hp-z220-05 ~]# virt-who -o -d --vdsm 
2015-09-09 13:10:40,280 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-09-09 13:10:40,331 ERROR: Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/share/virt-who/virt/virt.py", line 301, in run
    self._run()
  File "/usr/share/virt-who/virt/virt.py", line 332, in _run
    report = self._get_report()
  File "/usr/share/virt-who/virt/virt.py", line 276, in _get_report
    return DomainListReport(self.config, self.listDomains())
  File "/usr/share/virt-who/virt/vdsm/vdsm.py", line 123, in listDomains
    response = self.server.list(True)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.py", line 60, in request
    headers
ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 Internal Server Error>

Actual results:
When run "virt-who -o -d --vdsm" at the second or third time, it will failed to send h/g mapping as "ProtocolError"

Expected results:


Additional info:

Comment 1 Liushihui 2015-09-09 05:28:06 UTC
When it occur this problem, virt-who and vdsmd service work normally.

[root@hp-z220-05 ~]# systemctl status virt-who
● virt-who.service - Daemon for reporting virtual guest IDs to subscription-manager
   Loaded: loaded (/usr/lib/systemd/system/virt-who.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2015-09-09 13:25:25 CST; 13s ago
 Main PID: 22969 (python)
   CGroup: /system.slice/virt-who.service
           ├─22969 /usr/bin/python /usr/share/virt-who/virtwho.py
           └─22976 /usr/bin/python /usr/share/virt-who/virtwho.py

Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: return self.__send(self.__name, args)
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: verbose=self.__verbose
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.p...uest
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: headers
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 I...ror>
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com python[22976]: Virt backend 'env/cmdline' fails with exception:
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com python[22976]: Waiting 3600 seconds before retrying backend 'env/cmdline'
Sep 09 13:25:25 hp-z220-05.qe.lab.eng.nay.redhat.com virt-who[22969]: 2015-09-09 13:25:25,296 INFO: Waiting 3600 seconds before retry...ine'
Hint: Some lines were ellipsized, use -l to show in full.



[root@hp-z220-05 ~]# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2015-09-09 11:20:14 CST; 2h 5min ago
  Process: 6030 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
  Process: 6033 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
 Main PID: 6105 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─ 6105 /usr/bin/python /usr/share/vdsm/vdsm
           ├─ 6252 /usr/libexec/ioprocess --read-pipe-fd 73 --write-pipe-fd 71 --max-threads 10 --max-queued-requests 10
           ├─ 6270 /usr/libexec/ioprocess --read-pipe-fd 85 --write-pipe-fd 81 --max-threads 10 --max-queued-requests 10
           ├─ 6281 /usr/libexec/ioprocess --read-pipe-fd 91 --write-pipe-fd 89 --max-threads 10 --max-queued-requests 10
           ├─23004 /usr/libexec/ioprocess --read-pipe-fd 47 --write-pipe-fd 46 --max-threads 10 --max-queued-requests 10
           ├─23011 /usr/libexec/ioprocess --read-pipe-fd 57 --write-pipe-fd 54 --max-threads 10 --max-queued-requests 10
           └─23018 /usr/libexec/ioprocess --read-pipe-fd 65 --write-pipe-fd 64 --max-threads 10 --max-queued-requests 10

Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com python[6105]: DIGEST-MD5 make_client_response()
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com python[6105]: DIGEST-MD5 client step 2
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com python[6105]: DIGEST-MD5 parse_server_challenge()
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com python[6105]: DIGEST-MD5 ask_user_info()
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com python[6105]: DIGEST-MD5 make_client_response()
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com python[6105]: DIGEST-MD5 client step 3
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com vdsm[6105]: vdsm vm.Vm WARNING vmId=`4f1f93b8-c4c5-424d-89d8-1822367d2274`::Unk...found
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com vdsm[6105]: vdsm vm.Vm WARNING vmId=`4f1f93b8-c4c5-424d-89d8-1822367d2274`::Unk...found
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com vdsm[6105]: vdsm vm.Vm WARNING vmId=`4f1f93b8-c4c5-424d-89d8-1822367d2274`::Unk...found
Sep 09 11:32:22 hp-z220-05.qe.lab.eng.nay.redhat.com vdsm[6105]: vdsm vm.Vm ERROR vmId=`4f1f93b8-c4c5-424d-89d8-1822367d2274`::Alias... host
Hint: Some lines were ellipsized, use -l to show in full.

Comment 2 Radek Novacek 2015-09-10 09:04:13 UTC
vdsm daemon is returning this "ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 Internal Server Error>" error message. I don't know what is the reason for this message.

I'll reassign this bug to vdsm for now. Can someone tell me what could be the cause of this issue? It happens when virt-who tries to connect to vdsm xmlrpc API multiple times in short time period.

vdsm.log file is attached, what other information do you need?

vdsm-4.16.21-1.el7ev.x86_64

Comment 3 Radek Novacek 2015-09-10 09:04:43 UTC
Created attachment 1072063 [details]
vdsm.log file

Comment 4 Piotr Kliczewski 2015-09-16 11:27:44 UTC
Xmlrpc connections are not related and expected behavior. I can see in the log following entries:

VM Channels Listener::DEBUG::2015-09-10 17:01:42,845::vmchannels::133::vds::(_handle_unconnected) Trying to connect fileno 63.
VM Channels Listener::DEBUG::2015-09-10 17:01:42,846::guestagent::201::vm.Vm::(_connect) vmId=`4f1f93b8-c4c5-424d-89d8-1822367d2274`::Attempting connection to /var/lib/libvirt/qemu/channels/4f1f93b8-c4c5-424d-89d8-1822367d2274.com.redhat.rhevm.vdsm
VM Channels Listener::DEBUG::2015-09-10 17:01:42,846::guestagent::215::vm.Vm::(_connect) vmId=`4f1f93b8-c4c5-424d-89d8-1822367d2274`::Failed to connect to /var/lib/libvirt/qemu/channels/4f1f93b8-c4c5-424d-89d8-1822367d2274.com.redhat.rhevm.vdsm with 11

Please attach libvirtd log when you enable libvirt debug log level as described in http://wiki.libvirt.org/page/DebugLogs

Comment 5 Radek Novacek 2015-09-17 08:21:25 UTC
Liushihui,

do you have the system still deployed? Can you provide requested infromation?

Thanks.

Comment 6 Liushihui 2015-09-17 08:56:32 UTC
Please see the libvirt and vdsm log in the attachment(vdsm_libvirt.log)

Comment 7 Liushihui 2015-09-17 08:56:58 UTC
Created attachment 1074361 [details]
vdsm_libvirt.log

Comment 8 Piotr Kliczewski 2015-09-17 09:04:31 UTC
I do not see the issue to connect to libvirt in provided vdsm.log. This time I found:

Thread-94::ERROR::2015-09-17 16:49:02,470::dispatcher::79::Storage.Dispatcher::(wrapper) Secured object is not in safe state
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 103, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1179, in prepare
    raise self.error
SecureError: Secured object is not in safe state
Thread-94::DEBUG::2015-09-17 16:49:02,471::stompReactor::163::yajsonrpc.StompServer::(send) Sending response

Was the same scenario tested?

Comment 9 Liushihui 2015-09-17 09:20:45 UTC
Yes, it's the same scenario and the error info in the terminal side is the same as before. It is ocurred at the following two conditions. I have attached the latest vdsm.log and libvirt.log ,please check it again. Thanks.
1. run "virt-who -o -d --vdsm" at the second time 
2. there is a guest on host.
[root@hp-z220-05 vdsm]# virt-who -o -d --vdsm
2015-09-17 17:11:59,939 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-09-17 17:11:59,989 ERROR: Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/share/virt-who/virt/virt.py", line 301, in run
    self._run()
  File "/usr/share/virt-who/virt/virt.py", line 332, in _run
    report = self._get_report()
  File "/usr/share/virt-who/virt/virt.py", line 276, in _get_report
    return DomainListReport(self.config, self.listDomains())
  File "/usr/share/virt-who/virt/vdsm/vdsm.py", line 123, in listDomains
    response = self.server.list(True)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.py", line 60, in request
    headers
ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 Internal Server Error>
2015-09-17 17:12:00,992 DEBUG: virt-who shut down started

Comment 10 Liushihui 2015-09-17 09:21:55 UTC
Created attachment 1074371 [details]
new_vdsm_libvirt_log

Comment 11 Francesco Romani 2015-09-17 10:35:03 UTC
(In reply to Liushihui from comment #10)
> Created attachment 1074371 [details]
> new_vdsm_libvirt_log

From this log we can see that VDSM was stopped:

MainThread::DEBUG::2015-09-17 17:09:50,016::vdsm::58::vds::(sigtermHandler) Received signal 15

The reason of failures can be that VDSM is restarting when virt-who is querying.

We need to understand why and when VDSM is stopped.
It is unlikely that virt-who triggers that.

Comment 12 Michal Skrivanek 2015-09-21 08:52:05 UTC
(In reply to Francesco Romani from comment #11)
> (In reply to Liushihui from comment #10)
> > Created attachment 1074371 [details]
> > new_vdsm_libvirt_log
> 
> From this log we can see that VDSM was stopped:
> 
> MainThread::DEBUG::2015-09-17 17:09:50,016::vdsm::58::vds::(sigtermHandler)
> Received signal 15
> 
> The reason of failures can be that VDSM is restarting when virt-who is
> querying.
> 
> We need to understand why and when VDSM is stopped.
> It is unlikely that virt-who triggers that.

it seems unrelated. Before 17:11:59 it was fully back up again
I see a lot of connections on vdsm side, e.g.:
Detector thread::DEBUG::2015-09-17 17:11:56,988::BindingXMLRPC::1173::XmlDetector::(handleSocket) xml over http detected from ('10.66.144.3', 35423)
Detector thread::DEBUG::2015-09-17 17:11:57,035::protocoldetector::187::vds.MultiProtocolAcceptor::(_add_connection) Adding connection from 10.66.144.3:35425
Detector thread::DEBUG::2015-09-17 17:11:57,039::protocoldetector::201::vds.MultiProtocolAcceptor::(_remove_connection) Connection removed from 10.66.144.3:35425

without any more detail...then 17:12:00 that failure in virt-who and only afterwards there are multiple responses:
 clientIFinit::INFO::2015-09-17 17:12:14,456::logUtils::44::dispatcher::(wrapper) Run and protect: getConnectedStoragePoolsList(options=None)
clientIFinit::INFO::2015-09-17 17:12:14,456::logUtils::47::dispatcher::(wrapper) Run and protect: getConnectedStoragePoolsList, Return response: {'poollist': []}

is it what virt-who is asking for? why so often?

Comment 13 Radek Novacek 2015-09-22 06:12:26 UTC
virt-who is asking for list of guests that are running on the vdsm. The list is then reported to Subscription Manager (or Satellite). It uses `list` xmlrpc method.

It doesn't ask that often in real-world scenarios. But we tried to run it a couple of times as often as possible to see if everything works.

Comment 14 Michal Skrivanek 2015-09-22 07:28:11 UTC
adding back Piotr
I guess we need to understand if those queries are even coming from virt-who.
And either way why they cause internal err
Piotr, can you explain those weird messages, can we figure out what APIs are they?

Comment 15 Piotr Kliczewski 2015-09-22 09:12:17 UTC
From what I see in the logs there are bunch of call to Host.getConnectedStoragePools verb. It seems to be invoked every 5 seconds.

Comment 16 Michal Skrivanek 2015-09-22 14:31:07 UTC
(In reply to Piotr Kliczewski from comment #15)
> From what I see in the logs there are bunch of call to
> Host.getConnectedStoragePools verb. It seems to be invoked every 5 seconds.

so that should again be unrelated to the original problem.
So I wonder how to reproduce this properly...

Piotr, I was more wondering about the burst of connections at 17:11:57...I don't see what kind of API is that

Comment 17 Piotr Kliczewski 2015-09-22 14:38:51 UTC
I see the connections that you have in mind. There was no api invocation just incoming connections accepted from 10.66.144.3. Do we know what caused those connections?

Comment 18 Michal Skrivanek 2015-09-22 14:43:19 UTC
can you describe the nature of those "we tried to run it a couple of times as often as possible "
is that built-in functionality of virt-who? Or you mean you just run it manually?

Comment 19 Radek Novacek 2015-09-22 14:54:00 UTC
Michal, sorry that I was not specific enough. I meant that we ran virt-who manually from command line a couple of times to check if everything works.

It is not some specific virt-who feature. By default, virt-who should call vdsm each 60 seconds but this can be lowered by user (down to one second). In that case I think users might hit this issue.

Liushihui, can you please try if this can be reproduced with virt-who interval set to one second. If not, I think we can close this bug.

Comment 20 Liushihui 2015-09-23 03:25:27 UTC
When set virt-who's refresh interval to 1, virt-who won't generate this error info. virt-who work normally. However, it still show this error when run "virt-who -o -d --vdsm" at the second time after 5min. please see the log as the following:
[root@hp-z220-05 ~]# virt-who -o -d --vdsm
2015-09-23 10:58:54,647 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-09-23 10:58:54,744 DEBUG: Authenticating with certificate: /etc/pki/consumer/cert.pem
2015-09-23 10:58:54,777 INFO: Sending domain info: [
    {
        "guestId": "290e063b-d7d7-4e89-a7cf-4bc59f04f164", 
        "state": 1, 
        "attributes": {
            "active": 1, 
            "virtWhoType": "vdsm", 
            "hypervisorType": "qemu"
        }
    }
]
2015-09-23 10:58:55,456 INFO: virt-who guest list update successful
2015-09-23 10:58:55,457 DEBUG: Association for config env/cmdline gathered
2015-09-23 10:58:55,457 DEBUG: virt-who shut down started

[root@hp-z220-05 ~]# virt-who -o -d --vdsm
2015-09-23 11:05:23,758 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-09-23 11:05:23,814 ERROR: Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/share/virt-who/virt/virt.py", line 301, in run
    self._run()
  File "/usr/share/virt-who/virt/virt.py", line 332, in _run
    report = self._get_report()
  File "/usr/share/virt-who/virt/virt.py", line 276, in _get_report
    return DomainListReport(self.config, self.listDomains())
  File "/usr/share/virt-who/virt/vdsm/vdsm.py", line 123, in listDomains
    response = self.server.list(True)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.py", line 60, in request
    headers
ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 Internal Server Error>


(In reply to Radek Novacek from comment #19)
> Michal, sorry that I was not specific enough. I meant that we ran virt-who
> manually from command line a couple of times to check if everything works.
> 
> It is not some specific virt-who feature. By default, virt-who should call
> vdsm each 60 seconds but this can be lowered by user (down to one second).
> In that case I think users might hit this issue.
> 
> Liushihui, can you please try if this can be reproduced with virt-who
> interval set to one second. If not, I think we can close this bug.

Comment 21 Piotr Kliczewski 2015-10-14 10:13:58 UTC
I tried to reproduce it with vdsm 4.16.26.1.el6ev and with virt-who-0.12-10.el6 and I was not able to reproduce the issue. Please update your vdsm to latest 4.16 and provide more information how to reproduce the issue.

Comment 22 Radek Novacek 2015-10-15 05:33:08 UTC
Liushihui,

can you please retest it with vdsm 4.16? Thank you.

Comment 23 Liushihui 2015-10-16 08:17:43 UTC
Sorry, I can't add rhel7.2 (with vdsm 4.16.26.1.el6ev) to rhevm3.5.5 as bug 1264667. It reproduced with vdsm-4.17.9-1.el7ev.noarch + virt-who-0.14-9.el7.noarch. it occurred the same problem as before.
[root@hp-z220-06 ~]# virt-who -o -d --vdsm
2015-10-16 16:10:35,270 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-10-16 16:10:35,322 ERROR: Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/share/virt-who/virt/virt.py", line 301, in run
    self._run()
  File "/usr/share/virt-who/virt/virt.py", line 332, in _run
    report = self._get_report()
  File "/usr/share/virt-who/virt/virt.py", line 276, in _get_report
    return DomainListReport(self.config, self.listDomains())
  File "/usr/share/virt-who/virt/vdsm/vdsm.py", line 123, in listDomains
    response = self.server.list(True)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.py", line 60, in request
    headers
ProtocolError: <ProtocolError for 10.66.144.4:54321/RPC2: 500 Internal Server Error>

Comment 24 Piotr Kliczewski 2015-10-21 10:24:52 UTC
I tested in on el 7.2 using vdsm-vdsm-4.17.9-8 and virt-who 0.15-1. Vdsm was built from the latest 3.6 and virt-who was downloaded from here [1]. I run the
command and got following output:

[root@localhost vdsm]# virt-who -o -d --vdsm
2015-10-21 06:18:46,921 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-10-21 06:18:47,040 DEBUG: Authenticating with certificate: /etc/pki/consumer/cert.pem
2015-10-21 06:18:48,411 INFO: Sending domain info: []
2015-10-21 06:18:49,409 INFO: virt-who guest list update successful
2015-10-21 06:18:49,410 DEBUG: Association for config env/cmdline gathered
2015-10-21 06:18:49,410 DEBUG: virt-who shut down started

I found no issues in vdsm.log. Please provide more information how to reproduce the issue.


[1] https://brewweb.devel.redhat.com/buildinfo?buildID=450671

Comment 25 Liushihui 2015-10-22 02:01:26 UTC
There are two key points on this bug:
1. Make sure guest is running on the host 
2. Run "virt-who -o -d --vdsm" over one time, it usually reproduce this problem when you run this command at the second/third time.please see the detail as the following:

[root@hp-z220-05 ~]# virt-who -o -d --vdsm
2015-10-22 09:57:04,102 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-10-22 09:57:04,202 DEBUG: Authenticating with certificate: /etc/pki/consumer/cert.pem
2015-10-22 09:57:04,234 INFO: Sending domain info: [
    {
        "guestId": "a92a5f0d-dd88-4b85-9edd-8d19cb4fbc44", 
        "state": 1, 
        "attributes": {
            "active": 1, 
            "virtWhoType": "vdsm", 
            "hypervisorType": "qemu"
        }
    }
]
2015-10-22 09:57:04,833 INFO: virt-who guest list update successful
2015-10-22 09:57:04,833 DEBUG: Association for config env/cmdline gathered
2015-10-22 09:57:04,834 DEBUG: virt-who shut down started
[root@hp-z220-05 ~]# virt-who -o -d --vdsm
2015-10-22 09:57:07,094 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-10-22 09:57:07,144 ERROR: Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/share/virt-who/virt/virt.py", line 301, in run
    self._run()
  File "/usr/share/virt-who/virt/virt.py", line 332, in _run
    report = self._get_report()
  File "/usr/share/virt-who/virt/virt.py", line 276, in _get_report
    return DomainListReport(self.config, self.listDomains())
  File "/usr/share/virt-who/virt/vdsm/vdsm.py", line 123, in listDomains
    response = self.server.list(True)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.py", line 60, in request
    headers
ProtocolError: <ProtocolError for 10.66.144.3:54321/RPC2: 500 Internal Server Error>
2015-10-22 09:57:08,147 DEBUG: virt-who shut down started

Comment 26 Piotr Kliczewski 2015-10-22 08:19:50 UTC
I used my previously configure env to run the command 20 times. I did not experience any failures. Please provide vdsm log after you reproduce the issue.

Comment 27 Liushihui 2015-10-22 09:16:41 UTC
Created attachment 1085459 [details]
vdsm-4.17.9-1.log

Comment 28 Liushihui 2015-10-22 09:18:08 UTC
It should be easily to reproduce it if there is at least one guest is running in the host when you run "virt-who -o -d --vdsm" at the second time.

Comment 29 Piotr Kliczewski 2015-10-22 10:18:56 UTC
I started one client and run the command 20 times but still the code haven't failed. 

From your recent log I see that we have following issue:

Thread-75456::ERROR::2015-10-22 16:40:27,879::xmlrpc::89::vds.XMLRPCServer::(_process_requests) Unhandled exception in request handler for 10.66.144.8:43211
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/xmlrpc.py", line 86, in _process_requests
    self.requestHandler(sock, addr, self)
  File "/usr/lib64/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/share/vdsm/rpc/bindingxmlrpc.py", line 291, in finish
    xmlrpc.IPXMLRPCRequestHandler.finish(self)
  File "/usr/lib64/python2.7/SocketServer.py", line 705, in finish
    self.wfile.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 222, in write
    return self._write_bio(data)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 205, in _write_bio
    return m2.ssl_write(self.ssl, data, self._timeout)
SSLError: bad write retry

Comment 30 Piotr Kliczewski 2015-10-22 13:35:25 UTC
I am not able to reproduce and I need an access to your environment to test my fix. Please send it offline.

Comment 31 Liushihui 2015-10-23 03:08:21 UTC
After apply the path in https://gerrit.ovirt.org/#/c/47637/, it will show error info when run "virt-who -o -d --vdsm" at the first time:
[root@hp-z220-10 rpc]# virt-who -o -d --vdsm
2015-10-23 11:05:59,134 INFO: Using configuration "env/cmdline" ("vdsm" mode)
2015-10-23 11:05:59,186 ERROR: Virt backend 'env/cmdline' fails with exception:
Traceback (most recent call last):
  File "/usr/share/virt-who/virt/virt.py", line 301, in run
    self._run()
  File "/usr/share/virt-who/virt/virt.py", line 332, in _run
    report = self._get_report()
  File "/usr/share/virt-who/virt/virt.py", line 276, in _get_report
    return DomainListReport(self.config, self.listDomains())
  File "/usr/share/virt-who/virt/vdsm/vdsm.py", line 123, in listDomains
    response = self.server.list(True)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/site-packages/M2Crypto/m2xmlrpclib.py", line 60, in request
    headers
ProtocolError: <ProtocolError for 10.66.144.8:54321/RPC2: -1 ''>
2015-10-23 11:06:00,189 DEBUG: virt-who shut down started

you can use this machine to debug it:
10.66.144.8 root/red2015

Comment 32 Piotr Kliczewski 2015-10-23 08:07:48 UTC
I made small mistake to base my change on ssl patch which is not merged yet. As I result I had from import but after fixing it I run the command 20 times without any issues. Will update the patch and push it for review.

Comment 33 Mike McCune 2016-03-28 23:19:36 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 34 Sandro Bonazzola 2016-05-02 10:08:34 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has been already released and bug is not ON_QA.

Comment 35 Yaniv Lavi 2016-05-23 13:23:35 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 36 Yaniv Lavi 2016-05-23 13:25:33 UTC
oVirt 4.0 beta has been released, moving to RC milestone.

Comment 37 Yaniv Kaul 2016-07-06 07:04:24 UTC
We are missing two important, general issues here:
1. virt-who is using the VDSM API. There is no public VDSM API. This sounds risky to me in general.
2. virt-who is using the XML-RPC API, which we wish to deprecate in favor of the JSON API.

Action items:
1. We need to somehow document in *VDSM* code the calls that are being used by virt-who - to ensure no one thinks of removing them without looking at virt-who code to ensure we don't break it!
2. Need to file a bug on virt-who to move to the JSON API.

Comment 38 Piotr Kliczewski 2016-07-06 07:22:41 UTC
New BZ #1353119 created. I will make sure that api which is used by virt-who will be kept available for some time.

Comment 40 Piotr Kliczewski 2017-01-10 12:06:45 UTC
I was unable to reproduce the issue in other envs than the issue occurred. We have already declared xmlrpc deprecated so once virt-who would migrate to newer protocol the problem would go away. Due to above reasons closing this bug.


Note You need to log in before you can comment on or make changes to this bug.