Bug 709788

Summary: [vdsm] [libvirt] 'vdsm' loses connection to 'libvirtd' and continues with migration
Product: Red Hat Enterprise Linux 6
Component: vdsm
Version: 6.1
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Dafna Ron <dron>
Assignee: Erez Shinan <erez>
QA Contact: Dafna Ron <dron>
CC: abaron, bazulay, danken, hateya, iheim, ilvovsky, jlibosva, mgoldboi, Rhev-m-bugs, syeghiay, ykaul
Target Milestone: rc
Fixed In Version: vdsm-4.9-92.el6
Doc Type: Bug Fix
Clone Of: 622446
Bug Depends On: 622446
Last Closed: 2011-12-06 07:20:15 UTC
Attachments: logs (flags: none)

Comment 2 Erez Shinan 2011-06-09 12:53:08 UTC
Couldn't reproduce.
What exactly is the problem, what do you expect to happen, and how does this bug occur in normal usage (that is, without arbitrarily killing vital processes)?

Comment 3 Dafna Ron 2011-06-12 07:26:14 UTC
Please see Haim's steps to reproduce:

How to reproduce:

1) Make sure you have 2 hosts ('host-a' and 'host-b').
2) Make sure you have connected storage (I use iSCSI).
3) Run the following commands on 'host-b':
   - kill -STOP `pgrep libvirt`
   - service libvirtd restart
4) Create a new domain (machine) on 'host-a' and make sure it runs.
5) Migrate this VM from 'host-a' to 'host-b'.

Result:

The migration fails on the source; however, it succeeds in libvirt.



We expect that if vdsm fails the migration, libvirt fails it as well. This matters especially for HA VMs, which have to be started automatically once their process dies: if libvirt is alive while vdsm is dead, the VM will not be restarted.

Comment 4 Erez Shinan 2011-06-14 08:50:24 UTC
http://gerrit.usersys/#change,585

Comment 9 Dafna Ron 2011-07-14 13:08:55 UTC
Created attachment 512890: logs

Checked on vdsm-4.9-81.el6.

It is not fixed; we now have other issues.

I checked with two VMs: one regular VM and one high-availability VM.

For both, the VM is shown as down in the GUI, but in the DB it is still running, with status migrating.

vdsm tries to restart, and after shutting down it could not start the VM again:

Thread-173::ERROR::2011-07-14 15:32:35,492::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-173::INFO::2011-07-14 15:32:35,493::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, args: ()


Thread-173::INFO::2011-07-14 15:32:39,256::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-173::INFO::2011-07-14 15:32:39,257::vm::379::vm.Vm::(_startUnderlyingVm) vmId=`230ea70f-a3ee-4f5f-a736-8bd11cb110a2`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 359, in _startUnderlyingVm
    self._waitForIncomingMigrationFinish()
  File "/usr/share/vdsm/libvirtvm.py", line 990, in _waitForIncomingMigrationFinish
    self._connection.lookupByUUIDString(self.id),
  File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1877, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: cannot send data: Broken pipe - Timed out (did not recieve success event)
Thread-173::DEBUG::2011-07-14 15:32:39,261::vm::736::vm.Vm::(setDownStatus) vmId=`230ea70f-a3ee-4f5f-a736-8bd11cb110a2`::Changed state to Down: cannot send data: Broken pipe - Timed out (did not recieve success event)
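
The traceback above passes through a `wrapper` in libvirtconnection.py, and the log line "connection to libvirt broken. taking vdsm down." suggests that this wrapper reacts to a broken connection by starting a shutdown. A minimal sketch of that general pattern follows. It is not vdsm's actual code: `ConnectionBrokenError`, `guard_connection`, `on_broken` and `lookup_domain` are hypothetical names, and a plain exception stands in for libvirtError so that the example runs without libvirt.

```python
import functools


class ConnectionBrokenError(Exception):
    """Hypothetical stand-in for a libvirt error that signals a dead connection."""


def guard_connection(on_broken):
    """Build a decorator that calls on_broken() when the wrapped call
    fails because the connection is broken.

    on_broken is a hypothetical callback, e.g. one that starts an orderly shutdown.
    """
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except ConnectionBrokenError:
                on_broken()  # react to the broken connection
                raise        # still propagate, so the caller can mark the VM as Down
        return wrapper
    return decorator


events = []


@guard_connection(on_broken=lambda: events.append("shutdown requested"))
def lookup_domain(uuid, alive=True):
    """Hypothetical lookup that fails when the connection is gone."""
    if not alive:
        raise ConnectionBrokenError("cannot send data: Broken pipe")
    return {"uuid": uuid}
```

Because the error is re-raised after the callback, the caller sees the failure as well, which would match the "Changed state to Down" line that follows the traceback.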


The vdsmd service is reported as running:

[root@blond-vdsg ~]# service vdsmd status
VDS daemon server is running

but vdsClient hangs (interrupted here with Ctrl-C):

[root@blond-vdsg ~]# vdsClient -s 0 list table
^CTraceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 1994, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClient.py", line 180, in do_list
    response = self.s.list(True)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/site-packages/M2Crypto/m2xmlrpclib.py", line 49, in request
    h.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/M2Crypto/httpslib.py", line 50, in connect
    self.sock.connect((self.host, self.port))
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 185, in connect
    ret = self.connect_ssl()
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
KeyboardInterrupt



The rhevm log shows that the backend still thinks that the VMs are migrating:

2011-07-14 15:49:04,371 INFO  [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 230ea70f-a3ee-4f5f-a736-8bd11cb110a2 is migrating to vds blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done
2011-07-14 15:49:04,371 INFO  [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 1db51eff-5df2-4496-b410-29d755fa3526 is migrating to vds blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done
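
To list which VMs the backend still treats as migrating, the two log lines above can be scanned with a small script. This is only a sketch: the regular expression is derived from the two lines quoted here and is an assumption about the log format in general, and `migrating_vms` is a hypothetical helper name.

```python
import re

# Pattern derived from the two log lines quoted above; other formats are not covered.
MIGRATING_RE = re.compile(
    r"refreshVmList vm id (?P<vm_id>[0-9a-f-]{36}) is migrating to vds (?P<host>\S+)"
)


def migrating_vms(lines):
    """Return a sorted list of (vm_id, destination_host) pairs found in the log lines."""
    found = set()
    for line in lines:
        match = MIGRATING_RE.search(line)
        if match:
            found.add((match.group("vm_id"), match.group("host")))
    return sorted(found)


sample = [
    "2011-07-14 15:49:04,371 INFO  [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] "
    "(QuartzScheduler_Worker-60) vds::refreshVmList vm id "
    "230ea70f-a3ee-4f5f-a736-8bd11cb110a2 is migrating to vds "
    "blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done",
    "2011-07-14 15:49:04,371 INFO  [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] "
    "(QuartzScheduler_Worker-60) vds::refreshVmList vm id "
    "1db51eff-5df2-4496-b410-29d755fa3526 is migrating to vds "
    "blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done",
]

for vm_id, host in migrating_vms(sample):
    print(vm_id, "->", host)
```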

Comment 10 Dan Kenigsberg 2011-08-04 20:17:29 UTC
I guess that the problem discussed at comment 9 is a dup of bug 723579.

Comment 11 Dafna Ron 2011-08-16 11:30:39 UTC
Verified on ic136.2
vdsm-4.9-92.el6.x86_64

Migration failed; the VMs remained up on the source host.

RuntimeError: migration destination error: Error creating the requested virtual machine

Comment 12 errata-xmlrpc 2011-12-06 07:20:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html