Couldn't reproduce. What exactly is the problem, what do you expect should happen, and how does this bug happen in proper usage? (without arbitrarily killing vital processes?)
Please see Haim's steps to reproduce:

1) make sure you have 2 hosts ('host-a' and 'host-b')
2) make sure you have connected storage (I use iSCSI)
3) run the following commands on 'host-b':
   - kill -STOP `pgrep libvirt`
   - service libvirtd restart
4) create a new domain (machine) on 'host-a' and make sure it runs
5) migrate this VM from 'host-a' to 'host-b'

Result: migration will fail on the source; however, it will succeed on libvirt.

We expect that if vdsm fails the migration, libvirt will fail it as well. This matters especially for HA VMs, which have to start automatically once their processes die: if libvirt is alive while vdsm is dead, the VM will not be restarted.
http://gerrit.usersys/#change,585
Created attachment 512890 [details]
logs

Checked on vdsm-4.9-81.el6: it is not fixed, and we now have other issues.

I checked on two VMs: one is a regular VM, the other is a high availability VM. For both, the VM is shown as down in the GUI, but in the db it is still running and in status migrating.

vdsm tries to restart, and after shutting down, the VM could not be started again:

Thread-173::ERROR::2011-07-14 15:32:35,492::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-173::INFO::2011-07-14 15:32:35,493::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, args: ()
Thread-173::INFO::2011-07-14 15:32:39,256::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-173::INFO::2011-07-14 15:32:39,257::vm::379::vm.Vm::(_startUnderlyingVm) vmId=`230ea70f-a3ee-4f5f-a736-8bd11cb110a2`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 359, in _startUnderlyingVm
    self._waitForIncomingMigrationFinish()
  File "/usr/share/vdsm/libvirtvm.py", line 990, in _waitForIncomingMigrationFinish
    self._connection.lookupByUUIDString(self.id),
  File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1877, in lookupByUUIDString
    if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: cannot send data: Broken pipe - Timed out (did not recieve success event)
Thread-173::DEBUG::2011-07-14 15:32:39,261::vm::736::vm.Vm::(setDownStatus) vmId=`230ea70f-a3ee-4f5f-a736-8bd11cb110a2`::Changed state to Down: cannot send data: Broken pipe - Timed out (did not recieve success event)

vdsm is shown as up:

[root@blond-vdsg ~]# service vdsmd status
VDS daemon server is running

but vdsClient gets stuck:

[root@blond-vdsg ~]# vdsClient -s 0 list table
^CTraceback (most recent call last):
  File "/usr/share/vdsm/vdsClient.py", line 1994, in <module>
    code, message = commands[command][0](commandArgs)
  File "/usr/share/vdsm/vdsClient.py", line 180, in do_list
    response = self.s.list(True)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/site-packages/M2Crypto/m2xmlrpclib.py", line 49, in request
    h.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/M2Crypto/httpslib.py", line 50, in connect
    self.sock.connect((self.host, self.port))
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 185, in connect
    ret = self.connect_ssl()
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
KeyboardInterrupt

The rhevm log shows that the backend still thinks the VMs are migrating:

2011-07-14 15:49:04,371 INFO [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 230ea70f-a3ee-4f5f-a736-8bd11cb110a2 is migrating to vds blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done
2011-07-14 15:49:04,371 INFO [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 1db51eff-5df2-4496-b410-29d755fa3526 is migrating to vds blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done
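
For readers following the vdsm log above: the "connection to libvirt broken. taking vdsm down." message comes from a wrapper around libvirt calls in libvirtconnection.py. The actual vdsm code is not shown in this report, so the following is only a rough sketch of that general pattern, not the real implementation. ConnectionBroken, take_down_on_broken_connection, lookup_domain and shutdown_requests are all hypothetical names invented for the illustration; a real wrapper would inspect libvirtError instead.

```python
import functools


class ConnectionBroken(Exception):
    """Hypothetical stand-in for a libvirtError that reports a dead connection."""


def take_down_on_broken_connection(on_broken):
    """Build a decorator that calls on_broken() when the wrapped call raises
    ConnectionBroken, then re-raises so the caller still sees the failure."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except ConnectionBroken:
                # Assumption: the daemon reacts by requesting its own shutdown.
                on_broken()
                raise
        return wrapper
    return decorator


# Records shutdown requests so the behaviour can be observed without a daemon.
shutdown_requests = []


@take_down_on_broken_connection(lambda: shutdown_requests.append("prepareForShutdown"))
def lookup_domain(uuid, alive=True):
    """Hypothetical lookup that fails when the connection is not alive."""
    if not alive:
        raise ConnectionBroken("cannot send data: Broken pipe")
    return {"uuid": uuid}
```

In this sketch the caller still gets the exception after the shutdown has been requested, which would match the order seen in the log (prepareForShutdown first, then "The vm start process failed"), assuming the real wrapper behaves similarly.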
I guess that the problem discussed at comment 9 is a dup of bug 723579.
Verified on ic136.2, vdsm-4.9-92.el6.x86_64.
Migration failed and the VMs remained up on the source host:
RuntimeError: migration destination error: Error creating the requested virtual machine
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html