| Field | Value |
|---|---|
| Summary | [vdsm] [libvirt] 'vdsm' loses connection to 'libvirtd' and continues with migration |
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Dafna Ron <dron> |
| Component | vdsm |
| Assignee | Erez Shinan <erez> |
| Status | CLOSED ERRATA |
| QA Contact | Dafna Ron <dron> |
| Severity | high |
| Docs Contact | |
| Priority | high |
| Version | 6.1 |
| CC | abaron, bazulay, danken, hateya, iheim, ilvovsky, jlibosva, mgoldboi, Rhev-m-bugs, syeghiay, ykaul |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | All |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | vdsm-4.9-92.el6 |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | 622446 |
| Environment | |
| Last Closed | 2011-12-06 07:20:15 UTC |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Bug Depends On | 622446 |
| Bug Blocks | |
| Attachments | |
Comment 2
Erez Shinan
2011-06-09 12:53:08 UTC
Please see Haim's steps to reproduce:
How to reproduce:
1) make sure you have 2 hosts ('host-a' and 'host-b')
2) make sure you have connected storage (I use iSCSI)
3) run the following commands on 'host-b'
- kill -STOP `pgrep libvirt`
- service libvirtd restart
4) create a new domain (machine) on 'host-a' and make sure it runs
5) migrate this vm from 'host-a' to 'host-b'
Result:
migration will fail on the source; however, it will succeed on libvirt.
We expect that when vdsm fails a migration, libvirt fails it as well. This matters especially for HA VMs, which have to be started automatically once their process dies: if libvirt keeps the VM alive while vdsm considers it dead, the VM will not be restarted.
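As a side note, here is a minimal sketch of how that invariant could be checked from a third machine after a failed migration, assuming libvirt-python is installed and qemu+ssh access to both hosts is available; the URIs and the UUID below are placeholders for this example, not part of the original report:

```python
# Post-migration consistency check -- an illustrative sketch, not part of vdsm.
# Assumes libvirt-python and qemu+ssh access to 'host-a' and 'host-b'.
import libvirt

VM_UUID = '00000000-0000-0000-0000-000000000000'  # placeholder: the migrated VM

def is_active(uri, uuid):
    """Return True if the domain is currently running under libvirt at `uri`."""
    conn = libvirt.openReadOnly(uri)
    try:
        try:
            dom = conn.lookupByUUIDString(uuid)
        except libvirt.libvirtError:
            return False  # domain is not even defined on this host
        return bool(dom.isActive())
    finally:
        conn.close()

src = is_active('qemu+ssh://host-a/system', VM_UUID)
dst = is_active('qemu+ssh://host-b/system', VM_UUID)

# If vdsm reported the migration as failed, the VM should still be active on
# host-a and absent from host-b. The bug is the opposite outcome: libvirt
# completes the migration behind vdsm's back, so libvirt's view disagrees
# with vdsm's state.
print('active on source: %s, active on destination: %s' % (src, dst))
```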
Created attachment 512890: logs
Checked on vdsm-4.9-81.el6.
It is not fixed; we now have other issues.
I checked with two VMs: one regular VM and one highly available VM.
For both VMs, the VM is shown as down in the GUI, but in the DB it is still running with status "migrating".
vdsm tries to restart, and after shutting down it could not start the VM again:
Thread-173::ERROR::2011-07-14 15:32:35,492::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-173::INFO::2011-07-14 15:32:35,493::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, args: ()
Thread-173::INFO::2011-07-14 15:32:39,256::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-173::INFO::2011-07-14 15:32:39,257::vm::379::vm.Vm::(_startUnderlyingVm) vmId=`230ea70f-a3ee-4f5f-a736-8bd11cb110a2`::The vm start process failed
Traceback (most recent call last):
File "/usr/share/vdsm/vm.py", line 359, in _startUnderlyingVm
self._waitForIncomingMigrationFinish()
File "/usr/share/vdsm/libvirtvm.py", line 990, in _waitForIncomingMigrationFinish
self._connection.lookupByUUIDString(self.id),
File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1877, in lookupByUUIDString
if ret is None:raise libvirtError('virDomainLookupByUUIDString() failed', conn=self)
libvirtError: cannot send data: Broken pipe - Timed out (did not recieve success event)
Thread-173::DEBUG::2011-07-14 15:32:39,261::vm::736::vm.Vm::(setDownStatus) vmId=`230ea70f-a3ee-4f5f-a736-8bd11cb110a2`::Changed state to Down: cannot send data: Broken pipe - Timed out (did not recieve success event)
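The wrapper frame in this traceback (libvirtconnection.py, line 63) is the point where vdsm notices that the connection to libvirtd is gone and takes itself down. A rough sketch of that pattern follows, assuming libvirt-python; it is an illustration of the mechanism, not vdsm's actual code:

```python
# Illustrative sketch only (not vdsm's actual libvirtconnection.py): every
# libvirt call goes through a wrapper that, when the connection to libvirtd
# breaks, runs a shutdown callback instead of carrying on with stale state.
import libvirt

# Error codes that typically accompany "cannot send data: Broken pipe".
_BROKEN = (libvirt.VIR_ERR_SYSTEM_ERROR, libvirt.VIR_ERR_INTERNAL_ERROR)

def wrap_connection(conn, on_broken):
    """Wrap selected virConnect methods so that a dead libvirtd triggers
    `on_broken` (e.g. the prepareForShutdown seen in the log above)."""
    def make_wrapper(f):
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except libvirt.libvirtError as e:
                if e.get_error_code() in _BROKEN:
                    on_broken()
                raise
        return wrapper

    for name in ('lookupByUUIDString', 'listDomainsID', 'createXML'):
        setattr(conn, name, make_wrapper(getattr(conn, name)))
    return conn
```

The design point reflected in the log is that once libvirtd stops answering, it is safer for vdsm to shut itself down and be restarted than to keep serving requests against a dead connection, which only produces the stale "migrating" state seen later in this comment.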
vdsm is shown as up:
[root@blond-vdsg ~]# service vdsmd status
VDS daemon server is running
but vdsClient gets stuck:
[root@blond-vdsg ~]# vdsClient -s 0 list table
^CTraceback (most recent call last):
File "/usr/share/vdsm/vdsClient.py", line 1994, in <module>
code, message = commands[command][0](commandArgs)
File "/usr/share/vdsm/vdsClient.py", line 180, in do_list
response = self.s.list(True)
File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
return self.__send(self.__name, args)
File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
verbose=self.__verbose
File "/usr/lib64/python2.6/site-packages/M2Crypto/m2xmlrpclib.py", line 49, in request
h.endheaders()
File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
self._send_output()
File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
self.send(msg)
File "/usr/lib64/python2.6/httplib.py", line 739, in send
self.connect()
File "/usr/lib64/python2.6/site-packages/M2Crypto/httpslib.py", line 50, in connect
self.sock.connect((self.host, self.port))
File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 185, in connect
ret = self.connect_ssl()
File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 178, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
KeyboardInterrupt
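The hang above sits in M2Crypto's blocking SSL connect, which has no deadline on this path. Below is a small sketch of probing vdsm's XML-RPC port with a bounded wait; the host, port 54321, and the 15-second timeout are assumptions for the example. This plain xmlrpclib probe skips the client-certificate setup that vdsClient's M2Crypto transport performs, so against an SSL-enforcing vdsm it may fail the handshake rather than list VMs; the point is only that a socket-level timeout keeps the call from blocking forever:

```python
# Bounded probe of vdsm's XML-RPC endpoint -- a sketch, not vdsClient.
# Assumes vdsm's default port 54321 on localhost; no client certificate is
# presented, so an SSL-enforcing vdsm may reject the handshake.
import socket
import xmlrpclib  # Python 2, as in the tracebacks above; xmlrpc.client on Python 3

socket.setdefaulttimeout(15)  # bound every blocking socket call made below

server = xmlrpclib.ServerProxy('https://localhost:54321')
try:
    # The same call vdsClient's do_list() issues before it got stuck above.
    print(server.list(True))
except Exception as e:  # socket.timeout, SSL handshake failure, ProtocolError, ...
    print('no usable answer from vdsm within 15s: %s' % e)
```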
The rhevm log shows that the backend still thinks the VM is migrating:
2011-07-14 15:49:04,371 INFO [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 230ea70f-a3ee-4f5f-a736-8bd11cb110a2 is migrating to vds blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done
2011-07-14 15:49:04,371 INFO [org.nogah.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-60) vds::refreshVmList vm id 1db51eff-5df2-4496-b410-29d755fa3526 is migrating to vds blond-vdsg.qa.lab.tlv.redhat.com ignoring it in the refresh till migration is done
I guess that the problem discussed at comment 9 is a dup of bug 723579.

Verified on ic136.2, vdsm-4.9-92.el6.x86_64:
migration failed - VMs remained up on the source host
RuntimeError: migration destination error: Error creating the requested virtual machine

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2011-1782.html