| Summary: | VM migration fails on authentication certificate |
|---|---|
| Product: | Red Hat Enterprise Virtualization Manager |
| Reporter: | Artyom <alukiano> |
| Component: | vdsm |
| Assignee: | Yaniv Bronhaim <ybronhei> |
| Status: | CLOSED WONTFIX |
| QA Contact: | |
| Severity: | high |
| Docs Contact: | |
| Priority: | high |
| Version: | 3.3.0 |
| CC: | acathrow, alonbl, alukiano, bazulay, dfediuck, dougsland, gpadgett, hateya, iheim, istein, knesenko, lpeer, Rhev-m-bugs, yeylon |
| Target Milestone: | --- |
| Flags: | istein: needinfo+ |
| Target Release: | 3.3.1 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | Infra |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2014-01-01 15:10:42 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Attachments: | |
How did you try to migrate the VM?

I tried to migrate the VM manually, both without specifying a target host and with a specific host; the result was the same: migration failed with a libvirt error in the vdsm log.

Looking at the libvirt logs, it seems to fail because storage is not available: I can see a lot of errno=11, which means 'Resource temporarily unavailable'. Sounds like a network / NFS / firewall issue. Can you also attach the vdsm logs?

On is27, a basic Migrate VM run from the UI fails with:

vdsm.log error:
==============
Thread-2728::ERROR::2013-12-16 11:44:53,727::vm::338::vm.Vm::(run) vmId=`f0a7a3b9-b3b9-4a48-932c-9d1836a3bb30`::Failed to migrate
Traceback (most recent call last):
File "/usr/share/vdsm/vm.py", line 324, in run
self._startUnderlyingMigration()
File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
None, maxBandwidth)
File "/usr/share/vdsm/vm.py", line 842, in f
ret = attr(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.109.14/system
libvirtd.log error:
==================
2013-12-16 09:44:53.592+0000: 16364: error : virNetTLSContextValidCertificate:1031 : Certificate failed validation: The certificate hasn't got a known issuer.
2013-12-16 09:44:53.592+0000: 16364: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer.
2013-12-16 09:44:53.592+0000: 16364: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate
2013-12-16 09:44:53.593+0000: 16364: debug : do_open:1180 : driver 2 remote returned ERROR
2013-12-16 09:44:53.593+0000: 16364: error : doPeer2PeerMigrate:2728 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.109.14/system
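The libvirtd errors above mean the peer's certificate does not chain back to a CA that libvirtd trusts. The same check can be reproduced outside libvirt with openssl; the sketch below is self-contained (it generates throwaway certificates in a scratch directory), so it only illustrates the failure mode — on a real RHEV host you would instead run `openssl verify` against the deployed files (commonly under /etc/pki/vdsm/certs/, though the exact paths here are an assumption, not taken from this bug).

```shell
# Self-contained illustration of the "known issuer" check libvirt performs.
# Everything happens in a scratch directory; no host configuration is touched.
set -e
dir=$(mktemp -d)
cd "$dir"

# Stand-in for the engine CA that the destination libvirtd trusts.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout ca.key -out ca.crt -subj "/CN=engine-ca" 2>/dev/null

# Host certificate signed by that CA: the good case after a correct deploy.
openssl req -newkey rsa:2048 -nodes \
    -keyout host.key -out host.csr -subj "/CN=source-host" 2>/dev/null
openssl x509 -req -in host.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -out host.crt -days 1 2>/dev/null
openssl verify -CAfile ca.crt host.crt    # prints "host.crt: OK"

# Certificate from an unrelated issuer: analogous to what libvirt saw here
# ("The certificate hasn't got a known issuer").
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout stale.key -out stale.crt -subj "/CN=stale-host" 2>/dev/null
openssl verify -CAfile ca.crt stale.crt || true   # reports a verification error
```

If the deployed host certificate fails this kind of check against the engine CA, the destination libvirtd rejects the TLS handshake exactly as in the log above.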
Created attachment 837212 [details]
vdsm log
Created attachment 837213 [details]
libvirt log
More info:
1. On the automatic migration test we do not see this failure (running on jenkins-qe, with the same rhevm, libvirt, and vdsm versions).
2. So far it has occurred for 3 QE members on local environments.

Alon, can you please check if this is related to host deploy? Thanks, Ilanit.

Is it a hosted engine configuration? Is the source or destination a hosted-engine host?

First I opened this bug for hosted-engine, because I was unable to reproduce the migration problem on a simple rhevm (not hosted), but Meni and Ilanit encountered the same problem on a simple rhevm. The latest libvirt and vdsm logs are from a host that belongs to a simple rhevm, and it is the source host.

I find it hard to believe :) Need host-deploy logs for regular hosts.

On a simple rhevm with the migration certificate failure, a host reboot resolved it. ofrenkel mentioned the problem might be the same as an old bug we had, where a new host certificate was set but the libvirt restart step was missing, so libvirt did not pick up the new certificate; thus migration failed (the libvirt secured communication failed). Attaching host-deploy logs for the 2 hosts in the setup on which migration subsequently failed.

Created attachment 837237 [details]
host1 deploy log
Created attachment 837238 [details]
host2 deploy log
Created attachment 837241 [details]
libvirt vdsm host_deploy
As I understand it, rebooting the hosts solves the problem on a simple rhevm, but on hosted engine the problem still exists, so I am attaching vdsm and libvirt logs from the hosted host that is the source in the migration process, and also host-deploy logs for the two hosts.
For the hosted engine I would like someone who owns the feature to investigate. As for the libvirtd restart, there were too many changes to this in vdsm; I think I will just stop it on every deploy, and will open another bug.

Sandro,
I am unsure this is related to host-deploy; please make sure that nothing else is wrong, check certificates and such. We should have gotten these reports long ago for regular hosts.
Thanks,

(In reply to Alon Bar-Lev from comment #16)
> Sandro,
> I am unsure this is related to host-deploy, please make sure that nothing
> else is wrong, check certificates and such.
> We should have gotten these reports long ago for regular hosts.
> Thanks,

I'll do. However, it sounds strange that it happens on regular rhevm as per comment #11. Maybe it is not a host-deploy issue but a libvirt / vdsm issue of not reloading certificates (on AIO and hosted-engine we don't reboot the host).

(In reply to Sandro Bonazzola from comment #17)
> (In reply to Alon Bar-Lev from comment #16)
> > Sandro,
> > I am unsure this is related to host-deploy, please make sure that nothing
> > else is wrong, check certificates and such.
> > We should have gotten these reports long ago for regular hosts.
> > Thanks,
>
> I'll do. However sounds strange it happens on regular rhevm as per comment
> #11.
> Maybe it's not an host-deploy but a libvirt / vdsm issue not reloading
> certificates (on AIO and hosted-engine we don't reboot the host)

As written in comment #11, a restart of libvirt is sufficient on those hosts, and comment #14 states that it did not work for this issue.

(In reply to Artyom from comment #14)
> Created attachment 837241 [details]
> libvirt vdsm host_deploy
>
> How I understand, restart of hosts help to solve problem in simple rhevm,
> but in hosted engine problem still exist, so I attach vdsm and libvirt logs
> from hosted host, that source in migration process. And also host-deploy
> logs for two hosts.
Artyom, have you moved the host to global maintenance and shut down the VM (since migration won't work) before rebooting it?

Yes, I tried this scenario too, but Greg found the problem: in the libvirt conf file the lines with the paths to the certificates were absent, and a line listen_tls = 0 appeared instead. I don't know why the vdsm process did not update the libvirt file correctly; even after a vdsm restart the libvirt conf was not updated (with ssl = true in vdsm.conf and all certificates present).

So I tried the vdsm-tool utility:

#vdsm-tool libvirt-configure
libvirt is already configured for vdsm (still no paths to certificates, and listen_tls = 0)
#vdsm-tool libvirt-configure --force
Reconfiguration of libvirt is done. (paths to certificates appear, and listen_tls = 0 is removed)

So maybe if the host-deploy process uses vdsm-tool libvirt-configure, it must use it with the force flag (maybe you can ask Alon).
libvirt-0.10.2-29.el6.1.x86_64
vdsm-4.13.2-0.2.rc.el6ev.x86_64

(In reply to Artyom from comment #20)
> Yes I tried also this scenario too, but Greg find what, problem, problem was
> that in libvirt conf file absent lines path to certificates and also was
> appear line listen_tls = 0, I don't know why vdsm process not update correct
> libvirt file, also after vdsm restart libvirt conf was not updated(when ssl
> = true in vdsm.conf and also all certificates exists).
> So tried vdsm-tool utility.
> #vdsm-tool libvirt-configure
> libvirt is already configured for vdsm(still no path to certificates and
> listen_tls = 0)
> #vdsm-tool libvirt-configure --force
> Reconfiguration of libvirt is done.(appear paths to certificates and remove
> listen_tls = 0)
>
> So maybe if host-deploy process use vdsm-tool libvirt-configure, it must use
> it with force flag(maybe you can ask Alon)
> libvirt-0.10.2-29.el6.1.x86_64
> vdsm-4.13.2-0.2.rc.el6ev.x86_64

No, hosted-engine --deploy calls "/etc/init.d/vdsmd reconfigure" as discussed on http://gerrit.ovirt.org/#/c/20766/ so in the end it seems to be a vdsm bug.
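The force/no-force difference Artyom hit can be modeled in a few lines. The sketch below is an illustrative toy, not vdsm-tool's actual implementation: it assumes, as the comments above describe, that a non-forced configure skips a conf file it considers already configured while a forced run rewrites it, so a hand-edited `listen_tls = 0` survives every non-forced reconfigure. The sentinel marker and cert path are invented for the example.

```shell
# Toy model of forced vs. non-forced reconfigure (NOT the real vdsm-tool code).
# A scratch file stands in for /etc/libvirt/libvirtd.conf.
conf=$(mktemp)
printf '# vdsm-managed\nlisten_tls=0\n' > "$conf"   # stale, hand-edited config

reconfigure() {  # $1 = conf file, $2 = "force" or empty
    if grep -q '^# vdsm-managed' "$1" && [ "$2" != force ]; then
        echo "libvirt is already configured for vdsm"   # stale settings kept
        return 0
    fi
    # Forced (or first-time) path: rewrite the managed settings.
    # (The ca_file path is illustrative only.)
    printf '# vdsm-managed\nlisten_tls=1\nca_file="/etc/pki/vdsm/certs/cacert.pem"\n' > "$1"
    echo "Reconfiguration of libvirt is done."
}

reconfigure "$conf"          # non-forced: listen_tls=0 is left in place
grep listen_tls "$conf"      # -> listen_tls=0

reconfigure "$conf" force    # forced: managed settings rewritten
grep listen_tls "$conf"      # -> listen_tls=1
```

This mirrors why only `vdsm-tool libvirt-configure --force` restored the certificate paths in Artyom's output: the non-forced verb saw the file as "already configured" and never looked at its contents.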
Probably you had modified libvirtd.conf on that host. The reconfigure verb found it already configured, with a listen_tls=0 that I guess was set manually. As far as we have seen since the configuration change, most users expect each installation to override the old configuration. The attached patch reverts to the old method of libvirt configure with the --force flag for ovirt-3.4.

Although, you opened this bug on vdsm 4.13.2 (3.3), which still overrides the configuration when "/etc/init.d/vdsmd reconfigure" is called and restarts libvirtd automatically afterwards. Can you verify the versions?

After checking on both hosted-engine and simple rhevm, it turned out that redeploying a host does not override the libvirtd.conf file.

You are right. In both cases we call vdsmd reconfigure without "force" specified as an argument; this means we don't override the conf files as we should. The attached patch proposes to omit the force additional argument for the vdsmd reconfigure verb: reconfigure will then rewrite all conf files forcefully whenever it is called. This is relevant both for ovirt-3.3 and 3.4.

"""Alon Bar-Lev Dec 31 8:48 PM
Patch Set 1: well, apparently people either manage vdsm manually or automatically... most automatically... the question is if we have regression for this version or this is a fix to next, and if so it is not needed."""

In response to Alon's comment on the patch set: this is truly the treatment we have had for a long time, and nobody had a problem with it. Although some would expect the behavior to be to override the old configuration, this is not a regression in ovirt-3.3, and changing it would change previous behavior that some might depend on. I suggest closing the bug as known behavior that we prefer not to change, or moving it to a later version. Closing as WONTFIX for the current version. If we decide to change the behavior, please reopen.
Created attachment 836890 [details]
Vdsm and libvirtd logs

Description of problem:
Migration in hosted engine failed with a libvirt error in vdsm.log:
operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.64.85/system

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.0.0-0.11.rc.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run the command hosted-engine --deploy on one host and finish all setup steps.
2. Run the command hosted-engine --deploy on a second host and add it to the same hosted engine.
3. Try to migrate the hosted-engine VM.

Actual results:
Migration failed ("Migration failed due to Error: Fatal error during migration"), and in vdsm.log: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.64.85/system

Expected results:
Migration succeeds.

Additional info:
The installation was done on clean hosts, with the same versions of libvirt and vdsm:
libvirt-0.10.2-29.el6.1.x86_64
vdsm-4.13.2-0.2.rc.el6ev.x86_64

I am also not sure about the chosen component, but I failed to reproduce this bug on a simple rhevm (not hosted), so the problem seems to be in installing hosts into hosted-engine (maybe something with certificates).