Bug 1043227

Summary: VM migration fails on authentication certificate
Product: Red Hat Enterprise Virtualization Manager
Reporter: Artyom <alukiano>
Component: vdsm
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED WONTFIX
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: acathrow, alonbl, alukiano, bazulay, dfediuck, dougsland, gpadgett, hateya, iheim, istein, knesenko, lpeer, Rhev-m-bugs, yeylon
Target Milestone: ---
Flags: istein: needinfo+
Target Release: 3.3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard: Infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-01 15:10:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Vdsm and libvirtd logs (flags: none)
vdsm log (flags: none)
libvirt log (flags: none)
host1 deploy log (flags: none)
host2 deploy log (flags: none)
libvirt vdsm host_deploy (flags: none)

Description Artyom 2013-12-15 07:51:39 UTC
Created attachment 836890 [details]
Vdsm and libvirtd logs

Description of problem:
Migration in hosted engine failed with libvirt error in vdsm.log:
operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.64.85/system

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-1.0.0-0.11.rc.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run the command hosted-engine --deploy on one host and finish all setup steps.
2. Run the command hosted-engine --deploy on a second host and add this host to the same hosted-engine.
3. Try to migrate the hosted-engine VM.

Actual results:
Migration failed with the message "Migration failed due to Error: Fatal error during migration", and vdsm.log shows:
operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.64.85/system

Expected results:
Migration succeeds.

Additional info:
Installed on clean hosts, with the same versions of libvirt and vdsm on both:
libvirt-0.10.2-29.el6.1.x86_64
vdsm-4.13.2-0.2.rc.el6ev.x86_64
I am also not sure about the chosen component, but I failed to reproduce this bug on simple rhevm (not hosted), so it seems like a problem with installing hosts into hosted-engine (maybe something with certificates).

Comment 1 Doron Fediuck 2013-12-15 09:20:39 UTC
How did you try to migrate the VM?

Comment 2 Artyom 2013-12-16 09:43:56 UTC
I tried to migrate the VM manually, both without specifying a host and with a specific host. The result is the same: migration fails with a libvirt error in the vdsm log.

Comment 3 Sandro Bonazzola 2013-12-16 10:35:25 UTC
Looking at the libvirt logs, it seems to fail because storage is not available: I can see a lot of errno=11, which means 'Resource temporarily unavailable'. Sounds like a network / NFS / firewall issue.

Can you also attach the vdsm logs?

Comment 4 Ilanit Stein 2013-12-16 10:52:30 UTC
On is27, a basic Migrate VM run from the UI fails with the following errors.
vdsm.log error:
==============
Thread-2728::ERROR::2013-12-16 11:44:53,727::vm::338::vm.Vm::(run) vmId=`f0a7a3b9-b3b9-4a48-932c-9d1836a3bb30`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 324, in run
    self._startUnderlyingMigration()
  File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
    None, maxBandwidth)
  File "/usr/share/vdsm/vm.py", line 842, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.109.14/system

libvirtd.log error:
==================
2013-12-16 09:44:53.592+0000: 16364: error : virNetTLSContextValidCertificate:1031 : Certificate failed validation: The certificate hasn't got a known issuer.
2013-12-16 09:44:53.592+0000: 16364: warning : virNetTLSContextCheckCertificate:1142 : Certificate check failed Certificate failed validation: The certificate hasn't got a known issuer.
2013-12-16 09:44:53.592+0000: 16364: error : virNetTLSContextCheckCertificate:1145 : authentication failed: Failed to verify peer's certificate
2013-12-16 09:44:53.593+0000: 16364: debug : do_open:1180 : driver 2 remote returned ERROR
2013-12-16 09:44:53.593+0000: 16364: error : doPeer2PeerMigrate:2728 : operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.109.14/system
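
When going through a long libvirtd.log, it can help to pull out only the TLS certificate errors like the ones quoted above. The following is a minimal sketch, assuming the log lines follow the "timestamp: pid: level : function:line : message" layout seen in this excerpt; the helper name tls_failures is hypothetical and not part of libvirt or vdsm.

```python
import re

# Layout assumed from the excerpt above:
# "<date> <time>: <pid>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<pid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


def tls_failures(lines):
    """Return (timestamp, function, message) for error-level lines
    logged by libvirt functions whose name starts with virNetTLSContext."""
    hits = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        if match.group("level") != "error":
            continue
        if match.group("func").startswith("virNetTLSContext"):
            hits.append((match.group("ts"), match.group("func"), match.group("msg")))
    return hits


# Two lines copied from the libvirtd.log excerpt in this comment.
sample = [
    "2013-12-16 09:44:53.592+0000: 16364: error : "
    "virNetTLSContextValidCertificate:1031 : Certificate failed validation: "
    "The certificate hasn't got a known issuer.",
    "2013-12-16 09:44:53.593+0000: 16364: debug : do_open:1180 : "
    "driver 2 remote returned ERROR",
]

for ts, func, msg in tls_failures(sample):
    print(ts, func, msg)
```

Only the first sample line is reported, since the second is a debug line from a non-TLS function.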

Comment 5 Ilanit Stein 2013-12-16 10:53:11 UTC
Created attachment 837212 [details]
vdsm log

Comment 6 Ilanit Stein 2013-12-16 10:56:11 UTC
Created attachment 837213 [details]
libvirt log

Comment 7 Ilanit Stein 2013-12-16 11:01:58 UTC
More info:
1. In the automatic migration test we do not see this failure (running on jenkins-qe, with the same rhevm, libvirt, and vdsm versions).

2. So far it has occurred for 3 QE members on local environments.

Alon,

Can you please check if this is related to host deploy?

thanks,
Ilanit.

Comment 8 Alon Bar-Lev 2013-12-16 13:21:45 UTC
Is this a hosted-engine configuration? Is the source or the destination the host that runs the engine?

Comment 9 Artyom 2013-12-16 13:36:00 UTC
I first opened this bug for hosted-engine because I was unable to reproduce the migration problem on simple rhevm (not hosted), but Meni and Ilanit encountered the same problem on simple rhevm.
The latest libvirt and vdsm logs are from a host attached to simple rhevm, and it is the source host.

Comment 10 Alon Bar-Lev 2013-12-16 13:42:45 UTC
I find it hard to believe :)
Need host-deploy log for regular hosts.

Comment 11 Ilanit Stein 2013-12-16 14:19:50 UTC
On simple rhevm with the migration certificate failure, a host reboot resolved it.

ofrenkel mentioned the problem might be the same as an old bug we had, where a new host certificate was set but the libvirt restart step was missing, so libvirt didn't pick up the new certificate. Migration then fails because the libvirt secured communication fails.

Attaching host-deploy logs for the 2 hosts in the setup on which migration failed afterwards.

Comment 12 Ilanit Stein 2013-12-16 14:21:24 UTC
Created attachment 837237 [details]
host1 deploy log

Comment 13 Ilanit Stein 2013-12-16 14:21:59 UTC
Created attachment 837238 [details]
host2 deploy log

Comment 14 Artyom 2013-12-16 14:26:50 UTC
Created attachment 837241 [details]
libvirt vdsm host_deploy

As I understand it, restarting the hosts solves the problem on simple rhevm, but on hosted engine the problem still exists. So I am attaching the vdsm and libvirt logs from the hosted-engine host that was the source in the migration, and also the host-deploy logs for the two hosts.

Comment 15 Alon Bar-Lev 2013-12-16 14:38:05 UTC
For the hosted engine, I would like someone who owns the feature to investigate.

As for the libvirtd restart, there have been too many changes to this in vdsm. I think I will just stop it on every deploy; I will open another bug.

Comment 16 Alon Bar-Lev 2013-12-16 18:09:11 UTC
Sandro,
I am unsure this is related to host-deploy, please make sure that nothing else is wrong, check certificates and such.
We should have gotten these reports long ago for regular hosts.
Thanks,

Comment 17 Sandro Bonazzola 2013-12-18 07:59:54 UTC
(In reply to Alon Bar-Lev from comment #16)
> Sandro,
> I am unsure this is related to host-deploy, please make sure that nothing
> else is wrong, check certificates and such.
> We should have gotten these reports long ago for regular hosts.
> Thanks,

I'll do that. However, it sounds strange that it happens on regular rhevm as per comment #11.
Maybe it's not a host-deploy issue but a libvirt / vdsm issue with not reloading certificates (on AIO and hosted-engine we don't reboot the host).

Comment 18 Alon Bar-Lev 2013-12-18 09:59:15 UTC
(In reply to Sandro Bonazzola from comment #17)
> (In reply to Alon Bar-Lev from comment #16)
> > Sandro,
> > I am unsure this is related to host-deploy, please make sure that nothing
> > else is wrong, check certificates and such.
> > We should have gotten these reports long ago for regular hosts.
> > Thanks,
> 
> I'll do. However sounds strange it happens on regular rhevm as per comment
> #11.
> Maybe it's not an host-deploy but a libvirt / vdsm issue not reloading
> certificates (on AIO and hosted-engine we don't reboot the host)

As written in comment #11, a restart of libvirt is sufficient on these hosts, and comment #14 states that it does not work for this issue.

Comment 19 Sandro Bonazzola 2013-12-18 14:45:54 UTC
(In reply to Artyom from comment #14)
> Created attachment 837241 [details]
> libvirt vdsm host_deploy
> 
> How I understand, restart of hosts help to solve problem in simple rhevm,
> but in hosted engine problem still exist, so I attach vdsm and libvirt logs
> from hosted host, that source in migration process. And also host-deploy
> logs for two hosts.

Artyom, did you move the host to global maintenance and shut down the VM (since migration won't work) before rebooting it?

Comment 20 Artyom 2013-12-19 12:24:21 UTC
Yes, I tried that scenario too, but Greg found the problem: the libvirt conf file was missing the lines with the paths to the certificates, and it also contained the line listen_tls = 0. I don't know why the vdsm process did not update the libvirt file correctly; the libvirt conf was also not updated after a vdsm restart (with ssl = true in vdsm.conf and all certificates present).
So I tried the vdsm-tool utility:
#vdsm-tool libvirt-configure
libvirt is already configured for vdsm (still no paths to certificates, and listen_tls = 0 still present)
#vdsm-tool libvirt-configure --force
Reconfiguration of libvirt is done. (paths to certificates appeared and listen_tls = 0 was removed)

So if the host-deploy process uses vdsm-tool libvirt-configure, maybe it must use it with the force flag (maybe you can ask Alon).
libvirt-0.10.2-29.el6.1.x86_64
vdsm-4.13.2-0.2.rc.el6ev.x86_64
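
The broken state described in this comment (listen_tls = 0 present, certificate paths absent) can be checked for mechanically. The following is a minimal sketch, assuming the conf file uses plain "key = value" lines and that the certificate paths are given by the libvirtd.conf settings ca_file, cert_file, and key_file. The helper names and the placeholder paths are hypothetical, and the parser deliberately ignores edge cases such as a "#" inside a quoted value.

```python
# libvirtd.conf settings that point at the TLS certificate material.
REQUIRED_KEYS = ("ca_file", "cert_file", "key_file")


def parse_conf(text):
    """Parse simple 'key = value' lines, ignoring comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def tls_problems(text):
    """List the TLS-related problems described in this comment."""
    settings = parse_conf(text)
    problems = []
    if settings.get("listen_tls") == "0":
        problems.append("listen_tls = 0")
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append("missing " + key)
    return problems


# Illustrative contents only; the paths are placeholders.
broken = "listen_tls = 0\n"
fixed = (
    'ca_file = "/path/to/cacert.pem"\n'
    'cert_file = "/path/to/hostcert.pem"\n'
    'key_file = "/path/to/hostkey.pem"\n'
)

print(tls_problems(broken))
print(tls_problems(fixed))
```

A check like this only reports the state of the file; it does not say which tool should have written the settings.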

Comment 21 Sandro Bonazzola 2013-12-19 13:58:45 UTC
(In reply to Artyom from comment #20)
> Yes I tried also this scenario too, but Greg find what, problem, problem was
> that in libvirt conf file absent lines path to certificates and also was
> appear line listen_tls = 0, I don't know why vdsm process not update correct
> libvirt file, also after vdsm restart libvirt conf was not updated(when ssl
> = true in vdsm.conf and also all certificates exists).
> So tried vdsm-tool utility.
> #vdsm-tool libvirt-configure
> libvirt is already configured for vdsm(still no path to certificates and
> listen_tls = 0)
> #vdsm-tool libvirt-configure --force
> Reconfiguration of libvirt is done.(appear paths to certificates and remove
> listen_tls = 0)
> 
> So maybe if host-deploy process use vdsm-tool libvirt-configure, it must use
> it with force flag(maybe you can ask Alon)
> libvirt-0.10.2-29.el6.1.x86_64
> vdsm-4.13.2-0.2.rc.el6ev.x86_64

No, hosted-engine --deploy calls "/etc/init.d/vdsmd reconfigure", as discussed at
http://gerrit.ovirt.org/#/c/20766/

So in the end it seems to be a vdsm bug.

Comment 22 Yaniv Bronhaim 2013-12-30 10:33:32 UTC
You had probably modified libvirtd.conf on that host. The reconfigure verb found it already configured, with listen_tls=0 that I guess was set manually.

As far as we have seen since the configuration change, most users expect each installation to override the old configuration.

The attached patch reverts to the old method of calling libvirt configure with the --force flag, for ovirt-3.4.

However, you opened this bug on vdsm 4.13.2 (3.3), which still overrides the configuration when "/etc/init.d/vdsmd reconfigure" is called and resets libvirtd automatically afterwards.

Can you just verify the versions?

Comment 23 Artyom 2013-12-30 15:34:34 UTC
After checking both on hosted-engine and on simple rhevm, it turned out that redeploying a host does not override the libvirtd.conf file.

Comment 24 Yaniv Bronhaim 2013-12-31 14:05:07 UTC
You are right. In both cases we call vdsmd reconfigure without "force" specified as an argument. This means we don't override the conf files as we should.

The attached patch proposes to omit the additional force argument for the vdsmd reconfigure verb: reconfigure will forcefully reconfigure all conf files whenever it is called. This is relevant for both ovirt-3.3 and 3.4.
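
The difference between a plain and a forced reconfigure can be illustrated with a small model. This is only a sketch under stated assumptions, not vdsm's implementation: it assumes the tool decides "already configured" by looking for a marker line in the file, and the marker text, the configure function, and the placeholder path are all hypothetical.

```python
# Hypothetical marker; the real tool's detection logic may differ.
MARKER = "## configured by vdsm (hypothetical marker)"


def configure(conf_text, managed, force=False):
    """Return (new_text, changed).

    managed maps a setting name to its value, or to None when the
    setting must be removed from the file.
    """
    if MARKER in conf_text and not force:
        # "Already configured": the file is left untouched,
        # even if its contents are stale or were edited by hand.
        return conf_text, False

    kept = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if line == MARKER or key in managed:
            continue  # drop the old marker and any managed setting
        kept.append(line)

    new_lines = [MARKER]
    for key, value in managed.items():
        if value is not None:
            new_lines.append("%s = %s" % (key, value))
    return "\n".join(kept + new_lines) + "\n", True


# A file that carries the marker but still has the stale setting.
stale = MARKER + "\nlisten_tls = 0\n"
managed = {"listen_tls": None, "ca_file": '"/path/to/cacert.pem"'}

print(configure(stale, managed)[1])              # False: nothing rewritten
print(configure(stale, managed, force=True)[0])  # stale setting replaced
```

In this model, making reconfigure always behave as if force were set trades preservation of manual edits for a predictable result after every deploy, which is the trade-off discussed in the following comments.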

Comment 25 Yaniv Bronhaim 2014-01-01 08:46:44 UTC
"""Alon Bar-Lev		Dec 31 8:48 PM

Patch Set 1:

well, apparently people either manage vdsm manually or automatically... most automatically... the question is if we have regression for this version or this is a fix to next, and if so it is not needed."""

In response to Alon's comment on the patch set: this is indeed the behavior we have had for a long time, and nobody had a problem with it. Although some would expect the old configuration to be overridden, this is not a regression in ovirt-3.3, and changing it would change previous behavior that some might depend on.

I suggest closing the bug as known behavior that we prefer not to change, or moving it to a later version.

Comment 26 Yaniv Bronhaim 2014-01-01 15:10:42 UTC
Closing as WONTFIX for the current version.

If we decide to change the behavior, please reopen.