Bug 810272
Summary: | Live migration not working (connection refused)
---|---
Product: | [Retired] oVirt
Component: | ovirt-node
Reporter: | marcik4
Assignee: | Mike Burns <mburns>
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | high
Version: | unspecified
CC: | abaron, acathrow, bazulay, danken, dyasny, fdeutsch, iheim, jboggs, jwyatt, mburns, mgoldboi, mishu, mivaho, ovirt-bugs, ovirt-maint, ykaul
Target Milestone: | ---
Target Release: | ---
Hardware: | x86_64
OS: | Unspecified
Fixed In Version: | 2.4.0
Doc Type: | Bug Fix
Story Points: | ---
Last Closed: | 2012-06-14 13:35:38 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---

Description
marcik4
2012-04-05 14:00:04 UTC
Can you confirm that the iptables rules are correct and that the two hosts can resolve each other's names?

Yes, the iptables rules allow all traffic and the hosts can resolve each other. During migration, the source host tries to connect to the destination libvirtd on port 16514 (it tries to connect to "qemu+tls://host/system", aka "host:16514", as shown in the log). But with the default config, libvirtd listens only on port 16509. Enabling TLS in the config resulted in libvirt listening with TLS on port 16514, and migration started to work. I also tried redirecting incoming traffic from port 16514 to 16509, but that ended with TLS errors, so I suspect that libvirtd on port 16509 doesn't support TLS connections. As I see it, to fix this either TLS has to be enabled in the libvirtd config (as I did myself) or the migration process has to be changed not to use TLS.

I suppose that the two nodes were managed by oVirt Engine, is this correct?

Yes. All on fresh installs.

Okay, then I'm reassigning this to the vdsm component, as vdsm is responsible for this file after Node is registered to an Engine.

(In reply to comment #5)
> Okay, then I'm reassigning this to the vdsm component, as vdsm is responsible
> for this file after Node is registered to an Engine.

Indeed. Could you please try the following:
- We would like to see the contents of the following files before (current state) and after running the following command:
  - service vdsmd reconfigure
  * /etc/vdsm/vdsm.conf
  * /etc/libvirt/libvirt.conf
  * /etc/libvirt/qemu.conf
** Please revert your changes, of course.
Also, what version of vdsm are you working with?

Could you share your /etc/libvirt/libvirtd.conf from the source and the destination hosts? They should have a line showing
listen_tls=1 # by vdsm
If not, does running /lib/systemd/systemd-vdsmd fix your config?

*** Bug 824605 has been marked as a duplicate of this bug. ***

Sorry, I don't have a testing environment anymore.
But since all my testing was done on clean installs (latest node, Fedora 16 and the latest packages installed according to the guide on the website; I even reinstalled everything, just to be sure), it should be very easy to reproduce in a lab. I can assemble a testing environment again, but it could take a few weeks.

Created attachment 586665
VDSM logs of both nodes that fail to migrate towards each other.

I've reported this problem to the vdsm and node lists a few times. I only just found this bug report :) In /etc/libvirt/libvirtd.conf on the node I'm migrating away from, the line says:
listen_tls = 0
On the node I'm migrating towards it also says:
listen_tls = 0
Or do you want me to attach the files? I've attached my vdsm logs from both nodes at the moment of migration so you can see whether the problem is the same as the one mentioned by marcik4.

See the entry above this one. I reported this same bug in the oVirt Node section, not knowing it was a vdsm issue. It really is a simple fix. The default for libvirtd is to enable TLS. You actually have to add the line listen_tls=0 to disable it. Deleting or commenting out that line fixes the issue. Whoever customized that config file for the install just needs to not disable TLS on purpose.

Yes, thanks Jacob, commenting out "listen_tls = 0" did the trick. Funny, because the instructions for installing VDSM say just that: comment it out before starting. Strange that the node does otherwise :) I've put it in /config/etc/libvirt/libvirtd.conf so a reboot isn't a problem. A migration between my 2 nodes worked perfectly. I'll post it to the list as well so others can find it in the archives.

While I still contend that vdsm should be setting all values that it depends on correctly (and I'm not closing this bug for that reason), I'll submit a patch to ovirt-node that makes sure it's set the same way.

I'm with you that vdsm should be doing this; still, thanks for patching ovirt-node so that one way or another it gets fixed.

Hasn't this come up before, or is everyone migrating between installed VDSMs and not between nodes?

ovirt-node patch: http://gerrit.ovirt.org/4813

(In reply to comment #15)
> I'm with you that vdsm should be doing this; still, thanks for patching
> ovirt-node so that one way or another it gets fixed.
>
> Hasn't this come up before, or is everyone migrating between installed VDSMs
> and not between nodes?

I'm surprised more people haven't complained about this, although maybe people figured out what the problem was and just haven't reported it. Or maybe things are just so stable that no one has needed to migrate to a node. Another possibility is that no one is really running in production yet. For whatever reason, this was never flagged as an issue before.

(In reply to comment #15)
> I'm with you that vdsm should be doing this; still, thanks for patching
> ovirt-node so that one way or another it gets fixed.
>
> Hasn't this come up before, or is everyone migrating between installed VDSMs
> and not between nodes?

Vdsm tries to touch libvirtd.conf as little as possible, so it does not set listen_tls=1, as this is the libvirt default. Vdsm assumes that if someone or something else changes the default, it knows what it is doing. Apparently this was not the case this time. I believe that Vdsm's minimalistic approach is the Right Thing to do, and I am about to close the Vdsm part of this bug.

All parts of oVirt should be in agreement on what to use: listen_tls on or off, so to speak.

So VDSM does not want to touch libvirtd.conf and therefore uses its default, which is listen_tls=1. The node, on the other hand, doesn't touch anything either and thus also has listen_tls=1. But a migration between nodes goes wrong. The moment you force it to listen_tls=0 we can migrate.

> Vdsm assumes that if someone/something else changes the default,
> it knows what it is doing. Apparently this was not the case this time.

As far as I can see, no one and nothing touched the defaults. We touch the defaults now because otherwise we cannot migrate between nodes, because of the defaults :)

I'm not saying that it is therefore a VDSM problem, just that no one changed the defaults.

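For anyone checking their own hosts, here is a minimal sketch of how the effective setting can be read from a libvirtd.conf. A commented-out line leaves libvirt's documented default (listen_tls = 1) in force, while an uncommented listen_tls = 0 turns TLS listening off. The function name `effective_listen_tls` is hypothetical, and the parsing is a simplification that assumes at most one uncommented assignment in the file; it is not how libvirt itself parses its configuration.

```python
import re

# libvirt's documented default when the setting is absent or commented out.
LIBVIRT_DEFAULT_LISTEN_TLS = 1

# Matches an uncommented "listen_tls = <digits>" line, optionally followed
# by a trailing comment such as "# by vdsm".
_LISTEN_TLS_RE = re.compile(r"^\s*listen_tls\s*=\s*(\d+)\s*(?:#.*)?$")


def effective_listen_tls(conf_text: str) -> int:
    """Return the listen_tls value that would apply for this config text.

    Hypothetical helper for illustration: assumes at most one uncommented
    listen_tls assignment and returns the first one found.
    """
    for line in conf_text.splitlines():
        match = _LISTEN_TLS_RE.match(line)
        if match:
            return int(match.group(1))
    # Lines starting with "#" never match, so the default stays in force.
    return LIBVIRT_DEFAULT_LISTEN_TLS
```

For example, `effective_listen_tls("#listen_tls = 0")` yields the default of 1, whereas `effective_listen_tls("listen_tls = 0")` yields 0, which is the state reported on both nodes earlier in this thread.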
(In reply to comment #18)
> All parts of oVirt should be in agreement on what to use: listen_tls on or
> off, so to speak.
>
> So VDSM does not want to touch libvirtd.conf and therefore uses its
> default, which is listen_tls=1. The node, on the other hand, doesn't touch
> anything either and thus also has listen_tls=1. But a migration between
> nodes goes wrong. The moment you force it to listen_tls=0 we can migrate.
>
> > Vdsm assumes that if someone/something else changes the default,
> > it knows what it is doing. Apparently this was not the case this time.
>
> As far as I can see, no one and nothing touched the defaults. We touch the
> defaults now because otherwise we cannot migrate between nodes, because of
> the defaults :)
>
> I'm not saying that it is therefore a VDSM problem, just that no one changed
> the defaults.

I think you have that backwards, Michel. The variable listen_tls defaults to 1 (http://libvirt.org/remote.html), which is what we want. By simply NOT adding any configuration to the libvirtd.conf file you fix the problem. Someone, at some point, changed libvirtd.conf by adding the line listen_tls=0, thereby disabling the ability to use TLS for migration. It should be easy to determine whether this should be handled by the VDSM team or the oVirt Node team by finding out who added listen_tls=0 to the file. It's not a matter of someone doing something. It's a matter of stopping someone from doing what they're already doing.

Before calling me backward :) Just kidding ;) Maybe my explanation was off, but what I said was that in the libvirtd.conf file the line listen_tls=0 was commented out by default and thus the default listen_tls=1 was in effect. With this I could not migrate. After manually removing the comment marker so that listen_tls=0 was used, I could migrate. So as far as I can see, no one changed the default on the node at install. Default node install? Then listen_tls=1, but migration is not working. Maybe we are talking about different things here.

Oooh no, sorry, I'm confusing myself now. Scratch that last comment of mine. I am backwards.

mburns - it sounds like no change is needed (this bug is in POST with no patch in the comments).

itamar - see comment 16.

Sorry, I missed that patch. I see it was merged; shouldn't this be in MODIFIED then?

Yes.

listen_tls=1 requires certificates to reside in /etc/pki/CA, but they aren't part of Node and are only deployed when Node is registered with Engine. Libvirt will fail to start if the certificates are not available, and furthermore parts of Node's init scripts will also fail. Vdsm should be the component enabling listen_tls after it has deployed the certificates. Node should set listen_tls=0, as it does not provide the required certificates. (bug #829267)

Not sure I agree. I think a better approach would be for libvirt to not start until it has a certificate configured by vdsm bootstrap. danken - thoughts?

(In reply to comment #28)
> not sure i agree.
> I think a better approach would be for libvirt to not start until it has a
> certificate configured by vdsm bootstrap.

I'm afraid we need libvirt to already be running during bootstrap: we use libvirt to define the host management network.

(In reply to comment #29)
> I'm afraid we need libvirt to already run during bootstrap - we use libvirt
> to define host management network.

So there is no way around node disabling listen_tls for vdsm to enable it? How does it work on a normal Fedora if libvirt is installed on it?

F17 libvirt works out of the box. I haven't heard a convincing reason why ovirt-node should touch it. Once vdsm takes responsibility, it should bring in the certs, reconfigure libvirt, and restart it.

(In reply to comment #30)
> (In reply to comment #29)
> > I'm afraid we need libvirt to already run during bootstrap - we use libvirt
> > to define host management network.
>
> so there is no way around node disabling listen_tls for vdsm to enable it?
> how does it work on a normal fedora if libvirt is installed on it?

Both Fedora and oVirt Node have a commented-out listen_tls=0 in their libvirtd.conf. Node's behavior differs because Node's /etc/sysconfig/libvirtd differs and passes the "--listen" argument to libvirtd. So it works on Fedora without errors because libvirtd is not listening on TCP at all by default (which is required for listen_tls to have any effect [afaiu]). The change to pass "--listen" to the libvirt daemon is introduced when node is built.

As far as I understand the situation, there are a few options to prevent the error:

1. stick to F17 defaults, and VDSM provides certificates and enables --listen if required

2. at node build time, node enables --listen and sets listen_tls=0 because there are no certificates yet

In both situations libvirt should still be listening on the unix socket. Similar to what Dan said in comment #31, I don't know a reason why libvirt should be listening on external interfaces by default, so I'd go with 2.

(In reply to comment #32)
...
> 1. stick to F17 defaults and VDSM provides certificates and enables --listen
> if required
>
> 2. at node build time, node enables --listen and sets listen_tls=0 because
> there are no certificates yet
>
> In both situations libvirt should still be listening on the unix socket.
> Similar to what Dan said in comment #31, I don't know a reason why libvirt
> should be listening on external interfaces by default, so I'd go with 2.

You meant you'd go with option 1, right?

(In reply to comment #33)
> (In reply to comment #32)
> ...
> > 1. stick to F17 defaults and VDSM provides certificates and enables --listen
> > if required
> >
> > 2. at node build time, node enables --listen and sets listen_tls=0 because
> > there are no certificates yet
> >
> > In both situations libvirt should still be listening on the unix socket.
> > Similar to what Dan said in comment #31, I don't know a reason why libvirt
> > should be listening on external interfaces by default, so I'd go with 2.
>
> you meant you'd go with option 1, right?

Yes - a typo.

(In reply to comment #34)
> (In reply to comment #33)
> > (In reply to comment #32)
> > ...
> > > 1. stick to F17 defaults and VDSM provides certificates and enables --listen
> > > if required

The following patch prevents node from touching libvirt config files at build time: http://gerrit.ovirt.org/#/c/5122/
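
To verify from another machine which of the two ports discussed in this bug a host actually exposes, a quick TCP probe may help. The following is a minimal sketch using only the Python standard library; the port numbers 16514 (TLS) and 16509 (plain TCP) are the ones reported earlier in this thread, while the helper names `port_is_open` and `probe_libvirt_ports` are hypothetical. It only tests whether a TCP connection is accepted, not whether a TLS handshake or libvirt authentication would succeed.

```python
import socket

# Ports as reported in this bug: TLS migration target and plain TCP listener.
LIBVIRT_TLS_PORT = 16514
LIBVIRT_TCP_PORT = 16509


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def probe_libvirt_ports(host: str) -> dict:
    """Hypothetical helper: report which libvirtd TCP ports accept connections."""
    return {
        "tls": port_is_open(host, LIBVIRT_TLS_PORT),
        "tcp": port_is_open(host, LIBVIRT_TCP_PORT),
    }
```

Calling `probe_libvirt_ports("destination-host")` (with a real host name substituted) against a migration destination that refuses connections on the TLS port would report `"tls": False`, which matches the "connection refused" symptom in the summary.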