Created attachment 612186 [details] vdsm-engine-logs

Description of problem:
1. Go to DataCenter and create a new network (and add the new network to the cluster).
2. Go to the host, choose a NIC and attach the created network to it.
The bridge is created and the ifcfg file contains the IP, but RHEVM doesn't show it and the host doesn't have this IP.

Version-Release number of selected component (if applicable):
vdsm-4.9.6-31.0.el6_3.x86_64

How reproducible:
15%

Steps to Reproduce:
1. see above
2.
3.

Actual results:
See the logs, starting at:
Thread-506::DEBUG::2012-09-12 16:45:17,667::BindingXMLRPC::864::vds::(wrapper) client [10.34.63.19]::call setupNetworks with ({'sit2': {'nic': 'eth1', 'netmask': '255.255.255.0', 'ipaddr': '192.168.99.5', 'bridged': 'true', 'STP': 'no'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {} flowID [26d7c5d7]

Expected results:

Additional info:
Indeed there is a bug here: network 'sit2' reports an empty addr, even though its cfg requests one:

'networks': { ... 'sit2': {'iface': 'sit2', 'addr': '', 'cfg': {'IPADDR': '192.168.99.5', 'DELAY': '0', 'NM_CONTROLLED': 'no', 'NETMASK': '255.255.255.0', 'BOOTPROTO': 'none', 'STP': 'no', 'DEVICE': 'sit2', 'TYPE': 'Bridge', 'ONBOOT': 'yes'}, 'mtu': '1500', 'netmask': '', 'stp': 'off', 'bridged': True, 'gateway': '0.0.0.0', 'ports': ['eth1']}}

Would you please reproduce this with vdsm-4.9.6-34.0, which added useful logging of ifup/ifdown? Is there anything fishy in /var/log/messages?
I no longer have that machine or its messages log. Maybe some partners could help here. I can't reproduce on RHEL with vdsm-4.9.6-34.0.el6_3.x86_64.
Please reopen when this reproduces.
I've tried on SI18 rhevh (20120910.0.rhev31.el6_3) with vdsm-reg-4.9.6-31.0.el6_3.noarch and RHEL 6.3 with vdsm-4.9.6-34.0.el6_3.x86_64 multiple times, but the bug does not reproduce.
Tried on si18, same HW, with a clean rhevh 20120910.0.rhev31.el6_3 - can't repro - tried again 8 times.
happened again on si18.1, rhevh, vdsm -34
Pavel, vdsm.log is missing from this tarball - it seems to include only the engine sosreport, which is of little use here. However, we've found an ovirt-node (RHEV-H) bug 846326 with serious consequences for setting up networking. When you re-reproduce the bug, please either use a node image with this bug fixed, or make sure that the files under /etc/libvirt/qemu/networks/ are not bind-mounted (do not appear in /proc/mounts).
Dan, I can clearly see that the vdsm log is there. Please confirm that you can see it as well in the package under: /tmp/logcollector/RHEVH-and-PostgreSQL-reports/10.34.63.136/10.34.63.136-sos... You need to unpack the 2nd archive and then browse to the vdsm.log (I just rechecked it and I can see it there).
MainProcess|Thread-5133::DEBUG::2012-09-25 11:36:00,619::__init__::1164::Storage.Misc.excCmd::(_log ) '/sbin/ifup vvv' (cwd None) 37512
MainProcess|Thread-5133::DEBUG::2012-09-25 11:36:01,684::__init__::1164::Storage.Misc.excCmd::(_log ) FAILED: <err> = ''; <rc> = 1

The problem is, obviously, that the ifup of the bridge fails. This is not a very common thing to happen and, unfortunately, as we can see, ifup does not give any further information as to what the cause might be.

In any case, this exposes something that we do very, very wrong: we ifup everything without checking whether it fails and then, and this is the worse part, call "configWriter.createLibvirtNetwork(network, bridged, iface)", which, unless skipBackup is set, backs up the new, non-working configuration. This creates a situation in which the configuration shows the new ifcfg we want, but the config is not applied.

If we were to set skipBackup on any ifup error, at least the new non-working configuration would not get backed up, and the rollback triggered by the failed ping would restore the old configuration. Having said that, I'd rather throw an exception on ifup error that would bubble up and return a "Configuration could not be successfully applied" to the engine.
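To make the proposal above concrete, here is a minimal sketch of checking the ifup return code and raising instead of carrying on. It is not the actual vdsm code: `IfupError`, `ifup`, `bring_up_all` and the injectable `run` parameter are hypothetical names used only for illustration.

```python
import subprocess


class IfupError(Exception):
    """Hypothetical exception type; not a name taken from vdsm."""

    def __init__(self, device, returncode, stderr):
        super().__init__(
            "ifup %s failed (rc=%d): %s" % (device, returncode, stderr.strip())
        )
        self.device = device
        self.returncode = returncode


def _run(argv):
    """Run a command and return (rc, stdout, stderr) as text."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def ifup(device, run=_run):
    """Bring one device up; raise IfupError on a non-zero exit code."""
    rc, _out, err = run(["/sbin/ifup", device])
    if rc != 0:
        raise IfupError(device, rc, err)


def bring_up_all(devices, run=_run):
    """Bring devices up in order and stop at the first failure.

    Because the exception propagates, a caller never reaches the step
    that would persist (back up) a configuration that was not applied.
    """
    for device in devices:
        ifup(device, run=run)
```

With something along these lines, the caller could catch the exception at the top of the setup flow, roll back, and report the failure to the engine.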
Pavel, thanks for showing me where vdsm.log hides. /tmp is not very intuitive. Would you be so kind as to try to reproduce this issue after changing the first line of /etc/sysconfig/network-scripts/ifup-eth to #!/bin/bash -xv ? This is going to generate a lot of noise in the log, but may give a clue as to why ifup occasionally fails. Toni, yes, the fact that we happily continue with the operation even after a crucial step (ifup) has failed is questionable.
I found the culprit of the ifup mishap:

Sep 25 11:36:01 slot-6 /etc/sysconfig/network-scripts/ifup-eth: Error, some other host already uses address 192.168.99.5.
Sep 25 11:36:03 slot-6 ntpd[7898]: Listening on interface #14 em2, fe80::868f:69ff:fe67:1f04#123 Enabled

As you can see, some other host already had that IP set, so ifup failed. An IP collision, then.

@Pavel, I think it is no longer necessary for you to reproduce with what Dan suggested.
@Dan, this kind of error is not an isolated case; it can happen whenever an admin makes a mistake. I will work on adding checks for the ifup return values.
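For illustration, a similar duplicate-address probe could be run before attempting ifup at all. The sketch below assumes the iputils `arping` binary lives at /sbin/arping and uses its duplicate address detection mode (`-D`), which exits with 0 when no reply is received. `address_in_use` and the injectable `run` parameter are hypothetical names, and this is not something vdsm is known to do.

```python
import subprocess


def _run(argv):
    """Run a command and return its exit code."""
    return subprocess.call(argv)


def address_in_use(device, ipaddr, run=_run):
    """Return True if the probe suggests another host answers for ipaddr.

    Flags (iputils arping): -q quiet, -c 2 send two probes, -w 3 give up
    after three seconds, -D duplicate address detection, -I interface.
    A non-zero exit code is treated as "in use"; note that it may also
    mean that arping itself failed, so this errs on the side of caution.
    """
    rc = run(
        ["/sbin/arping", "-q", "-c", "2", "-w", "3", "-D", "-I", device, ipaddr]
    )
    return rc != 0
```

A caller could use it as `if address_in_use("eth1", "192.168.99.5"): ...` and refuse the configuration with a clear message instead of relying on the bare exit code of ifup.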
Thanks Toni! Pavel, there's no need to dig any further (unless you find another case with this behavior). Given the new information, I do not think it is urgent enough to rush a fix such as http://gerrit.ovirt.org/8415 into rhev-3.1.
*** Bug 787709 has been marked as a duplicate of this bug. ***
Removing the regression keyword, as this is not a regression from previous versions. Having said that, I think this is an important bug to fix: if we fail to ifup a device, we should not continue as if nothing happened. Let's try to push it to rhev-3.2.
Now the behavior has changed (for details see the attached vdsm.log): ifup does not fail when the user assigns a duplicate IP.

The sequence used when the IP is changed is:
1) ifdown bridge
2) ifdown physical interface
3) ifup physical interface
4) ifup bridge (with duplicate IP 10.34.67.1 configured)

The sequence above does not detect that a duplicate IP was used:

MainProcess|Thread-5453::DEBUG::2013-02-07 13:44:26,659::misc::83::Storage.Misc.excCmd::(<lambda>) SUCCESS: <err> = '+ . /etc/init.d/functions\n++ TEXTDOMAIN=initscripts\n++ umask 022\n++ PATH=/sbin:/usr/sbin:/bin:/usr/bin\n++ export PATH\n++ \'[\' -z \'\' \']\'\n++ COLUMNS=80\n++ \'[\' -z \'\' \']\'\n+++ /sbin/consoletype\n++ CONSOLETYPE=serial\n++ \'[\' -f /etc/sysconfig/i18n

If just the bridge part is ifdowned/ifupped manually, the duplicate IP is detected:
1) ifdown bridge
2) ifup bridge (with duplicate IP 10.34.67.1 configured)

+ /usr/bin/logger -p daemon.err -t /etc/sysconfig/network-scripts/ifup-eth 'Error, some other host already uses address 10.34.67.1.'
Created attachment 694473 [details] vdsm.log
The fact that ifup does not fail as expected is not really a vdsm issue - I suspect initscripts or the kernel. Please open a separate bug on that, and add it to the rhev-3.2 tracker bug. Could you find other circumstances where ifup can be made to fail? Please try, and test that vdsm notices the error and reports it back to Engine.
verified on SF9
3.2 has been released