Created attachment 633233 [details] vds_bootstrap

Description of problem:
Connecting an ovirt engine (f17) to an f18 host running the latest git vdsm fails:
- fails to set iptables
- error trying to add the bridge
- logs are deleted from /tmp after reboot

Version-Release number of selected component (if applicable):

How reproducible: always

Steps to Reproduce:
1. install ovirt engine on f17
2. install vdsm on f18
3. connect host to engine

Actual results: bootstrapping fails. host is non responsive.

Expected results: bootstrap succeeds

Additional info:
Created attachment 633234 [details] vds_installer
You took the logs long before the reboot... :)

To overcome this, add /etc/profile.d/vdsm-bootstrap.sh with:
---
export OVIRT_LOGDIR=<some other place>
---
For the iptables, it won't work... just do not set iptables at the engine. The service has changed...
For the engine log... these are not the right logs... it should be engine.log with all the debug messages...
Created attachment 633260 [details] engine.log
Created attachment 633261 [details] bootstrap
Created attachment 633262 [details] vds_bootstrap_complete
Which version of engine do you use?!?!?!
3.1.0-2 upstream
Nothing will be changed in 3.1 upstream, and many fixes have been applied since then. 3.1 upstream is totally obsolete. You should work with master.
ok. I updated the engine. bootstrap fails on Error:[Errno 2] No such file or directory: '/etc/sysconfig/network-scripts/ifcfg-em1'
Created attachment 633372 [details] bootstrap
Moran, remind me how you forced out NetworkManager? This is the next stage of my bootstrap rewrite... I will check Fedora 18 as well.
setting to urgent for basic bootstrap to work. i suggest opening a separate bug on making it work with firewalld
(In reply to comment #14)
> setting to urgent for basic bootstrap to work.
> i suggest opening a separate bug on making it work with firewalld

Can you please describe the cause of urgency?

For example, is manually creating the bridge before bootstrap an acceptable solution for now?
Ohad, Please try to manually create the bridge and retry. Let's see if there are more issues.
(In reply to comment #15)
> (In reply to comment #14)
> > setting to urgent for basic bootstrap to work.
> > i suggest opening a separate bug on making it work with firewalld
>
> Can you please describe the cause of urgency?
>
> For example, is a solution of manually create the bridge before bootstrap
> acceptable for now?

The urgency is to get a working ovirt/vdsm for the planned November 14th oVirt 3.2 beta, for which I would prefer to focus on Fedora 18.
(In reply to comment #18)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > setting to urgent for basic bootstrap to work.
> > > i suggest opening a separate bug on making it work with firewalld
> >
> > Can you please describe the cause of urgency?
> >
> > For example, is a solution of manually create the bridge before bootstrap
> > acceptable for now?
>
> urgency is to resolve a working ovirt/vdsm for the planned november 14th
> ovirt 3.2 beta which i prefer to focus on fedora 18

Question: do we want the new bootstrap in 3.2? If so, manually defining the bridge for now will allow me to focus on the 3.2 solution.

I am not sure I will manage to support node within two weeks, but we can go with the current solution for node. So I will change my order of work and implement the engine side before supporting node in bootstrap.

What is the status of this 'beta'? I guess that more changes will be required afterwards.
(In reply to comment #19)
...
> Question: Do we want new bootstrap at 3.2? If that's so manual defining the
> bridge for now will allow me to focus on the 3.2 solution.
>
> I am not sure I will make it to support node in two weeks. But we can go
> with the current solution for node. So I will alter my order to implement
> the engine side before supporting node at bootstrap.
>
> What is the status of this 'beta' I guess that more changes will be required
> afterwards.

Beta is usually defined as feature freeze. After beta we only backport release blockers, iirc.
So it may be a good idea to focus on getting the new bootstrap into working shape first, then add node support as a second phase after oVirt 3.2.

There is another open question: do we release oVirt 3.2 without Fedora 18 support?
Waiting for vdsm's addNetwork script to work on Fedora before continuing with this.
tar is not available in a vanilla installation, so I reverted to using a simple Python tar implementation. The libselinux-python issue is resolved. So all is ready except for the addNetwork support. I even queried NetworkManager for interface parameters in order to feed some sane data into addNetwork; it is experimental but should be enough for a start.
Forgot to mention that firewalld is also unsupported by both engine and bootstrap.
(In reply to comment #23)
> Forgot to mention that firewalld is also unsupported by both engine and
> bootstrap.

I don't know how/when, but Fedora now does install the iptables service, so all is working.
(In reply to comment #21)
> Waiting for vdsm's addNetwork script to work on fedora before continue this.

Alon, can you please add more info on what problems you are having?
(In reply to comment #25)
> (In reply to comment #21)
> > Waiting for vdsm's addNetwork script to work on fedora before continue this.
>
> Alon, can you please add more info on what problems are you having?

1. When executing addNetwork to create the ovirtmgmt bridge, the bridge is created with proper network settings and eth0 is modified to be a member of the bridge, but eth0 is left in DOWN state. Executing ifup eth0 solves the issue.

2. After reboot, the network service is stopped by vdsm and not started again. Manually starting the network service does not help, as vdsm keeps stopping it. Removing the network dependency from the vdsm service solves this. However, I do not understand why vdsm does not simply ifdown/ifup interfaces instead of messing with the network service.
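The first symptom (eth0 left DOWN after addNetwork) can be checked without parsing ifconfig output. A minimal sketch, assuming the kernel exposes the link state in /sys/class/net/<iface>/operstate; the helper names and the sysfs_root parameter are hypothetical and are not part of vdsm or the bootstrap code:

```python
import os


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return the kernel-reported operstate of iface, or None if absent.

    Hypothetical helper, not part of vdsm. sysfs_root is a parameter only
    so the function can be pointed at a fake tree when testing.
    """
    path = os.path.join(sysfs_root, iface, "operstate")
    try:
        with open(path) as f:
            return f.read().strip()
    except (IOError, OSError):
        # Interface does not exist (or sysfs is not mounted)
        return None


def nics_needing_ifup(nics, sysfs_root="/sys/class/net"):
    """List the NICs that exist but are reported as "down"."""
    return [n for n in nics if link_state(n, sysfs_root) == "down"]
```

Calling nics_needing_ifup(["eth0"]) right after addNetwork returns would show whether the manual ifup eth0 is still needed on a given build.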
(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #21)
> > > Waiting for vdsm's addNetwork script to work on fedora before continue this.
> >
> > Alon, can you please add more info on what problems are you having?
>
> 1. When execute addNetwork to create the ovirtmgmt bridge, the bridge is
> created with proper network settings, the eth0 is modified to be a member of
> the bridge, however the eth0 is at DOWN state. Executing ifup eth0 solve the
> issue.
>

I understand that Igor cannot reproduce this and is working with you to understand the problem.

> 2. After reboot, network service is stopped by vdsm and not started. Manual
> start of network service does not help, as vdsm keeps stopping it. Removing
> the network dependency from vdsm service solves this. However I do not
> understand why vdsm does not ifdown/ifup interfaces and messes up with the
> network service.

We are aware of the problem, and Igor should send a patch to remove the network service from systemd.
About handling the network more gently and not using brute force with network service stop/start: we are in agreement, and have two bugs that imply this should be fixed, bug 871481 and bug 877006.
(In reply to comment #27)
> (In reply to comment #26)
> > (In reply to comment #25)
> > > (In reply to comment #21)
> > > > Waiting for vdsm's addNetwork script to work on fedora before continue this.
> > >
> > > Alon, can you please add more info on what problems are you having?
> >
> > 1. When execute addNetwork to create the ovirtmgmt bridge, the bridge is
> > created with proper network settings, the eth0 is modified to be a member of
> > the bridge, however the eth0 is at DOWN state. Executing ifup eth0 solve the
> > issue.
> >
>
> I understand that Igor can not reproduce this and is working with you to
> understand the problem.
>

We found the issue and reported it as bug 879180.
OK, fedora-18 beta.

/etc/sysconfig/network exists.

However, addNetwork now behaves even worse: it is completely non-responsive, and even kill -9 does not work!

Now that the new bootstrap is merged, there is no need for me to debug the network stuff... I will be happy to assist if addNetwork is not used properly.

Alon

--- ps ---
root 1107 1099 2 20:01 ? 00:00:12 /bin/python /tmp/ovirt-AxAJVfPEsc/pythonlib/otopi/__main__.py BASE/pluginPath=str:/tmp/ovirt-AxAJVfPEsc/otopi-plugins APPEND:BASE/pluginGroups=str:ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True
root 1429 1 0 20:05 ? 00:00:00 /usr/sbin/libvirtd
root 2182 1107 0 20:05 ? 00:00:00 /bin/bash /usr/share/vdsm/addNetwork ovirtmgmt eth0 BOOTPROTO=dhcp ONBOOT=yes UUID=79b9e740-c950-4471-aad1-017efb85dbb8 blockingdhcp=true
root 2186 2182 0 20:05 ? 00:00:00 /usr/bin/python -m configNetwork add ovirtmgmt vlan= bonding= nics=eth0 BOOTPROTO=dhcp ONBOOT=yes UUID=79b9e740-c950-4471-aad1-017efb85dbb8 blockingdhcp=true
root 2205 2186 0 20:05 ? 00:00:00 /bin/bash /etc/sysconfig/network-scripts/ifdown-eth ifcfg-ovirtmgmt
root 2254 2205 0 20:05 ? 00:00:00 /bin/sh /etc/sysconfig/network-scripts/ifdown-post ifcfg-ovirtmgmt
root 2319 2254 0 20:05 ? 00:00:00 /usr/bin/python -Es /usr/bin/firewall-cmd --remove-interface=ovirtmgmt
---

--- messages ---
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: Note: Forwarding request to 'systemctl disable libvirt-guests.service'.
Nov 28 20:05:24 alonbl5 systemd[1]: Reloading.
Nov 28 20:05:24 alonbl5 systemd[1]: tuned.service has a D-Bus service name specified, but is not of type dbus. Ignoring.
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: vdsm: libvirt already configured for vdsm [ OK ]
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: Starting multipathd...
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: Redirecting to /bin/systemctl start multipathd.service
Nov 28 20:05:24 alonbl5 systemd[1]: Started Device-Mapper Multipath Device Controller.
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: Starting iscsid:
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: Redirecting to /bin/systemctl start libvirtd.service
Nov 28 20:05:24 alonbl5 systemd[1]: Started Virtualization daemon.
Nov 28 20:05:24 alonbl5 systemd[1]: Stopping Virtual Desktop Server Manager...
Nov 28 20:05:24 alonbl5 systemd[1]: Stopping LSB: Bring up/down networking...
Nov 28 20:05:24 alonbl5 systemd-vdsmd[2308]: Stopping network (via systemctl):
Nov 28 20:05:24 alonbl5 systemd[1]: Stopped Virtual Desktop Server Manager.
Nov 28 20:05:24 alonbl5 network[2454]: Shutting down interface eth0: Error: Device 'eth0' (/org/freedesktop/NetworkManager/Devices/0) disconnecting failed: This device is not active
Nov 28 20:05:24 alonbl5 network[2454]: [FAILED]
Nov 28 20:05:24 alonbl5 network[2454]: Shutting down loopback interface: [ OK ]
Nov 28 20:05:24 alonbl5 NetworkManager[589]: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt ...
Nov 28 20:05:24 alonbl5 NetworkManager[589]: ifcfg-rh: error: Bridge connections are not yet supported
Nov 28 20:05:24 alonbl5 NetworkManager[589]: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt ...
Nov 28 20:05:24 alonbl5 NetworkManager[589]: ifcfg-rh: error: Bridge connections are not yet supported
Nov 28 20:05:24 alonbl5 systemd[1]: Stopped LSB: Bring up/down networking.
---
Disabling firewalld (firewall-cmd) was a workaround... at least the installation continued. However, the bug that requires a manual ifup eth0 still exists.
Now the bridge is not even created:

2012-11-28 20:30:17 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.bridge plugin.executeRaw:324 execute: ['/usr/share/vdsm/addNetwork', 'ovirtmgmt', '', '', u'eth0', 'onboot=yes', 'bootproto=dhcp', 'blockingdhcp=true'], env=None
2012-11-28 20:30:20 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.bridge plugin.executeRaw:341 execute-result: ['/usr/share/vdsm/addNetwork', 'ovirtmgmt', '', '', u'eth0', 'onboot=yes', 'bootproto=dhcp', 'blockingdhcp=true'], rc=0
2012-11-28 20:30:20 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.bridge plugin.execute:388 execute-output: ['/usr/share/vdsm/addNetwork', 'ovirtmgmt', '', '', u'eth0', 'onboot=yes', 'bootproto=dhcp', 'blockingdhcp=true'] stdout:
2012-11-28 20:30:20 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.bridge plugin.execute:393 execute-output: ['/usr/share/vdsm/addNetwork', 'ovirtmgmt', '', '', u'eth0', 'onboot=yes', 'bootproto=dhcp', 'blockingdhcp=true'] stderr:
INFO:root:Adding network ovirtmgmt with vlan=, bonding=, nics=['eth0'], bondingOptions=None, mtu=None, bridged=True, options={'bootproto': 'dhcp', 'blockingdhcp': 'true', 'onboot': 'yes'}
WARNING:root:usage: ifdown <device name>
INFO:root:
Determining IP information for ovirtmgmt... failed.
WARNING:root:/etc/sysconfig/network-scripts/ifup-eth: line 285: 1241 Terminated /sbin/dhclient ${DHCLIENTARGS} ${DEVICE}
arping: recvfrom: Network is down
Cannot find device "ovirtmgmt"
Cannot find device "ovirtmgmt"
libvir: Network Driver error : Network not found: no network with matching name 'vdsm-ovirtmgmt'
Exception in thread libvirtEventLoop (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
  File "/usr/lib64/python2.7/threading.py", line 504, in run
  File "/usr/share/vdsm/libvirtev.py", line 406, in virEventLoopPureRun
  File "/usr/share/vdsm/libvirtev.py", line 233, in run_loop
  File "/usr/share/vdsm/libvirtev.py", line 212, in run_once
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'time'
Hi,

Notes from the last test with the new bootstrap code (Engine F17 + ovirt nightly repo) and Fedora 18 as host (BETA ISO): bootstrap worked and the reboot happened. However, I had to do some manual steps to see the host UP, please see below.

- bridge is OK after reboot
- iptables is blocking communication between vdsm and engine (I had to do iptables -F)
- I had to manually start the vdsm daemon

After these steps the Fedora 18 host became UP. This is a nice improvement; in the previous Fedora 18 test the bridge couldn't survive the reboot.

Additional info:
- the firewalld service is not running.
- even after removing network.service from /lib/systemd/system/vdsmd.service, the vdsm daemon doesn't start automatically after reboot.
(In reply to comment #32)
> - iptables is blocking communication between vdsm and engine (I had to do
> iptables -F)

This is easy to debug.
Refer to iptables -L and see what you have.
Refer to /etc/sysconfig/iptables and see what you should have.
Run service iptables restart and see if anything changes.

My guess is that firewalld is still doing something although you think it is disabled.
Yes, forgot to share the output of iptables. I am checking, maybe I am missing something around...

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
INPUT_direct  all  --  0.0.0.0/0            0.0.0.0/0
INPUT_ZONES  all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
FORWARD_direct  all  --  0.0.0.0/0            0.0.0.0/0
FORWARD_ZONES  all  --  0.0.0.0/0            0.0.0.0/0
ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0
REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
OUTPUT_direct  all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD_ZONES (1 references)
target     prot opt source               destination

Chain FORWARD_direct (1 references)
target     prot opt source               destination

Chain FWDO_ZONE_external (0 references)
target     prot opt source               destination
FWDO_ZONE_external_deny  all  --  0.0.0.0/0            0.0.0.0/0
FWDO_ZONE_external_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain FWDO_ZONE_external_allow (1 references)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

Chain FWDO_ZONE_external_deny (1 references)
target     prot opt source               destination

Chain INPUT_ZONES (1 references)
target     prot opt source               destination

Chain INPUT_direct (1 references)
target     prot opt source               destination

Chain IN_ZONE_dmz (0 references)
target     prot opt source               destination
IN_ZONE_dmz_deny  all  --  0.0.0.0/0            0.0.0.0/0
IN_ZONE_dmz_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain IN_ZONE_dmz_allow (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW

Chain IN_ZONE_dmz_deny (1 references)
target     prot opt source               destination

Chain IN_ZONE_external (0 references)
target     prot opt source               destination
IN_ZONE_external_deny  all  --  0.0.0.0/0            0.0.0.0/0
IN_ZONE_external_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain IN_ZONE_external_allow (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW

Chain IN_ZONE_external_deny (1 references)
target     prot opt source               destination

Chain IN_ZONE_home (0 references)
target     prot opt source               destination
IN_ZONE_home_deny  all  --  0.0.0.0/0            0.0.0.0/0
IN_ZONE_home_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain IN_ZONE_home_allow (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:631 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            224.0.0.251          udp dpt:5353 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:137 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:138 ctstate NEW

Chain IN_ZONE_home_deny (1 references)
target     prot opt source               destination

Chain IN_ZONE_internal (0 references)
target     prot opt source               destination
IN_ZONE_internal_deny  all  --  0.0.0.0/0            0.0.0.0/0
IN_ZONE_internal_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain IN_ZONE_internal_allow (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:631 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            224.0.0.251          udp dpt:5353 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:137 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:138 ctstate NEW

Chain IN_ZONE_internal_deny (1 references)
target     prot opt source               destination

Chain IN_ZONE_public (0 references)
target     prot opt source               destination
IN_ZONE_public_deny  all  --  0.0.0.0/0            0.0.0.0/0
IN_ZONE_public_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain IN_ZONE_public_allow (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            224.0.0.251          udp dpt:5353 ctstate NEW

Chain IN_ZONE_public_deny (1 references)
target     prot opt source               destination

Chain IN_ZONE_work (0 references)
target     prot opt source               destination
IN_ZONE_work_deny  all  --  0.0.0.0/0            0.0.0.0/0
IN_ZONE_work_allow  all  --  0.0.0.0/0            0.0.0.0/0

Chain IN_ZONE_work_allow (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            224.0.0.251          udp dpt:5353 ctstate NEW
ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:631 ctstate NEW

Chain IN_ZONE_work_deny (1 references)
target     prot opt source               destination

Chain OUTPUT_direct (1 references)
target     prot opt source               destination
(In reply to comment #34)
> Yes, forgot to share the output of iptables. I am checking, maybe I am
> missing something around...

These are firewalld rules. Please make sure you stop and disable firewalld.
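A quick way to tell the two rule sets apart is to look at the chain names in the iptables -L output. A rough sketch only: the patterns below are taken from the chain names in the listing in this bug (IN_ZONE_*, FWDO_ZONE_*, INPUT_ZONES, FORWARD_ZONES, *_direct), and treating them as a firewalld signature is an assumption, not a documented interface. The function name is hypothetical.

```python
import re

# Chain-name patterns as seen in the listing in this bug; using them as a
# firewalld signature is an assumption and may not hold for other versions.
FIREWALLD_CHAIN = re.compile(
    r"^(IN_ZONE_|FWDO_ZONE_|(INPUT|FORWARD)_ZONES$|\w+_direct$)"
)


def firewalld_chains(iptables_listing):
    """Return chain names in `iptables -L` text that look firewalld-made."""
    chains = []
    for line in iptables_listing.splitlines():
        # Chain headers look like: "Chain NAME (policy ACCEPT)" or
        # "Chain NAME (1 references)"
        m = re.match(r"Chain (\S+) \(", line)
        if m and FIREWALLD_CHAIN.match(m.group(1)):
            chains.append(m.group(1))
    return chains
```

An empty result on the host would suggest that only the rules from /etc/sysconfig/iptables are loaded; any hit means something is still installing zone chains.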
Hi Alon,

Correct, I just found that firewalld was randomly crashing and affecting the system. Here are the rules with it disabled; we are fine regarding iptables.

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:54321
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh
ACCEPT     udp  --  anywhere             anywhere             udp dpt:snmp
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:16514
ACCEPT     tcp  --  anywhere             anywhere             multiport dports xprtld:6166
ACCEPT     tcp  --  anywhere             anywhere             multiport dports 49152:49216
REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere             PHYSDEV match ! --physdev-is-bridged reject-with icmp-host-prohibited
commit 7f79bea629f21d9165f92dddb90b4d41183524ef
Author: Alon Bar-Lev <alonbl>
Date:   Thu Nov 29 21:21:08 2012 +0200

    packaging: add net-tools dependency

    In fedora-18 net-tools(ifconfig) are not within vanilla.
    In future we should remove the ifconfig usage entirely.

    Change-Id: I74b238156f3751214fc9555f035feecb39daa967
    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=869963
    Signed-off-by: Alon Bar-Lev <alonbl>

http://gerrit.ovirt.org/#/c/9586/1
commit bd00153fcfad1034a560b018cf18c356230326f3
Author: Alon Bar-Lev <alonbl>
Date:   Thu Nov 29 21:40:28 2012 +0200

    iptables: disable and stop firewalld to avoid conflict

    Change-Id: I423846092615a25352539cb0abc19088408ac93e
    Signed-off-by: Alon Bar-Lev <alonbl>

http://gerrit.ovirt.org/#/c/9587/
commit f7e5a78fe27064c1e97b71a59c12063f1f89a0e0
Author: Alon Bar-Lev <alonbl>
Date:   Thu Nov 29 22:19:25 2012 +0200

    bridge: temporary workaround for firewalld behaviour

    Currently, addNetwork just hangs if firewalld is up while creating
    the bridge. So as temporary workaround stop firewalld before
    creating the bridge.

    Change-Id: Ib6c248a7c49492f7889fc71f11ae1dad1e3434ae
    Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=869963
    Signed-off-by: Alon Bar-Lev <alonbl>

http://gerrit.ovirt.org/9588
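For anyone who needs the same workaround by hand, the ordering described in the last commit message can be sketched roughly as follows. This is not the code from the gerrit change; the function name and the run parameter are hypothetical, and the addNetwork argument order (bridge, vlan, bonding, nic, options) is only inferred from the invocation logged earlier in this bug.

```python
import subprocess

ADD_NETWORK = "/usr/share/vdsm/addNetwork"


def create_bridge(bridge, nic, options, run=subprocess.call):
    """Stop firewalld, then create the bridge with addNetwork.

    Illustrative only. `run` is injectable so the command sequence can be
    inspected without touching the system.
    """
    # Stop firewalld first; the exit code is deliberately ignored because
    # the unit may not be installed or may already be stopped.
    run(["systemctl", "stop", "firewalld.service"])
    # Empty strings stand for "no vlan" and "no bonding", as in the log.
    cmd = [ADD_NETWORK, bridge, "", "", nic] + list(options)
    return run(cmd)
```

For example, create_bridge("ovirtmgmt", "eth0", ["onboot=yes", "bootproto=dhcp", "blockingdhcp=true"]) would issue the same addNetwork command line that appears in the host-deploy log above, after firewalld has been stopped.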
Using:
- fedora-18-beta DVD
- ovirt-engine master
- otopi master
- ovirt-host-deploy master

Provided I manually install net-tools, I have a working setup!
Problems with vdsm/NetworkManager are being worked on; there are no more issues in host deployment.
3.2 beta built, moving to ON_QA status to allow testing