Bug 988397 - ovirt-node post-installation setup networks fails when NetworkManager is running
Summary: ovirt-node post-installation setup networks fails when NetworkManager is running
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: ---
Target Release: 3.3
Assignee: Antoni Segura Puimedon
QA Contact: Haim
URL:
Whiteboard: network
Depends On: 988916 988995
Blocks: 918494
Reported: 2013-07-25 13:51 UTC by Mike Burns
Modified: 2013-09-23 07:34 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 07:34:42 UTC
oVirt Team: ---
Embargoed:


Attachments
supervdsm (47.73 KB, text/plain), 2013-07-25 13:51 UTC, Mike Burns
vdsm (198.93 KB, text/plain), 2013-07-25 13:51 UTC, Mike Burns
engine log (3.94 MB, text/plain), 2013-07-25 14:04 UTC, Mike Burns
supervdsm f19 static ip (53.99 KB, text/plain), 2013-07-26 15:05 UTC, Mike Burns
vdsm f19 static ip (56.23 KB, text/plain), 2013-07-26 15:05 UTC, Mike Burns
webui of the process of fail then magically UP (170.79 KB, image/png), 2013-07-26 19:42 UTC, Antoni Segura Puimedon
ovirt-node-fix supervdsm.log (31.60 KB, text/x-log), 2013-07-26 19:57 UTC, Antoni Segura Puimedon
ovirt-node-fix vdsm.log (220.78 KB, text/x-log), 2013-07-26 20:00 UTC, Antoni Segura Puimedon
systemd-journald-vdsmd.log (6.42 KB, text/x-log), 2013-07-26 20:08 UTC, Antoni Segura Puimedon
journal.log (123.29 KB, text/x-log), 2013-07-26 20:19 UTC, Antoni Segura Puimedon
json-journal.log (741.06 KB, text/x-log), 2013-07-26 20:32 UTC, Antoni Segura Puimedon

Description Mike Burns 2013-07-25 13:51:29 UTC
Created attachment 778268 [details]
supervdsm

Description of problem:
Install ovirt-node and configure the network without bridges.

Apply the workarounds for other issues (open firewall port 54321, and set the root password and SSH password authentication manually using /usr/libexec/ovirt-config-password).

Add the host through the ovirt-engine add-host flow.

Version-Release number of selected component (if applicable):
ovirt-node-iso-3.0.0-5.0.5.vdsm.fc19.iso
vdsm-4.12.0-0.1.rc3

How reproducible:
always (on f19, haven't tried el6 yet)

Steps to Reproduce:
1. Install ovirt-node
2. Set the root password and SSH password authentication
3. Add the host from the engine

Actual results:
Fails to set up the management network

Expected results:
Sets up the management network

Additional info:

Comment 1 Mike Burns 2013-07-25 13:51:52 UTC
Created attachment 778269 [details]
vdsm

Comment 2 Mike Burns 2013-07-25 14:04:22 UTC
Created attachment 778271 [details]
engine log

Comment 3 Mike Burns 2013-07-25 15:42:32 UTC
This also fails on EL6 using ovirt-node-iso-3.0.0-5.1.5.vdsm.el6.iso

Comment 4 Dan Kenigsberg 2013-07-25 19:17:19 UTC
Does this work with static (non-dhcp) addresses? Does using ovirt-host-deploy with http://gerrit.ovirt.org/#/c/17306/ make the issue hide away?

MainProcess|Thread-15::DEBUG::2013-07-25 13:22:50,803::ifcfg::651::Storage.Misc.excCmd::(_ifup) SUCCESS: <err> = '/etc/dhcp/dhclient.d/sourceRoute.sh: line 6: /var/run/vdsm/sourceRoutes/1374758570: Permission denied\n'; <rc> = 0
MainProcess|Thread-15::ERROR::2013-07-25 13:22:50,803::supervdsmServer::91::SuperVdsm.ServerCallback::(wrapper) Error in setupNetworks
Traceback (most recent call last):
  File "/usr/share/vdsm/supervdsmServer.py", line 89, in wrapper
  File "/usr/share/vdsm/supervdsmServer.py", line 187, in setupNetworks
  File "/usr/share/vdsm/configNetwork.py", line 541, in setupNetworks
ConfigNetworkError: (10, 'connectivity check failed')


This is, again, the yet-unexplained effect of systemd stopping vdsmd while in setupNetworks(). See bug 988004.

Thread-15::DEBUG::2013-07-25 13:20:32,473::BindingXMLRPC::979::vds::(wrapper) client [172.31.0.3]::call setupNetworks with ({'ovirtmgmt': {'nic': 'eth0', 'bootproto': 'dhcp', 'STP': 'no', 'bridged': 'true'}}, {}, {'connectivityCheck': 'true', 'connectivityTimeout': 120}) {}
Thread-16::DEBUG::2013-07-25 13:20:32,476::BindingXMLRPC::979::vds::(wrapper) client [172.31.0.3]::call ping with () {}
Thread-16::DEBUG::2013-07-25 13:20:32,476::BindingXMLRPC::986::vds::(wrapper) return ping with {'status': {'message': 'Done', 'code': 0}}
MainThread::INFO::2013-07-25 13:20:52,126::vdsm::101::vds::(run) (PID: 4730) I am the actual vdsm 4.12.0-0.1.rc3.fc19 localhost.localdomain (3.9.9-302.fc19.x86_64)
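
The pattern in the excerpt above (a setupNetworks call with no matching return, followed by the daemon's start banner) can be looked for mechanically. The following is only a rough sketch, assuming the vdsm.log line layout shown in this bug; the function name find_interrupted_calls is hypothetical and not part of vdsm.

```python
import re

# Prefix of a vdsm.log line as seen in the excerpts in this bug:
#   <thread>::<LEVEL>::<YYYY-MM-DD HH:MM:SS,mmm>::...
LINE_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
)


def find_interrupted_calls(lines, verb="setupNetworks"):
    """Return (call_timestamp, restart_timestamp) pairs for calls to `verb`
    that were still unanswered when the daemon logged its start banner."""
    pending = {}      # thread name -> timestamp of the unanswered call
    interrupted = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # traceback lines and other continuation lines
        thread, ts = match.group("thread"), match.group("ts")
        if f"::call {verb} with" in line:
            pending[thread] = ts
        elif f"return {verb} with" in line:
            pending.pop(thread, None)
        elif "I am the actual vdsm" in line:
            # Start banner: every call still pending was cut short.
            interrupted.extend((call_ts, ts) for call_ts in pending.values())
            pending.clear()
    return interrupted
```

Feeding it the four log lines quoted above would report the setupNetworks call from 13:20:32,473 as interrupted by the restart logged at 13:20:52,126.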

Comment 5 Mike Burns 2013-07-25 19:44:32 UTC
(In reply to Dan Kenigsberg from comment #4)
> Does this work with static (non-dhcp) addresses? Does using
> ovirt-host-deploy with http://gerrit.ovirt.org/#/c/17306/ make the issue
> hide away?
> 

Static does not help.  

Running with the newer ovirt-host-deploy does not help either (note: ovirt-host-deploy-offline is used in ovirt-node).

Comment 6 Dan Kenigsberg 2013-07-25 20:02:38 UTC
I'm a bit out of touch with ovirt-node these days; can you think of a way to disable NetworkManager and enable the legacy "network" service before setupNetworks takes place?

Comment 7 Mike Burns 2013-07-25 20:33:06 UTC
(In reply to Dan Kenigsberg from comment #6)
> I'm a bit out of touch with ovirt-node these days; can you think of a way to
> disable NetworkManage and enable legacy "network" service before
> setupNetwork takes place?

No need to disable it, we don't use NetworkManager at all in ovirt-node currently.  (it's on the roadmap, but not used in any way now).

Comment 8 Dan Kenigsberg 2013-07-26 08:23:12 UTC
Mike, would you attach logs (vdsm.log, supervdsm.log, journalctl) with static addresses? We need any hint we can get.

Comment 9 Mike Burns 2013-07-26 12:59:43 UTC
I need to re-run the scenario to get the logs, so it will take a little while.

As for steps to reproduce, use the ISOs from ovirt.org/beta/iso

* boot the iso to the installer and follow the screens to install
* reboot
* when it comes up again, login as admin with the password you provided in the installer
* Press F2 to drop to a shell [1]
* run "/usr/libexec/ovirt-config-password" (interactive command line tool)
** run "set_ssh_password_authentication" in the tool, answer "Y" to the question
** run "set_root_password" in the tool, follow prompts to set root password
** run "quit" to exit the tool
* open firewall port
** f19:  firewall-cmd --zone=public --add-port 54321/tcp
** EL6:  iptables -A INPUT -p tcp --dport 54321 -j ACCEPT
* exit to leave the shell and go back to the TUI
* configure networking by selecting the Network tab
** Choose your nic
** select DHCP or static
** fill out fields if static
** choose save

* Go to oVirt Engine
* Add a new host with the root password and ip/hostname of the node


[1] the firewall and password setting issues are bugs that are being worked on
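
Before adding the host in the engine, it may help to confirm that the firewall step above took effect. This is a minimal sketch, assuming it is run from the engine machine and that port 54321/tcp is the one opened above; the function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port=54321, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; closing the
        # socket right away is enough for a reachability probe.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

A True result only shows that something accepts TCP connections on that port; it does not prove that vdsm itself is healthy.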

Comment 10 Mike Burns 2013-07-26 15:05:16 UTC
Created attachment 778794 [details]
supervdsm f19 static ip

el6 dhcp and static worked today

Comment 11 Mike Burns 2013-07-26 15:05:48 UTC
Created attachment 778795 [details]
vdsm f19 static ip

Comment 12 Mike Burns 2013-07-26 15:22:58 UTC
Thread-15::ERROR::2013-07-26 15:14:29,132::API::1261::vds::(setupNetworks) connectivity check failed
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1259, in setupNetworks
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
ConfigNetworkError: (10, 'connectivity check failed')
Thread-15::DEBUG::2013-07-26 15:14:29,133::BindingXMLRPC::986::vds::(wrapper) return setupNetworks with {'status': {'message': 'connectivity check failed', 'code': 10}}

Comment 13 Antoni Segura Puimedon 2013-07-26 17:03:19 UTC
Reproduced on ovirt-node f19. Oddly, the web UI first reports a setupNetworks failure during install. Then, without any action on my part, after a while it notices that the host is reachable (the network was properly created) and sets it to non-operational because it does not see ovirtmgmt (which was correctly created but then rolled back).
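
For readers unfamiliar with the "created, then rolled back" behaviour: the call in comment 4 passes connectivityCheck and connectivityTimeout, and the error returned is (10, 'connectivity check failed'). The general pattern can be sketched as below. This is an illustration only, not vdsm's actual code; every name in it (ConnectivityCheckFailed, setup_with_connectivity_check, client_seen) is hypothetical.

```python
import threading


class ConnectivityCheckFailed(Exception):
    """Stand-in for the ConfigNetworkError(10, ...) seen in the logs."""


def setup_with_connectivity_check(apply_config, rollback,
                                  client_seen: threading.Event, timeout=120):
    """Apply a network change, then undo it unless the client is heard from.

    client_seen is assumed to be set by whatever code handles incoming
    requests (for example the ping calls visible in vdsm.log).
    """
    apply_config()
    if not client_seen.wait(timeout):
        # Nobody reached us over the new configuration in time: revert it.
        rollback()
        raise ConnectivityCheckFailed(10, "connectivity check failed")
```

Under this pattern the network can be created correctly and still be rolled back if the process doing the waiting never observes the client within the timeout.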

Comment 14 Antoni Segura Puimedon 2013-07-26 17:46:25 UTC
Remaining tasks:

Test with the new ISO resulting from the resolution of #988916; if there are no surprises, this one will be closed.

Comment 15 Antoni Segura Puimedon 2013-07-26 19:40:03 UTC
Well, I tested it with the fix in http://resources.ovirt.org/releases/beta/iso/ovirt-node-iso-3.0.0-5.1.6.vdsm.fc19.iso

The result is that ovirtmgmt is created and is up, but setupNetworks operation is marked as failed and the networks are not persisted (I guess that ovirt-host-deploy seeing that it fails doesn't call setSafeNetworkConfig).

Comment 16 Antoni Segura Puimedon 2013-07-26 19:42:08 UTC
Created attachment 778880 [details]
webui of the process of fail then magically UP

Comment 17 Antoni Segura Puimedon 2013-07-26 19:57:30 UTC
Created attachment 778884 [details]
ovirt-node-fix supervdsm.log

supervdsm.log clearly shows that the setupNetworks operation does not roll back due to loss of connectivity.

Comment 18 Antoni Segura Puimedon 2013-07-26 20:00:21 UTC
Created attachment 778885 [details]
ovirt-node-fix vdsm.log

Comment 19 Alon Bar-Lev 2013-07-26 20:05:34 UTC
(In reply to Antoni Segura Puimedon from comment #15)
> Well, I tested it with the fix in
> http://resources.ovirt.org/releases/beta/iso/ovirt-node-iso-3.0.0-5.1.6.vdsm.
> fc19.iso
> 
> The result is that ovirtmgmt is created and is up, but setupNetworks
> operation is marked as failed and the networks are not persisted (I guess
> that ovirt-host-deploy seeing that it fails doesn't call
> setSafeNetworkConfig).

Can I have the log, please?

Recent ovirt-host-deploy calls vdsm-store-net-config on success and vdsm-restore-net-config on failure.
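
The store-on-success, restore-on-failure behaviour described above amounts to the following. It is a sketch only, not ovirt-host-deploy's code: the two command names come from this comment, while their install paths and arguments are not given here, so the runner is injected and the function name finalize_network_config is hypothetical.

```python
def finalize_network_config(setup_succeeded, run_command):
    """Persist the new network config on success, revert it on failure.

    run_command is any callable taking an argv list, for example
    subprocess.check_call; it is injected because the real paths and
    arguments of the two helpers are not stated in this bug.
    """
    if setup_succeeded:
        command = "vdsm-store-net-config"
    else:
        command = "vdsm-restore-net-config"
    run_command([command])
    return command
```

If setupNetworks is reported as failed even though the network came up, this logic takes the restore branch, which matches the networks not being persisted.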

Comment 20 Antoni Segura Puimedon 2013-07-26 20:08:07 UTC
Created attachment 778886 [details]
systemd-journald-vdsmd.log

Result of doing journalctl -b -u vdsmd

It shows that systemd stops and starts the daemon while vdsmd is running, for no apparent reason. This is why the successful setupNetworks never got the chance to return its success message to the engine.

Comment 21 Antoni Segura Puimedon 2013-07-26 20:19:57 UTC
Created attachment 778887 [details]
journal.log

Full journal since boot to see more info of what is going on.

Comment 22 Antoni Segura Puimedon 2013-07-26 20:32:24 UTC
Created attachment 778888 [details]
json-journal.log

This log level shows much more information.
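
A JSON journal of this kind can be filtered for unit start and stop messages with a few lines of Python. This is a rough sketch, assuming the file holds one JSON object per line as produced by journalctl -o json, with the MESSAGE and __REALTIME_TIMESTAMP fields (microseconds since the epoch); the exact wording of systemd's messages for vdsmd is not quoted in this bug, so only the generic prefixes are matched. The function name unit_transitions is hypothetical.

```python
import json
from datetime import datetime, timezone

# Generic prefixes of systemd's unit state messages (an assumption here).
PREFIXES = ("Starting ", "Stopping ", "Started ", "Stopped ")


def unit_transitions(journal_lines, prefixes=PREFIXES):
    """Return (utc_datetime, message) pairs for start/stop journal entries."""
    events = []
    for raw in journal_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        message = entry.get("MESSAGE")
        # Non-text payloads are not exported as plain strings; skip them.
        if not isinstance(message, str) or not message.startswith(prefixes):
            continue
        micros = int(entry["__REALTIME_TIMESTAMP"])
        when = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        events.append((when, message))
    return events
```

Lining these timestamps up against the setupNetworks call time in vdsm.log would show whether a stop falls inside the call window.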

Comment 23 Dan Kenigsberg 2013-07-31 14:46:33 UTC
I do not believe there's anything to do here. Let's see if I'm wrong.

Comment 24 Itamar Heim 2013-09-23 07:34:42 UTC
closing as this should be in 3.3 (doing so in bulk, so may be incorrect)

