Bug 871481 - Host connectivity can be lost when restoring backups
Summary: Host connectivity can be lost when restoring backups
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Antoni Segura Puimedon
QA Contact: Meni Yakove
URL:
Whiteboard: network
Depends On:
Blocks: 917401
 
Reported: 2012-10-30 13:59 UTC by Antoni Segura Puimedon
Modified: 2016-02-10 19:50 UTC

Fixed In Version: vdsm-4.10.2-10.0.el6ev
Doc Type: Release Note
Doc Text:
Previously, the vdsmd service could be restarted by the spmprotect script, which triggered an attempt to restore the host network configuration to its last known safe state. If the host lost its Storage Pool Manager role, it would lose its current network connectivity. Now, the host network configuration is restored only at boot time, not when the vdsmd service is restarted. As a result, the service vdsmd restart command does not adversely affect host networking.
Clone Of:
Environment:
Last Closed:
oVirt Team: Network
Target Upstream Version:
Embargoed:


Attachments
network addition python script using the ovirt/rhevm rest api. (4.65 KB, text/x-python)
2012-10-30 13:59 UTC, Antoni Segura Puimedon


Links
System        ID     Private  Priority  Status  Summary  Last Updated
oVirt gerrit  10334  0        None      None    None     Never

Description Antoni Segura Puimedon 2012-10-30 13:59:30 UTC
Created attachment 635600
network addition python script using the ovirt/rhevm rest api.

Description of problem: When many networks are set on the host as temporary networks, i.e., setsafeconfig has not been called, a vdsmd restart can cause a loss of connectivity lasting several minutes.


Version-Release number of selected component (if applicable): 


How reproducible: 100%


Steps to Reproduce:
1. Create 300 VLANs using the attached script: python create_networks.py 0 300
(the script should be modified to match your datacenter, cluster, host and ethernet ids). Mind you, the addition uses the old API, so it can take up to 33 min to add all the networks.
2. Restart vdsmd: service vdsmd restart.
3. Wait some minutes (up to 14 min) for connectivity to return, log in to the machine and check that the restore, though extremely slow, has succeeded.
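
For a rough sense of scale, the upper bounds quoted in the steps above can be turned into an average per-network cost. This is only arithmetic on the figures in this report (33 min to add, 14 min to restore, 300 networks). It assumes the time is spread evenly across the networks, which the report does not state, and the helper name is made up for illustration.

```python
# Rough per-network cost derived from the upper bounds quoted in this report.
# Assumption (not stated in the report): time is spread evenly over the networks.
NETWORKS = 300
ADD_MINUTES = 33      # upper bound for adding all networks through the old API
RESTORE_MINUTES = 14  # upper bound for regaining connectivity after the restart


def seconds_per_network(total_minutes, count):
    """Return the average number of seconds spent per network."""
    return total_minutes * 60.0 / count


add_cost = seconds_per_network(ADD_MINUTES, NETWORKS)
restore_cost = seconds_per_network(RESTORE_MINUTES, NETWORKS)
print("add: %.1f s/network, restore: %.1f s/network" % (add_cost, restore_cost))
# prints "add: 6.6 s/network, restore: 2.8 s/network"
```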
  
Actual results: Connectivity is lost for a really long time (provided the VLANs are added to an ethernet interface whose name sorts alphabetically before 'rhevm'/'ovirtmgmt').


Expected results: Connectivity loss lasts only as long as it takes to bring the management interface down and up again. The rest of the interfaces are processed afterwards.
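
The difference between the actual and expected behaviour can be illustrated with a small ordering sketch. It assumes devices are processed in plain alphabetical order of their names, as the actual results suggest, and it uses hypothetical VLAN device names of the form eth0.<id>. Neither the device names nor the helper functions come from vdsm.

```python
# Illustration only: how many devices are handled before the management
# network under two orderings. Device names and helpers are hypothetical.
MANAGEMENT_NETWORKS = ("rhevm", "ovirtmgmt")


def alphabetical_order(devices):
    """Order assumed to match the behaviour described in the actual results."""
    return sorted(devices)


def management_first_order(devices):
    """Management networks first, everything else alphabetically afterwards."""
    # False sorts before True, so management devices get the lowest keys.
    return sorted(devices, key=lambda d: (d not in MANAGEMENT_NETWORKS, d))


devices = ["rhevm"] + ["eth0.%d" % vlan for vlan in range(1, 301)]
before = alphabetical_order(devices).index("rhevm")
after = management_first_order(devices).index("rhevm")
print("devices handled before rhevm: alphabetical=%d, management first=%d"
      % (before, after))
```

With 300 VLAN devices on eth0, the alphabetical ordering places all of them ahead of rhevm, while the management-first ordering handles rhevm immediately.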


Additional info: The script uses the requests library. To install it, run easy_install requests.

Comment 2 Dan Kenigsberg 2012-11-19 13:37:48 UTC
This is somewhat related to bug 877006: we would like to keep connectivity to storage and management networks as much as we can, even during rollback. This requires the brute-force approach of `network stop` && `network start`.

Comment 3 Antoni Segura Puimedon 2012-11-19 14:02:18 UTC
One thing that might help is making the network settings dynamic (and set by the engine), as some have proposed on the mailing list, keeping only the management interface persistent and as untouched as possible by restarts/restores.

Comment 4 Antoni Segura Puimedon 2013-01-15 01:55:34 UTC
With the patch http://gerrit.ovirt.org/#/c/10334/ this bug will not be reproducible on vdsmd restart.

However, it can still happen during a regular rollback, although it is less likely: with http://gerrit.ovirt.org/#/c/9506/, a rollback only stops and starts the networks that were actually modified. These will seldom include the management interface, since it is probably wise to modify that one by itself and mark those changes as safe separately. If need be, we could also make the management interface the last to be taken down and the first to be brought up.
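
A minimal sketch of the rollback idea in this comment, under loud assumptions: network configurations are modelled as plain dictionaries keyed by network name, and the function names are invented for illustration. This is not vdsm's data model or the code in either gerrit change. It only shows restricting the rollback to the networks that differ, with the management network taken down last and brought up first.

```python
# Hypothetical model: each configuration maps a network name to its settings.
MANAGEMENT = "ovirtmgmt"  # 'rhevm' on RHEV hosts


def modified_networks(running, safe):
    """Names whose settings differ between the running and the safe config."""
    names = set(running) | set(safe)
    return sorted(n for n in names if running.get(n) != safe.get(n))


def rollback_plan(running, safe, management=MANAGEMENT):
    """Return (take_down, bring_up) lists limited to the modified networks."""
    changed = modified_networks(running, safe)
    # Management sorts last when taking down and first when bringing up.
    take_down = sorted((n for n in changed if n in running),
                       key=lambda n: (n == management, n))
    bring_up = sorted((n for n in changed if n in safe),
                      key=lambda n: (n != management, n))
    return take_down, bring_up


running = {"ovirtmgmt": {"mtu": 9000}, "net1": {"vlan": 1},
           "net2": {"vlan": 2}, "storage": {"vlan": 10}}
safe = {"ovirtmgmt": {"mtu": 1500}, "net1": {"vlan": 1},
        "storage": {"vlan": 11}}
take_down, bring_up = rollback_plan(running, safe)
print("take down: %s" % take_down)
print("bring up: %s" % bring_up)
```

In this example net1 is untouched because it is identical in both configurations, and net2 is only taken down because it does not exist in the safe configuration.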

Comment 6 Dan Kenigsberg 2013-02-20 19:22:11 UTC
I feel comfortable enough to backport

  http://gerrit.ovirt.org/10334
  split restore-net-conf away of vdsmd.init service

so as to avoid 80% of the cases in which the problem would manifest.

Comment 7 Meni Yakove 2013-03-03 10:41:00 UTC
Verified on vdsm-4.10.2-10.0.el6ev.x86_64.

Comment 8 Itamar Heim 2013-06-11 09:51:42 UTC
3.2 has been released

Comment 9 Itamar Heim 2013-06-11 09:51:53 UTC
3.2 has been released

Comment 10 Itamar Heim 2013-06-11 09:58:44 UTC
3.2 has been released

