Created attachment 645662 [details]
vdsm.log + engine.log

Description of problem:
If the user sets a wrong gateway on the host's rhevm interface, the settings do not roll back after the connectivity check fails.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Manager Version: '3.1.0-28.el6ev'
Host: RHEL - 6Server - 6.3.0.3.el6
kernel 2.6.32-279.11.1.el6.x86_64
kvm 0.12.1.2-2.295.el6_3.5
vdsm-4.9.6-42.0.el6_3

How reproducible:
100%

Steps to Reproduce:
1. Have a working host in the setup
2. Hosts -> your host -> Network Interfaces -> Setup Host Networks
3. Edit the rhevm interface -> set a wrong default gateway (a valid IP address, but not the address of the actual GW) -> click OK
3. Tick the check boxes "Verify connectivity between Host and Engine" and "Save network configuration"
4. Wait until an error message appears in the GUI and "Save network configuration"

Actual results:
The host remains unreachable with the wrong GW set.

Expected results:
The host rolls back; connectivity is restored with the old settings.

Additional info:
vdsm.log

MainProcess|Thread-379::DEBUG::2012-11-15 14:46:35,588::configNetwork::1358::setupNetworks::(setupNetworks) Checking connectivity...
Thread-33::WARNING::2012-11-15 14:47:31,108::remoteFileHandler::185::Storage.CrabRPCProxy::(callCrabRPCFunction) Problem with handler, treating as timeout
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 177, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 143, in _recvAll
    raise Timeout()
Timeout

Thread-33::ERROR::2012-11-15 14:47:31,112::domainMonitor::208::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain a0d5fbad-032d-4534-841e-2bfc8d4c9af8 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in _monitorDomain
    self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 137, in selftest
    fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 426, in selftest
    self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 286, in callCrabRPCFunction
    raise Timeout("Operation stuck on remote handler")
Timeout: Operation stuck on remote handler
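For context, the Timeout in the first traceback comes from a blocking read loop that gives up when the remote handler stops responding. A minimal sketch of that pattern (illustrative only; the function name and structure here are made up and are not vdsm's actual `_recvAll`):

```python
import select


class Timeout(Exception):
    pass


def recv_all(sock, length, timeout):
    """Read exactly `length` bytes from `sock`, raising Timeout if the
    peer stalls for longer than `timeout` seconds between chunks."""
    data = b""
    while len(data) < length:
        # Wait until the socket has data, or give up after `timeout`.
        ready, _, _ = select.select([sock], [], [], timeout)
        if not ready:
            raise Timeout()
        chunk = sock.recv(length - len(data))
        if not chunk:
            raise Timeout()  # peer closed before sending everything
        data += chunk
    return data
```

With a wrong gateway on the storage path, every such read on the out-of-process handler stalls, which is why the domain monitor keeps logging "Operation stuck on remote handler".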
Created attachment 645663 [details] sosreport
Created attachment 645664 [details] ifcfg files before wrong gw
Created attachment 645665 [details] ifcfg files after wrong gw
Correction to Steps to Reproduce:

4. wait until error message appears in GUI and "Save network configuration"

should be

4. wait until error message appears in GUI
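The flow exercised by the steps above is, conceptually: apply the new network config, poll for engine connectivity, and roll back to the previous config if the engine does not reconnect in time. A minimal sketch of that contract (illustrative; this is not vdsm's actual setupNetworks implementation, and all names here are invented):

```python
import time


def setup_networks_with_check(apply_config, rollback_config,
                              engine_reachable, timeout=60, interval=2):
    """Apply a network config, then poll `engine_reachable` until
    `timeout` seconds elapse; roll back if connectivity never returns.
    Returns True if the new config was kept, False if rolled back."""
    apply_config()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if engine_reachable():
            return True       # connectivity confirmed, keep new config
        time.sleep(interval)
    rollback_config()         # timed out: restore previous settings
    return False
```

The bug reported here is that the rollback branch never completed: vdsm was restarted mid-flow, so the bad gateway stayed in place.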
The main culprit here is:

Thread-374::INFO::2012-11-15 14:46:24,305::logUtils::39::dispatcher::(wrapper) Run and protect: getSpmStatus, Return response: {'spm_st': {'spmId': 2, 'spmStatus': 'SPM', 'spmLver': 13}}
MainThread::INFO::2012-11-15 14:48:29,010::vdsm::70::vds::(run) I am the actual vdsm 4.9-42.0

Vdsm was restarted before it managed to roll back the network config. Your host was SPM when that happened, and could not renew its lease due to the bad gateway.

It is not prudent to let people edit networking on such an important node - we should consider blocking it in the UI. Maybe we should also roll back the configuration upon vdsm process startup, instead of upon the sysv service restart. This should not be done hastily; I am thus lowering the severity of the bug.
The UI block has not been done, but rolling back on process startup instead of on vdsmd service restart is fixed with http://gerrit.ovirt.org/#/c/10334/
Marin, could you check the behavior on a recent rhev-3.2, when you have a dedicated storage network? Editing the storage network would never work for the SPM, but hopefully, nowadays, you can edit the management network.
(In reply to comment #7)
> Marin, could you check the behavior on a recent rhev-3.2, when you have a
> dedicated storage network?
>
> Editing the storage network would never ever work for SPM, but hopefully,
> nowardays, you could edit the management network.

With a dedicated storage network, the rhevm (sf13.1, vdsm-4.10.2-15.0.el6ev.x86_64) bridge settings roll back correctly. Logs are attached as log_collector2.
Created attachment 736709 [details] log_collector2
(In reply to Dan Kenigsberg from comment #5)
>
> Maybe we should also roll back configuration upon vdsm process startup,
> instead of upon the sysv service restart. This should not be done hastily; I
> am this lowering the severity of the bug.

Since I wrote this comment, we have gone in the opposite direction: rollback no longer occurs during sysv restart (and certainly not during process startup), but only during boot. One of the motivations for this was exactly this flow: an SPM failover used to cause rollback of unrelated network changes. http://gerrit.ovirt.org/10334

The bug has become much less acute since we no longer shut off all networking when rolling back a network configuration: we shut off only the relevant networks. Thus, this bug pops up only when configuring the storage network of the SPM node. The currently-remaining behavior is annoying and would require a power cycle to fix, but I do not see any way we can avoid it, other than asking users not to configure their storage network on the SPM.
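The "shut off only the relevant networks" behavior described above can be pictured as a diff between the running config and the last-known-good config: only networks whose definition actually changed are torn down and restored. A hypothetical sketch (vdsm's real data model and persistence format differ; all names here are illustrative):

```python
def networks_to_rollback(good_config, running_config):
    """Return only the networks whose running definition differs from
    the last-known-good one, mapped to their good definitions, so a
    rollback leaves unrelated networks untouched."""
    return {name: cfg for name, cfg in good_config.items()
            if running_config.get(name) != cfg}
```

In the flow of this bug, only the rhevm (or storage) network appears in the diff, so an SPM failover no longer drags every other network down with it.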