Bug 1229227

Summary: Auto register rhevh6 to rhevm3.5.3 failed using management_server=$rhevm_ip:443 during the auto-install
Product: Red Hat Enterprise Virtualization Manager Reporter: wanghui <huiwa>
Component: ovirt-node Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED ERRATA QA Contact: wanghui <huiwa>
Severity: urgent Docs Contact:
Priority: high    
Version: 3.5.3 CC: bazulay, cshao, cwu, danken, dougsland, fdeutsch, gklein, huiwa, leiwang, lpeer, lsurette, myakove, yaniwang, ybronhei, ycui, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc3 Keywords: Reopened, TestOnly, ZStream
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-node-3.3.0-0.13.20151008git03eefb5.el7ev.noarch Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1240288 (view as bug list) Environment:
Last Closed: 2016-03-09 14:28:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Node RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1203422, 1233059, 1249396, 1249397    
Bug Blocks: 1240288    
Attachments:
Description       Flags
/var/log files    none
vdsm logs         none

Description wanghui 2015-06-08 10:05:50 UTC
Created attachment 1036278 [details]
/var/log files

Description of problem:
Auto registration to rhevm3.5.3 fails when using management_server=$rhevm_ip:443 during the auto-install. The rhevh loses all network connectivity and cannot come up in rhevm3.5.3. The issue can be reproduced by waiting a few minutes before logging in to rhevh.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.6-20150603.0
ovirt-node-3.2.3-3.el6.noarch
ovirt-node-plugin-vdsm-0.2.0-24.el6ev.noarch
Red Hat Enterprise Virtualization Manager Version: 3.5.3.1-1.4.el6ev

How reproducible:
50%

Steps to Reproduce:
1. Auto install rhev-hypervisor6-6.6-20150603.0 with the following parameters.
   BOOTIF=eth2 storage_init=/dev/sda management_server=10.8.51.171:443 adminpw=4DHc2Jl0D05xk firstboot
2. After the installation finishes, wait 5 minutes before logging in to rhevh.
3. Log in to rhevh and check the IP address.
4. Bring rhevh up in rhevm3.5.3.
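
For reference, the boot parameters in step 1 can be split into key/value pairs as in the minimal sketch below. This only illustrates how the command line is structured and is not the parser that ovirt-node itself uses. The helper names parse_boot_args and split_management_server are hypothetical, and the fallback port of 443 is an assumption.

```python
def parse_boot_args(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args


def split_management_server(value, default_port=443):
    """Split 'host:port' into (host, port); the default port is an assumption."""
    host, sep, port = value.rpartition(":")
    if not sep:
        # No explicit port was given, so fall back to the assumed default.
        return value, default_port
    return host, int(port)


# Parameters taken from step 1 of this report.
cmdline = ("BOOTIF=eth2 storage_init=/dev/sda "
           "management_server=10.8.51.171:443 adminpw=4DHc2Jl0D05xk firstboot")
boot_args = parse_boot_args(cmdline)
host, port = split_management_server(boot_args["management_server"])
print(host, port)
```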

Actual results:
1. After step 3, rhevh has lost its network connection. There are no configuration files for eth2 or rhevm.
2. Rhevh cannot come up in rhevm3.5.3.

Expected results:
1. After step 3, rhevh should have an IP address on the rhevm bridge.
2. After step 4, rhevh should come up in rhevm3.5.3.

Additional info:

Comment 2 Douglas Schilling Landgraf 2015-06-08 18:45:42 UTC
I can reproduce the report.

A few points:

0) Registration happens, RHEV-H is available in RHEV-M to approve. So the network, was available to communicate with Engine befo
1) The netconf link is broken:
   # ls -la /var/lib/vdsm/persistence
   netconf -> /var/lib/vdsm/persistence/netconf.1433775746165103281

2) Doesn't contain ifcfg-eth0 or ifcfg-rhevm in:
   /etc/sysconfig/network-scripts/ 

3) virsh # net-list --all

Name        State     Autostart    Persistent
;vdsmdummy; activate  no           no
default     inactive  no           yes
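
The checks in points 1 and 2 can be scripted. The following is a minimal sketch and not part of vdsm: the helper names are hypothetical, and the paths are the ones quoted above. It reports whether the netconf symlink is dangling and which ifcfg files are missing.

```python
import os


def is_dangling_symlink(path):
    """True if path is a symlink whose target does not exist."""
    # os.path.exists() follows symlinks, so it is False for a broken link.
    return os.path.islink(path) and not os.path.exists(path)


def missing_ifcfg_files(names, scripts_dir="/etc/sysconfig/network-scripts"):
    """Return the interface names that have no ifcfg-<name> file in scripts_dir."""
    return [n for n in names
            if not os.path.isfile(os.path.join(scripts_dir, "ifcfg-" + n))]


if __name__ == "__main__":
    netconf = "/var/lib/vdsm/persistence/netconf"
    print("netconf dangling:", is_dangling_symlink(netconf))
    print("missing ifcfg:", missing_ifcfg_files(["eth0", "rhevm"]))
```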


Some logs from supervdsm.log
================================
sourceRoute::WARNING::2015-06-08 15:02:14,364::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/eth0 already removed
sourceRoute::DEBUG::2015-06-08 15:02:14,364::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775683
sourceRoute::INFO::2015-06-08 15:02:14,365::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface eth0 is not a libvirt interface
sourceRoute::WARNING::2015-06-08 15:02:14,365::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/eth0 already removed
sourceRoute::DEBUG::2015-06-08 15:02:14,365::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775688
sourceRoute::INFO::2015-06-08 15:02:14,365::sourceroute::166::root::(remove) Removing gateway - device: rhevm
sourceRoute::DEBUG::2015-06-08 15:02:14,365::utils::739::root::(execCmd) /sbin/ip rule (cwd None)
sourceRoute::DEBUG::2015-06-08 15:02:14,378::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
sourceRoute::ERROR::2015-06-08 15:02:14,378::sourceroute::153::root::(_getRules) Routing rules not found for device rhevm
sourceRoute::DEBUG::2015-06-08 15:02:14,378::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775690
sourceRoute::INFO::2015-06-08 15:02:14,379::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface rhevm is not a libvirt interface
sourceRoute::WARNING::2015-06-08 15:02:14,379::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/rhevm already removed
sourceRoute::DEBUG::2015-06-08 15:02:14,379::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775693
sourceRoute::INFO::2015-06-08 15:02:14,380::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface rhevm is not a libvirt interface
sourceRoute::WARNING::2015-06-08 15:02:14,380::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/rhevm already removed
MainThread::DEBUG::2015-06-08 15:02:28,668::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2015-06-08 15:02:28,668::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2015-06-08 15:02:28,668::vdsm-restore-net-config::60::root::(unified_restoration) Removing all networks ({}) and bonds ({}) in running config.
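
To pick the warnings and errors out of a log like the one above, the lines can be split on the "::" separator. This is a minimal sketch that assumes every line follows the seven-field layout seen in this excerpt (thread, level, timestamp, module, line number, logger, message). LogRecord, parse_line and problems are hypothetical names, not vdsm APIs.

```python
from collections import namedtuple

LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message")


def parse_line(line):
    """Parse one '::'-separated log line; return None if it does not fit."""
    # Split at most 6 times so any '::' inside the message is left intact.
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, message = parts
    return LogRecord(thread, level, timestamp, module,
                     int(lineno), logger, message)


def problems(lines, levels=("WARNING", "ERROR")):
    """Return the parsed records whose level is in `levels`."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r.level in levels]


# Two lines copied from the supervdsm.log excerpt in this comment.
sample = [
    "sourceRoute::INFO::2015-06-08 15:02:14,365::sourceroute::166::root::"
    "(remove) Removing gateway - device: rhevm",
    "sourceRoute::ERROR::2015-06-08 15:02:14,378::sourceroute::153::root::"
    "(_getRules) Routing rules not found for device rhevm",
]
for record in problems(sample):
    print(record.level, record.module, record.message)
```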

Comment 3 Douglas Schilling Landgraf 2015-06-08 18:46:20 UTC
Created attachment 1036455 [details]
vdsm logs

Comment 6 Ying Cui 2015-06-25 12:18:30 UTC
Note that this bug affects rhevh 6.6 for 3.5.3. If it still affects RHEVH 6.7 for rhev 3.5.4, let's consider it a blocker. Thanks.

Comment 7 wanghui 2015-06-30 09:03:15 UTC
Still has the same issue on rhev-hypervisor6-6.7-20150609.0.iso.

Comment 8 Ido Barkan 2015-06-30 10:22:30 UTC
looking at the logs, this does not seem like VDSM's fault. At least not the network part of it. So I can already say that this will probably not be solved in 3.5.4.

But I do see that vdsm-reg is failing when trying to create a bridge. It fails because libvirt is down. Last time I saw this on rhev-h, libvirt refused to go up if there were no interfaces with IP to bind to (not sure if lo was enough for it).

I think vdsm-reg tried to connect to the engine although the bridge creation failed.

Douglas can you please take a look at /var/log/vdsm-reg/vdsm-reg.log ?

Comment 9 Fabian Deutsch 2015-06-30 10:42:07 UTC
Nice findings. Maybe bug 1235350 and the vdsm part will help to improve this.

But maybe Douglas also finds another reason why libvirtd does not come up.

Comment 10 Yaniv Lavi 2015-06-30 11:38:49 UTC
(In reply to Ido Barkan from comment #8)
> looking at the logs, this does not seem like VDSM's fault. At least not the
> network part of it. So I can already say that this will probably not be
> solved in 3.5.4.
> 
> But I do see that vdsm-reg is failing when trying to create a bridge. It
> fails because libvirt is down. Last time I saw this on rhev-h, libvirt
> refused to go up if there were no interfaces with IP to bind to (not sure if
> lo was enough for it).
> 
> I think vdsm-reg tried to connect to the engine although the bridge creation
> failed.
> 
> Douglas can you please take a look at /var/log/vdsm-reg/vdsm-reg.log ?

Can the ONBOOT=no due to the other bug affect this?

Comment 11 Dan Kenigsberg 2015-06-30 11:45:07 UTC
MainThread::DEBUG::2015-06-08 09:25:40,055::deployUtil::453::root::_getMGTIface IP=10.8.51.171 strIface=em1
MainThread::DEBUG::2015-06-08 09:25:40,056::deployUtil::1059::root::makeBridge found the following bridge paramaters: ['BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes']
MainThread::DEBUG::2015-06-08 09:25:40,057::deployUtil::140::root::['/usr/share/vdsm/addNetwork', 'rhevm', '', '', 'em1', 'BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes', 'blockingdhcp=true']
MainThread::DEBUG::2015-06-08 09:25:50,799::deployUtil::149::root::
MainThread::DEBUG::2015-06-08 09:25:50,803::deployUtil::150::root::libvirt: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

To me it smells like another manifestation of bug 1235591
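
The socket error in the log above suggests a guard along the following lines: wait for the libvirt socket to appear before creating the bridge, and skip registration if either step fails. This is a sketch under stated assumptions and not the actual vdsm-reg logic. wait_for_socket and register_if_ready are hypothetical helpers, create_bridge and register are placeholder callables, and the timeout values are arbitrary.

```python
import os
import time


def wait_for_socket(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds pass; True if found."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def register_if_ready(create_bridge, register,
                      socket_path="/var/run/libvirt/libvirt-sock",
                      timeout=30.0):
    """Call register() only if the socket appeared and create_bridge() succeeded.

    create_bridge and register are placeholder callables (assumptions);
    create_bridge must return a truthy value on success.
    """
    if not wait_for_socket(socket_path, timeout=timeout):
        return False
    if not create_bridge():
        return False
    register()
    return True
```

With a guard like this, a missing socket leads to an explicit "not registered" result instead of a registration attempt on top of a failed bridge setup.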


(In reply to Yaniv Dary from comment #10)
> > Douglas can you please take a look at /var/log/vdsm-reg/vdsm-reg.log ?
> 
> Can the ONBOOT=no due to the other bug affect this?

It does not seem related at all - the error above takes place before vdsm has the chance to write ifcfg files.

Comment 12 Fabian Deutsch 2015-06-30 13:59:29 UTC
The nasty thing about bug 1235591 is that we cannot reproduce it anymore, and thus no fix was introduced for it.

Comment 13 Yaniv Bronhaim 2015-07-02 08:29:42 UTC
Now, with the libvirtd upstart script, if libvirt crashes on el6 it respawns quickly, so that might fix it. Crashes are never intentional, though, and we need to figure out why it happened, but we can't proceed without seeing the issue and understanding why libvirtd stopped. Again, I would suggest enabling the libvirt debug log to check that once we are able to reproduce it. I don't see the problem with the latest image I checked.

Closing this bug. If this issue is raised again please re-open quickly.

Comment 14 Yaniv Lavi 2015-07-02 09:06:04 UTC
Moving to ON_QA to make sure this is tested.

Comment 15 Yaniv Lavi 2015-07-02 09:07:56 UTC
Please provide acks, clone and move to ON_QA for testing.

Comment 16 Ying Cui 2015-07-02 09:12:51 UTC
This bug affects rhevh6,

Comment 22 wanghui 2015-10-27 08:14:08 UTC
Test version:
rhev-hypervisor7-7.2-20151025.0.el7ev
ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev.noarch
Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.18.el6

Test steps:
1. Auto install rhev-hypervisor6-6.6-20150603.0 with the following parameters.
   BOOTIF=em1 storage_init=/dev/sda management_server=10.8.51.171:443 adminpw=4DHc2Jl0D05xk firstboot
2. After the installation finishes, wait 5 minutes before logging in to rhevh.
3. Log in to rhevh and check the IP address.
4. Bring rhevh up in rhevm3.6.0.

Test result:
1. After step 4, rhevh comes up in rhevm3.6.0.

So this issue is fixed in ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev.noarch. Changing the status to VERIFIED.

Comment 24 errata-xmlrpc 2016-03-09 14:28:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html