Created attachment 1036278
/var/log files

Description of problem:
Auto registration to rhevm3.5.3 fails when management_server=$rhevm_ip:443 is used during the auto-install. The rhevh loses all network connectivity and cannot be brought up in rhevm3.5.3. The issue can be reproduced by waiting a few minutes before logging in to rhevh.

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.6-20150603.0
ovirt-node-3.2.3-3.el6.noarch
ovirt-node-plugin-vdsm-0.2.0-24.el6ev.noarch
Red Hat Enterprise Virtualization Manager Version: 3.5.3.1-1.4.el6ev

How reproducible:
50%

Steps to Reproduce:
1. Auto install rhev-hypervisor6-6.6-20150603.0 with the following parameters:
   BOOTIF=eth2 storage_init=/dev/sda management_server=10.8.51.171:443 adminpw=4DHc2Jl0D05xk firstboot
2. After the installation finishes, wait 5 minutes before logging in to rhevh.
3. Log in to rhevh and check the IP address.
4. Bring rhevh up in rhevm3.5.3.

Actual results:
1. After step 3, rhevh has lost its network connection. There are no configuration files for eth2 or rhevm.
2. Rhevh cannot be brought up in rhevm3.5.3.

Expected results:
1. After step 3, rhevh should have an IP address on the rhevm bridge.
2. After step 4, rhevh can be brought up in rhevm3.5.3.

Additional info:
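For anyone scripting the reproduction: the boot parameters in step 1 are space-separated key=value tokens plus bare flags such as firstboot. Below is a minimal Python sketch of splitting them and separating the management_server host from its port. The helper names parse_boot_args and split_host_port are hypothetical, and the default port of 443 is an assumption taken from the value used in this report; this is not how ovirt-node itself parses the command line.

```python
def parse_boot_args(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args


def split_host_port(server, default_port=443):
    """Split 'host:port' into (host, port).

    Assumes an IPv4 address or hostname; default_port is an assumption.
    """
    host, sep, port = server.partition(":")
    return host, (int(port) if sep else default_port)


# Parameters copied from step 1 of the report.
cmdline = ("BOOTIF=eth2 storage_init=/dev/sda "
           "management_server=10.8.51.171:443 adminpw=4DHc2Jl0D05xk firstboot")
args = parse_boot_args(cmdline)
host, port = split_host_port(args["management_server"])
print(args["BOOTIF"], host, port, args["firstboot"])
```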
I can reproduce the report.

A few points:

0) Registration happens; RHEV-H is available in RHEV-M to approve. So the network was available to communicate with Engine befo

1) The netconf link is broken:
# ls -la /var/lib/vdsm/persistence
netconf -> /var/lib/vdsm/persistence/netconf.1433775746165103281

2) /etc/sysconfig/network-scripts/ doesn't contain ifcfg-eth0 or ifcfg-rhevm.

3) virsh # net-list --all
Name State Autostart Persistent
;vdsmdummy; activate no no
default inactive no yes

Some logs from supervdsm.log
================================
sourceRoute::WARNING::2015-06-08 15:02:14,364::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/eth0 already removed
sourceRoute::DEBUG::2015-06-08 15:02:14,364::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775683
sourceRoute::INFO::2015-06-08 15:02:14,365::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface eth0 is not a libvirt interface
sourceRoute::WARNING::2015-06-08 15:02:14,365::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/eth0 already removed
sourceRoute::DEBUG::2015-06-08 15:02:14,365::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775688
sourceRoute::INFO::2015-06-08 15:02:14,365::sourceroute::166::root::(remove) Removing gateway - device: rhevm
sourceRoute::DEBUG::2015-06-08 15:02:14,365::utils::739::root::(execCmd) /sbin/ip rule (cwd None)
sourceRoute::DEBUG::2015-06-08 15:02:14,378::utils::759::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
sourceRoute::ERROR::2015-06-08 15:02:14,378::sourceroute::153::root::(_getRules) Routing rules not found for device rhevm
sourceRoute::DEBUG::2015-06-08 15:02:14,378::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775690
sourceRoute::INFO::2015-06-08 15:02:14,379::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface rhevm is not a libvirt interface
sourceRoute::WARNING::2015-06-08 15:02:14,379::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/rhevm already removed
sourceRoute::DEBUG::2015-06-08 15:02:14,379::sourceroutethread::39::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1433775693
sourceRoute::INFO::2015-06-08 15:02:14,380::sourceroutethread::60::root::(process_IN_CLOSE_WRITE_filePath) interface rhevm is not a libvirt interface
sourceRoute::WARNING::2015-06-08 15:02:14,380::utils::129::root::(rmFile) File: /var/run/vdsm/trackedInterfaces/rhevm already removed
MainThread::DEBUG::2015-06-08 15:02:28,668::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2015-06-08 15:02:28,668::netconfpersistence::134::root::(_getConfigs) Non-existing config set.
MainThread::DEBUG::2015-06-08 15:02:28,668::vdsm-restore-net-config::60::root::(unified_restoration) Removing all networks ({}) and bonds ({}) in running config.
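Points 1) and 2) above can be checked with a small script instead of by hand. This is only a sketch: the two paths are the ones quoted in this comment, while the helper names is_dangling_symlink and missing_ifcfg are made up for illustration and are not part of vdsm or ovirt-node.

```python
import os


def is_dangling_symlink(path):
    """True if path is a symlink whose target does not exist."""
    # os.path.exists() follows symlinks, so it is False for a broken link.
    return os.path.islink(path) and not os.path.exists(path)


def missing_ifcfg(names, scripts_dir="/etc/sysconfig/network-scripts"):
    """Return the names that have no ifcfg-<name> file in scripts_dir."""
    return [name for name in names
            if not os.path.isfile(os.path.join(scripts_dir, "ifcfg-" + name))]


if __name__ == "__main__":
    # Paths and interface names as quoted in this comment.
    netconf = "/var/lib/vdsm/persistence/netconf"
    print("netconf dangling:", is_dangling_symlink(netconf))
    print("missing ifcfg:", missing_ifcfg(["eth0", "rhevm"]))
```

On an affected host one would expect the first check to report True and the second to list both names, matching what is described in points 1) and 2).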
Created attachment 1036455
vdsm logs
Note that this bug affects rhevh 6.6 for 3.5.3. If it still affects RHEVH 6.7 for rhev 3.5.4, let's consider it a blocker. Thanks.
Still has the same issue on rhev-hypervisor6-6.7-20150609.0.iso.
Looking at the logs, this does not seem like VDSM's fault. At least not the network part of it. So I can already say that this will probably not be solved in 3.5.4.

But I do see that vdsm-reg is failing when trying to create a bridge. It fails because libvirt is down. Last time I saw this on rhev-h, libvirt refused to come up if there were no interfaces with an IP to bind to (not sure if lo was enough for it).

I think vdsm-reg tried to connect to the engine although the bridge creation failed.

Douglas, can you please take a look at /var/log/vdsm-reg/vdsm-reg.log?
Nice findings. Maybe bug 1235350 and the vdsm part help to improve this. But maybe Douglas will also find another reason why libvirtd does not come up.
(In reply to Ido Barkan from comment #8)
> looking at the logs, this does not seem like VDSM's fault. At least not the
> network part of it. So I can already say that this will probably not be
> solved in 3.5.4.
>
> But I do see that vdsm-reg is failing when trying to create a bridge. It
> fails because libvirt is down. Last time I saw this on rhev-h, libvirt
> refused to go up if there were no interfaces with IP to bind to (not sure if
> lo was enough for it).
>
> I think vdsm-reg tried to connect to the engine although the bridge creation
> failed.
>
> Douglas can you please take a look at /var/log/vdsm-reg/vdsm-reg.log ?

Can the ONBOOT=no due to the other bug affect this?
MainThread::DEBUG::2015-06-08 09:25:40,055::deployUtil::453::root::_getMGTIface IP=10.8.51.171 strIface=em1
MainThread::DEBUG::2015-06-08 09:25:40,056::deployUtil::1059::root::makeBridge found the following bridge paramaters: ['BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes']
MainThread::DEBUG::2015-06-08 09:25:40,057::deployUtil::140::root::['/usr/share/vdsm/addNetwork', 'rhevm', '', '', 'em1', 'BOOTPROTO=dhcp', 'IPV6INIT=no', 'IPV6_AUTOCONF=no', 'ONBOOT=yes', 'PEERNTP=yes', 'blockingdhcp=true']
MainThread::DEBUG::2015-06-08 09:25:50,799::deployUtil::149::root::
MainThread::DEBUG::2015-06-08 09:25:50,803::deployUtil::150::root::libvirt: XML-RPC error : Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

To me it smells like another manifestation of bug 1235591.

(In reply to Yaniv Dary from comment #10)
> > Douglas can you please take a look at /var/log/vdsm-reg/vdsm-reg.log ?
>
> Can the ONBOOT=no due to the other bug affect this?

It does not seem related at all - the error above takes place before vdsm has the chance to write ifcfg files.
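To spot this failure in a large vdsm-reg.log, the lines can be split on "::" and filtered. The sketch below assumes the seven-field layout (thread, level, timestamp, module, line number, logger, message) inferred from the excerpts in this bug; it is not an official description of the vdsm log format, and parse_vdsm_line and find_libvirt_socket_errors are hypothetical helper names.

```python
from collections import namedtuple

# Field layout inferred from the log excerpts in this bug (an assumption).
LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message")


def parse_vdsm_line(line):
    """Parse one '::'-separated log line; return None if it does not fit."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    return LogRecord(*parts)


def find_libvirt_socket_errors(lines):
    """Return records whose message reports a failed libvirt socket connect."""
    records = (parse_vdsm_line(line) for line in lines)
    return [r for r in records
            if r is not None and "Failed to connect socket" in r.message]


# Two lines copied from the vdsm-reg.log excerpt above.
sample = [
    "MainThread::DEBUG::2015-06-08 09:25:40,055::deployUtil::453::root::"
    "_getMGTIface IP=10.8.51.171 strIface=em1",
    "MainThread::DEBUG::2015-06-08 09:25:50,803::deployUtil::150::root::"
    "libvirt: XML-RPC error : Failed to connect socket to "
    "'/var/run/libvirt/libvirt-sock': No such file or directory",
]
hits = find_libvirt_socket_errors(sample)
for rec in hits:
    print(rec.timestamp, rec.module, rec.lineno, rec.message)
```

Matching on the message text rather than the level is deliberate here, since the libvirt error in this log is emitted at DEBUG level.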
The nasty thing about bug 1235591 is that we cannot reproduce it anymore, and thus no fix was introduced for it.
Now, with the libvirtd upstart script, if libvirtd crashes on el6 it respawns quickly, so that might fix it. Crashes are never intentional, though, and we need to figure out why it happened; but we can't proceed without seeing the issue and understanding why libvirtd stopped. Again, I would suggest enabling the libvirt debug log to check that once we are able to reproduce it.

I don't see it with the latest image I checked. Closing this bug. If this issue comes up again, please re-open it.
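As an illustration of why a quick respawn could hide the problem: a caller that waits briefly for the libvirt socket to reappear would survive a short libvirtd restart. The sketch below is hypothetical and is not what vdsm-reg does today. The socket path is the one from the error in vdsm-reg.log, wait_for_path is a made-up helper, and the timeout values are arbitrary. It uses time.monotonic, so it needs Python 3.3 or later, not the Python shipped on el6.

```python
import os
import time


def wait_for_path(path, timeout=30.0, interval=0.5):
    """Poll until path exists or timeout seconds elapse.

    Returns True if the path appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Socket path taken from the libvirt error quoted in this bug.
    sock = "/var/run/libvirt/libvirt-sock"
    if wait_for_path(sock, timeout=10.0):
        print("libvirt socket is present")
    else:
        print("libvirt socket did not appear within the timeout")
```

This only checks that the socket file exists, not that libvirtd is actually answering on it, so it would complement the libvirt debug log rather than replace it.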
Moving to ON_QA to make sure this is tested.
Please provide acks, clone and move to ON_QA for testing.
This bug affects rhevh6,
Test version:
rhev-hypervisor7-7.2-20151025.0.el7ev
ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev.noarch
Red Hat Enterprise Virtualization Manager Version: 3.6.0-0.18.el6

Test steps:
1. Auto install rhev-hypervisor6-6.6-20150603.0 with the following parameters:
   BOOTIF=em1 storage_init=/dev/sda management_server=10.8.51.171:443 adminpw=4DHc2Jl0D05xk firstboot
2. After the installation finishes, wait 5 minutes before logging in to rhevh.
3. Log in to rhevh and check the IP address.
4. Bring rhevh up in rhevm3.6.0.

Test result:
1. After step 4, rhevh can be brought up in rhevm3.6.0.

So this issue is fixed in ovirt-node-3.3.0-0.18.20151022git82dc52c.el7ev.noarch. Changing the status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html