Bug 979437

Summary: rhevm-setup fails and drops network
Product: Red Hat Enterprise Virtualization Manager Reporter: James Laska <jlaska>
Component: ovirt-engine-setup Assignee: Alex Lourie <alourie>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Pavel Stehlik <pstehlik>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: aberezin, acathrow, alourie, aweiteka, bazulay, iheim, jkt, jturner, Rhev-m-bugs, sbonazzo
Target Milestone: --- Keywords: Regression, Triaged
Target Release: 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: integration
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-09-24 07:32:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                     Flags
ovirt-installer-logs.tgz        none
ip a                            none
ip a # From a working system    none

Description James Laska 2013-06-28 14:05:15 UTC
Created attachment 766580 [details]
ovirt-installer-logs.tgz

Description of problem:

Running rhevm-setup (with allinone plugin installed) causes network to drop and rhevm-setup to fail.


Version-Release number of selected component (if applicable):
 * qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.src.rpm
 * rhev-guest-tools-iso-3.2-8.src.rpm
 * rhevm-3.2.0-11.33.el6ev.src.rpm
 * rhevm-cli-3.2.0.9-1.el6ev.src.rpm
 * rhevm-doc-3.2.0-4.el6eng.src.rpm
 * rhevm-image-uploader-3.2.2-2.el6ev.src.rpm
 * rhevm-iso-uploader-3.2.2-2.el6ev.src.rpm
 * rhevm-log-collector-3.2.2-3.el6ev.src.rpm
 * rhevm-sdk-3.2.0.11-1.el6ev.src.rpm
 * rhevm-spice-client-3.2-10.el6ev.src.rpm


How reproducible:
 * Happens every time with rhev-3.2
 * Never happens with rhev-3.1


Steps to Reproduce:
1. Install packages ...

> # yum install rhevm rhevm-setup-plugin-allinone

2. Prepare answer file...

> # cat << EOF > setup.cfg
> [general]
> OVERRIDE_HTTPD_CONFIG=yes
> HTTP_PORT=80
> HTTPS_PORT=443
> RANDOM_PASSWORDS=no
> MAC_RANGE=00:1A:4A:A8:7A:00-00:1A:4A:A8:7A:FF
> HOST_FQDN=ibm-x3250m4-06.lab.bos.redhat.com
> AUTH_PASS=redhat
> ORG_NAME=lab.bos.redhat.com
> DC_TYPE=NFS
> DB_REMOTE_INSTALL=local
> DB_HOST=
> DB_PORT=5432
> DB_ADMIN=postgres
> DB_REMOTE_PASS=redhat
> DB_SECURE_CONNECTION=no
> DB_LOCAL_PASS=redhat
> NFS_MP=/var/lib/exports/iso
> ISO_DOMAIN_NAME=ISO_DOMAIN
> CONFIG_NFS=yes
> OVERRIDE_IPTABLES=yes
> OVERRIDE_FIREWALL=iptables
> CONFIG_ALLINONE=yes
> STORAGE_PATH=/data
> SUPERUSER_PASS=redhat
> EOF

3. Setup RHEV

> # rhevm-setup --answer-file setup.cfg

Actual results:

> # rhevm-setup --answer-file setup.cfg
> Welcome to RHEV Manager setup utility
>  Warning: Weak Password.
> Warning: Weak Password.
> 
> Installing:
> AIO: Validating CPU Compatibility...                              [ DONE ]
> AIO: Adding firewall rules...                                     [ DONE ]
> Configuring RHEV Manager...                                       [ DONE ]
> Configuring JVM...                                                [ DONE ]
> Creating CA...                                                    [ DONE ]
> Updating ovirt-engine service...                                  [ DONE ]
> Setting Database Configuration...                                 [ DONE ]
> Setting Database Security...                                      [ DONE ]
> Creating Database...                                              [ DONE ]
> Updating the Default Data Center Storage Type...                  [ DONE ]
> Editing RHEV Manager Configuration...                             [ DONE ]
> Editing Postgresql Configuration...                               [ DONE ]
> Configuring the Default ISO Domain...                             [ DONE ]
> Configuring Firewall...                                           [ DONE ]
> Starting ovirt-engine Service...                                  [ DONE ]
> Configuring HTTPD...                                              [ DONE ]
> AIO: Creating storage directory...                                [ DONE ]
> AIO: Adding Local Datacenter and cluster...                       [ DONE ]
> AIO: Adding Local host (This may take several minutes)...      [ ERROR ]
> [ERROR]::oVirt API connection failure, [Errno -2] Name or service not known
> Please check log file /var/log/ovirt-engine/engine-setup_2013_06_28_08_52_50.log for more information
> Write failed: Broken pipe
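
The `[Errno -2] Name or service not known` text is what a failed hostname lookup looks like on Linux, which is consistent with the host losing name resolution while setup was running. A minimal sketch, assuming Python 3 is available on the host, for checking whether the HOST_FQDN from the answer file resolves. The `can_resolve` helper and its `resolver` parameter are hypothetical names used for illustration; they are not part of rhevm-setup.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if the lookup fails.

    `resolver` is injectable only so the check can be exercised offline.
    """
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # socket.gaierror is the exception behind messages such as
        # "[Errno -2] Name or service not known".
        return False
```

Calling `can_resolve("ibm-x3250m4-06.lab.bos.redhat.com")` on the affected host before and after the failure would show whether name resolution is what breaks; it does not say why.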

Expected results:

rhevm-setup should complete successfully and the network connection should stay up.

Additional info:

See attached system logs:

> -rw------- root/root    183591 2013-06-28 09:46 var/log/messages
> drwxr-xr-x ovirt/ovirt       0 2013-06-28 09:35 var/log/ovirt-engine/
> -rw-r--r-- ovirt/ovirt   87650 2013-06-28 09:47 var/log/ovirt-engine/engine.log
> -rw-r--r-- root/root         0 2013-06-28 08:54 var/log/ovirt-engine/engine-config.log
> -rw-r----- ovirt/ovirt       0 2013-06-28 08:54 var/log/ovirt-engine/console.log
> -rw-r--r-- root/root     21927 2013-06-28 08:54 var/log/ovirt-engine/engine-db-install-2013_06_28_08_53_02.log
> -rw-r--r-- ovirt/ovirt   25134 2013-06-28 08:55 var/log/ovirt-engine/server.log
> -rw-r--r-- ovirt/ovirt     430 2013-06-28 08:54 var/log/ovirt-engine/boot.log
> drwxr-xr-x ovirt/ovirt       0 2013-06-28 08:55 var/log/ovirt-engine/host-deploy/
> -rw-r--r-- ovirt/ovirt  193237 2013-06-28 08:55 var/log/ovirt-engine/host-deploy/ovirt-20130628085542-ibm-x3250m4-06.lab.bos.redhat.com-38137e4.log
> -rw-r--r-- root/root    133715 2013-06-28 09:05 var/log/ovirt-engine/engine-setup_2013_06_28_08_52_50.log
> drwxr-xr-x ovirt/ovirt       0 2013-06-18 14:10 var/log/ovirt-engine/notifier/
> drwxr-xr-x vdsm/kvm          0 2013-06-28 08:55 var/log/vdsm/
> drwxr-xr-x vdsm/kvm          0 2013-05-27 11:00 var/log/vdsm/backup/
> -rw-r--r-- root/root       898 2013-06-28 08:55 var/log/vdsm/supervdsm.log
> -rw-r--r-- vdsm/kvm          0 2013-06-28 08:55 var/log/vdsm/metadata.log
> -rw-r--r-- vdsm/kvm      21405 2013-06-28 08:56 var/log/vdsm/vdsm.log

Comment 1 Alex Lourie 2013-07-17 11:41:36 UTC
@James

1. The AIO failed to finish the installation because

2. The network goes through some problems:


Jun 28 08:55:42 ibm-x3250m4-06 kernel: Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
Jun 28 08:55:42 ibm-x3250m4-06 kernel: bonding: bond4 is being created...
Jun 28 08:55:42 ibm-x3250m4-06 kernel: bonding: bond1 is being created...
Jun 28 08:55:42 ibm-x3250m4-06 kernel: bonding: bond2 is being created...
Jun 28 08:55:42 ibm-x3250m4-06 kernel: bonding: bond3 is being created...
Jun 28 08:55:43 ibm-x3250m4-06 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Jun 28 08:55:43 ibm-x3250m4-06 kernel: 8021q: adding VLAN 0 to HW filter on device eth1
Jun 28 08:55:43 ibm-x3250m4-06 multipathd: force queue_without_daemon (operator)
Jun 28 08:55:43 ibm-x3250m4-06 multipathd: --------shut down-------
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: table: 253:2: multipath: error getting device
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: ioctl: error adding target to table
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: table: 253:2: multipath: error getting device
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: ioctl: error adding target to table
Jun 28 08:55:43 ibm-x3250m4-06 multipathd: 1ATA_ST500NM0011_39M4517_42C0468IBM_Z1M11JGD: ignoring map
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: table: 253:2: multipath: error getting device
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: ioctl: error adding target to table
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: table: 253:2: multipath: error getting device
Jun 28 08:55:43 ibm-x3250m4-06 kernel: device-mapper: ioctl: error adding target to table
Jun 28 08:55:43 ibm-x3250m4-06 multipathd: 1ATA_ST500NM0011_39M4517_42C0468IBM_Z1M12LMF: ignoring map
Jun 28 08:55:43 ibm-x3250m4-06 multipathd: path checkers start up
Jun 28 08:55:45 ibm-x3250m4-06 dhclient[15585]: DHCPDISCOVER on usb0 to 255.255.255.255 port 67 interval 8 (xid=0x1fe6e1e4)
Jun 28 08:55:45 ibm-x3250m4-06 dhclient[15585]: DHCPOFFER from 169.254.95.118


Looks like from that moment on, there's no network on the system, hence AIO fails and SSH connection drops.
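
The excerpt above ends with dhclient on usb0 receiving a DHCP offer from 169.254.95.118, which is a link-local address. Whether that offer is related to the outage is not established in this report. As a rough sketch, assuming the same syslog line format as the excerpt, the following Python flags such offers in /var/log/messages; `link_local_offers` and `OFFER_RE` are hypothetical names introduced here.

```python
import ipaddress
import re

# Matches dhclient lines such as:
#   "... dhclient[15585]: DHCPOFFER from 169.254.95.118"
OFFER_RE = re.compile(r"dhclient\[\d+\]: DHCPOFFER from (\d{1,3}(?:\.\d{1,3}){3})")


def link_local_offers(lines):
    """Return DHCP offer source addresses that fall in 169.254.0.0/16."""
    hits = []
    for line in lines:
        match = OFFER_RE.search(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # skip strings that are not valid IPv4 addresses
        if addr.is_link_local:
            hits.append(str(addr))
    return hits


# The two dhclient lines from the log excerpt in this comment.
sample = [
    "Jun 28 08:55:45 ibm-x3250m4-06 dhclient[15585]: DHCPDISCOVER on usb0 "
    "to 255.255.255.255 port 67 interval 8 (xid=0x1fe6e1e4)",
    "Jun 28 08:55:45 ibm-x3250m4-06 dhclient[15585]: DHCPOFFER from 169.254.95.118",
]
print(link_local_offers(sample))
```

To scan a real log, pass an open file object instead of `sample`, for example `link_local_offers(open("/var/log/messages"))`.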

Can you please provide more details on the system's hardware? Can you try running the setup on a different machine?

Thanks.

Comment 2 James Laska 2013-07-22 12:43:41 UTC
Thanks for your feedback!  

> Can you please provide more details on the system's hardware?

Certainly. Every system I've verified the failure on has:
 * RAM: 16G
 * DISK: 2T
 * CPU: 1 x Intel(R) Xeon(R) CPU E3-1220 V2 (quad-core)
 * NIC: Ethernet 1 Intel 82574L Ethernet Controller

> Can you try running the setup on a different machine?

I've reproduced this problem on several different systems in beaker.  The problem occurs while rhevm-setup is running, before it completes.  No other workflow is running on the system.  Could the all-in-one configuration package be a contributing cause?

Comment 3 Alex Lourie 2013-07-22 14:17:45 UTC
Hi James

Thanks for the info.

Would you mind please showing me the output of the 'ip a' command?

Thanks.

Comment 4 James Laska 2013-07-22 20:09:51 UTC
Created attachment 777051 [details]
ip a

(In reply to Alex Lourie from comment #3)
> Would you mind please showing me the output of the 'ip a' command?

Not at all, please see the attached output.

Comment 5 James Laska 2013-07-31 13:40:02 UTC
Interestingly enough, I believe I have two classes of hardware that I've tested rhevm-setup on.  One class works; the other doesn't, and triggered this bug report.

== Works ==
 * https://beaker.engineering.redhat.com/view/qeblade40.rhq.lab.eng.bos.redhat.com
>   # lspci | grep Ethernet
>   03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>   03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>   04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>   04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
>   08:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
>   08:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
>   09:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
>   09:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)

== Doesn't work ==
 * https://beaker.engineering.redhat.com/view/ibm-x3250m4-01.lab.bos.redhat.com
>   # lspci | grep Ethernet
>   06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>   0b:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
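
For comparing more systems across the two classes, the `lspci | grep Ethernet` output can be tallied by controller model. This is only an illustrative Python sketch that assumes the line format shown above; `nic_models` is a hypothetical helper name.

```python
from collections import Counter


def nic_models(lspci_output):
    """Count Ethernet controller models in `lspci | grep Ethernet` output."""
    models = Counter()
    for line in lspci_output.splitlines():
        # Each line looks like "<slot> Ethernet controller: <model>".
        _, sep, model = line.partition("Ethernet controller: ")
        if sep:
            models[model.strip()] += 1
    return models


# Output from the failing system listed above.
failing = """\
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0b:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
"""
print(nic_models(failing))
```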

Comment 6 James Laska 2013-07-31 13:40:55 UTC
Created attachment 781135 [details]
ip a # From a working system

Attaching `ip a` output from qeblade40, a system that runs rhevm-setup with no difficulty.

Comment 7 James Laska 2013-07-31 17:10:57 UTC
> 10:37:04  alourie: is there a way you add this problematic machine as a host to another engine?
> 10:37:21  alourie: I want to see whether it gets configured correctly

At the suggestion of alourie, I deployed a RHEL host on the same hardware used to trigger this bug and added it as a host to an existing RHEV-M instance.  I didn't encounter any issues.  Note that the all-in-one setup didn't allow me to add the new host to the 'local_datacenter', so I added it to the 'Default' datacenter instead.

The RHEV-M system is available for inspection at https://qeblade40.rhq.lab.eng.bos.redhat.com (admin/redhat)

Comment 8 Alex Lourie 2013-08-01 13:06:07 UTC
@james

Was this the ibm-3250 machine that you added to the engine?

Comment 9 James Laska 2013-08-01 13:57:27 UTC
(In reply to Alex Lourie from comment #8)
> Was this the ibm-3250 machine that you added to the engine?

It was, yes.

Comment 10 Sandro Bonazzola 2013-08-06 11:05:18 UTC
Can you attach host-deploy, engine and server logs?

Comment 11 Alex Lourie 2013-08-07 08:44:58 UTC
Sandro

The logs are attached as an archive file.

Comment 12 James Laska 2013-08-22 16:40:39 UTC
Any updates?  We seem to be able to trigger this problem without difficulty on a specific class of hardware.

Comment 13 Alex Lourie 2013-08-25 22:03:49 UTC
James

We are still investigating.

Comment 16 Alex Lourie 2013-09-01 20:50:55 UTC
James

Could you please test it with the latest 3.3 build? We changed a lot of logic in the code for that version, and I want to know whether it works on this system.

Thanks.

Comment 19 Alex Lourie 2013-09-23 15:08:42 UTC
Suggesting to close this issue due to missing information.

Comment 20 Sandro Bonazzola 2013-09-24 07:32:24 UTC
(In reply to Alex Lourie from comment #19)
> Suggesting to close this issue due to missing information.

I agree. Please reopen if you're able to reproduce the issue with the latest builds.