Bug 1546839 - SHE Node Zero deployment over Gluster failed on RHEL7.5.
Summary: SHE Node Zero deployment over Gluster failed on RHEL7.5.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1534483 1538630
Blocks: 1539734
 
Reported: 2018-02-19 18:11 UTC by Nikolai Sednev
Modified: 2018-04-01 10:07 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-03-29 11:02:55 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+


Attachments
engine logs (9.23 MB, application/x-xz)
2018-02-19 18:11 UTC, Nikolai Sednev
alma03 logs (9.64 MB, application/x-xz)
2018-02-19 18:12 UTC, Nikolai Sednev

Description Nikolai Sednev 2018-02-19 18:11:40 UTC
Created attachment 1397975 [details]
engine logs

Description of problem:
I used part of the engine's FQDN as the engine admin password.
Ansible deployment of SHE over Gluster on RHEL7.5 with these components failed:

[ INFO  ] TASK [Get ovirtmgmt route table id]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, "cmd": "ip rule list | grep ovirtmgmt | sed s/\\\\[.*\\\\]\\ //g | awk '{ print $9 }'", "delta": "0:00:00.011787", "end": "2018-02-19 19:53:09.673841", "rc": 0, "start": "2018-02-19 19:53:09.662054", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180219195317.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180219193322-xb8oaq.log
[root@alma03 ~]# date
Mon Feb 19 19:54:11 IST 2018

Version-Release number of selected component (if applicable):
ovirt-engine-setup-4.2.1.5-0.1.el7.noarch
ovirt-hosted-engine-ha-2.2.5-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.10-1.el7ev.noarch
rhvm-appliance.noarch 2:4.2-20180202.0.el7
Red Hat Enterprise Linux Server release 7.5 Beta (Maipo)
Linux 3.10.0-829.el7.x86_64 #1 SMP Tue Jan 9 23:06:01 EST 2018 x86_64 x86_64 x86_64 GNU/Linux

alma03 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.35.95.254    0.0.0.0         UG    100    0        0 enp5s0f0
10.35.92.0      0.0.0.0         255.255.252.0   U     100    0        0 enp5s0f0
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0

alma03 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
virbr0          8000.525400a85bc3       yes             virbr0-nic
                                                        vnet0
alma03 ~]# ip rule list
0:      from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 

alma03 ~]# arp -a
vm-93-254.qa.lab.tlv.redhat.com (10.35.95.254) at 9c:cc:83:52:81:60 [ether] on enp5s0f0
alma04.qa.lab.tlv.redhat.com (10.35.92.4) at a0:36:9f:3b:16:7c [ether] on enp5s0f0
nsednev-he-1.qa.lab.tlv.redhat.com (192.168.122.167) at 00:16:3e:7b:b8:53 [ether] on virbr0
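
For reference, the check that the failing [Get ovirtmgmt route table id] task retries is the pipeline quoted in the error above; it returns nothing here because, as the diagnostics show, ip rule list has no ovirtmgmt entry and brctl show lists no ovirtmgmt bridge:

alma03 ~]# ip rule list | grep ovirtmgmt | sed s/\\[.*\\]\ //g | awk '{ print $9 }'
alma03 ~]#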

nsednev-he-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.122.1   0.0.0.0         UG    100    0        0 eth0
192.168.122.0   0.0.0.0         255.255.255.0   U     100    0        0 eth0

nsednev-he-1 ~]# ip rule list
0:      from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 

nsednev-he-1 ~]# arp -a
gateway (192.168.122.1) at 52:54:00:a8:5b:c3 [ether] on eth0

How reproducible:
100%

Steps to Reproduce:
1. Deploy SHE Node 0 over Gluster on RHEL7.5, using part of the SHE FQDN as the engine admin password, e.g. for the FQDN nsednev-he-1.qa.lab.tlv.redhat.com set the admin password to nsednev (see the sketch below).
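
A minimal sketch of the invocation, for illustration only; hosted-engine --deploy is the standard setup entry point, but the prompt wording below is paraphrased rather than taken from the logs:

alma03 ~]# hosted-engine --deploy
          (answer the interactive questions, selecting Gluster storage and
           giving the engine VM FQDN nsednev-he-1.qa.lab.tlv.redhat.com and
           the admin password nsednev)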

Actual results:
Deployment failed with the errors described above.

Expected results:
Deployment should succeed.

Additional info:
sosreports from the engine and the host are attached.

Comment 1 Nikolai Sednev 2018-02-19 18:12:48 UTC
Created attachment 1397976 [details]
alma03 logs

Comment 2 Nikolai Sednev 2018-02-19 18:44:21 UTC
The problem disappeared after running these commands on the host prior to deployment:
systemctl stop NetworkManager
systemctl restart network

Much the same as was done in https://bugzilla.redhat.com/show_bug.cgi?id=1540451#c16.

Comment 3 Nikolai Sednev 2018-02-20 09:59:04 UTC
I was able to successfully deploy Node 0 over Gluster on RHEL7.5 after stopping NetworkManager and restarting the network on the host before starting the SHE deployment.

Comment 4 Simone Tiraboschi 2018-02-20 10:05:10 UTC
This seems relevant:
Feb 19 19:44:53 alma03 vdsm-tool: Traceback (most recent call last):
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/bin/vdsm-tool", line 219, in main
Feb 19 19:44:53 alma03 vdsm-tool: return tool_command[cmd]["command"](*args)
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/tool/network.py", line 94, in dump_bonding_options
Feb 19 19:44:53 alma03 vdsm-tool: sysfs_options_mapper.dump_bonding_options()
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 50, in dump_bonding_options
Feb 19 19:44:53 alma03 vdsm-tool: jdump(_get_bonding_options_name2numeric(), f)
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 90, in _get_bonding_options_name2numeric
Feb 19 19:44:53 alma03 vdsm-tool: with _bond_device(bond_name, mode):
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
Feb 19 19:44:53 alma03 vdsm-tool: return self.gen.next()
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 102, in _bond_device
Feb 19 19:44:53 alma03 vdsm-tool: _change_mode(bond_name, mode)
Feb 19 19:44:53 alma03 vdsm-tool: File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 112, in _change_mode
Feb 19 19:44:53 alma03 vdsm-tool: opt.write(mode)
Feb 19 19:44:53 alma03 vdsm-tool: IOError: [Errno 22] Invalid argument
Feb 19 19:44:53 alma03 NetworkManager[579]: <info>  [1519062293.6920] manager: (bondscan-irTVhQ): new Bond device (/org/freedesktop/NetworkManager/Devices/29)
Feb 19 19:44:53 alma03 iscsid: iSCSI daemon with pid=93652 started!
Feb 19 19:44:53 alma03 systemd: vdsm-network-init.service: main process exited, code=exited, status=1/FAILURE
Feb 19 19:44:53 alma03 systemd: Failed to start Virtual Desktop Server Manager network IP+link restoration.
Feb 19 19:44:53 alma03 systemd: Dependency failed for Virtual Desktop Server Manager network restoration.
Feb 19 19:44:53 alma03 systemd: Dependency failed for Virtual Desktop Server Manager.
Feb 19 19:44:53 alma03 systemd: Dependency failed for MOM instance configured for VDSM purposes.
Feb 19 19:44:53 alma03 systemd: Job mom-vdsm.service/start failed with result 'dependency'.
Feb 19 19:44:53 alma03 systemd: Job vdsmd.service/start failed with result 'dependency'.
Feb 19 19:44:53 alma03 systemd: Job vdsm-network.service/start failed with result 'dependency'.
Feb 19 19:44:53 alma03 systemd: Unit vdsm-network-init.service entered failed state.
Feb 19 19:44:53 alma03 systemd: vdsm-network-init.service failed.
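
For context, the traceback shows vdsm-tool's dump_bonding_options enumerating bonding options by creating a scratch bond (the bondscan-irTVhQ device in the NetworkManager line above) and writing each mode to sysfs; the failing opt.write(mode) corresponds roughly to the following, shown for illustration only with an assumed scratch bond name:

alma03 ~]# echo "+bondscan-test" > /sys/class/net/bonding_masters
alma03 ~]# echo "802.3ad" > /sys/class/net/bondscan-test/bonding/mode   # the analogous write here raised IOError: [Errno 22] Invalid argument
alma03 ~]# echo "-bondscan-test" > /sys/class/net/bonding_masters

The failure then cascades through systemd: vdsm-network-init.service fails, and vdsmd, vdsm-network and mom-vdsm fail on the dependency, as shown above.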

Comment 5 Dan Kenigsberg 2018-02-21 08:51:55 UTC
Please retry with a newer kernel.

Comment 6 Nikolai Sednev 2018-02-21 12:39:03 UTC
We still have the same kernel version as was reported in the description:
Linux 3.10.0-829.el7.x86_64 #1 SMP Tue Jan 9 23:06:01 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
Why has this bug been moved to ON_QA?
Did I miss something, or do we have an old kernel?

Comment 7 Simone Tiraboschi 2018-02-21 12:45:20 UTC
It should be >= -851 according to https://bugzilla.redhat.com/show_bug.cgi?id=1540451#c19.

-829 is not >= -851

Comment 8 Nikolai Sednev 2018-02-21 14:01:09 UTC
Works for me on these components:
ovirt-hosted-engine-ha-2.2.5-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-851.el7.x86_64 #1 SMP Mon Feb 12 07:53:52 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 Beta (Maipo)

Comment 9 Sandro Bonazzola 2018-03-29 11:02:55 UTC
This bug is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

