Bug 1364941 - lldpad.socket is not running after adding RHVH 4.0 to RHVM 4.0
Summary: lldpad.socket is not running after adding RHVH 4.0 to RHVM 4.0
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.18.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.0.5
Target Release: 4.18.13
Assignee: Dan Kenigsberg
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1353456
Blocks:
 
Reported: 2016-08-08 09:16 UTC by dguo
Modified: 2017-01-18 08:42 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 07:39:57 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.0.z+
rule-engine: planning_ack+
rule-engine: devel_ack+
acanan: testing_ack+


Attachments
Output of "journal -xe" (157.45 KB, text/x-vhdl), 2016-08-09 01:55 UTC, dguo
logs and 'journalctl -xe' output (17.52 MB, application/x-gzip), 2016-09-08 14:08 UTC, Elad
logs-11.9.16 (1.05 MB, application/x-gzip), 2016-09-11 08:44 UTC, Elad


Links
oVirt gerrit 62365 (target ovirt-4.0.2, ABANDONED): hook-fcoe: enable lldpad and fcoe service on boot. Last updated 2020-03-26 02:36:53 UTC.

Description dguo 2016-08-08 09:16:26 UTC
Description of problem:
lldpad.socket is not running after adding RHVH 4.0 to RHVM 4.0

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160803.3.x86_64
imgbased-0.7.4-0.1.el7ev.noarch
vdsm-4.18.10-1.el7ev.x86_64
Red Hat Virtualization Manager Version: 4.0.2.4-0.1.el7ev

How reproducible:
100%


Steps to Reproduce:
1. Install RHVH 4.0, then add it to RHVM 4.0
2. Check the status of the three services below on the RHVH host after it has been added to RHVM:
#systemctl status fcoe.service
#systemctl status lldpad.socket
#systemctl status lldpad.service


Actual results:
1. After step 1, lldpad.socket is not running.

Expected results:
1. After step 1, all three services should be running.


Additional info:
[root@dhcp-10-31 ~]# systemctl status fcoe.service
● fcoe.service - Open-FCoE Inititator.
   Loaded: loaded (/usr/lib/systemd/system/fcoe.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-08-05 17:13:03 CST; 2 days ago
 Main PID: 13900 (fcoemon)
   CGroup: /system.slice/fcoe.service
           └─13900 /usr/sbin/fcoemon --syslog

Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Starting Open-FCoE Inititator....
Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Started Open-FCoE Inititator..
[root@dhcp-10-31 ~]# systemctl status lldpad.socket
● lldpad.socket
   Loaded: loaded (/usr/lib/systemd/system/lldpad.socket; disabled; vendor preset: disabled)
   Active: inactive (dead)
   Listen: @/com/intel/lldpad (Datagram)
[root@dhcp-10-31 ~]# systemctl status lldpad.service
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-08-05 17:13:03 CST; 2 days ago
 Main PID: 13880 (lldpad)
   CGroup: /system.slice/lldpad.service
           └─13880 /usr/sbin/lldpad -t

Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Started Link Layer Discovery Protocol Agent Daemon..
Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Starting Link Layer Discovery Protocol Agent Daemon....
[root@dhcp-10-31 ~]# systemctl start lldpad.socket
Job for lldpad.socket failed. See "systemctl status lldpad.socket" and "journalctl -xe" for details.
Aug 08 14:26:36 dhcp-10-31.nay.redhat.com sshd[11587]: pam_unix(sshd:session): session opened for user root by (uid=0)
Aug 08 15:06:04 dhcp-10-31.nay.redhat.com polkitd[1116]: Registered Authentication Agent for unix-process:11935:25183438 (system bus name :1.79 [/usr/bin/pkttyagent --notify-fd 6 --fall
Aug 08 15:06:04 dhcp-10-31.nay.redhat.com systemd[1]: Socket service lldpad.service already active, refusing.
Aug 08 15:06:04 dhcp-10-31.nay.redhat.com systemd[1]: Failed to listen on lldpad.socket.
-- Subject: Unit lldpad.socket has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit lldpad.socket has failed.
--
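
A note on the failure above: systemd refuses to start a .socket unit while its matching .service is already running, which is exactly what "Socket service lldpad.service already active, refusing." reports. A minimal manual workaround, assuming nothing beyond standard systemctl behaviour (not a command sequence taken from this bug):

# stop the already-running service so the socket can bind
systemctl stop lldpad.service
# start the socket first, then the service; the socket now stays active
systemctl start lldpad.socket
systemctl start lldpad.service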

Comment 1 Dan Kenigsberg 2016-08-08 12:16:45 UTC
dguo, can you supply your `journalctl -xe`?

Ryan, have we not checked that these services are running after boot on ngn?

Comment 2 dguo 2016-08-09 01:55:43 UTC
Created attachment 1188992 [details]
Output of "journal -xe"

As requested, I have uploaded the output of `journalctl -xe`.

Comment 4 Red Hat Bugzilla Rules Engine 2016-08-09 10:04:35 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 5 Fabian Deutsch 2016-08-09 10:19:00 UTC
Dan, do you see the impact of this bug?
Might lldpad.service be started by the hook?

Comment 6 Yaniv Lavi 2016-08-09 15:35:40 UTC
It should be started by hook.

Comment 7 Ryan Barry 2016-08-10 12:06:52 UTC
(In reply to Dan Kenigsberg from comment #1)
> dguo, can you supply your `journalctl -xe`?
> 
> Ryan, have we not checked that these services are running after boot on ngn?

No, these are not running after installation.

I would think that the hook would start them. We can start these as part of node if that isn't the case, though.

Comment 8 Red Hat Bugzilla Rules Engine 2016-08-10 13:01:44 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 9 Dan Kenigsberg 2016-08-16 06:58:20 UTC
Elad, you have tested the hook on NGN, right? If you configure the networks to use fcoe, and reboot the system, do the services start properly?

Comment 10 Elad 2016-08-16 07:35:13 UTC
I did not use NGN in my tests.
After a reboot, with the networks configured for FCoE using the hook, the services started properly. The hook configured them to be enabled.

Comment 11 Dan Kenigsberg 2016-08-16 11:11:12 UTC
Elad, would you recheck that with a recent ovirt-ngn-4.0.2? This should be included in storage coverage tests.

If it fails, we'd need to quickly take https://gerrit.ovirt.org/#/c/62365/

Comment 12 Aharon Canan 2016-08-16 11:25:31 UTC
(In reply to Dan Kenigsberg from comment #11)
> Elad, would you recheck that with a recent ovirt-ngn-4.0.2? this should be
> included in storage coverage tests.
> 
> If it fails, we'd need to quickly take https://gerrit.ovirt.org/#/c/62365/

It will take us some time, as we are focusing on 4.0 GA; we will be able to recheck after that.
I see it is targeted to 4.0.4 anyway...

Comment 13 Dan Kenigsberg 2016-08-16 14:59:38 UTC
Aharon, testing FCoE should be an integral part of 4.0 GA testing. Can you make sure that this is so?

Comment 14 Aharon Canan 2016-08-17 07:17:09 UTC
(In reply to Dan Kenigsberg from comment #13)
> Aharon, testing FCoE should be an integral part of 4.0 GA testing. Can you
> make sure that this is so?

We already tested it with 4.0. As this is manual testing, like other features, I am not sure we will rerun.
If we do, we will of course consider this issue.

Comment 15 Dan Kenigsberg 2016-08-17 09:48:22 UTC
Aharon, if FCoE on NGN was tested I'm cool. But this bug suggests that we actually have a problem, and Elad said

(In reply to Elad from comment #10)
> I Did not use  NGN in my tests.

so I'm a bit confused.

Comment 16 Aharon Canan 2016-08-17 10:31:43 UTC
We tested FCoE exactly as we all agreed; no one asked for NGN back then.
Everyone also confirmed the verification (detailed verification information in [1]).

Testing against NGN wasn't part of the plan.
As for now, this issue is targeted to 4.0.4 and will be tested as part of 4.0.4 testing.

If you need it for 4.0 GA, please set the relevant target versions and let's scrub it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1334745#c2

I am not sure, but please check whether https://bugzilla.redhat.com/show_bug.cgi?id=1353456 is related (you asked about it in comment #9).

Comment 17 Dan Kenigsberg 2016-08-17 11:01:24 UTC
(In reply to Aharon Canan from comment #16)
> Testing against NGN wasn't part of the plan.

it should be.

> As for now, this issue is targeted to 4.0.4 and will be tested as part of
> 4.0.4 testing.
> 
> If you need it for 4.0 GA please set the relevant target versions and lets
> scrub it.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1334745#c2
> 
> Not sure but please check if
> https://bugzilla.redhat.com/show_bug.cgi?id=1353456 is related (You asked
> about it on comment #9)

Yes, I suspect that we have a regression in this regard. On vintage Node, the services were running on boot; I suspect they would not run on NGN.

Comment 18 Yaniv Lavi 2016-08-21 11:32:20 UTC
We should align to the same method as in RHEL. To be tested with BZ #1353456 on NGN.

Comment 19 Elad 2016-09-07 13:20:52 UTC
I have the 4.0 RHV-H (rhvh-4.0-0.20160829.0) and I'm trying to update vdsm with 'yum update vdsm', but nothing gets updated, although I have the right repos (which work on RHEL 7.2 hosts) and the installed vdsm is not the latest (vdsm-4.18.11-1.el7ev.x86_64).


Fabian/Dan, this is currently blocking me from testing the fix for verification; can you please assist?

Thanks

Comment 20 Ryan Barry 2016-09-07 13:25:26 UTC
Please show "yum repolist rhev-4.0.4-1"

Comment 21 Elad 2016-09-07 13:35:00 UTC
repo id                                                                                                 repo name                                                                     status
rhev-4.0.4-1/7RedHatVirtualizationHost                                                                  RHEV 4.0.4-1                                                                  disabled
repolist: 0

Comment 22 Yaniv Lavi 2016-09-08 07:07:18 UTC
Please use the image that contains the VDSM version you need; do not install it manually on RHV-H.

Comment 23 Elad 2016-09-08 07:23:50 UTC
Yaniv, following an offline discussion in mail (you were cc'd), Dan told us to test this on RHV-H.

Comment 24 Dan Kenigsberg 2016-09-08 09:13:13 UTC
Yes, we should test FCoE on RHV-H, but on RHV-H you should not use yum. You should have everything preinstalled.

Comment 25 Elad 2016-09-08 14:03:18 UTC
Confirmed on RHV-H: fcoe.service and lldpad.service remain disabled after setting up networks with the fcoe hook.


Thread-258::INFO::2016-09-08 16:50:13,654::xmlrpc::91::vds.XMLRPCServer::(_process_requests) Request handler for ::1:48764 stopped
jsonrpc.Executor/4::DEBUG::2016-09-08 16:50:17,274::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.setupNetworks' in bridge with {u'bondings': {}, u'networks': {u'fcoe
2': {u'ipv6autoconf': True, u'nic': u'em2_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}, u'fcoe1'
: {u'ipv6autoconf': True, u'nic': u'em1_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}}, u'options
': {u'connectivityCheck': u'true', u'connectivityTimeout': 120}}


[root@green-vdsd yum.repos.d]# systemctl is-enabled fcoe
disabled
[root@green-vdsd yum.repos.d]# systemctl is-enabled lldpad
disabled

====================================================================
Re-opening.


Used:
rhvh-4.0-0.20160829.0
vdsm-4.18.13-1.el7ev.x86_64
vdsm-hook-fcoe-4.18.13-1.el7ev.noarch
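
For reference, the state of all three units can be checked in one shot; systemctl accepts multiple unit names per invocation (standard usage, not commands taken from this bug):

# one line of output per unit
systemctl is-enabled fcoe.service lldpad.service lldpad.socket
systemctl is-active fcoe.service lldpad.service lldpad.socket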

Comment 26 Elad 2016-09-08 14:08:44 UTC
Created attachment 1199121 [details]
logs and 'journalctl -xe' output

Comment 27 Dan Kenigsberg 2016-09-08 18:32:03 UTC
Could you add vdsm.log from the times where setupNetworks was called?

Comment 28 Fabian Deutsch 2016-09-10 10:31:40 UTC
A quick note from looking at this:
On upstream Node with vdsm-4.18.11-1 and vdsm-hook-fcoe-4.18.11-1 installed, the 85-vdsm-hook-fcoe.preset file is _not_ installed.
Also rpm -ql vdsm-hook-fcoe does not list the preset file.
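
For context, a systemd preset file is a plain list of enable/disable rules that `systemctl preset` applies. A minimal sketch of what 85-vdsm-hook-fcoe.preset would be expected to contain, assuming it follows the standard preset format (the actual contents are not quoted in this bug):

# /usr/lib/systemd/system-preset/85-vdsm-hook-fcoe.preset (hypothetical contents)
enable lldpad.service
enable lldpad.socket
enable fcoe.service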

Comment 29 Elad 2016-09-11 08:44:40 UTC
Created attachment 1199868 [details]
logs-11.9.16

2016-09-11 11:41:32,244 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-10) [7cb11fc7] START, HostSetupNetworksVDSCommand(HostName = green-vdsd, HostSetupNetworksVdsCommandParameters:{runAsync='true', hostId='10de04ce-dd52-4b76-9e57-86e4a46e9c53', vds='Host[green-vdsd,10de04ce-dd52-4b76-9e57-86e4a46e9c53]', rollbackOnFailure='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='false', bonding='false', networkName='fcoe2', nicName='em2_1', vlan='null', mtu='0', vmNetwork='false', stp='false', properties='[fcoe=enable=yes,dcb=yes,auto_vlan=yes]', ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='AUTOCONF', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', switchType='LEGACY'},
                HostNetwork:{defaultRoute='false', bonding='false', networkName='fcoe1', nicName='em1_1', vlan='null', mtu='0', vmNetwork='false', stp='false', properties='[fcoe=enable=yes,dcb=yes,auto_vlan=yes]', ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='AUTOCONF', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', switchType='LEGACY'}]', removedNetworks='[]', bonds='[]', removedBonds='[]'}), log id: 5dc91e6d



jsonrpc.Executor/1::DEBUG::2016-09-11 11:41:32,248::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.setupNetworks' in bridge with {u'bondings': {}, u'networks': {u'fcoe
2': {u'ipv6autoconf': True, u'nic': u'em2_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}, u'fcoe1'
: {u'ipv6autoconf': True, u'nic': u'em1_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}}, u'options
': {u'connectivityCheck': u'true', u'connectivityTimeout': 120}}

Comment 30 Dan Kenigsberg 2016-09-11 10:02:15 UTC
(In reply to Fabian Deutsch from comment #28)
> A quick note from looking at this:
> On upstream Node with vdsm-4.18.11-1 and vdsm-hook-fcoe-4.18.11-1 installed,
> the 85-vdsm-hook-fcoe.preset file is _not_ installed.
> Also rpm -ql vdsm-hook-fcoe does not list the preset file.

The preset file was introduced in 4.18.12, and Elad tested vdsm-4.18.13.

Elad, can you reproduce and show me a live system with the issue?

Comment 31 Elad 2016-09-11 10:38:59 UTC
Yes, setup details in mail

Comment 32 Dan Kenigsberg 2016-09-11 11:34:08 UTC
The hook has a %post script to apply systemd presets, but somehow they are not applied on NGN. Running `systemctl preset lldpad` manually works fine; could it be that symlinks are not persisted on NGN?


# rpm -q --scripts vdsm-hook-fcoe
postinstall scriptlet (using /bin/sh):

if [ $1 -eq 1 ] ; then 
        # Initial installation 
        systemctl preset lldpad.service >/dev/null 2>&1 || : 
fi 


if [ $1 -eq 1 ] ; then 
        # Initial installation 
        systemctl preset fcoe.service >/dev/null 2>&1 || : 
fi

# systemctl status lldpad
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor preset: enabled)

# systemctl preset lldpad.service
Created symlink from /etc/systemd/system/multi-user.target.wants/lldpad.service to /usr/lib/systemd/system/lldpad.service.
Created symlink from /etc/systemd/system/sockets.target.wants/lldpad.socket to /usr/lib/systemd/system/lldpad.socket.

# systemctl status lldpad
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
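
To answer the persistence question directly, a hypothetical check (not taken from this bug) is to look for the symlinks that `systemctl preset` just created and re-run the same commands after a reboot or image upgrade:

ls -l /etc/systemd/system/multi-user.target.wants/lldpad.service
ls -l /etc/systemd/system/sockets.target.wants/lldpad.socket
systemctl is-enabled lldpad.service lldpad.socket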

Comment 33 Ryan Barry 2016-09-11 16:28:52 UTC
(In reply to Dan Kenigsberg from comment #32)
> The hook has %post script to apply systemd presets, but somehow they do not
> apply on ngn. Running `systemctl preset lldpad` manually works fine; could
> it be that symlinks are not persisted in ngn?

NGN doesn't have the concept of persistence per se. It's a writable root filesystem, and changes are kept until an upgrade to a new image happens.

redhat-virtualization-host-20160829.0 suffered from the circular dependency problem (which could have manifested in lldpad as well), but upgrading to a new vdsm from a plain RPM (as Elad did) should behave identically to a RHEL system...

Elad --

Did you upgrade vdsm from a repo?

Comment 34 Elad 2016-09-13 06:48:59 UTC
> Elad --
> 
> Did you upgrade vdsm from a repo?

Yes

Comment 35 Ryan Barry 2016-09-13 16:37:40 UTC
%systemd_post evaluates to:

%systemd_post() \
if [ $1 -eq 1 ] ; then \
        # Initial installation \
        systemctl --no-reload preset %{?*} >/dev/null 2>&1 || : \
fi \
%{nil}

This won't actually enable the service on an upgrade (the test would have to be `$1 -ge 1`).

$new_version:%post runs before $old_version:%[pre|post]un, so $1 == 2 on upgrades; the `[ $1 -eq 1 ]` test therefore fails and %systemd_post skips the preset.

If you remove vdsm-hook-fcoe and reinstall it from the repo (rpm -e --nodeps vdsm-hook-fcoe && yum -y install vdsm-hook-fcoe), this works, which leads me to believe that it will also work in the next squashfs build (since it's not an upgrade).

Dan:

However, if we want vdsm users to get this preset automatically on RHEL, vdsm should probably not use %systemd_post, and should instead check:

if [ $1 -ge 1 ]; then
    systemctl preset ...
fi
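
Spelled out with the unit names from the hook's current scriptlet, the suggested %post would look roughly like this (a sketch of the suggestion above, not the actual committed fix):

if [ $1 -ge 1 ]; then
        # apply presets on initial installation and on upgrades alike
        systemctl --no-reload preset lldpad.service >/dev/null 2>&1 || :
        systemctl --no-reload preset fcoe.service >/dev/null 2>&1 || :
fi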

Comment 36 Dan Kenigsberg 2016-09-14 05:56:48 UTC
Thanks, Ryan!

I don't believe too many people installed vdsm-hook-fcoe prior to the `preset` patch. So let us just re-check it once a new RHV-H image (without the circular dependency bug) is available.

Comment 37 Kevin Alon Goldblatt 2016-10-25 13:50:05 UTC
Tested with the following code:
----------------------------------------
rhevm-4.0.5-0.1.el7ev.noarch
vdsm-4.18.13-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Added a RHEVH 4.0.1 host to the 4.0.5 engine
2. Upgraded the RHEVH to 4.0.5
3. Checked the status of fcoe.service, lldpad.socket and lldpad.service after the upgrade.

Actual results:
systemctl status reports that fcoe.service, lldpad.socket and lldpad.service are all running.

Expected results:
All three services should be running.

Moving to VERIFIED!

