Description of problem:
lldpad.socket is not running after adding RHVH 4.0 to RHVM 4.0.

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.0-20160803.3.x86_64
imgbased-0.7.4-0.1.el7ev.noarch
vdsm-4.18.10-1.el7ev.x86_64
Red Hat Virtualization Manager Version: 4.0.2.4-0.1.el7ev

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH 4.0, then add it to RHVM 4.0.
2. Check the status of the three services below on the RHVH host after it has been added to RHVM:
# systemctl status fcoe.service
# systemctl status lldpad.socket
# systemctl status lldpad.service

Actual results:
1. After step 1, lldpad.socket is not running.

Expected results:
1. After step 1, all three services should be running.

Additional info:
[root@dhcp-10-31 ~]# systemctl status fcoe.service
● fcoe.service - Open-FCoE Inititator.
   Loaded: loaded (/usr/lib/systemd/system/fcoe.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-08-05 17:13:03 CST; 2 days ago
 Main PID: 13900 (fcoemon)
   CGroup: /system.slice/fcoe.service
           └─13900 /usr/sbin/fcoemon --syslog

Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Starting Open-FCoE Inititator....
Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Started Open-FCoE Inititator..

[root@dhcp-10-31 ~]# systemctl status lldpad.socket
● lldpad.socket
   Loaded: loaded (/usr/lib/systemd/system/lldpad.socket; disabled; vendor preset: disabled)
   Active: inactive (dead)
   Listen: @/com/intel/lldpad (Datagram)

[root@dhcp-10-31 ~]# systemctl status lldpad.service
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-08-05 17:13:03 CST; 2 days ago
 Main PID: 13880 (lldpad)
   CGroup: /system.slice/lldpad.service
           └─13880 /usr/sbin/lldpad -t

Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Started Link Layer Discovery Protocol Agent Daemon..
Aug 05 17:13:03 dhcp-10-31.nay.redhat.com systemd[1]: Starting Link Layer Discovery Protocol Agent Daemon....

[root@dhcp-10-31 ~]# systemctl start lldpad.socket
Job for lldpad.socket failed. See "systemctl status lldpad.socket" and "journalctl -xe" for details.

Aug 08 14:26:36 dhcp-10-31.nay.redhat.com sshd[11587]: pam_unix(sshd:session): session opened for user root by (uid=0)
Aug 08 15:06:04 dhcp-10-31.nay.redhat.com polkitd[1116]: Registered Authentication Agent for unix-process:11935:25183438 (system bus name :1.79 [/usr/bin/pkttyagent --notify-fd 6 --fall
Aug 08 15:06:04 dhcp-10-31.nay.redhat.com systemd[1]: Socket service lldpad.service already active, refusing.
Aug 08 15:06:04 dhcp-10-31.nay.redhat.com systemd[1]: Failed to listen on lldpad.socket.
-- Subject: Unit lldpad.socket has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit lldpad.socket has failed.
--
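For context, the "refusing" line in the journal output above is systemd's standard socket-activation behaviour: it will not start listening on a .socket unit while that unit's matching .service is already active, so lldpad.socket can only be started after lldpad.service has been stopped. A minimal shell sketch of that check (an illustrative stand-in, not real systemd code; the function name is hypothetical):

```shell
#!/bin/sh
# Stand-in for systemd's socket-activation check: starting a .socket
# unit is refused while its matching .service is already active.
start_socket() {
  service_state=$1    # "active" or "inactive"
  if [ "$service_state" = "active" ]; then
    echo "Socket service lldpad.service already active, refusing."
    return 1
  fi
  echo "Listening on lldpad.socket."
}

start_socket active || true   # the situation seen in this bug
start_socket inactive         # succeeds once lldpad.service is stopped
```

On a real host the equivalent recovery would be stopping lldpad.service before starting lldpad.socket.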
dguo, can you supply your `journalctl -xe`?

Ryan, have we not checked that these services are running after boot on NGN?
Created attachment 1188992 [details]
Output of "journalctl -xe"

As requested, uploading the log of "journalctl -xe".
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.
Dan, do you see the impact of this bug? Might lldpad.service be started by the hook?
It should be started by the hook.
(In reply to Dan Kenigsberg from comment #1)
> dguo, can you supply your `journalctl -xe`?
>
> Ryan, have we not checked that these services are running after boot on ngn?

No, these are not running after installation. I would think that the hook would start them. We can start these as part of Node if that isn't the case, though.
Elad, you have tested the hook on NGN, right? If you configure the networks to use fcoe, and reboot the system, do the services start properly?
I did not use NGN in my tests. After reboot, with the networks configured for FCoE using the hook, the services started properly. The hook configured them to be enabled.
Elad, would you recheck that with a recent ovirt-ngn-4.0.2? This should be included in storage coverage tests.

If it fails, we'd need to quickly take https://gerrit.ovirt.org/#/c/62365/
(In reply to Dan Kenigsberg from comment #11)
> Elad, would you recheck that with a recent ovirt-ngn-4.0.2? this should be
> included in storage coverage tests.
>
> If it fails, we'd need to quickly take https://gerrit.ovirt.org/#/c/62365/

It will take us some time as we are focusing on 4.0 GA; we will be able to recheck after. I see it is targeted to 4.0.4 anyway...
Aharon, testing FCoE should be an integral part of 4.0 GA testing. Can you make sure that this is so?
(In reply to Dan Kenigsberg from comment #13)
> Aharon, testing FCoE should be an integral part of 4.0 GA testing. Can you
> make sure that this is so?

We already tested it with 4.0. As this is manual testing, like other features, I am not sure we will rerun. In case we do, we will of course consider this issue.
Aharon, if FCoE on NGN was tested, I'm cool. But this bug suggests that we actually have a problem, and Elad said:

(In reply to Elad from comment #10)
> I Did not use NGN in my tests.

so I'm a bit confused.
We tested FCoE exactly as we all agreed; no one asked for NGN back then. Everyone also confirmed the verification (detailed verification information in [1]).

Testing against NGN wasn't part of the plan.

As for now, this issue is targeted to 4.0.4 and will be tested as part of 4.0.4 testing.

If you need it for 4.0 GA, please set the relevant target versions and let's scrub it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1334745#c2

Not sure, but please check whether https://bugzilla.redhat.com/show_bug.cgi?id=1353456 is related (you asked about it in comment #9).
(In reply to Aharon Canan from comment #16)
> Testing against NGN wasn't part of the plan.

It should be.

> As for now, this issue is targeted to 4.0.4 and will be tested as part of
> 4.0.4 testing.
>
> If you need it for 4.0 GA please set the relevant target versions and lets
> scrub it.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1334745#c2
>
> Not sure but please check if
> https://bugzilla.redhat.com/show_bug.cgi?id=1353456 is related (You asked
> about it on comment #9)

Yes, I suspect that we have a regression in this regard. On vintage Node, the services were running on boot; I suspect they would not run on NGN.
We should align to the same method as in RHEL. To be tested with BZ #1353456 on NGN.
I have the 4.0 RHV-H (rhvh-4.0-0.20160829.0) and I'm trying to update vdsm with 'yum update vdsm', but nothing is being updated, although I have the right repos (which work on RHEL 7.2 hosts) and the vdsm I have installed is not the latest (vdsm-4.18.11-1.el7ev.x86_64).

Fabian/Dan, it's currently blocking me from testing the fix for verification; can you please assist?

Thanks
Please show "yum repolist rhev-4.0.4-1"
repo id                                   repo name       status
rhev-4.0.4-1/7RedHatVirtualizationHost    RHEV 4.0.4-1    disabled
repolist: 0
Please use the image that contains the VDSM version you need; do not install it via yum on RHV-H.
Yaniv, following an offline discussion in mail (you were cc'd), Dan told us to test this on RHV-H.
Yes, we should test FCoE on RHV-H, but on RHV-H you should not use yum. You should have everything preinstalled.
With RHV-H: fcoe.service and lldpad.service remain disabled after setting up networks using the fcoe hook. Confirmed:

Thread-258::INFO::2016-09-08 16:50:13,654::xmlrpc::91::vds.XMLRPCServer::(_process_requests) Request handler for ::1:48764 stopped
jsonrpc.Executor/4::DEBUG::2016-09-08 16:50:17,274::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.setupNetworks' in bridge with {u'bondings': {}, u'networks': {u'fcoe2': {u'ipv6autoconf': True, u'nic': u'em2_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}, u'fcoe1': {u'ipv6autoconf': True, u'nic': u'em1_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120}}

[root@green-vdsd yum.repos.d]# systemctl is-enabled fcoe
disabled
[root@green-vdsd yum.repos.d]# systemctl is-enabled lldpad
disabled

====================================================================

Re-opening.

Used:
rhvh-4.0-0.20160829.0
vdsm-4.18.13-1.el7ev.x86_64
vdsm-hook-fcoe-4.18.13-1.el7ev.noarch
Created attachment 1199121 [details]
logs and 'journalctl -xe' output
Could you add vdsm.log from the time setupNetworks was called?
A quick note from looking at this:

On upstream Node with vdsm-4.18.11-1 and vdsm-hook-fcoe-4.18.11-1 installed, the 85-vdsm-hook-fcoe.preset file is _not_ installed.
Also, rpm -ql vdsm-hook-fcoe does not list the preset file.
Created attachment 1199868 [details]
logs-11.9.16

2016-09-11 11:41:32,244 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-10) [7cb11fc7] START, HostSetupNetworksVDSCommand(HostName = green-vdsd, HostSetupNetworksVdsCommandParameters:{runAsync='true', hostId='10de04ce-dd52-4b76-9e57-86e4a46e9c53', vds='Host[green-vdsd,10de04ce-dd52-4b76-9e57-86e4a46e9c53]', rollbackOnFailure='true', connectivityTimeout='120', networks='[HostNetwork:{defaultRoute='false', bonding='false', networkName='fcoe2', nicName='em2_1', vlan='null', mtu='0', vmNetwork='false', stp='false', properties='[fcoe=enable=yes,dcb=yes,auto_vlan=yes]', ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='AUTOCONF', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', switchType='LEGACY'}, HostNetwork:{defaultRoute='false', bonding='false', networkName='fcoe1', nicName='em1_1', vlan='null', mtu='0', vmNetwork='false', stp='false', properties='[fcoe=enable=yes,dcb=yes,auto_vlan=yes]', ipv4BootProtocol='NONE', ipv4Address='null', ipv4Netmask='null', ipv4Gateway='null', ipv6BootProtocol='AUTOCONF', ipv6Address='null', ipv6Prefix='null', ipv6Gateway='null', switchType='LEGACY'}]', removedNetworks='[]', bonds='[]', removedBonds='[]'}), log id: 5dc91e6d

jsonrpc.Executor/1::DEBUG::2016-09-11 11:41:32,248::__init__::530::jsonrpc.JsonRpcServer::(_handle_request) Calling 'Host.setupNetworks' in bridge with {u'bondings': {}, u'networks': {u'fcoe2': {u'ipv6autoconf': True, u'nic': u'em2_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}, u'fcoe1': {u'ipv6autoconf': True, u'nic': u'em1_1', u'mtu': 1500, u'switch': u'legacy', u'dhcpv6': False, u'bridged': u'false', u'custom': {u'fcoe': u'enable=yes,dcb=yes,auto_vlan=yes'}}}, u'options': {u'connectivityCheck': u'true', u'connectivityTimeout': 120}}
(In reply to Fabian Deutsch from comment #28)
> A quick note from looking at this:
> On upstream Node with vdsm-4.18.11-1 and vdsm-hook-fcoe-4.18.11-1 installed,
> the 85-vdsm-hook-fcoe.preset file is _not_ installed.
> Also rpm -ql vdsm-hook-fcoe does not list the preset file.

The preset file was introduced in 4.18.12, and Elad tested vdsm-4.18.13.

Elad, can you reproduce and show me a live system with the issue?
Yes, setup details in mail
The hook has a %post script to apply systemd presets, but somehow they do not apply on NGN. Running `systemctl preset lldpad` manually works fine; could it be that symlinks are not persisted on NGN?

# rpm -q --scripts vdsm-hook-fcoe
postinstall scriptlet (using /bin/sh):
if [ $1 -eq 1 ] ; then
        # Initial installation
        systemctl preset lldpad.service >/dev/null 2>&1 || :
fi
if [ $1 -eq 1 ] ; then
        # Initial installation
        systemctl preset fcoe.service >/dev/null 2>&1 || :
fi

# systemctl status lldpad
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor preset: enabled)

# systemctl preset lldpad.service
Created symlink from /etc/systemd/system/multi-user.target.wants/lldpad.service to /usr/lib/systemd/system/lldpad.service.
Created symlink from /etc/systemd/system/sockets.target.wants/lldpad.socket to /usr/lib/systemd/system/lldpad.socket.

# systemctl status lldpad
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
(In reply to Dan Kenigsberg from comment #32)
> The hook has %post script to apply systemd presets, but somehow they do not
> apply on ngn. Running `systemctl preset lldpad` manually works fine; could
> it be that symlinks are not persisted in ngn?

NGN doesn't have the concept of persistence per se. It's a writable root filesystem, and changes are kept until an upgrade to a new image happens.

redhat-virtualization-host-20160829.0 suffered from the circular dependency problem (which could have evidenced itself in lldpad as well), but upgrading to a new vdsm with a plain RPM (as Elad did) should behave identically to a RHEL system...

Elad --

Did you upgrade vdsm from a repo?
> Elad --
>
> Did you upgrade vdsm from a repo?

Yes
%systemd_post evaluates to:

%systemd_post() \
if [ $1 -eq 1 ] ; then \
        # Initial installation \
        systemctl --no-reload preset %{?*} >/dev/null 2>&1 || : \
fi \
%{nil}

This won't actually enable the service on an upgrade (the test would need to be $1 -ge 1).

$new_version:%post triggers before $old_version:%[pre|post]un, so $1 == 2 on upgrades, the `-eq 1` test fails, and %systemd_post skips the preset.

If you remove vdsm-hook-fcoe and reinstall it from the repo (rpm -e --nodeps vdsm-hook-fcoe && yum -y install vdsm-hook-fcoe), this works, which leads me to believe that it will also work in the next squashfs build (since it's not an upgrade).

Dan: However, if we want vdsm users to get this preset on RHEL automatically, vdsm should probably not use %systemd_post, and should instead check:

if [ $1 -ge 1 ]; then
        systemctl preset ...
fi
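To make the upgrade case above concrete, here is a small sketch in plain shell of how the scriptlet argument drives the two variants. The echoes stand in for the real `systemctl preset` calls and the function names are hypothetical; only the `$1` semantics (1 = fresh install, 2 = upgrade) come from RPM:

```shell
#!/bin/sh
# $1 in an RPM %post scriptlet is the number of package instances
# remaining after the operation: 1 on a fresh install, 2 on an upgrade.

systemd_post_style() {      # current behaviour (%systemd_post)
  if [ "$1" -eq 1 ]; then
    echo "preset applied"
  else
    echo "preset skipped"   # upgrades fall through here
  fi
}

ge_style() {                # the proposed -ge check
  if [ "$1" -ge 1 ]; then
    echo "preset applied"
  fi
}

systemd_post_style 1   # fresh install: preset applied
systemd_post_style 2   # upgrade: preset skipped, services stay disabled
ge_style 2             # upgrade: preset applied
```

This is why removing and reinstalling the hook ($1 goes back to 1) works while a yum upgrade does not.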
Thanks, Ryan! I don't believe too many people have installed vdsm-hook-fcoe prior to the `preset` patch. So let us just re-check it once a new RHV-H image (without the circular dependency bug) is available.
Tested with the following code:
----------------------------------------
rhevm-4.0.5-0.1.el7ev.noarch
vdsm-4.18.13-1.el7ev.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Added a RHEVH 4.0.1 host to the 4.0.5 engine
2. Upgraded the RHEVH to 4.0.5
3. Checked the status of fcoe.service, lldpad.socket and lldpad.service

Actual results:
systemctl status reports that fcoe.service, lldpad.socket and lldpad.service are all running.

Expected results:
The three services are running.

Moving to VERIFIED!