Bug 1414299

Summary: lldpad.socket is not enabled since preset file is not in /usr/lib/systemd/system-preset/*.preset
Product: [oVirt] vdsm
Reporter: dguo
Component: General
Assignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE
QA Contact: dguo
Severity: high
Docs Contact:
Priority: high
Version: 4.19.1
CC: bugs, cshao, danken, dguo, huzhao, jiawu, leiwang, qiyuan, rbarry, sbonazzo, weiwang, yaniwang, ybronhei, ycui, yzhao
Target Milestone: ovirt-4.1.0-rc
Keywords: Regression
Target Release: 4.19.3
Flags: rbarry: needinfo-
rule-engine: ovirt-4.1+
rule-engine: blocker+
cshao: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-03-16 14:48:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
journactl -xe log none

Description dguo 2017-01-18 09:28:33 UTC
Created attachment 1242097 [details]
journactl -xe log

Description of problem:
lldpad.socket is not running after an RHVH 4.1 host is added to the engine

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.1-0.20170116.0
imgbased-0.9.4-0.1.el7ev.noarch
kernel-3.10.0-514.2.2.el7.x86_64
Red Hat Virtualization Manager Version: 4.1.0-0.3.beta2.el7
vdsm-4.19.1-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH4.1 automatically
2. Reboot and add host to engine
3. check status:
#systemctl status fcoe.service
#systemctl status lldpad.socket
#systemctl status lldpad.service
4. Restart the above three services manually by command
#systemctl restart $service-name
5. Re-check their status
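
The status checks in steps 3 and 5 can be collected in one pass. The following is a minimal sketch, not part of the original report: the function name `report_units` and the `SYSTEMCTL` override variable are hypothetical and exist only to make dry runs possible; `systemctl is-enabled` and `systemctl is-active` are the only real commands used.

```shell
#!/bin/sh
# Units named in the reproduction steps.
UNITS="fcoe.service lldpad.socket lldpad.service"

# Print "<unit> <enabled-state> <active-state>" for each unit.
# SYSTEMCTL is a hypothetical override so the function can be dry-run.
report_units() {
    ctl=${SYSTEMCTL:-systemctl}
    for unit in $UNITS; do
        # Both queries exit non-zero for "disabled"/"inactive", so the
        # exit status is ignored and only the printed state is kept.
        enabled=$("$ctl" is-enabled "$unit" 2>/dev/null) || :
        active=$("$ctl" is-active "$unit" 2>/dev/null) || :
        printf '%s %s %s\n' "$unit" "${enabled:-unknown}" "${active:-unknown}"
    done
}
```

Running `report_units` as root on the host, once after step 3 and once after step 5, gives two snapshots that can be compared line by line.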

Actual results:
1. After step 3, lldpad.socket is inactive
2. After step 5, lldpad.socket failed to restart

Expected results:
1. After step 3, lldpad.socket is active at startup
2. After step 5, lldpad.socket is active after restart

Additional info:
[root@dhcp-8-140 ~]# service fcoe.service status
Redirecting to /bin/systemctl status  fcoe.service.service
● fcoe.service - Open-FCoE Inititator.
   Loaded: loaded (/usr/lib/systemd/system/fcoe.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-01-18 04:15:37 EST; 2min 25s ago
  Process: 21394 ExecStart=/usr/sbin/fcoemon $FCOEMON_OPTS (code=exited, status=0/SUCCESS)
  Process: 21376 ExecStartPre=/sbin/modprobe -qa $SUPPORTED_DRIVERS (code=exited, status=0/SUCCESS)
 Main PID: 21396 (fcoemon)
   CGroup: /system.slice/fcoe.service
           └─21396 /usr/sbin/fcoemon --syslog

Jan 18 04:15:37 dhcp-8-140.nay.redhat.com systemd[1]: Starting Open-FCoE Inititator....
Jan 18 04:15:37 dhcp-8-140.nay.redhat.com systemd[1]: Started Open-FCoE Inititator..
[root@dhcp-8-140 ~]# service lldpad.service status
Redirecting to /bin/systemctl status  lldpad.service.service
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2017-01-18 04:15:37 EST; 2min 28s ago
 Main PID: 21374 (lldpad)
   CGroup: /system.slice/lldpad.service
           └─21374 /usr/sbin/lldpad -t

Jan 18 04:15:37 dhcp-8-140.nay.redhat.com systemd[1]: Started Link Layer Discovery Protocol Agent Daemon..
Jan 18 04:15:37 dhcp-8-140.nay.redhat.com systemd[1]: Starting Link Layer Discovery Protocol Agent Daemon....
[root@dhcp-8-140 ~]# service lldpad.socket status
Redirecting to /bin/systemctl status  lldpad.socket.service
● lldpad.socket
   Loaded: loaded (/usr/lib/systemd/system/lldpad.socket; disabled; vendor preset: disabled)
   Active: inactive (dead)
   Listen: @/com/intel/lldpad (Datagram)
[root@dhcp-8-140 ~]# service lldpad.socket status
Redirecting to /bin/systemctl status  lldpad.socket.service
● lldpad.socket
   Loaded: loaded (/usr/lib/systemd/system/lldpad.socket; disabled; vendor preset: disabled)
   Active: inactive (dead)
   Listen: @/com/intel/lldpad (Datagram)
[root@dhcp-8-140 ~]# service lldpad.socket start
Redirecting to /bin/systemctl start  lldpad.socket.service
Job for lldpad.socket failed. See "systemctl status lldpad.socket" and "journalctl -xe" for details.
[root@dhcp-8-140 ~]# journalctl -xe >> lldpad.socket.log

Comment 1 dguo 2017-01-18 09:31:39 UTC
This issue was caught on rhvh4.0 and resolved there; please see bug 1364941

Comment 2 dguo 2017-01-18 09:33:04 UTC
(In reply to dguo from comment #1)
> This issue had been caught on rhvh4.0 and been resolved, please see bug
> 1364941

But it can still be reproduced in the recent build rhvh-4.1-0.20170116.0

Comment 3 Ryan Barry 2017-01-18 13:41:52 UTC
This needs the same fix as comment#35 in  bug 1364941. I imagine it missed the branch.

Comment 4 Red Hat Bugzilla Rules Engine 2017-01-18 13:42:01 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 5 Yaniv Kaul 2017-01-19 08:52:14 UTC
Dan, remind me again - are we even using LLDP?

Comment 6 Dan Kenigsberg 2017-01-19 14:40:14 UTC
(In reply to Yaniv Kaul from comment #5)
> Dan, remind me again - are we even using LLDP?

Vdsm is not using lldpad, but vdsm-hook-fcoe (which is shipped on node) does.

(In reply to Ryan Barry from comment #3)
> This needs the same fix as comment#35 in  bug 1364941. I imagine it missed
> the branch.

No, https://gerrit.ovirt.org/61304, which enabled the lldpad and fcoe services, is in the 4.1 branch.

dguo, could you attach *vdsm.log and also report the output of `systemctl status lldpad` after your step 1?

Can you give the times of your steps relative to the attached logs? From the logs it seems that lldpad was already running when someone restarted it.

Jan 18 04:24:49 dhcp-8-140.nay.redhat.com systemd[1]: Socket service lldpad.service already active, refusing.
Jan 18 04:24:49 dhcp-8-140.nay.redhat.com systemd[1]: Failed to listen on lldpad.socket.
-- Subject: Unit lldpad.socket has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit lldpad.socket has failed.
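
The "already active, refusing" message above is systemd declining to start a socket unit while the service it would activate is running. A minimal sketch of an ordering that avoids this, under the assumption that briefly stopping lldpad is acceptable on the host; the function name `start_socket_first` and the `SYSTEMCTL` override are hypothetical.

```shell
#!/bin/sh
# Start a socket unit before its service so systemd does not refuse it.
# Usage: start_socket_first <socket-unit> <service-unit>
start_socket_first() {
    socket=$1 service=$2
    ctl=${SYSTEMCTL:-systemctl}   # hypothetical override for dry runs
    # The socket cannot be started while its service is already active.
    if "$ctl" is-active --quiet "$service"; then
        "$ctl" stop "$service" || return 1
    fi
    "$ctl" start "$socket" && "$ctl" start "$service"
}
```

For this bug the call would be `start_socket_first lldpad.socket lldpad.service`. This only works around the restart failure from step 5; it does not make the units enabled at boot.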

Comment 7 Ryan Barry 2017-01-19 15:37:21 UTC
(In reply to Dan Kenigsberg from comment #6)
> (In reply to Yaniv Kaul from comment #5)
> > Dan, remind me again - are we even using LLDP?
> 
> Vdsm is not using lldap, but vdsm-hook-fcoe (that is shipped on node) does.
> 
> (In reply to Ryan Barry from comment #3)
> > This needs the same fix as comment#35 in  bug 1364941. I imagine it missed
> > the branch.

comment#35 in the other bug is still relevant here.

I don't know if engine starts these services or not, but they are not enabled by default on RHV-H 4.1.

The failure to restart is a problem, but the preset does not appear to be applied here:

[root@localhost ~]# imgbase w
[INFO] You are on rhvh-4.1-0.20170116.0+1
[root@localhost ~]# systemctl status fcoe.service
● fcoe.service - Open-FCoE Inititator.
   Loaded: loaded (/usr/lib/systemd/system/fcoe.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
[root@localhost ~]# systemctl status lldpad.service
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Comment 8 Dan Kenigsberg 2017-01-19 16:22:14 UTC
Ryan, can you tell if /usr/lib64/systemd/system-preset/85-vdsm-hook-fcoe.preset is there? And `rpm -q --scripts vdsm-hook-fcoe` ?

Comment 9 Ryan Barry 2017-01-19 16:27:03 UTC
[root@localhost ~]# ls -l /usr/lib64/systemd/system-preset/85-vdsm-hook-fcoe.preset 
-rw-r--r--. 1 root root 122 Dec 22 05:10 /usr/lib64/systemd/system-preset/85-vdsm-hook-fcoe.preset

[root@localhost ~]# rpm -q --scripts vdsm-hook-fcoe
postinstall scriptlet (using /bin/sh):

if [ $1 -eq 1 ] ; then 
        # Initial installation 
        systemctl preset lldpad.service >/dev/null 2>&1 || : 
fi 


if [ $1 -eq 1 ] ; then 
        # Initial installation 
        systemctl preset fcoe.service >/dev/null 2>&1 || : 
fi

Comment 10 Dan Kenigsberg 2017-01-19 16:52:12 UTC
Ryan, when you run these from the command line, do they fix the vendor preset mode of the services?

Do you have any clue why this did not work on image building time?

Comment 11 Ryan Barry 2017-01-19 16:57:37 UTC
No, they don't work.

I did a little moving. systemd expects these to be in /usr/lib/..., not /usr/lib64/...

If they're in /usr/lib, "systemctl preset fcoe.service" (for example) works as expected.
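
A quick way to spot a preset file installed outside the searched directories is sketched below. The directory list is the one given in systemd.preset(5) for the systemd version on RHEL 7 (newer versions may list more); the function name `preset_is_searched` is hypothetical.

```shell
#!/bin/sh
# System preset directories listed in systemd.preset(5) (RHEL 7 era).
PRESET_DIRS="/etc/systemd/system-preset /run/systemd/system-preset /usr/lib/systemd/system-preset"

# Succeed if the given preset file sits directly in a searched directory.
preset_is_searched() {
    dir=$(dirname "$1")
    for d in $PRESET_DIRS; do
        [ "$dir" = "$d" ] && return 0
    done
    return 1
}
```

For example, `preset_is_searched /usr/lib64/systemd/system-preset/85-vdsm-hook-fcoe.preset` fails, while the same file name under /usr/lib/systemd/system-preset succeeds.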

Comment 12 Dan Kenigsberg 2017-01-19 22:22:04 UTC
How come this does not affect vdsmd and the rest of the services listed in:

# rpm -ql vdsm|grep preset
/usr/lib64/systemd/system-preset/85-vdsmd.preset

?


Anyway, we should follow the systemd.preset(5) docs for everything.
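
For reference, systemd.preset(5) describes a first-match-wins lookup: preset files are read in name order, each line is `enable` or `disable` followed by a unit name pattern, and a unit that matches nothing is enabled. The sketch below models only that rule. It is not how systemd is implemented, the function name `preset_action` is hypothetical, and the caller is assumed to pass the files already sorted.

```shell
#!/bin/sh
# Print the preset action ("enable" or "disable") that applies to a unit.
# Usage: preset_action <unit> <preset-file>...   (files in read order)
preset_action() {
    unit=$1
    shift
    for file in "$@"; do
        while read -r action pattern rest; do
            case "$action" in
                enable|disable) ;;
                *) continue ;;      # skip comments, blanks, anything else
            esac
            case "$unit" in
                # $pattern is left unquoted so it is matched as a glob.
                $pattern) echo "$action"; return 0 ;;
            esac
        done < "$file"
    done
    echo enable                     # no rule matched: default is enable
}
```

The actual contents of the vdsm preset files are not shown in this bug, so any file passed to `preset_action` here would be an illustration only. Under this model, a unit that no earlier file names falls through to a catch-all `disable *` line if a later file contains one.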

Comment 13 Ryan Barry 2017-01-19 22:36:10 UTC
I'm not sure that they do...

systemctl status mom-vdsm.service (for example) shows that it's disabled, with vendor preset disabled.

Comment 14 Dan Kenigsberg 2017-01-22 08:18:17 UTC
Yaniv, I think this qualifies as a 4.1.0 rc blocker. Vdsm would not start after reboot.

Comment 15 Sandro Bonazzola 2017-01-23 13:55:32 UTC
Moving to 4.1.1 since 4.19.3 is not included in 4.1.0.

Comment 16 Sandro Bonazzola 2017-01-26 09:12:20 UTC
4.19.3 has been included in 4.1.0 RC2

Comment 17 dguo 2017-03-09 11:22:29 UTC
Verified on rhvh-4.1-20170309.0

The 85-vdsmd.preset and 85-vdsm-hook-fcoe.preset files are now under
/usr/lib/systemd/system-preset/

[root@dhcp-10-82 ~]# imgbase w
[INFO] You are on rhvh-4.1-0.20170309.0+1
[root@dhcp-10-82 ~]# ls -l /usr/lib/systemd/system-preset/
total 20
-rw-r--r--. 1 root root 249 Mar  2 21:37 85-vdsmd.preset
-rw-r--r--. 1 root root 122 Mar  2 21:37 85-vdsm-hook-fcoe.preset
-rw-r--r--. 1 root root 896 Nov 29 22:55 90-systemd.preset
-rw-r--r--. 1 root root 175 Mar  8 11:17 98-ovirt-host-node.preset
-rw-r--r--. 1 root root  10 Nov 29 22:55 99-default-disable.preset

Before adding host to engine, fcoe.service/lldpad.socket/lldpad.service are 
active(running)

[root@dhcp-10-82 ~]# service fcoe.service status
Redirecting to /bin/systemctl status  fcoe.service.service
● fcoe.service - Open-FCoE Inititator.
   Loaded: loaded (/usr/lib/systemd/system/fcoe.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-03-09 18:47:49 CST; 14min ago
 Main PID: 1389 (fcoemon)
   CGroup: /system.slice/fcoe.service
           └─1389 /usr/sbin/fcoemon --syslog

Mar 09 10:47:47 dhcp-10-82.nay.redhat.com systemd[1]: Starting Open-FCoE Inititator....
Mar 09 18:47:49 dhcp-10-82.nay.redhat.com systemd[1]: Started Open-FCoE Inititator..
[root@dhcp-10-82 ~]# service lldpad.service status
Redirecting to /bin/systemctl status  lldpad.service.service
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-03-09 10:47:47 CST; 8h ago
 Main PID: 1348 (lldpad)
   CGroup: /system.slice/lldpad.service
           └─1348 /usr/sbin/lldpad -t

Mar 09 10:47:47 dhcp-10-82.nay.redhat.com systemd[1]: Started Link Layer Discovery Protocol Agent Daemon..
Mar 09 10:47:47 dhcp-10-82.nay.redhat.com systemd[1]: Starting Link Layer Discovery Protocol Agent Daemon....
[root@dhcp-10-82 ~]# service lldpad.socket status
Redirecting to /bin/systemctl status  lldpad.socket.service
● lldpad.socket
   Loaded: loaded (/usr/lib/systemd/system/lldpad.socket; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-03-09 10:47:23 CST; 8h ago
   Listen: @/com/intel/lldpad (Datagram)

Mar 09 10:47:23 localhost.localdomain systemd[1]: Listening on lldpad.socket.
Mar 09 10:47:23 localhost.localdomain systemd[1]: Starting lldpad.socket.



After adding host to engine, they are still active(running)

[root@dhcp-10-82 ~]# service fcoe.service status
Redirecting to /bin/systemctl status  fcoe.service.service
● fcoe.service - Open-FCoE Inititator.
   Loaded: loaded (/usr/lib/systemd/system/fcoe.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-03-09 19:02:49 CST; 3min 48s ago
  Process: 15192 ExecStart=/usr/sbin/fcoemon $FCOEMON_OPTS (code=exited, status=0/SUCCESS)
  Process: 15190 ExecStartPre=/sbin/modprobe -qa $SUPPORTED_DRIVERS (code=exited, status=0/SUCCESS)
 Main PID: 15194 (fcoemon)
   CGroup: /system.slice/fcoe.service
           └─15194 /usr/sbin/fcoemon --syslog

Mar 09 19:02:49 dhcp-10-82.nay.redhat.com systemd[1]: Starting Open-FCoE Inititator....
Mar 09 19:02:49 dhcp-10-82.nay.redhat.com systemd[1]: Started Open-FCoE Inititator..
[root@dhcp-10-82 ~]# service lldpad.service status
Redirecting to /bin/systemctl status  lldpad.service.service
● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
   Loaded: loaded (/usr/lib/systemd/system/lldpad.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-03-09 19:02:49 CST; 3min 52s ago
 Main PID: 15187 (lldpad)
   CGroup: /system.slice/lldpad.service
           └─15187 /usr/sbin/lldpad -t

Mar 09 19:02:49 dhcp-10-82.nay.redhat.com systemd[1]: Started Link Layer Discovery Protocol Agent Daemon..
Mar 09 19:02:49 dhcp-10-82.nay.redhat.com systemd[1]: Starting Link Layer Discovery Protocol Agent Daemon....
[root@dhcp-10-82 ~]# service lldpad.socket status
Redirecting to /bin/systemctl status  lldpad.socket.service
● lldpad.socket
   Loaded: loaded (/usr/lib/systemd/system/lldpad.socket; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-03-09 10:47:23 CST; 8h ago
   Listen: @/com/intel/lldpad (Datagram)

Mar 09 10:47:23 localhost.localdomain systemd[1]: Listening on lldpad.socket.
Mar 09 10:47:23 localhost.localdomain systemd[1]: Starting lldpad.socket.