Bug 1397049

Summary: [fdProd] [OVS] systemd ordering of OVS service causes hang during restart when there is NFS mount over OVS
Product: Red Hat Enterprise Linux 7 Reporter: Flavio Leitner <fleitner>
Component: openvswitchAssignee: Flavio Leitner <fleitner>
Status: CLOSED ERRATA QA Contact: Rick Alongi <ralongi>
Severity: high Docs Contact:
Priority: high    
Version: 7.3CC: aconole, asupugad, atelang, atragler, eparis, erich, fbaudin, fleitner, lxin, qding, ralongi, rhowe, systemd-maint-list, systemd-maint, wchadwic
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openvswitch-2.5.0-19.git20160727.el7fdp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1378135 Environment:
Last Closed: 2017-01-05 20:48:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1378135    
Bug Blocks: 1267746    

Comment 4 Rick Alongi 2016-11-30 20:33:08 UTC
Fix verified using OVS FDP 2.5.0.22.  Details below.

Thanks,
Rick

*****************************************

Kernel used on both systems: 3.10.0-514.el7.x86_64

On client (netqe13):

[root@netqe13 network-scripts]# cat ifcfg-ovsbr0
DEVICE=ovsbr0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=none
HOTPLUG=no
MTU=
NM_CONTROLLED=no

[root@netqe13 network-scripts]# cat ifcfg-intport0
DEVICE=intport0
DEVICETYPE=ovs
ONBOOT=yes
OVS_BRIDGE=ovsbr0
BOOTPROTO=none
HOTPLUG=no
NM_CONTROLLED=no
HWADDR=
TYPE=OVSIntPort
IPADDR=192.168.1.1
NETMASK=255.255.255.0

[root@netqe13 network-scripts]# cat ifcfg-p2p1
DEVICE=p2p1
DEVICETYPE=ovs
ONBOOT=yes
OVS_BRIDGE=ovsbr0
BOOTPROTO=none
HOTPLUG=no
NM_CONTROLLED=no
HWADDR=a0:36:9f:85:8d:1c
TYPE=OVSPort

[root@netqe13 network-scripts]# ping -c1 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.203 ms

[root@netqe13 network-scripts]# cat /etc/fstab | grep netdev
192.168.1.2:/home/ralongi /test nfs _netdev 0 0

[root@netqe13 network-scripts]# mount | grep test | grep netdev
192.168.1.2:/home/ralongi on /test type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600, retrans=2,sec=sys,clientaddr=192.168.1.1,local_lock=none,addr=192.168.1.2,_netdev)

[root@netqe13 network-scripts]# ls /test | wc -l
3

[root@netqe13 network-scripts]# df -h /test
Filesystem                 Size  Used Avail Use% Mounted on
192.168.1.2:/home/ralongi  849G   32M  849G   1% /test


On server (netqe14):

[root@netqe14 ralongi]# ip a | grep p2p1
4: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 192.168.1.2/24 scope global p2p1
    
[root@netqe14 ralongi]# cat /etc/exports
/home/ralongi   *(rw,sync,no_root_squash)

*******************************
Without fix (OVS FDP 2.5.0.14), problem is observed on console:

[root@netqe13 ~]# shutdown -h now
[ 2113.196291] nfsd: last server has exited, flushing export cache
[  OK  ] Started Show Plymouth Power Off Screen.
[  OK  ] Stopped Open vSwitch Internal Unit.
[  OK  ] Started Restore /run/initramfs.
[  OK  ] Stopped Dynamic System Tuning Daemon.
[    **] A stop job is running for /test (3min 30s / 4min 30s)[ 2324.132484] nft
[**    ] A stop job is running for /test (10min 13s / 10min 31s)[ 2726.771270] t
[ 2726.777756] nfs: server 192.168.1.2 not responding, timed out
[***   ] A stop job is running for /test (15min 19s / 16min 32s)[ 3032.946390] t
[    **] A stop job is running for /test (20min 25s / 21min 3s)[ 3339.121441] nfs: server 192.168.1.2 not responding, timed out
[  *** ] A stop job is running for /test (25min 31s / 25min 33s)[ 3645.296553] nfs: server 192.168.1.2 not responding, timed out
[   ***] A stop job is running for /test (30min / 3[ 3913.318485] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
[ !!  ] Forcibly powering off as result of failure.
[ 3913.336604] systemd-journald[668]: Received SIGTERM from PID 1 (systemd-shutdow).
[ 3913.346735] type=1305 audit(1480537406.792:281): audit_pid=0 old=1032 auid=4294967295 ses=4294967295 subj=system_u:system_r:a1
[ 3923.337766] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[ 3923.349314] systemd-shutdown[1]: Sending SIGKILL to PID 14266 (umount).
[ 3923.357051] systemd-shutdown[1]: Unmounting file systems.
[ 3923.364095] systemd-shutdown[1]: Unmounting /run/user/0.
[ 3923.370255] systemd-shutdown[1]: Unmounting /test.
[ 3923.383319] systemd-shutdown[1]: Unmounting /home.
[ 3923.388774] XFS (dm-2): Unmounting Filesystem
[ 3923.394541] systemd-shutdown[1]: Unmounting /boot.
[ 3923.403825] XFS (sda1): Unmounting Filesystem
[ 3923.416715] systemd-shutdown[1]: All filesystems unmounted.
[ 3923.422936] systemd-shutdown[1]: Deactivating swaps.
[ 3923.428994] systemd-shutdown[1]: All swaps deactivated.
[ 3923.434827] systemd-shutdown[1]: Detaching loop devices.
[ 3923.441227] systemd-shutdown[1]: All loop devices detached.
[ 3923.447450] systemd-shutdown[1]: Detaching DM devices.
[ 3923.453473] systemd-shutdown[1]: Detaching DM 253:2.
[ 3923.472635] systemd-shutdown[1]: Detaching DM 253:1.
[ 3923.493665] systemd-shutdown[1]: Not all DM devices detached, 1 left.
[ 3923.500893] systemd-shutdown[1]: Detaching DM devices.
[ 3923.506716] systemd-shutdown[1]: Not all DM devices detached, 1 left.
[ 3923.513898] systemd-shutdown[1]: Cannot finalize remaining DM devices, continuing.
[ 3923.639272] systemd-shutdown[1]: Successfully changed into root pivot.
[ 3923.646556] systemd-shutdown[1]: Returning to initrd...
[ 3923.677366] dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
[ 3923.730281] XFS (dm-0): Unmounting Filesystem
[ 3923.740581] dracut Warning: Unmounted /oldroot.
[ 3923.758778] dracut: Disassembling device-mapper devices
[ 3923.784344] kvm: exiting hardware virtualization
Powering off.
[ 3923.889650] mlx4_core 0000:05:00.0: mlx4_shutdown was called
[ 3924.151832] mlx4_en 0000:05:00.0: removed PHC
[ 3926.607664] ACPI: Preparing to enter system sleep state S5
[ 3926.613931] Power down.

*****************************************

With fix (OVS FDP 2.5.0.22), no problem is observed on console:

[root@netqe13 ~]# shutdown -h now
[22707.778330] nfsd: last server has exited, flushing export cache
[  OK  ] Started Show Plymouth Power Off Screen.
[  OK  ] Stopped LSB: Starts the Spacewalk Daemon.
[  OK  ] Stopped target Host and Network Name Lookups.
[  OK  ] Stopped target System Time Synchronized.
[  OK  ] Stopped Wait for chrony to synchronize system clock.
         Stopping Wait for chrony to synchronize system clock...
         Stopping NTP client/server...
[  OK  ] Stopped NTP client/server.
[  OK  ] Stopped LVM2 PV scan on device 8:2.
[  OK  ] Removed slice system-lvm2\x2dpvscan.slice.
[22708.000070] libvirtd[24083]: segfault at deadbef7 ip 00007f2b8cbf4448 sp 00007f2b6c9a5a98 error 4 in libvirt.so.0.2000.0[7f2b8cb3c000+352000]
[  OK  ] Stopped Virtualization daemon.
[  OK  ] Stopped target Remote File Systems.
         Unmounting /test...
         Stopping Virtual machine log manager...
[  OK  ] Stopped Virtual machine log manager.
[  OK  ] Unmounted /test.
[  OK  ] Stopped target Network is Online.
[  OK  ] Stopped Dynamic System Tuning Daemon.
[  OK  ] Stopped target Network.
         Stopping LSB: Bring up/down networking...
[  OK  ] Started Restore /run/initramfs.
[22709.449461] device intport0 left promiscuous mode
[22710.284401] device ovsbr0 left promiscuous mode
[22710.298710] device p2p1 left promiscuous mode
[22710.830788] ixgbe 0000:84:00.0: removed PHC on p2p1
[  OK  ] Stopped LSB: Bring up/down networking.
         Stopping Open vSwitch Forwarding Unit...
[  OK  ] Stopped Network Manager Wait Online.
         Stopping Network Manager Wait Online...
         Stopping Network Manager...
         Stopping Open vSwitch...
[  OK  ] Stopped Open vSwitch.
[  OK  ] Stopped Network Manager.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped Open vSwitch Forwarding Unit.
         Stopping Open vSwitch Database Unit...
[  OK  ] Stopped Open vSwitch Database Unit.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed RPCbind Server Activation Socket.
[22711.837825] type=1305 audit(1480533240.790:533): audit_pid=0 old=1102 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1

[  OK  [0[22711.852400] type=1130 audit(1480533240.805:534): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[22711.876561] type=1131 audit(1480533240.829:535): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
m] Closed D-Bus [22711.899569] type=1130 audit(1480533240.852:536): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[22711.925115] type=1131 audit(1480533240.878:537): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
System Message B[22711.949582] type=1130 audit(1480533240.902:538): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rhel-import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[22711.974717] type=1131 audit(1480533240.927:539): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rhel-import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
us Socket.
[[[22711.999700] type=1130 audit(1480533240.952:540): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rhel-readonly comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[22712.002981] XFS (dm-2): Unmounting Filesystem
[22712.007161] XFS (sda1): Unmounting Filesystem
[22712.033178] type=1131 audit(1480533240.986:541): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rhel-readonly comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
32m  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Removed slic[22712.068363] systemd-shutdown[1]: Sending SIGTERM to remaining processes...
e Virtual Machine and Container Slice.
[  OK  ] Stopped target Paths.[22712.084071] systemd-journald[756]: Received SIGTERM from PID 1 (systemd-shutdow).

[  OK  [22712.093706] systemd-shutdown[1]: Sending SIGKILL to remaining processes...
[0m] Stopped target System Initialization.
   [22712.106094] systemd-shutdown[1]: Unmounting file systems.
      Stopping Load/Save Random Seed...
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Apply Kernel Variables...
[  OK  ] Stopped Setup Virtual Console.
         Stopping Setup Virtual Console...
         Stopping Security Auditing Service...
[  OK  ] Stopped target Swap.
[  OK  ] Stopped Update is Completed.
         Stopping Update is Completed...
[  OK  ] Stopped Rebuild Hardware Database.
         Stopping Rebuild Hardware Database...
[  OK  ] Stopped Rebuild Journal Catalog.
         Stopping Rebuild Journal Catalog...
[  OK  ] Stopped target Encrypted Volumes.
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Stopped Security Auditing Service.
[  OK  ] Stopped Create Volatile Files and Directories.
         Stopping Create Volatile Files and Directories...
[  OK  ] Stopped Import network configuration from initramfs.
         Stopping Import network configuration from initramfs...
[  OK  ] Stopped target Local File Systems.
         Unmounting /run/user/0[22712.222103] systemd-shutdown[1]: All filesystems unmounted.
[22712.229100] systemd-shutdown[1]: Deactivating swaps.
...
         U[22712.234712] systemd-shutdown[1]: All swaps deactivated.
[22712.242048] systemd-shutdown[1]: Detaching loop devices.
nmounting /home.[22712.248386] systemd-shutdown[1]: All loop devices detached.
[22712.255761] systemd-shutdown[1]: Detaching DM devices.
..
         Un[22712.261599] systemd-shutdown[1]: Detaching DM 253:2.
mounting /boot...
[  OK  ] Stopped Configure read-only root support.
         Stopping Configure read-only root support...
[  OK  ] Unmounted /run/user/[22712.285014] systemd-shutdown[1]: Detaching DM 253:1.
0.
[  OK  ] Unmounted /home.
[  OK  ] Unmounted /boot.
[  OK  ] Reached target Unmount All Filesystems.
[[32[22712.304016] systemd-shutdown[1]: Not all DM devices detached, 1 left.
m  OK  ] Sto[22712.312437] systemd-shutdown[1]: Detaching DM devices.
pped target Loca[22712.319752] systemd-shutdown[1]: Not all DM devices detached, 1 left.
[22712.328440] systemd-shutdown[1]: Cannot finalize remaining DM devices, continuing.
l File Systems (Pre).
[  OK  ] Stopped Remount Root and Kernel File Systems.
         Stopping Remount Root and Kernel File Systems...
         Stopping Monitoring of LVM2 mirrors... dmeventd or progress polling...
[  OK  ] Stopped Create Static Device Nodes in /dev.
         Stopping Create Static Device Nodes in /dev...
[  OK  ] Reached target Shutdown.
[22712.467006] systemd-shutdown[1]: Successfully changed into root pivot.
[22712.474289] systemd-shutdown[1]: Returning to initrd...
[22712.506709] dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
[22713.019702] XFS (dm-0): Unmounting Filesystem
[22713.117922] dracut Warning: Unmounted /oldroot.
[22713.151235] dracut: Disassembling device-mapper devices
[22713.174208] kvm: exiting hardware virtualization
Powering off.
[22713.201977] mlx4_core 0000:05:00.0: mlx4_shutdown was called
[22713.463982] mlx4_en 0000:05:00.0: removed PHC
[22716.145267] ACPI: Preparing to enter system sleep state S5
[22716.151546] Power down.

Comment 6 errata-xmlrpc 2017-01-05 20:48:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0027.html