Bug 2016144 - [13->16.1 FFU] during leapp upgrade reboot, openvswitch failed to start with error: Starting ovsdb-server ovsdb-server: /var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: OSP Team
QA Contact: David Rosenfeld
URL:
Whiteboard:
Duplicates: 2091818
Depends On:
Blocks:
Reported: 2021-10-20 18:51 UTC by Matt Flusche
Modified: 2024-12-20 21:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-04 14:59:11 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-10498 0 None None None 2022-06-09 05:16:12 UTC
Red Hat Issue Tracker UPG-3462 0 None None None 2021-10-20 20:36:22 UTC

Description Matt Flusche 2021-10-20 18:51:48 UTC
Description of problem:

During the first OS boot of RHEL 8 after the leapp OS upgrade step of the 13->16.1.6 FFU, openvswitch failed to start with the following error:

error: Starting ovsdb-server ovsdb-server: /var/run/openvswitch/ovsdb-server.pid.tmp: create failed (Permission denied)

This issue was traced to the following systemd service files:

/etc/systemd/system/ovsdb-server.service
&
/etc/systemd/system/ovs-vswitchd.service

After removing these files, openvswitch could start normally.

This environment has been upgraded many times and it's unclear when these files were put in place. The theory is that a previous openvswitch rpm or tripleo deployment placed the files (the environment has been upgraded since OSP 8).

The upgrade should handle this situation, perhaps with a validation that guards against this failure.
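
A minimal sketch of what such a validation could look like, assuming it only needs to report local unit files under /etc/systemd/system whose names start with "ovs". The function names are hypothetical and this is not an existing tripleo validation; it only illustrates the check.

```python
import glob
import os


def find_ovs_unit_overrides(unit_dir="/etc/systemd/system"):
    """Return sorted paths directly under unit_dir whose names start with 'ovs'."""
    # Assumption: only the top level is searched. Drop-in directories such as
    # ovsdb-server.service.d/ also match the pattern and are reported as-is.
    return sorted(glob.glob(os.path.join(unit_dir, "ovs*")))


def validate_no_ovs_overrides(unit_dir="/etc/systemd/system"):
    """Print each override found; return 0 if none exist, 1 otherwise."""
    overrides = find_ovs_unit_overrides(unit_dir)
    for path in overrides:
        print(f"local override found: {path}")
    # A non-zero result signals that an operator should review the files
    # before starting the upgrade.
    return 1 if overrides else 0
```

The check deliberately reports rather than deletes, so a non-zero result would stop the upgrade for manual review instead of changing the host.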

Version-Release number of selected component (if applicable):
OSP 16.1.6 FFU

How reproducible:
100% with these config files

Additional info:

# cat /etc/systemd/system/ovsdb-server.service
[Unit]
Description=Open vSwitch Database Unit
After=syslog.target network-pre.target
Before=network.target network.service
Wants=ovs-delete-transient-ports.service
PartOf=openvswitch.service

[Service]
Type=forking
Restart=on-failure
EnvironmentFile=/etc/openvswitch/default.conf
EnvironmentFile=-/etc/sysconfig/openvswitch
ExecStartPre=/usr/bin/chown ${OVS_USER_ID} /var/run/openvswitch
ExecStartPre=/bin/sh -c 'rm -f /run/openvswitch/useropts; if [ "$${OVS_USER_ID/:*/}" != "root" ]; then /usr/bin/echo "OVSUSER=--ovs-user=${OVS_USER_ID}" > /run/openvswitch/useropts; fi'
EnvironmentFile=-/run/openvswitch/useropts
ExecStart=/usr/local/bin/ovs-ctl \
          --no-ovs-vswitchd --no-monitor --system-id=random \
          ${OVSUSER} \
          start $OPTIONS
ExecStop=/usr/local/bin/ovs-ctl --no-ovs-vswitchd stop
ExecReload=/usr/local/bin/ovs-ctl --no-ovs-vswitchd \
           ${OVSUSER} \
           --no-monitor restart $OPTIONS
RuntimeDirectory=openvswitch
RuntimeDirectoryMode=0755


# cat /etc/systemd/system/ovs-vswitchd.service
[Unit]
Description=Open vSwitch Forwarding Unit
After=ovsdb-server.service network-pre.target systemd-udev-settle.service
Before=network.target network.service
Requires=ovsdb-server.service
ReloadPropagatedFrom=ovsdb-server.service
AssertPathIsReadWrite=/var/run/openvswitch/db.sock
PartOf=openvswitch.service

[Service]
Type=forking
Restart=on-failure
Environment=HOME=/var/run/openvswitch
EnvironmentFile=/etc/openvswitch/default.conf
EnvironmentFile=-/etc/sysconfig/openvswitch
EnvironmentFile=-/run/openvswitch/useropts
ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
ExecStart=/usr/local/bin/ovs-ctl \
          --no-ovsdb-server --no-monitor --system-id=random \
          ${OVSUSER} \
          start $OPTIONS
ExecStop=/usr/local/bin/ovs-ctl --no-ovsdb-server stop
ExecReload=/usr/local/bin/ovs-ctl --no-ovsdb-server \
          --no-monitor --system-id=random \
          ${OVSUSER} \
          restart $OPTIONS
TimeoutSec=300
RuntimeDirectoryMode=0775
UMask=0002

Comment 4 Jesse Pretorius 2021-11-02 09:14:08 UTC
Alex - this needs to be converted into a known issue for OSP 16.1 and 16.2. Generally speaking, any environment that was upgraded from a release earlier than OSP 13 through to OSP 13 may contain some /etc/systemd/system/ovs* files. If they are there, they need to be removed before starting the overcloud upgrade process, assuming the customer did not put them there on purpose. We cannot do this automatically because those overrides may have been placed there intentionally by the customer for other purposes. Having systemd service unit overrides is a perfectly valid thing to do if you know what you're doing.
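
For operators who have confirmed the overrides are not intentional, the manual cleanup could be sketched as below. This is an illustration only: the function name and the backup location are hypothetical, and moving the files aside instead of deleting them is a design choice made here, not something this bug prescribes.

```python
import os
import shutil


def move_ovs_overrides_aside(unit_dir="/etc/systemd/system",
                             backup_dir="/root/ovs-unit-backup"):
    """Move unit_dir/ovs* entries into backup_dir and return the new paths."""
    # backup_dir is a hypothetical location; pick one that suits the host.
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(unit_dir)):
        if not name.startswith("ovs"):
            continue  # leave unrelated unit files untouched
        src = os.path.join(unit_dir, name)
        dst = os.path.join(backup_dir, name)
        shutil.move(src, dst)
        moved.append(dst)
    return moved
```

After moving the files, running `systemctl daemon-reload` makes systemd re-read its unit files. Keeping a backup means the overrides can be restored if they turn out to have been placed deliberately.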

Comment 9 Alex McLeod 2021-11-04 14:59:11 UTC
Flipping back to engineering and closing as WONTFIX. This is now documented, but a fix can't be implemented in the code because automating removal of files in /etc/systemd/system/ is undesirable. Docs point to this (engineering) BZ for more info.

Comment 10 ldenny 2022-06-09 05:06:08 UTC
*** Bug 2091818 has been marked as a duplicate of this bug. ***

