Bug 1956338 - Getting error "ovs-appctl: cannot read pidfile \"/var/run/openvswitch/ovs-vswitchd.pid\" (No such file or directory)" while upgrading ceph storage node.
Summary: Getting error "ovs-appctl: cannot read pidfile \"/var/run/openvswitch/ovs-vsw...
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Assaf Muller
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-03 13:51 UTC by rbsshasha
Modified: 2021-05-12 16:31 UTC (History)
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
The nohup output file is also attached for more detailed logs. (423.93 KB, text/plain)
2021-05-03 13:51 UTC, rbsshasha

Description rbsshasha 2021-05-03 13:51:36 UTC
Created attachment 1779004 [details]
The nohup output file is also attached for more detailed logs.

Description of problem: After executing the upgrade command with no tags on the Ceph nodes, the network configuration files vanished from the Ceph node.


Version-Release number of selected component (if applicable):


How reproducible:
Consistently. During the RHOSP 13 to 16 FFU, the Ceph node upgrade fails because openvswitch is in an inactive state.

Steps to Reproduce:
We executed the following command for the Ceph node upgrade:
nohup openstack overcloud upgrade run --stack overcloud --limit overcloud-cephstorage-0 -y &

The Ceph node upgrade failed with the following error:
os_net_config.ConfigurationError: Failure(s) occurred when applying configuration

Below is the ip a output of the Ceph node:
[root@overcloud-cephstorage-0 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:43:ab:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.221/24 brd 192.168.100.255 scope global dynamic em1
       valid_lft 81907sec preferred_lft 81907sec
    inet6 fe80::1618:77ff:fe43:abc0/64 scope link
       valid_lft forever preferred_lft forever
3: em2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:43:ab:c1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1618:77ff:fe43:abc1/64 scope link
       valid_lft forever preferred_lft forever
4: em3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:43:ab:c2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1618:77ff:fe43:abc2/64 scope link
       valid_lft forever preferred_lft forever
5: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 14:18:77:43:ab:c3 brd ff:ff:ff:ff:ff:ff
6: p1p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:ec:4c:44 brd ff:ff:ff:ff:ff:ff
7: p1p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:ec:4c:46 brd ff:ff:ff:ff:ff:ff
8: p4p1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:d3:ce:48 brd ff:ff:ff:ff:ff:ff
9: p4p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether a0:36:9f:d3:ce:4a brd ff:ff:ff:ff:ff:ff

Below is the ip r output of the affected node:
[root@overcloud-cephstorage-0 ~]# ip r
default via 192.168.100.34 dev em1
192.168.100.0/24 dev em1 proto kernel scope link src 192.168.100.221


Actual results:
The upgrade was failing because the ovs-vswitchd service was inactive during the Ceph node upgrade, which in turn broke networking on the node.
[root@overcloud-cephstorage-0 ~]# systemctl list-unit-files|grep -i ovs
ovs-delete-transient-ports.service                               static
ovs-vswitchd.service                                             static
ovsdb-server.service                                             static
[root@overcloud-cephstorage-0 ~]# systemctl status ovs-vswitchd.service
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: inactive (dead)
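
Until the upgrade workflow handles this itself, a pre-check run on each node before the upgrade can avoid the failure. This is a minimal sketch, not something from this report; the decision logic is factored into a pure helper so it can be exercised without systemd:

```shell
#!/bin/sh
# Minimal pre-check sketch (an assumption, not part of the report):
# start ovs-vswitchd before the upgrade if it is not already running.

# Pure helper: given the output of `systemctl is-active <unit>`,
# answer whether a manual start is needed.
needs_start() {
    case "$1" in
        active|activating) echo "no" ;;
        *)                 echo "yes" ;;
    esac
}

# On a real node one would run (requires root; unit name from the report):
#   state=$(systemctl is-active ovs-vswitchd.service || true)
#   if [ "$(needs_start "$state")" = "yes" ]; then
#       systemctl start ovs-vswitchd.service
#   fi
```

The helper only inspects the reported state string; the actual systemctl calls stay in the commented section so the check can be dropped into any pre-upgrade validation script.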


After manually activating the OVS service on the Ceph node, the upgrade completed successfully.

[root@overcloud-cephstorage-0 ~]# systemctl status ovs-vswitchd.service
● ovs-vswitchd.service - Open vSwitch Forwarding Unit
   Loaded: loaded (/usr/lib/systemd/system/ovs-vswitchd.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2021-04-29 15:09:47 UTC; 17s ago
  Process: 96792 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server --no-monitor --system-id=random ${OVS_USER_OPT} s>
  Process: 96789 ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages (code=exited, status=0/SUCCESS)
  Process: 96786 ExecStartPre=/bin/sh -c /usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages (code=exited, status=0/SUCCESS)
 Main PID: 96844 (ovs-vswitchd)
    Tasks: 1 (limit: 822668)
   Memory: 20.4M
   CGroup: /system.slice/ovs-vswitchd.service
           └─96844 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --user openvswit>

Apr 29 15:09:47 overcloud-cephstorage-0 systemd[1]: Starting Open vSwitch Forwarding Unit...
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-ctl[96792]: Inserting openvswitch module [  OK  ]
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-ctl[96792]: Starting ovs-vswitchd [  OK  ]
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-vsctl[96851]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait add Open_vSwitch . exter>
Apr 29 15:09:47 overcloud-cephstorage-0 ovs-ctl[96792]: Enabling remote OVSDB managers [  OK  ]
Apr 29 15:09:47 overcloud-cephstorage-0 systemd[1]: Started Open vSwitch Forwarding Unit.

Unset noout flag ------------------------------------------------------- 14.97s
tripleo-podman : Purge /var/lib/docker ---------------------------------- 8.87s
tripleo-podman : Uninstall Docker rpm ----------------------------------- 5.46s
Gathering Facts --------------------------------------------------------- 3.32s
Render all_nodes data as group_vars for overcloud ----------------------- 2.53s
Gathering Facts --------------------------------------------------------- 2.04s
tripleo-podman : Check docker service state ----------------------------- 1.38s
tripleo-podman : Check if docker has some data -------------------------- 0.96s
tripleo-podman : Refresh hardware facts --------------------------------- 0.90s
tripleo-podman : Clean podman images ------------------------------------ 0.38s
tripleo-podman : Clean podman images ------------------------------------ 0.38s
include_tasks ----------------------------------------------------------- 0.35s
tripleo-podman : Clean podman volumes ----------------------------------- 0.33s
include_tasks ----------------------------------------------------------- 0.33s
Stop docker ------------------------------------------------------------- 0.33s
Purge everything about docker on the host ------------------------------- 0.28s
Unset noout flag -------------------------------------------------------- 0.27s
include_tasks ----------------------------------------------------------- 0.27s
include_tasks ----------------------------------------------------------- 0.25s
Stop docker ------------------------------------------------------------- 0.23s

Updated nodes - overcloud-cephstorage-0
Success
2021-04-29 21:05:17.529 528361 INFO tripleoclient.v1.overcloud_upgrade.UpgradeRun [-] Completed Overcloud Upgrade Run for overcloud-cephstorage-0 with playbooks ['upgrade_steps_playbook.yaml', 'deploy_steps_playbook.yaml', 'post_upgrade_steps_playbook.yaml']
2021-04-29 21:05:17.535 528361 INFO osc_lib.shell [-] END return value: None
 

I am getting the same error on the compute nodes as well. Is there a mechanism to ensure the openvswitch service is active without manual intervention, so that the RHOSP upgrade is not hampered by this state?
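
As a stop-gap from the undercloud, the same check could be pushed to every affected node over SSH before launching the upgrade run. A hedged sketch follows; the heat-admin login and the host names are assumptions (common TripleO defaults), not taken from this report:

```shell
#!/bin/sh
# Hedged sketch: build the one-liner to run on each overcloud node so
# ovs-vswitchd is started only when it is not already active.
remote_cmd() {
    unit="$1"
    printf '%s' "systemctl is-active $unit >/dev/null || sudo systemctl start $unit"
}

# From the undercloud, per node (heat-admin is the usual TripleO login,
# an assumption here):
#   for host in overcloud-cephstorage-0 overcloud-compute-0; do
#       ssh heat-admin@"$host" "$(remote_cmd ovs-vswitchd.service)"
#   done
```

Building the remote command as a string keeps the logic testable locally; the SSH loop itself stays commented because it depends on the deployment's inventory.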
 
Expected results:
The openvswitch service should be active during the upgrade.

Additional info:
RHEL Version: Red Hat Enterprise Linux release 8.2 (Ootpa)
RHOSP Version: 16.2

Below is the upgrade prepare command used:
nohup openstack overcloud upgrade prepare  
--templates /home/stack/openstack-tripleo-heat-templates-rendered_16 
-r /home/stack/templates/roles_data.yaml 
-n /home/stack/templates/network_data.yaml 
-e /home/stack/containers-prepare-parameter.yaml 
-e /home/stack/templates/upgrades-environment.yaml 
-e /home/stack/templates/rhsm.yml 
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/network-isolation.yaml 
-e /home/stack/templates/network-environment.yaml 
-e /home/stack/templates/node-info.yaml 
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-sriov.yaml 
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/services/neutron-ovs.yaml 
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/ceph-ansible/ceph-ansible.yaml 
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/cinder-backup.yaml 
-e /home/stack/templates/storage-config.yaml 
-e /home/stack/openstack-tripleo-heat-templates-rendered_16/environments/host-config-and-reboot.yaml 
--libvirt-type kvm  
--ntp-server pool.ntp.org -v -y &

I am following the Red Hat document below for the RHOSP 13 to 16.1 upgrade:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index

Comment 1 rbsshasha 2021-05-11 08:44:59 UTC
Hi OVS team,
Please advise; let me know if you need any other information.

