Bug 2103978 - Failing to deploy hardware offload setup
Summary: Failing to deploy hardware offload setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-vswitch
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
Target Release: 17.0
Assignee: OSP Team
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-05 12:58 UTC by Miguel Angel Nieto
Modified: 2023-09-18 04:41 UTC (History)
CC: 16 users

Fixed In Version: puppet-vswitch-14.4.2-0.20220721150748.3facbb3.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-21 12:23:40 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 849949 0 None MERGED Do not restart ovs when updating other_config:emc-insert-inv-prob 2022-07-21 12:15:40 UTC
OpenStack gerrit 849950 0 None MERGED Do not use service resource to restart openvswitch service 2022-07-21 12:15:44 UTC
Red Hat Issue Tracker NFV-2545 0 None None None 2022-07-05 21:09:09 UTC
Red Hat Issue Tracker OSP-16245 0 None None None 2022-07-05 13:06:23 UTC
Red Hat Product Errata RHEA-2022:6543 0 None None None 2022-09-21 12:23:58 UTC

Description Miguel Angel Nieto 2022-07-05 12:58:47 UTC
Description of problem:

Failing to deploy a hardware offload setup; the following error is reported:

2022-07-05 12:14:58.355698 | 525400ae-78b5-fb35-2c87-000000006651 |      FATAL | Create containers managed by Podman for /var/lib/tripleo-config/container-puppet-config/step_1 | computehwoffload-r740 | error={
    "changed": false,
    "invocation": {
        "module_args": {
            "concurrency": 6,
            "config_dir": "/var/lib/tripleo-config/container-puppet-config/step_1",
            "config_id": "tripleo_puppet_step1",
            "config_overrides": {},
            "config_patterns": "container-puppet-*.json",
            "debug": false,
            "log_base_path": "/var/log/containers/stdouts"
        }
    },
    "msg": "Failed containers: container-puppet-ovn_controller"


In computehwoffload-r740 I see the following error:

[root@computehwoffload-r740 stdouts]# cat container-puppet-ovn_controller.log 
2022-07-05T12:14:43.080126021+00:00 stdout F include ::tripleo::packages
2022-07-05T12:14:43.080126021+00:00 stdout F include tripleo::profile::base::neutron::agents::ovn
2022-07-05T12:14:43.080126021+00:00 stdout F 
2022-07-05T12:14:43.269214534+00:00 stdout F Running puppet
2022-07-05T12:14:43.270944478+00:00 stderr F + logger -s -t puppet-user
2022-07-05T12:14:43.283502144+00:00 stderr F + /usr/bin/puppet apply --summarize --detailed-exitcodes --color=false --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules --tags '"file,file_line,concat,augeas,cron,vs_config,exec"' /etc/config.pp
2022-07-05T12:14:50.035623604+00:00 stderr F <13>Jul  5 12:14:43 puppet-user: Warning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5
2022-07-05T12:14:50.037551973+00:00 stderr F <13>Jul  5 12:14:50 puppet-user:    (file: /etc/puppet/hiera.yaml)
2022-07-05T12:14:50.039158745+00:00 stderr F <13>Jul  5 12:14:50 puppet-user: Warning: Undefined variable '::deploy_config_name'; 
2022-07-05T12:14:50.039969777+00:00 stderr F <13>Jul  5 12:14:50 puppet-user:    (file & line not available)
2022-07-05T12:14:50.104264665+00:00 stderr F <13>Jul  5 12:14:50 puppet-user: Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://puppet.com/docs/puppet/7.10/deprecated_language.html
2022-07-05T12:14:50.104264665+00:00 stderr F <13>Jul  5 12:14:50 puppet-user:    (file & line not available)
2022-07-05T12:14:50.528023670+00:00 stderr F <13>Jul  5 12:14:50 puppet-user: Notice: Compiled catalog for computehwoffload-r740.redhat.local in environment production in 0.53 seconds
2022-07-05T12:14:50.618343800+00:00 stderr F <13>Jul  5 12:14:50 puppet-user: Error: Found 1 dependency cycle:
2022-07-05T12:14:50.618343800+00:00 stderr F <13>Jul  5 12:14:50 puppet-user: (Service[openvswitch] => Vs_config[other_config:hw-offload] => Service[openvswitch])\nTry the '--graph' option and opening the resulting '.dot' file in OmniGraffle or GraphViz
2022-07-05T12:14:50.627993309+00:00 stderr F <13>Jul  5 12:14:50 puppet-user: Error: Failed to apply catalog: One or more resource dependency cycles detected in graph

I will attach the sos report and the templates used.






Comment 1 Miguel Angel Nieto 2022-07-05 12:59:52 UTC
I forgot to add the puddle used: RHOS-17.0-RHEL-9-20220628.n.1

Comment 4 Miguel Angel Nieto 2022-07-13 15:13:52 UTC
I think there is a loop in puppet manifests:

This is the error:
2022-07-13T13:45:52.172386522+00:00 stderr F <13>Jul 13 13:45:52 puppet-user: Error: Found 1 dependency cycle:
2022-07-13T13:45:52.172557445+00:00 stderr F <13>Jul 13 13:45:52 puppet-user: (Service[openvswitch] => Vs_config[other_config:hw-offload] => Service[openvswitch])\nTry the '--graph' option and opening the resulting '.dot' file in OmniGraffle or GraphViz
2022-07-13T13:45:52.175890118+00:00 stderr F <13>Jul 13 13:45:52 puppet-user: Error: Failed to apply catalog: One or more resource dependency cycles detected in graph


So Service[openvswitch] depends on Vs_config[other_config:hw-offload], and Vs_config[other_config:hw-offload] depends on Service[openvswitch].

We can see it in the code:
/usr/share/openstack-tripleo-heat-templates/deployment/ovn/ovn-controller-container-puppet.yaml
  # Merging role-specific parameters (RoleParameters) with the default parameters.
  # RoleParameters will have the precedence over the default parameters.
  RoleParametersValue:
    type: OS::Heat::Value
    properties:
      type: json
      value:
        map_replace:
          - map_replace:
            - ovn::controller::ovn_bridge_mappings: NeutronBridgeMappings
              ovn::controller::ovn_cms_options:
                if:
                  - az_ovn_unset
                  - OVNCMSOptions
                  - list_join:
                    - ''
                    - - OVNCMSOptions
                      - ",availability-zones="
                      - {get_param: OVNAvailabilityZone}
              vswitch::ovs::enable_hw_offload: OvsHwOffload
            - values: {get_param: [RoleParameters]}
          - values:
              NeutronBridgeMappings: {get_param: NeutronBridgeMappings}
              OVNCMSOptions: {get_param: OVNCMSOptions}
              OvsHwOffload: {get_param: OvsHwOffload}


/usr/share/openstack-puppet/modules/vswitch/manifests/ovs.pp
  if $enable_hw_offload {
    vs_config { 'other_config:hw-offload':
      value  => 'true',
      notify => Service['openvswitch'],
      wait   => true,
    }
  }
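The error message names the two edges of the cycle. As an illustrative sketch only (not code from either module), a pair of declarations like the following would produce exactly the same loop: the notify edge points from Vs_config to Service, while a chaining arrow (standing in here for whatever ordering the catalog already contains) points from Service back to Vs_config.

```puppet
# Hypothetical minimal reproduction of the reported cycle.
service { 'openvswitch':
  ensure => true,
}

vs_config { 'other_config:hw-offload':
  value  => 'true',
  notify => Service['openvswitch'],   # edge: Vs_config => Service
}

# Assumed stand-in for an ordering declared elsewhere in the catalog:
Service['openvswitch'] -> Vs_config['other_config:hw-offload']   # edge: Service => Vs_config
```

With both edges present, Puppet cannot order the graph and aborts with the "dependency cycle" error shown above.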

Comment 6 Miguel Angel Nieto 2022-07-13 16:03:02 UTC
Modified container-puppet.sh and added options -d -v --graph
/usr/bin/puppet apply --summarize -d -v --graph

Checking graph and debug files: last_run_report.yaml

There is a problem between lines 89 and 112
resource_statuses:
  Service[openvswitch]:
    title: openvswitch
    file: "/etc/puppet/modules/vswitch/manifests/ovs.pp"
    line: 112
    resource: Service[openvswitch]
    resource_type: Service
    ....
      message: resource is part of a dependency cycle
      name: resource_error

  Vs_config[other_config:hw-offload]:
    title: other_config:hw-offload
    file: "/etc/puppet/modules/vswitch/manifests/ovs.pp"
    line: 89
    resource: Vs_config[other_config:hw-offload]
    resource_type: Vs_config
    ....
      message: resource is part of a dependency cycle
      name: resource_error

vi /etc/puppet/modules/vswitch/manifests/ovs.pp

     88   if $enable_hw_offload {
     89     vs_config { 'other_config:hw-offload':
     90       value  => 'true',
     91       notify => Service['openvswitch'],
     92       wait   => true,
     93     }
     94   }

    112   service { 'openvswitch':
    113     ensure    => true,
    114     enable    => true,
    115     name      => $::vswitch::params::ovs_service_name,
    116     status    => $::vswitch::params::ovs_status,
    117     hasstatus => $::vswitch::params::ovs_service_hasstatus
    118   }

Comment 9 Miguel Angel Nieto 2022-07-14 11:48:13 UTC
I have tried to fix the cycle: with line 91 commented out, the container is able to start and offload is configured. I think there is an issue in puppet.

vi /etc/puppet/modules/vswitch/manifests/ovs.pp

     88   if $enable_hw_offload {
     89     vs_config { 'other_config:hw-offload':
     90       value  => 'true',
     91   #    notify => Service['openvswitch'],
     92       wait   => true,
     93     }
     94   }

    112   service { 'openvswitch':
    113     ensure    => true,
    114     enable    => true,
    115     name      => $::vswitch::params::ovs_service_name,
    116     status    => $::vswitch::params::ovs_status,
    117     hasstatus => $::vswitch::params::ovs_service_hasstatus
    118   }

I have seen that in master it is done in a different way and notify is not there any more. Could it be that our puppet files are outdated?
https://github.com/openstack/puppet-vswitch/blob/master/manifests/ovs.pp

Comment 10 Miguel Angel Nieto 2022-07-14 13:52:05 UTC
Version installed is puppet-vswitch-14.4.2-0.20220317212602.3facbb3.el9ost.noarch

Replacing notify with require also works. I do not understand why it is notify.

     88   if $enable_hw_offload {
     89     vs_config { 'other_config:hw-offload':
     90       value  => 'true',
     91       require => Service['openvswitch'],
     92       wait   => true,
     93     }
     94   }

Comment 12 Takashi Kajinami 2022-07-15 08:02:11 UTC
I went through the stable/wallaby code but could not find the actual trigger.

The issue was reported upstream and was fixed in xena and later.
 https://review.opendev.org/c/openstack/puppet-vswitch/+/805549

At that time we regarded the issue as a regression caused by the ordering added by
https://review.opendev.org/c/openstack/puppet-vswitch/+/805549 and so did not fix it in wallaby.

However, puppet-ovn has that problematic ordering in it here:
 https://github.com/openstack/puppet-ovn/blob/stable/wallaby/manifests/controller.pp#L223
and that seems to be triggering the problem.

> Replacing notify by require also works. I do not understand why it is notify
We need to notify the service here because changing hw-offload requires restarting the openvswitch service.

Comment 13 Takashi Kajinami 2022-07-15 08:18:09 UTC
> I've went through stable/wallaby code but could not find out the actual trigger.
Ignore this first line. I later found that the problem is triggered by the implementation in puppet-ovn.

So I've submitted two patches to stable/wallaby. These replace the notification from Vs_config to Service
with a new Exec resource that triggers the restart command directly, which should solve the dependency problem.

 https://review.opendev.org/c/openstack/puppet-vswitch/+/849949
 https://review.opendev.org/c/openstack/puppet-vswitch/+/849950
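As a sketch of that approach (resource titles and the exact command here are assumptions, not the code from the linked reviews): notifying a refreshonly Exec runs the restart command directly, so no Service resource appears in the graph and puppet-ovn's ordering can no longer close a cycle.

```puppet
# Sketch only -- titles and command are illustrative, not the
# actual patch from reviews 849949/849950.
vs_config { 'other_config:hw-offload':
  value  => 'true',
  wait   => true,
  notify => Exec['restart openvswitch'],
}

exec { 'restart openvswitch':
  command     => 'systemctl restart openvswitch.service',
  path        => ['/usr/bin', '/usr/sbin', '/bin', '/sbin'],
  refreshonly => true,   # runs only when notified by the vs_config change
}
```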

Comment 14 Miguel Angel Nieto 2022-07-15 13:48:05 UTC
Hi

I applied those patches to the compute and controller nodes and it deployed successfully.

There are some files that were not patched because they are not in the installation. Why are they missing?
spec/classes/vswitch_dpdk_spec.rb
spec/classes/vswitch_ovs_spec.rb

Comment 15 Takashi Kajinami 2022-07-18 11:17:01 UTC
(In reply to Miguel Angel Nieto from comment #14)
> Hi
> 
> I applied those patches to the compute and controller nodes and it deployed
> successfully
> 
> There are some files that were not patched because they are not in the
> installation. why are they missing?
> spec/classes/vswitch_dpdk_spec.rb
> spec/classes/vswitch_ovs_spec.rb

These are files for unit tests, so they are not installed or used in actual deployments.

Comment 16 Takashi Kajinami 2022-07-18 11:26:57 UTC
(In reply to Takashi Kajinami from comment #12)
> ...
> > Replacing notify by require also works. I do not understand why it is notify
> We need to notify the service here because changing hw-oflload requires
> restarting the openvswitch service.

Hmm... A few web articles mention that the openvswitch service should
be restarted after setting other_config:hw-offload=true.

example.
https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html#create-compute-virtual-functions
~~~
3. Restart Open vSwitch

# sudo systemctl enable openvswitch.service
# sudo ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# sudo systemctl restart openvswitch.service
~~~

However, in current TripleO we do not enable the service resource type (and I don't
think we can, as we run puppet from containers), so the openvswitch service is NOT restarted
after other_config:hw-offload=true is set (*1)

(*1)
The current failure is caused by the defined resources, but that does not necessarily
mean all resources are executed, as we explicitly select the enabled resources by tags.

I'm not quite familiar with this area, but is that expected? It was mentioned earlier that
this was tested in OSP16, so I assume the current implementation worked without problems.
If we don't need the service restart, then we'd be able to delete that notification.

Comment 17 Takashi Kajinami 2022-07-18 15:25:32 UTC
Just for records.

In stable/train we use puppet-ovn to set the hw-offload option. This was later deprecated
in favor of the capability we added to puppet-vswitch.
 https://review.opendev.org/c/openstack/puppet-vswitch/+/779802/
 https://review.opendev.org/c/openstack/puppet-ovn/+/779804

The old implementation in puppet-ovn did not notify the openvswitch service, so it did not
cause this problem.

Comment 27 errata-xmlrpc 2022-09-21 12:23:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

Comment 28 Red Hat Bugzilla 2023-09-18 04:41:12 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

