Bug 1968339

Summary: [OSP16.1.6][OVS-DPDK + SRIOV] Minor update fails on pci passthrough values
Product: Red Hat OpenStack
Reporter: Maxim Babushkin <mbabushk>
Component: openstack-tripleo-heat-templates
Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED ERRATA
QA Contact: Joe H. Rahme <jhakimra>
Severity: urgent
Priority: high
Version: 16.1 (Train)
CC: ccamposr, chrisw, dvd, ekuris, fbaudin, hakhande, kfida, mburns, oblaut, ralonsoh, scohen, supadhya
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-11.3.2-1.20210609073304.el8ost
Doc Type: No Doc Update
Last Closed: 2021-12-09 20:19:39 UTC
Type: Bug

Description Maxim Babushkin 2021-06-07 07:42:02 UTC
Description of problem:
OSPD 16.1.6 SRIOV minor update flow fails

Version-Release number of selected component (if applicable):
OSP 16.1.6
Puddle RHOS-16.1-RHEL-8-20210506.n.1

How reproducible:
Deploy OSPD 16.1.6 with SRIOV enabled and perform a minor update.

The minor update of an OSPD 16.1 SRIOV setup fails while deriving the PCI passthrough whitelist values.

The /var/lib/pci_passthrough_whitelist_scripts/derive_pci_passthrough_whitelist.py script fails with the following error:

Traceback (most recent call last):
  File "/var/lib/pci_passthrough_whitelist_scripts/derive_pci_passthrough_whitelist.py", line 308, in <module>
    user_configs = user_passthrough_config()
  File "/var/lib/pci_passthrough_whitelist_scripts/derive_pci_passthrough_whitelist.py", line 149, in user_passthrough_config
    return json.loads(out)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 1 column 12 (char 11)


The failure occurs in the script's "user_passthrough_config" method.
The method fetches the PCI passthrough whitelist from hiera with the following command:
$ hiera -c /etc/hiera.yaml nova::compute::pci::passthrough
It then tries to parse the command output as JSON with json.loads().
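For reference, the fetch-and-parse step described above can be sketched in Python as below. This is a hypothetical re-creation of user_passthrough_config, not the script's actual code: it assumes the script shells out to hiera and hands stdout straight to json.loads(), as the traceback suggests. The names HIERA_CMD and parse_passthrough and the subprocess options are illustrative assumptions.

```python
import json
import subprocess

# Command quoted in this report; the real script's exact invocation may differ.
HIERA_CMD = ["hiera", "-c", "/etc/hiera.yaml", "nova::compute::pci::passthrough"]


def parse_passthrough(out):
    """Parse hiera output for the PCI passthrough key as JSON.

    This only works if hiera prints the value as JSON text; any other
    rendering raises json.JSONDecodeError.
    """
    return json.loads(out)


def user_passthrough_config():
    """Hypothetical sketch: run hiera and parse its stdout as JSON."""
    result = subprocess.run(
        HIERA_CMD,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str (works on Python 3.6)
        check=True,               # raise if hiera exits non-zero
    )
    return parse_passthrough(result.stdout)
```

Because the parse step accepts only JSON, any change in how hiera renders the value surfaces as a JSONDecodeError instead of a clear configuration error.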

During the minor update, the same hiera command returns the value in Ruby hash syntax (with "=>" separators), which is not valid JSON:
[root@computeovsdpdksriov-0 ~]# hiera -c /etc/hiera.yaml nova::compute::pci::passthrough
[{"devname"=>"enp5s0f2", "physical_network"=>"sriov-1", "trusted"=>"true"},
 {"devname"=>"enp5s0f3", "physical_network"=>"sriov-2", "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"02", "function"=>"5"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"02", "function"=>"3"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"03", "function"=>"0"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"02", "function"=>"6"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"02", "function"=>"4"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"03", "function"=>"1"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"02", "function"=>"7"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"07", "function"=>"0"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"06", "function"=>"6"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"06", "function"=>"4"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"07", "function"=>"1"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"06", "function"=>"7"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"06", "function"=>"5"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"},
 {"address"=>{"domain"=>".*", "bus"=>"82", "slot"=>"06", "function"=>"3"},
  "vendor_id"=>"0x8086",
  "product_id"=>"0x154c",
  "trusted"=>"true"}]


Right after the deployment, by contrast, the hiera output is valid JSON:
[root@computeovndpdksriov-0 ~]# hiera -c /etc/hiera.yaml nova::compute::pci::passthrough
[{"devname": "enp5s0f2", "physical_network": "sriov-1", "trusted": "true"}, {"devname": "enp5s0f3", "physical_network": "sriov-2", "trusted": "true"}, {"devname": "enp130s0f0", "physical_network": "sriov-part-1", "trusted": "true"}, {"devname": "enp130s0f1", "physical_network": "sriov-part-2", "trusted": "true"}]

This post-deployment output parses with json.loads() without error.
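The difference between the two outputs can be reproduced without hiera. The sketch below feeds json.loads() the post-deployment output quoted above and the first entry of the output captured during the minor update (trimmed for brevity). try_parse is an illustrative helper, not part of the script. The second input fails at the same position reported in the traceback: char 11 is the "=" of the first "=>".

```python
import json

# Post-deployment hiera output, copied from this report.
POST_DEPLOY = (
    '[{"devname": "enp5s0f2", "physical_network": "sriov-1", "trusted": "true"}, '
    '{"devname": "enp5s0f3", "physical_network": "sriov-2", "trusted": "true"}, '
    '{"devname": "enp130s0f0", "physical_network": "sriov-part-1", "trusted": "true"}, '
    '{"devname": "enp130s0f1", "physical_network": "sriov-part-2", "trusted": "true"}]'
)

# First entry of the output seen during the minor update (Ruby hash syntax).
DURING_UPDATE = '[{"devname"=>"enp5s0f2", "physical_network"=>"sriov-1", "trusted"=>"true"}]'


def try_parse(text):
    """Return (parsed_value, None) on success or (None, error) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, err


ok, _ = try_parse(POST_DEPLOY)
print([entry["devname"] for entry in ok])
# ['enp5s0f2', 'enp5s0f3', 'enp130s0f0', 'enp130s0f1']

_, err = try_parse(DURING_UPDATE)
print(err)  # Expecting ':' delimiter: line 1 column 12 (char 11)
```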

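So at some point during the minor update, the PCI passthrough value in hiera changes, and that change causes the failure. Both its format and its contents differ: the enp130s0f0 and enp130s0f1 devname entries are gone, and per-function address entries on bus 82 appear instead.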
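One way to make this kind of failure easier to diagnose is to report the likely cause instead of a bare decoder error. The sketch below is a hypothetical helper, not the fix that shipped in openstack-tripleo-heat-templates-11.3.2. The names parse_passthrough_strict and HieraFormatError are illustrative, and the "=>" check is a heuristic assumption.

```python
import json


class HieraFormatError(ValueError):
    """Hypothetical error type: hiera returned a non-JSON rendering."""


def parse_passthrough_strict(out):
    """Parse hiera output as JSON, naming the likely cause on failure.

    Heuristic (an assumption, not part of the shipped script): if the text
    fails to parse and contains '=>', hiera most likely printed a Ruby hash
    instead of a JSON string.
    """
    try:
        return json.loads(out)
    except json.JSONDecodeError as err:
        if "=>" in out:
            raise HieraFormatError(
                "hiera returned Ruby hash syntax instead of JSON "
                "(failed at char %d)" % err.pos
            ) from err
        raise  # some other malformed output: keep the original error
```

The helper still rejects the Ruby-style value rather than trying to convert it, since rewriting "=>" to ":" by string substitution could corrupt values that happen to contain that sequence.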

Comment 1 Maxim Babushkin 2021-06-07 07:46:35 UTC
SOSreport - http://file.mad.redhat.com/~mbabushk/sosreports/bz1968339/

Comment 13 Maxim Babushkin 2021-10-10 10:39:21 UTC
The bug has been verified: the minor update flow works after the fix.

Comment 22 errata-xmlrpc 2021-12-09 20:19:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762