Bug 2059934 - PAO - TuneD parser error due to tuned 2.18.0 was shipped via FDP
Summary: PAO - TuneD parser error due to tuned 2.18.0 was shipped via FDP
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Martin Sivák
QA Contact: Gowrishankar Rajaiyan
URL:
Whiteboard:
Depends On: 2060138
Blocks: 2059847
Reported: 2022-03-02 11:12 UTC by liqcui
Modified: 2022-04-13 18:23 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: Upgrading OCP to 4.10 brings in a Node Tuning Operator (NTO) and a TuneD build that are affected by this TuneD bug: https://github.com/redhat-performance/tuned/issues/378. Any comment (# lorem ipsum) in a TuneD profile that does not start at the beginning of a line causes a parsing error, because the comment text is read as part of the value before it.
Consequence:
1) Tuning generated by PAO 4.9 does not work properly, because it uses end-of-line comments.
2) Any Tuned override based on https://access.redhat.com/solutions/5532341 that uses end-of-line comments does not work.
Workaround (if any):
For case 1), upgrade PAO to 4.10 so that it matches the OCP version. The errors then go away.
For case 2), put every TuneD profile comment on a standalone line. The # character must be the first character on the line.
Result: Tuning of the node is properly applied and works, with no tracebacks in the NTO pod logs.
Clone Of:
Environment:
Last Closed: 2022-04-13 18:23:19 UTC
Target Upstream Version:
Embargoed:
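The cause in the Doc Text can be reproduced in miniature with Python, which is the language TuneD itself is written in (see the traceback in the description). This sketch relies on an assumption that this report does not confirm: that the affected TuneD build parses profiles the way Python's standard `configparser` does by default. Under those defaults, `#` is recognized only as a full-line comment prefix, because `inline_comment_prefixes` is unset. The `[selinux]` fragment is illustrative. Its option name mirrors the `_set_avc_cache_threshold` handler in the traceback.

```python
import configparser

# Illustrative profile fragment (assumption): the option name follows the selinux
# plugin seen in the traceback; the end-of-line comment mirrors the error message.
PROFILE = """\
[selinux]
avc_cache_threshold=8192                      # Custom (atomic host)
"""


def read_value(strip_inline_comments: bool) -> str:
    """Parse PROFILE and return the raw avc_cache_threshold string."""
    if strip_inline_comments:
        # A '#' preceded by whitespace now starts a comment, so the value ends before it.
        parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    else:
        # Library defaults: '#' and ';' are recognized only as full-line comments.
        parser = configparser.ConfigParser()
    parser.read_string(PROFILE)
    return parser["selinux"]["avc_cache_threshold"]


if __name__ == "__main__":
    raw = read_value(strip_inline_comments=False)
    print(repr(raw))  # the comment is still attached to the value
    try:
        int(raw)  # same conversion as _set_avc_cache_threshold in the traceback
    except ValueError as err:
        print("ValueError:", err)
    print(int(read_value(strip_inline_comments=True)))
```

With the defaults, the comment stays attached to the value, and `int()` raises the same `ValueError` that appears in the tuned pod logs. Enabling inline comment stripping yields `8192`. Under the stated assumption, this explains why moving comments to standalone lines avoids the error without any change to the parser.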



Description liqcui 2022-03-02 11:12:52 UTC
Description of problem:

After the performance-patch Tuned profile and the PerformanceProfile listed under Steps to Reproduce are created, the tuned pod throws the following error:

2022-03-02 07:27:49,257 ERROR    tuned.units.manager: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tuned/units/manager.py", line 119, in _try_call
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/instance/instance.py", line 78, in apply_tuning
    self._plugin.instance_apply_tuning(self)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 261, in instance_apply_tuning
    self._instance_apply_static(instance)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 317, in _instance_apply_static
    self._execute_all_non_device_commands(instance)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 433, in _execute_all_non_device_commands
    self._execute_non_device_command(instance, command, new_value)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 514, in _execute_non_device_command
    command["set"](new_value, sim = False)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/plugin_selinux.py", line 49, in _set_avc_cache_threshold
    threshold = int(value)
ValueError: invalid literal for int() with base 10: '8192                      # Custom (atomic host)'


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy PAO from OperatorHub.
2. Create a PerformanceProfile and a Tuned profile such as the following:
$ cat performance-patch.sh
oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-profile
      [bootloader]
      cmdline_crash=nohz_full=0,2-4
      [sysctl]
      kernel.timer_migration=1
      [service]
      service.stalld=start,enable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch
EOF
[ocpadmin@ec2-18-217-45-133 nto]$ cat performance-profile.yaml
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  finalizers:
  - foreground-deletion
  name: profile
spec:
  additionalKernelArgs:
  - idle=poll
  cpu:
    isolated: 0,3
    reserved: 1-2
  globallyDisableIrqLoadBalancing: true
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 2
      size: 1G
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: false


Actual results:
The performance-patch profile is not applied.

Expected results:
The performance-patch profile is applied, with no errors in the tuned pod logs.

Additional info:

oc logs tuned-nkhjt -n openshift-cluster-node-tuning-operator
I0302 07:27:41.721131    3689 controller.go:1221] starting openshift-tuned v4.10.0-202202241816.p0.g3c5760e.assembly.stream-0-gb855682-dirty
I0302 07:27:41.830129    3689 controller.go:323] disabling system tuned...
I0302 07:27:41.831650    3689 controller.go:1015] started events processors
I0302 07:27:41.831722    3689 controller.go:349] extracting TuneD profiles
I0302 07:27:41.837925    3689 controller.go:1053] started controller
I0302 07:27:45.469428    3689 controller.go:427] written "/etc/tuned/recommend.d/50-openshift.conf" to set TuneD profile performance-patch
I0302 07:27:45.804559    3689 controller.go:440] starting tuned...
2022-03-02 07:27:47,052 INFO     tuned.daemon.application: TuneD: 2.18.0, kernel: 4.18.0-305.34.2.el8_4.x86_64
2022-03-02 07:27:47,053 INFO     tuned.daemon.application: dynamic tuning is globally disabled
2022-03-02 07:27:47,182 INFO     tuned.daemon.daemon: using sleep interval of 1 second(s)
2022-03-02 07:27:47,183 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2022-03-02 07:27:47,185 INFO     tuned.daemon.daemon: Using 'performance-patch' profile
2022-03-02 07:27:47,204 INFO     tuned.profiles.loader: loading profile: performance-patch
2022-03-02 07:27:47,955 INFO     tuned.daemon.controller: starting controller
2022-03-02 07:27:47,963 INFO     tuned.daemon.daemon: starting tuning
2022-03-02 07:27:48,281 INFO     tuned.plugins.base: instance cpu: assigning devices cpu1, cpu0, cpu3, cpu2
2022-03-02 07:27:48,291 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2022-03-02 07:27:48,304 WARNING  tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2022-03-02 07:27:48,341 INFO     tuned.plugins.plugin_disk: Device 'nvme0n1' not supported by hdparm
2022-03-02 07:27:48,343 INFO     tuned.plugins.base: instance disk: assigning devices nvme0n1
2022-03-02 07:27:48,351 INFO     tuned.plugins.base: instance net: assigning devices ens5
2022-03-02 07:27:48,820 INFO     tuned.plugins.plugin_bootloader: cannot read '/etc/default/grub'
2022-03-02 07:27:48,832 ERROR    tuned.plugins.plugin_cpu: unable to evaluate latency value (probably wrong settings in the 'cpu' section of current profile), disabling PM QoS
2022-03-02 07:27:48,838 ERROR    tuned.plugins.plugin_sysctl: Failed to set sysctl parameter 'kernel.nmi_watchdog' to '0                       # cpu-partitioning #realtime': [Errno 524] Unknown error 524
2022-03-02 07:27:48,839 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2022-03-02 07:27:49,257 ERROR    tuned.units.manager: BUG: Unhandled exception in start_tuning: invalid literal for int() with base 10: '8192                      # Custom (atomic host)'
2022-03-02 07:27:49,257 ERROR    tuned.units.manager: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tuned/units/manager.py", line 119, in _try_call
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/instance/instance.py", line 78, in apply_tuning
    self._plugin.instance_apply_tuning(self)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 261, in instance_apply_tuning
    self._instance_apply_static(instance)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 317, in _instance_apply_static
    self._execute_all_non_device_commands(instance)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 433, in _execute_all_non_device_commands
    self._execute_non_device_command(instance, command, new_value)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/base.py", line 514, in _execute_non_device_command
    command["set"](new_value, sim = False)
  File "/usr/lib/python3.6/site-packages/tuned/plugins/plugin_selinux.py", line 49, in _set_avc_cache_threshold
    threshold = int(value)
ValueError: invalid literal for int() with base 10: '8192                      # Custom (atomic host)'

2022-03-02 07:27:49,268 WARNING  tuned.plugins.plugin_vm: Incorrect 'transparent_hugepages' value 'never                   #  network-latency'.
2022-03-02 07:27:49,276 INFO     tuned.plugins.plugin_systemd: setting 'CPUAffinity' to '1 2' in the '/etc/systemd/system.conf'
2022-03-02 07:27:51,183 INFO     tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
2022-03-02 07:27:53,095 INFO     tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
2022-03-02 07:27:53,095 INFO     tuned.plugins.plugin_bootloader: cannot find grub.cfg to patch
2022-03-02 07:27:53,156 INFO     tuned.daemon.daemon: static tuning from profile 'performance-patch' applied
E0302 07:27:53.157692    3689 controller.go:775] unable to sync(daemon/) requeued (0)
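This log shows three symptoms: the failed `kernel.nmi_watchdog` sysctl, the rejected `transparent_hugepages` value, and the `_set_avc_cache_threshold` traceback. All three share one root cause, which is a value that still has its end-of-line comment attached. The sketch below is a minimal triage helper. It assumes that affected values always appear in TuneD messages as single-quoted strings containing whitespace followed by `#`. `find_commented_values` is hypothetical and is not part of any shipped tool.

```python
import re
from typing import List, Tuple

# Assumption: an affected value is a single-quoted string in a log message that
# contains whitespace followed by '#', e.g. '0       # cpu-partitioning #realtime'.
QUOTED_WITH_COMMENT = re.compile(r"'([^']*\s#[^']*)'")


def find_commented_values(log_text: str) -> List[Tuple[str, str]]:
    """Return (value, comment) pairs for quoted values that still carry a comment.

    Duplicates (for example the same value in a BUG line and in the traceback)
    are reported once, in order of first appearance.
    """
    found = []
    for line in log_text.splitlines():
        for quoted in QUOTED_WITH_COMMENT.findall(line):
            value, comment = quoted.split("#", 1)
            found.append((value.rstrip(), "#" + comment))
    return list(dict.fromkeys(found))


if __name__ == "__main__":
    # Message excerpts copied from the tuned pod log above.
    sample = "\n".join([
        "ERROR    tuned.plugins.plugin_sysctl: Failed to set sysctl parameter "
        "'kernel.nmi_watchdog' to '0                       # cpu-partitioning #realtime': "
        "[Errno 524] Unknown error 524",
        "WARNING  tuned.plugins.plugin_vm: Incorrect 'transparent_hugepages' value "
        "'never                   #  network-latency'.",
        "ValueError: invalid literal for int() with base 10: "
        "'8192                      # Custom (atomic host)'",
    ])
    for value, comment in find_commented_values(sample):
        print(f"{value!r} carries {comment!r}")
```

Each printed pair points to a setting whose comment should be moved to its own line.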

Comment 1 Martin Sivák 2022-03-02 11:26:14 UTC
Please provide the versions of the components used, especially PAO. This is a known issue that was fixed half a year ago by https://github.com/openshift-kni/performance-addon-operators/commit/874da9e1adaabde490fd9ab58be3e8cd13c32b94

Comment 2 Martin Sivák 2022-03-02 13:55:16 UTC
This is a potential 4.9 to 4.10 blocker.

Comment 3 Jiří Mencák 2022-03-02 19:39:46 UTC
Linking TuneD BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2060138

Comment 4 liqcui 2022-03-03 05:49:34 UTC
The PAO version is 4.9.7.

Comment 6 Ken Young 2022-03-07 18:10:15 UTC
Martin,

Can you help us understand the impact of this? Is this in 4.10?

Regards,
Ken Y

Comment 7 Martin Sivák 2022-03-07 18:13:35 UTC
Ken, please check the doc text; it is all there.

This only affects PAO 4.9 combined with OCP 4.10 (during a cluster upgrade), or custom Tuned overrides that use end-of-line comments.

Neither a clean 4.10 install nor a clean 4.9 install is affected.
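For case 2 in the Doc Text (custom Tuned overrides), the workaround can be applied mechanically. The helper below, `hoist_inline_comments`, is hypothetical and is not part of NTO, PAO, or TuneD. It works on the profile text itself, meaning the `data` field of a Tuned resource and not the surrounding YAML. It moves each end-of-line comment onto its own line above the setting. The helper assumes that an end-of-line comment is whitespace followed by `#`, so review the output for any value that legitimately contains ` #`. The sample settings come from the tuned pod log in the description.

```python
import re

# Heuristic (assumption): an end-of-line comment is whitespace followed by '#'.
# A value that legitimately contains " #" would be split too, so review the result.
INLINE_COMMENT = re.compile(r"^(?P<setting>.*?\S)\s+(?P<comment>#.*)$")


def hoist_inline_comments(profile: str) -> str:
    """Move end-of-line comments in TuneD profile text onto standalone lines.

    Each comment is emitted on the line just above the setting it annotated,
    with '#' as its first character, as the Doc Text workaround requires.
    """
    out = []
    for line in profile.splitlines():
        match = INLINE_COMMENT.match(line)
        if match and not line.lstrip().startswith("#"):
            out.append(match.group("comment"))
            out.append(match.group("setting"))
        else:
            # Lines without an end-of-line comment, and full-line comments, are kept.
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Settings taken from the tuned pod log in the description.
    before = (
        "[sysctl]\n"
        "kernel.nmi_watchdog=0                       # cpu-partitioning #realtime\n"
        "[vm]\n"
        "transparent_hugepages=never                   #  network-latency\n"
    )
    print(hoist_inline_comments(before), end="")
```

The usage prints the `[sysctl]` and `[vm]` sections with `# cpu-partitioning #realtime` and `#  network-latency` each placed on its own line above the setting it described.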

Comment 8 Martin Sivák 2022-04-13 18:23:19 UTC
The erratum https://access.redhat.com/errata/RHSA-2022:1162 ships cluster-node-tuning-operator-container-v4.10.0-202203282147.p0.g3c5760e.assembly.stream, which includes tuned-2.18.0-1.1.20220317gite1045f2d.el8fdp.noarch.

That is the same tuned build that was released in https://access.redhat.com/errata/RHBA-2022:1084, which fixed https://bugzilla.redhat.com/show_bug.cgi?id=2064605.

Marking as fixed.

