Bug 1528368

Summary: tuned fails to verify the value of cpumask when it has more than 32 cores
Product: Red Hat Enterprise Linux 7 Reporter: Sergio Lopez <slopezpa>
Component: tuned Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Dominik Rehák <drehak>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.4 CC: drehak, jeder, jskarvad, olysonek, thozza
Target Milestone: rc Keywords: Patch, Upstream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tuned-2.10.0-0.1.rc1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-30 10:48:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1546815, 1549614    

Description Sergio Lopez 2017-12-21 16:03:02 UTC
Description of problem:

On servers with more than 32 cores, when some non-isolated cores are numbered 32 or higher, tuned verification may fail like this:

# tuned-adm verify
Verification failed, current system settings differ from the preset profile.
You can mostly fix this by Tuned restart, e.g.:
  service tuned restart
Sometimes (if some plugins like bootloader are used) also reboot is required.
See tuned log file ('/var/log/tuned/tuned.log') for details.

# grep ERROR /var/log/tuned/tuned.log
2017-12-21 18:34:20,716 ERROR    tuned.plugins.base: verify: failed: '/sys/devices/virtual/workqueue/cpumask' = 'ffffffff,80808000', expected 'ffffffff,80808000'
2017-12-21 18:34:20,716 ERROR    tuned.plugins.base: verify: failed: '/sys/bus/workqueue/devices/writeback/cpumask' = 'ffffffff,80808000', expected 'ffffffff,80808000'


Version-Release number of selected component (if applicable):

tuned-profiles-cpu-partitioning-2.8.0-5.el7.noarch
tuned-2.8.0-5.el7.noarch


How reproducible:

Always.


Additional info:

The problem lies in this section:

tuned/plugins/base.py:
    495         def _verify_value(self, name, new_value, current_value, ignore_missing, device = None):
    496                 if new_value is None:
    497                         return None
    498                 ret = False
    499                 if current_value is None and ignore_missing:
    500                         if device is None:
    501                                 log.info(consts.STR_VERIFY_PROFILE_VALUE_MISSING % name)
    502                         else:
    503                                 log.info(consts.STR_VERIFY_PROFILE_DEVICE_VALUE_MISSING % (device, name))
    504                         return True
    505 
    506                 if current_value is not None:
    507                         current_value = self._norm_value(current_value)
    508                         new_value = self._norm_value(new_value)
    509                         try:
    510                                 ret = int(new_value) == int(current_value)
    511                         except ValueError:
    512                                 try:
    513                                         ret = int(new_value, 16) == int(current_value, 16)
    514                                 except ValueError:
    515                                         ret = str(new_value) == str(current_value)

When the server has more than 32 cores, cpumask is written as 32-bit hexadecimal values separated by a comma, like "00000000,00000000".

The _verify_value code does not handle this format, and this is what happens:

 - Line 510: The int() conversion fails because the string contains a comma.
 - Line 513: The base-16 int() conversion also fails, because the comma is not a valid hexadecimal digit.
 - Line 515: The string comparison fails because the value read from sysfs contains a trailing newline ("\n").
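
The three fallbacks can be reproduced outside tuned. The following is a minimal sketch, not the patch that was shipped: naive_equal mirrors lines 509-515 but omits the _norm_value call (its behavior is not shown in this report), and parse_cpumask and cpumask_equal are hypothetical helper names. It assumes the raw sysfs read ends with a newline, as described above.

```python
def parse_cpumask(text):
    """Parse a sysfs-style cpumask such as 'ffffffff,80808000\n' into an int."""
    # Drop surrounding whitespace (including the trailing newline) and the
    # commas that separate the 32-bit groups, then parse the rest as hex.
    return int(text.strip().replace(",", ""), 16)


def cpumask_equal(expected, current):
    """Compare two cpumask strings by value rather than by spelling."""
    return parse_cpumask(expected) == parse_cpumask(current)


def naive_equal(new_value, current_value):
    """The three fallbacks from the excerpt, without any normalization."""
    try:
        return int(new_value) == int(current_value)
    except ValueError:
        try:
            return int(new_value, 16) == int(current_value, 16)
        except ValueError:
            return str(new_value) == str(current_value)


expected = "ffffffff,80808000"
current = "ffffffff,80808000\n"  # assumed raw sysfs read with trailing newline

print(naive_equal(expected, current))    # False
print(cpumask_equal(expected, current))  # True
```

Comparing the parsed integers rather than the strings also makes the check independent of letter case and leading zeros in the mask.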

Comment 12 errata-xmlrpc 2018-10-30 10:48:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3172