Bug 1469170

Summary: Corosync should set priority when set of RR scheduler fails
Product: Red Hat Enterprise Linux 7
Reporter: Jan Friesse <jfriesse>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: low
Docs Contact:
Priority: unspecified
Version: 7.5
CC: ccaulfie, cluster-maint, mnovacek, pzimek
Target Milestone: rc
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version: corosync-2.4.0-10.el7
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 16:52:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1476214
Attachments:
Description Flags
Proposed patch none

Description Jan Friesse 2017-07-10 14:50:08 UTC
Created attachment 1295850 [details]
Proposed patch

Description of problem:
When sched_setscheduler fails to set the RR scheduler for some reason, corosync continues with unchanged priority (that is, as a standard process). This is not optimal because corosync has near-realtime requirements.

We cannot solve the sched_setscheduler failure, but we can at least change the nice value so that corosync gets some advantage over other processes.
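
The fallback described above could look roughly like the following. This is only a minimal sketch, not the attached patch: the function name raise_priority_or_renice and its return codes are hypothetical, and the warning text merely imitates the format of the corosync log message. It relies only on the standard sched_setscheduler, sched_get_priority_max and setpriority calls.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not the actual corosync code): try to switch the
 * calling process to SCHED_RR at the maximum priority and, if that fails,
 * fall back to the lowest nice value.
 *
 * Returns 0 if SCHED_RR was set, 1 if only the nice fallback succeeded,
 * and -1 if both attempts failed.
 */
static int raise_priority_or_renice(void)
{
    struct sched_param param;
    int max_prio = sched_get_priority_max(SCHED_RR);

    if (max_prio != -1) {
        memset(&param, 0, sizeof(param));
        param.sched_priority = max_prio;
        if (sched_setscheduler(0, SCHED_RR, &param) == 0) {
            return 0;
        }
        fprintf(stderr, "Could not set SCHED_RR at priority %d: %s (%d)\n",
                max_prio, strerror(errno), errno);
    }

    /* Fallback: PRIO_MIN is the lowest nice value (-20 on Linux). */
    if (setpriority(PRIO_PROCESS, 0, PRIO_MIN) != 0) {
        fprintf(stderr, "Could not set nice value to %d: %s (%d)\n",
                PRIO_MIN, strerror(errno), errno);
        return -1;
    }

    return 1;
}
```

Both attempts need sufficient privileges (root or the matching capability), so an unprivileged caller should expect the -1 result.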

Version-Release number of selected component (if applicable):
Every

How reproducible:
100%

Steps to Reproduce:
Force sched_setscheduler to fail

Actual results:
Nice value is unchanged

Expected results:
Nice value is set to the lowest possible value for SCHED_OTHER

Additional info:
"Unit test"
https://github.com/corosync/corosync/pull/228#issuecomment-313723620

Comment 2 Jan Friesse 2017-07-28 15:46:30 UTC
"Unit test" may be invalid when https://bugzilla.redhat.com/show_bug.cgi?id=1476214 is also merged. Solution is to use -R so corosync doesn't try to move itself into root cgroup.

Comment 5 michal novacek 2018-01-16 10:32:07 UTC
I have verified with corosync-2.4.3-1.el7.x86_64 that the nice value is set to -20 if RTPRIO cannot be set.

----

Common part
===========

> create new rt group
[root@virt-426 ~]# cgcreate -g cpu:test
[root@virt-426 ~]# cgset -r cpu.rt_runtime_us=100000 test
[root@virt-426 ~]# cgget -r cpu.rt_runtime_us test
test:
cpu.rt_runtime_us: 100000


> check that corosync starts in the correct group
[root@virt-426 ~]# cgexec -g cpu:test  corosync
notice  [MAIN  ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
info    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
notice  [MAIN  ] Corosync sucesfully moved to root cgroup

[root@virt-426 ~]# ps -T  -O cls,rtprio,pri,ni $(pidof corosync)
  PID CLS RTPRIO PRI  NI S TTY          TIME COMMAND
20334  RR     99 139   - S ?        00:00:05 corosync
20334  RR     99 139   - S ?        00:00:00 corosync

> kill corosync process
[root@virt-426 x86_64]# killall corosync
[root@virt-426 x86_64]# killall corosync
corosync: no process found

Before the patch (corosync-2.4.0-9.el7.x86_64)
==============================================

[root@virt-426 x86_64]# cgset  -r cpu.rt_runtime_us=0 test
[root@virt-426 x86_64]# cgget  -r cpu.rt_runtime_us test
test:
cpu.rt_runtime_us: 0

> run corosync in the changed group
[root@virt-426 x86_64]# cgexec -g cpu:test  corosync
notice  [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
info    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp pie relro bindnow

> RTPRIO not set and NICE not set
[root@virt-426 x86_64]# ps -T -O cls,rtprio,pri,ni $(pidof corosync)
  PID CLS RTPRIO PRI  NI S TTY          TIME COMMAND
 2463  TS      -  19   0 S ?        00:00:00 corosync
 2463  TS      -  19   0 S ?        00:00:00 corosync


After the patch (corosync-2.4.3-1.el7.x86_64)
=============================================

> use "-R" so corosync does not try to move to root group
[root@virt-426 ~]# cgexec -g cpu:test corosync -R
notice  [MAIN  ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
info    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)

> RTPRIO not set but NICE value of corosync changed to -20
[root@virt-426 ~]# ps -T -O cls,rtprio,pri,ni $(pidof corosync)
  PID CLS RTPRIO PRI  NI S TTY          TIME COMMAND
 2013  TS      -  39 -20 S ?        00:00:00 corosync -R
 2013  TS      -  39 -20 S ?        00:00:00 corosync -R
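
The same check could also be done programmatically instead of reading the ps output. The sketch below is only an illustration: the helper name get_sched_info is hypothetical, and it queries a single task ID, whereas ps -T lists every thread.

```c
#include <errno.h>
#include <sched.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper: fetch the scheduling policy and nice value of `pid`
 * (0 means the calling process). Returns 0 on success, -1 on error.
 */
static int get_sched_info(pid_t pid, int *policy, int *nice_value)
{
    int p = sched_getscheduler(pid);
    if (p == -1) {
        return -1;
    }

    /* getpriority() can legitimately return -1, so errno must be checked. */
    errno = 0;
    int n = getpriority(PRIO_PROCESS, (id_t)pid);
    if (n == -1 && errno != 0) {
        return -1;
    }

    *policy = p;
    *nice_value = n;
    return 0;
}
```

For the run above, such a check would be expected to report SCHED_OTHER (shown as TS by ps) and a nice value of -20 for PID 2013.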

Comment 8 errata-xmlrpc 2018-04-10 16:52:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0920