Bug 1469170 - Corosync should set priority when set of RR scheduler fails
Status: VERIFIED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: corosync
Version: 7.5
Hardware: All
OS: All
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Jan Friesse
QA Contact: cluster-qe@redhat.com
Depends On:
Blocks: 1476214
Reported: 2017-07-10 10:50 EDT by Jan Friesse
Modified: 2018-01-16 05:32 EST
CC List: 3 users

See Also:
Fixed In Version: corosync-2.4.0-10.el7
Doc Type: No Doc Update
Type: Bug

Attachments
Proposed patch (6.29 KB, patch)
2017-07-10 10:50 EDT, Jan Friesse

Description Jan Friesse 2017-07-10 10:50:08 EDT
Created attachment 1295850
Proposed patch

Description of problem:
When sched_setscheduler fails to set the RR scheduler for some reason, corosync continues with unchanged priority (that is, like a standard process). This is not optimal because corosync has near-realtime requirements.

We cannot solve the sched_setscheduler failure, but we can at least change the nice value so corosync gets some advantage over other processes.
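
For illustration, the intended fallback can be sketched roughly as follows. This is a minimal Python sketch, not the attached C patch: the names raise_priority, _set_rr and _set_nice and the returned strings are hypothetical, and the two setters are passed in as parameters only so the logic can be exercised without root privileges. It assumes a Linux host, where Python's os module exposes sched_setscheduler and setpriority.

```python
import os

NICE_MIN = -20  # lowest nice value (highest priority) on Linux


def _set_rr():
    # Ask for SCHED_RR at the maximum priority for the calling process (pid 0).
    prio = os.sched_get_priority_max(os.SCHED_RR)
    os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(prio))


def _set_nice():
    # Fallback: stay in the default policy but take the lowest nice value.
    os.setpriority(os.PRIO_PROCESS, 0, NICE_MIN)


def raise_priority(set_rr=_set_rr, set_nice=_set_nice):
    """Return "rr", "nice", or "unchanged" depending on what succeeded."""
    try:
        set_rr()
        return "rr"
    except OSError as err:
        # Hypothetical message; the real daemon logs through its own facility.
        print(f"Could not set SCHED_RR: {err}")
    try:
        set_nice()
        return "nice"
    except OSError as err:
        print(f"Could not set nice value: {err}")
    return "unchanged"
```

Calling raise_priority() with no arguments uses the real system calls. Run as an unprivileged user, both attempts will typically be refused and the function returns "unchanged"; run as root inside a cgroup with cpu.rt_runtime_us=0, the expectation is that the SCHED_RR attempt fails and the nice fallback succeeds.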

Version-Release number of selected component (if applicable):
Every

How reproducible:
100%

Steps to Reproduce:
Force sched_setscheduler to fail

Actual results:
Nice value is unchanged

Expected results:
Nice value is set to the lowest possible value for SCHED_OTHER

Additional info:
"Unit test"
https://github.com/corosync/corosync/pull/228#issuecomment-313723620
Comment 2 Jan Friesse 2017-07-28 11:46:30 EDT
"Unit test" may be invalid when https://bugzilla.redhat.com/show_bug.cgi?id=1476214 is also merged. The solution is to use -R so corosync doesn't try to move itself into the root cgroup.
Comment 5 michal novacek 2018-01-16 05:32:07 EST
I have verified with corosync-2.4.3-1.el7.x86_64 that the NICE value is set to -20 if RTPRIO cannot be set.

----

Common part
===========

> create new rt group
[root@virt-426 ~]# cgcreate -g cpu:test
[root@virt-426 ~]# cgset -r cpu.rt_runtime_us=100000 test
[root@virt-426 ~]# cgget -r cpu.rt_runtime_us test
test:
cpu.rt_runtime_us: 100000


> check that corosync starts in the correct group
[root@virt-426 ~]# cgexec -g cpu:test  corosync
notice  [MAIN  ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
info    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
notice  [MAIN  ] Corosync sucesfully moved to root cgroup

[root@virt-426 ~]# ps -T  -O cls,rtprio,pri,ni $(pidof corosync)
  PID CLS RTPRIO PRI  NI S TTY          TIME COMMAND
20334  RR     99 139   - S ?        00:00:05 corosync
20334  RR     99 139   - S ?        00:00:00 corosync

> kill corosync process
[root@virt-426 x86_64]# killall corosync
[root@virt-426 x86_64]# killall corosync
corosync: no process found

Before the patch (corosync-2.4.0-9.el7.x86_64)
==============================================

[root@virt-426 x86_64]# cgset  -r cpu.rt_runtime_us=0 test
[root@virt-426 x86_64]# cgget  -r cpu.rt_runtime_us test
test:
cpu.rt_runtime_us: 0

> run corosync in the changed group
[root@virt-426 x86_64]# cgexec -g cpu:test  corosync
notice  [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
info    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp pie relro bindnow

> RTPRIO not set and NICE not set
[root@virt-426 x86_64]# ps -T -O cls,rtprio,pri,ni $(pidof corosync)
  PID CLS RTPRIO PRI  NI S TTY          TIME COMMAND
 2463  TS      -  19   0 S ?        00:00:00 corosync
 2463  TS      -  19   0 S ?        00:00:00 corosync


After the patch (corosync-2.4.3-1.el7.x86_64)
=============================================

> use "-R" so corosync does not try to move to root group
[root@virt-426 ~]# cgexec -g cpu:test corosync -R
notice  [MAIN  ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
info    [MAIN  ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
warning [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)

> RTPRIO not set but NICE value of corosync changed to -20
[root@virt-426 ~]# ps -T -O cls,rtprio,pri,ni $(pidof corosync)
  PID CLS RTPRIO PRI  NI S TTY          TIME COMMAND
 2013  TS      -  39 -20 S ?        00:00:00 corosync -R
 2013  TS      -  39 -20 S ?        00:00:00 corosync -R
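
The manual check above could also be scripted. The following is a small Python sketch that parses ps output in the format shown in this comment and reports whether the fallback took effect. The helper names parse_ps_threads and fallback_applied are hypothetical, and the check assumes ps prints SCHED_OTHER as TS and the nice value in the NI column, as in the listings here.

```python
def parse_ps_threads(text):
    """Parse `ps -T -O cls,rtprio,pri,ni` output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = ln.strip().split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def fallback_applied(rows):
    """True when every listed thread is TS (SCHED_OTHER) with nice -20."""
    return bool(rows) and all(
        r.get("CLS") == "TS" and r.get("NI") == "-20" for r in rows
    )
```

In practice the text would come from something like subprocess.run(["ps", "-T", "-O", "cls,rtprio,pri,ni", pid], capture_output=True, text=True).stdout, with pid obtained from pidof corosync.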
