Bug 1476214
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | running docker containers prevents processes to use real-time scheduling when restarted | | |
| Product | Red Hat Enterprise Linux 7 | Reporter | Chris Jones <chjones> |
| Component | corosync | Assignee | Jan Friesse <jfriesse> |
| Status | CLOSED ERRATA | QA Contact | Marian Krcmarik <mkrcmari> |
| Severity | high | Priority | high |
| Version | 7.4 | Target Milestone | rc |
| Keywords | Triaged, ZStream | Flags | igkioka: needinfo- |
| Hardware | Unspecified | OS | Unspecified |
| Fixed In Version | corosync-2.4.0-10.el7 | Doc Type | Bug Fix |
| Clone Of | 1467919 | Clones | 1477461 |
| Bug Depends On | 1467919, 1469170 | Bug Blocks | 1415556, 1477461 |
| Last Closed | 2018-04-10 16:52:19 UTC | Type | Bug |
| CC | amurdaca, atomic-bugs, bhu, ccaulfie, cfeist, chjones, cluster-maint, dciabrin, djansa, dwalsh, fdeutsch, fdinitto, hhuang, igkioka, imcleod, jeckersb, jfriesse, jhonce, jpokorny, knoel, lars, lsm5, mpatel, pkomarov, rscarazz, salmy, sasha, ushkalim | | |
| Doc Text | Previously, when the corosync service was started or restarted after systemd had enabled CPU Accounting, corosync was not able to run with Real Time (RT) scheduling priority, which could reduce the stability of the High Availability (HA) cluster. This update moves corosync to the root CPU cgroup by default, and now corosync can run with Real Time priority, as expected. | | |
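The Doc Text above says the fix moves corosync into the root CPU cgroup. On RHEL 7 (cgroup v1), `/proc/<pid>/cgroup` lists one line per hierarchy in the form `hierarchy-ID:controller-list:cgroup-path`, so whether a process sits in the root CPU cgroup can be read off the line whose controller list contains `cpu`. On kernels built with RT group scheduling, a child cgroup whose `cpu.rt_runtime_us` is 0 has no RT budget and the kernel refuses `SCHED_RR`/`SCHED_FIFO` with `EPERM`, which matches the symptom in this bug. The sketch below is a hypothetical helper, not part of corosync; the sample cgroup paths and the default mount point `/sys/fs/cgroup/cpu` are assumptions.

```python
from typing import Optional

# Hypothetical helpers (not part of corosync) for inspecting cgroup v1
# membership as exposed in /proc/<pid>/cgroup.


def cpu_cgroup_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup path of the v1 'cpu' controller, or None if absent."""
    for line in proc_cgroup_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Format: "hierarchy-ID:controller-list:cgroup-path".
        # Split at most twice because the path itself may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        if "cpu" in controllers.split(","):
            return path
    return None


def in_root_cpu_cgroup(proc_cgroup_text: str) -> bool:
    """True if the process sits in the root cpu cgroup ("/")."""
    return cpu_cgroup_path(proc_cgroup_text) == "/"


def rt_runtime_file(cgroup_path: str, mount: str = "/sys/fs/cgroup/cpu") -> str:
    """Path of cpu.rt_runtime_us for a cgroup; the mount point is an assumption."""
    return mount.rstrip("/") + cgroup_path.rstrip("/") + "/cpu.rt_runtime_us"
```

For a live check, the text would come from reading `/proc/$(pidof corosync)/cgroup`; reading the file returned by `rt_runtime_file` then shows the RT budget of that cgroup.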
Comment 2
Jan Friesse
2017-07-28 15:20:54 UTC
Created attachment 1305986 [details]
Proposed patch

main: Add support for libcgroup

When corosync is started in an environment where it ends up in a cgroup without a properly set rt_runtime_us, it is impossible to get RT priority. The already implemented workaround is to use a higher non-RT priority. This patch implements another solution: it moves corosync into the root cpu cgroup, which hopefully has enough RT budget.

Another solution was mentioned on the mailing list (https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html), but it requires generating some "random" values.

What I've tested ("unit test"):

- Install httpd
- Copy httpd.service into /etc and add a "CPUAccounting=True" line to the [Service] section
- systemctl daemon-reload
- service httpd restart
- service corosync restart
- Before the patch, corosync should have standard priority (no RT)
- Install the updated corosync
- service corosync restart
- corosync should have RT priority

The scratch build was tested in two different scenarios (by poki and rasca) and worked in both :)

@Jan I tested the new package and it does what we need.

With the corosync shipped in osp12:

    [root@overcloud-controller-0 ~]# rpm -qa corosync
    corosync-2.4.0-9.el7.x86_64
    [root@overcloud-controller-0 ~]# ps -eo pid,class,rtprio,command --sort=+class | grep [c]orosync
    20229 RR 99 corosync
    [root@overcloud-controller-0 ~]# systemctl restart corosync
    [root@overcloud-controller-0 ~]# ps -eo pid,class,rtprio,command --sort=+class | grep [c]orosync
    191635 TS - corosync

So the restart changed the scheduling class (from RR to TS).

With the new corosync package:

    [root@overcloud-controller-1 ~]# rpm -Uvh /home/heat-admin/corosync*
    Preparing... ################################# [100%]
    Updating / installing...
    1:corosynclib-2.4.0-9.el7.jf1 ################################# [ 25%]
    2:corosync-2.4.0-9.el7.jf1 ################################# [ 50%]
    Cleaning up / removing...
    3:corosynclib-2.4.0-9.el7 ################################# [ 75%]
    4:corosync-2.4.0-9.el7 ################################# [100%]
    [root@overcloud-controller-1 ~]# ps -eo pid,class,rtprio,command --sort=+class | grep [c]orosync
    19985 RR 99 corosync
    [root@overcloud-controller-1 ~]# systemctl restart corosync
    [root@overcloud-controller-1 ~]# ps -eo pid,class,rtprio,command --sort=+class | grep [c]orosync
    11204 RR 99 corosync

So the scheduling class was kept across the restart.

Created attachment 1307550 [details]
Proposed patch v2 - upstream
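The before/after checks in the comment above rely on the CLS and RTPRIO columns of `ps -eo pid,class,rtprio,command`: `RR` (SCHED_RR) or `FF` (SCHED_FIFO) means a real-time class, while `TS` with an RTPRIO of `-` means ordinary time-sharing. A minimal sketch for scripting that check, assuming the header line has already been filtered out (as `grep [c]orosync` does); the names `PsEntry`, `parse_ps_line`, and `is_realtime` are hypothetical.

```python
from typing import NamedTuple, Optional

# ps "CLS" codes for the POSIX real-time policies SCHED_FIFO and SCHED_RR.
RT_CLASSES = frozenset({"FF", "RR"})


class PsEntry(NamedTuple):
    pid: int
    cls: str
    rtprio: Optional[int]  # None when ps prints "-" (no RT priority)
    command: str


def parse_ps_line(line: str) -> PsEntry:
    """Parse one line of `ps -eo pid,class,rtprio,command` output."""
    # maxsplit=3 keeps any spaces inside the command field intact.
    pid, cls, rtprio, command = line.split(None, 3)
    return PsEntry(int(pid), cls, None if rtprio == "-" else int(rtprio), command)


def is_realtime(entry: PsEntry) -> bool:
    """True if the process runs under SCHED_FIFO or SCHED_RR."""
    return entry.cls in RT_CLASSES
```

Applied to the osp12 transcript, the line `20229 RR 99 corosync` counts as real-time and `191635 TS - corosync` does not, which is exactly the regression the restart exposed.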
Created attachment 1307551 [details]
Proposed patch v2 - upstream
Can I suggest that the doc text be: Previously, if corosync was started (or restarted) after systemd had enabled CPU Accounting, corosync would not be able to run with Real Time scheduling priority, which could reduce the stability of the High Availability (HA) cluster. This update moves corosync to the root CPU cgroup by default, allowing it to obtain Real Time priority.

Yep, Chris's description sounds much better.

Verified:

    # rpm -qa | grep corosync
    corosync-2.4.0-10.el7.x86_64
    corosynclib-2.4.0-10.el7.x86_64
    # systemctl is-active docker
    active
    # docker pull centos
    Using default tag: latest
    Trying to pull repository registry.access.redhat.com/centos ...
    Trying to pull repository docker.io/library/centos ...
    latest: Pulling from docker.io/library/centos
    af4b0a2388c6: Pull complete
    Digest: sha256:2671f7a3eea36ce43609e9fe7435ade83094291055f1c96d9d1d1d7c0b986a5d
    # docker run -it centos /bin/true
    # pcs cluster stop
    Stopping Cluster (pacemaker)...
    Stopping Cluster (corosync)...
    # pcs cluster start
    Starting Cluster...
    # chrt -p $(pidof corosync)
    pid 461983's current scheduling policy: SCHED_RR
    pid 461983's current scheduling priority: 99

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0920
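The verification transcript above confirms the fix with `chrt -p $(pidof corosync)`, which prints the policy and priority on two lines. For an automated post-restart check, a small parser over that output could look like the sketch below. The line format is taken from the transcript; stripping a `|`-separated flag suffix (for example `SCHED_RESET_ON_FORK`) is an assumption about other chrt versions, and `parse_chrt`/`has_rr_priority` are hypothetical names.

```python
import re
from typing import Dict, Union

# Matches the two lines printed by `chrt -p PID`, e.g.
#   pid 461983's current scheduling policy: SCHED_RR
#   pid 461983's current scheduling priority: 99
_CHRT_LINE = re.compile(r"^pid (\d+)'s current scheduling (policy|priority): (\S+)$")


def parse_chrt(output: str) -> Dict[str, Union[int, str]]:
    """Extract pid, policy and priority from `chrt -p` output."""
    info: Dict[str, Union[int, str]] = {}
    for line in output.splitlines():
        m = _CHRT_LINE.match(line.strip())
        if not m:
            continue  # ignore lines we don't recognise
        info["pid"] = int(m.group(1))
        if m.group(2) == "policy":
            # Assumption: drop a "|FLAG" suffix if this chrt version prints one.
            info["policy"] = m.group(3).split("|")[0]
        else:
            info["priority"] = int(m.group(3))
    return info


def has_rr_priority(output: str, expected: int = 99) -> bool:
    """True if chrt reports SCHED_RR at the expected priority."""
    info = parse_chrt(output)
    return info.get("policy") == "SCHED_RR" and info.get("priority") == expected
```

The default `expected=99` mirrors the priority seen in both the ps and chrt transcripts in this report; a different cluster setup might need another value.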