Description of problem:

# ctdbd -d10
# cat /var/log/log.ctdb
2015/03/13 17:40:15.393068 [ 2573]: CTDB starting on node
2015/03/13 17:40:15.393144 [ 2573]: Recovery lock file set to "". Disabling recovery lock checking
2015/03/13 17:40:15.400617 [ 2574]: Starting CTDBD (Version 4.2.0rc3) as PID: 2574
2015/03/13 17:40:15.400655 [ 2574]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
2015/03/13 17:40:15.400667 [ 2574]: CTDB daemon shutting down

# chrt -f 55 ctdbd
chrt: failed to set pid 0's policy: Operation not permitted

Added "CPUSchedulingPolicy=fifo" to the systemd unit and got this:

systemd[2584]: Failed at step SETSCHEDULER spawning /usr/sbin/ctdbd_wrapper: Operation not permitted
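A quick way to check whether the SCHED_FIFO policy itself is being denied, independently of ctdbd (this is a sketch, not from the report; it assumes the util-linux chrt tool is installed):

```shell
# Try to run a trivial command under SCHED_FIFO; failure here reproduces the
# same EPERM that ctdbd hits at startup.
if chrt -f 1 true 2>/dev/null; then
    msg="SCHED_FIFO available"
else
    msg="SCHED_FIFO denied"
fi
echo "$msg"
```

If this prints "SCHED_FIFO denied" even as root, the problem is a system-wide restriction (e.g. an empty realtime CPU budget), not anything specific to ctdb.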
This should work if you disable selinux or set it to permissive.

We'd need to define a specific selinux rule for ctdb, but please verify first by testing with disabled or permissive.

Michael
(In reply to Michael Adam from comment #1)
> This should work if you disable selinux or set it to permissive.
>
> We'd need to define a specific selinux rule for ctdb, but please verify
> first by testing with disabled or permissive.
>
> Michael

SELinux is already set to Permissive :(
(In reply to Paul Rawson from comment #2)
> (In reply to Michael Adam from comment #1)
> > This should work if you disable selinux or set it to permissive.
> >
> > We'd need to define a specific selinux rule for ctdb, but please verify
> > first by testing with disabled or permissive.
> >
> > Michael
>
> SELinux is already set to Permissive :(

Sorry to be insisting:

Did you verify by running "getenforce" that it is really permissive in the current runtime and not only set in the config file?

Other than that I need to recreate a setup.

What puzzles me is that you seem to be running ctdbd manually. Could you try running it via the systemctl cmd? 'systemctl start ctdbd'
(In reply to Michael Adam from comment #3)
> Sorry to be insisting:
>
> Did you verify by running "getenforce" that it is really
> permissive in the current runtime and not only set in the
> config file?
>
> Other than that I need to recreate a setup.
>
> What puzzles me is that you seem to be running ctdbd
> manually. Could you try running it via the systemctl cmd?
> 'systemctl start ctdbd'

Yes, I've verified using getenforce. Just for grins, I've also tried it in Disabled mode. Obviously, that didn't make a difference.

I've only been running it by hand for debugging. As you can see from "systemd[2584]: Failed at step SETSCHEDULER spawning /usr/sbin/ctdbd_wrapper: Operation not permitted", I've also tried starting it with systemd.

I have an identical setup on F21 that works without issue.
With Fedora 22 (final) and selinux completely disabled, I was able to start ctdb. It did not start in either permissive or enforcing mode.
It worked for me in all selinux modes lately. Can this be confirmed?
Closing now; please reopen if problems persist.
I am able to reproduce this issue. We are trying to set up CTDB in a 3-node gluster cluster. The first time, CTDB starts and works fine on all 3 nodes. But later, when we stop it on one node and change some network configuration (creating a bridge on the NIC used for CTDB), it doesn't start. We always see the following error in the ctdb log:

2015/10/15 17:03:12.719602 [42728]: CTDB starting on node
2015/10/15 17:03:12.734214 [42729]: Starting CTDBD (Version 2.5.5) as PID: 42729
2015/10/15 17:03:12.734501 [42729]: Created PID file /run/ctdb/ctdbd.pid
2015/10/15 17:03:12.734566 [42729]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
2015/10/15 17:03:12.734584 [42729]: CTDB daemon shutting down
2015/10/15 17:03:13.734747 [42729]: Removed PID file /run/ctdb/ctdbd.pid

Selinux is always in enforcing mode. CTDB config is as follows:

[root@rhsdev9 ~]# cat /etc/ctdb/nodes
10.70.45.17
10.70.40.13
10.70.40.14

[root@rhsdev9 ~]# cat /etc/ctdb/public_addresses
10.70.40.185/22 rhevm

[root@rhsdev9 ~]# cat /etc/sysconfig/ctdb
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=no
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
CTDB_RECOVERY_LOCK=/mnt/lock/reclock

[root@rhsdev9 ~]# df -ahT
localhost:/ctdb  fuse.glusterfs  1014M  33M  982M  4%  /mnt/lock
This has been stagnant for a while, so I thought I'd see if I could bump it along a bit, as I believe the problem still exists.

I have a very vanilla CentOS 7 minimal install on 3 hosts, with ctdb + glusterfs + ovirt configured. For testing, iptables has been disabled and selinux set to permissive. This is on bare-metal hardware (i.e. not in a VM or containers).

The initial install of CTDB worked well, however now ctdbd only starts successfully a small percentage of the time. The error I see when trying to start the service is:

2016/02/21 19:10:29.686824 [10643]: Starting CTDBD (Version 4.2.3) as PID: 10643
2016/02/21 19:10:29.686965 [10643]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)

A few twists in my CTDB deployment:
- Was initially deployed on the eth0 interface, then changed to the ovirtmgmt interface after ovirt was installed.
- Node IPs and the public IP are in the same subnet.
- Conf files are shared on glusterfs.

When it fails, I need to do the following to get the service started:

systemctl start ctdb.service
/bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
/bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
/bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
/bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
/bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
ctdb status

Basically, I start the service, and while it is in 'the failure loop', I manually execute the wrapper quickly until it starts successfully. Sometimes it will start on the first or second attempt. I have never had to try more than 4 or 5 times before it worked.

$ cat /opt/ctdb/ctdb
CTDB_PUBLIC_ADDRESSES=/opt/ctdb/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=no
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
CTDB_RECOVERY_LOCK=/opt/ctdb/reclock

$ cat /opt/ctdb/public_addresses
10.0.20.20/27 ovirtmgmt

$ cat /etc/ctdb/nodes
10.0.20.21
10.0.20.22
10.0.20.23

$ df -ahT | grep meta
localhost:meta  fuse.glusterfs  101G  33G  68G  33%  /opt/ctdb

$ getenforce
Permissive

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target  prot opt source  destination

Chain FORWARD (policy ACCEPT)
target  prot opt source  destination

Chain OUTPUT (policy ACCEPT)
target  prot opt source  destination

This issue has caused me to scrap and rebuild my virtualisation environment several times. I have used slightly different configurations each time, including downgrading to CentOS 6. I am happy to provide any other details needed to properly resolve this.

This was the guide I loosely followed: http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/
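The manual "keep retrying the wrapper until it sticks" work-around described above can be scripted. This is a sketch only; the wrapper path and PID file are taken from the comment, and the retry count (5) matches the worst case reported:

```shell
# Retry a start command until it succeeds or attempts run out.
# The command to run is passed as arguments so the helper stays generic.
retry_start() {
    attempts=$1; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0          # command succeeded, stop retrying
        fi
        i=$((i + 1))
        sleep 1               # brief pause before the next attempt
    done
    return 1                  # all attempts failed
}

# Real usage (paths as in the comment above):
#   retry_start 5 /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start \
#       && ctdb status
```

Of course this only papers over the race; the cgroup realtime-budget fix discussed later in the thread addresses the actual cause.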
I have a very similar environment to Ben Alexander, and used the same documentation. CentOS 7.2 with oVirt 3.6, both fully patched. Node1 and node2 are both HP DL360g5 2x2 cores, and node3 is a HP DL360g5 2x4 cores. I've tried every suggestion in this thread and Ben's work-around is the only thing that's worked. It worked for me on node2 and node3 the first time, and didn't work on the node1 eight times (and then I gave up). The only other thing that's worked for me is rebooting problematic nodes, but that may have worked for the same reason that Ben's work-around works. This is very reproducible for me, and it is a non-production environment, so I am able to experiment as requested.
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
I got this problem on CentOS 7.2 with KVM installed. It looks like the system default configuration doesn't give any resources to realtime processes, even for the root user.

I solved this by issuing the command:

echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us

As far as I understand, this allows realtime system processes to use up to 10000 µs of CPU time per period; the default was 0.
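Before poking anything, it is easy to check whether this is the situation on a given box. A small sketch (the path assumes the cgroup v1 layout used in this comment):

```shell
# Read the realtime CPU budget of system.slice; a value of 0 means any
# SCHED_FIFO request by a service (such as ctdbd) fails with EPERM.
RT_FILE=/sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us
if [ -r "$RT_FILE" ]; then
    budget=$(cat "$RT_FILE")
else
    budget=unknown   # cgroup v1 cpu controller not mounted on this host
fi
echo "system.slice rt_runtime_us: $budget"
```

A reading of 0 here, combined with the EPERM in the ctdb log, would point straight at this cause.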
I also face the same issue using CentOS 7 with KVM.

CentOS Linux release 7.3.1611 (Core)
uname -a: Linux node1 3.10.0-514.6.1.el7.x86_64
ctdb-4.4.4-12.el7_3.x86_64
getenforce: Disabled

CTDB gives the following error when trying to start it:

Unable to set scheduler to SCHED_FIFO (Operation not permitted)

Issuing

echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us

fixes the issue.
Ctdb starts normally if it is launched before any virtual machine. After a virtual machine has started, you need to issue the command "echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us" before starting ctdb.

As a solution, you can modify /usr/lib/systemd/system/ctdb.service and add the line

ExecStartPre=/bin/bash -c "sleep 2; if [ -f /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us ]; then echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us; fi"

It is a single line. After modifying ctdb.service, you need to issue "systemctl daemon-reload" to apply the change.
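An alternative to editing the packaged unit file (which a package update can overwrite) is to put the same ExecStartPre in a systemd drop-in. This is a sketch; the drop-in file name is my own choice, not from the thread:

```ini
# /etc/systemd/system/ctdb.service.d/rt-runtime.conf (hypothetical file name)
[Service]
ExecStartPre=/bin/bash -c "sleep 2; if [ -f /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us ]; then echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us; fi"
```

Run "systemctl daemon-reload" afterwards, exactly as for a direct edit of the unit file.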
(In reply to Alex Kaouris from comment #14)
> I also face the same issue using CentOS 7 with KVM.
> CentOS Linux release 7.3.1611 (Core)
> uname -a: Linux node1 3.10.0-514.6.1.el7.x86_64
> ctdb-4.4.4-12.el7_3.x86_64
> getenforce: Disabled
>
> CTDB gives the following error when trying to start it:
> Unable to set scheduler to SCHED_FIFO (Operation not permitted)
>
> Issuing echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us
> fixes the issue.

+1 Thanks, this also worked for me.
Since this option wasn't mentioned anywhere else in the thread: there is a flag in either of the CTDB configs that lets you bypass the realtime scheduler entirely.

In /etc/sysconfig/ctdb add:

CTDB_NOSETSCHED=yes

Or in /etc/ctdb/ctdb.conf add the following section and setting:

[legacy]
realtime scheduling = false