Bug 1201952 - ctdb can't start
Summary: ctdb can't start
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: ctdb
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-03-14 00:46 UTC by Paul Rawson
Modified: 2019-11-26 15:10 UTC
CC: 13 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-07-19 13:02:21 UTC
Type: Bug
Embargoed:



Description Paul Rawson 2015-03-14 00:46:41 UTC
Description of problem:
# ctdbd -d10
# cat /var/log/log.ctdb
2015/03/13 17:40:15.393068 [ 2573]: CTDB starting on node
2015/03/13 17:40:15.393144 [ 2573]: Recovery lock file set to "". Disabling recovery lock checking
2015/03/13 17:40:15.400617 [ 2574]: Starting CTDBD (Version 4.2.0rc3) as PID: 2574
2015/03/13 17:40:15.400655 [ 2574]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
2015/03/13 17:40:15.400667 [ 2574]: CTDB daemon shutting down

# chrt -f 55 ctdbd
chrt: failed to set pid 0's policy: Operation not permitted

added "CPUSchedulingPolicy=fifo" to systemd unit and get this:
systemd[2584]: Failed at step SETSCHEDULER spawning /usr/sbin/ctdbd_wrapper: Operation not permitted
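For reference, a minimal sketch of what that unit change could look like as a systemd drop-in (the drop-in path is illustrative, and the priority value 55 mirrors the chrt command above rather than the actual unit edit):

# cat /etc/systemd/system/ctdb.service.d/sched.conf
[Service]
CPUSchedulingPolicy=fifo
CPUSchedulingPriority=55
# systemctl daemon-reload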

Comment 1 Michael Adam 2015-03-15 22:28:30 UTC
This should work if you disable selinux or set it to permissive.

We'd need to define a specific selinux rule for ctdb, but please verify first by testing with disabled or permissive.

Michael

Comment 2 Paul Rawson 2015-03-16 16:33:24 UTC
(In reply to Michael Adam from comment #1)
> This should work if you disable selinux or set it to permissive.
> 
> We'd need to define a specific selinux rule for ctdb, but please verify
> first by testing with disabled or permissive.
> 
> Michael

SELinux is already set to Permissive :(

Comment 3 Michael Adam 2015-03-17 15:57:35 UTC
(In reply to Paul Rawson from comment #2)
> (In reply to Michael Adam from comment #1)
> > This should work if you disable selinux or set it to permissive.
> > 
> > We'd need to define a specific selinux rule for ctdb, but please verify
> > first by testing with disabled or permissive.
> > 
> > Michael
> 
> SELinux is already set to Permissive :(

Sorry to be insisting:

Did you verify by running "getenforce" that it is really
permissive in the current runtime and not only set in the
config file?

Other than that I need to recreate a setup.

What puzzles me is that you seem to be running ctdbd
manually. Could you try running it via the systemctl cmd?
'systemctl start ctdbd'

Comment 4 Paul Rawson 2015-03-17 16:35:38 UTC
(In reply to Michael Adam from comment #3)
> (In reply to Paul Rawson from comment #2)
> > (In reply to Michael Adam from comment #1)
> > > This should work if you disable selinux or set it to permissive.
> > > 
> > > We'd need to define a specific selinux rule for ctdb, but please verify
> > > first by testing with disabled or permissive.
> > > 
> > > Michael
> > 
> > SELinux is already set to Permissive :(
> 
> Sorry to be insisting:
> 
> Did you verify by running "getenforce" that it is really
> permissive in the current runtime and not only set in the
> config file?
> 
> Other than that I need to recreate a setup.
> 
> What puzzles me is that you seem to be running ctdbd
> manually. Could you try running it via the systemctl cmd?
> 'systemctl start ctdbd'

Yes, I've verified using getenforce. Just for grins, I've also tried it in Disabled mode. Obviously, that didn't make a difference. I've only been running it by hand for debugging. As you can see from "systemd[2584]: Failed at step SETSCHEDULER spawning /usr/sbin/ctdbd_wrapper: Operation not permitted", I've also tried starting it with systemd.

I have an identical setup in F21 that works without issue.

Comment 6 Guenther Deschner 2015-06-04 13:21:51 UTC
With Fedora 22 (final) and selinux completely disabled, I was able to start ctdb. It did not start in either permissive or enforcing mode.

Comment 7 Michael Adam 2015-07-20 12:06:52 UTC
It worked for me in all selinux modes lately.
Can this be confirmed?

Comment 8 Michael Adam 2015-08-13 12:35:53 UTC
Closing now; please reopen if problems persist.

Comment 9 Ramesh N 2015-10-21 09:57:24 UTC
I am able to reproduce this issue. We are trying to set up CTDB in a 3-node Gluster cluster. The first time, CTDB starts and works fine on all 3 nodes. But later, when we stop it on one node and make a network configuration change (creating a bridge on the NIC used for CTDB), it no longer starts. We always see the following error in the ctdb log.

2015/10/15 17:03:12.719602 [42728]: CTDB starting on node
2015/10/15 17:03:12.734214 [42729]: Starting CTDBD (Version 2.5.5) as PID: 42729
2015/10/15 17:03:12.734501 [42729]: Created PID file /run/ctdb/ctdbd.pid
2015/10/15 17:03:12.734566 [42729]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
2015/10/15 17:03:12.734584 [42729]: CTDB daemon shutting down
2015/10/15 17:03:13.734747 [42729]: Removed PID file /run/ctdb/ctdbd.pid

Selinux is always in enforcing mode.

CTDB config is as follows:

[root@rhsdev9 ~]# cat /etc/ctdb/nodes 
10.70.45.17
10.70.40.13
10.70.40.14

[root@rhsdev9 ~]# cat /etc/ctdb/public_addresses 
10.70.40.185/22 rhevm
[root@rhsdev9 ~]#

[root@rhsdev9 ~]# cat /etc/sysconfig/ctdb
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=no
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
CTDB_RECOVERY_LOCK=/mnt/lock/reclock
[root@rhsdev9 ~]#

[root@rhsdev9 ~]# df -ahT
localhost:/ctdb                                fuse.glusterfs 1014M   33M  982M   4% /mnt/lock

Comment 10 Ben Alexander 2016-02-21 08:37:11 UTC
This has been stagnant for a while; thought I'd see if I could bump it along a bit, as I believe the problem still exists.

I have a very vanilla CentOS 7 minimal install on 3 hosts, with ctdb + glusterfs + ovirt configured. For testing, iptables has been disabled and selinux set to permissive. This is on bare-metal hardware (i.e. not in a VM or containers).

The initial install of CTDB worked well; however, now ctdbd only starts successfully a small percentage of the time. The error I see when trying to start the service is:
     2016/02/21 19:10:29.686824 [10643]: Starting CTDBD (Version 4.2.3) as PID: 10643
     2016/02/21 19:10:29.686965 [10643]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)

A few twists in my CTDB deployment:
- Was initially deployed on eth0 interface, then changed to ovirtmgmt interface after ovirt was installed.
- Node IPs and public IP are in the same subnet.
- Conf files are shared on glusterfs.

When it fails, I need to do the following to get the service started:

    systemctl start ctdb.service
    /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
    /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
    /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
    /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
    /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
    ctdb status

Basically, I start the service, and while it is in 'the failure loop', I manually execute the wrapper quickly until it starts successfully. Sometimes it will start on the first or second attempt. I have never had to try more than 4 or 5 times before it worked.
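A hypothetical one-liner equivalent of that manual retry, assuming ctdbd_wrapper returns a non-zero exit status when ctdbd fails to start (the wrapper path and PID file are the ones used in the commands above):

    # retry the wrapper until ctdbd stays up, then check cluster status
    until /bin/sh /usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start; do sleep 1; done
    ctdb status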

$ cat /opt/ctdb/ctdb
CTDB_PUBLIC_ADDRESSES=/opt/ctdb/public_addresses
CTDB_NODES=/etc/ctdb/nodes

# Only when using Samba. Unnecessary for NFS.
CTDB_MANAGES_SAMBA=no

# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
CTDB_RECOVERY_LOCK=/opt/ctdb/reclock

$ cat /opt/ctdb/public_addresses
10.0.20.20/27 ovirtmgmt

$ cat /etc/ctdb/nodes
10.0.20.21
10.0.20.22
10.0.20.23

$ df -ahT | grep meta
localhost:meta              fuse.glusterfs  101G   33G   68G  33% /opt/ctdb

$ getenforce
Permissive

$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

This issue has caused me to scrap and rebuild my virtualisation environment several times. I have used slightly different configurations each time, including downgrading to CentOS 6. I am happy to provide any other details needed to properly resolve this. This was the guide I loosely followed: http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/

Comment 11 kevin reichhart 2016-03-19 01:32:09 UTC
I have a very similar environment to Ben Alexander, and used the same documentation: CentOS 7.2 with oVirt 3.6, both fully patched.

Node1 and node2 are both HP DL360g5 with 2x2 cores, and node3 is an HP DL360g5 with 2x4 cores.

I've tried every suggestion in this thread, and Ben's workaround is the only thing that's worked. It worked for me on node2 and node3 the first time, but didn't work on node1 in eight attempts (and then I gave up). The only other thing that's worked for me is rebooting problematic nodes, but that may have worked for the same reason that Ben's workaround works.

This is very reproducible for me, and it is a non-production environment, so I am able to experiment as requested.

Comment 12 Fedora End Of Life 2016-07-19 13:02:21 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 13 Igor 2016-11-30 10:10:45 UTC
I got this problem on CentOS 7.2 with KVM installed. It looks like the system's default configuration doesn't give any resources to realtime processes, even for the root user. I solved this by issuing the command
echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us
As far as I understand, this allows realtime system processes to use up to 10000 µs of CPU time per scheduling period; the default was 0.
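Spelled out as a check-and-fix sequence (the path and the value 10000 are taken from this comment; the systemctl service name is assumed to be ctdb):

# Check the realtime CPU budget for system.slice; 0 means services in that
# slice cannot run under SCHED_FIFO at all:
cat /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us

# Grant 10000 microseconds of realtime runtime per scheduling period:
echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us

# Then try starting CTDB again:
systemctl start ctdb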

Comment 14 Alex Kaouris 2017-02-22 14:44:23 UTC
I also face the same issue using CentOS 7 with KVM.
CentOS Linux release 7.3.1611 (Core)
uname -a: Linux node1 3.10.0-514.6.1.el7.x86_64
ctdb-4.4.4-12.el7_3.x86_64
getenforce: Disabled

CTDB gives the following error when trying to start it:
Unable to set scheduler to SCHED_FIFO (Operation not permitted)

Issuing "echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us"
fixes the issue.

Comment 15 Igor 2017-09-19 09:07:11 UTC
Ctdb starts normally if launched before any virtual machine. After a virtual machine has started, you need to issue the command "echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us" before starting ctdb.
As a solution, you can modify /usr/lib/systemd/system/ctdb.service and add the following (it must be a single line):

ExecStartPre=/bin/bash -c "sleep 2; if [ -f /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us ]; then echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us; fi"

After modifying ctdb.service, run "systemctl daemon-reload" to apply the changes.
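The same ExecStartPre line can also go into a drop-in instead of editing the packaged unit, so it is not overwritten by package updates (the drop-in path below is illustrative; for ExecStartPre a drop-in adds to the existing commands rather than replacing them):

# cat /etc/systemd/system/ctdb.service.d/rt-runtime.conf
[Service]
ExecStartPre=/bin/bash -c "sleep 2; if [ -f /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us ]; then echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us; fi"

# systemctl daemon-reload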

Comment 16 Sergio Basto 2018-11-15 18:34:41 UTC
(In reply to Alex Kaouris from comment #14)
> I also face the same issue using CentOS 7 with KVM.
> CentOS Linux release 7.3.1611 (Core)
> uname -a: Linux node1 3.10.0-514.6.1.el7.x86_64
> ctdb-4.4.4-12.el7_3.x86_64
> getenforce: Disabled
> 
> CTDB gives the following error when trying to start it:
> Unable to set scheduler to SCHED_FIFO (Operation not permitted)
> 
> Issuing "echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us"
> fixes the issue.

+1 Thanks, this also worked for me.

Comment 17 Jonathan Liedy 2019-11-26 15:10:49 UTC
Since this option wasn't mentioned anywhere else in this thread: there is a setting in either of the CTDB config files that lets you skip the realtime scheduler setup entirely:

in /etc/sysconfig/ctdb add
CTDB_NOSETSCHED=yes

or in /etc/ctdb/ctdb.conf add the following section and setting
[legacy]
   realtime scheduling = false
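As a rough guide to which of the two files applies on a given system (an assumption on my part: the /etc/ctdb/ctdb.conf format was introduced with the Samba 4.9 rework of CTDB configuration, while the older releases seen in this bug only read /etc/sysconfig/ctdb):

rpm -q ctdb
ls /etc/ctdb/ctdb.conf /etc/sysconfig/ctdb 2>/dev/null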

