Description of problem:
The customer has two Gluster nodes that use CTDB for a Samba share. The service started on one node, but on the other node it fails to start and logs the following error in /var/log/log.ctdb:

2017/02/02 10:03:48.510907 [ 4898]: CTDB starting on node
2017/02/02 10:03:48.514958 [ 4899]: Starting CTDBD (Version 4.4.5) as PID: 4899
2017/02/02 10:03:48.515105 [ 4899]: Created PID file /run/ctdb/ctdbd.pid
2017/02/02 10:03:48.515144 [ 4899]: Unable to set scheduler to SCHED_FIFO (Operation not permitted)
2017/02/02 10:03:48.515149 [ 4899]: CTDB daemon shutting down
2017/02/02 10:03:49.515388 [ 4899]: Removed PID file /run/ctdb/ctdbd.pid

Version-Release number of selected component (if applicable):
CTDBD (Version 4.4.5)
Here are the dots that need to be connected.

What is the problem?
ctdb is not permitted to set the scheduling policy (SCHED_FIFO) for its threads. This should not happen, and it does not happen with the same systemd unit files on non-vdsm setups.

What could be the problem?
Maybe vdsm changes something in "/sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us".

Workaround to be tried:
1. echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us
2. systemctl stop ctdb.service
3. systemctl start ctdb.service
Otakar replied that the following workaround was sufficient:
-------------------------------------------------------------------------------
Issue fixed after execution of:
1. echo 950000 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
2. systemctl stop ctdb.service
3. systemctl start ctdb.service
I had to change the value and the path of the cpu.rt_runtime_us file, as the customer's RHEL is 7.3.
-------------------------------------------------------------------------------
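The workaround steps can be sketched as a small script. This is only a sketch under stated assumptions: set_rt_runtime is a hypothetical helper name, and the path and the value 950000 are the ones reported in this bug for RHEL 7.3. Other releases may expose the file under a different path.

```shell
#!/bin/sh
# Sketch only. set_rt_runtime is a hypothetical helper, not part of ctdb or vdsm.
# It writes a real-time runtime budget (in microseconds) to the given cgroup file
# and prints the value read back from that file.

set_rt_runtime() {
    file=$1
    value=$2
    # Refuse to create the file: it must already exist in the cgroup hierarchy.
    if [ ! -e "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    echo "$value" > "$file" || return 1
    cat "$file"
}

# Usage on an affected host (as root), mirroring the steps in this comment:
#   set_rt_runtime /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us 950000
#   systemctl stop ctdb.service
#   systemctl start ctdb.service
```

A value written this way lives in the cgroup filesystem only, so it presumably has to be reapplied after a reboot until the component that resets it is identified.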
This problem appears on restarting a node after vdsm is installed. Yaniv, does vdsm change any global systemd settings?
It doesn't sound like a systemd issue, but the ctdb service fails to start without setting cpu.rt_runtime_us in the cgroup. I'm not aware of vdsm touching this, but maybe we do as part of the SLA stuff? Check the value before the change; maybe the default in CentOS is wrong and needs an update?
(In reply to Yaniv Bronhaim from comment #7)
> it doesn't sound like systemd issue, but the service ctdb run fails to start
> without setting cpu.rt_runtime_us in cgroup.. im not aware of touching this
> in vdsm scope, but maybe we do as part of sla stuff? check the value before
> the change, maybe the default in centos is wrong and need an update?

The default in RHEL 7 is 950000. After a vdsm installation is complete, we see that it has been changed to 0. I am not sure which package makes this change. Is there a mailing list where we can ask this question? It is for sure related to virt.
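To compare the value before and after installing vdsm and rebooting, something like the following could be run at each stage. It is a minimal sketch: show_rt_runtime is a hypothetical helper name, and the two candidate paths are simply the ones mentioned in this bug.

```shell
#!/bin/sh
# Sketch only. show_rt_runtime is a hypothetical helper.
# It prints "<path>: <value>" for the first candidate file that is readable
# and returns 1 if none of the candidates exist.

show_rt_runtime() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$f")"
            return 0
        fi
    done
    echo "cpu.rt_runtime_us not found in any candidate path" >&2
    return 1
}

# Usage with the two paths seen in this bug:
#   show_rt_runtime \
#       /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us \
#       /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us
```

Running it once on a fresh install, once after the vdsm installation, and once after the reboot should show at which step the value changes.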
Is this something new? Is it always reproducible after installing vdsm?
I tried to reproduce it on CentOS 7.2, and the file did not exist at all after the vdsm installation.
On CentOS 7.3 I removed and reinstalled vdsm after setting the value to 950000, and it did not change. Same when I reinstalled libvirt.

---- snip
[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@localhost ~]# cat /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
cat: /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us: No such file or directory

[root@localhost ~]# rpm -qa | grep vdsm
vdsm-jsonrpc-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-api-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-client-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-python-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-yajsonrpc-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-4.20.0-422.git13530cc.el7.centos.x86_64
vdsm-tests-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-xmlrpc-4.20.0-422.git13530cc.el7.centos.noarch
vdsm-hook-vmfex-dev-4.20.0-422.git13530cc.el7.centos.noarch
----
(In reply to Yaniv Bronhaim from comment #10)
> is this something new? is it producible always after installing vdsm?
> I tried to reproduce it over centos 7.2 and the file was not set at all
> after vdsm installation
>

It is NOT new. However, as you have observed, the file was named differently till RHEL 7.2. I don't remember the exact path, but it was certainly under /sys/fs/cgroup/.

> I tried over centos 7.3 to remove and re-install vdsm after setting it to
> 950000 and it is not changed.. same if I reinstalled libvirt

I think you did not perform a restart. I have not yet figured out how systemd+cgroups works, but after the vdsm+virt packages are installed and the machine is restarted, this option changes. Maybe some other config file is changed.

>
>
> ---- snip
> [root@localhost ~]# cat /etc/redhat-release
> CentOS Linux release 7.2.1511 (Core)
> [root@localhost ~]# cat
> /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
> cat: /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us: No such file
> or directory
>
> [root@localhost ~]# rpm -qa | grep vdsm
> vdsm-jsonrpc-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-api-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-client-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-python-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-yajsonrpc-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-4.20.0-422.git13530cc.el7.centos.x86_64
> vdsm-tests-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-xmlrpc-4.20.0-422.git13530cc.el7.centos.noarch
> vdsm-hook-vmfex-dev-4.20.0-422.git13530cc.el7.centos.noarch
> ----
This link has the best possible info on the cgroup for realtime cpu and systemd interaction. https://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/
I have now tried with a fresh installation of the latest CentOS: ran yum upgrade, deployed using ovirt-engine, and rebooted the host. The file still does not exist at all. I have seen this 950000 value in some setups, but I can't reproduce the description with vdsm and engine, on either 4.1 or master code. So I assume it is not changed by the vdsm rpm installation or the deploy flow.
Updated the doc text slightly for the release notes
Hi Otakar, Is there anything pending from Engineering side? I have provided the solution in comment #16. Can you please confirm whether it worked for the customer or not?