Description of problem:

Due to the special configuration of each host in the Ceph cluster, the deployment options were stored in /etc/ansible/hosts:

[mons]
mon1 monitor_interface=bond0
mon2 monitor_interface=eno16780032
mon3 monitor_interface=eno16780032

[osds]
osd1 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdm', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl' ]" raw_journal_devices="[ '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg' ]"
osd2 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk' ]" raw_journal_devices="[ '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl' ]"
osd3 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk' ]" raw_journal_devices="[ '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl' ]"
osd4 devices="[ '/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt' ]" raw_journal_devices="[ '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdh', '/dev/sdh', '/dev/sdh', '/dev/sdh', '/dev/sdh' ]"

[mdss]
mon1
mon2

In /usr/share/ceph-ansible/group_vars/all both of these lines stayed commented out, and the deployment was successful:

#monitor_interface:
#monitor_address:

During "ansible-playbook rolling_update.yml" it fails with:

[snip]
TASK: [ceph-common | make sure monitor_interface or monitor_address is configured] ***
failed: [mon1] => {"failed": true}
msg: monitor_interface or monitor_address must be configured.
Interface for the monitor to listen on or IP address of that interface

rolling_update.yml reads this information from /usr/share/ceph-ansible/group_vars/all, but there is none there. You cannot define a different interface per host in ".../all".

Version-Release number of selected component (if applicable):
ansible-1.9.4-1.el7aos.noarch
ceph-ansible-1.0.5-46.el7scon.noarch
RHCS 2

How reproducible:
Every time

Steps to Reproduce:
1. Deploy a Ceph cluster with ceph-ansible, with the configuration stored in /etc/ansible/hosts
2. Run rolling_update.yml

Actual results:
rolling_update.yml ends with a failure.

Expected results:
rolling_update.yml gets the information about monitor interfaces from /etc/ansible/hosts, OR /usr/share/ceph-ansible/group_vars/all has an option for setting the configuration per host or group of hosts, e.g. one group with mon1 and monitor_interface=bond0, and a second group with mon2,mon3 and monitor_interface=eno16780032.

Additional info:
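For what it's worth, standard Ansible INI inventories already support per-group variables, so the grouping described in "Expected results" can be expressed directly in /etc/ansible/hosts. A sketch (the sub-group names are invented for illustration; whether rolling_update.yml honours these values is exactly what this bug is about):

```ini
[mons_bond]
mon1

[mons_eno]
mon2
mon3

; per-group variables, one interface name per group
[mons_bond:vars]
monitor_interface=bond0

[mons_eno:vars]
monitor_interface=eno16780032

; the parent group that ceph-ansible targets
[mons:children]
mons_bond
mons_eno
```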
Hi,

I have run some tests and found a possible workaround; data from my test environment:

Setting in /usr/share/ceph-ansible/group_vars/all:

monitor_address: 192.168.66.157

and adding into CONFIG OVERRIDE:

ceph_conf_overrides:
  mon.rhscamon1:
    host: rhscamon1
    mon addr: 192.168.66.157
  mon.rhscamon2:
    host: rhscamon2
    mon addr: 192.168.66.251
  mon.rhscamon3:
    host: rhscamon3
    mon addr: 192.168.66.149

results in this /etc/ceph/ceph.conf after rolling_update.yml:

[mon.rhscamon1]
host = rhscamon1
mon addr = 192.168.66.157
[mon.rhscamon2]
host = rhscamon2
mon addr = 192.168.66.251
[mon.rhscamon3]
host = rhscamon3
mon addr = 192.168.66.149
[mon]

----------

Setting in /usr/share/ceph-ansible/group_vars/all, without the CONFIG OVERRIDE part:

monitor_address: 192.168.66.157

all monitors get the same IP set in ceph.conf.

----------

Comparison with the default look of /etc/ceph/ceph.conf: without CONFIG OVERRIDE, and with this setting in /usr/share/ceph-ansible/group_vars/all:

monitor_interface: eth0

after rolling_update.yml, /etc/ceph/ceph.conf contains:

[mon]
[mon.rhscamon1]
host = rhscamon1
# we need to check if monitor_interface is defined in the inventory per host or if it's set in a group_vars file
mon addr = 192.168.66.157
[mon.rhscamon2]
host = rhscamon2
# we need to check if monitor_interface is defined in the inventory per host or if it's set in a group_vars file
mon addr = 192.168.66.251
[mon.rhscamon3]
host = rhscamon3
# we need to check if monitor_interface is defined in the inventory per host or if it's set in a group_vars file
mon addr = 192.168.66.149

Is this a valid workaround/solution for this issue? Can it be set differently, using the interface rather than the IP address?

Thanks, Tomas
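Another avenue that may be worth testing (not verified against this ceph-ansible version) is per-host files under host_vars next to the inventory; in stock Ansible these take precedence over group_vars. A sketch, one hypothetical file per monitor, reusing the addresses from the test above:

```yaml
# /etc/ansible/host_vars/rhscamon1  (illustrative path; create one such file per monitor)
monitor_address: 192.168.66.157
```

Whether these values survive the variable-precedence behaviour of rolling_update.yml described in this bug would need to be confirmed.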
Tomas,

Could you share the full log from your ansible run? Please use the -vv flag when you run again so we can capture additional debug information.

In your case, where you have differing values of 'monitor_address' or 'monitor_interface' per host, the correct method is to set those values in your hosts file. Correct Ansible usage also means that any group_vars you define must live in a group_vars directory next to your hosts file. So if you are using /etc/ansible/hosts for your inventory, then you must define your group_vars in /etc/ansible/group_vars. When you use /etc/ansible/hosts as your inventory, group_vars defined in /usr/share/ceph-ansible/group_vars are not used.

I've also noticed that you're using ceph-ansible-1.0.5-46.el7scon.noarch, which is quite old compared to ceph-ansible upstream master. We're in the process of resyncing ceph-ansible downstream with the stable-2.1 branch of ceph-ansible upstream. This problem might not exist upstream anymore.
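To illustrate the layout described above (paths for the /etc/ansible inventory case; the group_vars file names mirror the inventory group names):

```
/etc/ansible/
├── hosts            # inventory: [mons], [osds], [mdss]
└── group_vars/
    ├── all          # variables applied to every host
    ├── mons         # e.g. monitor_interface for the mons group
    └── osds
```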
First thing I don't understand here: why is Ansible not capable of using the variables in /etc/ansible/hosts? I mean, they are defined, and for some reason they get ignored when running the ceph-common role.
(In reply to seb from comment #7)
> First thing I don't understand here, why is Ansible not capable of using
> variables in /etc/ansible/hosts? I mean there are defined and for some
> reasons when running the ceph-common role they get ignored.

Because this version of rolling_update.yml uses tasks that include vars from the defaults/main.yml of each role and from the files in the group_vars directory. Because of Ansible variable precedence, this has the effect of overriding variables defined in your inventory.
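The effect can be modeled as successive dictionary merges where later sources win. This is only a toy sketch of the precedence problem described above, not Ansible's real variable resolver, and the variable values are invented for illustration:

```python
# Toy model: rolling_update.yml loads role defaults and group_vars files
# via include-style tasks, and those included vars land *after* (and thus
# override) the host vars defined in the inventory.

def resolve(*sources):
    """Merge variable sources in load order; later sources win."""
    merged = {}
    for src in sources:
        merged.update(src)
    return merged

inventory_host_vars = {"monitor_interface": "bond0"}  # set in /etc/ansible/hosts
role_defaults = {"monitor_interface": None}           # placeholder from defaults/main.yml
group_vars_all = {}                                   # monitor_interface commented out

effective = resolve(inventory_host_vars, role_defaults, group_vars_all)
print(effective["monitor_interface"])  # None -- the inventory value was clobbered
```

The same merge run with inventory vars loaded last would preserve "bond0", which is why fixing the load order (or keeping group_vars next to the inventory) resolves the failure.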
I'm pretty sure you will have the same problem with a manual update, since you will have to update the versions of your packages too. We are not doing anything weird here.

We need to find out what triggered the restorecon. Can you find anything useful in your system log to see which part of the package installation triggered it?

Thanks!
Thanks Boris for the detailed explanation, what do you suggest? Should we add a task as part of our Ansible playbook to not re-label? Thanks!
This should only be an issue when we need to do the relabel, and you should only need to do it once (to this extent). We do a comparative relabel based on which default labels have changed in the policy, and we have not changed the default label of files under /var/lib/ceph since we introduced it back in May of 2015.

We might want to document this (I thought we already did?) to warn people that the update may take several hours to complete if they enable SELinux on a system with TBs of data on it and then do the update (especially if they have not updated the base system in quite some time).

Also, we might want to consider filing a RHEL bug/RFE against selinux-policy so that it does not label files under /var/lib/ceph by default, in order to avoid a double relabel of these files.
We're almost 100% sure that we never enabled SELinux, but we (or Taco) did change (most of) the systems from enabled to permissive. Also, we had two restorecon processes running back to back, taking several days, even with one of them crashing. That's more than several hours.
@Torkil: SELinux has three modes: disabled, permissive, and enforcing. Both permissive and enforcing mean that SELinux is enabled. When you changed your config to permissive, you probably enabled it, and that triggered the long-running relabels.

btw: If you are referring to this bit in the logs

-----------------------------------------------------------------------------
2017 Jan 23 08:45:39 sneezy.drcmr kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.
Jan 23 08:45:39 sneezy.drcmr kernel: restorecon D 0000000000000000 0 1799447 1798662 0x00000080

592886:Jan 23 09:55:45 sneezy.drcmr yum[1740085]: Updated: 1:ceph-selinux-10.2.3-17.el7cp.x86_64
-----------------------------------------------------------------------------

then the restorecon process did not crash. The line

kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.

is just an info message letting you know that the process has been waiting on a system call for more than two minutes; it does not trigger a process kill. Indeed, if you look at the timestamps, it took another hour (+10 minutes) for the package update that triggered the restorecon run to complete.
(In reply to Boris Ranto from comment #30)
> @Torkil: The SELinux has three modes: disabled, permissive, enforcing. Both,
> permissive and enforcing modes, mean that SELinux is enabled. When you
> changed your config to permissive then you probably enabled it and this
> triggered the long-running relabells.

I presumed RHEL installs with SELinux enabled, and changing it from enabled to permissive shouldn't trigger a relabel. If it installs with SELinux disabled, then yea, but I very much doubt we would have changed it to anything from disabled.

> btw: If you are referring to this bit in the logs
>
> -----------------------------------------------------------------------------
> 2017 Jan 23 08:45:39 sneezy.drcmr kernel: INFO: task restorecon:1799447
> blocked for more than 120 seconds.
> Jan 23 08:45:39 sneezy.drcmr kernel: restorecon D 0000000000000000
> 0 1799447 1798662 0x00000080
>
> 592886:Jan 23 09:55:45 sneezy.drcmr yum[1740085]: Updated:
> 1:ceph-selinux-10.2.3-17.el7cp.x86_64
> -----------------------------------------------------------------------------
>
> then the restorecon process did not crash. The line
>
> kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.
>
> is just an info message for you to know that the process has been waiting on
> a system call for more than two minutes, it does not trigger a process kill.
> Indeed, if you look at the timestamps, it took another hour (+10 minutes)
> for the package update that triggered the restorecon run to complete.

Ah, alright. I was looking at the following dmesg snippet, which I interpreted as a stack trace triggered by restorecon, but I'm evidently mistaken:

"
[Jan23 10:48] INFO: task restorecon:1799447 blocked for more than 120 seconds.
[ +0.000589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000472] restorecon D 0000000000000000 0 1799447 1798662 0x00000080
[ +0.000004] ffff880afb7777e0 0000000000000086 ffff88104e62d080 ffff880afb777fd8
[ +0.000004] ffff880afb777fd8 ffff880afb777fd8 ffff88104e62d080 ffff880125fe4030
[ +0.000003] 7fffffffffffffff ffff88104e62d080 0000000000000002 0000000000000000
[ +0.000003] Call Trace:
[ +0.000010] [<ffffffff8163ba09>] schedule+0x29/0x70
[ +0.000005] [<ffffffff816396f9>] schedule_timeout+0x209/0x2d0
[ +0.000002] [<ffffffff8163b3b8>] ? __schedule+0x2d8/0x900
[ +0.000003] [<ffffffff8163ad32>] __down_common+0xd2/0x14a
[ +0.000067] [<ffffffffa03ca0cd>] ? _xfs_buf_find+0x16d/0x2c0 [xfs]
[ +0.000002] [<ffffffff8163adc7>] __down+0x1d/0x1f
[ +0.000005] [<ffffffff810ab9a1>] down+0x41/0x50
[ +0.000012] [<ffffffffa03c9ecc>] xfs_buf_lock+0x3c/0xd0 [xfs]
[ +0.000010] [<ffffffffa03ca0cd>] _xfs_buf_find+0x16d/0x2c0 [xfs]
[ +0.000010] [<ffffffffa03ca24a>] xfs_buf_get_map+0x2a/0x180 [xfs]
[ +0.000011] [<ffffffffa03cad2c>] xfs_buf_read_map+0x2c/0x140 [xfs]
[ +0.000017] [<ffffffffa03f7849>] xfs_trans_read_buf_map+0x199/0x400 [xfs]
[ +0.000011] [<ffffffffa03beb3e>] xfs_imap_to_bp+0x6e/0xf0 [xfs]
[ +0.000025] [<ffffffffa03bf03e>] xfs_iread+0x7e/0x420 [xfs]
[ +0.000026] [<ffffffffa03e81c7>] ? kmem_zone_alloc+0x77/0x100 [xfs]
[ +0.000014] [<ffffffffa03d30f8>] xfs_iget+0x228/0x6d0 [xfs]
[ +0.000013] [<ffffffffa03dbc37>] xfs_lookup+0xe7/0x110 [xfs]
[ +0.000011] [<ffffffffa03d8a0b>] xfs_vn_lookup+0x7b/0xd0 [xfs]
[ +0.000004] [<ffffffff811e8e3d>] lookup_real+0x1d/0x50
[ +0.000002] [<ffffffff811e97b2>] __lookup_hash+0x42/0x60
[ +0.000005] [<ffffffff81633638>] lookup_slow+0x42/0xa7
[ +0.000002] [<ffffffff811ee683>] path_lookupat+0x773/0x7a0
[ +0.000003] [<ffffffff811ec4c5>] ? do_last+0x635/0x1270
[ +0.000004] [<ffffffff811c1505>] ? kmem_cache_alloc+0x35/0x1d0
[ +0.000003] [<ffffffff811ef37f>] ? getname_flags+0x4f/0x1a0
[ +0.000002] [<ffffffff811ee6db>] filename_lookup+0x2b/0xc0
[ +0.000003] [<ffffffff811f04a7>] user_path_at_empty+0x67/0xc0
[ +0.000006] [<ffffffff81101322>] ? from_kgid_munged+0x12/0x20
[ +0.000004] [<ffffffff811e425f>] ? cp_new_stat+0x14f/0x180
[ +0.000003] [<ffffffff811f0511>] user_path_at+0x11/0x20
[ +0.000002] [<ffffffff811e3d53>] vfs_fstatat+0x63/0xc0
[ +0.000006] [<ffffffff811e4321>] SYSC_newlstat+0x31/0x60
[ +0.000005] [<ffffffff8110bc56>] ? __audit_syscall_exit+0x1e6/0x280
[ +0.000002] [<ffffffff811e45ae>] SyS_newlstat+0xe/0x10
[ +0.000006] [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b
"

Thanks
(In reply to Torkil Svensgaard from comment #31)
> I presumed RHEL installs with SELinux enabled, and changing it from enabled
> to permissive shouldn't trigger a relabel. If it installs with SELinux
> disabled, then yea, but I very much doubt we would have changed it to
> anything from disabled.

If you used ceph before version 1.3.4 (or .3?) then disabling SELinux was part of the setup steps, and it did indeed produce a few denials if you did not disable it.

> Ah, alright. I was looking at the following dmesg snippet, which I
> interpreted as a stack trace triggered by restorecon, but I'm evidently
> mistaken
>
> "
> [Jan23 10:48] INFO: task restorecon:1799447 blocked for more than 120 seconds.
> [ +0.000589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ +0.000472] restorecon D 0000000000000000 0 1799447 1798662 0x00000080
> [ +0.000004] ffff880afb7777e0 0000000000000086 ffff88104e62d080 ffff880afb777fd8
> [ +0.000004] ffff880afb777fd8 ffff880afb777fd8 ffff88104e62d080 ffff880125fe4030
> [ +0.000003] 7fffffffffffffff ffff88104e62d080 0000000000000002 0000000000000000
> [ +0.000003] Call Trace:
> [ +0.000010] [<ffffffff8163ba09>] schedule+0x29/0x70
> [ +0.000005] [<ffffffff816396f9>] schedule_timeout+0x209/0x2d0
> [ +0.000002] [<ffffffff8163b3b8>] ? __schedule+0x2d8/0x900
> [ +0.000003] [<ffffffff8163ad32>] __down_common+0xd2/0x14a
> [ +0.000067] [<ffffffffa03ca0cd>] ? _xfs_buf_find+0x16d/0x2c0 [xfs]
> [ +0.000002] [<ffffffff8163adc7>] __down+0x1d/0x1f
> [ +0.000005] [<ffffffff810ab9a1>] down+0x41/0x50
> [ +0.000012] [<ffffffffa03c9ecc>] xfs_buf_lock+0x3c/0xd0 [xfs]
> [ +0.000010] [<ffffffffa03ca0cd>] _xfs_buf_find+0x16d/0x2c0 [xfs]
> [ +0.000010] [<ffffffffa03ca24a>] xfs_buf_get_map+0x2a/0x180 [xfs]
> [ +0.000011] [<ffffffffa03cad2c>] xfs_buf_read_map+0x2c/0x140 [xfs]
> [ +0.000017] [<ffffffffa03f7849>] xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [ +0.000011] [<ffffffffa03beb3e>] xfs_imap_to_bp+0x6e/0xf0 [xfs]
> [ +0.000025] [<ffffffffa03bf03e>] xfs_iread+0x7e/0x420 [xfs]
> [ +0.000026] [<ffffffffa03e81c7>] ? kmem_zone_alloc+0x77/0x100 [xfs]
> [ +0.000014] [<ffffffffa03d30f8>] xfs_iget+0x228/0x6d0 [xfs]
> [ +0.000013] [<ffffffffa03dbc37>] xfs_lookup+0xe7/0x110 [xfs]
> [ +0.000011] [<ffffffffa03d8a0b>] xfs_vn_lookup+0x7b/0xd0 [xfs]
> [ +0.000004] [<ffffffff811e8e3d>] lookup_real+0x1d/0x50
> [ +0.000002] [<ffffffff811e97b2>] __lookup_hash+0x42/0x60
> [ +0.000005] [<ffffffff81633638>] lookup_slow+0x42/0xa7
> [ +0.000002] [<ffffffff811ee683>] path_lookupat+0x773/0x7a0
> [ +0.000003] [<ffffffff811ec4c5>] ? do_last+0x635/0x1270
> [ +0.000004] [<ffffffff811c1505>] ? kmem_cache_alloc+0x35/0x1d0
> [ +0.000003] [<ffffffff811ef37f>] ? getname_flags+0x4f/0x1a0
> [ +0.000002] [<ffffffff811ee6db>] filename_lookup+0x2b/0xc0
> [ +0.000003] [<ffffffff811f04a7>] user_path_at_empty+0x67/0xc0
> [ +0.000006] [<ffffffff81101322>] ? from_kgid_munged+0x12/0x20
> [ +0.000004] [<ffffffff811e425f>] ? cp_new_stat+0x14f/0x180
> [ +0.000003] [<ffffffff811f0511>] user_path_at+0x11/0x20
> [ +0.000002] [<ffffffff811e3d53>] vfs_fstatat+0x63/0xc0
> [ +0.000006] [<ffffffff811e4321>] SYSC_newlstat+0x31/0x60
> [ +0.000005] [<ffffffff8110bc56>] ? __audit_syscall_exit+0x1e6/0x280
> [ +0.000002] [<ffffffff811e45ae>] SyS_newlstat+0xe/0x10
> [ +0.000006] [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b
> "
>
> Thanks

Yep, this is a regular hung task timeout message -- the process is stuck on a system call for quite a long time. It is actually quite common to hit this when you are dealing with TBs of data and millions of files -- the kernel and/or user space code are not always well optimized for that amount of data.
This http://tracker.ceph.com/issues/9927#note-2 is exactly what I was after. Boris, can you confirm this is the right way to do this? In ceph-ansible we can easily add this to make sure we never re-label /var/lib/ceph.

Tomas, about the device renaming: this is due to the kernel. The kernel itself manages this allocation, and it can change for various reasons, hence the use of /dev/disk/by-*, which are persistent names. We have a PR upstream that should solve this kind of problem.
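As an illustration of the persistent naming mentioned above (the WWN identifiers below are invented), the inventory's devices list could reference /dev/disk/by-id paths instead of /dev/sdX names:

```ini
[osds]
osd1 devices="[ '/dev/disk/by-id/wwn-0x5000c5001111aaaa', '/dev/disk/by-id/wwn-0x5000c5002222bbbb' ]"
```

These symlinks are created by udev and remain stable across reboots, even when the kernel re-enumerates the /dev/sdX names.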
@seb: No, that would mean we never relabel the directory (we use fixfiles in our %post scripts as well). All we want to do is skip the double relabelling -- make the selinux-policy-targeted package not relabel the /var/lib/ceph directory, and let it be relabelled by ceph-selinux only.

btw: The tracker issue was created back when we relabelled way too often; we do much more sophisticated relabelling now.
Is there anything we want to add in the update process of ceph-ansible to address this? Or do we leave this to the operator? Thanks!