Bug 1414066 - [GSS] ceph-ansible rolling_update.yml is failing
Summary: [GSS] ceph-ansible rolling_update.yml is failing
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat
Component: ceph-ansible
Version: 2
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3
Assignee: Sébastien Han
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-17 16:14 UTC by Tomas Petr
Modified: 2020-02-14 18:28 UTC
CC: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-31 20:59:21 UTC
Target Upstream Version:



Description Tomas Petr 2017-01-17 16:14:55 UTC
Description of problem:
Because each host in the Ceph cluster needed its own configuration, the deployment options were stored in /etc/ansible/hosts:

[mons]
mon1 monitor_interface=bond0
mon2 monitor_interface=eno16780032
mon3 monitor_interface=eno16780032

[osds]
osd1 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdm', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl' ]" raw_journal_devices="[ '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg' ]"

osd2 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk' ]" raw_journal_devices="[ '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl' ]"

osd3 devices="[ '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdf', '/dev/sdg', '/dev/sdh', '/dev/sdi', '/dev/sdj', '/dev/sdk' ]" raw_journal_devices="[ '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sda', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl', '/dev/sdl' ]"

osd4 devices="[ '/dev/sda', '/dev/sdb', '/dev/sdc', '/dev/sdd', '/dev/sde', '/dev/sdi', '/dev/sdj', '/dev/sdk', '/dev/sdl', '/dev/sdm', '/dev/sdn', '/dev/sdo', '/dev/sdp', '/dev/sdq', '/dev/sdr', '/dev/sds', '/dev/sdt' ]" raw_journal_devices="[ '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdf', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdg', '/dev/sdh', '/dev/sdh', '/dev/sdh', '/dev/sdh', '/dev/sdh' ]"

[mdss]
mon1
mon2

In /usr/share/ceph-ansible/group_vars/all, both of the following lines stayed commented out and the deployment was successful:
#monitor_interface:
#monitor_address: 

During "ansible-playbook rolling_update.yml" it fails with:
[snip]
TASK: [ceph-common | make sure monitor_interface or monitor_address is 
configured] ***
failed: [mon1] => {"failed": true}
msg: monitor_interface or monitor_address must be configured. Interface 
for the monitor to listen on or IP address of that interface

rolling_update.yml reads this information from /usr/share/ceph-ansible/group_vars/all, but it is not set there, and you cannot define a different interface per host in ".../all".


Version-Release number of selected component (if applicable):
ansible-1.9.4-1.el7aos.noarch
ceph-ansible-1.0.5-46.el7scon.noarch
RHCS 2

How reproducible:
Every time

Steps to Reproduce:
1. Deploy a Ceph cluster with ansible, with the configuration stored in /etc/ansible/hosts
2. Run rolling_update.yml

Actual results:
rolling_update.yml ends with a failure

Expected results:
rolling_update.yml will get the information about the monitor interfaces from /etc/ansible/hosts
OR
/usr/share/ceph-ansible/group_vars/all will have an option for setting the configuration per host or per group of hosts, e.g. one group being
mon1 monitor_interface=bond0
and a second group
mon2,mon3 monitor_interface=eno16780032
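
For illustration, the second option can already be expressed with standard Ansible child groups in the inventory (the group names below are made up):

[mons_bond]
mon1

[mons_eno]
mon2
mon3

[mons:children]
mons_bond
mons_eno

[mons_bond:vars]
monitor_interface=bond0

[mons_eno:vars]
monitor_interface=eno16780032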

Additional info:

Comment 3 Tomas Petr 2017-01-17 16:38:11 UTC
Hi,
I have run some tests and found a possible workaround; data from my test environment:

setting in /usr/share/ceph-ansible/group_vars/all
monitor_address: 192.168.66.157

adding into CONFIG OVERRIDE:
ceph_conf_overrides:
   mon.rhscamon1:
     host: rhscamon1
     mon addr: 192.168.66.157
   mon.rhscamon2:
     host: rhscamon2
     mon addr: 192.168.66.251
   mon.rhscamon3:
     host: rhscamon3
     mon addr: 192.168.66.149

result in /etc/ceph/ceph.conf after rolling_update.yml:
[mon.rhscamon1]
host = rhscamon1
mon addr = 192.168.66.157

[mon.rhscamon2]
host = rhscamon2
mon addr = 192.168.66.251

[mon.rhscamon3]
host = rhscamon3
mon addr = 192.168.66.149

[mon]
----------

setting in /usr/share/ceph-ansible/group_vars/all without the CONFIG OVERRIDE part:
monitor_address: 192.168.66.157
all monitors then have the same IP set in ceph.conf
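
Presumably all three mon sections then carry the same address, something like this (a sketch inferred from the outputs above):

[mon.rhscamon1]
host = rhscamon1
mon addr = 192.168.66.157
[mon.rhscamon2]
host = rhscamon2
mon addr = 192.168.66.157
[mon.rhscamon3]
host = rhscamon3
mon addr = 192.168.66.157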
----------
Comparison with the default look of /etc/ceph/ceph.conf:
Without the CONFIG OVERRIDE part, and setting in /usr/share/ceph-ansible/group_vars/all:
monitor_interface: eth0

after rolling_update.yml - /etc/ceph/ceph.conf
[mon]
[mon.rhscamon1]
host = rhscamon1
# we need to check if monitor_interface is defined in the inventory per host or if it's set in a group_vars file
mon addr = 192.168.66.157
[mon.rhscamon2]
host = rhscamon2
# we need to check if monitor_interface is defined in the inventory per host or if it's set in a group_vars file
mon addr = 192.168.66.251
[mon.rhscamon3]
host = rhscamon3
# we need to check if monitor_interface is defined in the inventory per host or if it's set in a group_vars file
mon addr = 192.168.66.149

Is this a valid workaround/solution for this issue?
Can it be set differently, using the interface rather than the IP address?
Thanks, Tomas

Comment 5 Andrew Schoen 2017-01-17 17:20:40 UTC
Tomas,

Could you share the full log from your ansible run? Please use the -vv flag when you run it again so we can capture additional debug information.

In your case, where you have differing values of 'monitor_address' or 'monitor_interface' per host, the correct method is to set those values in your hosts file. Correct ansible usage also means that any group_vars you define must live in a group_vars directory next to your hosts file. So if you are using /etc/ansible/hosts for your inventory, then you must define your group_vars in /etc/ansible/group_vars. When you use /etc/ansible/hosts as your inventory, group_vars defined in /usr/share/ceph-ansible/group_vars are not used.
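
For example, a minimal sketch of the layout Andrew describes (the group_vars contents below are illustrative only):

/etc/ansible/hosts             <- inventory, with the per-host vars shown above
/etc/ansible/group_vars/all    <- group vars that this inventory will pick up

# example /etc/ansible/group_vars/all (illustrative values)
ceph_stable: true
journal_size: 1024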

I've also noticed that you're using ceph-ansible-1.0.5-46.el7scon.noarch, which is quite old compared to ceph-ansible upstream master. We're in the process of resyncing the downstream ceph-ansible with the stable-2.1 branch of ceph-ansible upstream. This problem might not exist upstream anymore.

Comment 7 seb 2017-01-18 10:45:39 UTC
The first thing I don't understand here: why is Ansible not capable of using the variables in /etc/ansible/hosts? I mean, they are defined, and for some reason they get ignored when running the ceph-common role.

Comment 12 Andrew Schoen 2017-01-18 18:24:44 UTC
(In reply to seb from comment #7)
> First thing I don't understand here, why is Ansible not capable of using
> variables in /etc/ansible/hosts? I mean there are defined and for some
> reasons when running the ceph-common role they get ignored.

Because this version of rolling_update.yml uses tasks that include vars from each role's defaults/main.yml and from files in the group_vars directory. Due to ansible variable precedence, this has the effect of overriding variables defined in your inventory.
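
A simplified sketch of that pattern (the play structure is illustrative, not the actual playbook contents): a vars file included at task level takes precedence over inventory host vars, so something like

- hosts: mons
  pre_tasks:
    # task-level include_vars outranks inventory host vars in precedence
    - include_vars: /usr/share/ceph-ansible/group_vars/all
  roles:
    - ceph-common

would silently override monitor_interface=bond0 set for mon1 in /etc/ansible/hosts.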

Comment 22 seb 2017-01-24 10:54:53 UTC
I'm pretty sure you will have the same problem with a manual update, since you will have to update the version of your packages too. We are not doing anything weird here. We need to find out what triggered the restorecon. Can you find anything useful in your system log to see which part of the package installation triggered the restorecon? Thanks!
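
For instance, something along these lines (a sketch; exact log locations vary by setup) should show what ran around the relabel:

grep -i -e restorecon -e selinux /var/log/messages
grep -i ceph /var/log/yum.log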

Comment 27 seb 2017-01-25 12:30:40 UTC
Thanks Boris for the detailed explanation, what do you suggest? Should we add a task as part of our Ansible playbook to not re-label? Thanks!

Comment 28 Boris Ranto 2017-01-25 12:59:36 UTC
This should only be an issue when we need to do the relabel. You should only need to do it once (to this extent). We do a comparative relabel based on which default labels have changed in the policy. We have not changed the default label of files under /var/lib/ceph since we introduced it back in May of 2015.

We might want to document this (I thought we already did?) to warn people that the update may take several hours to complete if they enable SELinux on a system with TBs of data on it and then do the update (especially if they have not updated the base system in quite some time).

Also, we might want to consider filing a RHEL bug/RFE for selinux-policy so that it does not label files under /var/lib/ceph by default, in order to avoid a double relabel of these files.

Comment 29 Torkil Svensgaard 2017-01-25 13:14:48 UTC
We're almost 100% sure that we never enabled SELinux, but either we (or Taco) did change (most of) the systems from enabled to permissive.

Also, we had two restorecon processes running back to back, taking several days, even with one of them crashing. That's more than several hours.

Comment 30 Boris Ranto 2017-01-25 13:40:00 UTC
@Torkil: SELinux has three modes: disabled, permissive, enforcing. Both permissive and enforcing mean that SELinux is enabled. When you changed your config to permissive, you probably enabled it, and this triggered the long-running relabels.
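
For reference, the mode is set in /etc/selinux/config and can be checked at runtime:

# /etc/selinux/config -- SELINUX= is one of: enforcing, permissive, disabled
SELINUX=permissive
SELINUXTYPE=targeted

$ getenforce
Permissive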

btw: If you are referring to this bit in the logs

-----------------------------------------------------------------------------
2017 Jan 23 08:45:39 sneezy.drcmr kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.
Jan 23 08:45:39 sneezy.drcmr kernel: restorecon      D 0000000000000000     0 1799447 1798662 0x00000080

592886:Jan 23 09:55:45 sneezy.drcmr yum[1740085]: Updated: 1:ceph-selinux-10.2.3-17.el7cp.x86_64
-----------------------------------------------------------------------------

then the restorecon process did not crash. The line

kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.

is just an info message letting you know that the process has been waiting on a system call for more than two minutes; it does not trigger a process kill. Indeed, if you look at the timestamps, it took another hour (and 10 minutes) for the package update that triggered the restorecon run to complete.

Comment 31 Torkil Svensgaard 2017-01-25 14:03:05 UTC
(In reply to Boris Ranto from comment #30)
> @Torkil: SELinux has three modes: disabled, permissive, enforcing. Both
> permissive and enforcing mean that SELinux is enabled. When you changed
> your config to permissive, you probably enabled it, and this triggered
> the long-running relabels.

I presumed RHEL installs with SELinux enabled, and changing it from enabled to permissive shouldn't trigger a relabel. If it installs with SELinux disabled, then yea, but I very much doubt we would have changed it to anything from disabled. 
 
> btw: If you are referring to this bit in the logs
> 
> -----------------------------------------------------------------------------
> 2017 Jan 23 08:45:39 sneezy.drcmr kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.
> Jan 23 08:45:39 sneezy.drcmr kernel: restorecon      D 0000000000000000     0 1799447 1798662 0x00000080
> 
> 592886:Jan 23 09:55:45 sneezy.drcmr yum[1740085]: Updated: 1:ceph-selinux-10.2.3-17.el7cp.x86_64
> -----------------------------------------------------------------------------
> 
> then the restorecon process did not crash. The line
> 
> kernel: INFO: task restorecon:1799447 blocked for more than 120 seconds.
> 
> is just an info message letting you know that the process has been waiting on
> a system call for more than two minutes; it does not trigger a process kill.
> Indeed, if you look at the timestamps, it took another hour (and 10 minutes)
> for the package update that triggered the restorecon run to complete.

Ah, alright. I was looking at the following dmesg snippet, which I interpreted as a stack trace triggered by restorecon, but I'm evidently mistaken

"
[Jan23 10:48] INFO: task restorecon:1799447 blocked for more than 120 seconds.
[  +0.000589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  +0.000472] restorecon      D 0000000000000000     0 1799447 1798662 0x00000080
[  +0.000004]  ffff880afb7777e0 0000000000000086 ffff88104e62d080 ffff880afb777fd8
[  +0.000004]  ffff880afb777fd8 ffff880afb777fd8 ffff88104e62d080 ffff880125fe4030
[  +0.000003]  7fffffffffffffff ffff88104e62d080 0000000000000002 0000000000000000
[  +0.000003] Call Trace:
[  +0.000010]  [<ffffffff8163ba09>] schedule+0x29/0x70
[  +0.000005]  [<ffffffff816396f9>] schedule_timeout+0x209/0x2d0
[  +0.000002]  [<ffffffff8163b3b8>] ? __schedule+0x2d8/0x900
[  +0.000003]  [<ffffffff8163ad32>] __down_common+0xd2/0x14a
[  +0.000067]  [<ffffffffa03ca0cd>] ? _xfs_buf_find+0x16d/0x2c0 [xfs]
[  +0.000002]  [<ffffffff8163adc7>] __down+0x1d/0x1f
[  +0.000005]  [<ffffffff810ab9a1>] down+0x41/0x50
[  +0.000012]  [<ffffffffa03c9ecc>] xfs_buf_lock+0x3c/0xd0 [xfs]
[  +0.000010]  [<ffffffffa03ca0cd>] _xfs_buf_find+0x16d/0x2c0 [xfs]
[  +0.000010]  [<ffffffffa03ca24a>] xfs_buf_get_map+0x2a/0x180 [xfs]
[  +0.000011]  [<ffffffffa03cad2c>] xfs_buf_read_map+0x2c/0x140 [xfs]
[  +0.000017]  [<ffffffffa03f7849>] xfs_trans_read_buf_map+0x199/0x400 [xfs]
[  +0.000011]  [<ffffffffa03beb3e>] xfs_imap_to_bp+0x6e/0xf0 [xfs]
[  +0.000025]  [<ffffffffa03bf03e>] xfs_iread+0x7e/0x420 [xfs]
[  +0.000026]  [<ffffffffa03e81c7>] ? kmem_zone_alloc+0x77/0x100 [xfs]
[  +0.000014]  [<ffffffffa03d30f8>] xfs_iget+0x228/0x6d0 [xfs]
[  +0.000013]  [<ffffffffa03dbc37>] xfs_lookup+0xe7/0x110 [xfs]
[  +0.000011]  [<ffffffffa03d8a0b>] xfs_vn_lookup+0x7b/0xd0 [xfs]
[  +0.000004]  [<ffffffff811e8e3d>] lookup_real+0x1d/0x50
[  +0.000002]  [<ffffffff811e97b2>] __lookup_hash+0x42/0x60
[  +0.000005]  [<ffffffff81633638>] lookup_slow+0x42/0xa7
[  +0.000002]  [<ffffffff811ee683>] path_lookupat+0x773/0x7a0
[  +0.000003]  [<ffffffff811ec4c5>] ? do_last+0x635/0x1270
[  +0.000004]  [<ffffffff811c1505>] ? kmem_cache_alloc+0x35/0x1d0
[  +0.000003]  [<ffffffff811ef37f>] ? getname_flags+0x4f/0x1a0
[  +0.000002]  [<ffffffff811ee6db>] filename_lookup+0x2b/0xc0
[  +0.000003]  [<ffffffff811f04a7>] user_path_at_empty+0x67/0xc0
[  +0.000006]  [<ffffffff81101322>] ? from_kgid_munged+0x12/0x20
[  +0.000004]  [<ffffffff811e425f>] ? cp_new_stat+0x14f/0x180
[  +0.000003]  [<ffffffff811f0511>] user_path_at+0x11/0x20
[  +0.000002]  [<ffffffff811e3d53>] vfs_fstatat+0x63/0xc0
[  +0.000006]  [<ffffffff811e4321>] SYSC_newlstat+0x31/0x60
[  +0.000005]  [<ffffffff8110bc56>] ? __audit_syscall_exit+0x1e6/0x280
[  +0.000002]  [<ffffffff811e45ae>] SyS_newlstat+0xe/0x10
[  +0.000006]  [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b
"

Thanks

Comment 38 Boris Ranto 2017-01-25 17:08:45 UTC
(In reply to Torkil Svensgaard from comment #31)
> I presumed RHEL installs with SELinux enabled, and changing it from enabled
> to permissive shouldn't trigger a relabel. If it installs with SELinux
> disabled, then yea, but I very much doubt we would have changed it to
> anything from disabled. 
>  


If you used ceph before version 1.3.4 (or .3?), then disabling SELinux was part of the setup steps, and it did indeed produce a few denials if you did not disable it.


> Ah, alright. I was looking at the following dmesg snippet, which I
> interpreted as a stack trace triggered by restorecon, but I'm evidently
> mistaken
> 
> "
> [Jan23 10:48] INFO: task restorecon:1799447 blocked for more than 120 seconds.
> [  +0.000589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  +0.000472] restorecon      D 0000000000000000     0 1799447 1798662 0x00000080
> [  +0.000004]  ffff880afb7777e0 0000000000000086 ffff88104e62d080 ffff880afb777fd8
> [  +0.000004]  ffff880afb777fd8 ffff880afb777fd8 ffff88104e62d080 ffff880125fe4030
> [  +0.000003]  7fffffffffffffff ffff88104e62d080 0000000000000002 0000000000000000
> [  +0.000003] Call Trace:
> [  +0.000010]  [<ffffffff8163ba09>] schedule+0x29/0x70
> [  +0.000005]  [<ffffffff816396f9>] schedule_timeout+0x209/0x2d0
> [  +0.000002]  [<ffffffff8163b3b8>] ? __schedule+0x2d8/0x900
> [  +0.000003]  [<ffffffff8163ad32>] __down_common+0xd2/0x14a
> [  +0.000067]  [<ffffffffa03ca0cd>] ? _xfs_buf_find+0x16d/0x2c0 [xfs]
> [  +0.000002]  [<ffffffff8163adc7>] __down+0x1d/0x1f
> [  +0.000005]  [<ffffffff810ab9a1>] down+0x41/0x50
> [  +0.000012]  [<ffffffffa03c9ecc>] xfs_buf_lock+0x3c/0xd0 [xfs]
> [  +0.000010]  [<ffffffffa03ca0cd>] _xfs_buf_find+0x16d/0x2c0 [xfs]
> [  +0.000010]  [<ffffffffa03ca24a>] xfs_buf_get_map+0x2a/0x180 [xfs]
> [  +0.000011]  [<ffffffffa03cad2c>] xfs_buf_read_map+0x2c/0x140 [xfs]
> [  +0.000017]  [<ffffffffa03f7849>] xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [  +0.000011]  [<ffffffffa03beb3e>] xfs_imap_to_bp+0x6e/0xf0 [xfs]
> [  +0.000025]  [<ffffffffa03bf03e>] xfs_iread+0x7e/0x420 [xfs]
> [  +0.000026]  [<ffffffffa03e81c7>] ? kmem_zone_alloc+0x77/0x100 [xfs]
> [  +0.000014]  [<ffffffffa03d30f8>] xfs_iget+0x228/0x6d0 [xfs]
> [  +0.000013]  [<ffffffffa03dbc37>] xfs_lookup+0xe7/0x110 [xfs]
> [  +0.000011]  [<ffffffffa03d8a0b>] xfs_vn_lookup+0x7b/0xd0 [xfs]
> [  +0.000004]  [<ffffffff811e8e3d>] lookup_real+0x1d/0x50
> [  +0.000002]  [<ffffffff811e97b2>] __lookup_hash+0x42/0x60
> [  +0.000005]  [<ffffffff81633638>] lookup_slow+0x42/0xa7
> [  +0.000002]  [<ffffffff811ee683>] path_lookupat+0x773/0x7a0
> [  +0.000003]  [<ffffffff811ec4c5>] ? do_last+0x635/0x1270
> [  +0.000004]  [<ffffffff811c1505>] ? kmem_cache_alloc+0x35/0x1d0
> [  +0.000003]  [<ffffffff811ef37f>] ? getname_flags+0x4f/0x1a0
> [  +0.000002]  [<ffffffff811ee6db>] filename_lookup+0x2b/0xc0
> [  +0.000003]  [<ffffffff811f04a7>] user_path_at_empty+0x67/0xc0
> [  +0.000006]  [<ffffffff81101322>] ? from_kgid_munged+0x12/0x20
> [  +0.000004]  [<ffffffff811e425f>] ? cp_new_stat+0x14f/0x180
> [  +0.000003]  [<ffffffff811f0511>] user_path_at+0x11/0x20
> [  +0.000002]  [<ffffffff811e3d53>] vfs_fstatat+0x63/0xc0
> [  +0.000006]  [<ffffffff811e4321>] SYSC_newlstat+0x31/0x60
> [  +0.000005]  [<ffffffff8110bc56>] ? __audit_syscall_exit+0x1e6/0x280
> [  +0.000002]  [<ffffffff811e45ae>] SyS_newlstat+0xe/0x10
> [  +0.000006]  [<ffffffff81646a09>] system_call_fastpath+0x16/0x1b
> "
> 
> Thanks


Yep, this is a regular hung task timeout message -- the process was stuck on a system call for quite a long time. It is actually quite common to hit this when dealing with TBs of data and millions of files; the kernel and/or user-space code are not always well optimized for that amount of data.

Comment 41 seb 2017-01-26 12:17:21 UTC
This http://tracker.ceph.com/issues/9927#note-2 is exactly what I was after. Boris, can you confirm this is the right way to do this?
In ceph-ansible we can easily add this to make sure we never re-label /var/lib/ceph.
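
For illustration, a task like the following would be one way to do that (a sketch only, using fixfiles' exclusion list; see Boris's caveat in the next comment):

- name: never relabel /var/lib/ceph on fixfiles runs
  lineinfile:
    dest: /etc/selinux/fixfiles_exclude_dirs
    line: /var/lib/ceph
    create: yes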

Tomas, about the device renaming: this is due to the kernel. The kernel itself manages this allocation, and it can change for various reasons, hence the use of the /dev/disk/by-* persistent names. We have a PR upstream that should solve this kind of problem.
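
For example, the inventory could reference persistent paths instead of kernel names (the device IDs below are made up):

osd1 devices="[ '/dev/disk/by-id/wwn-0x5000c5008abc0001', '/dev/disk/by-id/wwn-0x5000c5008abc0002' ]"

The /dev/disk/by-* symlinks stay stable across reboots, whereas /dev/sdX assignment can change.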

Comment 42 Boris Ranto 2017-01-26 13:59:58 UTC
@seb: No, that would mean we never relabel the directory (we use fixfiles in our %post scripts as well). All we want to do is skip the double relabelling -- make the selinux-policy-targeted package not relabel the /var/lib/ceph directory and let it be relabelled by ceph-selinux only.

btw: The tracker issue was created back when we relabelled way too often; we do a much more sophisticated relabelling now.

Comment 46 seb 2017-01-30 09:34:39 UTC
Is there anything we want to add to the ceph-ansible update process to address this? Or do we leave this to the operator?
Thanks!

