Bug 1302675 - OSD getting marked down while installing ceph-selinux package
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Documentation
1.3.2
x86_64 Linux
unspecified Severity urgent
: rc
: 1.3.2
Assigned To: Bara Ancincova
ceph-qe-bugs
:
Depends On:
Blocks:
 
Reported: 2016-01-28 06:30 EST by Tanay Ganguly
Modified: 2016-09-19 21:50 EDT (History)
11 users

See Also:
Fixed In Version: RHEL: ceph-deploy-1.5.27.4-2.el7cp Ubuntu: ceph-deploy_1.5.27.4-3redhat1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-29 09:45:23 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Increase timeout for pkg install procedure patch (1.30 KB, patch)
2016-02-01 19:40 EST, Boris Ranto
no flags Details | Diff
Disable timeout for pkg install/remove patch (2.21 KB, patch)
2016-02-02 07:26 EST, Boris Ranto
no flags Details | Diff
Mon log (784.00 KB, text/plain)
2016-02-06 11:16 EST, Tanay Ganguly
no flags Details
OSd log (4.29 MB, text/plain)
2016-02-06 11:17 EST, Tanay Ganguly
no flags Details

Description Tanay Ganguly 2016-01-28 06:30:39 EST
Description of problem:
Installing the ceph-selinux package is causing OSDs to go down.

Version-Release number of selected component (if applicable):
1.3.2
rpm -qa |grep ceph
ceph-0.94.5-4.el7cp.x86_64
ceph-selinux-0.94.5-4.el7cp.x86_64
ceph-common-0.94.5-4.el7cp.x86_64
ceph-osd-0.94.5-4.el7cp.x86_64

Selinux status:
Current mode:                   permissive
I have not set SELinux to enforcing in the cluster.

How reproducible:
100%

Steps to Reproduce:
1. Install a Ceph cluster. (My cluster was actually upgraded from 1.3.1 Async to 1.3.2, 26th build.)
2. The cluster is up and healthy after the upgrade, and the new Ceph version is
ceph-0.94.5-4
3. Later I tried to install the ceph-selinux package on my cluster:
ceph-deploy pkg --install ceph-selinux `node`

Actual results:
It times out after 300 seconds, and the OSDs are marked down.

Expected results:
The package should get installed, and the OSDs should not go down.

Additional info:
Installing ceph-selinux on a monitor works fine,
but installing it on an OSD node fails and marks the OSDs down.

2016-01-28 22:05:18.983858 7fdb2f6e6700 -1 osd.11 11429 *** Got signal Terminated ***
2016-01-28 22:05:18.983922 7fdb2f6e6700  0 osd.11 11429 prepare_to_stop telling mon we are shutting down


--------------------------------------------------------------------------------
[cephqe11][INFO  ] installing packages on cephqe11
[cephqe11][INFO  ] Running command: sudo yum -y install ceph-selinux
[cephqe11][DEBUG ] Loaded plugins: product-id, search-disabled-repos, subscription-manager
[cephqe11][DEBUG ] Resolving Dependencies
[cephqe11][DEBUG ] --> Running transaction check
[cephqe11][DEBUG ] ---> Package ceph-selinux.x86_64 1:0.94.5-4.el7cp will be installed
[cephqe11][DEBUG ] --> Finished Dependency Resolution
[cephqe11][DEBUG ]
[cephqe11][DEBUG ] Dependencies Resolved
[cephqe11][DEBUG ]
[cephqe11][DEBUG ] ================================================================================
[cephqe11][DEBUG ]  Package       Arch    Version           Repository                        Size
[cephqe11][DEBUG ] ================================================================================
[cephqe11][DEBUG ] Installing:
[cephqe11][DEBUG ]  ceph-selinux  x86_64  1:0.94.5-4.el7cp  rhel-7-server-ceph-1.3-mon-test   36 k
[cephqe11][DEBUG ]
[cephqe11][DEBUG ] Transaction Summary
[cephqe11][DEBUG ] ================================================================================
[cephqe11][DEBUG ] Install  1 Package
[cephqe11][DEBUG ]
[cephqe11][DEBUG ] Total download size: 36 k
[cephqe11][DEBUG ] Installed size: 97 k
[cephqe11][DEBUG ] Downloading packages:
[cephqe11][DEBUG ] Running transaction check
[cephqe11][DEBUG ] Running transaction test
[cephqe11][DEBUG ] Transaction test succeeded
[cephqe11][DEBUG ] Running transaction
[cephqe11][WARNIN] No data was received after 300 seconds, disconnecting...
Comment 2 Ken Dreyer (Red Hat) 2016-01-29 13:06:03 EST
Boris, could you please look into this bug? I will try reproducing it.
Comment 3 Boris Ranto 2016-02-01 06:58:22 EST
I suppose what happened here is that ceph-deploy has way too low a time-out value for this command to run its post-install script successfully. Afterwards, ceph-deploy kills the remote command somewhere in the middle, leaving the machine in an inconsistent state -- we do stop the mon/osd in the post script for the relabel. Your machine should be OK if you simply bring the OSD back up on the remote machine (you might also need to relabel the ceph-related directories, though).

Generally, it is easy for almost any installation command to take more than 5 minutes. Given that we may have to relabel quite a huge amount of data, the install procedure for the ceph-selinux package can probably take hours to complete.

-> re-targeting to ceph-deploy
Comment 4 Boris Ranto 2016-02-01 07:01:10 EST
i.e., the suggested solution is a much higher time-out value for 'ceph-deploy pkg --install' remote commands
Comment 5 Alfredo Deza 2016-02-01 07:43:39 EST
When ceph-deploy closes the connection it doesn't `kill` the remote command/process: it just disconnects.

If an OSD is getting marked down when installing selinux, it certainly cannot be because of ceph-deploy disconnecting after 300 seconds.

To demonstrate that the command will still hold (specifically running yum):

    $ ipython
    Python 2.7.8 (v2.7.8:ee879c0ffa11, Jun 29 2014, 21:07:35)
    Type "copyright", "credits" or "license" for more information.
    
    IPython 4.0.1 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.
    
    In [1]: from ceph_deploy import hosts
    
    In [2]: distro = hosts.get('node6')
    
    # Close the connection after one second
    In [3]: distro.conn.global_timeout = 1
    
    # Install 'vim' on the remote node (node6 in this case)
    In [4]: distro.packager.install(['vim'])
    
    
    # Now ssh into node6 and see if yum is running, the ipython session above is now
    # closed (this is the same terminal session)

    (ceph-deploy)papaya-2[master] ..ream/ceph-deploy/ceph_deploy $ ssh node6
    Last login: Fri Jan 29 20:11:21 2016 from 192.168.111.1
    Welcome to your Packer-built virtual machine.
    [vagrant@localhost ~]$ ps aux | grep yum
    root      3890  1.3  4.5 421152 23028 ?        Sl   12:34   0:00 /usr/bin/python /bin/yum -y install vim
    vagrant   3918  0.0  0.1 112640   968 pts/0    R+   12:34   0:00 grep --color=auto yum
    [vagrant@localhost ~]$ which vim
    /usr/bin/which: no vim in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/vagrant/.local/bin:/home/vagrant/bin)
    [vagrant@localhost ~]$ date
    Mon Feb  1 12:34:37 UTC 2016
    
    # yum is still installing vim. Let's wait and see if it completes


    [vagrant@localhost ~]$ ps aux | grep yum
    root      3890  0.5  5.1 429520 26028 ?        Sl   12:34   0:00 /usr/bin/python /bin/yum -y install vim
    vagrant   3929  0.0  0.1 112640   968 pts/0    R+   12:35   0:00 grep --color=auto yum
    [vagrant@localhost ~]$ date
    Mon Feb  1 12:35:06 UTC 2016

    # still going after a minute
    
    [vagrant@localhost ~]$ ps aux | grep yum
    vagrant   4012  0.0  0.1 112640   968 pts/0    R+   12:36   0:00 grep --color=auto yum
    [vagrant@localhost ~]$ date
    Mon Feb  1 12:36:36 UTC 2016
    [vagrant@localhost ~]$ which vim
    /usr/bin/vim

    # completed after about 2 minutes


There is no easy way I can think of for ceph-deploy to detect whether a running remote command is in fact stuck/hung or whether it should just wait.

If a higher timeout were set, we would undoubtedly hit a similar problem on a larger system that may have a slow network, for example.

But then again, I highly doubt this is a problem with ceph-deploy. The rule of thumb for ceph-deploy issues is to run, by hand on the remote node, the same command ceph-deploy is running, and see whether the same behavior persists.
Comment 6 Ken Dreyer (Red Hat) 2016-02-01 18:03:32 EST
I'm not 100% sure that yum will continue like that in all cases if you terminate the SSH connection. I know it worked for Vim above, but I've seen it fail midway on other packages. One example is the kernel RPM, which has a long-running %post task (regenerating initrd). I've broken systems before when I was yum-updating a kernel package and lost my SSH connection. It strikes me as similar to the long-running ceph-selinux %post task.

Boris, should we change the documentation here to avoid "ceph-deploy pkg --install ceph-selinux", and have the users just run the "yum install" command by hand on each node?
Comment 7 Boris Ranto 2016-02-01 19:40 EST
Created attachment 1120295 [details]
Increase timeout for pkg install procedure patch

We can fix this in code by increasing the default timeout value for 'ceph-deploy pkg --install' commands. I'll send the patch upstream some time tomorrow (well, later today for me). In the meantime it would be nice if this actually got reviewed, i.e., to be more sure that the timeout keyword argument actually gets propagated to the remoto.run statement, etc.
Comment 8 Boris Ranto 2016-02-01 19:48:17 EST
Hm, sorry, completely missed alfredo's comment:

My other guess would then be that while the yum process did not get killed, it did not complete either, so yum was still running and the OSD was still down. It should be brought back up automatically after a while unless you powered off the machine. This would be quite easy to check with the steps described in Alfredo's comment (simply connect to the machine and check whether the yum process is still running).

Either way, we should be able to fix this for the user by increasing the timeout value. As for the exact safe value, I wonder how long a file system relabel can take on a really huge production system. Hopefully, the 5 hours that I used should be long enough.
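That check can be sketched in Python as well (an illustrative helper, not part of ceph-deploy; it assumes a Linux host with a POSIX `ps` and is meant to be run on the OSD node itself):

```python
# Illustrative sketch: detect whether a `yum ... install` transaction is
# still running on this host after ceph-deploy has disconnected.
import subprocess

def yum_still_running():
    """Return True if any process command line looks like `yum ... install`."""
    out = subprocess.run(["ps", "-eo", "args"],
                         capture_output=True, text=True).stdout
    return any("yum" in line and "install" in line
               for line in out.splitlines())

running = yum_still_running()
# `running` stays True while the post-install relabel is still in flight
```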
Comment 9 Boris Ranto 2016-02-01 20:31:36 EST
From what I can see here

http://pydoc.net/Python/remoto/0.0.16/remoto.log/

we should be able to use timeout=-1 to wait forever, so I'd suggest using that for the 'ceph-deploy pkg --install' command instead of the somewhat arbitrary 5-hour maximal wait.
Comment 10 Alfredo Deza 2016-02-02 06:48:55 EST
There is in fact some possibility that disconnecting early may be causing this.

However, it is critical to determine if that is the case. To verify this as a bug that requires a change in ceph-deploy we need to make sure that:

* installing ceph-selinux on the actual server via yum (sans ceph-deploy) works correctly, not marking OSDs down
Comment 11 Boris Ranto 2016-02-02 07:26 EST
Created attachment 1120401 [details]
Disable timeout for pkg install/remove patch

OK, after some testing, the value of -1 won't help here. The timeout value has to be None for execnet (which remoto uses) to wait forever. However, you can't simply set timeout = None (*) if you have a non-None global_timeout. You can, however, temporarily disable global_timeout in the install function to enable "wait forever" mode.

(*) not to mention that it wouldn't propagate all the way from that line, as I recently found out, but that could be quite easily patched

@Alfredo: We stop the OSD services in the ceph-selinux post script when we do the relabel, so it is no surprise that the OSD is down while yum is still running in the background. The same would go for the MON if you checked the health of the cluster while yum is still running the ceph-selinux post script. However, we also start the processes (restore them to their original state) once the relabelling procedure completes. This was also verified for MONs, where it worked just fine. It is just the timeout that confuses users in this case.
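The "temporarily disable global_timeout" pattern can be illustrated with a minimal, self-contained sketch (this is not the actual ceph-deploy code; FakeConn is a stand-in for the real remoto/execnet connection object):

```python
class FakeConn:
    """Stand-in for a remoto/execnet connection carrying a global timeout."""
    def __init__(self, global_timeout=300):
        self.global_timeout = global_timeout

def install_without_timeout(conn, run_install):
    """Clear the connection's global_timeout so the remote install can run
    for as long as it needs, then restore the previous value."""
    saved = conn.global_timeout
    conn.global_timeout = None  # None means "wait forever" for execnet
    try:
        return run_install()
    finally:
        conn.global_timeout = saved  # restore even if the install fails

conn = FakeConn(global_timeout=300)
result = install_without_timeout(conn, lambda: "ok")
# conn.global_timeout is back to 300 after the call
```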
Comment 12 Boris Ranto 2016-02-02 07:58:09 EST
Upstream PR:

https://github.com/ceph/ceph-deploy/pull/384
Comment 13 Boris Ranto 2016-02-03 16:55:09 EST
dist-git commits related to build ceph-deploy-1.5.27.4-2.el7cp:
http://pkgs.devel.redhat.com/cgit/rpms/ceph-deploy/commit/?h=ceph-1.3-rhel-7&id=3195d69b0996a4f7da0c6006b9880b17e2c07b76
Comment 15 Tanay Ganguly 2016-02-06 11:15:31 EST
Strangely, almost the same thing happened again.
The installation got stuck for a while, and then the OSDs were marked down, one by one.

After some time, the OSDs came back up and the installation process completed.

For a while, though, all the OSD daemons were down, one by one.

ceph-common-0.94.5-8.el7cp.x86_64
ceph-0.94.5-8.el7cp.x86_64
ceph-selinux-0.94.5-8.el7cp.x86_64
ceph-osd-0.94.5-8.el7cp.x86_64

OSD Log Snippet:

Attaching one of the OSD logs (for an OSD that went down while the installation process was in progress)


----------------------------------------------------------------------------

2016-02-07 02:48:35.841584 7f22db9aa7c0  0 ceph version 0.94.5-8.el7cp (deef183a81111fa5e128ec88c90a32c9587c615d), process ceph-osd, pid 16370
2016-02-07 02:48:35.866872 7f22db9aa7c0  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2016-02-07 02:48:35.911979 7f22db9aa7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is supported and appears to work
2016-02-07 02:48:35.911987 7f22db9aa7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-02-07 02:48:36.850460 7f22db9aa7c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-02-07 02:48:36.850592 7f22db9aa7c0  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2016-02-07 02:48:38.132515 7f22db9aa7c0  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-02-07 02:48:38.418168 7f22db9aa7c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 19: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-02-07 02:48:39.562723 7f22db9aa7c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 19: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-02-07 02:48:39.563362 7f22db9aa7c0  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2016-02-07 02:48:39.569030 7f22db9aa7c0  0 osd.0 16466 crush map has features 283675107524608, adjusting msgr requires for clients
2016-02-07 02:48:39.569040 7f22db9aa7c0  0 osd.0 16466 crush map has features 283675107524608 was 8705, adjusting msgr requires for mons
2016-02-07 02:48:39.569046 7f22db9aa7c0  0 osd.0 16466 crush map has features 283675107524608, adjusting msgr requires for osds
2016-02-07 02:48:39.569059 7f22db9aa7c0  0 osd.0 16466 load_pgs
2016-02-07 02:48:41.023780 7f22db9aa7c0  0 osd.0 16466 load_pgs opened 889 pgs
2016-02-07 02:48:41.024637 7f22db9aa7c0 -1 osd.0 16466 log_to_monitors {default=true}
2016-02-07 02:48:41.028277 7f22c8d3f700  0 osd.0 16466 ignoring osdmap until we have initialized
2016-02-07 02:48:41.028373 7f22c8d3f700  0 osd.0 16466 ignoring osdmap until we have initialized
2016-02-07 02:48:43.059787 7f22db9aa7c0  0 osd.0 16466 done with init, starting boot process
2016-02-07 02:48:45.126297 7f22a5622700  0 -- 10.70.44.50:6801/16370 >> 10.70.44.50:6805/16541 pipe(0x11f21000 sd=121 :6801 s=0 pgs=0 cs=0 l=0 c=0x110f2000).accept connect_seq 0 vs existing 0 state connecting
Comment 16 Tanay Ganguly 2016-02-06 11:16 EST
Created attachment 1121688 [details]
Mon log
Comment 17 Tanay Ganguly 2016-02-06 11:17 EST
Created attachment 1121689 [details]
OSd log
Comment 18 Boris Ranto 2016-02-08 06:37:57 EST
That is nothing new. We do stop the Ceph daemons (both mon and osd) while the package is being installed to avoid having mislabelled files on the file system, but we do bring them back to their original state afterwards. It is not a bug; it is really intended behaviour. We have already had an upstream discussion on that, here:

http://tracker.ceph.com/issues/13061

All we can fix here is to make ceph-deploy wait for the installation process to complete, which we did (notice that the ceph-deploy command did not time out this time). Maybe we should take special care to document it.

The reason why we have to wait so long is that we need to re-label a pretty big file system -- this means accessing a lot of files that Ceph has previously created (loads of disk reads) and changing their metadata (loads of disk writes). This does not happen every time the package is installed, only when new Ceph SELinux policy is detected and it is one of the reasons I pushed SELinux in 1.3.2 as an opt-in and not an opt-out.

btw: I wonder how long the installation process took on your test machine(s)? I heard it can take even more than an hour.
Comment 19 Tanay Ganguly 2016-02-08 12:08:47 EST
No, it took 2-3 minutes, and it didn't time out.

Documentation team:
Can this be documented in Installation guide saying that:

If the ceph-selinux package is installed later on an existing Ceph cluster, the installation can take a long time to complete (the time varies based on the size of the cluster/objects/files).

It will also restart the OSD/MON daemons, which means they will be stopped and started again.

Boris,
Does the content look good? Please modify it if something needs to be changed or added.

Thanks,
Tanay
Comment 20 Boris Ranto 2016-02-08 12:34:30 EST
We should probably add more details and make it a bit more general, like:


'All the Ceph daemons will be down while the ceph-selinux package is being installed. Therefore, your cluster will not be able to serve any data during this time.

This operation is necessary in order to update the metadata of the files located on the underlying file system and to make Ceph daemons run with the correct context.

This operation may easily take several minutes depending on the size and speed of the underlying storage.'
Comment 21 Harish NV Rao 2016-02-09 04:13:34 EST
Hi Federico,

As per comment 18, the behavior observed in this bz can be documented.

Comment 20 has the doc text.

Please let us know if the doc text mentioned in comment 20 is fine with you.

We will move this BZ to doc team once you are ok with the doc text.

Thanks,
Harish
Comment 22 Federico Lucifredi 2016-02-11 10:00:22 EST
Very nicely done, Boris. I agree, we need to document this.

Let's add this to the docs *and* to the release notes. The outline is in comment #20.

NEEDINFO Anjana to decide where this goes in the docs.
Comment 23 Harish NV Rao 2016-02-11 10:54:52 EST
Thanks Federico!

Moving this defect to assigned state.
Comment 26 Tanay Ganguly 2016-02-13 23:26:56 EST
Marking this as Verified, as the content is being added to the Installation Guide.
Comment 28 errata-xmlrpc 2016-02-29 09:45:23 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313
