Bug 1278524
Summary: | 1.3.1: ceph-deploy mon destroy prints "UnboundLocalError: local variable 'status_args' referenced before assignment" | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Harish NV Rao <hnallurv> |
Component: | Build | Assignee: | Alfredo Deza <adeza> |
Status: | CLOSED ERRATA | QA Contact: | Vasishta <vashastr> |
Severity: | medium | Docs Contact: | Bara Ancincova <bancinco> |
Priority: | unspecified | ||
Version: | 1.3.1 | CC: | adeza, dmick, flucifre, gmeno, hnallurv, kdreyer, tserlin, vashastr |
Target Milestone: | rc | ||
Target Release: | 1.3.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | RHEL: ceph-deploy-1.5.36-1.el7cp Ubuntu: ceph-deploy_1.5.36-2redhat1 | Doc Type: | Bug Fix |
Doc Text: |
."ceph-deploy" now correctly removes directories of manually added monitors
Previously, an attempt to remove a manually added monitor node by using the `ceph-deploy mon destroy` command failed with the following error:
----
UnboundLocalError: local variable 'status_args' referenced before assignment
----
The monitor was removed despite the error; however, `ceph-deploy` failed to remove the monitor configuration directory located under `/var/lib/ceph/mon/`. With this update, `ceph-deploy` removes the monitor directory as expected.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2016-09-29 12:55:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1262054, 1372735 |
Description
Harish NV Rao
2015-11-05 17:04:56 UTC
Would you please help me understand the impact here. Did the monitor get removed? What was the exit code from ceph-deploy? Do the steps for manual removal work? http://docs.ceph.com/docs/v0.80.5/rados/operations/add-or-rm-mons/#removing-monitors

(In reply to Gregory Meno from comment #2)

> Would you please help me understand the impact here.

The recommended command did not complete successfully (error messages were printed). I guess the 'mon destroy' command does not work completely. Users may not be able to use this command; they may have to use non-ceph-deploy steps to remove the mon (which are not complete, please see below).

> Did the monitor get removed?

The monitor was stopped, but I don't think the old monitor's /var/lib/ceph/ceph-mon_id directory was removed.

> What was the exit code from ceph-deploy?

I am not sure how to get that.

> Do the steps for manual removal work?
> http://docs.ceph.com/docs/v0.80.5/rados/operations/add-or-rm-mons/#removing-monitors

I follow the doc https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc for my mon add/remove testing. It has the following commands to remove the mon:

1. Stop the monitor: `service ceph -a stop mon.{mon-id}`
2. Remove the monitor from the cluster: `ceph mon remove {mon-id}`
3. Remove the monitor entry from ceph.conf.

The monitor stops and is also removed from the quorum list after executing these steps. But when I try to add the mon on the same node again using the manual (non-ceph-deploy) steps mentioned in the above doc, it fails at one of the commands:

    [cephuser@magna040 ~]$ sudo ceph-mon -i magna040 --mkfs --monmap temp/map-filename --keyring temp/key-filename
    '/var/lib/ceph/mon/ceph-magna040' already exists and is not empty: monitor may already exist

After removing '/var/lib/ceph/mon/ceph-magna040', adding the mon was successful. So the doc should contain a step to remove the above-mentioned old directory before adding the mon on the same node. I will file a separate defect for that.
This is the same old 'thread race on exit' bug that affects a number of our tools using ssh connection libraries. I will bet that it's intermittent and that the monitor removal has completed. Can you please definitively confirm that the monitor was removed (above you say "But I don't think the old monitor's /var/lib/ceph/ceph-mon_id directory was removed"; it should be possible to make that a certainty)?

(In reply to Dan Mick from comment #4)

> This is the same old 'thread race on exit' bug that affects a number of our
> tools using ssh connection libraries. I will bet that it's intermittent and
> that the monitor removal has completed. Can you please definitively confirm
> that the monitor was removed

Re-tested thrice. The /var/lib/ceph/ceph-mon_id directory is not removed.

Mon node:

    [cephuser@magna040 ~]$ # before issuing destroy command
    [cephuser@magna040 ~]$ sudo service ceph status
    === mon.magna040 ===
    mon.magna040: running {"version":"0.94.3"}
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:51:57 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ # After
    [cephuser@magna040 ~]$ sudo service ceph status
    === mon.magna040 ===
    mon.magna040: not running.
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:52:27 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:52:29 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:52:30 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:52:32 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:53:02 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 04:54:02 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$ date ; ll /var/lib/ceph/mon/
    Fri Nov 6 05:07:36 EST 2015
    total 4
    drwxr-xr-x. 3 root root 4096 Nov 6 04:49 ceph-magna040
    [cephuser@magna040 ~]$

Admin node:

    [cephuser@magna006 ceph-config]$ ceph-deploy mon destroy magna040
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephuser/ceph-config/cephdeploy.conf
    [ceph_deploy.cli][INFO ] Invoked (1.5.25): /usr/bin/ceph-deploy mon destroy magna040
    [ceph_deploy.mon][DEBUG ] Removing mon from magna040
    [magna040][DEBUG ] connection detected need for sudo
    [magna040][DEBUG ] connected to host: magna040
    [magna040][DEBUG ] detect platform information from remote host
    [magna040][DEBUG ] detect machine type
    [magna040][DEBUG ] get remote short hostname
    [magna040][INFO ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna040/keyring mon remove magna040
    [magna040][WARNIN] removed mon.magna040 at 10.8.128.40:6789/0, there are now 2 monitors
    [magna040][INFO ] polling the daemon to verify it stopped
    [ceph_deploy][ERROR ] Traceback (most recent call last):
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
    [ceph_deploy][ERROR ]     return f(*a, **kw)
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 162, in _main
    [ceph_deploy][ERROR ]     return args.func(args)
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 447, in mon
    [ceph_deploy][ERROR ]     mon_destroy(args)
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 386, in mon_destroy
    [ceph_deploy][ERROR ]     hostname,
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 347, in destroy_mon
    [ceph_deploy][ERROR ]     if is_running(conn, status_args):
    [ceph_deploy][ERROR ] UnboundLocalError: local variable 'status_args' referenced before assignment
    [ceph_deploy][ERROR ]
    [cephuser@magna006 ceph-config]$

This will only happen on a systemd box. The logic for destroying a monitor doesn't cover that use case, so the variable is left unassigned, causing the command to fail.

Looks like mon destroy fails only when we try to destroy a mon that was manually added with the workaround provided by Kefu in comment 69 in bug 1231203. If I use mon destroy on a mon that was added using the 'ceph-deploy mon add' command, then it works fine, as below.
    [cephuser@magna006 ceph-config]$ date; ceph-deploy mon destroy magna040
    Fri Nov 6 07:00:18 EST 2015
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephuser/ceph-config/cephdeploy.conf
    [ceph_deploy.cli][INFO ] Invoked (1.5.25): /usr/bin/ceph-deploy mon destroy magna040
    [ceph_deploy.mon][DEBUG ] Removing mon from magna040
    [magna040][DEBUG ] connection detected need for sudo
    [magna040][DEBUG ] connected to host: magna040
    [magna040][DEBUG ] detect platform information from remote host
    [magna040][DEBUG ] detect machine type
    [magna040][DEBUG ] get remote short hostname
    [magna040][INFO ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna040/keyring mon remove magna040
    [magna040][WARNIN] removed mon.magna040 at 10.8.128.40:6789/0, there are now 2 monitors
    [magna040][INFO ] polling the daemon to verify it stopped
    [magna040][INFO ] Running command: sudo service ceph status mon.magna040
    [magna040][INFO ] polling the daemon to verify it stopped
    [magna040][INFO ] Running command: sudo service ceph status mon.magna040
    [magna040][INFO ] Running command: sudo mkdir -p /var/lib/ceph/mon-removed
    [magna040][DEBUG ] move old monitor data
    [cephuser@magna006 ceph-config]$

On mon node:

    [cephuser@magna040 mon]$ ll
    total 0
    [cephuser@magna040 mon]$ pwd
    /var/lib/ceph/mon
    [cephuser@magna040 mon]$ sudo service ceph status
    [cephuser@magna040 mon]$
    [cephuser@magna040 ceph]$ pwd
    /var/lib/ceph
    [cephuser@magna040 ceph]$ ll
    total 16
    drwxr-xr-x. 2 root root 4096 Sep 30 17:20 bootstrap-osd
    drwxr-xr-x. 3 root root 4096 Nov 6 07:08 mon
    drwxr-xr-x. 4 root root 4096 Nov 6 06:59 mon-removed
    drwxr-xr-x. 2 root root 4096 Nov 6 07:08 tmp

Can't we remove a mon added via manual method using ceph-deploy destroy command?

(In reply to Harish NV Rao from comment #7)

> Can't we remove a mon added via manual method using ceph-deploy destroy
> command?

Alfredo, mind looking into this latest comment #7 above?
We can certainly remove a monitor manually, without ceph-deploy. However, the problem I am seeing is that ceph-deploy is broken here for systemd *when using `mon destroy`*. If we are OK destroying manually, that is fine, but this needs to be fixed, no?

Given comment #7, it looks like only manual addition/removal causes this problem, while mon add / mon destroy via ceph-deploy works correctly. If this is indeed the case, I am OK pushing this fix to 1.3.2 and adding it to the release notes' known issues. @alfredo, is this correct?

Looks so from Harish's test (#7).

Confirmed with adeza. Pushing to 1.3.2.

Adding this bug to release notes / known issues.

This is now fixed in upstream with: https://github.com/ceph/ceph-deploy/commit/e3e1f99629bff50b69c32ba2d2ac7f8038ab8ad3. It just needs to get backported.

This bug was accidentally moved from POST to MODIFIED via an error in automation; please see mmccune with any questions.

Alfredo, e3e1f99629bff50b69c32ba2d2ac7f8038ab8ad3 upstream works in the systemd case, but in this bug's case, Hammer uses SysV scripts. Is that going to be an issue?

(In reply to Ken Dreyer (Red Hat) from comment #17)

> Alfredo, e3e1f99629bff50b69c32ba2d2ac7f8038ab8ad3 upstream works in the
> systemd case, but in this bug's case, Hammer uses SysV scripts. Is that
> going to be an issue?

The error occurred because systemd was not covered in the logic, so the variable was never assigned. I don't think this situation was possible in Hammer because systemd is not supported there. This shouldn't have been a problem in Hammer, and the fix should not cause issues for Hammer either. Is there any indication that this issue was happening in Hammer?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2016-1972.html
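For context, the failure mode behind this bug can be reproduced with a minimal Python sketch. This is a simplified illustration of the pattern, not ceph-deploy's actual source: a local variable is assigned only in the init-system branches the function knows about, so an unhandled init system (here, systemd before the fix) triggers the UnboundLocalError:

```python
def build_status_args(init_system, hostname):
    """Build the 'is the mon still running?' status command.

    Simplified illustration of the bug's pattern: status_args is bound
    only inside known branches, so an unrecognized init system leaves
    it undefined when the function falls through to the return.
    """
    if init_system == "sysvinit":
        status_args = ["service", "ceph", "status", "mon.%s" % hostname]
    elif init_system == "upstart":
        status_args = ["initctl", "status", "ceph-mon", "id=%s" % hostname]
    # No branch for 'systemd': falling through leaves status_args unbound.
    return status_args  # raises UnboundLocalError for unknown init systems
```

The upstream fix linked above addressed the destroy path so the systemd case is covered; the sketch only shows why the unpatched code raised an exception instead of exiting cleanly.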