Bug 1278524

Summary: 1.3.1: ceph-deploy mon destroy prints "UnboundLocalError: local variable 'status_args' referenced before assignment"
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harish NV Rao <hnallurv>
Component: Build
Assignee: Alfredo Deza <adeza>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: medium
Docs Contact: Bara Ancincova <bancinco>
Priority: unspecified
Version: 1.3.1
CC: adeza, dmick, flucifre, gmeno, hnallurv, kdreyer, tserlin, vashastr
Target Milestone: rc
Target Release: 1.3.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-deploy-1.5.36-1.el7cp; Ubuntu: ceph-deploy_1.5.36-2redhat1
Doc Type: Bug Fix
Doc Text:
."ceph-deploy" now correctly removes directories of manually added monitors Previously, an attempt to remove a manually added monitor node by using the `ceph-deploy mon destroy` command failed with the following error: ---- UnboundLocalError: local variable 'status_args' referenced before assignment" ---- The monitor was removed despite the error, however, `ceph-deploy` failed to remove the monitor configuration directory located in the `/var/lib/ceph/mon/` directory. With this update, `ceph-deploy` removes the monitor directory as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-29 12:55:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1262054, 1372735    

Description Harish NV Rao 2015-11-05 17:04:56 UTC
Description of problem:

ceph-deploy mon destroy prints "UnboundLocalError: local variable 'status_args' referenced before assignment".

Version-Release number of selected component (if applicable):
RHEL 7.1
[cephuser@magna006 ~]$ ceph-deploy --version
1.5.25

How reproducible:


Steps to Reproduce:

1. Issue the command: ceph-deploy mon destroy <mon>

Actual results:

[cephuser@magna006 ceph-config]$ ceph-deploy mon destroy magna040
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephuser/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy mon destroy magna040
[ceph_deploy.mon][DEBUG ] Removing mon from magna040
[magna040][DEBUG ] connection detected need for sudo
[magna040][DEBUG ] connected to host: magna040 
[magna040][DEBUG ] detect platform information from remote host
[magna040][DEBUG ] detect machine type
[magna040][DEBUG ] get remote short hostname
[magna040][INFO  ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna040/keyring mon remove magna040
[magna040][WARNIN] removed mon.magna040 at 10.8.128.40:6789/0, there are now 2 monitors
[magna040][INFO  ] polling the daemon to verify it stopped
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 162, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 447, in mon
[ceph_deploy][ERROR ]     mon_destroy(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 386, in mon_destroy
[ceph_deploy][ERROR ]     hostname,
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 347, in destroy_mon
[ceph_deploy][ERROR ]     if is_running(conn, status_args):
[ceph_deploy][ERROR ] UnboundLocalError: local variable 'status_args' referenced before assignment
[ceph_deploy][ERROR ] 


Expected results:
No error message

Additional info:

Comment 2 Christina Meno 2015-11-05 18:45:01 UTC
Would you please help me understand the impact here.
Did the monitor get removed?
What was the exit code from ceph-deploy?

Do the steps for manual removal work?
http://docs.ceph.com/docs/v0.80.5/rados/operations/add-or-rm-mons/#removing-monitors

Comment 3 Harish NV Rao 2015-11-05 19:22:59 UTC
(In reply to Gregory Meno from comment #2)
> Would you please help me understand the impact here.
The recommended command did not complete successfully (error messages were printed). I suspect the 'mon destroy' command does not complete all of its work. Users may not be able to use this command and may have to fall back to the manual (non-ceph-deploy) steps to remove the mon, which are themselves incomplete; please see below.

> Did the monitor get removed?
The monitor was stopped. But I don't think the old monitor's /var/lib/ceph/ceph-mon_id directory was removed.

> What was the exit code from ceph-deploy?
I am not sure how to get that.
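
For reference, the exit code of the last shell command can be read with `echo $?` immediately after running ceph-deploy, or captured programmatically. A minimal sketch, with the hostname taken from this report (illustrative only):

    # Minimal sketch of capturing ceph-deploy's exit status.
    import subprocess

    rc = subprocess.call(['ceph-deploy', 'mon', 'destroy', 'magna040'])
    print('ceph-deploy exited with status %d' % rc)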
 
> Do the steps for manual removal work?
> http://docs.ceph.com/docs/v0.80.5/rados/operations/add-or-rm-mons/#removing-
> monitors

I follow the doc https://gitlab.cee.redhat.com/jowilkin/red-hat-ceph-storage-administration-guide/blob/v1.3/cluster-size.adoc for my mon add/remove testing. It has the following commands to remove the mon:


    Stop the monitor:

     service ceph -a stop mon.{mon-id}

    Remove the monitor from the cluster:

     ceph mon remove {mon-id}

    Remove the monitor entry from ceph.conf.

After executing these steps, the monitor stops and is also removed from the quorum list.

But when I tried to add the mon on the same node again using the manual (non-ceph-deploy) steps mentioned in the above doc, it failed at one of the commands:
[cephuser@magna040 ~]$ sudo ceph-mon -i magna040 --mkfs --monmap temp/map-filename --keyring temp/key-filename
'/var/lib/ceph/mon/ceph-magna040' already exists and is not empty: monitor may already exist

After removing '/var/lib/ceph/mon/ceph-magna040', adding the mon was successful.

So the doc should contain a step to remove the above-mentioned old directory before adding the mon on the same node. I will file a separate defect for that.
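
A minimal sketch of that missing cleanup step, assuming the /var/lib/ceph/mon/<cluster>-<mon-id> layout shown above (this helper is hypothetical, not part of ceph-deploy, and must run as root):

    # Hypothetical helper: remove the stale monitor data directory left
    # behind after "ceph mon remove" so that "ceph-mon --mkfs" can
    # recreate the monitor on the same node. Run as root.
    import os
    import shutil

    def remove_stale_mon_dir(mon_id, cluster='ceph'):
        mon_dir = '/var/lib/ceph/mon/%s-%s' % (cluster, mon_id)
        if os.path.isdir(mon_dir):
            shutil.rmtree(mon_dir)
            print('removed stale monitor directory %s' % mon_dir)

    remove_stale_mon_dir('magna040')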

Comment 4 Dan Mick 2015-11-06 05:35:46 UTC
This is the same old 'thread race on exit' bug that affects a number of our tools using ssh connection libraries.  I will bet that it's intermittent and that the monitor removal has completed.  Can you please definitively confirm that the monitor was removed (above you say "But I don't think the old monitor's /var/lib/ceph/ceph-mon_id directory was removed"; it should be possible to make that a certainty)

Comment 5 Harish NV Rao 2015-11-06 10:51:12 UTC
(In reply to Dan Mick from comment #4)
> This is the same old 'thread race on exit' bug that affects a number of our
> tools using ssh connection libraries.  I will bet that it's intermittent and
> that the monitor removal has completed.  Can you please definitively confirm
> that the monitor was removed (above you say "But I don't think the old
> monitor's /var/lib/ceph/ceph-mon_id directory was removed"; it should be
> possible to make that a certainty)

Re-tested three times; the /var/lib/ceph/ceph-mon_id directory is not removed.

Mon Node:
---------
[cephuser@magna040 ~]$ #before issuing destroy command
[cephuser@magna040 ~]$ sudo service ceph status
=== mon.magna040 === 
mon.magna040: running {"version":"0.94.3"}
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:51:57 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040

[cephuser@magna040 ~]$ # After
[cephuser@magna040 ~]$ sudo service ceph status
=== mon.magna040 === 
mon.magna040: not running.
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:52:27 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:52:29 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:52:30 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:52:32 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:53:02 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 04:54:02 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ date ;  ll /var/lib/ceph/mon/
Fri Nov  6 05:07:36 EST 2015
total 4
drwxr-xr-x. 3 root root 4096 Nov  6 04:49 ceph-magna040
[cephuser@magna040 ~]$ 


Admin node:
-----------
[cephuser@magna006 ceph-config]$ ceph-deploy mon destroy magna040

[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephuser/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy mon destroy magna040
[ceph_deploy.mon][DEBUG ] Removing mon from magna040
[magna040][DEBUG ] connection detected need for sudo
[magna040][DEBUG ] connected to host: magna040 
[magna040][DEBUG ] detect platform information from remote host
[magna040][DEBUG ] detect machine type
[magna040][DEBUG ] get remote short hostname
[magna040][INFO  ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna040/keyring mon remove magna040
[magna040][WARNIN] removed mon.magna040 at 10.8.128.40:6789/0, there are now 2 monitors
[magna040][INFO  ] polling the daemon to verify it stopped
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 162, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 447, in mon
[ceph_deploy][ERROR ]     mon_destroy(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 386, in mon_destroy
[ceph_deploy][ERROR ]     hostname,
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 347, in destroy_mon
[ceph_deploy][ERROR ]     if is_running(conn, status_args):
[ceph_deploy][ERROR ] UnboundLocalError: local variable 'status_args' referenced before assignment
[ceph_deploy][ERROR ] 
[cephuser@magna006 ceph-config]$

Comment 6 Alfredo Deza 2015-11-06 12:36:50 UTC
This will only happen on a systemd box. The logic for destroying a monitor doesn't cover that use case, which is why this variable is never assigned, causing the command to fail.
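
The failure reduces to a common Python pattern: a local variable assigned only in some init-system branches but referenced unconditionally afterwards. A minimal sketch of the pattern, not the actual mon.py code (the command lists are placeholders):

    def build_status_args(init):
        # The variable is only assigned for the init systems the code
        # knows about; on a systemd host neither branch runs, so the
        # name is never bound.
        if init == 'sysvinit':
            status_args = ['service', 'ceph', 'status', 'mon.magna040']
        elif init == 'upstart':
            status_args = ['initctl', 'status', 'ceph-mon']
        # no 'systemd' branch and no fallback assignment
        return status_args

    build_status_args('systemd')  # raises UnboundLocalError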

Comment 7 Harish NV Rao 2015-11-06 12:38:00 UTC
It looks like mon destroy fails only when we try to destroy a mon that was manually added with the workaround provided by Kefu in comment 69 of bug 1231203.

If I use mon destroy on a mon that was added using the 'ceph-deploy mon add' command, then it works fine, as shown below.

[cephuser@magna006 ceph-config]$ date; ceph-deploy mon destroy magna040
Fri Nov  6 07:00:18 EST 2015
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephuser/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.25): /usr/bin/ceph-deploy mon destroy magna040
[ceph_deploy.mon][DEBUG ] Removing mon from magna040
[magna040][DEBUG ] connection detected need for sudo
[magna040][DEBUG ] connected to host: magna040 
[magna040][DEBUG ] detect platform information from remote host
[magna040][DEBUG ] detect machine type
[magna040][DEBUG ] get remote short hostname
[magna040][INFO  ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna040/keyring mon remove magna040
[magna040][WARNIN] removed mon.magna040 at 10.8.128.40:6789/0, there are now 2 monitors
[magna040][INFO  ] polling the daemon to verify it stopped
[magna040][INFO  ] Running command: sudo service ceph status mon.magna040
[magna040][INFO  ] polling the daemon to verify it stopped
[magna040][INFO  ] Running command: sudo service ceph status mon.magna040
[magna040][INFO  ] Running command: sudo mkdir -p /var/lib/ceph/mon-removed
[magna040][DEBUG ] move old monitor data
[cephuser@magna006 ceph-config]$ 

On mon node:
-------------
[cephuser@magna040 mon]$ ll
total 0
[cephuser@magna040 mon]$ pwd
/var/lib/ceph/mon
[cephuser@magna040 mon]$ sudo service ceph status
[cephuser@magna040 mon]$ 

[cephuser@magna040 ceph]$ pwd
/var/lib/ceph
[cephuser@magna040 ceph]$ ll
total 16
drwxr-xr-x. 2 root root 4096 Sep 30 17:20 bootstrap-osd
drwxr-xr-x. 3 root root 4096 Nov  6 07:08 mon
drwxr-xr-x. 4 root root 4096 Nov  6 06:59 mon-removed
drwxr-xr-x. 2 root root 4096 Nov  6 07:08 tmp

Can't we remove a mon added via manual method using ceph-deploy destroy command?

Comment 8 Ken Dreyer (Red Hat) 2015-11-07 00:31:08 UTC
(In reply to Harish NV Rao from comment #7)
> Can't we remove a mon added via manual method using ceph-deploy destroy
> command?

Alfredo, mind looking into this latest comment #7 above?

Comment 9 Alfredo Deza 2015-11-09 12:14:09 UTC
We can certainly remove a monitor manually, without ceph-deploy. However, the problem I am seeing is that ceph-deploy is broken here for systemd *when using `mon destroy`*.

If we are OK with destroying manually, that is fine, but this needs to be fixed, no?

Comment 10 Federico Lucifredi 2015-11-09 23:27:47 UTC
Given comment #7, it looks like only manual addition/removal causes this problem, while mon add / mon destroy via ceph-deploy works correctly.

If this is indeed the case, I am OK with pushing this fix to 1.3.2 and adding it to the release notes as a known issue.

@alfredo, is this correct? It looks so from Harish's test (#7).

Comment 11 Federico Lucifredi 2015-11-09 23:32:38 UTC
Confirmed with adeza. Pushing to 1.3.2.

Adding this bug to release notes / known issues.

Comment 12 Alfredo Deza 2015-12-17 18:18:43 UTC
This is now fixed in upstream with:

https://github.com/ceph/ceph-deploy/commit/e3e1f99629bff50b69c32ba2d2ac7f8038ab8ad3

It just needs to be backported.
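
Conceptually, the change adds the missing systemd branch so a status command is always assigned. A hedged sketch of that shape (the exact commands are illustrative; see the commit above for the real change):

    def build_status_args(init, hostname):
        # Illustrative shape only: every supported init system now yields
        # a status command, so the variable can no longer be read unbound.
        if init == 'sysvinit':
            return ['service', 'ceph', 'status', 'mon.%s' % hostname]
        if init == 'upstart':
            return ['initctl', 'status', 'ceph-mon', 'id=%s' % hostname]
        if init == 'systemd':
            return ['systemctl', 'status', 'ceph-mon@%s' % hostname]
        raise RuntimeError('unsupported init system: %s' % init)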

Comment 15 Mike McCune 2016-03-28 22:11:59 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation; please contact mmccune with any questions.

Comment 17 Ken Dreyer (Red Hat) 2016-08-19 21:54:51 UTC
Alfredo, e3e1f99629bff50b69c32ba2d2ac7f8038ab8ad3 upstream works in the systemd case, but in this bug's case, Hammer uses SysV scripts. Is that going to be an issue?

Comment 18 Alfredo Deza 2016-08-22 12:54:57 UTC
(In reply to Ken Dreyer (Red Hat) from comment #17)
> Alfredo, e3e1f99629bff50b69c32ba2d2ac7f8038ab8ad3 upstream works in the
> systemd case, but in this bug's case, Hammer uses SysV scripts. Is that
> going to be an issue?

The error was because systemd was not covered in the logic, so the variable was never assigned. I don't think this situation was possible in Hammer because systemd is not supported there.

This shouldn't have been a problem in Hammer and the fix should not cause issues for Hammer either.

Is there any indication that this issue was happening in Hammer?

Comment 25 errata-xmlrpc 2016-09-29 12:55:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-1972.html