Bug 1282484 - [Ceph-deploy]: Ceph-deploy crashed while trying to remove a mon which is added manually
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Installer
Version: 1.3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.2
Assigned To: Alfredo Deza
QA Contact: ceph-qe-bugs
Depends On:
Blocks:
 
Reported: 2015-11-16 09:56 EST by shylesh
Modified: 2016-02-29 09:44 EST
CC List: 7 users

See Also:
Fixed In Version: RHEL: ceph-deploy-1.5.27.4-3.el7cp Ubuntu: ceph-deploy_1.5.27.4-4redhat1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-29 09:44:10 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Attaching the Mon Logs (7.88 KB, text/plain)
2016-02-01 05:22 EST, Hemanth Kumar


External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 14049 None None None Never

Description shylesh 2015-11-16 09:56:50 EST
Description of problem:
I added a monitor manually. While trying to remove it using "ceph-deploy mon destroy", ceph-deploy crashed, but the mon was removed. /var/lib/ceph/mon/ceph-{id} was not cleaned up.

 ceph@magna059:~/ceph-config$ dpkg -l | grep ceph-deploy
ii  ceph-deploy                         1.5.27.3trusty                        all          Ceph-deploy is an easy to use configuration tool


ceph@magna059:~/ceph-config$ dpkg -l | grep ceph
ii  ceph-common                         0.94.3.3-1trusty                      amd64        common utilities to mount and interact with a ceph storage cluster




 
ceph@magna059:~/ceph-config$ ceph-deploy mon destroy magna110
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ceph/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.27.3): /usr/bin/ceph-deploy mon destroy magna110
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : destroy
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f355e8b7cb0>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['magna110']
[ceph_deploy.cli][INFO  ]  func                          : <function mon at 0x7f355e8a12a8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mon][DEBUG ] Removing mon from magna110
ceph@magna110's password:
[magna110][DEBUG ] connection detected need for sudo
ceph@magna110's password:
[magna110][DEBUG ] connected to host: magna110
[magna110][DEBUG ] detect platform information from remote host
[magna110][DEBUG ] detect machine type
[magna110][DEBUG ] get remote short hostname
[magna110][INFO  ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna110/keyring mon remove magna110
[magna110][WARNIN] removed mon.magna110 at 10.8.128.110:6789/0, there are now 2 monitors
[magna110][INFO  ] polling the daemon to verify it stopped
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/cli.py", line 169, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/mon.py", line 444, in mon
[ceph_deploy][ERROR ]     mon_destroy(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/mon.py", line 382, in mon_destroy
[ceph_deploy][ERROR ]     hostname,
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/dist-packages/ceph_deploy/mon.py", line 343, in destroy_mon
[ceph_deploy][ERROR ]     if is_running(conn, status_args):
[ceph_deploy][ERROR ] UnboundLocalError: local variable 'status_args' referenced before assignment
[ceph_deploy][ERROR ]
Comment 2 Harish NV Rao 2015-11-16 17:08:16 EST
Shylesh, did the manual way of removing the mon work?
Comment 3 Federico Lucifredi 2015-11-16 17:31:27 EST
OK, pushing this to 1.3.2.

Please determine if we should document this in the known issues before re-targeting to 1.3.2.
Comment 4 Harish NV Rao 2015-11-16 17:40:39 EST
This needs to be in the known issues in the release notes for 1.3.1.
Comment 5 Harish NV Rao 2015-11-16 17:42:25 EST
Shylesh, is this issue seen only on Ubuntu? Please confirm.
Comment 6 shylesh 2015-11-17 02:01:34 EST
(In reply to Harish NV Rao from comment #2)
> Shylesh, did the manual way of removing the mon work?

Harish,
Yes, manual removal of the mon works fine, as per the documentation.
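
For context, the documented manual removal boils down to roughly the following on a Trusty host (a sketch, not the verbatim procedure; the magna110 hostname comes from the log above and the Upstart stop invocation is an assumption for this setup):

sudo stop ceph-mon id=magna110                 # stop the monitor daemon (Upstart)
ceph mon remove magna110                       # drop the mon from the monmap
sudo rm -rf /var/lib/ceph/mon/ceph-magna110    # clean up /var/lib/ceph/mon/ceph-{id}
# Also remove the host from mon_initial_members / mon_host in ceph.conf if it is listed there.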
Comment 7 shylesh 2015-11-17 05:38:42 EST
@Harish,

This issue is reproducible on RHEL as well as Ubuntu.


Here is the output from RHEL:

[root@cephqe3 ceph-config]# ceph-deploy mon destroy cephqe6
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.27.3): /usr/bin/ceph-deploy mon destroy cephqe6
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : destroy
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x2877638>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['cephqe6']
[ceph_deploy.cli][INFO  ]  func                          : <function mon at 0x2869d70>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mon][DEBUG ] Removing mon from cephqe6
[cephqe6][DEBUG ] connected to host: cephqe6 
[cephqe6][DEBUG ] detect platform information from remote host
[cephqe6][DEBUG ] detect machine type
[cephqe6][DEBUG ] get remote short hostname
[cephqe6][INFO  ] Running command: ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-cephqe6/keyring mon remove cephqe6
[cephqe6][WARNIN] removed mon.cephqe6 at 10.70.44.46:6789/0, there are now 2 monitors
[cephqe6][INFO  ] polling the daemon to verify it stopped
[ceph_deploy][ERROR ] Traceback (most recent call last):
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
[ceph_deploy][ERROR ]     return f(*a, **kw)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 169, in _main
[ceph_deploy][ERROR ]     return args.func(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 444, in mon
[ceph_deploy][ERROR ]     mon_destroy(args)
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 382, in mon_destroy
[ceph_deploy][ERROR ]     hostname,
[ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/mon.py", line 343, in destroy_mon
[ceph_deploy][ERROR ]     if is_running(conn, status_args):
[ceph_deploy][ERROR ] UnboundLocalError: local variable 'status_args' referenced before assignment
[ceph_deploy][ERROR ]
Comment 8 Alfredo Deza 2015-12-10 16:22:49 EST
This is happening because ceph-deploy doesn't have systemd support for destroying monitors.

A PR has been created and is ready for review: https://github.com/ceph/ceph-deploy/pull/375
Comment 9 Federico Lucifredi 2015-12-10 18:50:09 EST
See 1255495, need to use systemd/upstart to launch MONs.

*** This bug has been marked as a duplicate of bug 1255497 ***
Comment 10 Federico Lucifredi 2015-12-10 18:50:28 EST
See 1255495, need to use systemd/upstart to launch MONs.
Comment 11 Alfredo Deza 2015-12-11 07:13:21 EST
I actually don't think this is a duplicate. Bug 1255497 is about starting monitors that were added with the `ceph` tool directly. That is not the case for *destroying* monitors (this ticket).

The codepaths are distinct. And the pull request that is opened addresses *this* issue but not the other one.

I would much rather keep these separate so that the work can be traced to a specific fix: being unable to destroy monitors on a systemd server.

Upstream ticket: http://tracker.ceph.com/issues/14049
Comment 12 Ken Dreyer (Red Hat) 2015-12-14 11:08:19 EST
Let's see if we can get this fix into RHCS 1.3.2.
Comment 13 Ken Dreyer (Red Hat) 2016-01-20 13:37:21 EST
Change to cherry-pick to ceph-1.3-rhel-patches in Gerrit: https://github.com/ceph/ceph-deploy/pull/375
Comment 15 Hemanth Kumar 2016-02-01 05:22 EST
Created attachment 1120059 [details]
Attaching the Mon Logs

On the Ubuntu setup, the mon was removed, but there was an error while running the command.

Ceph cluster status before destroying: http://pastebin.test.redhat.com/345144

When removing with the ceph-deploy method:

ubuntu@magna012:~/install/ubuntu/u130/ceph-config$ ceph-deploy mon destroy magna051
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ubuntu/install/ubuntu/u130/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.27.4): /usr/bin/ceph-deploy mon destroy magna051
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : destroy
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fdc53a7abd8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['magna051']
[ceph_deploy.cli][INFO  ]  func                          : <function mon at 0x7fdc53a552a8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mon][DEBUG ] Removing mon from magna051
[magna051][DEBUG ] connection detected need for sudo
[magna051][DEBUG ] connected to host: magna051 
[magna051][DEBUG ] detect platform information from remote host
[magna051][DEBUG ] detect machine type
[magna051][DEBUG ] get remote short hostname
[magna051][INFO  ] Running command: sudo ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-magna051/keyring mon remove magna051
[magna051][WARNIN] removed mon.magna051 at 10.8.128.51:6789/0, there are now 1 monitors
[ceph_deploy.mon][ERROR ] unsupported init system detected, cannot continue
[ceph_deploy][ERROR ] GenericError: Failed to destroy 1 monitors
-------------------------------------------------------------------------------

Ceph cluster state after removing the mon:

ubuntu@magna051:~/tmp$ sudo ceph quorum_status --format json-pretty

{
    "election_epoch": 3,
    "quorum": [
        0
    ],
    "quorum_names": [
        "magna015"
    ],
    "quorum_leader_name": "magna015",
    "monmap": {
        "epoch": 7,
        "fsid": "6b95f328-0a99-44ea-82cc-ef5f811868d4",
        "modified": "2016-02-01 10:10:53.734977",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "magna015",
                "addr": "10.8.128.15:6789\/0"
            }
        ]
    }
}
--------------------------------------------------------------------------------
ubuntu@magna012:~/install/ubuntu/u130/ceph-config$ ceph -s
    cluster 6b95f328-0a99-44ea-82cc-ef5f811868d4
     health HEALTH_OK
     monmap e7: 1 mons at {magna015=10.8.128.15:6789/0}
            election epoch 3, quorum 0 magna015
     osdmap e18189: 12 osds: 12 up, 12 in
      pgmap v325696: 768 pgs, 11 pools, 1098 MB data, 22383 kobjects
            670 GB used, 10441 GB / 11112 GB avail
                 768 active+clean
  client io 7536 B/s rd, 25267 B/s wr, 39 op/s
--------------------------------------------------------------------------------

Attaching the logs of the mon that was added and removed.
Comment 16 Alfredo Deza 2016-02-01 07:21:51 EST
We are back again in a situation where ceph-deploy depends on the monitor having been added via the Ceph command-line tools (ceph-mon in this case), which are supposed to leave a file indicating what init system was used to start the monitor.

In the case of the server under test, that path should be:

    /var/lib/ceph/mon/ceph-magna051/

But there isn't a 'sysvinit' or an 'upstart' file there, and because the server is not running systemd, the tool reports that it can't determine what to use to stop the monitor.

That file would have been placed there if the monitor had been started using the ceph-mon tool.
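
To see what ceph-deploy finds on the affected host, one can simply list the mon data directory (magna051 taken from the log above); this just restates the detection described here:

ls /var/lib/ceph/mon/ceph-magna051/
# With no 'upstart' or 'sysvinit' marker file present, and no systemd on the host,
# ceph-deploy cannot tell how the monitor is managed and bails out with
# "unsupported init system detected, cannot continue".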

This will require a bit of work in ceph-deploy to get right; possible workarounds are:

* do not add a monitor without ceph-deploy
* 'touch' the needed file to hint at which init system should be used (this sounds like the worst option; see the sketch after this list)
* add documentation on issuing the right command to stop the monitor, because ceph-deploy may not be able to do so
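
For the 'touch' workaround above, the idea would be roughly the following (the marker file name is an assumption, and this is explicitly the least preferred option):

# Hypothetical hint that the mon on this Trusty host is managed by Upstart.
sudo touch /var/lib/ceph/mon/ceph-magna051/upstart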

The bottom line is that ceph-deploy needs some work here to handle this properly.
Comment 17 Ken Dreyer (Red Hat) 2016-02-01 18:10:24 EST
(In reply to Alfredo Deza from comment #16)
> * do not add a monitor without ceph-deploy

This is my preference.

Let's update the docs to reflect the new error string. The previous error string was:

  UnboundLocalError: local variable 'status_args' referenced before assignment

New error string:

  unsupported init system detected, cannot continue

The docs currently read:

> The monitor is removed despite the error, however, ceph-deploy fails to
> remove the monitor’s configuration directory located in the
> /var/lib/ceph/mon/ directory. To work around this issue, remove the
> monitor’s directory manually.

Alfredo, if that's correct, let's just go with that for RHCS 1.3.2.
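
For reference, the manual cleanup the docs point to amounts to something like this on the Trusty host from the log above (a sketch; the Upstart stop command is an assumption about how the leftover daemon is managed):

sudo stop ceph-mon id=magna051                 # stop the leftover mon daemon (Upstart)
sudo rm -rf /var/lib/ceph/mon/ceph-magna051    # remove the mon's configuration directory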
Comment 18 Alfredo Deza 2016-02-02 06:53:18 EST
Stepping through the code, there is one notable observation: the mon process (daemon) is still hanging around. Not sure if this is a problem worth noting.

Adding a fallback, which would mean a small code change *just for this operation*, shouldn't be that hard, and I could have something ready for review today.
Comment 19 Alfredo Deza 2016-02-02 08:04:03 EST
Decided to go ahead and implement this. Pull request is available here:

https://github.com/ceph/ceph-deploy/pull/385

Once that is merged, we can cherry-pick and cut a new ceph-deploy release.
Comment 20 Alfredo Deza 2016-02-03 13:36:25 EST
Merged commit 379dfd2 into ceph-deploy/master.

This involves more than one commit but fixes this problem.
Comment 22 Ken Dreyer (Red Hat) 2016-02-04 15:40:18 EST
Today in #ceph-devel Sage mentioned that this might have caused a regression on Ubuntu Trusty?

Alfredo, can you please verify that this is really working on Trusty (or create an additional PR if appropriate)?
Comment 23 Ken Dreyer (Red Hat) 2016-02-04 15:45:29 EST
You're way ahead of me. Looks like https://github.com/ceph/ceph-deploy/pull/386 is what we need? Could you please cherry-pick that to ceph-1.3-rhel-patches in Gerrit?
Comment 24 Ken Dreyer (Red Hat) 2016-02-05 21:47:33 EST
Today's build has all the necessary patches for this issue.
Comment 26 Hemanth Kumar 2016-02-08 13:10:42 EST
Works perfectly without any errors.

Moving to verified state.
Comment 28 errata-xmlrpc 2016-02-29 09:44:10 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0313
