Description of problem:
[cephadm] 5.0 - Invalid inputs are accepted and services are deployed when using "ceph orch apply mon/mgr" commands to deploy services.

Version-Release number of selected component (if applicable):
Using recent ceph image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest
ceph version 16.0.0-7953.el8cp (aac7c5c7d5f82d2973c366730f65255afd66e515) pacific (dev)

How reproducible:

Steps to Reproduce:
1. Install a 5.0 cluster
2. Enter the cephadm shell
3. Deploy mon/mgr services, passing invalid inputs
4. Observe the behaviour

Actual results:
Invalid inputs/parameters are accepted without any error/warning messages. Output below for reference:

[ceph: root@magna094 /]# ceph orch apply mgr 12345
Scheduled mgr update...
[ceph: root@magna094 /]# ceph orch apply mon 6789
Scheduled mon update...
[ceph: root@magna094 /]# ceph orch ls
NAME                               RUNNING  REFRESHED  AGE  PLACEMENT                  IMAGE NAME                                                                                                                     IMAGE ID
alertmanager                       1/1      41s ago    10w  count:1                    docker.io/prom/alertmanager:v0.20.0                                                                                            0881eb8f169f
crash                              9/9      44s ago    10w  *                          mix                                                                                                                            dd0a3c51082c
grafana                            1/1      41s ago    10w  count:1                    docker.io/ceph/ceph-grafana:6.6.2                                                                                              a0dce381714a
iscsi.iscsi                        1/1      41s ago    4d   magna094;count:1           registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:4b985089d14513ccab29c42e1531bfcb2e98a614c497726153800d72a2ac11f0   dd0a3c51082c
mds.test                           3/3      44s ago    4d   count:3                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-93648-20201117204824                dd0a3c51082c
mgr                                2/12345  43s ago    12s  count:12345                mix                                                                                                                            dd0a3c51082c
mon                                5/6789   44s ago    4s   count:6789                 mix                                                                                                                            dd0a3c51082c
nfs.foo                            1/1      43s ago    4d   count:1                    registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-93648-20201117204824                dd0a3c51082c
node-exporter                      9/9      44s ago    10w  *                          docker.io/prom/node-exporter:v0.18.1                                                                                           e5a616e4b9cf
osd.None                           7/0      44s ago    -    <unmanaged>                mix                                                                                                                            dd0a3c51082c
osd.all-available-devices          16/20    44s ago    3w   <unmanaged>                registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-93648-20201117204824                dd0a3c51082c
osd.dashboard-admin-1605876982239  4/4      44s ago    4w   *                          registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-93648-20201117204824                dd0a3c51082c
prometheus                         1/1      41s ago    10w  count:1                    docker.io/prom/prometheus:v2.18.1                                                                                              de242295e225
rgw.myorg.us-east-1                2/2      43s ago    7w   magna092;magna093;count:2  registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-93648-20201117204824                dd0a3c51082c
rgw.test_realm.test_zone           0/2      -          -    count:2                    <unknown>                                                                                                                      <unknown>

[ceph: root@magna094 /]# ceph -s
  cluster:
    id:     c97c2c8c-0942-11eb-ae18-002590fbecb6
    health: HEALTH_ERR
            Module 'diskprediction_local' has failed: No module named 'sklearn'
            3 daemons have recently crashed

  services:
    mon: 5 daemons, quorum magna094,magna067,magna073,magna093,magna077 (age 30m)
    mgr: magna067.cudixx(active, since 4w), standbys: magna094.hussmr
    mds: test:1 {0=test.magna076.xymdrn=up:active} 2 up:standby
    osd: 27 osds: 27 up (since 3w), 27 in (since 4w)
    rgw: 2 daemons active (myorg.us-east-1.magna092.bxiihn, myorg.us-east-1.magna093.nhekwk)

  data:
    pools:   21 pools, 617 pgs
    objects: 452 objects, 427 KiB
    usage:   10 GiB used, 25 TiB / 25 TiB avail
    pgs:     617 active+clean

  io:
    client:   937 B/s rd, 0 op/s rd, 0 op/s wr

[ceph: root@magna094 /]#

Expected results:

Additional info:
In the case of the apply command, the first positional arg is the service type and the second is the placement (there are more, but those are the only ones relevant here). So in a command like "ceph orch apply mgr 12345", "mgr" is the service we are applying and "12345" is the placement. Integers are considered valid placements; in this case it's saying to put a mgr on up to 12345 hosts. In a typical use case you would do something like "ceph orch apply mgr 3" to put down 3 mgr daemons when you don't care which hosts they're on. So the command is accepted and runs like normal.

Do you have anything in mind when you say it should provide an error/warning message here? Technically both args provided, "mgr" and "12345", are valid for the service type and placement, so it doesn't make sense to generate an error saying there were invalid arguments. Maybe we should output back to the user how each arg in the apply command is being used? For example, in this case, add something to the output saying that "mgr" is being used as the service type and "12345" as the placement, to try to avoid confusion?
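To make the parsing concrete, here is a minimal sketch of how a bare integer placement argument becomes a host count. This is not the actual cephadm source; "PlacementSketch" is a made-up name used only for illustration:

  # Simplified sketch of the placement parsing described above; not the
  # real ceph code, just an illustration of the behaviour.
  from dataclasses import dataclass, field
  from typing import List, Optional

  @dataclass
  class PlacementSketch:
      count: Optional[int] = None    # "place on up to N hosts"
      hosts: List[str] = field(default_factory=list)

      @classmethod
      def from_string(cls, arg: str) -> "PlacementSketch":
          if arg.isdigit():
              # A bare integer is a valid placement, so "12345" is happily
              # interpreted as count=12345 with no sanity check on size.
              return cls(count=int(arg))
          # Otherwise treat the arg as a semicolon-separated host list,
          # e.g. "magna092;magna093".
          return cls(hosts=arg.split(";"))

  print(PlacementSketch.from_string("12345"))              # count=12345
  print(PlacementSketch.from_string("magna092;magna093"))  # host list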
We have a PR to improve the output of the "apply" command so that it shows the placement: https://github.com/ceph/ceph/pull/38689

But in the case of mon and mgr deployments, wouldn't it be nice to limit the number of mons or mgrs to min(5, number of nodes in the cluster)? See the sketch below.
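For illustration, the suggested cap could look something like this (a hedged sketch; "clamp_mon_mgr_count" is a hypothetical helper, not existing ceph code):

  MAX_MON_MGR = 5  # the ceiling suggested above

  def clamp_mon_mgr_count(requested: int, num_cluster_nodes: int) -> int:
      """Limit a mon/mgr count to min(5, number of nodes in the cluster)."""
      return min(requested, MAX_MON_MGR, num_cluster_nodes)

  print(clamp_mon_mgr_count(6789, 9))  # -> 5 on a 9-node cluster
  print(clamp_mon_mgr_count(3, 9))     # -> 3; sensible requests pass through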
@Adam,

(In reply to Adam King from comment #1)
> In the case of the apply command, the first positional arg is the
> service type and the second is the placement (there are more, but
> those are the only ones relevant here). So in a command like
> "ceph orch apply mgr 12345", "mgr" is the service we are applying
> and "12345" is the placement. Integers are considered valid
> placements; in this case it's saying to put a mgr on up to 12345
> hosts. In a typical use case you would do something like
> "ceph orch apply mgr 3" to put down 3 mgr daemons when you don't
> care which hosts they're on. So the command is accepted and runs
> like normal.
>
> Do you have anything in mind when you say it should provide an
> error/warning message here? Technically both args provided, "mgr"
> and "12345", are valid for the service type and placement, so it
> doesn't make sense to generate an error saying there were invalid
> arguments. Maybe we should output back to the user how each arg in
> the apply command is being used? For example, in this case, add
> something to the output saying that "mgr" is being used as the
> service type and "12345" as the placement, to try to avoid
> confusion?

We should throw an error/warning message saying that we cannot have 12345 hosts on which to place the mgrs/mons. Even though integers are valid placements, there should be a limit on the values that can be passed, because the cluster cannot possibly have 12345 hosts for the mons/mgrs.
For placing a max on the placement count, which should fix this:

tracker: https://tracker.ceph.com/issues/49960
PR: https://github.com/ceph/ceph/pull/40376

What exactly the max should be is still being debated, so those numbers aren't set yet, but we've agreed upstream that a max should exist. Conceptually the check would look like the sketch below.
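A rough sketch of the idea, not the actual change in that PR ("MAX_COUNT" is a placeholder since the real limit was still being decided, and "validate_placement_count" is a hypothetical name):

  MAX_COUNT = 1000  # placeholder value only; the real max was undecided

  def validate_placement_count(count: int) -> None:
      # Reject implausibly large counts up front instead of scheduling them.
      if count > MAX_COUNT:
          raise ValueError(
              f"placement count {count} exceeds the maximum allowed "
              f"count of {MAX_COUNT}"
          )

  validate_placement_count(12345)  # raises ValueError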
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294