Description of problem:
After the network interface on a GW node is taken down and brought back up, the number of active tcmu-runner daemons reported by 'ceph -s' drops and does not return to the expected value.

Version-Release number of selected component (if applicable):
ceph version 12.2.1-14.el7cp
libtcmu-1.3.0-0.4.el7cp.x86_64

In the 'ceph -s' command output the number of tcmu-runner daemons is reported. I am disabling the network interface on the GW nodes and, after a while, bringing it back up.

Commands used:
  ifdown <eth>
  ifup <eth>

Total LUNs: 122
Expected tcmu-runner daemons: 488

After 1 GW network went down:

ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK

  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 257 daemons active   <----------------

  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9960 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean

After all 4 GWs have gone down and come back up:

~]# ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK

  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 31 daemons active   <---------------

  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9961 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean

  io:
    client: 10743 B/s rd, 111 MB/s wr, 10 op/s rd, 511 op/s wr
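
For reference, the count in 'ceph -s' comes from the mgr's service map. A sketch of how the individual tcmu-runner registrations could be inspected, assuming the Luminous 'ceph service' commands are available in this build:

  ~]# ceph service status     # per-service summary of registered daemons
  ~]# ceph service dump       # full service map, one entry per tcmu-runner daemon, with its metadata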
@Tejas: the service daemons have a 60-second grace period (by default). Did you check the daemon state after 60 seconds had passed?
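
(For reference, a sketch of how to confirm the grace period and re-check once it has elapsed; mgr_service_beacon_grace is the option I believe governs stale service-map entries, so treat the name as an assumption, and the first command has to run on the node hosting the active mgr:)

  ~]# ceph daemon mgr.mustang config show | grep mgr_service_beacon_grace   # assumed option name; default is 60 seconds
  ~]# sleep 60; ceph -s | grep tcmu-runner                                  # re-check the reported count after the grace window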