Bug 1503411 - [iSCSI]; Incorrect number of tcmu-runner daemons reported after GWs go down and come back up
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: iSCSI
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 3.*
Assignee: Jason Dillaman
QA Contact: Tejas
Docs Contact: Erin Donnelly
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 1494421
Reported: 2017-10-18 04:42 UTC by Tejas
Modified: 2019-02-26 16:14 UTC

.Incorrect number of `tcmu-runner` daemons reported after iSCSI target LUNs fail and recover

After iSCSI target Logical Unit Numbers (LUNs) recover from a failure, the `ceph -s` command in certain cases outputs an incorrect number of `tcmu-runner` daemons.
Clone Of:
Last Closed: 2019-02-26 16:14:54 UTC


Description Tejas 2017-10-18 04:42:19 UTC
Description of problem:
The 'ceph -s' command output includes the number of active tcmu-runner daemons. That number is incorrect after the GW nodes go down and come back up.

Version-Release number of selected component (if applicable):
ceph version 12.2.1-14.el7cp
libtcmu-1.3.0-0.4.el7cp.x86_64

To reproduce, I disabled the network interface on the GW nodes and, after a while, brought it back up.
Command used:
ifdown <eth>
ifup <eth>

Total LUNs: 122
Expected tcmu-runner daemons: 488
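
The expected figure appears to follow from one tcmu-runner service entry per LUN on each gateway. That per-gateway multiplication is an assumption read off the numbers in this report (122 LUNs, 4 GWs), not something the report states outright. A minimal sketch of the arithmetic:

```python
# Values taken from this report. The "one entry per LUN per gateway"
# relationship is an assumption inferred from 122 and 488.
total_luns = 122
gateways = 4  # the report refers to "all 4 GWs"

expected_daemons = total_luns * gateways
print(expected_daemons)  # 488
```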

After the network on 1 GW went down:
 ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK
 
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 257 daemons active   <----------------
 
  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9960 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean




After all 4 GWs have gone down and come back up:
# ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK
 
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 31 daemons active    <---------------
 
  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9961 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean
 
  io:
    client:   10743 B/s rd, 111 MB/s wr, 10 op/s rd, 511 op/s wr
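
To compare the reported count against the expected one without reading the output by eye, the `tcmu-runner:` line can be pulled out of the plain-text `ceph -s` output. The sketch below is only an illustration: the helper name `tcmu_runner_count` is hypothetical, and the regular expression assumes the line format shown in the outputs above.

```python
import re

def tcmu_runner_count(status_text):
    """Return the tcmu-runner daemon count from `ceph -s` text, or None if absent."""
    # Assumes a line of the form "tcmu-runner: <N> daemons active".
    match = re.search(
        r"^\s*tcmu-runner:\s+(\d+)\s+daemons?\s+active",
        status_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None

# Services section copied from the second output in this report.
sample = """\
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 31 daemons active
"""

expected = 488  # expected count stated in this report
reported = tcmu_runner_count(sample)
print(f"reported={reported} expected={expected} missing={expected - reported}")
# reported=31 expected=488 missing=457
```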

Comment 3 Jason Dillaman 2017-10-18 13:16:44 UTC
@Tejas: The service daemons have a 60-second grace period (by default). Did you check the daemon state after the 60 seconds had passed?
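
One way to run that check is to keep re-reading the count until it matches the expected value or the grace period has elapsed, and only then treat a mismatch as real. This is a rough sketch, not part of Ceph: `count_after_grace` and `read_count` are hypothetical names, and `read_count` stands in for whatever returns the number currently shown by `ceph -s`. The 60-second default mirrors the grace period mentioned above.

```python
import time

def count_after_grace(read_count, expected, grace_seconds=60,
                      poll_seconds=5, sleep=time.sleep):
    """Poll read_count() until it equals expected or the grace period is used up.

    Returns (last_count, seconds_waited).
    """
    waited = 0
    count = read_count()
    while count != expected and waited < grace_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        count = read_count()
    return count, waited

# Example with canned readings instead of a live cluster (sleep is stubbed out).
readings = iter([257, 300, 488])
count, waited = count_after_grace(lambda: next(readings), 488,
                                  sleep=lambda seconds: None)
print(count, waited)  # 488 10
```

If the count still differs from the expected value once the full grace period has been waited out, the mismatch is not explained by the grace period alone.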
