Bug 1503411 - [iSCSI]; Incorrect number of tcmu-runner daemons reported after GWs go down and come back up
Status: NEW
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: iSCSI
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assigned To: Jason Dillaman
QA Contact: Tejas
Docs Contact: Erin Donnelly
Depends On:
Blocks: 1494421
Reported: 2017-10-18 00:42 EDT by Tejas
Modified: 2018-10-18 13:00 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Incorrect number of `tcmu-runner` daemons reported after iSCSI target LUNs fail and recover
After iSCSI target Logical Unit Numbers (LUNs) recover from a failure, the `ceph -s` command in certain cases outputs an incorrect number of `tcmu-runner` daemons.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Tejas 2017-10-18 00:42:19 EDT
Description of problem:


Version-Release number of selected component (if applicable):
ceph version 12.2.1-14.el7cp
libtcmu-1.3.0-0.4.el7cp.x86_64

The 'ceph -s' command output reports the number of active tcmu-runner daemons. I am disabling the network interface on the GW nodes and, after a while, bringing it back up.
Command used:
ifdown <eth>
ifup <eth>

Total luns: 122
expected tcmu daemons: 488
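For reference, the expected count follows from one tcmu-runner instance per LUN per gateway; a minimal sketch of that arithmetic, assuming the four-gateway topology described later in this report:

```shell
#!/bin/sh
# Sanity check (assumption: each exported LUN gets one tcmu-runner
# instance on each iSCSI gateway, so `ceph -s` should report
# LUNs x gateways daemons in the healthy state).
LUNS=122        # total LUNs from this report
GATEWAYS=4      # iSCSI gateway nodes from this report
EXPECTED=$((LUNS * GATEWAYS))
echo "expected tcmu-runner daemons: $EXPECTED"   # prints 488
```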

After 1 GW network down:
 ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK
 
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 257 daemons active   <----------------
 
  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9960 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean




After all 4 GWs have gone down and come back up:
~]# ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK
 
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 31 daemons active    <---------------
 
  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9961 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean
 
  io:
    client:   10743 B/s rd, 111 MB/s wr, 10 op/s rd, 511 op/s wr
Comment 3 Jason Dillaman 2017-10-18 09:16:44 EDT
@Tejas: the service daemons have a 60 second grace period (by default). Did you check the daemon state after 60 seconds had passed?
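A follow-up check along those lines would be to wait out the grace period and re-read the daemon count; a minimal sketch, where the parsing is demonstrated against the sample `ceph -s` line from this report (the one-liner at the end is the hypothetical live-cluster equivalent):

```shell
#!/bin/sh
# Extract the active tcmu-runner count from `ceph -s` style output.
# Demonstrated here on the sample line from this bug report.
sample="    tcmu-runner: 257 daemons active"
count=$(printf '%s\n' "$sample" | awk '/tcmu-runner:/ {print $2}')
echo "active tcmu-runner daemons: $count"   # prints 257

# On a live cluster, after the default 60-second grace period:
#   sleep 60
#   ceph -s | awk '/tcmu-runner:/ {print $2}'
```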
