
Bug 1503411

Summary: [iSCSI]: Incorrect number of tcmu-runner daemons reported after GWs go down and come back up
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tejas <tchandra>
Component: iSCSI
Assignee: Jason Dillaman <jdillama>
Status: CLOSED WONTFIX
QA Contact: Tejas <tchandra>
Severity: medium
Docs Contact: Erin Donnelly <edonnell>
Priority: unspecified
Version: 3.0
CC: ceph-eng-bugs, ceph-qe-bugs, edonnell, jdillama, tchandra
Target Milestone: rc
Target Release: 3.*
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Incorrect number of `tcmu-runner` daemons reported after iSCSI target LUNs fail and recover
After iSCSI target Logical Unit Numbers (LUNs) recover from a failure, the `ceph -s` command in certain cases outputs an incorrect number of `tcmu-runner` daemons.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-26 16:14:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1494421

Description Tejas 2017-10-18 04:42:19 UTC
Description of problem:


Version-Release number of selected component (if applicable):
ceph version 12.2.1-14.el7cp
libtcmu-1.3.0-0.4.el7cp.x86_64

The 'ceph -s' command output reports the number of active tcmu-runner daemons. I am disabling the network interface on the GW nodes and, after a while, bringing it back up.
Commands used:
ifdown <eth>
ifup <eth>

Total LUNs: 122
Expected tcmu-runner daemons: 488
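
For context, the expected figure appears to assume one reported tcmu-runner instance per LUN per gateway (122 LUNs x 4 GWs = 488). Below is a minimal reproduction sketch of the steps above; the interface name eth0 and the wait times are placeholders, not taken from this report:

  # on each iSCSI gateway node
  ifdown eth0          # take the GW network interface down
  sleep 600            # leave it down for a while
  ifup eth0            # bring it back up
  # then, from an admin node, compare the reported count against the expected 488
  ceph -s | grep tcmu-runner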

After 1 GW network down:
 ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK
 
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 257 daemons active   <----------------
 
  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9960 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean




After all 4 GWs have gone down and come back up:
~]# ceph -s
  cluster:
    id:     2057393b-ce5e-4821-9eb0-96519e801921
    health: HEALTH_OK
 
  services:
    mon:         3 daemons, quorum havoc,mustang,skytrain
    mgr:         mustang(active)
    osd:         20 osds: 20 up, 20 in
    rgw:         1 daemon active
    tcmu-runner: 31 daemons active    <---------------
 
  data:
    pools:   13 pools, 842 pgs
    objects: 1140k objects, 3320 GB
    usage:   9961 GB used, 12284 GB / 22245 GB avail
    pgs:     842 active+clean
 
  io:
    client:   10743 B/s rd, 111 MB/s wr, 10 op/s rd, 511 op/s wr

Comment 3 Jason Dillaman 2017-10-18 13:16:44 UTC
@Tejas: the service daemons have a 60-second grace period (by default). Did you check the daemon state after 60 seconds had passed?
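
For anyone re-checking this, a minimal follow-up along the lines of the comment above; the 90-second wait is an arbitrary value chosen to be safely past the default 60-second grace period:

  sleep 90
  ceph -s | grep tcmu-runner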