865960 – "tgtadm --mode target --op show" shows incomplete data if there are many targets

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	scsi-target-utils
Sub Component:
Version:	6.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Andy Grover
QA Contact:	Bruno Goncalves
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1056239

Reported:	2012-10-12 22:20 UTC by Jaroslav Kortus
Modified:	2014-10-14 08:27 UTC
CC List:	4 users
Fixed In Version:	scsi-target-utils-1.0.24-15.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-10-14 08:27:21 UTC
Target Upstream Version:
Embargoed:

Attachments
targets.conf (271.09 KB, text/plain) 2014-07-17 08:00 UTC, Bruno Goncalves

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2014:1599	0	normal	SHIPPED_LIVE	scsi-target-utils bug fix update	2014-10-14 01:39:44 UTC

Description Jaroslav Kortus 2012-10-12 22:20:18 UTC

Description of problem:
tgtadm --mode target --op show shows incomplete results if there are many targets on the system.

$ for i in `seq 1 10000`; do tgtadm --mode target --op show | grep -E 'iqn.2008-09.com.example:beaker-disk' | wc -l  >> /tmp/ll.txt; done
$ cat /tmp/ll.txt | sort | uniq -c
      1 955
    176 966
    426 967
    865 968
    442 969
    634 970
    512 971
    925 972
   6019 973

As you can see, the number of lost entries varies from run to run.
This is very unfriendly behaviour: automated tasks that parse this output fail intermittently, and the failures are hard to reproduce.


Version-Release number of selected component (if applicable):
scsi-target-utils-1.0.24-2.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Set up 900+ disks (it may work with smaller numbers too; for me it started at ~220 and got worse as the count increased)
2. tgtadm --mode target --op show | grep 'Target' | wc -l
3. See if the number differs over a larger set of attempts
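
The reproduction steps above can be wrapped in a small helper. This is a minimal sketch, not part of scsi-target-utils: the function name count_spread is hypothetical, and it assumes, as the grep in step 2 does, that every target shows up on a line containing "Target".

```shell
# count_spread RUNS CMD [ARGS...]
# Hypothetical helper: run CMD RUNS times, count the lines containing
# "Target" in each run, and tally how often each count occurred.
count_spread() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # grep -c prints the number of matching lines (0 if none);
        # it exits non-zero when nothing matches, which is ignored here.
        "$@" | grep -c 'Target' || true
        i=$((i + 1))
    done | sort -n | uniq -c
}

# Example (requires a running tgtd):
#   count_spread 1000 tgtadm --mode target --op show
```

A single line of output means every run reported the same number of targets; several lines mean some runs returned incomplete data.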
  
Actual results:
The number of targets shown varies widely between runs

Expected results:
the same number of targets shown each time

Additional info:
In my config there were 973 targets total (some of them shared the same phys devices, but the missing ones seemed random to me).
config file snip:
<target iqn.2008-09.com.example:beaker-disk-54-d1-path1>
allow-in-use yes
direct-store /dev/vg_virts/beaker-disk-54-1
initiator-name CLUSTER54
scsi_id beaker-disk-54-1
scsi_sn beaker-disk-54-1
write-cache off
</target>

<target iqn.2008-09.com.example:beaker-disk-54-d1-path10>
allow-in-use yes
direct-store /dev/vg_virts/beaker-disk-54-1
initiator-name CLUSTER54
scsi_id beaker-disk-54-1
scsi_sn beaker-disk-54-1
write-cache off
</target>

<target iqn.2008-09.com.example:beaker-disk-54-d1-path100>
allow-in-use yes
direct-store /dev/vg_virts/beaker-disk-54-1
initiator-name CLUSTER54
scsi_id beaker-disk-54-1
scsi_sn beaker-disk-54-1
write-cache off
</target>

Comment 2 RHEL Program Management 2013-10-14 04:44:30 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 4 Andy Grover 2014-04-09 22:27:02 UTC

Hi Jaroslav, any chance you could try to reproduce this with scsi-target-utils in Fedora 20? There were some buffer management improvements since rhel6's tgtd version that may have resolved this.

Comment 5 Jaroslav Kortus 2014-04-10 17:28:39 UTC

I don't have F20 here, could you pls send me the version (string) that should have that fixed? I'll download/rebuild it on rhel6.

Comment 6 Andy Grover 2014-04-16 18:45:35 UTC

Uploaded a test build of 1.0.25 to http://fedorapeople.org/~grover/bug-865960/ , please give that a go.

Comment 7 Jaroslav Kortus 2014-04-28 13:12:57 UTC

[root@virt-122 ~]# for i in `seq 1 1000`; do tgtadm --mode target --op show | grep 'Target' | wc -l ; done | sort | uniq -c
    973 886
     16 887
      6 888
      5 889

scsi-target-utils-1.0.25-1.el6.x86_64

It seems that it's a bit better, but still giving inconsistent results.

Comment 8 Andy Grover 2014-04-29 16:46:42 UTC

ok thanks for trying that, I'll take a closer look.

Comment 9 Andy Grover 2014-07-11 18:36:02 UTC

I'm having difficulty reproducing this issue.

[agrover@localhost tmp]$ cat /tmp/ll.txt | sort | uniq -c
  10000 1000

What I did:

- create a vm with 4GB and 4 CPUs, and a 30GB block device /dev/vdb
- pvcreate /dev/vdb
- vgcreate mucho /dev/vdb
- for i in `seq 1 1000`; do lvcreate mucho -n blah$i -L 10m; done;
- for i in `seq 1 1000`; do echo -e "<target iqn.2008-09.com.example:foo$i>\nallow-in-use yes\ninitiator-name CLUSTER54\nscsi_id beaker-disk-54-1\nscsi_sn beaker-disk-54-1\nwrite-cache off\nbacking-store /dev/mucho/blah$i\n</target>\n">>targets2.conf; done;

window 1:
- sudo tgtd -f

window 2:
- tgtadm --op update --mode sys --name State -v offline
- tgt-admin -e -c targets2.conf
- tgtadm --op update --mode sys --name State -v ready
- for i in `seq 1 10000`; do sudo tgtadm --mode target --op show|grep -E iqn|wc -l >>/tmp/ll.txt; done

and I'm not seeing variability. Any ideas what the difference in our setups might be?
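
The config-generation loop above relies on echo -e, whose behaviour differs between shells. A printf-based sketch of the same step follows; the function name gen_targets and its arguments are hypothetical, while the stanza contents, the foo$i target names and the /dev/mucho/blah$i paths are taken from the steps in this comment.

```shell
# gen_targets COUNT VG
# Hypothetical helper: print COUNT <target> stanzas to stdout, one per
# logical volume /dev/VG/blahN, mirroring the echo loop in this comment.
gen_targets() {
    count=$1
    vg=$2
    i=1
    while [ "$i" -le "$count" ]; do
        printf '<target iqn.2008-09.com.example:foo%s>\n' "$i"
        printf 'allow-in-use yes\n'
        printf 'initiator-name CLUSTER54\n'
        printf 'scsi_id beaker-disk-54-1\n'
        printf 'scsi_sn beaker-disk-54-1\n'
        printf 'write-cache off\n'
        printf 'backing-store /dev/%s/blah%s\n' "$vg" "$i"
        printf '</target>\n\n'
        i=$((i + 1))
    done
}

# Example: gen_targets 1000 mucho > targets2.conf
```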

Comment 12 Bruno Goncalves 2014-07-17 08:00:36 UTC

Created attachment 918631
targets.conf

I was able to reproduce it with the attached configuration on a bare-metal server.


# service tgtd start
Starting SCSI target daemon: [  OK  ]


# for i in `seq 1 10000`; do tgtadm --mode target --op show | grep -E 'Target' | wc -l  >> /tmp/ll.txt; done

# cat /tmp/ll.txt | sort | uniq -c
   9232 1000
      1 980
      3 982
      7 985
     18 986
      4 987
     10 988
     17 989
     30 990
     32 991
     52 992
     31 993
     84 994
     50 995
     86 996
    122 997
    189 998
     32 999
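
Summing the tally above, 9232 of the 10000 runs listed all 1000 targets and the remaining 768 came up short. A minimal sketch for computing that summary from the raw counts file; short_runs is a hypothetical name, and the expected target count is passed in by the caller as an assumption about the configuration.

```shell
# short_runs EXPECTED [FILE]
# Hypothetical helper: read one per-run target count per line (from FILE
# or stdin) and print "<short> <total>", where <short> is the number of
# runs that reported fewer than EXPECTED targets.
short_runs() {
    expected=$1
    shift
    awk -v want="$expected" '
        NF { total++; if ($1 + 0 < want + 0) short++ }
        END { printf "%d %d\n", short, total }
    ' "$@"
}

# Example: short_runs 1000 /tmp/ll.txt
```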

Comment 14 Bruno Goncalves 2014-07-17 08:04:43 UTC

Used RHEL-6.5 (scsi-target-utils-1.0.24-10.el6.x86_64) to reproduce the problem.

Comment 15 Andy Grover 2014-07-17 22:26:06 UTC

can reproduce. fix in hand.

Comment 18 Bruno Goncalves 2014-07-18 07:32:47 UTC

Verified that with the patch it is working fine now.

# rpm -q scsi-target-utils
scsi-target-utils-1.0.24-15.el6.x86_64

# for i in `seq 1 10000`; do tgtadm --mode target --op show | grep -E 'Target' | wc -l  >> /tmp/ll.txt; done

# cat /tmp/ll.txt | sort | uniq -c
  10000 1000

Comment 19 errata-xmlrpc 2014-10-14 08:27:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1599.html