Bug 994990
| Summary: | Native ping_pong IO and smb torture fails on smb mount | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | surabhi <sbhaloth> |
| Component: | samba | Assignee: | Raghavendra Talur <rtalur> |
| Status: | CLOSED EOL | QA Contact: | surabhi <sbhaloth> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 2.1 | CC: | chrisw, ira, jarrpa, lmohanty, madam, pgurusid, rjoseph, rtalur, rwheeler, sbhaloth, sdharane, surs, vagarwal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | gluster | ||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-12-03 17:18:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 956495, 957769 | ||
When I try on a clean setup with the following changes:
kernel oplocks = no
stat cache = no
stat-prefetch disabled
I find that it runs fine.
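For reference, the changes above map to two smb.conf parameters and one Gluster volume option. A minimal sketch of applying them (VOLNAME and the share name are placeholders, not taken from this report):

```shell
# smb.conf, in the share definition for the Gluster volume (names illustrative):
#   [gluster-VOLNAME]
#       kernel oplocks = no
#       stat cache = no

# Disable stat-prefetch on the volume itself:
gluster volume set VOLNAME performance.stat-prefetch off

# Reload smbd afterwards so the smb.conf changes take effect.
```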
Terminal 1:
10.70.42.194#/root/work/testing/ping_pong -rw pptest 5
data increment = 1
data increment = 2
144 locks/sec
Terminal 2:
10.70.42.194#/root/work/testing/ping_pong -rw pptest 5
data increment = 2
140 locks/sec
Was there any other config change in your setup?
Talur,
With the options that you mentioned, the results are still similar:
# gluster v i dis-rep
Volume Name: dis-rep
Type: Distributed-Replicate
Volume ID: 3b7138c4-ce39-421e-b382-8093cfa57f67
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.16.157.0:/home/dis-rep/b1
Brick2: 10.16.157.3:/home/dis-rep/b2
Brick3: 10.16.157.6:/home/dis-rep/b3
Brick4: 10.16.157.9:/home/dis-rep/b4
Brick5: 10.16.157.0:/home/dis-rep/b5
Brick6: 10.16.157.3:/home/dis-rep/b6
Brick7: 10.16.157.6:/home/dis-rep/b7
Brick8: 10.16.157.9:/home/dis-rep/b8
Brick9: 10.16.157.0:/home/dis-rep/b9
Brick10: 10.16.157.3:/home/dis-rep/b10
Brick11: 10.16.157.6:/home/dis-rep/b11
Brick12: 10.16.157.9:/home/dis-rep/b12
Options Reconfigured:
server.allow-insecure: on
performance.stat-prefetch: off
smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1375962240
time: 2013-08-08 07:44:00.478238
test: ping-pong
time: 2013-08-08 07:44:00.479532
data increment = 2
3 locks/sec
./ping_pong -rw /mnt/smb2/file2 10
data increment = 2
0 locks/sec
# glusterfs -V
glusterfs 3.4.0.18rhs built on Aug 7 2013 08:02:42
For ping-pong to run successfully, you need to either a) disable eager-lock, or b) backport http://review.gluster.org/5239/

The native ping_pong in this scenario can fail due to known problems with the CIFS kernel client in Linux; we cannot rely on the CIFS kernel client for these tests. The network (smbtorture) ping_pong, however, should pass. A failure there indicates either a problem in byte-range locking or a cache-coherency failure across the cluster. Such failures are not SMB-specific; they impact the entire cluster.

With respect to comment #5 above, a third option has been offered:
* Set post-op-delay-secs=0 rather than eager-lock=off for Samba use cases.
This option is considered less drastic than disabling eager locking, but needs to be tested.

Tested with option post-op-delay-secs: 0
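The two option-based workarounds mentioned above can be sketched as gluster CLI commands (VOLNAME is a placeholder; only one of the two would normally be applied):

```shell
# a) disable eager locking entirely (the more drastic workaround)
gluster volume set VOLNAME cluster.eager-lock off

# alternative: keep eager locking but remove the post-op delay,
# so the write lock is released as soon as the post-op completes
gluster volume set VOLNAME cluster.post-op-delay-secs 0
```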
With the first instance itself the lock value is only 8, falling to 1 or 0 with more instances. It is not failing, but the performance is very slow.
Volume Name: dis-rep
Type: Distributed-Replicate
Volume ID: 3b7138c4-ce39-421e-b382-8093cfa57f67
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.16.157.0:/home/dis-rep/b1
Brick2: 10.16.157.3:/home/dis-rep/b2
Brick3: 10.16.157.6:/home/dis-rep/b3
Brick4: 10.16.157.9:/home/dis-rep/b4
Brick5: 10.16.157.0:/home/dis-rep/b5
Brick6: 10.16.157.3:/home/dis-rep/b6
Brick7: 10.16.157.6:/home/dis-rep/b7
Brick8: 10.16.157.9:/home/dis-rep/b8
Brick9: 10.16.157.0:/home/dis-rep/b9
Brick10: 10.16.157.3:/home/dis-rep/b10
Brick11: 10.16.157.6:/home/dis-rep/b11
Brick12: 10.16.157.9:/home/dis-rep/b12
Options Reconfigured:
cluster.post-op-delay-secs: 0
server.allow-insecure: on
performance.stat-prefetch: off
root@RHEL6 [Aug-13-2013- 4:04:07] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376381204
time: 2013-08-13 04:06:44.035019
test: ping-pong
time: 2013-08-13 04:06:44.036058
data increment = 1
8 locks/sec
root@RHEL6 [Aug-13-2013- 4:04:03] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376381253
time: 2013-08-13 04:07:33.516177
test: ping-pong
time: 2013-08-13 04:07:33.517117
data increment = 2
1 locks/sec
root@RHEL6 [Aug-13-2013- 4:04:12] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376381302
time: 2013-08-13 04:08:22.250850
test: ping-pong
time: 2013-08-13 04:08:22.251876
data increment = 3
0 locks/sec
root@RHEL6 [Aug-13-2013- 4:04:12] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376381302
time: 2013-08-13 04:08:22.250850
test: ping-pong
time: 2013-08-13 04:08:22.251876
data increment = 3
1 locks/sec
root@RHEL6 [Aug-13-2013- 4:04:07] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376381204
time: 2013-08-13 04:06:44.035019
test: ping-pong
time: 2013-08-13 04:06:44.036058
data increment = 1
data increment = 2
data increment = 3
0 locks/sec
With eager-lock=off, the lock values are still very low.
Volume Name: dis-rep
Type: Distributed-Replicate
Volume ID: 3b7138c4-ce39-421e-b382-8093cfa57f67
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.16.157.0:/home/dis-rep/b1
Brick2: 10.16.157.3:/home/dis-rep/b2
Brick3: 10.16.157.6:/home/dis-rep/b3
Brick4: 10.16.157.9:/home/dis-rep/b4
Brick5: 10.16.157.0:/home/dis-rep/b5
Brick6: 10.16.157.3:/home/dis-rep/b6
Brick7: 10.16.157.6:/home/dis-rep/b7
Brick8: 10.16.157.9:/home/dis-rep/b8
Brick9: 10.16.157.0:/home/dis-rep/b9
Brick10: 10.16.157.3:/home/dis-rep/b10
Brick11: 10.16.157.6:/home/dis-rep/b11
Brick12: 10.16.157.9:/home/dis-rep/b12
Options Reconfigured:
cluster.eager-lock: off
server.allow-insecure: on
performance.stat-prefetch: off
root@RHEL6 [Aug-13-2013- 4:00:27] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376380859
time: 2013-08-13 04:00:59.348214
test: ping-pong
time: 2013-08-13 04:00:59.349305
data increment = 1
1 locks/sec
root@RHEL6 [Aug-13-2013- 4:00:21] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376380885
time: 2013-08-13 04:01:25.520396
test: ping-pong
time: 2013-08-13 04:01:25.521443
data increment = 2
1 locks/sec
root@RHEL6 [Aug-13-2013- 3:59:00] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376380921
time: 2013-08-13 04:02:01.354763
test: ping-pong
time: 2013-08-13 04:02:01.355840
data increment = 3
1 locks/sec
root@RHEL6 [Aug-13-2013- 3:59:00] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376380921
time: 2013-08-13 04:02:01.354763
test: ping-pong
time: 2013-08-13 04:02:01.355840
data increment = 3
0 locks/sec
root@RHEL6 [Aug-13-2013- 4:00:21] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376380885
time: 2013-08-13 04:01:25.520396
test: ping-pong
time: 2013-08-13 04:01:25.521443
data increment = 2
data increment = 3
0 locks/sec
Even after killing the instances one by one, the lock values do not increase.
With the option suggested by the developers:
batch-fsync-delay-usec: 0
the lock values increase, and even with the second and third instances they fall only to a reasonable value, which is expected.
Volume Name: dis-rep
Type: Distributed-Replicate
Volume ID: 3b7138c4-ce39-421e-b382-8093cfa57f67
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.16.157.0:/home/dis-rep/b1
Brick2: 10.16.157.3:/home/dis-rep/b2
Brick3: 10.16.157.6:/home/dis-rep/b3
Brick4: 10.16.157.9:/home/dis-rep/b4
Brick5: 10.16.157.0:/home/dis-rep/b5
Brick6: 10.16.157.3:/home/dis-rep/b6
Brick7: 10.16.157.6:/home/dis-rep/b7
Brick8: 10.16.157.9:/home/dis-rep/b8
Brick9: 10.16.157.0:/home/dis-rep/b9
Brick10: 10.16.157.3:/home/dis-rep/b10
Brick11: 10.16.157.6:/home/dis-rep/b11
Brick12: 10.16.157.9:/home/dis-rep/b12
Options Reconfigured:
storage.batch-fsync-delay-usec: 0
server.allow-insecure: on
performance.stat-prefetch: off
root@RHEL6 [Aug-13-2013- 4:59:37] >smbtorture //10.16.157.0/gluster-dis-rep raw.
e:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376384538
time: 2013-08-13 05:02:18.875441
test: ping-pong
time: 2013-08-13 05:02:18.876496
data increment = 1
731 locks/sec
root@RHEL6 [Aug-13-2013- 4:59:32] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376384561
time: 2013-08-13 05:02:41.844482
test: ping-pong
time: 2013-08-13 05:02:41.845522
data increment = 2
86 locks/sec
root@RHEL6 [Aug-13-2013- 4:58:30] >smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1376384577
time: 2013-08-13 05:02:57.158547
test: ping-pong
time: 2013-08-13 05:02:57.159651
data increment = 3
70 locks/sec
It has been agreed to use "batch-fsync-delay-usec" value 0 (zero) for Samba volumes to avoid this error. You can set it using:
gluster volume set <VOLNAME> storage.batch-fsync-delay-usec 0

As per discussions, we will be making the change from Comment 11 the default; removing the blocker.

With the latest versions:
glusterfs-geo-replication-3.4.0.30rhs-2.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
glusterfs-libs-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-server-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-rdma-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-api-3.4.0.30rhs-2.el6rhs.x86_64
glusterfs-fuse-3.4.0.30rhs-2.el6rhs.x86_64
The native ping_pong and smbtorture tests show very low performance in terms of acquiring locks, even with the batch-fsync-delay option set.
root@RHEL6 [Sep-02-2013- 6:58:18] >smbtorture //10.16.159.150/gluster-testing-new raw.ping-pong -U root%redhat --option=torture:filename=jon3 --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1378119880
time: 2013-09-02 07:04:40.669604
test: ping-pong
time: 2013-09-02 07:04:40.670651
data increment = 1
data increment = 2
data increment = 3
23 locks/sec
root@RHEL6 [Sep-02-2013- 6:58:07] >smbtorture //10.16.159.150/gluster-testing-new raw.ping-pong -U root%redhat --option=torture:filename=jon3 --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1378119897
time: 2013-09-02 07:04:58.010217
test: ping-pong
time: 2013-09-02 07:04:58.011357
data increment = 2
data increment = 3
27 locks/sec
As seen from the bug history, we have observed better performance before; the lock values went up to 731 locks/sec.
Executed the smbtorture test on following build:
samba-glusterfs-3.6.9-167.9.el6rhs.x86_64
glusterfs-libs-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.55rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.55rhs-1.el6rhs.x86_64
smbtorture //10.16.157.12/gluster-dis-rep/rhsdata01 raw.ping-pong -U root%redhat --option=torture:filename=file1 --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1389852511
time: 2014-01-16 01:08:31.970890
test: ping-pong
time: 2014-01-16 01:08:31.971809
data increment = 1
data increment = 2
data increment = 3
data increment = 4
21 locks/sec
smbtorture //10.16.157.12/gluster-dis-rep/rhsdata01 raw.ping-pong -U root%redhat --option=torture:filename=file1 --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1389852559
time: 2014-01-16 01:09:19.307104
test: ping-pong
time: 2014-01-16 01:09:19.308130
data increment = 2
data increment = 3
data increment = 4
20 locks/sec
smbtorture //10.16.157.12/gluster-dis-rep/rhsdata01 raw.ping-pong -U root%redhat --option=torture:filename=file1 --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1389852559
time: 2014-01-16 01:09:19.307104
test: ping-pong
time: 2014-01-16 01:09:19.308130
data increment = 2
data increment = 3
data increment = 4
21 locks/sec
The lock values are still low.
Executed smbtorture ping pong test on glusterfs version:
samba-glusterfs-3.6.9-167.11.el6rhs.x86_64
glusterfs-3.5qa2-0.274.gitecc475d.el6_5.x86_64
Running smbtorture on different mount points:
Still low performance in terms of acquiring locks.
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990660
time: 2014-03-28 03:11:00.189823
test: ping-pong
time: 2014-03-28 03:11:00.190986
Password for [MYGROUP\root]:
data increment = 1
289 locks/sec
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990685
time: 2014-03-28 03:11:25.692515
test: ping-pong
time: 2014-03-28 03:11:25.693942
Password for [MYGROUP\root]:
data increment = 2
49 locks/sec
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990707
time: 2014-03-28 03:11:47.717882
test: ping-pong
time: 2014-03-28 03:11:47.718995
Password for [MYGROUP\root]:
data increment = 3
34 locks/sec
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990741
time: 2014-03-28 03:12:21.491494
test: ping-pong
time: 2014-03-28 03:12:21.492748
Password for [MYGROUP\root]:
data increment = 4
26 locks/sec
++++++++++++++++++++++++++++++++++++++++++++
After stopping the test on the mount points one by one:
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990741
time: 2014-03-28 03:12:21.491494
test: ping-pong
time: 2014-03-28 03:12:21.492748
Password for [MYGROUP\root]:
data increment = 4
26 locks/sec
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990685
time: 2014-03-28 03:11:25.692515
test: ping-pong
time: 2014-03-28 03:11:25.693942
Password for [MYGROUP\root]:
data increment = 2
data increment = 3
data increment = 4
data increment = 3
data increment = 2
48 locks/sec
smbtorture //10.16.159.197/gluster-test-vol raw.ping-pong --option=torture:filename=file --option=torture:num_locks=25 --option=torture:read=true --option=torture:write=true
Using seed 1395990660
time: 2014-03-28 03:11:00.189823
test: ping-pong
time: 2014-03-28 03:11:00.190986
Password for [MYGROUP\root]:
data increment = 1
data increment = 2
data increment = 3
data increment = 4
data increment = 3
data increment = 2
data increment = 1
296 locks/sec
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release for which you requested this review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.

It has been verified that smbtorture ping-pong works as expected in the latest RHGS releases (3.1+).
Description of problem:
The ping_pong IO coherence test and the smbtorture test fail on an smb mount: the locks/sec value falls to 0 locks/sec. The data increment value increases as expected.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.18rhs-1.el6rhs.x86_64.rpm

How reproducible:
Always

Steps to Reproduce:
1. Create a 6x2 dis-rep volume on a 4-node cluster and start the volume.
2. Mount the Samba share on a RHEL client via smb.
3. On the RHEL client, do 4 smb mounts using all 4 IPs of the nodes.
4. Start ping_pong on one of the mounts, giving one file name:
./ping_pong -rw /mnt/smb2/file2 10
5. Start ping_pong on the other mounts one after the other. The locks/sec falls to 0 locks/sec. The data increment value increases per instance, i.e. as expected.

For smbtorture, run the test as follows:
smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true

Actual results:
./ping_pong -rw /mnt/samba/file2 10
data increment = 2
0 locks/sec

smbtorture //10.16.157.0/gluster-dis-rep raw.ping-pong -U root%redhat --option=torture:filename=jon --option=torture:num_locks=10 --option=torture:read=true --option=torture:write=true
Using seed 1375956934
time: 2013-08-08 06:15:34.489272
test: ping-pong
time: 2013-08-08 06:15:34.490613
data increment = 1
data increment = 2
0 locks/sec

In both cases, as soon as we start the second instance the locks/sec falls to 0.

Expected results:
Locks/sec should keep printing and should not fall to 0.

Additional info:
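For readers unfamiliar with the test: ping_pong cycles exclusive byte-range locks over a small file, taking the next byte's lock before releasing the current one, and reports the achieved locks/sec; run from several clients, the rate exposes the cost of cluster-wide lock coherency. A minimal single-process sketch of that loop (an illustration using POSIX fcntl locks, not the actual ping_pong source):

```python
import fcntl
import os
import tempfile
import time

def ping_pong_locks(path, num_locks, duration=0.5):
    """Cycle exclusive byte-range locks over `num_locks` one-byte regions,
    acquiring byte (i+1) mod n before releasing byte i, as ping_pong does.
    Returns the achieved locks/sec over `duration` seconds."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    os.ftruncate(fd, num_locks)
    count, i = 0, 0
    start = time.monotonic()
    while time.monotonic() - start < duration:
        # take the next byte's lock, then drop the current one
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, (i + 1) % num_locks)
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, i)
        i = (i + 1) % num_locks
        count += 1
    os.close(fd)
    return count / duration

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(f"{ping_pong_locks(f.name, num_locks=10):.0f} locks/sec")
```

On a local filesystem this reports a very high rate; the interesting numbers in this bug come from running the equivalent loop across SMB mounts, where every lock transition must be coordinated through the cluster.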