Bug 1197548
| Field | Value |
|---|---|
| Summary | RDMA: crash during sanity test |
| Product | [Community] GlusterFS |
| Component | rdma |
| Version | mainline |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Reporter | Saurabh <saujain> |
| Assignee | Mohammed Rafi KC <rkavunga> |
| CC | bugs, gluster-bugs, jthottan, mmadhusu, mzywusko, ndevos, rkavunga, rtalur, skoduri |
| Keywords | Triaged |
| Fixed In Version | glusterfs-3.7.0 |
| Doc Type | Bug Fix |
| Type | Bug |
| Bug Blocks | 1198562 |
| Last Closed | 2015-05-14 17:29:14 UTC |
| Attachments | brick1-coredump (996930), brick2-coredump (996931) |
Description (Saurabh, 2015-03-02 01:42:23 UTC)

gluster volume info

Volume Name: vol0
Type: Distributed-Replicate
Volume ID: 25f4b031-f68e-4e43-9d2a-ce99abaf39ca
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: rdma
Bricks:
Brick1: 192.168.44.106:/rhs/brick1/d1r1
Brick2: 192.168.44.108:/rhs/brick1/d1r1
Brick3: 192.168.44.106:/rhs/brick1/d2r1
Brick4: 192.168.44.108:/rhs/brick1/d2r2
Options Reconfigured:
features.quota: on
nfs-ganesha.enable: on
nfs-ganesha.host: 192.168.44.106
nfs.disable: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

Created attachment 996930 [details]: brick1-coredump
Created attachment 996931 [details]: brick2-coredump
CC'ed Raghavendra, Rafi and Jiffin to check if they have seen any similar issue while using RDMA. I have not looked at the core dump yet, but looking at the backtrace in the bug I see nomem messages. What was the configuration of the machine that was being used as the NFS host (192.168.44.106)? The RAM does not seem to be sufficient. I will continue to look at it; it may be a different root cause.

In the LTP test suite, the crash was caused by the fsstress test. When this test was run alone on the NFSv3 mount, the ganesha server did not crash, but the bricks went down with the same backtrace. Over NFSv4 it completed successfully without any crash. The test command used: fsstress -d <mount point> -l 22 -n 22 -p 22 2

Root cause identified: when doing an RDMA vectored read from the remote end point, the calculation of the remote address went wrong from the second vector onward. Regardless of the number of remote buffers, the first buffer was always being set as the remote address for every RDMA remote read.

REVIEW: http://review.gluster.org/9794 (rdma: setting wrong remote memory.) posted (#2) for review on master by mohammed rafi kc (rkavunga)

REVIEW: http://review.gluster.org/9794 (rdma: setting wrong remote memory.) posted (#3) for review on master by Humble Devassy Chirammal (humble.devassy)

REVIEW: http://review.gluster.org/9794 (rdma: setting wrong remote memory.) posted (#4) for review on master by mohammed rafi kc (rkavunga)

COMMIT: http://review.gluster.org/9794 committed in master by Raghavendra G (rgowdapp)

------
commit e08aea2fd67a06275423ded157431305a7925cf6
Author: Mohammed Rafi KC <rkavunga>
Date: Wed Mar 4 14:37:05 2015 +0530

    rdma: setting wrong remote memory.

    When we send more than one work request in a single call, the remote
    addr is always set as the first address of the vector.

    Change-Id: I55aea7bd6542abe22916719a139f7c8f73334d26
    BUG: 1197548
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/9794
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>
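The commit message above is terse, so as a purely illustrative aid, here is a minimal, self-contained C sketch of the address calculation it refers to. The struct and function names (remote_vec, read_wr, build_read_wrs) are hypothetical stand-ins, not the actual GlusterFS rpc-transport/rdma or libibverbs types: when more than one RDMA READ work request is built for a single reply, the remote address has to advance with each vector instead of always reusing the first buffer's address.

```c
/*
 * Purely illustrative sketch of the remote-address calculation described in
 * the commit above. The struct and function names are hypothetical stand-ins,
 * NOT the real GlusterFS rpc-transport/rdma or libibverbs types.
 */
#include <stdint.h>
#include <stdio.h>

struct remote_vec {          /* one read chunk advertised by the remote peer */
    uint64_t addr;           /* remote virtual address of this chunk */
    uint32_t rkey;           /* remote memory registration key */
    uint32_t len;            /* bytes to read from this chunk */
};

struct read_wr {             /* one RDMA READ work request to be posted */
    uint64_t remote_addr;
    uint32_t rkey;
    uint32_t len;
};

/*
 * Build one READ work request per remote vector. The bug described in the
 * commit corresponds to using remote[0].addr for every request; the fix is
 * to advance the remote address with the vector index.
 */
static void build_read_wrs(struct read_wr *wrs,
                           const struct remote_vec *remote, int count)
{
    for (int i = 0; i < count; i++) {
        wrs[i].remote_addr = remote[i].addr; /* was effectively remote[0].addr */
        wrs[i].rkey        = remote[i].rkey;
        wrs[i].len         = remote[i].len;
    }
}

int main(void)
{
    struct remote_vec remote[3] = {
        { 0x1000,  42, 4096 },
        { 0x9000,  42, 4096 },
        { 0x12000, 42, 2048 },
    };
    struct read_wr wrs[3];

    build_read_wrs(wrs, remote, 3);
    for (int i = 0; i < 3; i++)
        printf("wr[%d]: remote_addr=0x%llx len=%u\n", i,
               (unsigned long long)wrs[i].remote_addr, (unsigned)wrs[i].len);
    return 0;
}
```

The point of the sketch is only the per-index calculation; in the real transport the per-request remote address and rkey come from the buffers advertised by the peer, but each posted read must target its own vector rather than the first one.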
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user