Bug 1387364 - glusterfs-client box dies when trying to write to gluster volume
Summary: glusterfs-client box dies when trying to write to gluster volume
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-20 17:23 UTC by Julio Guevara
Modified: 2017-11-07 10:42 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:42:53 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Volume debug log while executing dd command (2.71 KB, text/plain)
2016-10-20 20:51 UTC, Julio Guevara
Brick1 on lainf-git01p debug log while executing dd command (691.11 KB, text/plain)
2016-10-20 20:54 UTC, Julio Guevara
glusterd debug log while executing dd command (927 bytes, text/plain)
2016-10-20 20:54 UTC, Julio Guevara

Description Julio Guevara 2016-10-20 17:23:56 UTC
Description of problem:
I have a two-node replicated GlusterFS setup that keeps a git repository available.
The gluster servers (lainf-git01p/10.10.66.123 and kcinf-git01p/10.10.64.123) both have /dev/sdb1 formatted with xfs and mounted on /data/brick1. A gluster volume has been created at /data/brick1/vg0 and exported under the name gv0. These same servers mount the exported gluster volume with the command 'mount -t glusterfs lainf-git01p:/gv0 /gitlab-data', and both can mount it and list its files with no issues. The problem emerges when I start writing files to the gluster volume.

When I execute 'dd if=/dev/urandom of=/gitlab-data/1 count=1 bs=100M' from node kcinf-git01p, everything seems to work fine: the file is replicated to the bricks on both lainf-git01p and kcinf-git01p and is listed under the /gitlab-data mountpoint.

But when I execute the same command from lainf-git01p, the dd command never finishes its execution. I can see the file replicated over to kcinf-git01p and can cat its content, but lainf-git01p starts melting down. The system becomes unresponsive: no command can be run, no new ssh session can be created, and the system seems to be waiting for an event. Very quickly the system becomes unusable and a hard reset is needed to get it back up.

Version-Release number of selected component (if applicable):
glusterfs.x86_64 3.8.4-1.el6
glusterfs-api.x86_64 3.8.4-1.el6
glusterfs-cli.x86_64 3.8.4-1.el6
glusterfs-client-xlators.x86_64 3.8.4-1.el6
glusterfs-fuse.x86_64 3.8.4-1.el6
glusterfs-libs.x86_64 3.8.4-1.el6
glusterfs-server.x86_64 3.8.4-1.el6

Packages from the CentOS Storage SIG
uname -r: 2.6.32-431.el6.x86_64
distro: CentOS release 6.5 (Final)


How reproducible:
Whenever I execute 'dd if=/dev/urandom of=/gitlab-data/1 count=1 bs=100M' from lainf-git01p, the whole box grinds to a halt. Commands like kill and top won't respond, new ssh connections cannot be created, and the box needs a hard reboot in order to work again.


Steps to Reproduce:
1. mount -t glusterfs lainf-git01p:gv0 /gitlab-data
2. dd if=/dev/urandom of=/gitlab-data/1 count=1 bs=100M
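
For anyone trying to reproduce step 2 without tying up the shell, a minimal sketch follows. It wraps the dd write in timeout(1) so that a write that never returns is reported instead of blocking. The function name bounded_write, its three arguments, and the output file name ddtest.bin are hypothetical and not part of the original report. It also writes in 1 MiB blocks rather than the single 100M block used above. This is only a sketch: if the whole box becomes unresponsive, as described in this report, the timeout may never get a chance to fire.

```shell
#!/bin/sh
# Sketch only: bounded_write, its arguments, and ddtest.bin are
# illustrative names, not taken from the bug report.
#
# Usage: bounded_write TARGET_DIR SIZE_MB LIMIT_SECS
bounded_write() {
    target_dir=$1
    size_mb=$2
    limit_secs=$3
    out="$target_dir/ddtest.bin"

    # timeout(1) sends SIGTERM once limit_secs has elapsed and then
    # exits with status 124. A process stuck in uninterruptible sleep
    # may not react to the signal.
    timeout "$limit_secs" dd if=/dev/urandom of="$out" bs=1M count="$size_mb" 2>/dev/null
    status=$?

    if [ "$status" -eq 0 ]; then
        # The file is left in place so it can be compared on the other node.
        echo "write completed: $(wc -c < "$out" | tr -d ' ') bytes"
    elif [ "$status" -eq 124 ]; then
        echo "write timed out after ${limit_secs}s"
    else
        echo "write failed with status $status"
    fi
    return "$status"
}
```

With the mountpoint from step 1, a call such as 'bounded_write /gitlab-data 100 120' would attempt a 100 MiB write and give up after two minutes.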

Actual results:
When file 1 is listed from the other client (kcinf-git01p), it has the correct size, but the client (lainf-git01p) never finishes executing the dd command; it hangs waiting even though the file has already been created. After this point the box starts melting down until you are forced to do a hard reboot of the system.

Expected results:
The dd command reports that it completed successfully and no issues arise afterwards.

Additional info:
Latency between the machines is 35ms.

Comment 1 Julio Guevara 2016-10-20 20:51:27 UTC
Created attachment 1212640
Volume debug log while executing dd command

Comment 2 Julio Guevara 2016-10-20 20:54:02 UTC
Created attachment 1212641
Brick1 on lainf-git01p debug log while executing dd command

Comment 3 Julio Guevara 2016-10-20 20:54:44 UTC
Created attachment 1212642
glusterd debug log while executing dd command

Comment 4 Niels de Vos 2017-11-07 10:42:53 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

