Bug 1387364

Summary: glusterfs-client box dies when trying to write to gluster volume
Product: [Community] GlusterFS
Reporter: Julio Guevara <julioguevara150>
Component: replicate
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: urgent
Docs Contact:
Priority: high
Version: 3.8
CC: bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-07 10:42:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                                    Flags
Volume debug log while executing dd command                    none
Brick1 on lainf-git01p debug log while executing dd command    none
glusterd debug log while executing dd command                  none

Description Julio Guevara 2016-10-20 17:23:56 UTC
Description of problem:
I have a two-node replicated GlusterFS setup to keep a Git repository available.
The gluster servers (lainf-git01p/10.10.66.123 and kcinf-git01p/10.10.64.123) each have an XFS-formatted /dev/sdb1 mounted on /data/brick1. A gluster volume has been created in path /data/brick1/vg0 and exported with the name gv0. These same servers mount the exported gluster volume with the command 'mount -t glusterfs lainf-git01p:/gv0 /gitlab-data', and both mount the volume with no issues. Both machines can list the files with no issues. Problems emerge when I try to write files to the gluster volume.

When I run 'dd if=/dev/urandom of=/gitlab-data/1 count=1 bs=100M' on node kcinf-git01p, everything works fine. The file is replicated to the bricks on both lainf-git01p and kcinf-git01p and is listed under the /gitlab-data mount point.

However, when I run the same command on lainf-git01p, dd never finishes. I can see the file replicated to kcinf-git01p and cat its contents there, but lainf-git01p starts melting down. The system becomes unresponsive: no commands can be run, no new SSH sessions can be established, and the system appears to be waiting on some event. Very quickly the system becomes unusable, and a hard reset is needed to bring it back up.

Version-Release number of selected component (if applicable):
glusterfs.x86_64 3.8.4-1.el6
glusterfs-api.x86_64 3.8.4-1.el6
glusterfs-cli.x86_64 3.8.4-1.el6
glusterfs-client-xlators.x86_64 3.8.4-1.el6
glusterfs-fuse.x86_64 3.8.4-1.el6
glusterfs-libs.x86_64 3.8.4-1.el6
glusterfs-server.x86_64 3.8.4-1.el6

Packages from the CentOS Storage SIG
uname -r: 2.6.32-431.el6.x86_64
distro: CentOS release 6.5 (Final)


How reproducible:
Every time I run 'dd if=/dev/urandom of=/gitlab-data/1 count=1 bs=100M' on lainf-git01p, the whole box grinds to a halt. Commands like kill and top do not respond, new SSH connections cannot be established, and the box must be hard-rebooted before it works again.


Steps to Reproduce:
1. mount -t glusterfs lainf-git01p:gv0 /gitlab-data
2. dd if=/dev/urandom of=/gitlab-data/1 count=1 bs=100M

Actual results:
When listed from the other client (kcinf-git01p), file 1 has the correct size. On the client lainf-git01p, however, the dd command never exits. It hangs even though the file has already been created. From that point on, the box keeps degrading until a hard reboot of the system is required.

Expected results:
The dd command reports that it completed successfully, and no issues arise afterwards.

Additional info:
Latency between the machines is 35ms.

Comment 1 Julio Guevara 2016-10-20 20:51:27 UTC
Created attachment 1212640
Volume debug log while executing dd command

Comment 2 Julio Guevara 2016-10-20 20:54:02 UTC
Created attachment 1212641
Brick1 on lainf-git01p debug log while executing dd command

Comment 3 Julio Guevara 2016-10-20 20:54:44 UTC
Created attachment 1212642
glusterd debug log while executing dd command

Comment 4 Niels de Vos 2017-11-07 10:42:53 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.