Description of problem:
=======================
As per BZ 1096729, we were seeing frequent disconnects between peers and bricks, which led to snapshot creation failures and IO failures while snapshot creation was in progress for multiple volumes. The workaround provided for BZ 1096729 was to disable the ping timer (edit /etc/glusterfs/glusterd.vol, set the ping timeout to 0, and restart glusterd). As per comment 14 in BZ 1096729, this is going in as a Known Issue for Denali (doc bug raised: BZ 1109150).

We retried snapshot creation with the ping timeout set to 30 and hit similar disconnect issues, and also hit a glusterd crash. After discussion with the developers, raising this bug to track the glusterd crash.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.15-1.el6rhs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
Setup: 4-node cluster, ping timeout set to 30, glusterd restarted.
1. Create 4 volumes.
2. Fuse-mount and NFS-mount the volumes.
3. Run IO on all the volumes at the same time:
   for i in {1..400}; do dd if=/dev/urandom of=fuse_vol0"$i" bs=10M count=1; done
4. Create snapshots on all volumes at the same time:
   for i in {1..100}; do gluster snapshot create snap$i vol0 ; done

A few snapshot create failures were seen, and glusterd crashed.

Actual results:
===============
glusterd crashed.

Expected results:
=================
There should be no crash.

Additional info:
================
Uploaded the sosreports and the core file:
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1096729/
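The ping-timer workaround described above can be sketched as follows. This is a non-authoritative sketch: it assumes the stock /etc/glusterfs/glusterd.vol management volfile layout with an existing `option ping-timeout` line, and a RHEL 6 style init system; verify against your own volfile before applying.

```shell
# Sketch of the BZ 1096729 workaround: disable the ping timer by setting
# ping-timeout to 0 in the glusterd management volfile, then restart glusterd.
# Assumes the volfile already contains an "option ping-timeout <N>" line
# inside the "volume management" block, e.g.:
#
#   volume management
#       type mgmt/glusterd
#       option working-directory /var/lib/glusterd
#       ...
#       option ping-timeout 30
#   end-volume

# Replace the existing ping-timeout value with 0 (keep a backup first):
cp /etc/glusterfs/glusterd.vol /etc/glusterfs/glusterd.vol.bak
sed -i 's/option ping-timeout .*/option ping-timeout 0/' /etc/glusterfs/glusterd.vol

# Restart glusterd so the new setting takes effect:
service glusterd restart
```

To retry with the 30-second timeout used in this bug, the same edit applies with `option ping-timeout 30` instead of `0`.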
The core file attached to the bug looks corrupted. Will update the bug once the problem is recreated.
Looks like this is no longer a valid bug; will reopen it once the problem is recreated. -Sunny