Hello,

Do we have any update on this bz?

Mukul
(In reply to Mukul Malhotra from comment #11)
> Hello,
>
> Do we have any update on this bz?

I'll be working on it today. Will probably have some updates over the course of the day.

> Mukul
Raghavendra,

The customer does have reproducible steps, namely the ones in comment #10, where the rsync processes used to copy hundreds of thousands of .jpg files onto the gluster volume end up in a hung state.

Mukul
(In reply to Mukul Malhotra from comment #17)
> Raghavendra,
>
> The customer does have reproducible steps, namely the ones in comment #10,
> where the rsync processes used to copy hundreds of thousands of .jpg files
> onto the gluster volume end up in a hung state.

Good :). Is it possible to share the steps across? A script is even better :).

> Mukul
Did you actually mean that the alternative reproducer is the one mentioned in comment 10, where a bunch of rsync commands are copying *.jpg files?
Raghavendra,

> Did you actually mean that the alternative reproducer is the one mentioned
> in comment 10, where a bunch of rsync commands are copying *.jpg files?

Yes, that is correct; the customer does not have any other reproducer.

Mukul
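For reference, a minimal sketch of that reproducer, assuming source directories full of .jpg files and a fuse mount at /mnt/glustervol (the paths and the degree of parallelism below are placeholders, not taken from the customer's setup):

# Each source directory holds a large tree of .jpg files; run several
# copies in parallel against the fuse mount to generate the same kind
# of load the customer described.
for i in 1 2 3 4; do
    rsync -av /data/jpegs_$i/ /mnt/glustervol/dest_$i/ &
done
wait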
Raghavendra,

As per the current update, the customer's issue has been resolved after we suggested a workaround of disabling the "performance.client-io-threads" option.

The customer has now requested a hotfix to be installed on a production system and does not require a test build of the fix. The customer has been informed that once the patch is available we will initiate the hotfix process and provide an update by next week.

Mukul
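For the record, the workaround boils down to a single volume set on the affected volume (the volume name below is a placeholder):

# Turn off the client-side io-threads translator on the affected volume;
# the fuse clients pick up the change on the resulting graph switch.
gluster volume set VOLNAME performance.client-io-threads off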
@Mukul,

Please guide the CU to the KCS article which mentions "client-io-threads" support for disperse volumes. If the CU does not hit the reported issue after disabling "client-io-threads", the same should be recommended.

Also, please update the CU that the RHGS team is working on enabling "client-io-threads" for other volume types in the upcoming RHGS 3.2 release, but this is tentative at this stage.
Alok,

> @Mukul, Please guide the CU to the KCS article which mentions
> "client-io-threads" support for disperse volumes. If the CU does not hit
> the reported issue after disabling "client-io-threads", the same should
> be recommended.

Thanks, this information (disabling "client-io-threads") was already provided to the customer as a workaround earlier, and it fixes the issue. I also suggested that this option is recommended with an Erasure Coded volume + FUSE client.

> Also, please update the CU that the RHGS team is working on enabling
> "client-io-threads" for other volume types in the upcoming RHGS 3.2
> release, but this is tentative at this stage.

Yes, I suggested the same, and the case has been closed by the customer.

Mukul
upstream mainline : http://review.gluster.org/15579 (merged)
upstream 3.8      : http://review.gluster.org/15658 (merged)
https://code.engineering.redhat.com/gerrit/#/c/91956/
Hi Raghavendra,

Can you help me with the below questions? I am trying to come up with testcase(s) to validate this fix based on the above conversations:

TC#1: do rsync from multiple locations to a gluster volume
TC#2: the customer scenario, which is as below:
"When the customer launches 16 ffmpeg processes, where each one of them records to 2 mp3 files with 256K bit rate, then the issue appears where every hour from one to six processes get hung, waiting for a filesystem response, specifically from the glusterfs fuse driver."
QE will have to see if this is feasible as part of the infra we have.

Question to Developer: is there any other way of testing this fix, without all the above complications? Can you suggest any new/alternate cases?

Also, regarding volume settings, I have the below questions:

1) The customer volume has the below options; I am planning to set them all.
Question to Developer: are you ok with me setting all the below options (which the customer has set)?

features.barrier: disable
auth.allow: 10.110.14.63,10.110.14.64,10.110.14.65,10.100.77.18
performance.open-behind: on
performance.quick-read: on
performance.client-io-threads: on
server.event-threads: 6
client.event-threads: 4
cluster.lookup-optimize: on
performance.readdir-ahead: on
auto-delete: enable

2) I am going to try on a 2x2 volume.
Question to Developer: let me know if you want any change in the volume type. (A rough setup sketch covering 1 and 2 follows after this comment.)

3) I see that in comment #25, client-io-threads were to be disabled as a workaround.
Question to Developer: client-io-threads in a replicate volume is now disabled by default; do you want me to enable it for testing this fix (which I think must be enabled, as the customer too has it enabled)?

Note: these will be tested on nodes which are VMs, with fuse as the access protocol.
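A rough sketch of the setup described above, assuming four servers and the brick paths below (the hostnames, brick paths and volume name are placeholders; the options mirror the customer's list, except auth.allow and auto-delete, which depend on the actual test environment):

# Create and start a 2x2 distributed-replicate volume
gluster volume create testvol replica 2 \
    server1:/bricks/brick1/testvol server2:/bricks/brick1/testvol \
    server3:/bricks/brick1/testvol server4:/bricks/brick1/testvol
gluster volume start testvol

# Mirror the customer's option set
gluster volume set testvol features.barrier disable
gluster volume set testvol performance.open-behind on
gluster volume set testvol performance.quick-read on
gluster volume set testvol performance.client-io-threads on
gluster volume set testvol server.event-threads 6
gluster volume set testvol client.event-threads 4
gluster volume set testvol cluster.lookup-optimize on
gluster volume set testvol performance.readdir-ahead on

# Fuse mount on the client VMs
mount -t glusterfs server1:/testvol /mnt/testvol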
(In reply to nchilaka from comment #40)
> Hi Raghavendra,
>
> Can you help me with the below questions? I am trying to come up with
> testcase(s) to validate this fix based on the above conversations:
>
> TC#1: do rsync from multiple locations to a gluster volume
> TC#2: the customer scenario, which is as below:
> "When the customer launches 16 ffmpeg processes, where each one of them
> records to 2 mp3 files with 256K bit rate, then the issue appears where
> every hour from one to six processes get hung, waiting for a filesystem
> response, specifically from the glusterfs fuse driver."
> QE will have to see if this is feasible as part of the infra we have.
>
> Question to Developer: is there any other way of testing this fix, without
> all the above complications? Can you suggest any new/alternate cases?

It's a race condition, so it's quite difficult to hit. I don't have an easy reproducer. In fact, as one of the comments mentions, I tried to reproduce the issue by running rsync for a day (without the fix), but without success. The fix posted was arrived at through code review.

> Also, regarding volume settings, I have the below questions:
>
> 1) The customer volume has the below options; I am planning to set them all.
> Question to Developer: are you ok with me setting all the below options
> (which the customer has set)?

Yes. Please have the same options set as the customer.

> features.barrier: disable
> auth.allow: 10.110.14.63,10.110.14.64,10.110.14.65,10.100.77.18
> performance.open-behind: on
> performance.quick-read: on
> performance.client-io-threads: on
> server.event-threads: 6
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.readdir-ahead: on
> auto-delete: enable
>
> 2) I am going to try on a 2x2 volume.
> Question to Developer: let me know if you want any change in the volume type.

No changes required.

> 3) I see that in comment #25, client-io-threads were to be disabled as a
> workaround.
> Question to Developer: client-io-threads in a replicate volume is now
> disabled by default; do you want me to enable it for testing this fix
> (which I think must be enabled, as the customer too has it enabled)?

Please have it enabled.

> Note: these will be tested on nodes which are VMs, with fuse as the access
> protocol.
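For TC#2, something along these lines could approximate the customer's ffmpeg load. This is only a sketch, assuming ffmpeg with an mp3 encoder is available on the client and the fuse mount is at /mnt/testvol (both are assumptions, not taken from the case data):

# Launch 16 ffmpeg processes; each writes two 256 kbit/s mp3 files onto the
# fuse mount. The -re flag paces the silent test input at real time, so each
# process keeps writing for roughly an hour, similar to the customer's
# recorders.
for i in $(seq 1 16); do
    ffmpeg -re -f lavfi -i anullsrc=r=44100:cl=stereo \
        -t 3600 -b:a 256k /mnt/testvol/proc${i}_a.mp3 \
        -t 3600 -b:a 256k /mnt/testvol/proc${i}_b.mp3 &
done
wait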
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html