Bug 1518710
Summary: | Default volume options open-behind and write-behind cause problems with PostgreSQL | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Gerald Sternagl <gsternagl> |
Component: | write-behind | Assignee: | Raghavendra G <rgowdapp> |
Status: | CLOSED DUPLICATE | QA Contact: | Nag Pavan Chilakam <nchilaka> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | rhgs-3.3 | CC: | amukherj, andcosta, gsternagl, jbyers, ksandha, nchilaka, pasik, rcyriac, rgowdapp, rhinduja, rhs-bugs, sheggodu, smali, vnosov, ykaul |
Target Milestone: | --- | Keywords: | Rebase |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | GLUSTERFS_METADATA_INCONSISTENCY | ||
Fixed In Version: | | Doc Type: | Known Issue
Doc Text: |
Cause:
Turning the volume options open-behind and write-behind on is known to create issues when storing RDBMS data on Gluster volumes.
Consequence:
A description of these two volume options needs to be added to the Admin Guide.
Workaround (if any):
Turn the two options off by default.
Result:
No data corruption and no error messages when running an RDBMS.
|
Story Points: | --- |
Clone Of: | | Environment: |
Last Closed: | 2019-05-07 08:31:06 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1540116 | ||
Attachments: |
Description
Gerald Sternagl
2017-11-29 13:55:56 UTC
I want to understand whether it's data or metadata (stat) that is corrupted. What would your results look like if you:

1. turn off performance.stat-prefetch
2. turn on performance.write-behind and performance.open-behind

There is an upstream patch which fixes stale stats due to write-behind [1]. Can you check whether this patch helps?

Also, while running the tests, is it possible to collect the request/response traffic between the kernel and glusterfs? This can be done by mounting glusterfs with the option:

    --dump-fuse=PATH    Dump fuse traffic to PATH

Meanwhile we'll try to reproduce the bug locally.

[1] http://review.gluster.org/15757

(In reply to Raghavendra G from comment #4)
> 1. turn off performance.stat-prefetch
> 2. turn on performance.write-behind and performance.open-behind

Note that all three of the above settings should be effective during a single test run.

(In reply to Raghavendra G from comment #4)
> This can be done by mounting glusterfs with the option,
> --dump-fuse=PATH    Dump fuse traffic to PATH

Please attach the binary fuse-dump to the bug once the tests are over.

(In reply to Raghavendra G from comment #4)
> 1. turn off performance.stat-prefetch
> 2. turn on performance.write-behind and performance.open-behind

stat-prefetch: off
open-behind + write-behind: on

=> Same errors and warnings as before.

FYI: Before I submitted this BZ I tested all performance.* options, turning them on/off individually. Only open-behind & write-behind caused these issues.
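For anyone reproducing this, a minimal sketch of a client mount that captures the FUSE traffic as requested above; the node1/vol01 names are taken from the mount command used later in this bug, and the dump path is an arbitrary example:

    # Mount the volume while dumping FUSE request/response traffic to a file
    # (/tmp/fuse-dump.bin is just an example path, not one used in this bug)
    glusterfs --volfile-server=node1 --volfile-id=vol01 \
              --dump-fuse=/tmp/fuse-dump.bin /var/lib/pgsql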
Created attachment 1362642 [details]
Fuse-dump with open-behind, write-behind=on, stat-prefetch=off
Created attachment 1362643 [details]
Fuse-dump with open-behind, write-behind=off, stat-prefetch=off
Created attachment 1362655 [details]
Fuse-log with open-behind, write-behind=on, stat-prefetch=off
I also mounted the Gluster volume manually with the glusterfs command to get the FUSE debug output:
glusterfs --volfile-server=node1 --volfile-id=vol01 --log-file=/tmp/fuse-log2.txt --log-level=DEBUG /var/lib/pgsql
volume options set:
performance.open-behind: on
performance.write-behind: on
performance.stat-prefetch: off
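For reference, a minimal sketch of how these options can be toggled from one of the Gluster nodes with the standard gluster CLI, assuming the volume is named vol01 as in the mount command above:

    # Enable/disable the client-side performance translators on the volume
    gluster volume set vol01 performance.open-behind on
    gluster volume set vol01 performance.write-behind on
    gluster volume set vol01 performance.stat-prefetch off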
Created attachment 1362656 [details]
Fuse-log with open-behind, write-behind=off, stat-prefetch=off
I also mounted the Gluster volume manually with the glusterfs command to get the FUSE debug output:
glusterfs --volfile-server=node1 --volfile-id=vol01 --log-file=/tmp/fuse-log2.txt --log-level=DEBUG /var/lib/pgsql
volume options set:
performance.open-behind: off
performance.write-behind: off
performance.stat-prefetch: off
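To confirm which values are actually in effect before each run, the reconfigured options can be checked on any node; a sketch assuming the same vol01 volume:

    # Show the volume's reconfigured options
    gluster volume info vol01
    # Or query the effective value of each option directly
    gluster volume get vol01 performance.open-behind
    gluster volume get vol01 performance.write-behind
    gluster volume get vol01 performance.stat-prefetch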
When I ran my last tests with the fuse log enabled and then reviewed it, some entries in the log led me to a suspicion, so I ran some more tests. My suspicion was that the issues might have to do with the clocks being out of sync between the Gluster nodes. I'm running these tests on VMs, and occasionally I just save the VMs instead of doing a full reboot. My fault, but not too uncommon. This results in the times differing across the VMs, as there is no NTP/chrony service running on these nodes by default (the RHGS setup doesn't configure one).

So I set up chrony to synchronize all VMs and ran the tests again with the following volume option settings:

open-behind: on
write-behind: on
stat-prefetch: on

I ran the same shortened test procedure:

# pgbench -i -s 1
# pgbench -c 4 -j 2 -T 10

Result: No more WARNINGs. No more ERRORs. The data seems to be OK.

My colleague who first experienced these issues while doing a customer PoC with RHGS on AWS also used the default RHGS deployment and volume settings, which effectively led to the same result: data corruption.

My advice:

1. Make time synchronization between Gluster nodes mandatory. The only options during installation should be to synchronize against an internal or an external source. Having no time sync should be unsupported! There are some mentions of time synchronization in the RHGS Admin Guide, but nowhere in the context of replicas running out of sync.

2. Document a strong warning in the volume options section about using open-behind, write-behind, and stat-prefetch, and mention the necessity of setting up time synchronization.

3. IMO it would be worth investigating and documenting how much time deviation is actually still acceptable.

I re-ran the tests with the open-behind/write-behind options on, and the warnings and errors did occur again. So time sync seems to have some influence, but there is still more to it.
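As a rough illustration of the time-synchronization setup described above, a minimal sketch for RHEL/RHGS 7 nodes; the package/service names and reliance on the default NTP sources are assumptions, not something prescribed in this bug:

    # On every Gluster node (and the client), install and start chrony
    yum install -y chrony
    systemctl enable chronyd
    systemctl start chronyd

    # Verify that the clock is synchronized and check the offset
    chronyc tracking
    chronyc sources -v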