Bug 1226665 - gf_store_save_value fails to check for errors, leading to emptying files in /var/lib/glusterd/
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 3.0
Hardware: Unspecified, OS: Unspecified
Priority: high, Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.1
Assigned To: Gaurav Kumar Garg
QA Contact: Byreddy
Keywords: Patch, ZStream
Depends On:
Blocks: 1226829 1251815 1253148
Reported: 2015-05-31 11:47 EDT by Cedric Buissart
Modified: 2016-06-05 19:38 EDT

See Also:
Fixed In Version: glusterfs-3.7.1-12
Doc Type: Bug Fix
Doc Text:
Previously, when the file system had no space left and a user performed an operation that changed files in /var/lib/glusterd/*, glusterd failed to write to a temporary file. With this fix, a proper error message is displayed when the file system holding /var/lib/glusterd/* is full.
Story Points: ---
Clone Of:
: 1226829 (view as bug list)
Environment:
Last Closed: 2015-10-05 03:09:38 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
check fflush's return value (488 bytes, patch)
2015-05-31 12:39 EDT, Cedric Buissart


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 1459813 | None | None | None | Never
Red Hat Product Errata | RHSA-2015:1845 | normal | SHIPPED_LIVE | Moderate: Red Hat Gluster Storage 3.1 update | 2015-10-05 07:06:22 EDT

Description Cedric Buissart 2015-05-31 11:47:34 EDT
Description of problem:

gf_store_save_value() might not catch write failures. This can lead to files being emptied in /var/lib/glusterd/.
Noticeably, /var/lib/glusterd/glusterd.info and /var/lib/glusterd/peers/* are prone to being emptied.

It can easily be reproduced when the file system is full.



Version-Release number of selected component (if applicable): all : 3.0 and upstream are impacted


How reproducible: easy peasey


Steps to Reproduce:
1. separate /var/lib/glusterd from /var/log to ensure that you don't miss logs
2. fill up the file system containing /var/lib/glusterd/ (doing it in a loop, to ensure there is no free space at any point of time)
3. restart glusterd on another node, to force an update

Actual results:

glusterd will fail to write the temporary file, but will not catch the error. Thus the empty temporary file takes the place of the previous file, and the content is lost.

-> If this happens to one of the peer files, glusterd will not be able to restart until the file is synced back from another node.

-> If this happens to glusterd.info, glusterd will regenerate a new UUID upon restart, and that will lead to the node's disappearance in RHEV-M.

Expected results:

gf_store_save_value() should catch the error and return it to the caller, so that the caller does not replace the file and a real error is written to the logs.
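
The expected behaviour can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the glusterd code: save_value_atomically, its parameters, and the key=value line format are hypothetical names chosen for the example. The idea is to check every step that can fail (the write, fflush, fsync, fclose) and to rename the temporary file over the real one only when all of them succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of glusterd): write "key=value" to tmp_path
 * and replace path with it only if every I/O step succeeded.
 * Returns 0 on success, -1 on failure with errno set; on failure the file
 * at path is left untouched. */
static int save_value_atomically(const char *path, const char *tmp_path,
                                 const char *key, const char *value)
{
    assert(path && tmp_path && key && value);

    FILE *fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;

    /* fprintf may only fill the stdio buffer, so a full disk is often
     * reported later, by fflush or fsync. Check each step in turn. */
    int ok = fprintf(fp, "%s=%s\n", key, value) >= 0;
    ok = ok && fflush(fp) == 0;        /* the check this report asks for */
    ok = ok && fsync(fileno(fp)) == 0; /* push the data to storage */
    int saved_errno = errno;

    if (fclose(fp) != 0 && ok) {
        ok = 0;
        saved_errno = errno;
    }

    if (!ok) {
        unlink(tmp_path);    /* drop the incomplete temporary file */
        errno = saved_errno; /* report the first failure to the caller */
        return -1;
    }

    /* Only a fully written temporary file takes the place of the old one. */
    return rename(tmp_path, path);
}
```

A caller would treat a -1 return as "keep the existing file" and log strerror(errno), instead of replacing the file with an empty one.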

Additional info:

=> Normal logs (glusterd set in <DEBUG> log level) :
----8<----
[2015-05-31 12:03:36.080921] D [store.c:372:gf_store_save_value] 0-: returning: 0
[2015-05-31 12:03:36.081000] D [store.c:372:gf_store_save_value] 0-: returning: 0
[2015-05-31 12:03:36.081090] D [store.c:372:gf_store_save_value] 0-: returning: 0
---->8----

=> No error, gf_store_save_value() didn't produce a single warning, although the data were not written.

This is because fflush's return value was not checked.
See the suggested patch:


in libglusterfs/src/store.c, gf_store_save_value (...)
----8<----
         ret = fflush (fp);
-        if (feof (fp)) {
+        if (feof (fp) || ret) {
                 gf_log ("", GF_LOG_WARNING,
                         "fflush failed, error: %s",
                         strerror (errno));
---->8----

After the patch, in a similar situation, an error is successfully returned :
=> With additional logs (the "CEDRIC" logs have been added to show fflush's behaviour)
----8<----
[2015-05-31 15:23:02.644197] D [store.c:361:gf_store_save_value] 0-: CEDRIC: fflush returned: -1
[2015-05-31 15:23:02.644235] W [store.c:365:gf_store_save_value] 0-: fflush failed, error: No space left on device
[2015-05-31 15:23:02.644272] D [store.c:375:gf_store_save_value] 0-: returning: -1
[2015-05-31 15:23:02.644291] C [glusterd-store.c:1914:glusterd_store_global_info] 0-management: Storing uuid failed ret = -1
[2015-05-31 15:23:02.644548] E [glusterd-store.c:1943:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2015-05-31 15:23:02.644593] E [glusterd-handshake.c:1121:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
---->8----
Comment 2 Cedric Buissart 2015-05-31 12:39:01 EDT
Created attachment 1032891 [details]
check fflush's return value

checking feof() does not seem to be sufficient for catching errors.
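
To see why feof() alone misses this failure, here is a small self-contained C sketch. It is an illustration, not glusterd code: probe_flush and struct flush_probe are hypothetical names, and the demonstration assumes a Linux system where writes to /dev/full fail with ENOSPC. fputs() only fills the stdio buffer, so the error first surfaces in fflush(), which returns EOF and sets the stream's error indicator; the end-of-file indicator that feof() reports stays clear.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Outcome of one buffered write attempt (illustrative names only). */
struct flush_probe {
    int opened;      /* 1 if fopen() succeeded */
    int flush_ret;   /* return value of fflush(): 0 or EOF */
    int at_eof;      /* 1 if feof() is set after the flush */
    int has_error;   /* 1 if ferror() is set after the flush */
    int saved_errno; /* errno captured right after a failed fflush() */
};

/* Write text to path through stdio and record what fflush(), feof() and
 * ferror() report afterwards. */
static struct flush_probe probe_flush(const char *path, const char *text)
{
    struct flush_probe r = {0, 0, 0, 0, 0};
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return r;
    r.opened = 1;

    /* For a short string this normally just lands in the stdio buffer. */
    fputs(text, fp);

    errno = 0;
    r.flush_ret = fflush(fp); /* the real write() happens here */
    if (r.flush_ret != 0)
        r.saved_errno = errno;

    r.at_eof = feof(fp) != 0;      /* write failures do not set EOF */
    r.has_error = ferror(fp) != 0; /* they set the error indicator */

    fclose(fp);
    return r;
}
```

Calling probe_flush("/dev/full", "UUID=example\n") on such a system is expected to give flush_ret == EOF together with at_eof == 0, which is exactly the combination the original feof()-only check let through.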
Comment 4 Atin Mukherjee 2015-08-08 09:25:58 EDT
Upstream patch http://review.gluster.org/11029 is merged now
Comment 6 Gaurav Kumar Garg 2015-08-12 05:33:47 EDT
downstream patch url: https://code.engineering.redhat.com/gerrit/54977
Comment 8 Byreddy 2015-08-27 08:53:19 EDT
Verified this Bug with version "glusterfs-3.7.1-12"

Steps used:
~~~~~~~~~~~
1. Filled up /var
2. glusterd restart failed.
3. Checked glusterd log for the error message. (no space).

Portion of log:
~~~~~~~~~~~~~~~
[2015-08-27 16:53:08.807647] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30703
[2015-08-27 16:53:08.834497] E [MSGID: 101012] [store.c:72:gf_store_mkstemp] 0-: Failed to open /var/lib/glusterd/glusterd.info.tmp. [No space left on device]
[2015-08-27 16:53:08.834560] E [MSGID: 106177] [glusterd-store.c:1898:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2015-08-27 16:53:08.834584] E [MSGID: 106089] [glusterd-handshake.c:1199:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
[2015-08-27 16:53:11.816557] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30703
[2015-08-27 16:53:11.841544] E [MSGID: 101012] [store.c:72:gf_store_mkstemp] 0-: Failed to open /var/lib/glusterd/glusterd.info.tmp. [No space left on device]
[2015-08-27 16:53:11.841604] E [MSGID: 106177] [glusterd-store.c:1898:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2015-08-27 16:53:11.841628] E [MSGID: 106089] [glusterd-handshake.c:1199:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
[2015-08-27 16:53:14.826202] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30703
[2015-08-27 16:53:14.857281] E [MSGID: 101012] [store.c:72:gf_store_mkstemp] 0-: Failed to open /var/lib/glusterd/glusterd.info.tmp. [No space left on device]
[2015-08-27 16:53:14.857346] E [MSGID: 106177] [glusterd-store.c:1898:glusterd_store_global_info] 0-management: Failed to store glusterd global-info
[2015-08-27 16:53:14.857370] E [MSGID: 106089] [glusterd-handshake.c:1199:__glusterd_mgmt_hndsk_versions_ack] 0-management: Failed to store op-version
[2015-08-27 16:53:17.835288] I [MSGID: 106163] [glusterd-handshake.c:1193:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30703
[2015-08-27 16:53:17.872672] E [MSGID: 101012] [store.c:72:gf_store_mkstemp] 0-: Failed to open /var/lib/glusterd/glusterd.info.tmp. [No space left on device]
[2015-08-27 16:53:17.872738] E [MSGID: 106177] [glusterd-store.c:1898:glusterd_store_global_info] 0-management: Failed to store glusterd global-info

Moving this bug to verified state based on above verification result.
Comment 9 Divya 2015-09-22 04:48:09 EDT
Gaurav,

Please review and sign off on the edited text.
Comment 11 errata-xmlrpc 2015-10-05 03:09:38 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html
