Description of problem: During the introduction of snapshot, in glusterfs-3.6, a change was done in the way the /var/lib/glusterfs/vols/<vol>/info file was read, and a new limitation of 4096 (PATH_MAX) character per line was introduce. - Patch introducing the limitation : http://review.gluster.org/#/c/7128/ - in particular : http://review.gluster.org/#/c/7128/21/libglusterfs/src/store.c , line 182 Version-Release number of selected component (if applicable): glusterfs-3.6 and above How reproducible: 100% Steps to Reproduce: 1. set a key/value longer than 4096 characters for a volume 2. restart glusterd Actual results: fail Expected results: no fail, or at the very least a better explanation Additional info: This limitation was introduced with snapshots : http://review.gluster.org/#/c/7128/ before, there was no limitation : ----8<--- int gf_store_read_and_tokenize (FILE *file, char *str, char **iter_key, char **iter_val, gf_store_op_errno_t *store_errno) { [...] ret = fscanf (file, "%s", str); if (ret <= 0 || feof (file)) { ret = -1; *store_errno = GD_STORE_EOF; goto out; } --->8--- after, the max length is PATH_MAX, which is 4096 : ---8<--- int gf_store_read_and_tokenize (FILE *file, char *str, char **iter_key, char **iter_val, gf_store_op_errno_t *store_errno) { [...] temp = fgets (str, PATH_MAX, file); if (temp == NULL || feof (file)) { ret = -1; *store_errno = GD_STORE_EOF; goto out; } [...] --->8--- Quick reproducer : [root@cbuissar-rhel7-rhs31-node1 rpms]# LIST=$(for IP in 122.{0..255} ; do echo -n "192.168.$IP," ; done ; for IP in 123.{0..6} ; do echo -n "192.168.$IP" ; [[ $IP =~ 123.6 ]] || echo -n "," ; done ;) [root@cbuissar-rhel7-rhs31-node1 rpms]# echo auth.allow=$LIST | wc -c 4095 [root@cbuissar-rhel7-rhs31-node1 rpms]# gluster volume set slim auth.allow $LIST volume set: success [root@cbuissar-rhel7-rhs31-node1 rpms]# systemctl restart glusterd.service [root@cbuissar-rhel7-rhs31-node1 rpms]# echo $? 0 [root@cbuissar-rhel7-rhs31-node1 rpms]# LIST="$LIST,192.168.123.7" [root@cbuissar-rhel7-rhs31-node1 rpms]# echo auth.allow=$LIST | wc -c 4109 [root@cbuissar-rhel7-rhs31-node1 rpms]# gluster volume set slim auth.allow $LIST volume set: success [root@cbuissar-rhel7-rhs31-node1 rpms]# systemctl restart glusterd.service Job for glusterd.service failed. See 'systemctl status glusterd.service' and 'journalctl -xn' for details. will attach the logs. => This will be considered as regression by customer upgrading from RHGS 2.1, which did not seem to have this limitation : glusterd refuses to start after upgrade to 3.X => If we need to keep a limitation, we should probably at least expand to a bigger value than 4096. => However, at the minimum, we should test the length, and if we reached the limit, either truncate/cancel, or fail all together, but with a clear error (you can't guess the issue unless you are in DEBUG level)
Created attachment 1080305 [details] logs
Just to clarify : the issue is that fgets will not place the stream pointer to the next line. And next token read will fail because we will read the end of the line, and there will not be any '=' to parse - I dont think that PATH_MAX is an actual limitation, since the str seems to be allocated with the size of the whole file anyways Thus, should we loop on fgets, appending to str, until we reach the end of line/file ? - should the PATH_MAX be used here? since it's originally supposed to be a limitation in path length (i.e. : not line length) - The functions all seem to handle errors passing (gf_store_op_errno_t), but there does not seem to be a use of it. Maybe using it would help to clarify an unexpected result ?
This looks like a definite regression. If we revert the code use fscanf things work fine. I do feel the culprit here is PATH_MAX since it limits fgets to read at max that many number of characters. If we end up having a key=value crossing PATH_MAX then this API will fail.
Hi Atin, Thanks for looking into it! My fear was that this "PATH_MAX" was introduced on purpose and might have been a real limitation. However, if not, then we could use the file size as limitation, which I think is what is used for the string's allocation.
(In reply to Cedric Buissart from comment #6) > Hi Atin, > Thanks for looking into it! > My fear was that this "PATH_MAX" was introduced on purpose and might have > been a real limitation. > > However, if not, then we could use the file size as limitation, which I > think is what is used for the string's allocation. That's right Cedric. This is exactly what the patch will look like. Gaurav will be sending a patch for this.
upstream patch available: http://review.gluster.org/#/c/12346/
Verified this bug with the version 3.1.2 (glusterfs-3.7.5-0.3). Glusterd restart worked after setting key/value > 4096. Steps: (Same as reproducing steps specified above ) 1. Created a Distributed volume 2. Set the auth.allow with 4098 character length 3. glusterd restart worked fine. Based one above info. moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html