Bug 1355801
Summary: Brick process on container node not coming up after node reboot

Product: [Red Hat Storage] Red Hat Gluster Storage
Component: rhgs-server-container
Version: rhgs-3.1
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Reporter: Anoop <annair>
Assignee: Humble Chirammal <hchiramm>
QA Contact: Prasanth <pprakash>
CC: amukherj, annair, hchiramm, lpabon, mliyazud, pkarampu, pprakash, rcyriac, rhs-bugs, sankarshan, sashinde
Keywords: ZStream
Target Milestone: ---
Target Release: RHGS Container Converged 1.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: rhgs-server-docker-3.1.3-12
Doc Type: If docs needed, set a value
Type: Bug
Bug Blocks: 1332128
Last Closed: 2016-10-14 13:43:58 UTC
Description (Anoop, 2016-07-12 14:47:49 UTC)
Based on the discussion I had with Anoop, it seems glusterd sent a trigger to start the brick, but the brick process didn't come up because it failed to get trusted.glusterfs.volume-id from the brick path, which is weird. This looks similar to BZ 1340049. Anoop, I believe you haven't tampered with the brick from the backend (removing the xattrs accidentally?). If you still have the setup, could you check the xattrs of the brick which failed to come up, from both the host and the container? It seems that while bind mounting the brick path from the host into the container the xattrs are not inherited for some reason; if this hypothesis is true, you should be able to see the difference between the xattr lists on the host and in the container. Once that's confirmed, I think this BZ moves to the heketi layer, to see how exactly the bind mount takes place during a reboot. Honestly, this doesn't look like an issue at the Gluster layer.

Thanks,
Atin

<5ef492224163/brick_bf85c53f40d125c097486f6193fabf70/brick
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_2af243604dd82a9918105ef492224163/brick_bf85c53f40d125c097486f6193fabf70/brick
security.selinux=0x73797374656d5f753a6f626a6563745f723a646f636b65725f7661725f6c69625f743a733000

<ts/vg_2af243604dd82a9918105ef492224163/brick_bf85c53f40d125c097486f6193fabf70>
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_2af243604dd82a9918105ef492224163/brick_bf85c53f40d125c097486f6193fabf70/brick
security.selinux=0x73797374656d5f753a6f626a6563745f723a646f636b65725f7661725f6c69625f743a733000

<49d5c6069905/brick_ee06d72f0d312ec1f8aa5d7926c90821/brick
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_c12c12bb62a005c1442949d5c6069905/brick_ee06d72f0d312ec1f8aa5d7926c90821/brick
security.selinux=0x73797374656d5f753a6f626a6563745f723a646f636b65725f7661725f6c69625f743a733000

(In reply to Anoop from comment #4)
> ......

It's clear from this output that none of these bricks have any gluster xattrs set. Are these bricks even mounted? Could it be that we are hitting a race between bind mounting the bricks and bringing up the glusterd service? This doesn't look like a bug at the Gluster layer.

(In reply to Atin Mukherjee from comment #5)
> ......

Bricks are not bind mounted into the container, because the container is running in privileged mode; they are mounted as normal. Humble, please take a look at the container and brick information.

(In reply to Anoop from comment #4)
> ......

Anoop, I have a couple of questions here:

*) IIUC, the above output was taken from the container? Can you get the same output from the host?

*) Have we tested this scenario (node reboot) with the previous version of the RHGS container release (3.1.2)? If yes, did it pass?

Humble,

1.
I do not see the brick path on the host (which is another issue).

2. This scenario was tested with the 3.1.2 container and it worked. However, with the older container we bind mounted the bricks from the host; this is the first time this test is being carried out with LVM running inside the container.

(In reply to Anoop from comment #9)
> 1. I do not see the brick path on the host (which is another issue).

I don't think so. You have to mount it on the host and check the output.

> 2. This scenario was tested with the 3.1.2 container and it worked.

IIUC, node reboot was tested with the previous container release (RHGS 3.1.2) and it worked. Correct?

> However, with the older container we bind mounted the bricks from the host.
> This is the first time this test is being carried out with LVM running
> inside the container.

Yes, I have no doubt about that :). Some more questions here:

*) Have we tested this behaviour with any of the previous container builds of APLO, or is this the first time this test has been run? What I am trying to isolate here is whether any change in a package update caused this issue. It would be helpful if you can provide that info.

*) Is this issue consistently reproducible on all the QE setups?

*) Also, after the node reboot, did this issue hit all the bricks on the host, or only some of them? The snip in c#1 is clobbered (with --more strings in it) and does not give the full output.

* Reboots were not tested with previous builds, and I think it will be difficult (considering that we have a GA on 26th July) to go back and test this on an older build.

* This is reproducible every time on my setup.

* I see it only on the bricks of the node that was rebooted.

> * See it only on the bricks of the node that was rebooted.

Yes, but were all the bricks on this node affected? Also, can you provide the output requested in comment #8?

If it's consistently reproducible on your setup, can you run the command below on your system before the node reboot?
# sync && echo 3 > /proc/sys/vm/drop_caches

Once the above command succeeds, can you run the test again? Also, how are you testing the node reboot scenario? What specific command/action is performed, and are these tests performed on VMs?

[Self note] It looks to me like the extended attributes in the 'trusted' namespace were not set/synced before the node reboot, which causes this issue, or some kind of race in this area causes this behaviour. At present, I can think of the isolation steps/RCA below.

1) The mount option 'user_xattr' (Enable Extended User Attributes) is available for ext* filesystems. The equivalent XFS option looks to be on by default: attr2 (default), which enables version two of the extended attribute inline allocation policy. Isolation: does ext4 hit this issue? Any difference if we mount with the above options where they are not *on*?

2) Trusted extended attributes are visible and accessible only to processes that have the CAP_SYS_ADMIN capability. Attributes in this class are used to implement mechanisms in user space (i.e., outside the kernel) which keep information in extended attributes to which ordinary processes should not have access. NOTE: as we are running in privileged mode, the container has CAP_SYS_ADMIN.

3) Did any update to 'attr' or 'libattr' cause this issue? NOTE: have to reach out to the package maintainers for further input. Isolation: try the previous versions of the builds.

4) Maybe not directly related, but check the possibilities with the Gluster/AFR folks:
https://bugzilla.redhat.com/show_bug.cgi?id=762680
https://bugzilla.redhat.com/show_bug.cgi?id=811244
https://www.gluster.org/pipermail/gluster-users/2013-November/015054.html

5) Reproduce this issue without Heketi; that is, instead of heketi setting up the disk/device layout and creating the volumes, do it by manual steps from the container. That can be an isolation step.
6) Verify the extended attributes' visibility from the host and the container before reboot. The comment #8 output can help here.

7) Check the possibility that a lack of syncing of FS entries before the node reboot causes this issue; the sync and drop-caches step in comment #12 is suggested for the same reason.

I guess you have all the data you need now.

Since the builds are not available yet, I ran the tests using upstream binaries:

1. A GlusterFS container which has:
glusterfs-3.8.1-1.el7.x86_64
glusterfs-api-3.8.1-1.el7.x86_64
glusterfs-cli-3.8.1-1.el7.x86_64
glusterfs-geo-replication-3.8.1-1.el7.x86_64
glusterfs-libs-3.8.1-1.el7.x86_64
glusterfs-client-xlators-3.8.1-1.el7.x86_64
glusterfs-fuse-3.8.1-1.el7.x86_64
glusterfs-server-3.8.1-1.el7.x86_64

2. A Heketi container based on the same version as downstream: 2.0.5

3. CentOS Atomic 7.2.1141 and OpenShift Origin 1.2

All of this can easily be created using the Heketi/OpenShift Vagrant demo:
https://github.com/heketi/vagrant-heketi/tree/master/openshift

The result is that I was able to reboot the systems 50 times; every time the systems came back, the bricks were available, and Heketi came up. This is just one data point, but it does show that it is possible to keep rebooting without issue.

I hit this issue almost every time on my setup. In fact, Humble is using my setup and was himself able to recreate this issue, even after ensuring that the xattrs are visible on the host system. Humble, you may want to update the bug with your observations.

Here is the progress made so far. I was debugging the issue on Anoop's setup and here is the observation.

*) On one of the OSE nodes, where we have 50 brick processes (25 volumes) running, everything came back after the node reboot without issues; the problem was *not* reproducible there.

*) However, on the second node in the same OSE cluster, also with 50 brick processes, some of the brick processes were DOWN after the node reboot.
Further analysis on the problematic node found that the 'brick' directories in /var/lib/heketi/mounts/vg_*/brick_* had different permissions than before the node reboot (the same issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1356050), and the xattrs were missing (I had made sure all the xattrs were present on the bricks before rebooting the node), which caused the brick processes to go down. It looks like the brick directories were newly created/accessed. The volumes were then 'force' started and the node rebooted again; this time the issue didn't come up. The issue is not always reproducible, but I can confirm there is an issue which is triggered at times.

*) I made a couple of attempts to recreate the issue without heketi, and unfortunately it was not reproducible. In this recreation, '/dev/' was exported to the container, and the LVM creation was performed from the container and used for the gluster volume. A couple of volumes were created, and after node reboot all the brick processes came up. With this result I am not saying this is an issue with Heketi, but somehow it was not triggered in the manual setup.

*) We have opened a bug against the 'attr' package in RHEL (https://bugzilla.redhat.com/show_bug.cgi?id=1356810), as we were seeing missing attributes in some scenarios.

*) I didn't get a chance to recreate the issue with any of the previous builds of APLO; I will be trying that out soon. Further isolation is ongoing and I will update the bugzilla with the status soon.

@Anoop, I am pretty sure both bugs (bz#1355801 and bz#1356050) are the same in this context. Do we need to track them separately, or is it fine to merge?

Let's track them as separate bugs for now; once we have the RCA we can merge both.

[Status update] We went through different layers to isolate this issue; components such as attr, device mapper, the kernel, XFS and the gluster layers were examined to find the root cause.
After tracing these components, it looks like a race between device mapper and the FS caused the issue. We have a container image built with a possible fix; the image is available in the internal docker registry as docker-registry.usersys.redhat.com/gluster/rhgs-test:2. Can you please verify this in one of your environments? After the verification, we will cut a brew build.

(In reply to Humble Chirammal from comment #17)
> *) However on the second node, in the same OSE cluster where we have 50
> brick processes, after node reboot some of the brick processes were DOWN.
> ......

One question remains here though: who is creating the 'brick' directory in the problematic occurrence? I was told 'gluster' is *not* doing it. However, looking at the code path below, it turns out the 'index' xlator does it.

---snip--

int
index_dir_create (xlator_t *this, const char *subdir)
{
        .....
        priv = this->private;
        make_index_dir_path (priv->index_basepath, subdir,
                             fullpath, sizeof (fullpath));
        ret = sys_stat (fullpath, &st);
        if (!ret) {
                if (!S_ISDIR (st.st_mode))
                        ret = -2;
                goto out;
        }

        pathlen = strlen (fullpath);
        if ((pathlen > 1) && fullpath[pathlen - 1] == '/')
                fullpath[pathlen - 1] = '\0';
        dir = strchr (fullpath, '/');
        while (dir) {
                dir = strchr (dir + 1, '/');
                if (dir)
                        len = pathlen - strlen (dir);
                else
                        len = pathlen;
                strncpy (path, fullpath, len);
                path[len] = '\0';
                ret = sys_mkdir (path, 0600);    --> [1]
                if (ret && (errno != EEXIST))
                        goto out;
        }
        ret = 0;

--/snip--

Later, 'glusterd' fails to find the xattrs on these newly created directories and the brick processes go down.

@Atin, can you confirm?

(In reply to Humble Chirammal from comment #20)
> (In reply to Humble Chirammal from comment #17)
> > ......
> One question remained here though: Who is creating 'brick' directory in the
> problematic occurrence and I was told 'gluster' is *not* doing it. However,
> looking at below code path, it has found that 'index' xlator does it.
> ......

@Humble,
index_dir_create() does a mkdir -p of <brick-path>/.glusterfs/indices/xattrop if the path doesn't exist. How did we get into a state where <brick-path> itself is missing? Do we know?

Pranith

@Humble - Honestly, I was not aware that the index xlator does a mkdir -p. But as Pranith pointed out, the complete brick path gets created only when the brick path is not present. So my apologies to you if this has wasted some of your effort in digging into the problem.

(In reply to Pranith Kumar K from comment #21)
> ......
> @Humble,
> index_dir_create() does mkdir -p <brick-path>/.glusterfs/indices/xattrop
> if the path doesn't exist. How did we get into a state where <brick-path>
> itself is missing? Do we know?

@Pranith, thanks for confirming. I have a question here: do we gain any advantage by creating this path if it's not present? Later the 'posix' xlator comes in, complains there are no xattrs present on this newly created path, and fails the brick process anyway.
Any thoughts, or am I missing something here?

(In reply to Atin Mukherjee from comment #22)
> ......

@Atin, as there are too many layers involved, it was very difficult to isolate this issue, and this behaviour really confused the isolation process. Anyway, no worries, we all learned the hidden facts :).

(In reply to Humble Chirammal from comment #23)
> ......

The index xlator is designed to keep the indices on any hard disk, not necessarily in .glusterfs; that is just the default place. So to make sure it works, we need the path to be created.

@Atin, @Humble, the design when all these decisions were taken is that by the time you create a volume/add-bricks/replace-bricks, you already have the brick path with the relevant extended attributes in place, i.e. glusterd makes sure to create the paths and set the extended attributes. Did this decision change in the recent past? Why is the index xlator even getting into a state where it has to create a brick path?
Why doesn't it already exist by the time we do volume start?

(In reply to Pranith Kumar K from comment #25)
> ......

Refer to Comment 19: because of the race between device mapper and the FS layer on node reboot, the bricks are not mounted, as I understand it.
(In reply to Atin Mukherjee from comment #26)
> ......
> Refer to Comment 19, because of the race in dev mapper & FS layer in case of
> node reboot, the bricks are not mounted as I understand.

Thanks for this, Atin. I guess there is nothing more to be done by gluster at this point?
(In reply to Pranith Kumar K from comment #27)
> ......
> Thanks for this Atin, I guess there is nothing more to be done by gluster at
> this point?

That's right, Pranith.

> (In reply to Atin Mukherjee from comment #26)
> > (In reply to Pranith Kumar K from comment #25)
> > > Refer to Comment 19, because of the race in dev mapper & FS layer in case of
> > > node reboot, the bricks are not mounted as I understand.
> >
> > Thanks for this Atin, I guess there is nothing more to be done by gluster at
> > this point?
>
> That's right Pranith.
As discussed, we need an enhancement in the gluster index xlator to check whether the indices live on the brick path or on some other path; if it's the brick path, it's better *not* to create the directories. I will open a new bug for this.
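The enhancement proposed above can be sketched roughly as follows. This is a hypothetical illustration, not the actual xlator change: `safe_to_create_index_dir` and the injected `get_xattrs` callback are invented names, and the rule assumed is the one discussed in this bug, namely that a mounted, healthy brick root always carries trusted.glusterfs.volume-id.

```python
# Hypothetical guard: refuse to create <brick>/.glusterfs/indices/... when
# the brick root does not look like a mounted gluster brick. get_xattrs is
# injected so the decision logic is testable without a real brick; on Linux
# it would be os.listxattr(brick_root).

def safe_to_create_index_dir(brick_root, get_xattrs):
    try:
        xattrs = set(get_xattrs(brick_root))
    except FileNotFoundError:
        # Brick root itself is missing: certainly not mounted, do not mkdir.
        return False
    # No volume-id means we are looking at the bare mountpoint directory,
    # not the mounted brick; creating indices here shadows the real brick,
    # which is exactly the failure mode seen in this bug.
    return "trusted.glusterfs.volume-id" in xattrs

mounted = lambda path: ["security.selinux", "trusted.gfid",
                        "trusted.glusterfs.volume-id"]
bare_mountpoint = lambda path: ["security.selinux"]

print(safe_to_create_index_dir("/bricks/b1", mounted))          # True
print(safe_to_create_index_dir("/bricks/b1", bare_mountpoint))  # False
```

With a check like this, the xlator would fail fast with a clear error instead of silently recreating the brick path with 0600 permissions and no xattrs.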
At present, heketi writes the "/dev/vg/lv" path into the custom fstab, and it looks like this is causing the weird behaviour on node reboot. We should use the "/dev/mapper/vg-lv" path in fstab, to at least make sure device mapper is in charge. More details are available here: http://post-office.corp.redhat.com/archives/rhs-containers/2016-July/msg00111.html

Created attachment 1182234 [details]
Node reboot output
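The fstab change described above can be sketched like this. `to_dev_mapper` and `rewrite_fstab_line` are hypothetical helper names, not heketi code; the assumed convention is the device-mapper one, where the node name is `<vg>-<lv>` with any '-' inside a VG or LV name escaped as '--'.

```python
# Sketch: rewrite /dev/<vg>/<lv> fstab device paths to the canonical
# /dev/mapper/<vg>-<lv> form so device mapper owns the mount at boot.
# A '-' inside a VG or LV name is escaped as '--' per dm naming rules.

def to_dev_mapper(device):
    parts = device.split("/")
    if len(parts) != 4 or parts[:2] != ["", "dev"] or parts[2] in ("mapper", "disk"):
        return device  # not a /dev/<vg>/<lv> path, leave untouched
    vg, lv = parts[2], parts[3]
    return "/dev/mapper/{}-{}".format(vg.replace("-", "--"),
                                      lv.replace("-", "--"))

def rewrite_fstab_line(line):
    if not line.split() or line.lstrip().startswith("#"):
        return line  # blank line or comment
    fields = line.split()
    fields[0] = to_dev_mapper(fields[0])
    return " ".join(fields)

line = ("/dev/vg_2af243604dd82a9918105ef492224163/brick_bf85c53f40d125c097486f6193fabf70 "
        "/var/lib/heketi/mounts/vg_2af243604dd82a9918105ef492224163/brick_bf85c53f40d125c097486f6193fabf70 "
        "xfs rw,inode64,noatime 1 2")
print(rewrite_fstab_line(line).split()[0])
# /dev/mapper/vg_2af243604dd82a9918105ef492224163-brick_bf85c53f40d125c097486f6193fabf70
```

Paths already under /dev/mapper are passed through unchanged, so the rewrite is idempotent.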
(In reply to Humble Chirammal from comment #32)
> Created attachment 1182234 [details]
> Node reboot output

In a couple of environments, we changed the mount entries to "/dev/mapper" and, with the fix image, tried to reproduce this issue. The result from one of the setups can be found in c#32: 78 bricks were created inside the container, and after reboot all the bricks are up and running. If possible, I would like to try this fix on one of the QE setups as well; in both of our dev environments the result looks positive. It would also be appreciated if QE could provide a problematic setup to test this out. Anoop, do you have any setup where I can check the result quickly?

(In reply to Anoop from comment #11)
> ......

Anoop, in our last release (RHGS 3.1.2) we created a volume snapshot from the container, which internally created an LV in the system, and that was tested. Have we tested node reboot with this scenario?

@Neha, did you get a chance to verify whether the brick is mounted, the xattrs are present, and there is no data loss?

(In reply to Humble Chirammal from comment #38)
> ......

The brick is mounted:

# df -Th | grep brick_61dee9e630ef0646ae4280684ac1e608
/dev/mapper/vg_7c8a1389ab3b08f27e1d6fd5c9e3027a-brick_61dee9e630ef0646ae4280684ac1e608 xfs 5.0G 592K 5.0G 1% /var/lib/heketi/mounts/vg_7c8a1389ab3b08f27e1d6fd5c9e3027a/brick_61dee9e630ef0646ae4280684ac1e608

The xattrs:

# getfattr -d -m .
-e hex /var/lib/heketi/mounts/vg_7c8a1389ab3b08f27e1d>
getfattr: Removing leading '/' from absolute path names
# file: var/lib/heketi/mounts/vg_7c8a1389ab3b08f27e1d6fd5c9e3027a/brick_61dee9e630ef0646ae4280684ac1e608/brick/
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
trusted.glusterfs.volume-id=0x3d3bd68c9e974c96bc4cf788b3490206

There was no data in this volume.

@Neha, thanks for this information. I feel that even if there had been some data in the brick, it would have persisted, because the xattrs are perfectly present. I went through this brick's log to check why it failed.

[Analysis]
-----------------snip----------
[2016-07-21 12:55:42.798650] W [MSGID: 101105] [gfdb_sqlite3.h:239:gfdb_set_sql_params] 0-vol_708e587144fcf31fe317d350e5fa1cc2-changetimerecorder: Failed to retrieve sql-db-autovacuum from params.Assigning default value: none
[2016-07-21 12:55:42.799015] I [trash.c:2369:init] 0-vol_708e587144fcf31fe317d350e5fa1cc2-trash: no option specified for 'eliminate', using NULL
[2016-07-21 12:55:42.799158] C [MSGID: 113081] [posix.c:6755:init] 0-vol_708e587144fcf31fe317d350e5fa1cc2-posix: Extended attribute not supported, exiting.  ------------------------------------> [1]
[2016-07-21 12:55:42.799197] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-vol_708e587144fcf31fe317d350e5fa1cc2-posix: Initialization of volume 'vol_708e587144fcf31fe317d350e5fa1cc2-posix' failed, review your volfile again
[2016-07-21 12:55:42.799206] E [graph.c:322:glusterfs_graph_init] 0-vol_708e587144fcf31fe317d350e5fa1cc2-posix: initializing translator failed
[2016-07-21 12:55:42.799212] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
------------/snip-------

So the 'posix' xlator somehow failed to see extended attribute support on this brick.
Even though there are different possibilities [3], we were in the code path below: the posix xlator tries to set the extended attribute 'trusted.glusterfs.test' on this path, and fails.

        op_ret = sys_lsetxattr (dir_data->data,
                                "trusted.glusterfs.test", "working", 8, 0);
        if (op_ret != -1) {
                sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");
        } else {
                tmp_data = dict_get (this->options, "mandate-attribute");
                if (tmp_data) {
                        if (gf_string2boolean (tmp_data->data, &tmp_bool) == -1) {
                                gf_msg (this->name, GF_LOG_ERROR, 0,
                                        P_MSG_INVALID_OPTION,
                                        "wrong option provided for key "
                                        "\"mandate-attribute\"");
                                ret = -1;
                                goto out;
                        }
                        if (!tmp_bool) {
                                gf_msg (this->name, GF_LOG_WARNING, 0,
                                        P_MSG_XATTR_NOTSUP,
                                        "Extended attribute not supported, "
                                        "starting as per option");
                        } else {
                                gf_msg (this->name, GF_LOG_CRITICAL, 0,
                                        P_MSG_XATTR_NOTSUP,
                                        "Extended attribute not supported, "
                                        "exiting.");    =============> [2]
                                ret = -1;
                                goto out;
                        }
                } else {
                        gf_msg (this->name, GF_LOG_CRITICAL, 0,
                                P_MSG_XATTR_NOTSUP,
                                "Extended attribute not supported, exiting.");
                        ret = -1;
                        goto out;
                }
        }

[2] Unfortunately, there is no proper error reporting in this code path: the return value of sys_lsetxattr() is not logged, so we cannot tell from the log which root cause applies. It could be a race in any layer.

[3] Possible errors from setxattr:

*) EDQUOT  Disk quota limits meant that there is insufficient space remaining to store the extended attribute.
*) EEXIST  XATTR_CREATE was specified, and the attribute exists already.
*) ENOATTR XATTR_REPLACE was specified, and the attribute does not exist. (ENOATTR is defined to be a synonym for ENODATA in <attr/xattr.h>.)
*) ENOSPC  There is insufficient space remaining to store the extended attribute.
*) ENOTSUP The namespace prefix of name is not valid.
*) ENOTSUP Extended attributes are not supported by the filesystem, or are disabled.

IIUC, in this setup the EDQUOT, ENOSPC, ENOTSUP and EEXIST/XATTR_CREATE possibilities do not apply.
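The posix xlator's probe above can be rerun by hand to recover the missing detail, namely which errno the setxattr call returned. A minimal sketch, assuming Linux and Python's os.setxattr wrapper; `diagnose_xattr_probe` is a hypothetical helper, not part of gluster, and the errno-to-hint mapping reflects the possibilities listed in [3].

```python
# Sketch: rerun the same probe the posix xlator does (set, then remove, a
# throwaway trusted.* xattr) and report the errno, which the gluster log
# message omits. Writing a trusted.* xattr needs CAP_SYS_ADMIN, just like
# the brick process has in a privileged container.
import errno
import os

def diagnose_xattr_probe(path, name=b"trusted.glusterfs.test"):
    try:
        os.setxattr(path, name, b"working")
    except OSError as e:
        hints = {
            errno.EROFS: "filesystem mounted read-only",
            errno.ENOTSUP: "xattrs unsupported or disabled on this filesystem",
            errno.ENOSPC: "no space left for the xattr",
            errno.EDQUOT: "quota exhausted",
            errno.EPERM: "no CAP_SYS_ADMIN for the trusted namespace",
        }
        return "setxattr failed: errno=%d (%s): %s" % (
            e.errno, errno.errorcode.get(e.errno, "?"),
            hints.get(e.errno, "see setxattr(2)"))
    os.removexattr(path, name)
    return "xattr probe succeeded"

# On a missing path the probe reports ENOENT; on the failing brick in this
# bug it would be expected to report EROFS, matching the read-only mount.
print(diagnose_xattr_probe("/no/such/brick"))
```

Run against the failing brick root before any remount, this would have distinguished a read-only mount from a genuinely xattr-less filesystem.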
[Additional details] The brick path was mounted before gluster was started, and we can see the xattrs were present before the gluster start.

[Workaround] A 'volume start force' of this volume, or a reboot, should definitely bring this back.

Maybe someone from the gluster posix team can check this scenario. Pranith, can you check this?

Just to clarify: there was no data present on the volume before the reboot either. I tried a force start; the brick process is still down.

Some more analysis was performed, and here is the summary: even though the data is intact on this brick, for some reason at boot time this brick's filesystem was mounted READONLY, while the other 49 bricks were mounted READWRITE.

--snip--
/dev/mapper/vg_7c8a1389ab3b08f27e1d6fd5c9e3027a-brick_61dee9e630ef0646ae4280684ac1e608 on /var/lib/heketi/mounts/vg_7c8a1389ab3b08f27e1d6fd5c9e3027a/brick_61dee9e630ef0646ae4280684ac1e608 type xfs (ro,noatime,seclabel,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota)
--/snip--

I feel even a remount with the "rw" flag can bring this back:

# mount -o remount,rw /dev/mapper/vg_7c8a1389ab3b08f27e1d6fd5c9e3027a-brick_61dee9e630ef0646ae4280684ac1e608

@Neha, can you perform a remount as shown above and report the result?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days