Bug 1191437 - build: issue with update of upstream build from 3.7dev-0.529 to 3.7dev-0.577
Summary: build: issue with update of upstream build from 3.7dev-0.529 to 3.7dev-0.577
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-11 10:37 UTC by Saurabh
Modified: 2016-01-19 06:14 UTC
CC List: 8 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-02-13 10:12:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Saurabh 2015-02-11 10:37:04 UTC
Description of problem:

Issue with the upstream build update from 3.7dev-0.529 to 3.7dev-0.577. For me, the update hangs at the glusterfs-server RPM on all nodes.

  Updating   : glusterfs-libs-3.7dev-0.577.gitf18a3f3.el6.x86_64      1/16
  Updating   : glusterfs-api-3.7dev-0.577.gitf18a3f3.el6.x86_64       2/16
  Updating   : glusterfs-3.7dev-0.577.gitf18a3f3.el6.x86_64           3/16
  Updating   : glusterfs-fuse-3.7dev-0.577.gitf18a3f3.el6.x86_64      4/16
  Updating   : glusterfs-cli-3.7dev-0.577.gitf18a3f3.el6.x86_64       5/16
  Updating   : glusterfs-server-3.7dev-0.577.gitf18a3f3.el6.x86_64    6/16


Version-Release number of selected component (if applicable):
glusterfs-3.7dev-0.577.gitf18a3f3.el6.x86_64

How reproducible:
always

Expected results:
The build update should complete without problems when the volume and glusterd are already stopped.

Additional info:

Comment 1 Niels de Vos 2015-02-11 11:52:09 UTC
The problem is that the glusterd update command does not exit:

    glusterd --xlator-option *.upgrade=on -N

From: https://github.com/gluster/glusterfs/blob/master/glusterfs.spec.in#L777

Killing glusterd while the update is stalled seems to 'fix' it. We need to figure out if this command never exits (anymore), or if it is a problem that only happens during 'yum update' (a race of some sort?).

Comment 2 Kaushal 2015-02-13 10:12:06 UTC
This is an issue caused by commit c8a6904 'uss: disable memory accounting for the snapshot daemon'.

It was first observed during downstream testing a little while back. Raghavendra Bhat found the details of why this happens; I'm paraphrasing his findings below.

There are 3 causes leading to this behaviour.
1. The patch adds a command-line option to disable memory accounting in a glusterfs process. This required adding a new member to the cmd_args_t object, which stores the command-line flags. The cmd_args_t object is embedded in the global context object, glusterfsd_ctx_t, so the change to cmd_args_t caused the other members of glusterfsd_ctx_t to shift. This detail is important.

2. The glusterfs version string for nightly builds from upstream master is currently 3.7dev, which means glusterfs libraries are installed under /lib/glusterfs/3.7dev. Between nightly builds only the release number changes, not the glusterfs binary version, so upgrades install the new libraries into the same /lib/glusterfs/3.7dev directory.

3. The glusterfs-rdma package is installed after glusterfs-server. This is because we don't want users to be forced to install rdma libraries if they are not interested in them. As a result, while glusterfs-server is being updated, the glusterfs-rdma libraries present in /lib/glusterfs/3.7dev still belong to the previous package.


When glusterd starts, it loads the rdma transport library and does some checks to see if rdma can be supported on the machine. If the library is not present, glusterd complains and continues (which is the source of a lot of user confusion).

This check also runs when glusterd is started during the upgrade. Glusterd starts, loads the rdma library, and passes its global context (glusterfsd_ctx_t) to the library. However, the library it loads is the older one: as the binary version didn't change, glusterd searches for the rdma library in /lib/glusterfs/3.7dev, and finds the library installed by the older release, because the glusterfs-rdma package is only updated after glusterfs-server. The rdma transport initialization requires a lock in the global context to be held. But the rdma library receives the newer, shifted global context object, not the older layout it expects. The rdma library tries to take the lock at the location it knows, but because the lock struct has shifted, it hangs. This is the hang observed.

This hang will not happen when upgrading between different package versions, as the libraries will be installed into versioned locations. For example, it can't happen on an upgrade from 3.6 to 3.7 once 3.7 is released.

This hang will also not happen when upgrading from the nightly build glusterfs-3.7dev-0.545.git88136b5.autobuild (the first build to include the above-mentioned commit) to any newer version.

There are workarounds for upgrades from nightly builds older than 3.7dev-0.545.git88136b5 to newer releases.
1. Remove glusterfs-rdma, and don't reinstall or use it ==> no problems!
2. If you want to keep rdma installed, either:
   a. update glusterfs-rdma before updating the other gluster packages (not sure if this will work), or
   b. remove glusterfs-rdma, update the remaining packages, then install the new glusterfs-rdma package.

This issue cannot be fixed in code, as any code fix would require modifying the glusterfsd_ctx_t object, which would lead to the same problem again.

Anyway, as this issue cannot happen between proper glusterfs releases, I'm closing this as CANTFIX.

