Bug 1402037 - GlusterFS - Server halts update process ... AGAIN
Summary: GlusterFS - Server halts update process ... AGAIN
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.8
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-06 15:43 UTC by customercare
Modified: 2017-04-11 12:50 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-11 12:50:51 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterfsd informations (65.67 KB, text/plain)
2016-12-07 12:14 UTC, customercare

Description customercare 2016-12-06 15:43:15 UTC
Description of problem:

I reported this same problem for the FC22 -> FC23 upgrade, and now, with the FC23 -> FC24
upgrade, I have to report it again:

While the update to FC24 was being installed, the rpm scriptlet started glusterd without forking it into the background:


  Updating         : glusterfs-server-3.8.5-1.fc24.i686            977/2250 


Result:

       ├─sshd───sshd───bash───dnf───sh───glusterd───6*[{glusterd}]


I was forced to kill it from a parallel SSH session:

warning: /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0-rebalance.vol saved as /var/lib/glusterd/vols/gv0/gv0-rebalance.vol.rpmsave
/var/tmp/rpm-tmp.7jn4r5: line 69: 15058 Killed                glusterd --xlator-option *.upgrade=on -N


Please rework this rpm scriptlet so that it does not start the daemon at all. The next thing after updating a server is to reboot it, so there is no need to start the daemon while installing the package during an OS version upgrade.

Comment 1 Niels de Vos 2016-12-07 04:09:39 UTC
The command that got killed seems to be this one:

  glusterd --xlator-option *.upgrade=on -N

This does not start the glusterd service; it takes care of updating the (generated) configuration files. The vol-files may need updating between releases, so this command should not be skipped.

We will need to find out why this command is causing a hang. Is this something you can reproduce at will? If so, please: pass a list of all running gluster services with their command-line options; check with strace/ltrace whether the problematic glusterd process is still doing something or is blocked; attach the generated configurations (from /var/lib/glusterd); and, if possible, a coredump gathered with gcore (from the gdb RPM).
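A minimal sketch of how such diagnostics could be gathered, assuming a Linux /proc filesystem. The strace/gcore steps and the /var/lib/glusterd path come from this report; everything else is illustrative, and with no argument the script inspects the current shell purely so the sketch is runnable:

```shell
#!/bin/sh
# Inspect a process's command line and scheduler state via /proc.
# Pass the stuck glusterd PID as $1; with no argument it inspects
# this shell itself so the sketch can be tried anywhere.
PID=${1:-$$}

CMDLINE=$(tr '\0' ' ' < "/proc/$PID/cmdline")
STATE=$(awk '/^State:/ {print $2}' "/proc/$PID/status")

echo "pid=$PID state=$STATE cmd=$CMDLINE"

# For the real investigation (run as root against the stuck PID):
#   timeout 10 strace -f -p "$PID"      # is it still making syscalls?
#   gcore -o /tmp/glusterd-core "$PID"  # core dump; gcore ships in the gdb RPM
#   tar czf /tmp/glusterd-config.tar.gz /var/lib/glusterd
```

A state of `D` (uninterruptible sleep) would point at a blocked syscall, while `S` with no strace activity would suggest glusterd is waiting on something internally.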

Comment 2 customercare 2016-12-07 11:19:24 UTC
That could be a problem: I can upgrade a server OS only once, and this bug report is the result of it :)

I can't reproduce this until the next update in 6 months.

Comment 3 customercare 2016-12-07 12:13:40 UTC
Confirmed:

 \_ /usr/bin/python3 /usr/bin/dnf --allowerasing --releasever=24 --setopt=deltarpm=false distro-sync
root     21609  0.0  0.1   5608  2888 pts/0    S+   13:03   0:00              \_ /bin/sh /var/tmp/rpm-tmp.5dloRw 2
root     21621  0.0  0.8  93240 18148 pts/0    Sl+  13:03   0:00                  \_ glusterd --xlator-option *.upgrade=on -N


And this time it took a while, but it completed whatever it was doing on its own.

Comment 4 customercare 2016-12-07 12:14:12 UTC
Created attachment 1229035 [details]
glusterfsd informations

Comment 5 Kaleb KEITHLEY 2016-12-11 12:17:44 UTC
We're not going to rework the script. This is how the update process is designed to work.

But if glusterd is starting volumes when *.upgrade=on is set that might very well be a bug in glusterd.

Comment 6 Atin Mukherjee 2017-01-23 14:15:39 UTC
(In reply to Kaleb KEITHLEY from comment #5)
> We're not going to rework the script. This is how the update process is
> designed to work.
> 
> But if glusterd is starting volumes when *.upgrade=on is set that might very
> well be a bug in glusterd.

GlusterD doesn't modify the state of the volume/peer; it just recreates the volfiles. As Niels pointed out, we need a reproducer, or at least some process trace, to figure out what is causing this hang. I see that comment 3 says the process took a little longer but didn't hang; if that's the case, it could be because of too many volumes, since glusterd_recreate_volfiles iterates over all the volumes and generates volfiles for each of them. In that case, is this still a valid bug?
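To sanity-check the "too many volumes" theory, one could count the volume directories glusterd would iterate over. The path is taken from the rpmsave warnings earlier in this report; the per-volume loop itself is internal to glusterd, so counting directories is only a proxy for the amount of regeneration work:

```shell
#!/bin/sh
# Each subdirectory of /var/lib/glusterd/vols is one volume whose
# volfiles glusterd regenerates in upgrade mode; more volumes means
# a longer (but normally still finite) upgrade run.
VOLDIR=${1:-/var/lib/glusterd/vols}
if [ -d "$VOLDIR" ]; then
    COUNT=$(find "$VOLDIR" -mindepth 1 -maxdepth 1 -type d | wc -l)
else
    COUNT=0
fi
echo "volumes to regenerate: $COUNT"
```

With the single gv0 volume from this report, the count would be 1, which argues against the too-many-volumes explanation here.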

Comment 7 customercare 2017-02-06 16:14:29 UTC
Hmm, too many volumes? I had created just one volume.

Comment 8 Atin Mukherjee 2017-04-11 11:26:57 UTC
Bumping again: do you have a reproducer, or can you at least give us the process trace?

Comment 9 customercare 2017-04-11 11:51:36 UTC
I can't reproduce it, as it happened while upgrading a system from F23 -> F24. You can only do that once ;) and I only had one system with a gluster setup. 

And there was only one test volume set up, with only a few files in it at best.

I'm afraid I'm not much help anymore. In the meantime, the system was changed from 32 to 64 bits, so the environment has changed. Maybe when it gets its upgrade from 24 -> 25 we will get more info. I won't be able to strace it from the beginning, but if it hangs again, I will turn tracing on for you.

Comment 10 Atin Mukherjee 2017-04-11 12:50:51 UTC
I'm closing this bug with "insufficient data" as the reason. Please feel free to reopen once you hit it again, and share all the required details.

