Bug 1402037

Summary: GlusterFS - Server halts update process ... AGAIN
Product: [Community] GlusterFS
Component: glusterd
Version: 3.8
Hardware: All
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: medium
Assignee: Atin Mukherjee <amukherj>
Reporter: customercare
CC: anoopcs, barumuga, bugs, customercare, humble.devassy, joe, jonathansteffan, kkeithle, ndevos, ramkrsna
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Last Closed: 2017-04-11 12:50:51 UTC
Type: Bug
Attachments: glusterfsd informations

Description customercare 2016-12-06 15:43:15 UTC
Description of problem:

I reported this same problem for the FC22 -> FC23 upgrade, and now, with the FC23 -> FC24 upgrade, I have to report it again:

While the update to FC24 was being installed, the rpm scriptlet started glusterfsd without forking it into the background:


  Updating         : glusterfs-server-3.8.5-1.fc24.i686            977/2250 


Result : 

       ├─sshd───sshd───bash───dnf───sh───glusterd───6*[{glusterd}]


I was forced to kill it in a parallel ssh session: 

warning: /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0-rebalance.vol saved as /var/lib/glusterd/vols/gv0/gv0-rebalance.vol.rpmsave
/var/tmp/rpm-tmp.7jn4r5: line 69: 15058 Killed                glusterd --xlator-option *.upgrade=on -N


Please rework this rpm scriptlet to not start the daemon at all. The next thing after updating a server is to reboot it; there is no need to start the daemon while installing the package during an OS version upgrade.

Comment 1 Niels de Vos 2016-12-07 04:09:39 UTC
The command that got killed seems to be this one:

  glusterd --xlator-option *.upgrade=on -N

This is not starting the glusterd service; it takes care of updating the (generated) configuration files. The vol-files (may) need updating between releases, so this command should not be skipped.
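
For context, the %post scriptlet in the glusterfs-server package runs this command on upgrades; a rough sketch of that logic (not the exact spec text) looks like:

  # $1 is the number of installed instances; greater than 1 means upgrade
  if [ $1 -gt 1 ]; then
      # regenerate the volfiles under /var/lib/glusterd; -N keeps glusterd
      # in the foreground, so rpm waits for it to finish and exit
      glusterd --xlator-option *.upgrade=on -N
  fi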

We will need to find out why this command is causing a hang. Is this something you can reproduce at will? If so, please send a list of all running gluster services with their command-line options, check with strace/ltrace whether the problematic glusterd process is still doing something or is blocked, attach the generated configurations (from /var/lib/glusterd), and, if possible, a coredump gathered with gcore (from the gdb RPM).
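
For reference, that information can be gathered roughly like this (a sketch; the PID and output paths are placeholders):

  # list the running gluster processes with their command-line options
  ps axww | grep '[g]luster'

  # check whether the stuck glusterd is still making syscalls or is blocked
  strace -f -p <PID> -o /tmp/glusterd-upgrade.strace

  # capture a core image without killing the process (gcore ships with gdb)
  gcore -o /tmp/glusterd-upgrade <PID>

  # collect the generated configuration files
  tar czf /tmp/glusterd-config.tar.gz /var/lib/glusterd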

Comment 2 customercare 2016-12-07 11:19:24 UTC
Could be a problem: I can upgrade a server OS only once, and this bug report is the result of it :)

I can't reproduce this until the next update in 6 months.

Comment 3 customercare 2016-12-07 12:13:40 UTC
Confirmed: 

 \_ /usr/bin/python3 /usr/bin/dnf --allowerasing --releasever=24 --setopt=deltarpm=false distro-sync
root     21609  0.0  0.1   5608  2888 pts/0    S+   13:03   0:00              \_ /bin/sh /var/tmp/rpm-tmp.5dloRw 2
root     21621  0.0  0.8  93240 18148 pts/0    Sl+  13:03   0:00                  \_ glusterd --xlator-option *.upgrade=on -N


And this time it took a while, but it completed whatever it did on its own.

Comment 4 customercare 2016-12-07 12:14:12 UTC
Created attachment 1229035 [details]
glusterfsd informations

Comment 5 Kaleb KEITHLEY 2016-12-11 12:17:44 UTC
We're not going to rework the script. This is how the update process is designed to work.

But if glusterd is starting volumes when *.upgrade=on is set, that might very well be a bug in glusterd.
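
Whether that happens can be checked from a parallel session while the scriptlet runs; a minimal check (assuming upgrade mode should spawn no brick processes):

  # glusterd in upgrade mode regenerates volfiles and exits; it should not
  # start any brick (glusterfsd) processes, so this should print nothing new
  pgrep -a glusterfsd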

Comment 6 Atin Mukherjee 2017-01-23 14:15:39 UTC
(In reply to Kaleb KEITHLEY from comment #5)
> We're not going to rework the script. This is how the update process is
> designed to work.
> 
> But if glusterd is starting volumes when *.upgrade=on is set that might very
> well be a bug in glusterd.

GlusterD doesn't modify the state of the volume/peer; it just recreates the volfiles. As Niels pointed out, we need a reproducer or at least some process trace to figure out what is causing this hang. I see that comment 3 mentions the process took a little longer but didn't hang; if that's the case, it could be because of too many volumes, since glusterd_recreate_volfiles iterates over all the volumes and generates volfiles for each of them. In that case, is this even a valid bug?
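
The number of volumes glusterd iterates over can be checked on the affected node; a quick sketch (each configured volume has one directory under /var/lib/glusterd/vols):

  # one subdirectory per volume; a large count could explain a slow,
  # but not hung, volfile regeneration
  ls /var/lib/glusterd/vols/ | wc -l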

Comment 7 customercare 2017-02-06 16:14:29 UTC
Hmm, too many volfiles? I had just one volume created.

Comment 8 Atin Mukherjee 2017-04-11 11:26:57 UTC
Bumping again: do you have a reproducer, or can you at least give us the process trace?

Comment 9 customercare 2017-04-11 11:51:36 UTC
I can't reproduce it, as it happened while upgrading a system from F23->F24. You can only do that once ;) and I only had one system with a gluster setup. 

And there was only one test volume set up, with only a few files in it at best.

I believe I can't be of much help anymore. In the meantime, the system was changed from 32 to 64 bits, so the environment changed. Maybe when it gets its upgrade from 24->25 we will get more info. I won't be able to strace it from the beginning, but if it hangs again, I will turn strace on for you.

Comment 10 Atin Mukherjee 2017-04-11 12:50:51 UTC
I'm closing this bug with insufficient data as the reason. Please feel free to reopen it once you hit the issue again and can share all the required details.