Description of problem:
I reported this same problem for the FC22 -> FC23 upgrade, and now with the FC23 -> FC24 upgrade I have to report it again: while the update to FC24 was being installed, the rpm scriptlet started glusterd without forking it into the background:

  Upgrading   : glusterfs-server-3.8.5-1.fc24.i686                     977/2250

Result:

  ├─sshd───sshd───bash───dnf───sh───glusterd───6*[{glusterd}]

I was forced to kill it in a parallel SSH session:

  warning: /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
  warning: /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol.rpmsave
  warning: /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
  warning: /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol.rpmsave
  warning: /var/lib/glusterd/vols/gv0/gv0-rebalance.vol saved as /var/lib/glusterd/vols/gv0/gv0-rebalance.vol.rpmsave
  /var/tmp/rpm-tmp.7jn4r5: line 69: 15058 Killed    glusterd --xlator-option *.upgrade=on -N

Please rework this rpm scriptlet to not start the daemon at all: the next thing after upgrading a server is to reboot it, so there is no need to start the daemon while installing the package during an OS version upgrade.
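For reference, this is roughly what I did in the parallel SSH session to find and kill the hung process (the PID is from the run above and will differ; the pgrep pattern is just one way to match the upgrade-mode glusterd):

  pstree -p $(pgrep -of dnf)                 # dnf -> sh -> glusterd, with PIDs
  ps -o pid,ppid,etime,args -C glusterd      # how long it has been sitting there
  kill $(pgrep -f 'upgrade=on')              # what I ended up doing (PID 15058 here)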
The command that got killed seems to be this one:

  glusterd --xlator-option *.upgrade=on -N

This does not start the glusterd service; it takes care of updating the (generated) configuration files. The vol-files (may) need updating between releases, so this command should not be skipped. We will need to find out why this command is causing a hang.

Is this something you can reproduce at will? If so, please pass a list of all running gluster services with their command-line options, check with strace/ltrace whether the problematic glusterd process is still doing something or is blocked, attach the generated configuration (from /var/lib/glusterd) and, if possible, a coredump gathered with gcore (from the gdb RPM).
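Something along these lines would capture it (untested sketch, adjust the paths and the pgrep pattern to your setup; gcore is part of the gdb package):

  ps axww -o pid,args | grep '[g]luster'                      # all gluster processes with their options
  PID=$(pgrep -f 'upgrade=on')                                 # the upgrade-mode glusterd
  strace -f -tt -o /tmp/glusterd-upgrade.strace -p "$PID" &    # see whether it is still doing syscalls or is blocked
  tar czf /tmp/glusterd-config.tar.gz /var/lib/glusterd        # the generated configuration
  gcore -o /tmp/glusterd-upgrade "$PID"                        # coredump of the hung process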
Could be a problem: I can upgrade a server OS only once, and this bug report is the result of it :) I can't reproduce this until the next upgrade in 6 months.
Confirmed:

  \_ /usr/bin/python3 /usr/bin/dnf --allowerasing --releasever=24 --setopt=deltarpm=false distro-sync
  root 21609  0.0  0.1  5608  2888 pts/0 S+  13:03 0:00  \_ /bin/sh /var/tmp/rpm-tmp.5dloRw 2
  root 21621  0.0  0.8 93240 18148 pts/0 Sl+ 13:03 0:00      \_ glusterd --xlator-option *.upgrade=on -N

This time it took a while, but it completed whatever it did on its own.
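In case it helps, one way to see what that run actually changed is to diff the regenerated volfiles against the .rpmsave copies the upgrade left behind (gv0 is my only volume; adjust the path as needed):

  for f in /var/lib/glusterd/vols/gv0/*.rpmsave; do
      diff -u "$f" "${f%.rpmsave}"
  done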
Created attachment 1229035 [details]
glusterfsd information
We're not going to rework the script. This is how the update process is designed to work. But if glusterd is starting volumes when *.upgrade=on is set that might very well be a bug in glusterd.
(In reply to Kaleb KEITHLEY from comment #5)
> We're not going to rework the script. This is how the update process is
> designed to work.
>
> But if glusterd is starting volumes when *.upgrade=on is set that might very
> well be a bug in glusterd.

GlusterD doesn't modify the state of the volumes/peers; it just recreates the volfiles. As Niels pointed out, we need a reproducer, or at least some process trace, to figure out what is causing this hang. I see that comment 3 mentions the process took a little longer but didn't hang; if that's the case, it could be because of too many volumes, since glusterd_recreate_volfiles iterates over all the volumes and generates the volfiles for each of them. In that case, is it still a valid bug?
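If it shows up again, one thing that might help narrow it down (just a suggestion, not an official procedure): with glusterd stopped, re-run the regeneration step from the scriptlet by hand, time it and capture a debug log, so we can see where the volfile regeneration spends its time:

  systemctl stop glusterd
  time glusterd --xlator-option '*.upgrade=on' -N --log-level DEBUG --log-file /tmp/glusterd-upgrade.log
  systemctl start glusterd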
Hmm, too many volfiles? I had just one volume created.
Bumping again: do you have a reproducer, or, in the worst case, can you give us the process trace?
I can't reproduce it as it was, while upgrading a system from F23 -> F24; you can only do that once ;) and I only had one system with a gluster setup. There was only one test volume, with only a few files in it at best. I believe I'm not helpful anymore. In the meantime, the system was changed from 32 to 64 bits, so the environment changed. Maybe when it gets its upgrade from 24 -> 25 we will get more info. I won't be able to strace it from the beginning, but when it hangs again, I will turn it on for you.
I'm closing this bug with "insufficient data" as the reason; please feel free to reopen once you hit it again and can share all the required details.