Bug 1402037
| Field | Value |
|---|---|
| Summary | GlusterFS - Server halts updateprocess ... AGAIN |
| Product | [Community] GlusterFS |
| Reporter | customercare |
| Component | glusterd |
| Assignee | Atin Mukherjee <amukherj> |
| Status | CLOSED INSUFFICIENT_DATA |
| QA Contact | |
| Severity | high |
| Docs Contact | |
| Priority | medium |
| Version | 3.8 |
| CC | anoopcs, barumuga, bugs, customercare, humble.devassy, joe, jonathansteffan, kkeithle, ndevos, ramkrsna |
| Target Milestone | --- |
| Keywords | Triaged |
| Target Release | --- |
| Hardware | All |
| OS | Linux |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2017-04-11 12:50:51 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Attachments | |
Description (customercare, 2016-12-06 15:43:15 UTC)
The command that got killed seems to be this one:

    glusterd --xlator-option *.upgrade=on -N

This is not starting the glusterd service; it takes care of updating the (generated) configuration files. The vol-files (may) need updating between releases, so this command should not be skipped. We will need to find out why this command is causing a hang.

Is this something you can reproduce at will? If so, please pass a list of all running gluster services with their command-line options, check with strace/ltrace whether the problematic glusterd process is still doing something or is blocked, attach the generated configurations (from /var/lib/glusterd), and, if possible, a coredump gathered with gcore (from the gdb RPM).

That could be a problem: I can upgrade a server OS only once, and this bug report is the result of it :) I can't reproduce this until the next update in six months.

Confirmed:

    \_ /usr/bin/python3 /usr/bin/dnf --allowerasing --releasever=24 --setopt=deltarpm=false distro-sync
    root 21609 0.0 0.1 5608 2888 pts/0 S+ 13:03 0:00 \_ /bin/sh /var/tmp/rpm-tmp.5dloRw 2
    root 21621 0.0 0.8 93240 18148 pts/0 Sl+ 13:03 0:00 \_ glusterd --xlator-option *.upgrade=on -N

And this time it took a while, but it completed whatever it did on its own.

Created attachment 1229035 [details]
glusterfsd informations
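Since the hang could not be reproduced on demand, here is a rough sketch of how the diagnostics requested above (process listing, a trace of the stuck process, the generated configuration, and a core dump) could be gathered the next time it happens. The PID 21621 is taken from the process listing above; the output paths are illustrative assumptions for a typical Fedora install, not something prescribed in this report.

```sh
# List every running gluster process with its full command line
ps axuww | grep '[g]luster'

# Check whether the upgrade-mode glusterd (PID from the listing above)
# is still issuing system calls or is blocked; interrupt with Ctrl-C
strace -f -tt -p 21621

# Grab a core dump of the still-running process (gcore ships with gdb)
gcore -o /tmp/glusterd-upgrade 21621

# Pack up the generated volfiles and glusterd state to attach to the bug
tar czf /tmp/glusterd-state.tar.gz /var/lib/glusterd
```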
We're not going to rework the script. This is how the update process is designed to work.

But if glusterd is starting volumes when *.upgrade=on is set, that might very well be a bug in glusterd.

(In reply to Kaleb KEITHLEY from comment #5)
> We're not going to rework the script. This is how the update process is
> designed to work.
>
> But if glusterd is starting volumes when *.upgrade=on is set that might very
> well be a bug in glusterd.

GlusterD doesn't modify the state of the volume/peer here; it just recreates the volfiles. As Niels pointed out, we need a reproducer, or at least some process trace, to figure out what is causing this hang. I can see that comment 3 mentions the process took a little longer but didn't hang; if that's the case, it could be because of too many volumes, since glusterd_recreate_volfiles iterates over all the volumes and generates volfiles for each of them. In that case, is it still a valid bug?

Hmm, too many volfiles? I had just one volume created.

Bumping again: do you have a reproducer, or, in the worst case, can you give us the process trace?

I can't reproduce it, as it happened while upgrading a system from F23 to F24. You can only do that once ;) and I only had one system with gluster set up. There was only one test volume, with only a few files in it at best. I believe I'm not of much help anymore. In the meantime, the system got changed from 32 to 64 bits, so the environment has changed. Maybe when it gets its upgrade from F24 to F25 we will get more information. I won't be able to strace it from the beginning, but if it hangs again, I will turn it on for you.

I'm closing this bug with "insufficient data" as the reason. Please feel free to reopen once you hit it again and share all the required details.
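For anyone who does hit this again, a quick way to test the "too many volumes" theory above is to count the volumes glusterd would iterate over and to re-run the upgrade-mode invocation by hand while timing it. This is only a sketch under the assumption of the default /var/lib/glusterd working directory; the upgrade-mode run rewrites the volfiles, so back them up first and only try this on a machine where that is acceptable.

```sh
# How many volumes will glusterd_recreate_volfiles have to walk?
ls /var/lib/glusterd/vols | wc -l

# Preserve the current state, then time the volfile regeneration by hand.
# -N keeps glusterd in the foreground; *.upgrade=on only regenerates the
# volfiles, it does not start the service (see the discussion above).
cp -a /var/lib/glusterd /var/lib/glusterd.bak
time glusterd --xlator-option '*.upgrade=on' -N
```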