Bug 1402037

Summary: GlusterFS - Server halts update process ... AGAIN
Product: [Community] GlusterFS
Component: glusterd
Version: 3.8
Hardware: All
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: high
Priority: medium
Assignee: Atin Mukherjee <amukherj>
Reporter: customercare
CC: anoopcs, barumuga, bugs, customercare, humble.devassy, joe, jonathansteffan, kkeithle, ndevos, ramkrsna
Keywords: Triaged
Target Milestone: ---
Target Release: ---
Last Closed: 2017-04-11 12:50:51 UTC
Type: Bug
Attachments: glusterfsd informations

Description customercare 2016-12-06 15:43:15 UTC
Description of problem:

I reported this same problem for the FC22 -> FC23 upgrade, and now, with the FC23 -> FC24 upgrade, I have to report it again:

While the update to FC24 was being installed, the rpm scriptlet started glusterfsd without forking it into the background:


  Updating         : glusterfs-server-3.8.5-1.fc24.i686            977/2250 


Result : 

       ├─sshd───sshd───bash───dnf───sh───glusterd───6*[{glusterd}]


I was forced to kill it in a parallel ssh session: 

warning: /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s145.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/trusted-gv0.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol saved as /var/lib/glusterd/vols/gv0/gv0.s113.resellerdesktop.de.data-brick1-gv0.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol saved as /var/lib/glusterd/vols/gv0/gv0.tcp-fuse.vol.rpmsave
warning: /var/lib/glusterd/vols/gv0/gv0-rebalance.vol saved as /var/lib/glusterd/vols/gv0/gv0-rebalance.vol.rpmsave
/var/tmp/rpm-tmp.7jn4r5: line 69: 15058 Killed                glusterd --xlator-option *.upgrade=on -N


Please rework this rpm scriptlet to not start the daemon at all. The next thing after updating a server is to reboot it; there is no need to start the daemon while installing the package during an OS version upgrade.

Comment 1 Niels de Vos 2016-12-07 04:09:39 UTC
The command that got killed seems to be this one:

  glusterd --xlator-option *.upgrade=on -N

This is not starting the glusterd service; it takes care of updating the (generated) configuration files. The vol-files (may) need updating between releases, so this command should not be skipped.
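
For context, the %post scriptlet in the glusterfs-server package runs this command on upgrades; a rough sketch of that logic (not the exact spec text) looks like:

  # $1 is the number of installed instances; greater than 1 means upgrade
  if [ $1 -gt 1 ]; then
      # regenerate the volfiles under /var/lib/glusterd; -N keeps glusterd
      # in the foreground, so rpm waits for it to finish and exit
      glusterd --xlator-option *.upgrade=on -N
  fi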

We will need to find out why this command is causing a hang. Is this something you can reproduce at will? If so, please send a list of all running gluster services with their command-line options, check with strace/ltrace whether the problematic glusterd process is still doing something or is blocked, attach the generated configurations (from /var/lib/glusterd), and, if possible, a coredump gathered with gcore (from the gdb RPM).
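
For reference, that information can be gathered roughly like this (a sketch; the PID and output paths are placeholders):

  # list the running gluster processes with their command-line options
  ps axww | grep '[g]luster'

  # check whether the stuck glusterd is still making syscalls or is blocked
  strace -f -p <PID> -o /tmp/glusterd-upgrade.strace

  # capture a core image without killing the process (gcore ships with gdb)
  gcore -o /tmp/glusterd-upgrade <PID>

  # collect the generated configuration files
  tar czf /tmp/glusterd-config.tar.gz /var/lib/glusterd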

Comment 2 customercare 2016-12-07 11:19:24 UTC
Could be a problem: I can upgrade a server OS only once, and this bug report is the result of it :)

I can't reproduce this until the next update in 6 months.

Comment 3 customercare 2016-12-07 12:13:40 UTC
Confirmed: 

 \_ /usr/bin/python3 /usr/bin/dnf --allowerasing --releasever=24 --setopt=deltarpm=false distro-sync
root     21609  0.0  0.1   5608  2888 pts/0    S+   13:03   0:00              \_ /bin/sh /var/tmp/rpm-tmp.5dloRw 2
root     21621  0.0  0.8  93240 18148 pts/0    Sl+  13:03   0:00                  \_ glusterd --xlator-option *.upgrade=on -N


And this time it took a while, but it completed whatever it did on its own.

Comment 4 customercare 2016-12-07 12:14:12 UTC
Created attachment 1229035 [details]
glusterfsd informations

Comment 5 Kaleb KEITHLEY 2016-12-11 12:17:44 UTC
We're not going to rework the script. This is how the update process is designed to work.

But if glusterd is starting volumes when *.upgrade=on is set, that might very well be a bug in glusterd.
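
Whether that happens can be checked from a parallel session while the scriptlet runs; a minimal check (assuming upgrade mode should spawn no brick processes):

  # glusterd in upgrade mode regenerates volfiles and exits; it should not
  # start any brick (glusterfsd) processes, so this should print nothing new
  pgrep -a glusterfsd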

Comment 6 Atin Mukherjee 2017-01-23 14:15:39 UTC
(In reply to Kaleb KEITHLEY from comment #5)
> We're not going to rework the script. This is how the update process is
> designed to work.
> 
> But if glusterd is starting volumes when *.upgrade=on is set that might very
> well be a bug in glusterd.

GlusterD doesn't modify the state of the volume/peer; it just recreates the volfiles. As Niels pointed out, we need a reproducer or at least some process trace to figure out what is causing this hang. I see that comment 3 mentions the process took a little longer but didn't hang; if that's the case, it could be because of too many volumes, since glusterd_recreate_volfiles iterates over all the volumes and generates volfiles for each of them. In that case, is this even a valid bug?
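
The number of volumes glusterd iterates over can be checked on the affected node; a quick sketch (each configured volume has one directory under /var/lib/glusterd/vols):

  # one subdirectory per volume; a large count could explain a slow,
  # but not hung, volfile regeneration
  ls /var/lib/glusterd/vols/ | wc -l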

Comment 7 customercare 2017-02-06 16:14:29 UTC
Hmm, too many volfiles? I had just one volume created.

Comment 8 Atin Mukherjee 2017-04-11 11:26:57 UTC
Bumping again: do you have a reproducer, or can you at least give us the process trace?

Comment 9 customercare 2017-04-11 11:51:36 UTC
I can't reproduce it, as it happened while upgrading a system from F23->F24. You can only do that once ;) and I only had one system with a gluster setup. 

And there was only one test volume set up, with only a few files in it at best.

I believe I can't be of much help anymore. In the meantime, the system was changed from 32 to 64 bits, so the environment changed. Maybe when it gets its upgrade from 24->25 we will get more info. I won't be able to strace it from the beginning, but if it hangs again, I will turn strace on for you.

Comment 10 Atin Mukherjee 2017-04-11 12:50:51 UTC
I'm closing this bug with insufficient data as the reason. Please feel free to reopen it once you hit the issue again and can share all the required details.