| Summary: | Load spikes after 3.0.3 upgrade on Solaris 10 | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Chida <chida> |
| Component: | write-behind | Assignee: | Raghavendra G <raghavendra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.0.3 | CC: | amarts, anush, dirteat, gluster-bugs, vijay |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Solaris | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Chida
2010-03-23 15:30:53 UTC
Created attachment 166
This program will illustrate the bug if executed on RH6.x
See comment from eatdirt.
Hi,
same error here. By the way, I reported this bug for version 3.0.0 but you closed it as fixed; and as I said at that time it was not.
This is a major feature. Files disappear and reappear randomly and this destroy many script working. Here the kind of error you get:
mv: cannot move `xch50m245' to `xch50m245_3': No such file or directory
then you do "ls" and the file is here again.
Please, fix this bug, it makes glusterfs unusable. I am now considering to give up gluster.
Cheers,
Chris.
Chris,
It will help us debug this problem if you can share your scripts. We need to understand the sequence of system calls being triggered. Also, can you share your server log files?
Thanks,
Avati
> Hi,
> same error here. By the way, I reported this bug for version 3.0.0 but you
> closed it as fixed; and as I said at that time it was not.
>
> This is a major feature. Files disappear and reappear randomly and this destroy
> many script working. Here the kind of error you get:
>
> mv: cannot move `xch50m245' to `xch50m245_3': No such file or directory
>
> then you do "ls" and the file is here again.
>
> Please, fix this bug, it makes glusterfs unusable. I am now considering to give
> up gluster.
>
> Cheers,
> Chris.
Created attachment 167
Sorry about that. ( " )
Here they are. The script strgs_stop.bash simply looks for some files and renames them. This is where I get the random "file not found" errors. Right after the script fails, if I look at the files they are there, and I can run the mv command by hand and it works fine.
I even tried adding a sleep 0.1 to the script, but the "not found" errors still show up.
In the next attachment I put the server log file of one node. When this error occurs, the fuse-lookup error shows up in the log, precisely on the nodes on which these files are located.
Finally, it may be worth mentioning that this script is run several times. I am using it to rename files "outputfilename" to "outputfilename_1", "outputfilename_2", etc. after each run of some codes.
So if the filesystem does not record that outputfilename has disappeared, then I imagine that the next time my codes create outputfilename, some nasty things happen, especially if I then move outputfilename again to outputfilename_nextindex.
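For anyone trying to reproduce this without the attached scripts, the rename pattern described above can be sketched roughly as follows. This is not the reporter's strgs_stop.bash; the function names (`next_index`, `rotate`, `stress`) are hypothetical and the indexing scheme is an assumption based on the description. The sketch creates an output file, renames it to the next free `_N` suffix, and flags the case where rename fails with "No such file or directory" even though the file is visible immediately afterwards.

```python
import os
import re


def next_index(directory, base):
    """Return one past the highest N for which base_N exists in directory."""
    pattern = re.compile(re.escape(base) + r"_(\d+)$")
    used = []
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            used.append(int(m.group(1)))
    return max(used, default=0) + 1


def rotate(directory, base):
    """Rename base to base_N; return (new_path, spurious_enoent)."""
    src = os.path.join(directory, base)
    dst = os.path.join(directory, "%s_%d" % (base, next_index(directory, base)))
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        # rename() reported ENOENT. If the source is visible right after,
        # that matches the symptom described in this report.
        return None, os.path.exists(src)
    return dst, False


def stress(directory, base, runs=100):
    """Create and rotate the file repeatedly; count spurious ENOENT failures."""
    spurious = 0
    for _ in range(runs):
        with open(os.path.join(directory, base), "w") as f:
            f.write("output\n")
        _, was_spurious = rotate(directory, base)
        spurious += was_spurious
    return spurious
```

Calling `stress()` on a directory inside the GlusterFS mount and getting a non-zero count would suggest the same behaviour; on a local filesystem the count should stay at zero.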
Created attachment 168
OK, here is the source then.
This is the server log file for the node "mars", client-05 of the nufa mode.
We suspect write-behind is missing a frame in the flush cbk. We will have a fix for this soon.
Du, did we submit a patch for this? Let's make sure it goes into mainline.
Patch http://patches.gluster.com/patch/3453/ contains some fixes to wb_flush. Its fixes are equivalent to the ones in the patch Avati sent for release-3.0 ( http://patches.gluster.com/patch/3522/ ). Patch 3522 deals with duplicate flushes sent to the server. Since the whole of wb_flush has been rewritten, this bug might have been fixed. We can be sure of that only if we can reproduce this bug and rerun the tests with patch 3453.