Bug 1594536
| Summary: | Extracting the star archive on GFS2 filesystem taking much time. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Kumar <psnpkumar> |
| Component: | star | Assignee: | Pavel Raiskup <praiskup> |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-daemons |
| Severity: | urgent | Priority: | unspecified |
| Version: | 7.5 | CC: | databases-maint, hhorak, psnpkumar, rpeterso, sanyadav, sbradley, swhiteho |
| Target Milestone: | rc | Hardware: | x86_64 |
| OS: | Linux | Type: | Bug |
| Last Closed: | 2018-08-24 12:18:25 UTC | | |
Description
Kumar
2018-06-24 02:50:25 UTC
Hi Kumar, thanks for the report.
Can you compare the two runs, /bin/tar vs /bin/star for I/O and proc time
(iotop vs. top) to see whether there isn't something obvious?
Is star slower on other filesystems than GFS2?
> The command I used was "star -c -acl H=exustar data > data.tar" for
> extracting "star -x -acl < data.tar" but the same archive created with star
> command is extracting within two hours for the same data with all above
> scenarios the tar command used for extracting is "tar --acls -xf data.tar"
I'm not sure I understand this. So /bin/tar extracts faster than star?
Can you please elaborate?
If this issue is critical or in any way time sensitive, please raise a ticket
through your regular Red Hat support channels to make certain it receives the
proper attention and prioritization to assure a timely resolution.
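
The tar-versus-star comparison requested above could be scripted roughly as follows. This is only a sketch: `time_extract` is a hypothetical helper that does not appear in the report, and the parent directory and archive path in the usage comments are assumptions. The scratch directory is created under a caller-supplied parent so that the extraction lands on the filesystem being tested.

```shell
# Hypothetical helper (not from the bug report): run one extraction command
# inside a fresh scratch directory under the given parent directory and
# print the elapsed wall-clock time in whole seconds.
time_extract() {
    label=$1
    parent=$2
    shift 2
    scratch=$(mktemp -d "$parent/extract.XXXXXX") || return 1
    start=$(date +%s)
    ( cd "$scratch" && "$@" )      # run the command from inside the scratch dir
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
    rm -rf "$scratch"              # the extracted copy is only needed for timing
    return "$status"
}

# Example invocations; /mnt/gfs2 and /backup/data.tar are assumptions, and the
# archive path must be absolute because the command runs in the scratch dir:
#   time_extract star /mnt/gfs2 star -x -acl -f /backup/data.tar
#   time_extract tar  /mnt/gfs2 tar --acls -xf /backup/data.tar
```

Running both on a small sample archive first keeps the turnaround short; iotop and top can be watched in another terminal while each command runs.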
> > The command I used was "star -c -acl H=exustar data > data.tar" for
> > extracting "star -x -acl < data.tar" but the same archive created with star
> > command is extracting within two hours for the same data with all above
> > scenarios the tar command used for extracting is "tar --acls -xf data.tar"
> I'm not sure I understand this. So /bin/tar extracts faster than star?
> Can you please elaborate?
The archive was created with the star command, i.e. star -c -acl H=exustar data > data.tar
Scenario 1:
Extracting with /bin/star takes almost 72 hours:
star -x -acl < data.tar
Scenario 2:
The same archive, created with star, is extracted with /bin/tar within 2 hours:
tar --acls -xf data.tar
Yes, /bin/tar extracts faster than star.
We already raised this issue with Red Hat two weeks ago, case ID: 02120773
I've prepared a ~9 GB tarball with ACLs and tried extracting it with both /bin/star and /bin/tar; neither is noticeably faster. I tried with tar-1.26-34.el7.x86_64 and star-1.5.2-13.el7.x86_64 on an XFS filesystem. It seems that some order of syscalls, or something similarly tricky, is causing the difference on GFS2, but setting up an identically configured GFS2 cluster is a rather complicated task.

(In reply to Kumar from comment #3)
> ...
> star -x -acl < data.tar
> ...
> tar --acls -xf data.tar

Could you please try these two:

$ star -v -x -acl -f data.tar
$ tar -v -x --acls -f data.tar

and check when star starts to slow down (if at all)? Why not use tar instead?

> Could you please try those two:
> $ star -v -x -acl -f data.tar
> $ tar -v -x --acls -f data.tar

We tried both: tar extracts within two hours, while star takes almost 72 hours for the same data and the same archive created with star. tar also extracts a backup taken with tar in two hours.

> And check when the star starts slow down (if at all)?

Star is slow from the very start.

> Why not to use tar instead?

We have been using star for the last 10 years and have never had complaints from users about ACLs, time stamps, or extended attributes. Star is meant for acls backup and 3 times faster than tar, so why is that not happening in this case? How can we trust that tar is equivalent to star? Why is star not performing well on the GFS2 file system? The same backup taken with star takes 72 hours to extract with star and only 2 hours with tar. What is the problem behind that? Are we missing something? Where does the huge time difference come from? Could you explain the difference between star and tar, and can we rely on tar not to miss anything that star would restore? What is the equivalent tar command that gives the same output as the star command below?
star -c -acl H=exustar > data.tar

We have 7 TB of unstructured data: ods, odt, text, pictures, 2D and 3D drawings, spreadsheets, source code files, exe files, analysis reports, etc., created and stored by users with different desktop applications. Our users rely on ctime and mtime, and occasionally create files with long paths and file names. Any problems come to light only after users start working with the data. We could not determine the difference between star and tar, because tar initially did not support ACL backups or extended attributes.

Our intention is to know whether the problem is due to a GFS2 bug, because we did not observe it on the version we have been using for the last 7 years. Why does the problem appear with the new GFS2 version? We would like to know whether it is safe to migrate the data to the new GFS2 file system. As per your analysis, it should perform faster with the lock_nolock option, but that is not happening; du -sh also takes a long time with lock_nolock, which is not supposed to happen because dlm and glocks are not involved.

> Star is meant for acls backup

GNU tar is meant to store/restore the ACLs as well.

> and 3 times faster than tar

If that is the case, can you please submit a bug report against tar? That is really something to be analyzed as well.

> What is the equivalent tar command and provide same output for below star
> command syntax.
> star -c -acl H=exustar > data.tar

$ tar -c --acls -f data.tar // plus directories you want to archive

- How we can trust on tar that is equivalent to star.
- Why is star not performing well on the GFS2 file system? The same backup taken with star takes 72 hours to extract with star and only 2 hours with tar. What is the problem behind that? Are we missing something? Where does the huge time difference come from?
- Could you explain the difference between star and tar can we rely on tar that it wont miss anything that star will restore.
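
One way to check whether a restore dropped or altered anything is to compare a metadata listing of the original tree with one of the restored tree. The sketch below is illustrative only: `tree_manifest` is a hypothetical helper, the paths in the usage comments are assumptions, and it relies on GNU find's `-printf`. It records path, file type, permission bits and mtime. It does not cover ACLs or extended attributes, and ctime is left out because it is reset whenever a file is restored.

```shell
# Hypothetical helper: print one line per entry below a directory, giving
# path, type, octal permission bits and mtime (seconds since the epoch),
# sorted by path. Requires GNU find for -printf.
# Caveat: file names containing tabs or newlines would confuse a line-based diff.
tree_manifest() {
    ( cd "$1" && find . -mindepth 1 -printf '%p\t%y\t%m\t%T@\n' | LC_ALL=C sort )
}

# Usage (paths are assumptions):
#   tree_manifest /mnt/gfs2/original > original.manifest
#   tree_manifest /mnt/gfs2/restored > restored.manifest
#   diff original.manifest restored.manifest && echo "metadata matches"
```

ACLs could be compared in a similar way by diffing the output of `getfacl -R` run from inside each tree, assuming the acl tools are installed.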
(In reply to Pavel Raiskup from comment #7)
> > What is the equivalent tar command and provide same output for below star
> > command syntax.
> > star -c -acl H=exustar > data.tar
>
> $ tar -c --acls -f data.tar // plus directories you want to archive

Could you provide the exustar equivalent for the tar command?

For what I can answer now:

(In reply to Kumar from comment #12)
> - How we can trust on tar that is equivalent to star.

This is a really tough question. Both tar implementations are designed to archive/extract ACLs, but they are still different implementations. GNU tar is used much more widely in the GNU/Linux ecosystem, if that matters to you, so it is more battle-tested on Red Hat Enterprise Linux.

> - Could you explain the difference between star and tar can we rely on tar
> that it wont miss anything that star will restore.

Star is a CDDL tool, GNU tar is a GPL tool (different license). Both are pretty much equivalent for the standard ustar/pax formats. Regarding ACLs, GNU tar has a few test cases run at build time and post-build in our testing environment (so I'm pretty confident ACLs work in GNU tar).

(In reply to Pavel
> Could you provide the exustar equivalent in tar command.

The H=exustar option is mandatory in the case of star: it only picks an appropriate format for the created tarball, one that is able to store ACLs (otherwise the -acl option is just ignored). This isn't needed with GNU tar because --acls automatically turns the 'pax' format on. The 'exustar' format isn't a standardized thing, but feel free to look at the star(1) man page to see what is meant by that format.

Created attachment 1455061
tar writes
Here's a graph of the writes done by tar (from strace data), and the time they took.
Created attachment 1455062
star writes
Here's a graph of the writes done by star (from strace data), and the time they took.
Hi,

What is the basic difference between star and tar, and why do the two graphs look different? Why is the call time scattered for star compared with tar?

Our intention is to know whether the problem is due to a GFS2 bug, because we did not observe it on the version we have been using for the last 7 years. Why does the problem appear with the new GFS2 version? We would like to know whether it is safe to migrate the data to the new GFS2 file system. As per your analysis, it should perform faster with the lock_nolock option, but that is not happening; du -sh also takes a long time with lock_nolock, which is not supposed to happen because dlm and glocks are not involved.

Why does a restore with the star command take so much longer than with tar, only on the GFS2 file system, even though it is mounted with lock_nolock?

The difference is that star calls fsync for every file, which forces the file system to sync all its journal data. The tar tool does not, and lets the Linux page cache buffer things up in memory. The times in the graph are higher for star because the writes are waiting for journal writes. This is not a gfs2 bug.

We identified that tar is not able to restore special characters in file names and does not restore the access time stamps exactly, whereas both come through unchanged with star.

(In reply to Kumar from comment #22)
> We identified that tar could not able to restore the special characters in
> filenames and not able to get the access time stamps exactly and it is
> coming as it is with star.

Kumar, can you please submit new bugs for those issues, with more details?
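
The fsync explanation given above can be checked on a small sample by tracing both tools and counting the sync calls each one makes. This is a sketch under stated assumptions: `count_fsyncs` is a hypothetical helper, the trace file names are made up, and the strace invocations in the comments are only one way such a trace might be captured.

```shell
# Hypothetical helper: count fsync()/fdatasync() calls recorded in an strace
# output file (one syscall per line, optionally prefixed by a PID).
count_fsyncs() {
    grep -c -E '(^|[[:space:]])f(data)?sync\(' "$1"
}

# How the traces might be captured (archive name as used in the report; use a
# small sample archive, since strace slows the traced process considerably):
#   strace -f -e trace=fsync,fdatasync -o star.trace star -x -acl -f data.tar
#   strace -f -e trace=fsync,fdatasync -o tar.trace  tar --acls -xf data.tar
#   count_fsyncs star.trace
#   count_fsyncs tar.trace
```

If the explanation holds, the star count should be roughly one per extracted file and the tar count close to zero. The star(1) man page also documents a `-no-fsync` option that skips the per-file fsync. Whether that trade-off is acceptable for a backup restore is a separate judgment, and the option should be confirmed against the installed star version before relying on it.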