| Summary: | nVidia driver installer fails when installing onto glusterfs root | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Gordan Bobic <gordan> |
| Component: | core | Assignee: | Lakshmipathi G <lakshmipathi> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 3.0.0 | CC: | amarts, avati, fharshav, gluster.bugs, gluster-bugs, gordan, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description

Gordan Bobic 2009-07-24 14:31:03 UTC

nVidia driver installer fails to install drivers correctly onto glusterfs (glusterfs as root fs).

Steps to reproduce:

Mount a glusterfs volume and copy the entire OS tree into it (I use CentOS 5.x). chroot to this glusterfs mount. Download the nVidia driver installer (http://www.nvidia.com/Download/index.aspx?lang=en-us) and run it (`bash ./NVIDIA....`).

The first error will be:

    ERROR: Unable to map destination file '/var/lib/nvidia/100' for copying (No such device).

After that, more errors will follow, and if they are all ignored, the resulting libraries it puts on the file system will be completely corrupted.

---

(In reply to comment #0)

Can you please confirm whether the same problem exists with GlusterFS v2.0.6?

---

**Gordan Bobic:** No change with 2.0.6; the problem still exists and manifests in the same way.

---

Can you confirm whether this behavior is observed in the 3.0.x releases as well?

---

**Gordan Bobic:** Unfortunately, I cannot get the shared root file system to work stably enough with 3.0.0pre1 to get as far as testing; 3.0.0pre1 suffers from a random-ish lock-up issue where all access to the file system blocks. I will try this with 3.0.x when 3.0.1 is released. In the meantime, I updated my cluster to RH kernel 2.6.18-164.9.1.el5 and glfs to 2.0.9, and the problem still occurs with that release.

---

**Gordan Bobic:** Verified that this issue still exists in the 3.0 branch, up to and including 3.0.2rc1.

---

**Avati:** (In reply to comment #6)

> Verified that this issue still exists in the 3.0 branch, up to and including 3.0.2rc1.

Just occurred to me: can you confirm whether the / mount of glusterfs is mounted with --disable-direct-io?

---

**Gordan Bobic:** Yes, always has been:

    /usr/sbin/glusterfs --log-level=NONE --log-file=/dev/null --disable-direct-io-mode --volfile=/etc/glusterfs.root/root2.vol /mnt/newroot

I'm not entirely sure what the nvidia installer does. It certainly does something weird, as if it were putting all the data through a file (/var/lib/nvidia/100), but it's not a socket. Whatever it does is clearly valid on ext file systems.

This root-on-glfs setup isn't a difficult environment to reproduce. Can you not build a test system like it so you can see it first hand?
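
---

The "(No such device)" in the installer error is errno ENODEV, which is what mmap(2) returns when the underlying filesystem does not support memory mapping; on kernels of that era, FUSE offered no shared writable mappings before 2.6.26, and none at all for files opened in direct-I/O mode. Below is a minimal sketch, assuming the installer copies files through a shared writable mapping; the thread does not confirm this, it is not the installer's actual code, and the destination path is only illustrative.

```c
/*
 * Sketch of an mmap-based copy (assumed, not the NVIDIA installer's
 * actual code): size the destination, map it shared and writable,
 * and write the payload through the mapping.  On a FUSE mount that
 * cannot provide such a mapping, mmap() fails with ENODEV, which
 * perror() reports as "No such device".
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int copy_via_mmap(const char *dst, const void *data, size_t len)
{
    int fd = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return -1; }

    if (ftruncate(fd, (off_t)len) < 0) {       /* size the destination file */
        perror("ftruncate");
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {                   /* ENODEV on the affected mounts */
        perror("mmap");
        close(fd);
        return -1;
    }

    memcpy(map, data, len);                    /* copy through the mapping */
    munmap(map, len);
    close(fd);
    return 0;
}

int main(void)
{
    const char payload[] = "test payload\n";
    /* illustrative destination; the installer's first failure was on this path */
    return copy_via_mmap("/var/lib/nvidia/100", payload, sizeof payload) ? 1 : 0;
}
```

On an ext root a copy like this succeeds; run inside the glusterfs-backed chroot on a pre-2.6.26 kernel, it would be expected to stop at the mmap() call with "No such device", matching the first installer error.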

---

(In reply to comment #8)

Tested this with the 3.0.2 release and kernel version 2.6.30, and the NVIDIA installation went fine, without issues. The setup was the same type of configuration as your volume files. This might be a problem with the older fuse (fuse 2.7.4glfs11).

    ---- snip nvidia installer log -----------------------------------------------
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    -> Installing both new and classic TLS OpenGL libraries.
    -> Installing classic TLS 32bit OpenGL libraries.
    -> Install NVIDIA's 32-bit compatibility OpenGL libraries? (Answer: Yes)
    -> Parsing log file:
    -> done.
    -> Validating previous installation:
    -> done.
    -> Uninstalling NVIDIA Accelerated Graphics Driver for Linux-x86_64 (1.0-19053 (190.53)):
    -> done.
    -> Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for Linux-x86_64 (190.53) is complete.
    -> Searching for conflicting X files:
    -> done.
    -> Searching for conflicting OpenGL files:
    -> done.
    ------------------------------------------------------------------------------

---

**Gordan Bobic:** I'm not using fuse 2.7.4glfs11 any more, and haven't done in a while, ever since I gave up on knfsd with glfs. I'm using the fuse from the latest RHEL5 kernel (2.6.18, but heavily patched; the fuse API is 7.10).

Are you saying that the direct-io option is ignored on 3.0.x releases?

---

(In reply to comment #10)

> I'm not using fuse 2.7.4glfs11 any more [...]. I'm using the fuse from the latest RHEL5 kernel (2.6.18, but heavily patched; the fuse API is 7.10).

Ok.

> Are you saying that the direct-io option is ignored on 3.0.x releases?

No, not at all. If you have a 2.6.26+ kernel then the option is of no use; on kernel versions lower than that it is useful. But I believe now that it's related to fuse itself. Will investigate with older kernels and let you know.

Please open a new bug if you still face the issue with the latest release (> 3.1.x).
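
---

One way to check whether a given kernel/glusterfs mount combination supports the mapping the installer appears to need, without running the installer itself, is to attempt a shared writable mmap() on a scratch file on that mount. The probe below is only a suggestion along those lines; /mnt/newroot is taken from the mount command quoted earlier in the thread, and the scratch file name is arbitrary. Build it with any C compiler and run it once against the glusterfs mount and once against a local ext path to compare.

```c
/*
 * mmap probe (suggested diagnostic, not from the bug thread): create a
 * scratch file on the mount under test and try to map it shared and
 * writable.  Failure with "No such device" (ENODEV) reproduces the
 * condition the NVIDIA installer trips over, without running it.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* default path is only an example; pass the mount to test as argv[1] */
    const char *path = argc > 1 ? argv[1] : "/mnt/newroot/mmap-probe.tmp";

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); close(fd); return 1; }

    void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        fprintf(stderr, "shared writable mmap failed: %s\n", strerror(errno));
        close(fd);
        unlink(path);
        return 1;
    }

    puts("shared writable mmap OK");
    munmap(map, 4096);
    close(fd);
    unlink(path);
    return 0;
}
```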