| Summary: | nVidia driver installer fails when installing onto glusterfs root | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Gordan Bobic <gordan> |
| Component: | core | Assignee: | Lakshmipathi G <lakshmipathi> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 3.0.0 | CC: | amarts, avati, fharshav, gluster.bugs, gluster-bugs, gordan, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description

Gordan Bobic 2009-07-24 14:31:03 UTC

nVidia driver installer fails to install drivers correctly onto glusterfs (glusterfs as root fs).

Steps to reproduce:

Mount a glusterfs volume and copy the entire OS tree into it (I use CentOS 5.x). chroot to this glusterfs mount. Download the nVidia driver installer (http://www.nvidia.com/Download/index.aspx?lang=en-us) and run it (`bash ./NVIDIA....`).

The first error will be:

    ERROR: Unable to map destination file '/var/lib/nvidia/100' for copying (No such device).

After that, more errors will follow, and if they are all ignored, the resulting libraries it puts on the file system will be completely corrupted.

---

(In reply to comment #0)

Can you please confirm whether the same problem exists with GlusterFS v2.0.6?

---

**Gordan Bobic:** No change with 2.0.6; the problem still exists and manifests in the same way.

---

Can you confirm whether this behavior is observed in the 3.0.x releases as well?

---

**Gordan Bobic:** Unfortunately, I cannot get the shared root file system to work stably enough with 3.0.0pre1 to get as far as testing; 3.0.0pre1 suffers from a random-ish lock-up issue where all access to the file system blocks. I will try this with 3.0.x when 3.0.1 is released. In the meantime, I updated my cluster to RH kernel 2.6.18-164.9.1.el5 and glfs to 2.0.9, and the problem still occurs with that release.

---

**Gordan Bobic:** Verified that this issue still exists in the 3.0 branch, up to and including 3.0.2rc1.

---

**Avati:** (In reply to comment #6)

> Verified that this issue still exists in the 3.0 branch, up to and including 3.0.2rc1.

Just occurred to me: can you confirm whether the / mount of glusterfs is mounted with --disable-direct-io?

---

**Gordan Bobic:** Yes, always has been:

    /usr/sbin/glusterfs --log-level=NONE --log-file=/dev/null --disable-direct-io-mode --volfile=/etc/glusterfs.root/root2.vol /mnt/newroot

I'm not entirely sure what the nvidia installer does. It certainly does something weird, as if it were putting all the data through a file (/var/lib/nvidia/100), but it's not a socket. Whatever it does is clearly valid on ext file systems.

This root-on-glfs setup isn't a difficult environment to reproduce. Can you not build a test system like it so you can see it first hand?
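
---

The "(No such device)" in the installer error is errno ENODEV, which is what mmap(2) returns when the underlying filesystem does not support memory mapping; on kernels of that era, FUSE offered no shared writable mappings before 2.6.26, and none at all for files opened in direct-I/O mode. Below is a minimal sketch, assuming the installer copies files through a shared writable mapping; the thread does not confirm this, it is not the installer's actual code, and the destination path is only illustrative.

```c
/*
 * Sketch of an mmap-based copy (assumed, not the NVIDIA installer's
 * actual code): size the destination, map it shared and writable,
 * and write the payload through the mapping.  On a FUSE mount that
 * cannot provide such a mapping, mmap() fails with ENODEV, which
 * perror() reports as "No such device".
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int copy_via_mmap(const char *dst, const void *data, size_t len)
{
    int fd = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return -1; }

    if (ftruncate(fd, (off_t)len) < 0) {       /* size the destination file */
        perror("ftruncate");
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {                   /* ENODEV on the affected mounts */
        perror("mmap");
        close(fd);
        return -1;
    }

    memcpy(map, data, len);                    /* copy through the mapping */
    munmap(map, len);
    close(fd);
    return 0;
}

int main(void)
{
    const char payload[] = "test payload\n";
    /* illustrative destination; the installer's first failure was on this path */
    return copy_via_mmap("/var/lib/nvidia/100", payload, sizeof payload) ? 1 : 0;
}
```

On an ext root a copy like this succeeds; run inside the glusterfs-backed chroot on a pre-2.6.26 kernel, it would be expected to stop at the mmap() call with "No such device", matching the first installer error.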

---

(In reply to comment #8)

Tested this with the 3.0.2 release and kernel version 2.6.30, and the NVIDIA installation went fine, without issues. The setup was the same type of configuration as your volume files. This might be a problem with the older fuse (fuse 2.7.4glfs11).

    ---- snip nvidia installer log -----------------------------------------------
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    nvidia 0000:02:00.0: setting latency timer to 64
    NVRM: loading NVIDIA UNIX x86_64 Kernel Module 190.53 Wed Dec 9 15:29:46 PST 2009
    -> Installing both new and classic TLS OpenGL libraries.
    -> Installing classic TLS 32bit OpenGL libraries.
    -> Install NVIDIA's 32-bit compatibility OpenGL libraries? (Answer: Yes)
    -> Parsing log file:
    -> done.
    -> Validating previous installation:
    -> done.
    -> Uninstalling NVIDIA Accelerated Graphics Driver for Linux-x86_64 (1.0-19053 (190.53)):
    -> done.
    -> Uninstallation of existing driver: NVIDIA Accelerated Graphics Driver for Linux-x86_64 (190.53) is complete.
    -> Searching for conflicting X files:
    -> done.
    -> Searching for conflicting OpenGL files:
    -> done.
    ------------------------------------------------------------------------------

---

**Gordan Bobic:** I'm not using fuse 2.7.4glfs11 any more, and haven't done in a while, ever since I gave up on knfsd with glfs. I'm using the fuse from the latest RHEL5 kernel (2.6.18, but heavily patched; the fuse API is 7.10).

Are you saying that the direct-io option is ignored on 3.0.x releases?

---

(In reply to comment #10)

> I'm not using fuse 2.7.4glfs11 any more [...]. I'm using the fuse from the latest RHEL5 kernel (2.6.18, but heavily patched; the fuse API is 7.10).

Ok.

> Are you saying that the direct-io option is ignored on 3.0.x releases?

No, not at all. If you have a 2.6.26+ kernel then the option is of no use; on kernel versions lower than that it is useful. But I believe now that it's related to fuse itself. Will investigate with older kernels and let you know.

Please open a new bug if you still face the issue with the latest release (> 3.1.x).
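
---

One way to check whether a given kernel/glusterfs mount combination supports the mapping the installer appears to need, without running the installer itself, is to attempt a shared writable mmap() on a scratch file on that mount. The probe below is only a suggestion along those lines; /mnt/newroot is taken from the mount command quoted earlier in the thread, and the scratch file name is arbitrary. Build it with any C compiler and run it once against the glusterfs mount and once against a local ext path to compare.

```c
/*
 * mmap probe (suggested diagnostic, not from the bug thread): create a
 * scratch file on the mount under test and try to map it shared and
 * writable.  Failure with "No such device" (ENODEV) reproduces the
 * condition the NVIDIA installer trips over, without running it.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* default path is only an example; pass the mount to test as argv[1] */
    const char *path = argc > 1 ? argv[1] : "/mnt/newroot/mmap-probe.tmp";

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); close(fd); return 1; }

    void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        fprintf(stderr, "shared writable mmap failed: %s\n", strerror(errno));
        close(fd);
        unlink(path);
        return 1;
    }

    puts("shared writable mmap OK");
    munmap(map, 4096);
    close(fd);
    unlink(path);
    return 0;
}
```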