| Summary: | GCC Compilation error in 4 Node Distributed Native NFS Configuration for infiniband transport | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Sampath <sampath.kumar> |
| Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | urgent | ||
| Version: | nfs-beta | CC: | gluster-bugs, lakshmipathi |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Sampath
2010-04-26 06:51:27 UTC
Getting the following error when running make:

make[5]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/32/boehm-gc'
make[4]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/32/boehm-gc'
make[3]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/boehm-gc'
make[2]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/boehm-gc'
make[1]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/boehm-gc'
make[1]: Entering directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/libjava'
deps.mk:1: libltdl/ltdl.d: No such file or directory tr
make[1]: *** No rule to make target `libltdl/ltdl.d'. Stop.
make[1]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/libjava'
make: *** [all-target-libjava] Error 2

strace output of the make command is copied to the /share/tickets/856/ folder of dev.gluster.com

Sampath, do you get the same error when running with strace?

Getting the following gcc make error:

make[5]: *** [ltdl.lo] Error 1
make[5]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/32/libjava/libltdl'
make[4]: *** [all] Error 2
make[4]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/32/libjava/libltdl'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/32/libjava'
make[2]: *** [multi-do] Error 1
make[2]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/libjava'
make[1]: *** [all-multi] Error 2
make[1]: Leaving directory `/mnt/nfs-share/build/x86_64-unknown-linux-gnu/libjava'
make: *** [all-target-libjava] Error 2

strace output is copied to the /share/tickets/856 folder of dev.gluster.com

The gcc build has been verified to run over Ethernet. The problem occurs only over IB, and is most probably related to the NFS perf degradation over IB.

From Chida:

Compile on localdisk completes in 32 minutes. Compile on the 4 node distribute setup results in an error.

Run 1:
checking for strerror... yes
checking for unistd.h... (cached) yes
updating cache ./config.cache
creating ./config.status
creating Makefile
make[1]: Entering directory `/mnt/distribute/src/build/zlib'
make[1]: *** No rule to make target `adler32.*', needed by `libz.a'. Stop.
make[1]: Leaving directory `/mnt/distribute/src/build/zlib'
make: *** [all-zlib] Error 2
real 0m42.237s
user 0m14.662s
sys 0m9.793s
[root@client01 build]#

Run 2:
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
creating libtool
updating cache ./config.cache
configure: loading cache ./config.cache
checking how to run the C++ preprocessor... (cached) /mnt/synopsys/build/gcc/xgcc -shared-libgcc -B/mnt/synopsys/build/gcc/ -nostdinc++ -L/
configure: error: C++ preprocessor " /mnt/synopsys/build/gcc/xgcc -shared-libgcc -B/mnt/synopsys/build/gcc/ -nostdinc++ -L/mnt/synopsys/buil
See `config.log' for more details.
make: *** [configure-target-libstdc++-v3] Error 1
real 6m13.037s
user 4m57.355s
sys 0m26.728s

Run 3:
checking for time... yes
checking for ftime... /mnt/distribute/src/gcc-3.4.6/libjava/configure: line 6295: test: conftest.*: binary operator expected
no
checking for memmove... yes
checking for memcpy... /mnt/distribute/src/gcc-3.4.6/libjava/configure: line 6412: test: conftest.*: binary operator expected
no
configure: error: memcpy is required
make: *** [configure-target-libjava] Error 1
real 40m39.667s
user 18m19.311s
sys 9m16.740s
[root@client01 buils]#

This is the first time I am able to see a message that includes an error number:
[root@domU-12-31-39-0E-B1-16 gcc-3.4.6]# make >/dev/null
configure: WARNING:
*** Makeinfo is missing or too old.
*** Info documentation will not be built.
nm: conftest.o: Input/output error
nm: conftest.o: Input/output error
./configure: line 10583: test: too many arguments
checking for the document directory.
Links are now set up to build a native compiler for i686-pc-linux-gnu.
In file included from ./../include/xregex.h:26,
from regex.c:195:
./../include/xregex2.h:548: warning: ISO C90 does not support ‘static’ or type qualifiers in parameter array declarators
In file included from regex.c:649:
regex.c: In function ‘byte_compile_range’:
regex.c:4548: warning: signed and unsigned type in conditional expression
regex.c:4558: warning: signed and unsigned type in conditional expression
regex.c:4558: warning: signed and unsigned type in conditional expression
regex.c: In function ‘xregcomp’:
regex.c:8043: warning: signed and unsigned type in conditional expression
regex.c: In function ‘xregerror’:
regex.c:8178: warning: unused parameter ‘preg’
concat.c: In function ‘concat_length’:
concat.c:112: warning: traditional C rejects ISO C style function definitions
concat.c: In function ‘concat_copy’:
concat.c:127: warning: traditional C rejects ISO C style function definitions
concat.c: In function ‘concat_copy2’:
concat.c:146: warning: traditional C rejects ISO C style function definitions
concat.c: In function ‘concat’:
concat.c:157: warning: traditional C rejects ISO C style function definitions
concat.c: In function ‘reconcat’:
concat.c:194: warning: traditional C rejects ISO C style function definitions
make[1]: *** Makefile: Input/output error. Stop.
make: *** [all-gcc] Error 2
NFS client syslog gives the following messages:
call_verify: XDR representation not a multiple of 4 bytes: 0x756
call_verify: XDR representation not a multiple of 4 bytes: 0x756
call_verify: XDR representation not a multiple of 4 bytes: 0x756
call_verify: XDR representation not a multiple of 4 bytes: 0x756
call_verify: XDR representation not a multiple of 4 bytes: 0x949
call_verify: XDR representation not a multiple of 4 bytes: 0x949
call_verify: XDR representation not a multiple of 4 bytes: 0x949
call_verify: XDR representation not a multiple of 4 bytes: 0x949
call_verify: XDR representation not a multiple of 4 bytes: 0x949
call_verify: XDR representation not a multiple of 4 bytes: 0x949
call_verify: XDR representation not a multiple of 4 bytes: 0x75a
call_verify: XDR representation not a multiple of 4 bytes: 0x75a
call_verify: XDR representation not a multiple of 4 bytes: 0x75a
call_verify: XDR representation not a multiple of 4 bytes: 0x75a
call_verify: XDR representation not a multiple of 4 bytes: 0x989
call_verify: XDR representation not a multiple of 4 bytes: 0x989
call_verify: XDR representation not a multiple of 4 bytes: 0x989
call_verify: XDR representation not a multiple of 4 bytes: 0x8b5
call_verify: XDR representation not a multiple of 4 bytes: 0x8b5
call_verify: XDR representation not a multiple of 4 bytes: 0x8b5
call_verify: XDR representation not a multiple of 4 bytes: 0x8f9
call_verify: XDR representation not a multiple of 4 bytes: 0x8f9
call_verify: XDR representation not a multiple of 4 bytes: 0x8f9
call_verify: XDR representation not a multiple of 4 bytes: 0x8f9
call_verify: XDR representation not a multiple of 4 bytes: 0x997
call_verify: XDR representation not a multiple of 4 bytes: 0x997
call_verify: XDR representation not a multiple of 4 bytes: 0x997
From the linux kernel file net/sunrpc/clnt.c:
static __be32 *
call_verify(struct rpc_task *task)
{
	struct kvec *iov = &task->tk_rqstp->rq_rcv_buf.head[0];
	int len = task->tk_rqstp->rq_rcv_buf.len >> 2;
	__be32 *p = iov->iov_base;
	u32 n;
	int error = -EACCES;

	if ((task->tk_rqstp->rq_rcv_buf.len & 3) != 0) {
		/* RFC-1014 says that the representation of XDR data must be a
		 * multiple of four bytes
		 * - if it isn't pointer subtraction in the NFS client may give
		 * undefined results
		 */
		dprintk("RPC: %5u %s: XDR representation not a multiple of"
		       " 4 bytes: 0x%x\n", task->tk_pid, __FUNCTION__,
		       task->tk_rqstp->rq_rcv_buf.len);
		goto out_eio;
	}
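This check could be the reason for the EIO that applications receive through their system calls: a reply whose length is not a multiple of 4 bytes takes the out_eio path.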
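To make the check concrete, here is a minimal sketch that applies the same `len & 3` test to the reply lengths seen in the client syslog above. The helper names (`xdr_len_is_aligned`, `xdr_pad_bytes`, `report_lengths`) are hypothetical and exist only for this illustration; they are not kernel or GlusterFS functions. The padding count assumes the standard XDR rule of rounding up to the next multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same test as the kernel excerpt: a reply length is acceptable only
 * if it is a multiple of 4 bytes. (Hypothetical helper.) */
static int xdr_len_is_aligned(uint32_t len)
{
    return (len & 3u) == 0;
}

/* Padding bytes needed to round len up to the next multiple of 4.
 * (Hypothetical helper; assumes standard XDR 4-byte alignment.) */
static uint32_t xdr_pad_bytes(uint32_t len)
{
    return (4u - (len & 3u)) & 3u;
}

/* Prints a verdict for each length and returns how many of them
 * would fail the alignment test. */
static int report_lengths(const uint32_t *lens, size_t n)
{
    int unaligned = 0;

    assert(lens != NULL);
    for (size_t i = 0; i < n; i++) {
        int ok = xdr_len_is_aligned(lens[i]);

        if (!ok)
            unaligned++;
        printf("0x%x: %s, padding needed: %u\n",
               (unsigned)lens[i],
               ok ? "aligned" : "not a multiple of 4",
               (unsigned)xdr_pad_bytes(lens[i]));
    }
    return unaligned;
}
```

Calling `report_lengths` on the distinct logged values (0x756, 0x949, 0x75a, 0x989, 0x8b5, 0x8f9, 0x997) reports every one of them as unaligned, which is consistent with each of those replies being rejected by the client.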
Bugs that were identified during gcc building with releases before rc5 have been fixed in rc5. The gcc build now completes consistently on an FC8 AWS instance over a 4 node distribute setup, with and without all performance translators.

On the other hand, I've also tested gcc building on CentOS AWS instances, since the US cluster is running CentOS. This build fails consistently even over an Ethernet link, unlike what I had reported here earlier. The error messages reported during the CentOS failure point to some lines in an Ada source file which contains blank lines. I've verified that the contents of this file are the same on the FC8 instance. I believe the failure is due to an old Ada compiler on the CentOS distros, i.e. the gnat compiler. To test this theory, I deleted the blank lines from that particular file and re-ran the build. This time, it built that file correctly and failed on a different file, again reporting problems with blank lines in the Ada source.

This is corroborated by the difference in gnat versions:
FC8 - GNAT 4.1.2 20070925 (Red Hat 4.1.2-33)
CentOS - GNAT 4.1.1 20070105 (Red Hat 4.1.1-52)

Add to this the fact that the build on the CentOS machine fails not just on the NFS mount point but also on the local file system.

Sac/Chida, if there is nothing more to add, feel free to close this bug.

I see others also facing similar problems. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22413 and http://www.linuxquestions.org/questions/linux-server-73/multiple-gcc-versions-691302/

These look like setup/environment problems rather than filesystem-related ones. Please re-open if doubts remain.
Getting the following gcc compilation error on local disk:

s-traent.ads:61:01: (style) blank lines not allowed at end of file
a-exexda.adb:346:01: (style) multiple blank lines
a-exextr.adb:216:01: (style) multiple blank lines
make[1]: *** [ada/a-except.o] Error 1
make[1]: Leaving directory `/tmp/build/gcc'
make: *** [all-gcc] Error 2
real 2m20.460s
user 1m22.799s
sys 0m51.179s
[root@client7 build]# rpm -qa |grep gcc
libgcc-4.1.2-46.el5_4.2
gcc-c++-4.1.2-46.el5_4.2
gcc-4.1.2-46.el5_4.2
gcc-gnat-4.1.2-46.el5_4.2
libgcc-4.1.2-46.el5_4.2

If gcc-gnat is not installed, gcc compilation is successful on local disk.

Hi Sampath, the previous test is on the local disk of which distro? CentOS? On AWS or the US cluster?

For the record, the local disk build succeeds because, in the absence of the GNAT compiler, the configure scripts must be disabling the building of Ada sources.

Regression Test Info: RT is required to check against IO error/EIO received by applications. See comments 7 and 8.

Analysis: The problem is caused by NFSx returning unaligned read replies. In RPC, every message length needs to be aligned to a 4 byte boundary. For read requests that ask for data lengths not aligned to a 4 byte boundary, NFSx still needs to send replies with enough padding bytes to align the final RPC message length to 4 bytes. NFSx did not do that, hence the EIO returned by the NFS client to the application. The patch for this bug was submitted as part of bz 902.

Test case:
1. Use storage/posix as an NFS export.
2. Create a large file of, say, 1G.
3. In your test tool, create a collection of (offset, len) pairs. We'll be performing a read op for each one of these tuples. Either the offset or the len should be unaligned to a 4 byte boundary. Each len must be less than 64k; otherwise the NFS client will send requests larger than 64k as properly aligned 64k read requests.
4. Do a read from the file for each one of these (offset, len) pairs. The file must be re-opened for each read and the mount point must be re-mounted, to avoid cached data being returned from the NFS client cache. We want to force the server to return a reply.
5. Before starting the test, run the following command to enable logging from the kernel NFS client module.
$ echo "65535" > /proc/sys/sunrpc/nfs_debug

Patches submitted in comments 3 and 4 for bug 762634 are the fix for bug 762588. To reproduce 902, use the commit before all of the patches below. To reproduce 856, use the commit before the patches in comments 3 and 4.

Fixed. Verified with nfs-beta-rc9.
Regression test - http://test.gluster.com/show_bug.cgi?id=79 |
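Steps 3 and 4 of the test case could be sketched roughly as follows. This is only an illustration under stated assumptions, not the regression test that was actually used: the `read_req` struct and the functions `req_is_unaligned`, `make_requests`, and `read_once` are hypothetical names, the lengths are forced odd as one simple way to guarantee misalignment, and re-mounting the export between reads is left to the surrounding harness because re-opening the file alone does not bypass the NFS client cache.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_READ_LEN (64 * 1024)    /* each len must stay below 64k */

/* One (offset, len) tuple from step 3. (Hypothetical type.) */
struct read_req {
    off_t  offset;
    size_t len;
};

/* True if either the offset or the length is not a multiple of 4. */
static int req_is_unaligned(const struct read_req *r)
{
    return ((r->offset & 3) != 0) || ((r->len & 3) != 0);
}

/* Fills reqs[0..n-1] with pseudo-random requests that lie inside a
 * file of file_size bytes. Every length is odd and below 64k, so every
 * request is unaligned. */
static void make_requests(struct read_req *reqs, int n,
                          off_t file_size, unsigned seed)
{
    assert(file_size > MAX_READ_LEN);
    srand(seed);
    for (int i = 0; i < n; i++) {
        size_t len = 1 + (size_t)(rand() % (MAX_READ_LEN - 1));

        len |= 1;    /* odd length: never a multiple of 4 */
        reqs[i].len = len;
        reqs[i].offset = (off_t)(rand() % (file_size - MAX_READ_LEN));
    }
}

/* Step 4: open the file, issue a single read, and close it again.
 * Returns the number of bytes read, or -1 on error (errno is set by
 * open or pread, e.g. to EIO). */
static ssize_t read_once(const char *path, const struct read_req *r,
                         char *buf)
{
    ssize_t got;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    got = pread(fd, buf, r->len, r->offset);
    close(fd);
    return got;
}
```

A harness would call `make_requests` once, then loop over the requests, re-mounting the NFS export before each `read_once` call and treating a return value of -1 with errno set to EIO as a reproduction of this bug.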