Bug 1328076

Summary: Build libhdfs for all architectures
Product: [Fedora] Fedora Reporter: Mattias Ellert <mattias.ellert>
Component: hadoopAssignee: Christopher Tubbs <ctubbsii>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 24CC: besser82, coolsvap, dan, fweimer, jwakely, moceap, pbrobinson, rrati
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: hadoop-2.4.1-24.fc25 hadoop-2.4.1-24.fc24 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-07 23:24:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 238953, 245418, 485231, 922257, 1051573, 1242747    
Attachments:
Description Flags
Patch against the Fedora git enabling libhdfs for all architectures.
none
This is the new version of the hadoop-dlopen-libjvm.patch if you prefer it this way.
none
New version with function type cast instead of (**void) defreference none

Description Mattias Ellert 2016-04-18 11:44:00 UTC
Created attachment 1148179 [details]
Patch against the Fedora git enabling libhdfs for all architectures.

Description of problem:

The libhdfs package is currently only built for ix86 and x86_64. Please consider providing this package for all architectures.

A proposed patch is attached.

Version-Release number of selected component (if applicable):

hadoop-2.4.1-14.fc24

How reproducible:

Always

Additional info:

In order to build hadoop I had to modify the spec file slightly.

1) The /etc/tomcat/log4j.properties file no longer exists, so the build failed at the point where it tried to copy this file into the build tree. I have removed this copy command in the patch.

2) Patch 11 in the spec file (hadoop-2.4.1-cmake-java-ppc64le.patch) for some reason changed the _java_libarch variable on the ppc64le architecture from the correct ppc64le to the incorrect ppc64, causing the package build to fail on this architecture. Dropping this patch made the build succeed.

Scratch builds:

ix86, x86_64, arm:
https://koji.fedoraproject.org/koji/taskinfo?taskID=13696851

ppc64, ppc64le:
https://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=3307525

s390, s390x:
https://s390.koji.fedoraproject.org/koji/taskinfo?taskID=2188109

Comment 1 Mattias Ellert 2016-04-18 11:46:18 UTC
Created attachment 1148180 [details]
This is the new version of the hadoop-dlopen-libjvm.patch if you prefer it this way.

Comment 2 Mattias Ellert 2016-04-19 12:19:21 UTC
Scratch build aarch64:
https://arm.koji.fedoraproject.org/koji/taskinfo?taskID=3508590

Comment 3 Fedora Admin XMLRPC Client 2016-06-01 19:42:17 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 4 Fedora Admin XMLRPC Client 2016-06-08 21:26:57 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 5 Christopher Tubbs 2016-06-23 17:43:43 UTC
I'm not familiar enough with JNI development to properly vet this code. I'll apply the change if somebody can file a JIRA with the upstream community proposing the patch, and getting them to sign off on it.

Comment 6 Mattias Ellert 2016-06-25 06:08:55 UTC
You do realize that this is a proposed change to the patch that was added during review to make the package comply to the packaging guidelines. Hence the upstream is the Fedora package itself and the package maintainers, not the hadoop developers.

Comment 7 Christopher Tubbs 2016-06-25 06:26:26 UTC
(In reply to Mattias Ellert from comment #6)
> You do realize that this is a proposed change to the patch that was added
> during review to make the package comply to the packaging guidelines. Hence
> the upstream is the Fedora package itself and the package maintainers, not
> the hadoop developers.

Yes, but, I do not feel comfortable about applying a patch when I don't understand what it does. I'd personally feel more comfortable about applying it if somebody in upstream Hadoop can verify it's an appropriate modification to the build of their native libraries to support an additional architecture.

Further, if it's suitable for extending support for an additional architecture within Fedora, I'd think that the Hadoop developers would be willing to apply it upstream, to reduce the number of Fedora-specific patches we need to support.

If a proven packager wishes to apply it, that's fine... but without Apache Hadoop developer backing, I just don't have enough expertise to feel comfortable applying it.

Comment 8 Florian Weimer 2016-06-30 08:21:58 UTC
(In reply to Mattias Ellert from comment #1)
> Created attachment 1148180 [details]
> This is the new version of the hadoop-dlopen-libjvm.patch if you prefer it
> this way.

These lines from the patch

    *(void **)(&getCreatedJavaJVMsPtr) = dlsym(jvmHandle, "JNI_GetCreatedJavaVMs");

        *(void **)(&createJavaVMPtr) = dlsym(jvmHandle, "JNI_CreateJavaVM");

introduce an aliasing violations.  Please use a function pointer cast instead.

Comment 9 Jonathan Wakely 2016-06-30 08:22:31 UTC
The patch has:

     char libjvmPath[strlen(javaHome)+35];
-    snprintf(libjvmPath, sizeof(libjvmPath), "%s/jre/lib/amd64/server/libjvm.so", javaHome);
+    snprintf(libjvmPath, sizeof(libjvmPath), "%s/jre/lib/%s/server/libjvm.so", javaHome, arch);

This will overflow the buffer when strlen(arch) > strlen("amd64") which is true for several arches.

Comment 10 Mattias Ellert 2016-06-30 15:35:43 UTC
(In reply to Florian Weimer from comment #8)
> These lines from the patch
> 
>     *(void **)(&getCreatedJavaJVMsPtr) = dlsym(jvmHandle,
> "JNI_GetCreatedJavaVMs");
> 
>         *(void **)(&createJavaVMPtr) = dlsym(jvmHandle, "JNI_CreateJavaVM");
> 
> introduce an aliasing violations.  Please use a function pointer cast
> instead.

The way the cast of the return value of dlsym is done is unchanged from the existing patch, and not part of my proposed change. As far as I can tell the code as is has not caused problems - it has been in use since the hadoop package was first introduced in Fedora (though only for ix86 and x86_64), and there are no warnings about aliasing violations in the koji build logs.

(In reply to Jonathan Wakely from comment #9)
> The patch has:
> 
>      char libjvmPath[strlen(javaHome)+35];
> -    snprintf(libjvmPath, sizeof(libjvmPath),
> "%s/jre/lib/amd64/server/libjvm.so", javaHome);
> +    snprintf(libjvmPath, sizeof(libjvmPath),
> "%s/jre/lib/%s/server/libjvm.so", javaHome, arch);
> 
> This will overflow the buffer when strlen(arch) > strlen("amd64") which is
> true for several arches.

The longest name is "mips64el". The length of the string for this architecture is:

printf '/jre/lib/mips64el/server/libjvm.so' | wc -c
34

So 35 should be enough for the string + the final '\0' at the end for all architectures.

Comment 11 Jonathan Wakely 2016-07-01 10:11:49 UTC
(In reply to Mattias Ellert from comment #10)
> (In reply to Florian Weimer from comment #8)
> > These lines from the patch
> > 
> >     *(void **)(&getCreatedJavaJVMsPtr) = dlsym(jvmHandle,
> > "JNI_GetCreatedJavaVMs");
> > 
> >         *(void **)(&createJavaVMPtr) = dlsym(jvmHandle, "JNI_CreateJavaVM");
> > 
> > introduce an aliasing violations.  Please use a function pointer cast
> > instead.
> 
> The way the cast of the return value of dlsym is done is unchanged from the
> existing patch, and not part of my proposed change. As far as I can tell the
> code as is has not caused problems - it has been in use since the hadoop
> package was first introduced in Fedora (though only for ix86 and x86_64),
> and there are no warnings about aliasing violations in the koji build logs.

It's still a bug.

> (In reply to Jonathan Wakely from comment #9)
> > The patch has:
> > 
> >      char libjvmPath[strlen(javaHome)+35];
> > -    snprintf(libjvmPath, sizeof(libjvmPath),
> > "%s/jre/lib/amd64/server/libjvm.so", javaHome);
> > +    snprintf(libjvmPath, sizeof(libjvmPath),
> > "%s/jre/lib/%s/server/libjvm.so", javaHome, arch);
> > 
> > This will overflow the buffer when strlen(arch) > strlen("amd64") which is
> > true for several arches.
> 
> The longest name is "mips64el". The length of the string for this
> architecture is:
> 
> printf '/jre/lib/mips64el/server/libjvm.so' | wc -c
> 34
> 
> So 35 should be enough for the string + the final '\0' at the end for all
> architectures.

I miscounted, sorry.

Comment 12 Mattias Ellert 2016-07-05 09:39:55 UTC
(In reply to Jonathan Wakely from comment #11)
> (In reply to Mattias Ellert from comment #10)
> > (In reply to Florian Weimer from comment #8)
> > > These lines from the patch
> > > 
> > >     *(void **)(&getCreatedJavaJVMsPtr) = dlsym(jvmHandle,
> > > "JNI_GetCreatedJavaVMs");
> > > 
> > >         *(void **)(&createJavaVMPtr) = dlsym(jvmHandle, "JNI_CreateJavaVM");
> > > 
> > > introduce an aliasing violations.  Please use a function pointer cast
> > > instead.
> > 
> > The way the cast of the return value of dlsym is done is unchanged from the
> > existing patch, and not part of my proposed change. As far as I can tell the
> > code as is has not caused problems - it has been in use since the hadoop
> > package was first introduced in Fedora (though only for ix86 and x86_64),
> > and there are no warnings about aliasing violations in the koji build logs.
> 
> It's still a bug.

Possibly, but it is completely orthogonal to the current issue. The cast you complain about is not introduced by the proposed change, and the proposed change does not claim that it is trying to address the issue of the cast - so it is irrelevant to the question whether or not the proposed change should be accepted or not.

If you believe the cast issue is important you should file a new bug report about it, not hijack this one.

This is like when you get a -1 karma on an update because it doesn't address a preexisting bug it was not claiming to fix. A -1 is valid if a new bug is introduced or if a bug it claims it is fixing is not fixed.

Comment 13 Mattias Ellert 2016-07-20 08:48:38 UTC
Created attachment 1181965 [details]
New version with function type cast instead of (**void) defreference

Comment 14 Mattias Ellert 2016-10-27 07:51:56 UTC
Hi!

I would like to return to this issue.

In my opinion not supporting all architectures is a bad thing.

I agree that getting a package to work on all architectures is not necessarily a task the package maintainer must perform, even though I think he should at least try. If he deems the task too complicated or time consuming, falling back to ExcludeArch or not building some subpackages on some architectures can be warranted.

However, if someone offers to help and provides a patch to fix the issues, refusing to accept the help is bad maintainership.

Supporting only i686 and x86_64 might have been not too bad at the time these were the only primary architectures. Today this is no longer the case, and arm is primary too. Also aarch64 has recently been moved into to the main koji instance. So it really is a pity the libhdfs is only there for i686 and x86_64.

The patch in the attachment still works as a drop in replacement for the existing patch, and the package builds fine for F24, F25 and rawhide (F26):

Recent scratch builds:
F24: http://koji.fedoraproject.org/koji/taskinfo?taskID=16194154
F25: http://koji.fedoraproject.org/koji/taskinfo?taskID=16194150
F26: http://koji.fedoraproject.org/koji/taskinfo?taskID=16194145

Comment 15 Christopher Tubbs 2016-10-27 22:14:53 UTC
Mattias Ellert:

1. I agree that not supporting all arches is bad.
2. I really don't like maintaining the conditionals in the spec file and would love to get rid of them.
3. I am not refusing the help. I welcome the help.

All I've said is that I lack the expertise to review the contribution, and would feel more comfortable with somebody from the upstream community, or a proven packager, having reviewed it first.

I will try to reach out to the upstream community to get their take on the patch, but if you wish to do that, please feel free to link the mailing list or JIRA discussion here.

Do you know if upstream is willing to accept the patch to support more architectures? Has any attempt been made to try to contribute the patch to upstream Hadoop first?

In the absence of external review, I may be able to get to the patch, but it's going to take me a bit of time. I've recently solicited help from a colleague, who may have more expertise than myself. I will document here if I make progress.

I apologize if it seems I'm being unwelcoming to a contributor. I really *want* to apply the patch. I just don't have sufficient expertise with the native code stuffs. Please be patient, or solicit review from upstream Hadoop to accelerate things.

Comment 16 Christopher Tubbs 2016-10-28 03:25:26 UTC
Okay, I spent a few hours trying to grok this patch, and I think I understand it sufficiently to have some confidence in applying it.

Comment 17 Fedora Update System 2016-10-30 20:21:29 UTC
hadoop-2.4.1-24.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-362f86628d

Comment 18 Fedora Update System 2016-10-31 10:21:49 UTC
hadoop-2.4.1-24.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-93287ad2de

Comment 19 Fedora Update System 2016-11-03 18:22:30 UTC
hadoop-2.4.1-24.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2016-11-07 23:24:46 UTC
hadoop-2.4.1-24.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 21 Fedora Update System 2016-11-19 21:03:54 UTC
hadoop-2.4.1-24.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.