Bug 1294072

Summary: The MathGL library does not work at all.
Product: [Fedora] Fedora Reporter: Mikhail Lozhnikov <lozhnikovma>
Component: mathglAssignee: Dmitrij S. Kryzhevich <kryzhev>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: unspecified    
Version: 23CC: atu, kryzhev, lozhnikovma, mycae
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: mathgl-2.3.3-7.fc23 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-14 08:54:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
A very simple program
none
Patch that makes fonts architecture-independent none

Description Mikhail Lozhnikov 2015-12-24 12:17:48 UTC
Created attachment 1109199 [details]
A very simple program

Description of problem:
All programs compiled with g++ and linked with -lmgl crash.

Version-Release number of selected component (if applicable):
mathgl-mpich-devel-2.3.3-5.fc23.x86_64
mathgl-devel-2.3.3-5.fc23.x86_64
mathgl-mpich-2.3.3-5.fc23.x86_64
mathgl-common-2.3.3-5.fc23.noarch
mathgl-2.3.3-5.fc23.x86_64

How reproducible:
Write any c++ program with mathgl.

Steps to Reproduce:
1. Write any program in C++ language (or use helloworld.cpp in the attachment).
2. Compile it with g++ -lmgl and run.
[mikhail@lozhnikovhl ~]$ g++ helloworld.cpp -lmgl
[mikhail@lozhnikovhl ~]$ ./a.out 
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Additional info:
The version 2.3-10.fc23 works fine. The version 2.3.3-4 also does not work.
Tha last mathgl package in Fedora 22 has the same problem.

Comment 1 Mikhail Lozhnikov 2015-12-31 09:29:21 UTC
I rebuild the library from mathgl-2.3.3-5.fc23.src.rpm on my local computer and it works fine. Maybe I'll test the binary package on a clean system on a virtual machine.

Comment 2 Anton Guda 2016-01-01 11:22:00 UTC
It seems to be bug in binary font loading.
How to check: replace /usr/share/mathgl/fonts/*.vfmb
with *.vfm analogues from source package - all works!

P.S. It may be better to all fonts, not only minimal set.

Comment 3 Mikhail Lozhnikov 2016-01-02 08:42:23 UTC
Yes, the fonts are broken. I install mathgl-debuginfo package and try to use gdb. Here is the result:

(gdb) where
#0  0x00007ffff6c9ca98 in raise () from /lib64/libc.so.6
#1  0x00007ffff6c9e69a in abort () from /lib64/libc.so.6
#2  0x00007ffff75d1aed in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007ffff75cf936 in ?? () from /lib64/libstdc++.so.6
#4  0x00007ffff75cf981 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007ffff75cfb99 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x00007ffff75d013c in operator new(unsigned long) () from /lib64/libstdc++.so.6
#7  0x00007ffff75d0199 in operator new[](unsigned long) () from /lib64/libstdc++.so.6
#8  0x00007ffff79ff75f in mglFont::LoadBin (this=this@entry=0x7ffff7dda1c0 <mglDefFont>, 
    base=base@entry=0x7ffff7ae8ae6 "STIX", path=path@entry=0x7ffff7ae8a8b "/usr/share/mathgl/fonts")
    at /usr/src/debug/mathgl-2.3.3/src/font.cpp:709
#9  0x00007ffff7a00951 in mglFont::Load (this=this@entry=0x7ffff7dda1c0 <mglDefFont>, base=0x7ffff7ae8ae6 "STIX", 
    path=0x7ffff7ae8a8b "/usr/share/mathgl/fonts") at /usr/src/debug/mathgl-2.3.3/src/font.cpp:744
#10 0x00007ffff7a016a3 in mglFont::mglFont (this=0x7ffff7dda1c0 <mglDefFont>, name=<optimized out>, path=<optimized out>)
    at /usr/src/debug/mathgl-2.3.3/src/font.cpp:895
#11 0x00007ffff7908254 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at /usr/src/debug/mathgl-2.3.3/src/font.cpp:37
#12 _GLOBAL__sub_I_font.cpp(void) () at /usr/src/debug/mathgl-2.3.3/src/font.cpp:919
#13 0x00007ffff7deb75a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#14 0x00007ffff7deb86b in _dl_init () from /lib64/ld-linux-x86-64.so.2
#15 0x00007ffff7ddccba in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#16 0x0000000000000001 in ?? ()
#17 0x00007fffffffe1d6 in ?? ()
#18 0x0000000000000000 in ?? ()


If you look at the source code of the library you can find this function:
bool mglFont::LoadBin(const char *base, const char *path)
{
	Clear();	// first clear old
	if(!path)	path = MGL_FONT_PATH;
	char str[256], sep='/';
	snprintf(str,256,"%s%c%s.vfmb",path,sep,base?base:"");	str[255]=0;
	FILE *fp = fopen(str,"rb");		if(!fp)	return false;
	size_t s, len;
	bool res = true;
	s = fread(&numb,sizeof(long),1,fp);
	if(s<1)	res = false;
	s = fread(fact,sizeof(float),4,fp);
	if(s<4)	res = false;
	Buf = new short[numb];
	s = fread(Buf,sizeof(short),numb,fp);
	if(s<size_t(numb))	res = false;
	s = fread(&len,sizeof(size_t),1,fp);
	if(s<1)	res = false;
	if(res)
	{
		glyphs.clear();	glyphs.resize(len);
		s = fread(&(glyphs[0]),sizeof(mglGlyphDescr),len,fp);
		if(s<len)	res = false;
	}
//	if(!res)	Clear();
	fclose(fp);		return res;
}


So, GDB tells that the error occurs at string "Buf = new short[numb];".
It can happen if numb is < 0 or if numb is too much. numb has a type long. And if I am not mistaken the size of long depends on the architecture. For example the size of long on my computer is 8 bytes (x86_64 architecture).

Finally I wrote a test program, that reads the header of MGL font files. Here is the program:

#include <stdio.h>
#include <iostream>

int main(int argc,char *argv[])
{
	FILE *fp;
	long numb;
	size_t s,len;
	float fact[4];
	
	if(argc < 2)
	{
		printf("Usage: ./a.out <MathGL binary font file>\n");
		return 1;
	}
	if((fp = fopen(argv[1],"rb")) == NULL)
	{
		printf("Can not open file \'%s\'!\n",argv[1]);
		return 2;
	}

	s = fread(&numb,sizeof(long),1,fp);
	if(s<1)	
	{
		printf("Can not read numb!\n");
		return 3;
	}
	std::cout << "numb=" << numb << std::endl;
	std::cout << "The size of numb on the architecture of my computer is " << sizeof(long) << " bytes" << std::endl;
	s = fread(fact,sizeof(float),4,fp);
	if(s<4)
	{
		printf("Can not read fact!\n");
		return 4;
	}
	if(numb <= 0)
	{
		printf("ERROR!!!! numb = %ld < 0 !!!\n",numb);
		return 5;
	}
	fclose(fp);
	return 0;
}


And here is the output of the program on MGL font file:
[mikhail@lozhnikovhl mathgl]$ ./a.out /usr/share/mathgl/fonts/STIX.vfmb 
numb=4970285164529286654
The size of numb on the architecture of my computer is 8 bytes

Here is the output of the program on MGL font file from the package that I rebuild on my local computer:
[mikhail@lozhnikovhl mathgl]$ ./a.out correct_fonts/STIX.vfmb 
numb=2719230
The size of numb on the architecture of my computer is 8 bytes


My conclusions:
1. I suppose that the package that contains the fonts (mathgl-common-2.3.3-5.fc23.noarch.rpm) was built on a 32-bit architecture where long has size 4 bytes.
2. I think that ".noarch" package should not contain architecture-dependent files such as binary fonts.
3. The code of MathGL's LoadBin is very strange.
a) It uses fread for reading 'long' datatype. It is better to use datatypes such as int32_t or something else because it has a fixed size on every architecture.
b) It uses fread for reading float!!! If I am not mistaken the format of float may be differ even on different processors of x86_64 family! And it is better to store separately the mantissa and the exponent. For example you can use int64_t for the mantissa and uint8_t for the exponent and you will get sometheng like 'double' datatype. Or it is better to use text format and fscanf function.

Sorry for such a long message, I didn't find how to add an attachment.

Comment 4 Mikhail Lozhnikov 2016-01-02 15:55:55 UTC
Created attachment 1111052 [details]
Patch that makes fonts architecture-independent

Comment 5 Mikhail Lozhnikov 2016-01-02 16:10:49 UTC
I think I've made a patch that makes fonts architecture-independent (see in the attachments).

I replace functions LoadBin and SaveBin from src/font.cpp
The 'float' datatype is represented by 32-bits mantissa and 8-bits exponent.
All integers are stored in network byte order format. The sign of a signed integer is stored in uint8_t variable.
Instead of writing the array of mglGlyphDescr structure using fwrite, I write all elements separetely because the elements of the structure can be aligned in memory and sizeof(mglGlyphDescr) may be different on various architectures and compilers.

Maybe this solution works slow because it uses a lot of fread but if I am not mistaken the library loads fonts at init so it happens only one time. Anyway, it is easy to create an array of uint8_t and reduce fread and fwrite count.

What do you think about it?

Comment 6 mycae 2016-01-03 12:26:26 UTC
Hi Mikhail,

I know I dont keep an eye on the package very much any more, and thankyou for looking into this.  

I'm not sure it is necessary to create custom mantissa reading/writing routines to this level of detail. I'm fairly certain under all amd64/x86_64 you can  simply use fwrite on individual floats (obviously not on the structures due to alignment concerns), even between hosts. To my knowledge this works (and I do this quite a bit) reliably as most hosts assume that sizeof(float) == 4. I normally check this at configure time

Float should be IEEE 754 floating point, stored in little-endian format in memory for x86-64. The processor itself may store the data internally differently, but AFAIK (needs citation), needs to present this format for the x86_64 instruction set. For arm, which is a supported fedora arch, the endian-ness is not fixed, so host-to-network is required.

The only requirement should be to detect the endian-ness and perform a byte-order swap, eg via these
https://www.gnu.org/software/libc/manual/html_node/Byte-Order.html

This should, to the best of my knowledge dump a 4-byte float in network-order if used correctly. I think custom reader/writer routines are making the patch more complex than it needs to be, and doing a lot of unneeded computation. 

Indeed, font loading in MGL is a definite  pain point, as it consumes most of my programs' loading times.  Calling exponential functions in order to rebuild the float seems slow. Are we certain this is required?

Here the serialisation is performed by shifting/masking operations, which seem a lot faster.
https://stackoverflow.com/questions/10620601/portable-serialisation-of-ieee754-floating-point-values

Also, we would hopefully push this into mathgl itself, as other distributions will be affected.

Comment 7 Dmitrij S. Kryzhevich 2016-01-04 04:14:45 UTC
Hi!

You are too fast, sorry (and there a week New Year vocations here).

There is not a general issue as in most cases it is built and use in the same surround. So. I think I just make font package arch dependent what should cheaply fix the issue.

Comment 8 Anton Guda 2016-01-04 10:16:12 UTC
2.3.3-7.fc24.x86_64 works for me!

Comment 9 Mikhail Lozhnikov 2016-01-04 12:31:34 UTC
2.3.3-7.fc23.x86_64 works fine. Thank you!

Comment 10 Fedora Update System 2016-01-05 04:36:30 UTC
mathgl-2.3.3-7.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2016-bf00a9534b

Comment 11 Fedora Update System 2016-01-06 00:29:22 UTC
mathgl-2.3.3-7.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-bf00a9534b

Comment 12 Fedora Update System 2016-01-14 08:54:26 UTC
mathgl-2.3.3-7.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.