Description of problem:
My Fedora Core 5 installation became outdated, so I installed the current Fedora 7 from scratch (not an upgrade). I use "fwlogwatch v1.1" (http://fwlogwatch.inside-security.de/) to analyze my ipfilter logs. I compiled "fwlogwatch" by just running make (-O2). The build went fine, but when analyzing my /var/log/messages file, the program dumps core as soon as it hits a line with "last message repeated x times" (parser.c). Sometimes it works (20:1). After changing -O2 to -O1 and recompiling, "fwlogwatch" analyzes my logs without a segmentation fault. I also put some printf's into the source for more debugging, and the core dump magically disappeared even with -O2! So, maybe there is something special with Fedora 7 and gcc?

Version-Release number of selected component (if applicable):
fwlogwatch v1.1
gcc-4.1.2-12: gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)

How reproducible:
Always, with
- Pentium 4, 2GHz, "pure install"
- Pentium 4, 3GHz, VMware

Steps to Reproduce:
1. wget http://www.kybs.de/boris/sw/fwlogwatch-1.1.tar.gz
2. tar xzf fwlogwatch-1.1.tar.gz
3. cd fwlogwatch-1.1
4. make
5. ./fwlogwatch -vvdNw -o fwlogwatch.htm /tmp/messages

Actual results:
$ fwlogwatch -vvdNw -o fwlogwatch.htm /tmp/messages
Opening input file '/tmp/messages'
Processing ..rSegmentation fault

. = PARSE_OK
r = "last message repeated x times"

Expected results:
$ fwlogwatch -vdNw -o fwlogwatch.htm /tmp/messages
Opening input file '/tmp/messages'
Processing ..r..
Closing '/tmp/messages'
Sorting data
Opening output file 'fwlogwatch.htm'
Closing 'fwlogwatch.htm'
Exiting

Additional info:
Program terminated with signal 11, Segmentation fault.
#0 0x0083533b in _IO_file_xsgetn_internal () from /lib/libc.so.6
(gdb) where
#0 0x0083533b in _IO_file_xsgetn_internal () from /lib/libc.so.6
#1 0x008373f8 in _IO_sgetn_internal () from /lib/libc.so.6
#2 0x0082b63e in fread () from /lib/libc.so.6
#3 0x0095efc7 in gzread () from /lib/libz.so.1
#4 0x0095f283 in gzgets () from /lib/libz.so.1
#5 0x08053ca1 in common_input_loop (linenum=0xbf8643a4, hitnum=0xbf8643a0, errnum=0xbf86439c, oldnum=0xbf864398, exnum=0xbf864394) at modes.c:89
#6 0x080547f8 in mode_summary () at modes.c:136
#7 0x08052d47 in main (argc=6, argv=0xbf864484) at main.c:462
---
The line number (linenum) gets passed to some functions. While monitoring linenum I saw that, right after the line "last message repeated...", linenum broke out of its "*linenum += 1;" sequence, i.e.:

First run:
[...]
linenum: 3178
linenum: 3179
linenum: 9323924
[...]

Second run:
[...]
linenum: 3178
linenum: 3179
linenum: -1074132267
[...]

I can't figure out how this happens, or why it did not happen with -O1. I attached a small /var/log/messages for testing purposes.
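One way a counter can jump from 3179 to an arbitrary value is a neighbouring buffer being written past its end. The sketch below models that in well-defined C. It is not the fwlogwatch code: struct parse_state, copy_unchecked, copy_checked and linenum_after are hypothetical names, and the struct only stands in for two locals that happen to sit next to each other on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a small line buffer with the counter stored
   directly behind it, standing in for two adjacent stack variables. */
struct parse_state {
    char buf[8];
    int  linenum;
};

/* Unchecked copy: writes len bytes starting at the first byte of the
   struct. The write stays inside the struct object, so the model itself
   is well defined, but every byte beyond sizeof st->buf lands in linenum. */
static void copy_unchecked(struct parse_state *st, const char *src, size_t len)
{
    assert(len <= sizeof *st);
    memcpy((unsigned char *)st, src, len);
}

/* Checked copy: truncates to the buffer size and always terminates
   the string, so linenum is never touched. */
static void copy_checked(struct parse_state *st, const char *src, size_t len)
{
    if (len >= sizeof st->buf)
        len = sizeof st->buf - 1;
    memcpy(st->buf, src, len);
    st->buf[len] = '\0';
}

/* Copies an overlong "log line" with the chosen function and returns
   the counter afterwards. */
static int linenum_after(int checked)
{
    struct parse_state st;
    char line[sizeof(struct parse_state)];

    memset(&st, 0, sizeof st);
    st.linenum = 3179;               /* last good value from the report */
    memset(line, 'A', sizeof line);  /* input longer than buf */
    if (checked)
        copy_checked(&st, line, sizeof line);
    else
        copy_unchecked(&st, line, sizeof line);
    return st.linenum;
}
```

Here linenum_after(1) keeps 3179, while linenum_after(0) returns a value made of the input bytes. In a real program the overrun would be undefined behavior, and which variable sits behind the buffer depends on the stack layout the compiler picks. That would be consistent with -O1 and -O2 behaving differently.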
Created attachment 172641: small /var/log/messages for testing purposes
A buggy program is far more probable than a compiler bug. If the program triggers undefined behavior somewhere, different optimization levels can change how the bug manifests. So please start by running the program under valgrind or some other memory debugger (ElectricFence, etc.), check for warnings, do a binary search to find the source file in which -O1 vs. -O2 matters, and try additional options like -O2 -fno-strict-aliasing in case e.g. the program violates the aliasing rules.
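For the aliasing case, a minimal sketch of what to look for. Nothing in this report confirms that fwlogwatch contains such code, and load_u32 and store_u32 are hypothetical helpers. Dereferencing a byte buffer through a cast like (uint32_t *)buf violates the aliasing rules, and -O2 is allowed to reorder or drop such accesses. Copying with memcpy is the well-defined alternative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an arbitrary position in a byte buffer.
   Instead of "*(const uint32_t *)buf", which breaks the aliasing rules
   and may be misaligned, the bytes are copied into a real uint32_t. */
static uint32_t load_u32(const unsigned char *buf)
{
    uint32_t v;

    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Writes a 32-bit value into a byte buffer the same way. */
static void store_u32(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    memcpy(buf, &v, sizeof v);
}
```

A value written with store_u32 and read back with load_u32 from the same offset round-trips unchanged at any optimization level, even at an odd offset. If building with -O2 -fno-strict-aliasing makes a crash disappear, casts of the first kind are a likely place to look.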
You are right. There was a bug...