Bug 218249 - [msgget] RHEL5 LSB failures analysis
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Brian Maly
QA Contact: Brian Brock
Duplicates: 222845
Blocks: 218245 425461
Reported: 2006-12-03 23:56 EST by Lawrence Lim
Modified: 2014-03-25 20:54 EDT (History)
Doc Type: Bug Fix
Last Closed: 2007-11-08 13:22:56 EST

Attachments
A more basic test to exhibit this problem (737 bytes, text/x-csrc)
2007-01-11 12:51 EST, Brian Maly

Description Lawrence Lim 2006-12-03 23:56:09 EST
Description of problem:
As discussed earlier, I have broken the failures down further to save you
time. Please let me know whether this is a test suite issue. Once confirmed,
I will bring your analysis to the FSG for TSD (Test Suite Deficiency) status
so that we can move ahead with the certification.

Thanks.

Version-Release number of selected component (if applicable):
glibc-2.5-5

How reproducible:
Always

Steps to Reproduce:
hostname: phantom.brisbane.redhat.com
User: root
Passwd: qa4i18n

User: vsx0
Passwd: vsx0VSX0

Source:
tset/LSB.os/ipc/msgget/msgget.c
tset/PTHR.os/procenv/ttyname_r/ttyname_r.c

Recompile:
    tcc -p -b -s $HOME/scen.bld $*

Execute:
    tcc -e -l /tset/POSIX.os/files/stat_X/T.stat_X

    tcc -e -l /tset/POSIX.os/files/stat_X/T.stat_X{1}
    (one test case only)
  
Actual results:
/tset/LSB.os/ipc/msgget/T.msgget 2 FAIL
/tset/LSB.os/ipc/msgget/T.msgget 3 FAIL
/tset/PTHR.os/procenv/ttyname_r/T.ttyname_r 2 FAIL

Expected results:


Additional info:
Comment 1 Jakub Jelinek 2006-12-04 06:31:38 EST
Simplified testcase:
#include <sys/ipc.h>
#include <sys/msg.h>
#include <time.h>
#include <stdio.h>

int
main (void)
{
  for (;;)
    {
      int msgqid;
      struct msqid_ds buf;
      time_t time_before, time_after;

      time_before = time ((time_t *) 0);

      if ((msgqid = msgget (IPC_PRIVATE, 0666)) == -1)
        {
          perror ("msgget");
          return 1;
        }

      time_after = time ((time_t *) 0);

      if (msgctl (msgqid, IPC_STAT, &buf) == -1)
        {
          perror ("msgctl");
          return 1;
        }

      if (buf.msg_ctime < time_before || buf.msg_ctime > time_after)
        {
          fprintf (stderr, "msg_ctime %ld time_before %ld time_after %ld\n",
                   buf.msg_ctime, time_before, time_after);
          return 1;
        }
      msgctl (msgqid, IPC_RMID, NULL);
    }
  return 0;
}

This fails fairly quickly on the above-mentioned host with the
2.6.18-1.2747.el5xen kernel, i686 arch.  I couldn't reproduce this on
FC6 x86-64, most likely because time(2) on x86-64 doesn't involve a syscall.

POSIX says here:
http://www.opengroup.org/onlinepubs/009695399/functions/msgget.html
* msg_ctime shall be set equal to the current time.
Whether the above is a POSIX violation or just an overly strict test depends
on the precision the standard mandates for the "current time".

q_ctime on msg queue creation (which is copied to msg_ctime on IPC_STAT) is
initialized (as are plenty of other places in the kernel) with:
static inline unsigned long get_seconds(void)
{
        return xtime.tv_sec;
}
sys_time, on the other hand, uses do_gettimeofday:
void do_gettimeofday (struct timeval *tv)
{
        unsigned long seq, nsec, usec, sec, offset;
        do {
                seq = read_seqbegin(&xtime_lock);
                offset = time_interpolator_get_offset();
                sec = xtime.tv_sec;
                nsec = xtime.tv_nsec;
        } while (unlikely(read_seqretry(&xtime_lock, seq)));

        usec = (nsec + offset) / 1000;

        while (unlikely(usec >= USEC_PER_SEC)) {
                usec -= USEC_PER_SEC;
                ++sec;
        }

        tv->tv_sec = sec;
        tv->tv_usec = usec;
}
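This explains why the value time returns can be 1 greater than the value
assigned by a subsequent get_seconds: do_gettimeofday adds the interpolated
offset since the last tick and carries any overflow into the seconds, while
get_seconds returns only xtime.tv_sec as of the last tick.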

The ttyname_r failure sounds like a glibc bug; I guess we want to clone this
bug for it.
Comment 2 Lawrence Lim 2006-12-04 08:30:41 EST
Please refer to Release Criteria Section 15.C

15. RHEL Certifications-> C. LSB 3.1 Certification

<http://intranet.corp.redhat.com/ic/intranet/RHEL500ReleaseCriteria#RHELCertifications>
Comment 8 Peter Martuccelli 2006-12-15 14:21:19 EST
Brian, let's get the msgget test running on some test systems here in Westford
to see which archs reproduce the problem and which clocks are being used.
Let's also check whether upstream fails the same way.
Comment 9 Brian Maly 2006-12-20 14:38:55 EST
So the FC6 i686 kernel seems to work fine (tested on 2 different systems) with
the test code in Comment #1. I wasn't able to test the Xen kernel because both
FC6 systems fail to boot the Xen kernel (which is a much bigger problem than
this issue).

Can anyone give me access to a system that runs a Xen kernel so I can do some
testing? If not, I'll follow up with some people in the test group and try to
reproduce the issue...
Comment 10 Lawrence Lim 2006-12-20 18:11:15 EST
Steps to Reproduce:
hostname: phantom.brisbane.redhat.com
User: root
Passwd: qa4i18n

User: vsx0
Passwd: vsx0VSX0

# uname -r
2.6.18-1.2747.el5xen

Will the above kernel version help? From the log seen so far, this bug seems to
be i386 specific.
Comment 11 Irina Boverman 2007-01-02 17:16:38 EST
Any progress on this issue?
Comment 12 Brian Maly 2007-01-11 12:51:20 EST
Created attachment 145372 [details]
A more basic test to exhibit this problem

Here's an even more basic test... It breaks (eventually) on x86_64 as well as
i386. So it's a general Linux issue, and not just arch specific (since each arch
has its own independent timekeeping code).
Comment 13 Brian Maly 2007-01-11 15:20:11 EST
Some more info (and a correction on Comment #12). The test in Comment #12 seems
to work fine on i386 (i686), but fails in only a few seconds on x86_64. Further,
when on x86_64, if the test is compiled with the -m32 option, the test does not
fail. Very strange...
Comment 14 Ulrich Drepper 2007-01-11 15:28:31 EST
Not too surprising.  x86-64 in 64-bit mode uses the userlevel gettimeofday
implementation.  No kernel entry needed.  It's supposed to provide a correct
value, and maybe it does and the problem is just made easily visible since the
call is so fast.  But of course it is also possible that there is a problem in
the userlevel code (provided by the kernel, in case this is not clear).
Comment 16 Brian Maly 2007-01-11 17:24:57 EST
So more info from running the test in Comment #12....

The test fails when timesource = PIT/TSC. If PMTimer is selected, the test does
not fail. So there is at least a potential workaround here... In any case, we
are currently running this test in testgrid and will see if this test works
with other timesources (like HPET).
Comment 17 Thomas Gleixner 2007-01-15 12:36:00 EST
Is there a difference between Intel and AMD systems? Does this happen on both
UP and SMP?


Comment 18 Brian Maly 2007-01-16 13:19:13 EST
Re Comment #17, the problem occurs on both AMD and Intel (it's a generic problem
if you use PIT/TSC). I will test with UP/SMP and see if there is a difference.

Brian
Comment 19 Lawrence Lim 2007-01-17 18:07:26 EST
*** Bug 222845 has been marked as a duplicate of this bug. ***
Comment 24 RHEL Product and Program Management 2007-09-07 15:58:30 EDT
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 26 RHEL Product and Program Management 2007-11-08 13:22:56 EST
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request. 
