Bug 780729 (SOA-3184)

Summary: File Gateway is unreliable under load
Product: [JBoss] JBoss Enterprise SOA Platform 5
Reporter: Rick Wagner <rwagner>
Component: Distribution
Assignee: Default User <jbpapp-maint>
Status: CLOSED NOTABUG
Severity: high
Priority: high
Version: 5.1.0 GA
CC: kevin.conner, rwagner
URL: http://jira.jboss.org/jira/browse/SOA-3184
Doc Type: Bug Fix
Environment: SOA-P 5.1.0, using a File Gateway
Last Closed: 2011-07-19 19:39:21 UTC
Type: Bug

Description Rick Wagner 2011-07-18 19:17:03 UTC
Help Desk Ticket Reference: https://c.na7.visual.force.com/apex/Case_View?id=500A0000007pDhG&sfdc.override=1
Steps to Reproduce: 
The condition is easy to reproduce by running a script like the one below from cron against the helloworld_file_action quickstart. Start the ESB, deploy something like the file gateway quickstart, and use the script to drop 600 small files each minute into the watched directory. The script also greps the log for the lines written as each file is processed. The line counts come out inconsistent: usually the number is right for 600 files with 2 records each (1200 lines), but about once every 50 files you lose a whole file, or at least one record from a file.

Further investigation will reveal:
- Sometimes an entire file just doesn't make it to the directory that holds the files after they are processed.
- Sometimes the file is processed for the first record, but the second isn't in the log.  In this case, the file will be in the 'processed' directory with the rest of the properly processed files.  
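
The two symptoms above can be told apart by checking each expected record against the log. The following is a minimal sketch, not part of the original report: `find_missing_records` is a hypothetical helper, and it assumes the log contains each record's text verbatim (for example `37.1703.dat Line2`) and that files follow the `<n>.<mmss>.dat` naming used by the reproduction script.

```shell
#!/bin/bash

# Hypothetical helper: list every expected record that is absent from the log.
#   $1 = log file, $2 = minute/second stamp used in the file names, $3 = file count
find_missing_records() {
    local log="$1" stamp="$2" total="$3" n line
    for ((n = 1; n <= total; n++)); do
        for line in Line1 Line2; do
            # -F: match the text literally; -w: whole-word match, so that
            # "1.1703.dat" is not satisfied by an entry for "11.1703.dat".
            grep -qwF -- "$n.$stamp.dat $line" "$log" \
                || echo "$n.$stamp.dat $line"
        done
    done
}
```

For example, `find_missing_records "$LOG" "$m$s" 600` prints one line per record that never reached the log. A file with both records listed was probably never processed; a file with only Line2 listed was only partly processed.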

There were no suspicious log entries noted around the time of trouble.

Note:  Volume used for this test is identical to the customer's environment, 600 files per minute, 2 records per file.

---------------------------------------------------------------------------------------------------------------------------------------------

#!/bin/bash

DIR="/home/jboss/SOA/jboss-soa-p-5.1.0/jboss-as/samples/quickstarts/helloworld_file_action_Volume/Files"
date=$(date +'%Y-%m-%d %H:%M:%S')
read Y M D h m s <<< ${date//[-: ]/ }

kount=600
#for i in {0..599..1}
while [ $kount -gt 0 ];
  do
     FILENAME="$kount.$m$s.dat"
     FILEWITHPATH=$DIR/In/$FILENAME
     echo "$FILENAME Line1" >> "$FILEWITHPATH"
     echo "$FILENAME Line2" >> "$FILEWITHPATH"
     kount=$((kount-1))
 done

sleep 40

LOG="/home/jboss/SOA/jboss-soa-p-5.1.0/jboss-as/bin/nohup.out"
NUMLINES_IN_LOG=`grep $m$s $LOG | grep Line | grep -v INFO | wc -l`

echo `date` $NUMLINES_IN_LOG >> $DIR/results.txt

project_key: SOA

The File Gateway occasionally loses a file (and can also lose individual records from a file) under high file-count, low record-count conditions.

Comment 1 Kevin Conner 2011-07-19 09:31:23 UTC
Can you please enable DEBUG logging and attach the server.log to this issue?


Comment 2 Kevin Conner 2011-07-19 09:37:50 UTC
Unfortunately there is a bug in the script, which is likely to be exacerbated on a SAN.

The missing 'Line2' entries are more likely the result of interleaving, i.e. the gateway consuming the file before Line2 has been written.
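
The interleaving can be illustrated without a gateway at all. This is an illustration only: `demo_partial_read` is a made-up name, and the line count taken between the two appends merely stands in for a consumer that happens to poll at that moment. The reader sees one line of a file that ends up with two.

```shell
#!/bin/bash

# Deterministic stand-in for the race: "read" the file between the two appends.
demo_partial_read() {
    local dir file seen final
    dir=$(mktemp -d) || return 1
    file="$dir/1.0000.dat"

    echo "1.0000.dat Line1" >> "$file"
    seen=$(grep -c '' "$file")    # lines visible to a consumer polling right now
    echo "1.0000.dat Line2" >> "$file"
    final=$(grep -c '' "$file")   # lines in the finished file

    rm -rf "$dir"
    echo "consumer saw $seen of $final lines"
}
```

Calling `demo_partial_read` prints `consumer saw 1 of 2 lines`. In the real reproduction the window between the two appends is tiny, which fits the failure showing up only occasionally.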

Comment 3 Kevin Conner 2011-07-19 09:43:23 UTC
Change the loop to the following and retest:

while [ $kount -gt 0 ];
do
    FILENAME="$kount.$m$s.dat"
    FILEWITHPATH=$DIR/In/$FILENAME

    # Write both records under a temporary name first
    FILEWITHPATH_CREATE=$FILEWITHPATH.create

    echo "$FILENAME Line1" >> "$FILEWITHPATH_CREATE"
    echo "$FILENAME Line2" >> "$FILEWITHPATH_CREATE"

    # Rename into place only once the file is complete
    mv "$FILEWITHPATH_CREATE" "$FILEWITHPATH"

    kount=$((kount-1))
done
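
The same write-then-rename pattern can be wrapped in a small helper. This is a sketch under assumptions: `publish_file` is a hypothetical name, and it relies on the gateway ignoring the temporary `.create` name (for example because it only picks up files with a configured suffix such as `.dat`). Because the temporary file lives in the same directory as the destination, `mv` is a rename within one filesystem, so the final name should only appear once the content is complete.

```shell
#!/bin/bash

# Hypothetical helper: write all lines to "<dest>.create", then rename to <dest>.
#   $1 = destination path, remaining arguments = one record per line
publish_file() {
    local dest="$1"
    shift
    local tmp="$dest.create"

    # Write the whole file under the temporary name; stop if the write fails.
    printf '%s\n' "$@" > "$tmp" || return 1

    # Rename into place; the watched name never exists in a half-written state.
    mv "$tmp" "$dest"
}
```

Inside the loop this would be called as `publish_file "$DIR/In/$FILENAME" "$FILENAME Line1" "$FILENAME Line2"`.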


Comment 4 Kevin Conner 2011-07-19 10:54:08 UTC
I should add that the race may also explain the missing file.

Comment 5 Rick Wagner 2011-07-19 19:38:46 UTC
Thanks, Kevin. You're right: my script was bad and led to a race condition in consumption. (I *knew* there was a bug around here somewhere!) With the corrected loop, I was able to run for several hours without a dropped file or record. I'll close this JIRA and see if I can get the customer to agree.