Question: Long Transaction Problem

Hello everyone, I am stuck on a long-transaction problem:

To interface a file (comma-delimited), we fetch each line one by one in a REPEAT loop and store the variables read from the file into a database table with a CREATE statement in a separate .p program (the .p program is called inside the REPEAT loop). The interface file has around 4,500 records (4,500 lines) that we need to load into the database table. The whole process takes around 7-9 hours. Could interfacing around 4,500 records in separate transactions account for 7-9 hours, or is there something else I need to consider?

As I understand it, the REPEAT loop is the problem, because it treats each line of the interface file as a single transaction. So for n records (around 4,500 in our case) we get n transactions, and that makes the whole process very slow.

A possible solution: either replace the REPEAT loop with a DO WHILE TRUE loop so that all the records fall under one transaction (but that could cause .BI, .AI or lock-table problems), or put a DO TRANSACTION block around the CREATE statement (where we create the DB records).

Please refer to these programs for an overview of the problem:


Code:
  INPUT FROM VALUE(something).
  REPEAT:  /* 1. Should I replace this with DO WHILE TRUE? */
      IMPORT DELIMITER ","  /* import around 4500 lines/records */
          a
          b
          c
          d.
      /* if the variables (a,b,c,d) contain zero or anything unwanted,
         we store them in temp files and send them to the exception report */
      RUN abc.p(…).  /* passing all variables along with many other values */
  END.  /* repeat end */
  INPUT CLOSE.

abc.p
Code:
/* do some checking stuff here */
/* should I apply a DO TRANSACTION statement around the create statement? */
CREATE c-data.
ASSIGN
    /* 2. assigning all input parameters to table c-data */

Using a DO WHILE TRUE loop seems good to me (please suggest), but if I do that, do I need to increase the space for lock-table entries and the .AI and .BI files, or could I avoid that by dividing the transaction into four parts (for example, a thousand records per transaction) using a counter variable?

Please suggest!

Thanks & Regards!
Rajat.
 

TheMadDBA

Active Member
I would be shocked beyond belief to find that your performance problem was the transaction scope. What else is your code doing besides parsing an input file and creating records in the database?

Early candidate has to be record reads of some kind.
 
Thanks for replying, MadDBA!

The main objective of the program is interfacing the file, and apart from parsing the input file and creating records in the database, I didn't find anything specific (I will look into it again). As for writing the data in a single transaction versus writing the same data in n transactions (4,500 in our case): doesn't that make a difference (I think it does, please suggest)?

Sorry, I didn't understand your statement that "Early candidate has to be record reads of some kind"; could you please elaborate on it?

Thanks & Regards!
Rajat.
 

TheMadDBA

Active Member
Yes, there is a performance difference between single-record transactions and batches of transactions. You will not save 9 hours on 4,500 transactions, though; more like a few seconds.

Look at other things in your code, like FOR EACH / FIND etc., where you are looking up records. That is much more likely to be the cause of the extra hours of processing time.
 

TomBascom

Curmudgeon
Importing 4,500 records as described should happen in the blink of an eye.

The problem is most assuredly in the bits of code that you have waved your hands over: either stuff that is completely missing, or things like "if the variables (a,b,c,d) contain zero or anything unwanted, we store them in temp files and send them to the exception report", "do some checking stuff here", "assigning all input parameters to table c-data" and their ilk.
 
MadDBA,
As far as I can remember, there was one FIND before the RUN abc.p statement, for updating the counter for table c-data (the same table used to store records in abc.p). It looks like this:
Code:
FIND c-data WHERE c-data.num = i-num NO-LOCK NO-ERROR.  /* i-num comes from the file */
IF AVAILABLE c-data THEN
    i-counter = i-counter + 1.
ELSE
    i-counter = 0.
Could this make any difference? :(

Tom,
That was basic code involving streams, temp-tables and a few variables; I don't understand how it could make so much difference. Should I put ETIME everywhere and try to measure where the time goes, or what?

Please suggest.

Thanks & Regards!
Rajat.
 

TomBascom

Curmudgeon
Any time that you are wondering what takes up time in a program you should enable and use profiling.

Sprinkling ETIME and MESSAGE statements around is like brushing your teeth with a brick.
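
Turning the profiler on from the editor only takes a few lines; something like this works (the program and output file names are just placeholders, and you can also start the whole session with the -profile startup parameter instead):
Code:
/* a minimal profiling wrapper -- "loadfile.p" and "interface.prof"
   are placeholders for your own program and output file            */
assign
  profiler:enabled     = true
  profiler:description = "file interface load"
  profiler:file-name   = "interface.prof"
  profiler:profiling   = true.

run loadfile.p.            /* the program that reads the file and runs abc.p */

profiler:profiling = false.
profiler:write-data().     /* dump the collected timing data for analysis */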
 

TomBascom

Curmudgeon
If c-data.num is not the leading component on an index then you are doing a table scan. That will certainly slow things down although C-DATA would need to be a pretty big table for 4,500 table scans to take 9 hours.
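
A quick way to check is to compile the program with XREF and look for SEARCH lines flagged WHOLE-INDEX. A rough sketch (the .xref file name is just a placeholder):
Code:
/* compile abc.p with an XREF listing and show any WHOLE-INDEX searches */
define variable cLine as character no-undo.

compile abc.p xref "abc.xref".

input from "abc.xref".
repeat:
  import unformatted cLine.
  if cLine matches "*WHOLE-INDEX*" then
    display cLine format "x(70)".
end.
input close.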
 

TomBascom

Curmudgeon
FWIW the basic "load in chunks" algorithm goes something like this:
Code:
/* load.p
*/

define variable i as integer no-undo.
define variable j as integer no-undo.

input from "customer.d".

outerLoop: do while true transaction:

  j = j + 1.

  repeat:

    create customer.
    import customer.

    i = i + 1.
    if i modulo 10 = 0 then next outerLoop.  /* commit a group of records */

  end.

  leave outerLoop.

end.

input close.

message i "records loaded in" j "chunks".
 
Thanks for replying, Tom.

Sorry, I didn't understand the part about c-data.num needing to be the "leading component on an index"; does leading component mean the primary index, or something else? If it is not, do I need to add that field to the index structure?

Please suggest.

Thanks & Regards!
Rajat.
 

TomBascom

Curmudgeon
"Leading component" means the first field listed in the index definition. If the field is the first component of any index then that field can be used with the index to satisfy the WHERE clause without doing a table scan ("whole index").

Using a phone book for an example -- listings are usually printed lastname, firstname, phone#. Lastname is the leading component. Firstname is the 2nd component. If my WHERE clause has lastname in it then I can quickly open the phone book to the right section. But if my WHERE clause only has firstname I get no help at all from the lastname component and I must read the entire phone book and get a list of all matches on firstname (a "table scan"). If I have to do that 4,500 times it will probably take a while.

It does not matter if the index is "primary" or not.
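
If you want to check from the ABL editor rather than the Data Dictionary, something along these lines will list each index on the table and its leading component (assuming the table really is named c-data):
Code:
/* show every index on c-data and the first (leading) field of each */
for each _file no-lock
    where _file._file-name = "c-data",
    each _index no-lock
    where _index._file-recid = recid(_file),
    first _index-field no-lock
    where _index-field._index-recid = recid(_index)
      and _index-field._index-seq = 1,
    first _field no-lock
    where recid(_field) = _index-field._field-recid:

  display
    _index._index-name label "Index"
    _field._field-name label "Leading component".

end.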
 

TheMadDBA

Active Member
If you are not comfortable using the profiler (which is actually pretty easy), you could at least put timings before and after the FIND statement and before and after the RUN abc.p statement, and run only a few records.

Like Tom said, the FIND could be slowing it down if it is doing a full scan (WHOLE-INDEX), especially if you are running this client-server (using -H and -S to connect). Or abc.p could be taking all of the time. The profiler will show you exactly which lines of which programs are taking the most time.
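
Something along these lines, spliced into your existing REPEAT loop, would do for the timings (c-data, i-num and abc.p come from your original code):
Code:
define variable iStart  as integer no-undo.
define variable iFindMs as integer no-undo.
define variable iRunMs  as integer no-undo.

/* ... inside the existing REPEAT loop, around the two statements: */

iStart = etime.                          /* snapshot the millisecond clock    */
find c-data where c-data.num = i-num no-lock no-error.
iFindMs = iFindMs + (etime - iStart).    /* accumulate time spent in the FIND */

iStart = etime.
run abc.p.                               /* pass the same parameters as before */
iRunMs = iRunMs + (etime - iStart).      /* accumulate time spent in abc.p    */

/* ... after INPUT CLOSE: */
message "FIND total:" iFindMs "ms   abc.p total:" iRunMs "ms"
  view-as alert-box information.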
 
Tom,
I will check whether c-data.num is a leading index component. If it is not, should we change the order of the fields in the index definition (can we do that?), or use USE-INDEX, or what else can we do?

MadDBA,
I will do the timing stuff as suggested. I am not very familiar with the profiler tool; is there anything I can refer to in order to learn it?

Thanks & Regards!
Rajat.
 

TomBascom

Curmudgeon
I'm only looking at a fraction of your code -- I don't have any idea what the full impact of adding an index would be. But within the portion of code that you have revealed so far, yes, it would make sense to add an appropriate index if one does not already exist.
 

TheMadDBA

Active Member
For the profiler you can start here: http://www.oehive.org/Profiler

USE-INDEX will not help in this case at all. It hardly ever does since it cannot change the Progress indexing rules... it would just force the database to read all of the records in the table using a different index. If one exists that is appropriate the compiler will pick that index.
 
Thanks Tom, MadDBA, for your mentorship on this!

I will check the code again and may get back to you on this.

Thanks & Regards!
Rajat.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The profiler is a great tool, particularly for seeing where the most elapsed time is being spent in your code, down to the line number. But if there are environment issues slowing you down (e.g. backup jobs, SAN mirroring, CPU starvation, etc.), then a given piece of code could have very different performance characteristics depending on when and where it runs. Elapsed time info is important, but sometimes you need more.

If you do have table scans in your code you will be doing far too many reads on one or more tables. To see where that is happening you should also get to know the _TableStat/_IndexStat virtual system tables. They contain information about creates, reads, updates, and deletes ("CRUD") in your tables and indexes since the database started. There are also tables that track these stats on a per-user basis, for currently-connected users, called _UserTableStat and _UserIndexStat.

A simple example from a sports database:
Code:
for each _tablestat no-lock by _tablestat-read desc:
  find _file no-lock where _file._file-number = _tablestat._tablestat-id no-error.
  display
    _tablestat._tablestat-id label "#"
    _file._file-name when available _file
    _tablestat._tablestat-create
    _tablestat._tablestat-read
    _tablestat._tablestat-update
    _tablestat._tablestat-delete
  .
end.

Output:
Code:
 # File-Name           create         read    update     delete
--- -------------- ----------- ------------ --------- ----------
  1 Invoice                  0 999999999999         0          0
  3 Item                     0           23         0          0
  2 Customer                 0           18         0          0
  4 Order                    0           16         0          0
  9 Ref-Call                 0           14         0          0
  5 Order-Line               0           12         0          0
  8 Local-Default            0           11         0          0
  6 Salesrep                 0           10         0          0
  7 State                    0            9         0          0

Oops, someone's doing way too many reads on the Invoice table. This example is for table I/O but you can get similar info on accesses to index keys.

From there you may have some idea of where to look in the code for issues. If not, there are other ways to drill down like the _UserTableStat table, or using the QryInfo log entry type with the LOG-MANAGER, or using the -zqil startup parameter.
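
For example, to narrow things down to your own session while the load runs, something like this should work (using the standard _MyConnection and _UserTableStat VSTs):
Code:
/* per-user table reads for the current session, busiest tables first */
for each _MyConnection no-lock,
    each _UserTableStat no-lock
    where _UserTableStat._UserTableStat-Conn = _MyConnection._MyConn-UserId
      and _UserTableStat._UserTableStat-Read > 0
    by _UserTableStat._UserTableStat-Read descending:

  find _File no-lock
       where _File._File-Number = _UserTableStat._UserTableStat-Num no-error.

  display
    _File._File-Name when available _File
    _UserTableStat._UserTableStat-Read label "read".

end.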

Note: by default Progress only tracks CRUD statistics for table and index numbers 1-50. To change that behaviour you have to set the -tablerangesize and -indexrangesize startup parameters. They should be set higher than your highest application table and index numbers respectively. You can get those numbers for a given DB like so:

Code:
find last _file no-lock where _file._tbl-type = "T" use-index _file-number.
find last _index no-lock where not _index._index-name begins "_" use-index _index-number.

display
  "Highest table #:" _file._file-number skip
  "Highest index #:" _index._idx-num
with no-labels.
Output:
Code:
┌───────────────────────┐
│Highest table #:    123│
│Highest index #:    456│
└───────────────────────┘
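
The corresponding additions to the database startup (in a .pf or on the broker command line) would then look something like this; the values are only placeholders and just need to be higher than the numbers reported:
Code:
# example startup parameter additions (placeholder values)
-tablerangesize 150
-indexrangesize 500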
 

GregTomkins

Active Member
I tried the Profiler a couple of years ago, and I was disappointed by my inability to reconcile its numbers with reality. Specifically, it indicated some big time gaps which brick-teeth-brushing did not, and since brick-teeth-brushing is so simple, I trust it more. Given enough toothpaste, it's not so bad, but ... I guess it's time for another look at the Profiler ;)
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
GregTomkins said:
I tried the Profiler a couple of years ago, and I was disappointed by my inability to reconcile its numbers with reality. Specifically, it indicated some big time gaps which brick-teeth-brushing did not, and since brick-teeth-brushing is so simple, I trust it more. Given enough toothpaste, it's not so bad, but ... I guess it's time for another look at the Profiler ;)

You may get different elapsed times with the Profiler versus ETIME because the former does not include, for example, time spent on UI waits like an UPDATE statement. Apart from that, though, I haven't actually tried to reconcile ETIME measurements against total Profiler elapsed time in non-UI business logic.
 