Large report transactions - multi-threading

version 10.2B
Linux/4GL

When dealing with a very large report with bulky transactions, what would be the best approach to generating it?

My thought is,

1) run program-0 to calculate the number of transactions and distribute them evenly across 10 threads
2) run programs 1 through 10 simultaneously to generate 10 partial reports
3) run program-11 to merge them into a single large report.

The problem with this approach is that it inevitably involves a lot of file I/O and unix commands. I wonder if there is a better and more stable way to implement this.
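
For illustration, here is a rough sketch of steps 2 and 3 as a 4GL driver. Everything named in it is hypothetical: a worker thread.p that takes its thread number via -param and writes report-<n>.txt plus a report-<n>.done flag file when it finishes, and an mpro wrapper script that starts a batch session:

/* driver.p: launch 10 background workers, wait, then merge.   */
/* thread.p, mpro and all file names are placeholders.         */
DEFINE VARIABLE i     AS INTEGER   NO-UNDO.
DEFINE VARIABLE iDone AS INTEGER   NO-UNDO.
DEFINE VARIABLE cList AS CHARACTER NO-UNDO.

DO i = 1 TO 10:
    /* the trailing "&" puts each batch session in the background */
    OS-COMMAND SILENT VALUE(
        "mpro mydb -b -p thread.p -param " + STRING(i)
        + " > thread-" + STRING(i) + ".log 2>&1 &").
END.

/* crude wait loop: poll for the workers' "done" flag files */
DO WHILE iDone < 10:
    OS-COMMAND SILENT "sleep 5".
    iDone = 0.
    DO i = 1 TO 10:
        IF SEARCH("report-" + STRING(i) + ".done") <> ? THEN
            iDone = iDone + 1.
    END.
END.

/* step 3: the merge really is just one concatenation */
DO i = 1 TO 10:
    cList = cList + " report-" + STRING(i) + ".txt".
END.
OS-COMMAND SILENT VALUE("cat" + cList + " > big-report.txt").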


Thanks in advance.
 

FrancoisL

Member
Personally I would use the AppServer to generate your data. You can make multiple asynchronous calls and concatenate the data as each asynchronous call completes and fires its callback. Once every call has completed you can use the generated data and print your report.
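
For what it's worth, a minimal sketch of that pattern, assuming an AppServer procedure reportslice.p with one INPUT parameter (the slice number) and one OUTPUT parameter (the slice data); the service name, host and procedure names are all invented:

DEFINE VARIABLE hSrv   AS HANDLE   NO-UNDO.
DEFINE VARIABLE hAsync AS HANDLE   NO-UNDO EXTENT 10.
DEFINE VARIABLE cAll   AS LONGCHAR NO-UNDO.
DEFINE VARIABLE cDummy AS LONGCHAR NO-UNDO.
DEFINE VARIABLE i      AS INTEGER  NO-UNDO.

CREATE SERVER hSrv.
hSrv:CONNECT("-AppService asbroker1 -H apphost").

DO i = 1 TO 10:
    /* output values arrive in the event procedure, not here */
    RUN reportslice.p ON SERVER hSrv ASYNCHRONOUS SET hAsync[i]
        EVENT-PROCEDURE "sliceDone" IN THIS-PROCEDURE
        (INPUT i, OUTPUT cDummy).
END.

/* block until every request has completed */
DO i = 1 TO 10:
    IF NOT hAsync[i]:COMPLETE THEN
        WAIT-FOR PROCEDURE-COMPLETE OF hAsync[i].
END.

hSrv:DISCONNECT().
DELETE OBJECT hSrv.
/* cAll now holds all ten slices: print the report from it */

PROCEDURE sliceDone:
    /* matches reportslice.p's OUTPUT parameter */
    DEFINE INPUT PARAMETER pcSlice AS LONGCHAR NO-UNDO.
    cAll = cAll + pcSlice.
END PROCEDURE.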
 

TomBascom

Curmudgeon
The overall approach is very workable.

Often there are fairly easy ways to split up the work that don't require any special calculations up front: date ranges, order # modulo 10, etc. You just have to be willing to accept "good enough" load balancing vs. "perfect" load balancing.

If the final merge is a simple concatenation it can be awfully efficient.
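
To make the modulo idea concrete, here is what a worker might look like; the table, field and file names are all invented, and the thread number arrives via -param:

/* thread.p: worker n handles the accounts whose number maps to slot n */
DEFINE VARIABLE iThread AS INTEGER NO-UNDO.
iThread = INTEGER(SESSION:PARAMETER).

OUTPUT TO VALUE("report-" + STRING(iThread) + ".txt").
FOR EACH account NO-LOCK:
    /* modulo on the account keeps all of an account's transactions
       in one worker, so account data is only processed once */
    IF account.acct-num MODULO 10 <> iThread - 1 THEN NEXT.
    FOR EACH trans NO-LOCK
        WHERE trans.acct-num = account.acct-num:
        PUT UNFORMATTED trans.trans-num SKIP.  /* real report lines here */
    END.
END.
OUTPUT CLOSE.

No distribution file is needed up front: each worker derives its own slice from the key, and the balancing is "good enough" as long as accounts are roughly similar in size.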
 
Personally I would use the AppServer to generate your data. You can make multiple asynchronous calls and concatenate the data as each asynchronous call completes and fires its callback. Once every call has completed you can use the generated data and print your report.

We don't have any AppServer set up yet; everything is running in the old-fashioned 4GL way. :eek:
 
The overall approach is very workable.

Often there are fairly easy ways to split up the work that don't require any special calculations up front: date ranges, order # modulo 10, etc. You just have to be willing to accept "good enough" load balancing vs. "perfect" load balancing.

If the final merge is a simple concatenation it can be awfully efficient.

Thanks.

The problem with this approach is really its stability. I first need to create a large text file listing information like "thread number, account number, transaction number". Then the 10 programs read this info file at the same time to find which transactions each of them should process. I wonder if there is a more convenient way: say, a temp-table kept persistent in memory that the 10 threaded programs could read instead of a text file.

Similarly, if a temp-table or the like could be passed from program-0 through program-11, I wouldn't even need to create the 10 files for concatenation; I could create all the records in this shared temp-table and output only the final large report. All the file I/O and unix commands could then be avoided.
 

TomBascom

Curmudgeon
You cannot share temp-tables between sessions.

You would need to use a db table.

Obviously I don't know what your situation is, but it sounds to me like you are overcomplicating it by requiring a certain structure.

Why do you need to pre-allocate account number and transaction number to certain threads? What benefit do you think that you get from that?

I also think that you are probably overly concerned about file I/O and unix commands. If you are streaming your results to 10 output files your total I/O isn't going to be any different until the final concatenation step. Concatenating files is something that UNIX can do quite efficiently.

Premature optimization is the root of much architectural evil.
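
If the pre-allocation really has to stay, the db table version might look something like this; the work-queue table and every name in it are hypothetical, and it would want an index on thread-no:

/* program-0: build the queue in the database instead of a text file */
FOR EACH trans NO-LOCK /* WHERE ... the user's selection criteria ... */:
    CREATE work-queue.
    ASSIGN work-queue.thread-no = trans.acct-num MODULO 10 + 1
           work-queue.acct-num  = trans.acct-num
           work-queue.trans-num = trans.trans-num.
END.

/* in each worker, iThread is that worker's thread number (1-10);
   it reads only its own rows, and there is nothing to sort,
   compress or lose on disk */
FOR EACH work-queue NO-LOCK
    WHERE work-queue.thread-no = iThread:
    /* ... process that transaction ... */
END.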
 
You cannot share temp-tables between sessions.

You would need to use a db table.

Obviously I don't know what your situation is, but it sounds to me like you are overcomplicating it by requiring a certain structure.

Why do you need to pre-allocate account number and transaction number to certain threads? What benefit do you think that you get from that?

I also think that you are probably overly concerned about file I/O and unix commands. If you are streaming your results to 10 output files your total I/O isn't going to be any different until the final concatenation step. Concatenating files is something that UNIX can do quite efficiently.

Premature optimization is the root of much architectural evil.

I am actually dealing with an existing report. The initial selection is rather complicated and is defined by input parameters entered by users. That's why the account and transaction numbers are required for each thread to choose the transactions to process. Worse still, in order to build this info file, intermediate temp text files are created first for the different selection criteria: records are "joined" together in a temp file, then the temp file is sorted and read back in to add flags that indicate how the records shall be processed for the final result. Then another temp file is created to store "account number, transaction number, flags", and only then is the final info file constructed with "thread number" added.

Account number is included because, I think, the original programmer didn't want transactions belonging to a given account to be split across different threads. He wanted each account handled by one thread only, so that account information doesn't need to be processed multiple times.

At the time (years ago) when the program was written, disk space was an issue, so unix commands were used to compress the files during the process. Unexpected problems occurred: sometimes one or two of the temp files went missing; sometimes the unix compression failed and the process couldn't continue; sometimes strange errors such as "error: entry XXX cannot be found" occurred during temp-file processing. That's why I consider it unstable and am looking for a better way to resolve these problems.
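
For the intermediate files specifically, an indexed temp-table inside one program can stand in for the whole join/sort/flag pipeline. A sketch with invented names:

DEFINE TEMP-TABLE ttPick NO-UNDO
    FIELD acct-num  AS INTEGER
    FIELD trans-num AS INTEGER
    FIELD flags     AS CHARACTER
    INDEX ix-main IS PRIMARY acct-num trans-num.

/* the "join": collect candidate transactions per the user's criteria */
FOR EACH trans NO-LOCK /* WHERE ... user criteria ... */:
    CREATE ttPick.
    ASSIGN ttPick.acct-num  = trans.acct-num
           ttPick.trans-num = trans.trans-num.
END.

/* the flag pass: the index already delivers sorted order, so there
   is no external sort, no temp file and no compression step to fail */
FOR EACH ttPick:
    ASSIGN ttPick.flags = "".   /* derive the real flags here */
END.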
 

tamhas

ProgressTalk.com Sponsor
Really, there is no single answer to your question. We would need to know a lot more about the specific problem you are trying to solve and what you have done to measure what is slow. I.e., if the limiting factor in performance is disk I/O and the table being operated on is all on one disk, then spawning 10 parallel processes may do nothing for your speed.

Moreover, the speed issue may have nothing to do with the scope of the problem and everything to do with the strategy being used in the current program. For example, one technique that can often help a lot in complex reporting is to read the data in the way which is most efficient for reading, extract the needed values into a temp-table, and use the temp-table to produce the actual report. This can be amazingly more efficient than some complex navigation of the data to fetch it in the order needed for the report.
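
Sketched with invented names, that two-pass technique reads once in whatever order is cheapest, extracts only the needed fields, and lets a temp-table index deliver report order from memory:

DEFINE TEMP-TABLE ttRpt NO-UNDO
    FIELD region   AS CHARACTER
    FIELD acct-num AS INTEGER
    FIELD amount   AS DECIMAL
    INDEX ix-rpt region acct-num.   /* the report order */

/* pass 1: one cheap sequential read, extracting just three fields */
FOR EACH trans NO-LOCK:
    CREATE ttRpt.
    ASSIGN ttRpt.region   = trans.region
           ttRpt.acct-num = trans.acct-num
           ttRpt.amount   = trans.amount.
END.

/* pass 2: report in the order the index provides */
FOR EACH ttRpt BREAK BY ttRpt.region BY ttRpt.acct-num:
    /* ... ACCUMULATE totals and PUT report lines ... */
END.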
 

TomBascom

Curmudgeon
So you are modifying an existing multi-threaded process rather than considering creating a new one.

From your description I suspect that you'd probably be better off scrapping the beast that you currently have and starting over with a clean design.
 
So you are modifying an existing multi-threaded process rather than considering creating a new one.

From your description I suspect that you'd probably be better off scrapping the beast that you currently have and starting over with a clean design.

Yep, I think so. I am trying to use temp-tables as much as possible instead of the temp files. And where files are still necessary, I will try to use XML as much as possible, since XML files are handled well by Progress itself.
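
Where a file between sessions is still unavoidable, the temp-table XML methods keep the serialization inside Progress; a sketch with invented names:

DEFINE TEMP-TABLE ttSlice NO-UNDO
    FIELD acct-num AS INTEGER
    FIELD amount   AS DECIMAL.

/* writer session: dump the whole temp-table to XML */
TEMP-TABLE ttSlice:WRITE-XML("file", "slice-1.xml", TRUE).

/* reader session (with the same temp-table definition): load it back */
TEMP-TABLE ttSlice:READ-XML("file", "slice-1.xml", "EMPTY", ?, ?).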

Thanks very much, everyone. I learned a lot here.
 