data load

ADBA

New Member
I'm new to Progress, so please forgive me if my question is unclear.

We're loading three txt files into a Progress database, and with the validation piece it takes 4-5 hours to delete the data and about 30 hours to insert. One of the three files is really big - about 6 million records. I assume the program loads the data record by record using a Progress .p program. Is there a Progress utility that can perform the import/export faster? Thanks

I may not be using all the correct terms.
 

TomBascom

Curmudgeon
There are lots of ways to speed this sort of thing up. Offhand:

1) Break it into chunks and load multiple chunks simultaneously in separate threads.

2) Within a thread, process more than one record per transaction. 100 is generally accepted as a good number.

3) Disable indexes and do an index rebuild after the load (see the proutil example after this list).

4) If you're feeling brave (and have the capability to restore from backup quickly and easily) run the process with no-integrity.

5) Leverage database features such as block size, storage areas, rows per block, blocks per cluster, bi cluster size and so forth.

6) Tune your database.

7) Make sure that your hardware is being used effectively.

8) If the delete phase involves deleting the whole table, consider using "truncate area" to quickly accomplish that task (see the proutil example just below).
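
For #3 and #8 the relevant utility is proutil. Roughly like this -- the database and area names are placeholders, and the exact options and prerequisites vary by Progress version, so check the docs first:

Code:
# rebuild all indexes after loading with them deactivated
proutil mydb -C idxbuild all

# empty a whole storage area rather than deleting record by record
proutil mydb -C truncate area orderarea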
 
it@flude.co.uk

We used to load different consignment files every evening, from various sources, convert them into a single format and write them to a consignment table.

We found it was taking forever to load large files, so we did some of what was mentioned above.

We loaded the data into an empty database (with table structure, no records), which we cleared every evening by deleting the database and copying a master database on top of it.
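
One way to script that nightly reset, as a rough sketch (the database names are invented for illustration):

Code:
# throw away yesterday's work database and recreate it from the pristine master
prodel loaddb
procopy masterdb loaddb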

On the load, we ran the database single-user (-1), with the no-integrity (-i) option, on a local hard disk. It didn't matter if the database was trashed; we could simply copy the master database over it and try again.
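
The load session itself then starts along these lines (the procedure name is invented for illustration; the client executable name varies by platform):

Code:
# single-user (-1), no integrity / crash recovery (-i)
_progres loaddb -1 -i -p loadconsign.p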

We only displayed the progress of the import in multiples of 100 records or so, which sped up the screen refreshes. You need some display so you know it hasn't stopped.
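
Something along these lines (the table, file and frame names are invented for illustration):

Code:
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.

INPUT FROM "consign1.txt".

REPEAT:
    CREATE rawline.
    IMPORT rawline.
    iCount = iCount + 1.

    /* only repaint the status every 100 records */
    IF iCount MODULO 100 = 0 THEN
        DISPLAY iCount LABEL "Records loaded" WITH FRAME fStatus.
END.

INPUT CLOSE.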

We tended to load the data into a single table, with just a sequential record count index and a lot of data fields. Then we verified the structure of the record was OK before writing it to another table.
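
The second pass was roughly this shape (the field names are invented for illustration; the real checks were more involved):

Code:
/* copy structurally valid raw records into the real consignment table */
FOR EACH rawline NO-LOCK:

    IF rawline.consignment-ref = "" THEN NEXT.

    CREATE consignment.
    ASSIGN
        consignment.ref    = rawline.consignment-ref
        consignment.weight = rawline.weight.
END.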

We didn't use the TRANSACTION keyword, as it slowed things down and we didn't want it all backed out if the no integrity load failed.

Finally, we connected to this database from our main system, went through the imported records and processed them.
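
The connection from the main session can be as simple as a CONNECT (the path is invented for illustration):

Code:
/* attach the temporary load database alongside the live one */
CONNECT VALUE("/loadarea/loaddb -1") NO-ERROR.

IF NOT CONNECTED("loaddb") THEN
    MESSAGE "Load database not available." VIEW-AS ALERT-BOX.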

We found that deleting the entire temporary database was a lot faster than deleting records within the database. If we needed to keep a record of the data imported, we simply renamed the database and kept it.
 

TomBascom

Curmudgeon
it@flude.co.uk said:
On the load, we ran the database single-user (-1), with the no-integrity (-i) option, on a local hard disk. It didn't matter if the database was trashed; we could simply copy the master database over it and try again.
Sometimes single user is better. Sometimes multi-user is. It depends a lot on what you have available for hardware resources and what version of Progress you are running. On reasonably modern hardware and with v9 or better Progress I would default to multi-user because v9 introduces the concurrent commit lock protocol which greatly improves parallel load speeds.

it@flude.co.uk said:
We only displayed the progress of the import in multiples of 100 records or so, which sped up the screen refreshes. You need some display so you know it hasn't stopped.
Very good point -- simply displaying status messages with every record takes up an amazing amount of time.

it@flude.co.uk said:
We didn't use the TRANSACTION keyword, as it slowed things down and we didn't want it all backed out if the no integrity load failed.
Not using the TRANSACTION keyword simply affects the scope of transactions -- they are still going to occur. If a no-integrity load fails for any reason, you're going to have to start over completely.

The key thing about transactions and big load processes is to not do one per record. Nor should you do one giant transaction. You want transactions that group enough records together to minimize IO to the bi file. To do that you want to use code like:

Code:
DEFINE VARIABLE i AS INTEGER NO-UNDO.

INPUT FROM "datafile.txt".   /* file name is illustrative */

REPEAT TRANSACTION:

    /* group 100 records into each transaction to cut down bi file IO */
    DO i = 1 TO 100:

       CREATE table-name.
       IMPORT table-name.

    END.

    /* when IMPORT hits end of file it raises ENDKEY, which by default
       undoes the current (partial) group and leaves this REPEAT */
END.

INPUT CLOSE.
 