How to load Data Files (*.df) to a new database and update tables

slh

Member
Hi guys,

I tried loading data into a new database using a (*.df) file, but I realised the data in the existing table (the selected table only) needs to be deleted before I can load it.

I would like to find out whether there is any incremental loading of DATA into tables, so that only new data would be APPENDED (which I think is more efficient), instead of deleting the whole table and re-importing the whole chunk of data all over again.

Regards
 

Casper

ProgressTalk.com Moderator
Staff member
First of all, a .df file is normally a database definition file, which is not the same as data. But since you explicitly say data in the next part, I have the idea that you are trying to do it with a .d (data file).

In order to only append, you can write an import procedure. If you want the tables to be synchronised, then you need to take into account that existing data can also have been changed.
What is the reason you want to do this?

Regards,
Casper.
 

slh

Member
Dear Casper,

Thanks for rectifying; it is indeed a data (*.df) file. We have 2 databases (LIVE, TEST). Some setup data has been input into a table in the TEST database, verified OK, and is ready to be appended into the LIVE database.

Regards
 

TomBascom

Curmudgeon
You do not seem to have understood Casper.

A ".df" file is data definitions. This contains the definitions of tables, indexes and fields. Not the data inside of them. Typically you would load a .df file in order to create a new table or to add fields and indexes to an existing table.

You create ".df" files by dumping definitions via the data dictionary. (Admin -> Dump Data and Definitions -> Data Definitions) FWIW You can load them by choosing the corresponding load options.

(All of these things can also be automated through scripting but that's a topic for another thread.)

The situation that you describe is one where you would be working with ".d" files. These files contain data that has been exported from one database which you then want to import into another database.

By its very nature the load of a ".d" file may do what you want -- it will append new data. It will throw an error for any record that already exists.

So the trick is to only export new data. If you can do that then the standard load process will just work. To do that you need some way to identify "new" data. There is no generic way to do that at an arbitrary point in time -- only someone who understands your specific application and its particular architecture can tell you that.

What you want to look for is key fields that you can use as guideposts -- "date created" fields are very helpful. Monotonically increasing integer id fields can be quite useful too.
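
For illustration, a minimal sketch of that kind of "dump only what is new" export -- the table name ("order"), the "create-date" guidepost field and the last-sync date are all assumptions, not something from this thread:

Code:
/* export-new.p -- dump only records created since the last sync.          */
/* "order", "create-date" and dtLastSync are placeholders; substitute      */
/* whatever table and guidepost field your own application actually has.   */

DEFINE VARIABLE dtLastSync AS DATE NO-UNDO INITIAL 01/01/2024.

OUTPUT TO VALUE("order-new.d").

FOR EACH order NO-LOCK WHERE order.create-date > dtLastSync:
    EXPORT order.      /* same space-delimited format as a dictionary dump */
END.

OUTPUT CLOSE.

(Note that a dictionary dump also writes a PSC trailer at the end of the .d file; a hand-rolled export like this has no trailer, so it is usually loaded with a matching IMPORT loop rather than the dictionary load screen.)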

Having said that there are a few additional wrinkles and considerations:

1) If you also need updates to old records it gets a lot more complicated. You not only have to identify changed records but you can no longer use the standard load process. You will have to create your own that updates existing records rather than erroring out on them (see the sketch after this list).

2) You could also consider using replication triggers -- this is, however, something that you have to set up at the start (rather than at an arbitrary point in time) and it involves a lot of coding, complexity and overhead.

3) If most of the data in your test db is supposed to be the same as production and you really only need to preserve a few things for testing (perhaps there are config settings that prevent test transactions from being accidentally treated as "the real thing" for 3rd party interfaces... maybe you have all printouts go to a null printer and so forth) then it might be a lot easier to save that special config data and re-apply it to a restored backup.

4) If the data being replaced is focused on just a few tables, put them in dedicated storage areas and use "truncate area" to very quickly delete that data before loading the live data.
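
For point 1, a minimal sketch of such a custom "create-or-update" load -- the "customer" table, its unique key "cust-num" and the file name are assumptions for illustration only:

Code:
/* upsert-load.p -- read a .d-style export and create or update records     */
/* instead of erroring out on duplicates like the standard dictionary load. */

DEFINE TEMP-TABLE ttCust NO-UNDO LIKE customer.

/* 1: stage the exported records in a temp-table */
INPUT FROM VALUE("customer.d").
REPEAT ON ERROR  UNDO, LEAVE
       ON ENDKEY UNDO, LEAVE:   /* the "." / PSC trailer of a dictionary dump ends the loop */
    CREATE ttCust.
    IMPORT ttCust.
END.
INPUT CLOSE.

/* 2: create-or-update the real table instead of throwing "already exists" errors */
FOR EACH ttCust WHERE ttCust.cust-num <> 0:   /* skip the empty record left by the last CREATE */
    DO TRANSACTION:
        FIND FIRST customer EXCLUSIVE-LOCK
             WHERE customer.cust-num = ttCust.cust-num NO-ERROR.
        IF NOT AVAILABLE customer THEN CREATE customer.
        BUFFER-COPY ttCust TO customer.
    END.
END.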
 

tamhas

ProgressTalk.com Sponsor
I wonder if it is possible that the OP does actually have a .df and the "data" are actually schema changes. In which case we need to talk about incremental dfs, but it would help to know for sure what the question was, first!
 

slh

Member

Dear All,

I am terribly sorry for making the same mistake twice; it is a *.d file.

Just to add on: when I re-dump the same table a second time, with a similar amount of data, it saves a *.e file instead. Or is that file for some other purpose?

Regards
 

slh

Member
4) If the data being replaced is focused on just a few tables, put them in dedicated storage areas and use "truncate area" to very quickly delete that data before loading the live data.

Hi Tom,

Many thanks. I would appreciate more details on your point 4 (truncate area): does it mean that Progress will explicitly delete the destination tables before replacing them with the newly loaded *.d?

Regards
 

tamhas

ProgressTalk.com Sponsor
The *.e file contains errors, expected in your case when the newly dumped data includes records which are already there. Note that to have any hope of updating, you will need a custom program to do the load.
 

TomBascom

Curmudgeon
When you truncate a storage area, all objects (tables, indexes, LOBs) in that storage area are instantly deleted. Look up "proutil dbname -C truncate area".

You cannot truncate specific tables -- unless they happen to be the only table in the area.

You shouldn't be getting .e files on a dump. Those would be generated by loading data and having conflicts with existing data.
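
For reference, the command Tom mentions takes the storage area name as its final argument; a sketch with made-up database and area names (check the prerequisites in the docs for your release -- the database generally needs to be offline):

Code:
proutil mydb -C truncate area "Customer Data"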
 

slh

Member
Dear All,

Thanks for the help.

@ TomBascom,

- Does truncating an "Area" mean truncating the whole database? So do we truncate based on 2 scenarios: when tables are no longer needed AND/OR when we need to delete multiple tables at one go?

- Does "only table in the area" mean the only table in the database, OR the only table in the area as specified by the DBA?

- Yup, the .e file was gotten from loading a *.d file ;)
 

TomBascom

Curmudgeon
Starting with version 9, Progress organized the database into "storage areas". A storage area can contain 1 or more "storage objects" (tables, indexes or LOBs). Each storage area can have an independent "rows per block" setting. Typically you group tables into storage areas based on some technical characteristics such as average row size, table size or frequency of access and then set RPB accordingly. You then split indexes apart from tables and put the indexes in storage areas with rows per block set to 1.

Starting with OpenEdge 10, each storage area can also have an independent "cluster size". Cluster sizes of 8, 64 and 512 indicate "type 2" storage areas. Type 2 areas also have the property that their data blocks are "asocial" or "homogeneous"; that is, they contain data from just one table (unlike type 1 areas, where data from many tables can be mixed in each data block).

Storage areas are a very handy tool for managing your database and its performance.

By default all storage objects are stored in the "schema area". By design the schema area is a type 1 area with a fixed rows per block of 32 (or 64 if 8k blocks are being used for the db). This is a bad thing. Progress themselves recommend that you not store anything but the schema in the schema area.

So, no, "truncate area" does not refer to the whole database but rather a subset of it that corresponds to a specific storage area.
 

slh

Member
Hi Tom,

I feel enlightened already, thanks. Where can I read more about this?

Anyway, how can I find out which tables belong to a 'storage area'?

Regards
 

slh

Member
Hi everyone,

1. I do have a question: if I made some changes (e.g. added a field, expanded a field size) in my local db tables and I would like to export these 'data definitions' to another developer so that they can be loaded into their local db, how should I proceed?

2. Do I need to shut down the db when I load a *.df/*.d file? What would the impact be?

Regards
 

TomBascom

Curmudgeon
To get a mapping of tables to storage areas:

Code:
for each _file no-lock where _hidden = no:

  find _storageobject no-lock                                    /* we need to find the *current*                             */
    where _storageobject._db-recid      = _file._db-recid        /* storage area -- not the *initial*                         */    
      and _storageobject._object-type   = 1                      /* storage area that holds the table                         */  
      and _storageobject._object-number = _file._file-num.       /* ( _file._ianum = initial area, which is wrong!)           */  

  find _area no-lock where _area._area-number = _storageobject._area-number.
  find _areastatus no-lock where _areastatus._areastatus-areanum = _storageobject._area-number.

  display
    _file._file-name
    _area._area-num
    _area._area-name
    _area._area-type
  .

end.
 

TomBascom

Curmudgeon
You can generate an "incremental df file" by connecting to old and new databases. How you go about versioning old databases is an exercise in creating your local development policy ;)

You are not required to shut down to dump definitions, nor to dump data or load data.

But dumping/loading data while other people are actively changing those tables might be frustrating. It depends on what you are doing.

Depending on your Progress version you might be able to load a .df file online. Whether or not that is a good idea depends on your situation.
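
For what it's worth, a rough sketch of the usual way to produce that incremental .df with the dictionary's own delta routine -- this assumes both databases are connected, that your release ships prodict/dump_inc.p, and the logical names "newdb"/"olddb" are placeholders:

Code:
/* make-delta.p -- generate an incremental .df (a sketch; check your version) */
/* run from a session connected to BOTH databases: "newdb" holds the changed  */
/* schema, "olddb" the schema you want to bring up to date.                   */

CREATE ALIAS DICTDB  FOR DATABASE newdb.    /* the "after" schema              */
CREATE ALIAS DICTDB2 FOR DATABASE olddb.    /* the "before" schema             */

RUN prodict/dump_inc.p.                     /* writes the delta .df; it may    */
                                            /* prompt for the output file name */

Newer releases can also drive dump_inc.p in batch via environment variables (for example DUMP_INC_DFFILE) -- check the documentation for your version.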
 

slh

Member

Dear Tom,

Thanks. So do I save this file as *.w or *.p?

Regards
 

rzr

Member
Save a copy as .p and another one as .w. Now open the .w with AppBuilder - you will have your answer :).
 

RealHeavyDude

Well-Known Member
The difference in the extension has no meaning for the compiler. It is just a means for the developer to distinguish between GUI windows (.w) and procedures (.p) be they structured or not. You can save a window with the extension .p and a procedure with the extension .w and the compiler won't give a damn.

Heavy Regards, RealHeavyDude.
 

slh

Member
Dear Guys,

Thanks for all the help. I can view the data, but is there any way to extract it out, e.g. to *.txt or *.csv?
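
(For what it's worth, the usual tool for that is the EXPORT statement with a DELIMITER option -- a minimal sketch, with "customer" standing in for whatever table you want to extract:)

Code:
/* to-csv.p -- dump one table to a comma-delimited file              */
/* "customer" is a placeholder; character values are quoted for you. */

OUTPUT TO VALUE("customer.csv").

FOR EACH customer NO-LOCK:
    EXPORT DELIMITER "," customer.
END.

OUTPUT CLOSE.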
 