table count does not change after importing records

ohchaos

New Member
Hello all,

I need to export some old records and delete them, but want to import them later when a user wants to access those records.
So I am testing my scripts and have run into an issue that I cannot understand.

A SQL or ABL query returns the total record count, let's say 1000. I export 100 records to a data file and then delete them. The total record count becomes 900.
I then import those 100 records back into the table and can see them there, but the total record count is still 900. I rebuilt the index, but it is still 900.

I am using Progress OpenEdge 11.4, and here are my queries.

SELECT COUNT(*) FROM PUB."com"
1000

SELECT COUNT (*) FROM PUB."com" WHERE cono = 999
100

AFTER deleting cono=999

SELECT COUNT(*) FROM PUB."com"
900

SELECT COUNT (*) FROM PUB."com" WHERE cono = 999
0

AFTER importing cono=999

SELECT COUNT(*) FROM PUB."com"
900

SELECT COUNT (*) FROM PUB."com" WHERE cono = 999
100

The table "com" has a primary key, and "cono" is one of the key columns.

Again, the ABL table count returns the same number, and reindexing after the import does not make any difference.

Any idea why the total record count does not go back to 1000?

Thank you for reading,
Young
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
I need to export some old records and delete them, but want to import them later when a user wants to access those records.
What is the point of deleting and then reloading the records? Why not just leave them in place? And have you thought about how you will handle schema changes to this table?

AFTER importing cono=999
How are you exporting and importing the data?

What are the indexes on com and are all of them active?

SELECT COUNT(*) FROM PUB."com"
900

The ABL table count returns same number and reindexing after the import does not make any difference.
What is your ABL code?

How are you "reindexing"?
 

TomBascom

Curmudgeon
I see a lot of SQL. My first guess is that you neglected to "commit work;". My second guess is that you are ignoring an error message that you have not seen fit to share.

But those are just guesses. Made in the dark and without adequate insight into what you are really doing. You have only shown bits and pieces of your code and a lot of important bits, like the code that deletes the data and the code that reloads the data, are missing. Without seeing the whole picture all we can do is guess.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
SELECT COUNT(*) FROM PUB."com"
900

SELECT COUNT (*) FROM PUB."com" WHERE cono = 999
100
Another possibility is that these totals are both correct. For example, if you had another 100 records where cono=888 and they got deleted somehow between the start and end of your test, the total record count of 900 would be correct.

I suggest that you query for all distinct values in this field and their record counts before your test. Then when your table is in this state, run the query again and compare the results so you know what went missing.
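
Something like this ABL sketch would do it (untested, and it assumes cono is the only field you care about; adjust the table and field names as needed):
Code:
/* sketch: count records per distinct cono value */
define variable cnt as integer no-undo.

for each com no-lock break by com.cono:
  cnt = cnt + 1.
  if last-of(com.cono) then do:
    display com.cono cnt.
    cnt = 0.
  end.
end.
Run it before and after your test and compare the two result sets.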
 

ohchaos

New Member
What is the point of deleting and then reloading the records? Why not just leave them in place? And have you thought about how you will handle schema changes to this table?

The user says 'Yes' to deleting, but they may ask for the data later, so I would like to dump the data and keep the file.

How are you exporting and importing the data?
I am using the following code for exporting and importing the data.

Here is the script to export.

OUTPUT TO /impero/scripts/export/com.data.

FOR EACH com WHERE cono=999:
EXPORT DELIMITER "~011" com.
END.

OUTPUT CLOSE.

And here is the script to import.

INPUT FROM /impero/scripts/export/com.data.
OUTPUT TO /impero/scripts/export/com.data.import.out.

FOR EACH com:
IMPORT DELIMITER "~011" com.
END.

INPUT CLOSE.



What are the indexes on com and are all of them active?

Com has one primary key index and no other indexes.

CONSTRAINT "k-com" PRIMARY KEY ("cono","comtype","orderno","ordersuf","lineno","printfl")
What is your ABL code?

How are you "reindexing"?
I run the command below after shutting down the database.
proutil /db/nxt -C idxbuild table com
 

ohchaos

New Member
Another possibility is that these totals are both correct. For example, if you had another 100 records where cono=888 and they got deleted somehow between the start and end of your test, the total record count of 900 would be correct.

I suggest that you query for all distinct values in this field and their record counts before your test. Then when your table is in this state, run the query again and compare the results so you know what went missing.
This is one of the tables where I am having the same issue. There are no cono=888 records. The condition I use in the export, import, and delete statements is the same.
 

ohchaos

New Member
I see a lot of SQL. My first guess is that you neglected to "commit work;". My second guess is that you are ignoring an error message that you have not seen fit to share.

But those are just guesses. Made in the dark and without adequate insight into what you are really doing. You have only shown bits and pieces of your code and a lot of important bits, like the code that deletes the data and the code that reloads the data, are missing. Without seeing the whole picture all we can do is guess.
I use the same cono=999 condition in all three pieces of code: the export, import, and delete statements. There is no other condition in the WHERE clause.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
The user says 'Yes' to deleting, but they may ask for the data later, so I would like to dump the data and keep the file.
I understand that the user might want the data later. That explains why you want the data. It does not explain why you are choosing to export it, delete it, and then import it. My question, again: why not just leave the data in place? That would also allow the user to read it later. What problem is solved by doing this extra work?

INPUT FROM /impero/scripts/export/com.data.
OUTPUT TO /impero/scripts/export/com.data.import.out.

FOR EACH com:
IMPORT DELIMITER "~011" com.
END.

INPUT CLOSE.

I don't understand the logic of your import. You are iterating over the existing records, and for each one you find, you are reading a record from the input file. I would expect something like this, iterating over the line items in the file:
Code:
input from /impero/scripts/export/com.data.

repeat:
  create com.
  import delimiter "~011" com.
end.

input close.

You didn't answer whether your index is active.
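
If you're not sure, one way to check is to read the metaschema. A rough sketch (it just lists each index on com and its _Active flag):
Code:
/* sketch: list the indexes on "com" and whether each is active */
find _File no-lock where _File._File-Name = "com".
for each _Index no-lock where _Index._File-recid = recid(_File):
  display _Index._Index-Name _Index._Active.
end.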
 

TomBascom

Curmudgeon
Just guessing again but the logic behind purge, archive, and reload on demand is often driven by a belief that keeping “too much” old data in the online database is a performance problem.

IMHO if this is actually true it is because your application has index usage improvement opportunities.

The other common justification boils down to “because we have always done it this way”.
 

TomBascom

Curmudgeon
There are two obvious problems in the code as shared. The requirements narrative says that the exported records are to be deleted. But there is no DELETE after the EXPORT. And, as Rob points out, there is no CREATE prior to the IMPORT (so, as written, the imported records are overwriting the existing records).
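
For illustration only, here is a sketch of what the archive side would normally look like, reusing the path and delimiter from the scripts above (your actual delete code has not been posted, so this is a guess at the intent). The reload side would look like the repeat / create / import loop Rob already showed.
Code:
/* sketch: export and delete in one pass (path and delimiter taken
   from the scripts posted above; not the actual delete code) */
output to /impero/scripts/export/com.data.

for each com exclusive-lock where cono = 999:
  export delimiter "~011" com.   /* write the record to the dump file */
  delete com.                    /* then remove it from the table */
end.

output close.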
 

ohchaos

New Member
There are two obvious problems in the code as shared. The requirements narrative says that the exported records are to be deleted. But there is no DELETE after the EXPORT. And, as Rob points out, there is no CREATE prior to the IMPORT (so, as written, the imported records are overwriting the existing records).

Sorry, I just did not post the delete command file. It works fine; I can confirm that it actually deletes the records.
 

TomBascom

Curmudgeon
Sorry, but I cannot accurately guess why your code doesn't work when you don't post what you are actually doing.
 

ohchaos

New Member
The issue is resolved. Thank you, Rob and Tom, so much for reading and comments.

After changing the import script to add a CREATE, I was able to see that the total record count correctly increased.

I would like to keep the database as small as possible, so I would like to delete these old records from the tables. But at this point we are not 100% sure that we can delete some of these records. My goal is to delete 23 million records from the 50 largest tables, and I would like to give my manager some level of comfort by having the option to import those records back if they are needed later.
 

TomBascom

Curmudgeon
These archive, purge, and reload on demand schemes are almost always a complete waste of time and effort.

1) The schema will probably change at some point. Which means that your reload will probably break.
2) You now have two databases to manage. Your total disk footprint is not reduced. Nor is your workload.
3) Whatever benefit you supposedly get from this scheme is almost certainly an illusion.
4) Users lose immediate access to archived data and now have to jump through hoops to get it. That tends to make them unhappy.

On the bright side, if you are paid by the hour, you will have plenty of extra busy work to indulge in.
 

TomBascom

Curmudgeon
There are a couple of vaguely positive benefits:

1) The backup for the main database is smaller. (But your total backup size remains the same.) So it might run faster.
2) Dump & load, index rebuild, etc. will have less data to deal with, so they will be faster.

Personally, I don't find either of those very compelling and if I were having trouble completing those tasks in a reasonable amount of time I would look to hardware improvements as a better solution.
 

ohchaos

New Member
Thank you, Tom, for sharing your insight, knowledge, and experience. I will bring this to my manager and try to delete only the records that we have to for the upgrade.
 

Rob Fitzpatrick

ProgressTalk.com Sponsor
It sounds like someone is trying to have their cake and eat it too: economize on disk capacity, but have the ability to grow back to a larger size on demand. As Tom pointed out, this is a false economy.

I had a client call me one day because their database wouldn't start. It was corrupted and we had to do an unplanned dump and load on the spot. So in that moment we needed space for:
  • the old bad database, which was copied to a new location
  • the dump files
  • the new database, loaded from the dump files
  • a pre-idxbuild full backup
  • the idxbuild sort files (-T/-SS)
  • a post-idxbuild full backup
As you can see, all this data is several times the size of the database. If we hadn't had the disk space available, doing that work would have been much more challenging and time-consuming during an unplanned outage.

Good, fast disk space is cheap. This isn't the place to try to save small amounts of money. It will bite you in the long run.

If you have plenty of disk space on your database servers (don't forget DR!), you can confidently grow your business and know that your DB (including any needed maintenance or repair activities) won't run short of space. And you will happily find that there is no longer a business case for the extra work of archiving, deleting, and reloading old data.

This is a good time to think about your data life cycle. Lots of businesses accumulate old data and don't have an actual plan for how to manage it. They deal with it reactively when the pain becomes acute. That tends to mean that the work is unplanned, interferes with existing schedules, and lacks direction.

Think about the business, regulatory, and legal requirements that pertain to you, write a data-retention policy, and communicate it to internal stakeholders so that no one can act surprised or injured when old data disappears. It should be granular enough that you can develop procedures and automation from it. Implementing that, ultimately, is what will keep your database at a manageable size and will also save you the cost of unnecessary maintenance work.
 