Fast and safe way to delete large number of records

ohchaos

New Member
Hello,

I need to delete a large number of records, about half of a big table (over 60 million records), by joining to a table in another database. I can write the delete statement, but I want it to run fast and, more importantly, to commit after every 1,000 deletions to avoid blocking. How can I do this? Should I switch to the 4GL?

Thanks in advance.
 

Rob Fitzpatrick

I need to delete a large number of records, about half of a big table (over 60 million records)
Okay...
by joining to a table in another database.
I don't know what that means, or why it is relevant to deleting records in this database. Is it relevant? If so, can you please elaborate?
I can write the delete statement, but I want it to run fast
It seems like you're implying you've already considered or tried some approach and discarded it as being too slow. What does "fast" mean to you? And why is "fast" important? Isn't the ultimate goal that the records that are to be deleted, get deleted? Why do they need to be deleted within a set period of time?
and, more importantly, to commit after every 1,000 deletions to avoid blocking.
I don't know what you mean by this.
Should I switch to the 4GL?
You can use either 4GL or SQL to manipulate data in your database, provided it is configured to accommodate that and you know how to do it. What you should do might be different from what someone else should do, given their skill set, experience in one language versus the other, permissions, access to tools, etc.

I suggest you write out your business requirements and constraints, then write up a proposed solution and prototype it. Test it on a small database and note how it performs and whether it seems to meet your requirements. Then test it on a production-like database (e.g. over 60 million records). If you can do that with a copy of that production database, on production-like hardware, so much the better. Then evaluate again whether your solution satisfies your requirements.

But I don't think anyone here can provide meaningful answers to your questions without a lot more information from you. I suggest you review this thread on obtaining help and then re-read your post.
 

TomBascom

Curmudgeon
Rob is correct. Your requirements are really quite vague and any advice we give will have to make far too many assumptions about what you are trying to accomplish.

If you are thinking that grouping deletions into a block and committing the set of deletions as one large transaction rather than 1000 small transactions will perform better then I have bad news for you. That technique works well for CREATE and UPDATE but it does not improve DELETE performance. Or at least it didn't the last few dozen times that I tested it.
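For clarity, the commit-every-1,000 batching that the original post asks about would look roughly like this in the 4GL. This is just a sketch with made-up table and field names ("order" / orderNum); its real benefit is keeping each transaction small (bounded lock-table usage, smaller BI notes), not making the deletes themselves faster:

Code:
/* Sketch: delete in batches of 1,000 records per transaction.
   "order" and orderNum are placeholders for the real table and key. */
define variable n as integer no-undo.

purge-loop:
repeat:
  n = 0.
  do transaction:
    for each order exclusive-lock where order.orderNum < 1000000:
      delete order.
      n = n + 1.
      if n >= 1000 then leave.  /* end this batch */
    end.
  end.  /* the batch commits when the transaction block exits */
  if n < 1000 then leave purge-loop.  /* partial batch: nothing left to delete */
end.

Because each committed batch removes the records it touched, restarting the outer loop simply resumes with whatever still matches the query.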

Having said that, my default approach to deleting a large number of records goes something like this:

1) Identify a starting point and an ending point for the deletion. Ensure that there is an index to efficiently support querying the target set of data.

2) Verify that this process can be done piecemeal. In other words - it is ok to do it in 2 or 3 or 400 passes. It does not need to be "all or nothing" (like a single giant db transaction). It is ok to restart the purge later after something goes wrong or you have to abort it because it did not finish in the expected time.

3) If the business says "no restarts" you will probably need to get an emergency budget requisition approved to upgrade your hardware in order to meet their demands.

Now it is (relatively) easy:

Code:
/* create the flag file; delete it from the OS to stop the purge */
output to value( "purge.flg" ).
output close.

for each order where orderNum > 0 and orderNum < 1000000:

  for each orderLine where orderLine.orderNum = order.orderNum:
    delete orderLine.
  end.

  delete order.

  /* quit cleanly if the flag file has been removed */
  file-info:file-name = "purge.flg".
  if file-info:full-pathname = ? then quit.

end.

message "finished!".
pause.

You can run this until it completes. Stop it easily by removing "purge.flg" and restart it any time you'd like.

But that may not be a good fit for your unknown requirements.
 

ohchaos

New Member

Thank you so much for the code. I am not familiar with the 4GL. I need to join two tables to find the orphan records. In MS SQL, this can be done easily using a left outer join and an "is null" condition.

For example,
delete from pub.oeel_index
where srcrowpointer in (
  select srcrowpointer
  from pub.oeel_index oi
  left join nxt.pub.oeel o on oi.srcrowpointer = o.rowpointer
  where o.rowpointer is null
);

It deletes only the records that exist in the oeel_index table but do not exist in the oeel table.

How can I do this in 4GL?
 

TomBascom

Curmudgeon
Code:
for each oeel_index:
  if not can-find( oeel where oeel.field1 = oeel_index.field2 ) then
    delete oeel_index.
end.

Note: my WHERE clause is made up. I cannot make sense out of your example, it looks like it might have some typos in it. Or it might be that I'm just not much of a SQL guy.
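If that purge turns out to be long-running, you can combine the can-find test with the purge.flg stop/restart pattern from my earlier post. Again, this is a sketch; the field names (srcrowpointer / rowpointer) follow your SQL example and may not match your actual schema:

Code:
/* create the flag file; remove it from the OS to stop the purge */
output to value( "purge.flg" ).
output close.

for each oeel_index exclusive-lock:

  /* delete index records that have no matching oeel record */
  if not can-find( oeel where oeel.rowpointer = oeel_index.srcrowpointer ) then
    delete oeel_index.

  /* quit cleanly if the flag file has been removed */
  file-info:file-name = "purge.flg".
  if file-info:full-pathname = ? then quit.

end.

Since deleted oeel_index records stay deleted, the run can be stopped and restarted at any time without losing progress.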
 

ohchaos

New Member
It worked. This is what I was looking for. Thank you so much.
 