Question: Error Trying To Load Data Into New DB

nix1016

New Member
I'm upgrading from OpenEdge 10 to 11 and doing a dump and load into a new version 11 DB, as I need to convert to Type II storage areas and add additional tables/fields/indexes in the process. I also need to convert the DB from ISO8859 to UTF-8, so I assume that I can't do a binary dump?

I started the dump in Data Administration with the code page mapped to UTF-8, then loaded it straight into the empty UTF-8 DB with the new data structures. It is very slow, which is expected (the database is about 8 GB), but I'm also getting the error: SYSTEM ERROR: Input data too large, try to increase -s and/or -stsh. (63).

I tried increasing -s to 32000 (from 800) and -stsh to 31 and I'm still getting that error. Is there anything else that I'm missing? Would I be better off just doing a binary dump/load and converting to UTF-8 afterwards? There will be over 100 databases of varying sizes, from 1 GB to 30 GB, to upgrade using the same process, so this will need to be scripted eventually, which I don't think can be done for a code-mapping dump via Data Administration?
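Unless something like the following could work, driving the dictionary dump/load from a batch client instead of the GUI. A rough, untested sketch, assuming the prodict/dump_d.p and prodict/load_d.p entry points take a table list, a directory and (for the dump) a target code page; mydb, newdb, the paths and the parameter values are placeholders only:

    # dump all tables to ./dumpdir as UTF-8 .d files (single-user batch session)
    mkdir -p ./dumpdir
    echo 'RUN prodict/dump_d.p ("ALL", "./dumpdir/", "utf-8").' >  dumpall.p
    echo 'QUIT.'                                                >> dumpall.p
    _progres mydb -1 -b -p dumpall.p

    # load the .d files into the new UTF-8 database; bump -s / -stsh as the
    # error message suggests, and run the session with UTF-8 code pages
    echo 'RUN prodict/load_d.p ("ALL", "./dumpdir/").' >  loadall.p
    echo 'QUIT.'                                       >> loadall.p
    _progres newdb -1 -b -s 32000 -stsh 31 -cpinternal utf-8 -cpstream utf-8 -p loadall.p

That could then be wrapped in a shell script that takes the database name as a parameter and looped over the 100-odd databases.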

Thanks in advance for your suggestions!
 

Cringer

ProgressTalk.com Moderator
Staff member
I see you've had some good responses on the community. I would reiterate the opinion there that you'll be better off doing the dump and load and the conversion in two stages.
 

nix1016

New Member
Thanks, I will give that a go. However, we've had issues in the past where certain 1252 characters weren't converted properly, which is why I thought doing an ASCII dump would get around the issue.
 

RealHeavyDude

Well-Known Member
Binary dump and load is the way to go. But you need to understand that a binary dump means data as-is: there is no code page handling involved, neither in the dump nor in the load process. Therefore, if you binary dump the data out of a database with code page A and then binary load it into a database with code page B, you've successfully compromised your character data, as you might have experienced already.

One of the most important things for a dump and load is: fix any problem before you dump the data, be it binary or ASCII. Fixing data after the load might be much more complicated, as the dump and load might have introduced new issues.

If you have any characters stored in the database that are not in line with the database's code page, then you need to fix those first.
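To make that concrete, something along these lines, with olddb, newdb and the table name as placeholders, and with the new Type II database created with the same code page as the old one so that the binary load never mixes code pages:

    # stage 1: binary dump from the old structure, binary load into the new
    # Type II structure, then rebuild the indexes
    proutil olddb -C dump customer ./bindump
    proutil newdb -C load ./bindump/customer.bd
    proutil newdb -C idxbuild all

    # stage 2: only once the load is verified, convert the character data of
    # the new database to UTF-8 (with the database offline)
    proutil newdb -C convchar convert utf-8

In practice you would loop the dump and load over every table (the table list can be read from the _File metaschema table), and depending on the release you may need a further index rebuild after the conversion.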

Heavy Regards, RealHeavyDude.
 

nix1016

New Member
Thanks, but how do I find out which characters are not in line with the database's code page? I tried using proutil convchar charscan prior to a conversion and it simply lists the fields that require translation, but doesn't tell me the exact records. Most of the time the translation to UTF-8 works fine as well.
 

RealHeavyDude

Well-Known Member
The last time I used the proutil convchar charscan utility was some 15 years ago, so please bear with me. As far as I can remember, and after looking into the documentation, the utility does list the RECIDs of the records matching your query.

You are correct, most of the time the conversion from any code page to UTF-8 should be a piece of cake. But in the past I came across some "smart" guys: instead of using a character set, and thus a database code page, that supports all the characters they needed, they tweaked the terminal emulation. That way they were able to store unsupported characters in the positions of supported characters that they thought the users would not use. If you know that something like this has been done in your environment, then you need to be able to identify the records and fields that are screwed up before you convert the code page, be it to UTF-8 or anything else.
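The scan itself is just a proutil call. Roughly, assuming a source database mydb still on its original code page and UTF-8 as the target, with the table names optional and purely illustrative:

    # scan the whole database for characters that would need translation to UTF-8
    proutil mydb -C convchar charscan utf-8

    # or narrow the scan to specific tables
    proutil mydb -C convchar charscan utf-8 customer order

Once a RECID is reported, you can inspect the record from an ABL session with something like FIND customer WHERE RECID(customer) = <the-recid> NO-LOCK, the table and value being placeholders.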

Heavy Regards, RealHeavyDude.
 

nix1016

New Member
Thanks for that, that's the issue we have too... However, instead of identifying and fixing the data (which is way too much work), we found a way around it by converting the DB to 1252 and then converting to UTF-8 :)
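In script form that workaround is just two offline conversions per database, with mydb as a placeholder:

    # convert the single-byte data to 1252 first, then from 1252 to UTF-8
    proutil mydb -C convchar convert 1252
    proutil mydb -C convchar convert utf-8

Both steps need the database offline, and a backup before each step is a good idea.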
 