Question UTF-8 trouble.

ron

Member
OE 10.2B, RH

I have rather little experience working with UTF-8 databases - and I've hit some trouble.

I am working in a test environment and I've done this:

1. Taken a probkup online of a running DB that uses UTF-8.
2. Created a new structure file testdb.st.
3. Done prorest testdb ./test.bkp -cpinternal UTF-8, and the restore is successful.
4. If I try to run proutil testdb -C idxbuild I get this error:
Use "-cpinternal UTF-8" with idxbuild only with a UTF-8 database. (8557)

I can start-up the testdb database with proserve - and the log shows:

(4264) Character Set (-cpinternal): ISO8859-1.

Clearly I have a code-page mismatch - but I don't understand why. If I restored specifying UTF-8, why is the DB not UTF-8? And - what makes idxbuild "think" that -cpinternal UTF-8 should apply? (In other words, where does it "see" UTF-8?)

Ron.
 

ron

Member
I have made some "progress" ....

By adding -cpinternal UTF-8 to the idxbuild I can get idxbuild to function.

But I am left with the problem that when I start-up the testdb database with proserve, it shows:

(4264) Character Set (-cpinternal): ISO8859-1.

What have I missed in backing-up and restoring the database?

Ron.
 

TheMadDBA

Active Member
The default codepage is set for a Progress install and not at a database level. When Progress was installed on the test environment somebody picked ISO8859-1 (the default).

You have a few choices....

1) Reinstall Progress and choose UTF-8. This is by far the safest choice as long as all of your databases are UTF-8.

2) Add all of the codepage parameters to your scripts. This will be a pain but will be easier if most of your databases are ISO.

3) Change startup.pf in the DLC directory to use UTF-8 for all of the codepage options, copy promsgs from a UTF-8 install, copy the empty databases, and probably some other things I am forgetting. Not advised.
 

ron

Member
Thank you very much for that.

I'm a little bit surprised that the code page (UTF-8) is not an attribute of the database. I expected that if I back-up a DB that is UTF-8, it would remain UTF-8 when I restore it. But clearly that is not the case.

I am also surprised that if I restore a backup and specify UTF-8, the restored database is not UTF-8. It seems a bit clumsy.
 

TheMadDBA

Active Member
Just to clarify a bit... the DB is indeed encoded in UTF-8.

The problem is that your proserve/proutil/etc. is picking up ISO8859-1 from startup.pf. One of the perks of this design is being able to run mixed codepages and have Progress convert between them (if it can).

Obviously some things don't support that. If you add -cpinternal UTF-8 to your proserve (and mpro) you should see the data in UTF-8.
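To see which code page a utility will fall back to, it can help to list the -cp* entries in the parameter file it reads. A minimal Python sketch, assuming the file holds "-name value" pairs with # starting a comment; the sample contents and the function name codepage_params are made up for illustration and are not taken from a real install:

```python
def codepage_params(pf_text):
    """Return {parameter: value} for every -cp* parameter found in the text."""
    found = {}
    for raw_line in pf_text.splitlines():
        line = raw_line.split("#", 1)[0]      # drop trailing comments
        tokens = line.split()
        for i, tok in enumerate(tokens):
            # a -cp* parameter is followed by its value
            if tok.startswith("-cp") and i + 1 < len(tokens):
                found[tok] = tokens[i + 1]
    return found


# Hypothetical parameter-file contents, for illustration only.
sample = """\
# defaults chosen at install time
-cpinternal ISO8859-1
-cpstream ISO8859-1
"""

print(codepage_params(sample))
# {'-cpinternal': 'ISO8859-1', '-cpstream': 'ISO8859-1'}
```

Anything passed on the command line (for example -cpinternal UTF-8 on proserve) takes the place of these defaults for that one session.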
 

tamhas

ProgressTalk.com Sponsor
If you think about it as follows, it might make more sense. If all of your databases use one standard, you can make that standard the default and never have to specify it again. But, if your databases are mixed ... as they are by definition during a transition, you have to specify the CP everywhere that the default is not used.
 

RealHeavyDude

Well-Known Member
You need to be aware that, in a client-server scenario, you have three separate "things" using code pages:

  1. The character data in the database is stored in a certain code page - the database code page - in your case UTF-8.
  2. The database server uses an internal code page which, unless specified otherwise, is taken from $DLC/startup.pf.
  3. The client uses several code pages: an internal one and one for writing to streams which, unless specified otherwise, are taken from $DLC/startup.pf ( there are others but these are not so widely used ).

Between each of these three separate "things" Progress makes a conversion if possible - meaning the code pages are compatible, meaning they share the same character set, as for example iso8859-1 and ibm850 do. If the code pages are not compatible then Progress can't make the conversion out of the box and you get an error message. You could always roll your own in convmap.dat - but that's a different story.
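The idea of "compatible" code pages can be illustrated outside Progress. The sketch below uses Python's built-in codecs, not Progress's own conversion tables in convmap.dat, so it only shows the principle; the helper names convert and can_convert are invented for this example:

```python
def convert(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Decode bytes from the source code page and re-encode in the target."""
    return data.decode(source_cp).encode(target_cp)


def can_convert(data: bytes, source_cp: str, target_cp: str) -> bool:
    """True if every character in data also exists in the target code page."""
    try:
        convert(data, source_cp, target_cp)
        return True
    except UnicodeError:
        return False


# "é" exists in both iso8859-1 and ibm850 (cp850), at different byte values,
# so the conversion succeeds and only the byte changes.
latin1_bytes = "é".encode("iso8859-1")
print(latin1_bytes, "->", convert(latin1_bytes, "iso8859-1", "cp850"))

# A CJK character stored as UTF-8 has no counterpart in iso8859-1,
# so there is nothing to convert it to.
utf8_bytes = "日".encode("utf-8")
print(can_convert(utf8_bytes, "utf-8", "iso8859-1"))
```

This is why a UTF-8 database accessed with an ISO8859-1 internal code page can appear to work for plain Western text and then fail as soon as a character outside that set turns up.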

Nevertheless, it might seem a good idea for the database server to automatically pick up the code page from the database and use it as its internal one. Especially in the case of UTF-8. But all the stuff you access the database with ( proserve, proutil, rfutil - you name it ) takes it from $DLC/startup.pf - that's the way it has been forever, since before UTF-8 was introduced in some Progress V9 release.

Heavy Regards, RealHeavyDude.
 