Incoming Accents (CGI, WebSpeed)

GregTomkins

Active Member
Progress 10.2B

We use French throughout our application, and in general accented characters (e.g. 'e' with an acute accent, code 233 in ISO8859-1) are no problem. However:

When a character such as 'e' with an acute accent (code 233 in ISO8859-1) is typed into a browser, the browser encodes it as UTF-8 and POSTs it using $.ajax. In WebSpeed, GET-VALUE then returns garbled text.

I tried these:

a. FIX-CODEPAGE ("UTF-8")
b. -cpinternal / -cpstream set to UTF-8
c. Fiddling with OUTPUT ... CONVERT
d. Apache conf.d / .htaccess AddDefaultCharset UTF-8
e. Checking everywhere including WebSpeed, Apache, CGI etc. for incorrect code page references
f. A kBase entry suggested PROTERMCAP on the WS agent; this rendered WebSpeed unstartable.
g. Our DB is ISO8859-1, but that cannot be changed, and it makes no sense that it would be the issue.
h. According to W3C, UTF-8 is always used for XHR, even if you specify otherwise.

I emphasize, output from WebSpeed to the browser is OK; the problem is in the opposite direction. Data within the DB, Telnet, GUI clients, etc. is also OK and has been for decades.

Any suggestions?
 

Stefan

Well-Known Member
We do not use get-value for this; instead we grab the XML directly from the WEB-CONTEXT handle, after setting the encoding to UTF-8.

Code:
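/* Node reference that will receive the root element of the posted XML. */
DEFINE VARIABLE hRootNode   AS HANDLE  NO-UNDO.

CREATE X-NODEREF hRootNode.

/* Set the encoding, then read the root element straight from WEB-CONTEXT. */
WEB-CONTEXT:X-DOCUMENT:ENCODING = "UTF-8":U.
WEB-CONTEXT:X-DOCUMENT:GET-DOCUMENT-ELEMENT( hRootNode ).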
 

GregTomkins

Active Member
Thanks for the suggestion. I am not very familiar with WEB-CONTEXT, but it seems like a useful path to go down. I still have this issue, though.

Subsequent to this post I realized that part of the issue could be the URL encoding that browsers automatically apply to XHRs unless you specify multipart/form-data, but using that doesn't seem to work either.

May I ask if you meant this as a general comment, or does your app actually handle accented characters? Your profile says 'Belgium' and I believe they use Dutch = English alphabet = A-Z. My apologies if I'm grossly misinformed.
 

GregTomkins

Active Member
FYI:

1. According to the doc, X-DOCUMENT:ENCODING affects the creation of a new XML document, but doesn't appear (by doc or testing) to influence the ingestion of an incoming XML document.

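2. I finally solved my issue with 'CODEPAGE-CONVERT (h_char, "iso8859-1", "UTF-8" )', where 'h_char' holds the output of get-value. This converts the value from UTF-8 (the source code page, third argument) to ISO8859-1 (the target code page, second argument).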
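In case it helps anyone else, here is a minimal sketch of that conversion wrapped in a helper. The function name getUtf8Value and the field name "lastName" are made up for illustration, and it assumes the code lives inside a WebSpeed web object where get-value is already available:

```abl
/* Hypothetical helper, not part of WebSpeed. Assumes it is defined inside a
   WebSpeed web object, where get-value is already available. */
FUNCTION getUtf8Value RETURNS CHARACTER (INPUT pcName AS CHARACTER):
    /* Arguments: source string, target code page, source code page. */
    RETURN CODEPAGE-CONVERT(get-value(pcName), "iso8859-1":U, "UTF-8":U).
END FUNCTION.

DEFINE VARIABLE h_char AS CHARACTER NO-UNDO.

/* "lastName" is a made-up form field name. */
h_char = getUtf8Value("lastName":U).
```

Doing the conversion in one place means the rest of the code can keep treating the value as ISO8859-1, the same as data coming from the DB.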

Wow, this is a complicated topic! One thing I realized far too late is that traditional Unix utilities (eg. vi, grep) may handle non-ASCII characters differently, so, sometimes what looks like bad data is actually just the tool displaying it wrong. Likewise, browser debuggers may not always display things the way they really appear.
 

Stefan

Well-Known Member
All the WebSpeed functions like get-value are functional wrappers around the WEB-CONTEXT handle - see $DLC/src/web/method/cgiutils.i.

The methods on the WEB-CONTEXT handle are 'documented' as for internal use only. By using the WEB-CONTEXT handle directly I am eliminating any potential issues / obfuscation caused by the wrapper functions.

Every country in Europe (apart from the UK) has special characters. In Dutch, for example, we have:

é ë ï

Our databases generally use iso8859-1 as code page and browsers are all utf-8. We did have issues with accented characters in the past.

You are correct that setting the encoding is not doing anything. If I MESSAGE the WEB-CONTEXT:X-DOCUMENT:ENCODING of our incoming messages, it is already 'utf-8'.
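A rough sketch of that check, assuming the request body was posted as XML. The guard on IS-XML and VALID-HANDLE is my own addition here, and the MESSAGE output from an agent would normally end up in the server log:

```abl
/* Sketch only: report the encoding WebSpeed sees on an incoming XML body. */
IF WEB-CONTEXT:IS-XML AND VALID-HANDLE(WEB-CONTEXT:X-DOCUMENT) THEN
    MESSAGE "Incoming XML encoding:" WEB-CONTEXT:X-DOCUMENT:ENCODING.
ELSE
    MESSAGE "No XML document was posted with this request.".
```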
 