[progress Communities] [progress Openedge Abl] Forum Post: Extended Utf-8 Characters Are...

  • Thread starter geertjeguns@hotmail.com
Status
Not open for further replies.
Hello everyone,

I'm having a small issue while parsing an XML file that uses the UTF-8 codepage. It contains some special characters like ‘ ( U+0091 ), ’ ( U+0092 ), “ ( U+0093 ), ” ( U+0094 ), œ ( U+009C ) and so on. Although the first four look like a single quotation mark ' and a double quotation mark ", they are not the same characters.

I first read the XML into a longchar and fix its codepage to UTF-8 (with or without FIX-CODEPAGE, the result is the same). No conversion is needed because the XML file was already created in UTF-8, hence the NO-CONVERT. I then use the longchar as the input source for the SAX-READER.

Example of my code:

    /* Set a fixed codepage (UTF-8) for the longchar */
    FIX-CODEPAGE ( wclong ) = "UTF-8".

    /* copy the xml to a longchar */
    COPY-LOB FILE wcxml TO wclong NO-CONVERT.

    /* OUTPUT TO "d:\users\geegun\webservice\bal\esbpws\longcontent.txt". */
    /* EXPORT wclong . */
    /* OUTPUT CLOSE. */

    CREATE SAX-READER whParser.
    RUN saxparserprocedure.p PERSISTENT SET whHandler.
    whParser:HANDLER = whHandler.
    whParser:SET-INPUT-SOURCE("LONGCHAR", wclong ).
    whParser:SAX-PARSE-FIRST() NO-ERROR.

    ParseLoop:
    REPEAT WHILE whParser:PARSE-STATUS = SAX-RUNNING:
        whParser:SAX-PARSE-NEXT() NO-ERROR.
        IF whParser:PRIVATE-DATA = "FatalErrorInvokedByUser" THEN DO:
            ASSIGN ERROR-STATUS:ERROR = TRUE.
            LEAVE ParseLoop.
        END.
    END.

    IF ERROR-STATUS:ERROR THEN DO:
        /* ... some error handling here ... */
    END.
    ELSE DO:
        /* get the dataset from the saxparserprocedure */
        RUN getdata IN whHandler (OUTPUT DATASET-HANDLE whdataset BIND,
                                  OUTPUT iplfuncerror,
                                  OUTPUT ipcErrorMsg ).
    END.

When I uncomment the 'OUTPUT TO' statement in the code above, the output file still contains all the characters. But when I look at the attribute's value (using GET-VALUE-BY-INDEX(indexPosition)) during the parsing process, the value has already changed.

Attached to this post you can find an excerpt of the XML file.
The following text 'Vidange d’huile' contains one of the special characters. It's not a normal apostrophe.

I've been searching for a solution for a while and I found the following KB post dating from 2014 which describes my problem, but unfortunately there doesn't seem to be a solution.

http://knowledgebase.progress.com/articles/Article/000054284

Does anyone have an idea on how to solve this? Or has anyone had the same problem before?

Thanks in advance,
Geert
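One check that might help narrow this down is to look at the raw bytes of the file before the parser touches them. The values 0x91 to 0x94 and 0x9C are the Windows-1252 byte positions of these characters. In Unicode the typographic quotes are U+2018, U+2019, U+201C and U+201D (UTF-8 bytes E2 80 98 to E2 80 9D), while U+0091 to U+0094 are C1 control code points (UTF-8 bytes C2 91 to C2 94). The sketch below is only a diagnostic under stated assumptions: it reuses the variable name wcxml from the code above with a made-up file name, and the other variable names (mRaw, iC1, iQuotes) are hypothetical. It counts both kinds of byte sequence so it becomes clear which encoding the file really contains.

```abl
/* Hypothetical diagnostic: count how the quotation marks are encoded
   in the raw XML file. "sample.xml" is an assumed file name. */
DEFINE VARIABLE wcxml   AS CHARACTER NO-UNDO INITIAL "sample.xml".
DEFINE VARIABLE mRaw    AS MEMPTR    NO-UNDO.
DEFINE VARIABLE iPos    AS INTEGER   NO-UNDO.
DEFINE VARIABLE iSize   AS INTEGER   NO-UNDO.
DEFINE VARIABLE iC1     AS INTEGER   NO-UNDO. /* C2 80..9F sequences    */
DEFINE VARIABLE iQuotes AS INTEGER   NO-UNDO. /* E2 80 98..9D sequences */

/* Load the file byte for byte; a MEMPTR target involves no codepage handling */
COPY-LOB FROM FILE wcxml TO mRaw.
iSize = GET-SIZE(mRaw).

/* Stop two bytes before the end so the three-byte look-ahead stays in range */
DO iPos = 1 TO iSize - 2:
    /* 0xC2 followed by 0x80-0x9F encodes a C1 control code point (U+0080-U+009F) */
    IF GET-BYTE(mRaw, iPos) = 194
       AND GET-BYTE(mRaw, iPos + 1) >= 128
       AND GET-BYTE(mRaw, iPos + 1) <= 159 THEN
        iC1 = iC1 + 1.
    /* 0xE2 0x80 0x98-0x9D encodes U+2018-U+201D, the typographic quotes */
    ELSE IF GET-BYTE(mRaw, iPos) = 226
       AND GET-BYTE(mRaw, iPos + 1) = 128
       AND GET-BYTE(mRaw, iPos + 2) >= 152
       AND GET-BYTE(mRaw, iPos + 2) <= 157 THEN
        iQuotes = iQuotes + 1.
END.

SET-SIZE(mRaw) = 0. /* release the memory held by the MEMPTR */

MESSAGE "C1 control sequences (C2 80..9F):" iC1 SKIP
        "Typographic quotes (E2 80 98..9D):" iQuotes
    VIEW-AS ALERT-BOX INFORMATION.
```

If the first counter is non-zero, the file probably holds Windows-1252 byte values that were written out as if they were Unicode code points, which could explain why the parser reports something different from the character shown in an editor. If only the second counter is non-zero, the file content itself looks fine and the change would have to happen later, during parsing or in the session codepage conversion.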
