geertjeguns@hotmail.com
Guest
Hello everyone,

I'm having a small issue while parsing an XML file that is encoded in UTF-8. It contains some special characters such as ‘ (U+0091), ’ (U+0092), “ (U+0093), ” (U+0094), œ (U+009C) and so on. Although these characters look like a single quotation mark ' and a double quotation mark ", they are not the same characters.

I first read the XML into a LONGCHAR and fix its codepage to UTF-8 (with or without FIX-CODEPAGE, the result is the same). No conversion is needed because the XML file was already created in UTF-8, hence the NO-CONVERT. I then use the LONGCHAR as the input source for the SAX-READER.

Example of my code:

    /* Set a fixed codepage (UTF-8) for the longchar */
    FIX-CODEPAGE ( wclong ) = "UTF-8".

    /* copy the xml to a longchar */
    COPY-LOB FILE wcxml TO wclong NO-CONVERT.

    /* OUTPUT TO "d:\users\geegun\webservice\bal\esbpws\longcontent.txt". */
    /* EXPORT wclong . */
    /* OUTPUT CLOSE. */

    CREATE SAX-READER whParser.
    RUN saxparserprocedure.p PERSISTENT SET whHandler.
    whParser:HANDLER = whHandler.
    whParser:SET-INPUT-SOURCE("LONGCHAR", wclong ).
    whParser:SAX-PARSE-FIRST() NO-ERROR.

    ParseLoop:
    REPEAT WHILE whParser:PARSE-STATUS = SAX-RUNNING:
        whParser:SAX-PARSE-NEXT() NO-ERROR.
        IF whParser:PRIVATE-DATA = "FatalErrorInvokedByUser" THEN
        DO:
            ASSIGN ERROR-STATUS:ERROR = TRUE.
            LEAVE ParseLoop.
        END.
    END.

    IF ERROR-STATUS:ERROR THEN
    DO:
        /* ... some error handling here ... */
    END.
    ELSE
    DO:
        /* get the dataset from the saxparserprocedure */
        RUN getdata IN whHandler (OUTPUT DATASET-HANDLE whdataset BIND,
                                  OUTPUT iplfuncerror,
                                  OUTPUT ipcErrorMsg ).
    END.

When I uncomment the 'OUTPUT TO' statement in the code above, the output file still contains all the characters. But when I look at an attribute's value (using GET-VALUE-BY-INDEX(indexPosition)) during the parsing process, the value has already changed.

Attached to this post you can find an excerpt of the xml file.
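
Since the characters listed above can be stored in more than one way, a byte-level dump of the file may help pin down what the parser actually receives. The following is only a minimal sketch, not part of the original code: cFile and the output name bytes.txt are hypothetical, and it assumes the file is small enough to load into a MEMPTR in one go.

```abl
/* Diagnostic sketch: list every non-ASCII byte in the XML file.   */
/* cFile and "bytes.txt" are hypothetical names; adjust as needed. */
DEFINE VARIABLE cFile AS CHARACTER NO-UNDO INITIAL "excerpt.xml".
DEFINE VARIABLE mFile AS MEMPTR    NO-UNDO.
DEFINE VARIABLE iPos  AS INTEGER   NO-UNDO.
DEFINE VARIABLE iByte AS INTEGER   NO-UNDO.

/* Load the raw bytes of the file without any codepage conversion. */
COPY-LOB FROM FILE cFile TO mFile NO-CONVERT.

OUTPUT TO "bytes.txt".
DO iPos = 1 TO GET-SIZE(mFile):
    iByte = GET-BYTE(mFile, iPos).
    /* Bytes above 127 belong to non-ASCII characters. */
    IF iByte > 127 THEN
        PUT UNFORMATTED iPos " " iByte SKIP.
END.
OUTPUT CLOSE.

/* Release the memory held by the MEMPTR. */
SET-SIZE(mFile) = 0.
```

Each output line holds a byte position and its decimal value. For the apostrophe in question, the sequence 226 128 153 would be U+2019 encoded in UTF-8, 194 146 would be U+0092 encoded in UTF-8, and a lone 146 would be a Windows-1252 byte, which is not valid UTF-8 on its own.
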
The text 'Vidange d’huile' in that excerpt contains one of the special characters: it is not a normal apostrophe.

I've been searching for a solution for a while and found the following KB article from 2014, which describes my problem, but unfortunately it does not seem to offer a solution:
http://knowledgebase.progress.com/articles/Article/000054284

Does anyone have an idea how to solve this? Or has anyone had the same problem before?

Thanks in advance,
Geert