Reading of XML file using SAX PARSER

shaik.a

New Member
I am facing a problem when reading XML files using the SAX parser. The XML file's encoding is UTF-8 and its version is 1.0.

The XML file contains special characters, e.g. ä. After reading the tag value from the XML file, the character is shown as a square box instead of the special character.

My current code page is 1252 and the OpenEdge version is 10.2B.

Can anyone help me out, or correct me if I am doing anything wrong? Or do you need any more info to answer?

It's a bit urgent.

Thanks in advance.
 

RealHeavyDude

Well-Known Member
Welcome to the world of internationalization! The underlying problem is that a UTF-8 XML document can contain any number of characters that are not supported by the code page you are using (1252). Furthermore, as far as I know, the SAX parser neither does an automatic code page conversion nor gives you an option to specify one. It is also possible that the German umlaut ä you mention was not encoded correctly when the XML was created in the first place.

Did you create the XML document?

The only safe way to SAX-parse an XML document is to start the Progress session with the same code page as the one the XML document is encoded in. Otherwise, depending on the contents of the XML document, you will lose data or end up with corrupted data. Therefore I suggest you change the -cpstream setting to UTF-8 and see whether the offending characters are then displayed correctly.

Heavy Regards, RealHeavyDude.
 

shaik.a

New Member
Thanks for the quick reply.

The XML file is generated by an external system which is not using Progress.

I have tried the -cpstream setting with UTF-8, but I get the same result.

Is there any other way to solve this issue?
 

Cringer

ProgressTalk.com Moderator
Staff member
That would probably suggest the XML is corrupted. Can you view it successfully using a viewer?
 

shaik.a

New Member
No, I am not able to view it in the viewer. I get an error at the first place where a special character is used.

But the SAX parser processes it without any error, and the values are populated in the DB fields without any problem.

Does the SAX parser also process corrupted XML files?

Thanks for your support.
 

RealHeavyDude

Well-Known Member
First: you need to familiarize yourself with what UTF-8 is all about. It is a catch-all code page that should contain all characters from all code pages all over the world. That means you can convert from any code page (for example your 1252) to UTF-8, but you can only convert from UTF-8 to a code page if the target code page contains all the characters in the UTF-8 encoded text.
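A minimal sketch of that asymmetry, written in Python only because its codec names make the point easy to try outside of Progress. The name `cp1252` is Python's label for code page 1252, and the sample strings and the helper `fits_code_page` are made up for illustration:

```python
# Text stored in code page 1252 can be converted to UTF-8.
text_1252 = "Käse".encode("cp1252")            # bytes as stored in code page 1252
as_utf8 = text_1252.decode("cp1252").encode("utf-8")


def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text can be encoded in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        # At least one character has no equivalent in the target code page.
        return False


print(fits_code_page("Käse", "cp1252"))    # ä exists in code page 1252
print(fits_code_page("Ωmega", "cp1252"))   # Ω does not exist in code page 1252
```

The first check succeeds and the second fails, which is the situation where converting UTF-8 content into a 1252 session loses or corrupts data.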

Which program do you use to view the XML document, and which code pages does this viewer support? If the viewer does not support UTF-8, then the fact that you can't view the document does not mean that it is corrupted or not well-formed.

From experience I can tell you that the SAX parser will bail out of the parsing with an error if the XML document is corrupted.

What I have seen in the past when dealing with partners sending us XML documents: sometimes they just wrote UTF-8 for the encoding, but in reality the encoding was something else (iso8859-1 or ibm850 in some cases). What if you change the encoding in the XML document to something else?
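One way to test for a mislabeled file without touching Progress at all is to check whether its raw bytes are valid UTF-8. The following is only a sketch in Python. The helper name `check_declared_utf8` and the sample documents are hypothetical; for a real file you would pass in the bytes read in binary mode.

```python
def check_declared_utf8(raw: bytes) -> str:
    """Report whether raw is valid UTF-8; otherwise name the first bad byte."""
    try:
        raw.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError as err:
        return f"not UTF-8: byte 0x{raw[err.start]:02x} at offset {err.start}"


# Hypothetical samples: both headers claim UTF-8, but only one body really is.
header = b'<?xml version="1.0" encoding="UTF-8"?>'
mislabeled = header + "<name>Käse</name>".encode("iso-8859-1")  # ä stored as 0xE4
genuine = header + "<name>Käse</name>".encode("utf-8")          # ä stored as 0xC3 0xA4

print(check_declared_utf8(mislabeled))
print(check_declared_utf8(genuine))
```

For a file on disk, something like `check_declared_utf8(open(path, "rb").read())` would do, where `path` is your own file name. Note that a pure ASCII file also passes this check, so a "valid UTF-8" result only rules out the mislabeling case when the document actually contains special characters.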

Heavy Regards, RealHeavyDude.
 

shaik.a

New Member
Thanks, RealHeavyDude.

It works with ISO8859-1, but the XML file which I am receiving from the external system is declared as UTF-8.

Anyway, I am going to contact Progress Software about this.

If I get any info, I will share it with you all.

Once again, thanks for your help.
 

sachin_de

New Member
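Try this:
DEFINE VARIABLE hSAXWriter AS HANDLE NO-UNDO.
/* l_filename1 (CHARACTER) holds the output file name and is assumed to be defined elsewhere */
CREATE SAX-WRITER hSAXWriter.
hSAXWriter:FORMATTED = TRUE. /* Format output so it is easy to read */
hSAXWriter:ENCODING = "ISO-8859-1". /* THIS IS FOR 'ANSI' */
hSAXWriter:SET-OUTPUT-DESTINATION("file", l_filename1).
hSAXWriter:START-DOCUMENT().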
 