Resolved SAX is getting the best of me...

Tracy Hall

New Member
Hello, it is me again.

I am working on parsing XML into a temp-table, working with the data, and then outputting it to a CSV file for a Magento load.

I have gotten the main file to work, but I am confused about how to handle several files going into one big temp-table.

This company has a category file, a products file, a price file, and probably a drop-ship one too.
I have the products file loading, but to get the correct categories and prices, at least, I need the other files loaded as well.

Originally I had two product files, and when I tried to load the second one behind the first, it overwrote what was already in the temp-table.

I also wondered whether you can put the parse code and the callback in the same file. I am guessing you can, but I am new to this and it did not work for me.

I have been under a push to get this done. A hold-up that is not on my end is giving me some cushion, but probably early next week I will be under the gun to have the second set of products working correctly.

Originally I had one block of parse code for the first file and another block for the second file, both calling the same callback program. That is the version that overwrote the first file's data in the temp-table.

Can anyone tell me what has to be different for it to handle more than one XML file?
 

TomBascom

Curmudgeon
Is there some reason that you don't just use READ-XML()?

The SAX parser is very nice but, from what you describe, it doesn't seem necessary or particularly useful.
 

Tracy Hall

New Member
I was thinking about using it, but what I read said it did not do well with large files. I might just try it and see how it does. I am not sure how big the full file is, but I am sure there are a lot of products. The other vendor they use had close to 17,000 products; I don't know how many this vendor has.

Thanks for the suggestion!
 

GregTomkins

Active Member
I can't see how READ-XML would work with a single XML chunk that referenced many tables, which is what I thought the OP meant. Is this a dataset thing that I don't know about?

I know from recent bitter experience that READ-XML is sketchy when you have multiple 'records' with different sets of fields, so I'd be surprised if it handles multiple 'record types' properly.

BTW, as regards performance, we deal with multi-hundred-MB XML files all the time and don't pay any special attention to performance or have particular problems with it. Unless you are talking about GBs and/or have benchmarked it, I wouldn't pay any more attention to that than to any other performance aspect.
 

TomBascom

Curmudgeon
I'm reading the question as needing to load several well-defined XML files into corresponding temp-tables, which, to me, sounds like a perfect use case for READ-XML().

If I'm wrong and it is actually random or disorganized stuff that needs a bunch of logic to direct it to the proper place, then SAX would suddenly make sense.
 

Tracy Hall

New Member
I am not sure what a well-formed XML file would look like. I will give you some examples.
Here is a snippet of the categories:
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<AcuSport_Consumer_Categories table="xchlib.WCPP100U">
<row>
<LEVEL1 typename="char" type="1">ACCESSORIES </LEVEL1>
<LEVEL2 typename="char" type="1">CHOKE TUBES/ACCESSORIES </LEVEL2>
<LEVEL3 typename="char" type="1"> </LEVEL3>
<FLCF typename="char" type="1">09016</FLCF>
</row>
<row>
<LEVEL1 typename="char" type="1">ACCESSORIES </LEVEL1>
<LEVEL2 typename="char" type="1">CHOKE TUBES/ACCESSORIES </LEVEL2>
<LEVEL3 typename="char" type="1"> </LEVEL3>
<FLCF typename="char" type="1">09019</FLCF>
</row>
The actual products:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<STEP-ProductInformation>
<Products>
<Product ID="134025" AnalyzerResult="included" ParentID="MM89542" UserTypeID="Item">
<Name>101574</Name>
<ClassificationReference AnalyzerResult="" ClassificationID="" InheritedFrom=""/>
<Values>
<Value AttributeID="Caliber">.22 LR</Value>
<Value AttributeID="Diameter">1 Inch</Value>
<Value AttributeID="Thread">1/2-20 European</Value>
<Value AttributeID="Weight Ounce">3.4</Value>
<Value AttributeID="Catalog Vendor Name">AAC</Value>
<Value AttributeID="InventoryTyp">REG</Value>
<Value AttributeID="Item Class">02066</Value>
<Value AttributeID="Item Code">101574</Value>
<Value AttributeID="Item Description">AAC AAC PILOT 22LR 1/2-20</Value>
<Value AttributeID="Item Group">02</Value>
<Value AttributeID="Item Status">OPEN</Value>
<Value AttributeID="Primary Vendor">10440</Value>
<Value AttributeID="Retail Price">350.00</Value>
<Value AttributeID="Wildcat Eligible">N</Value>
<Value AttributeID="Consumer Description">Pilot Rimfire Silencer .22LR 5.25 Inches 1/2-20 European TPI Thread T3 Hard Coat Scarmor Finish - All NFA Rules Apply</Value>
<Value AttributeID="Length/Added">5.25 in./4.75 in.</Value>
<Value AttributeID="ProductPageNumber"><NoVersionPageNo/></Value>
<Value AttributeID="Master Model Header">Pilot 1/2-20 European</Value>
<Value AttributeID="Master Model Body Copy">Durable compact and light weight .22LR muzzle silencer for pistols and rifles. Stainless thread interface provides longevity and rigid mounting for improved accuracy with minimal and repeatable zero-shift. T3 Hard Coat Scarmor finish. Weight: 3.4 ounces. Length: 5.25 inches. Diameter: 1.0 inch. Mount: 1/2-20 European thread count. All NFA rules apply.</Value>
<Value AttributeID="Vendor Group">AAC</Value>
</Values>
<AssetCrossReference AnalyzerResult="included" AssetID="AAC_101574" Type="Primary Image"/>
</Product>
And the Prices:
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<FF_IT_PR_initial_load_P_S table="XCHLIB.ITPRFF">
<row>
<PRITEM type="1">50005 </PRITEM>
<PRPRICE type="2">14.84</PRPRICE>
</row>
<row>
<PRITEM type="1">50005-6 </PRITEM>
<PRPRICE type="2">14.39</PRPRICE>
</row>
<row>
<PRITEM type="1">50006 </PRITEM>
<PRPRICE type="2">14.84</PRPRICE>
</row>
<row>
<PRITEM type="1">50006N </PRITEM>
<PRPRICE type="2">18.68</PRPRICE>
</row>
So if you had to deal with this particular file structure, what would you use?

Thanks for any advice you can give. I already have the products file loading; I just don't know how to deal with the others, and if another way would be easier, I am up for starting over. I am just hoping I can get this accomplished quickly so no one gets too upset with me.
Thanks for your help!!
Tracy
 

GregTomkins

Active Member
If each of Prices, Products, etc. is in its own file, then READ-XML should indeed work fine, and it should be simple enough to extend whatever you did to multiple files.

Basically you just create a table handle for each file and run the table's READ-XML method, specifying the input file as a parameter, and the input type 'FILE'.
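
For example, something roughly along these lines -- untested, and the temp-table names, field names, and file names are just placeholders for whatever you actually have:
Code:
/* Rough sketch, untested.  Each file gets its own temp-table and its own
   READ-XML() call with source type "FILE".  The element names in the XML
   have to line up with the temp-table/field names for the default mapping
   to work, so the real definitions may need XML-NODE-NAME tweaks. */
define temp-table ttCategory no-undo
  field level1 as character
  field level2 as character
  field level3 as character
  field flcf   as character.

define temp-table ttPrice no-undo
  field prItem  as character
  field prPrice as decimal.

define variable lOk as logical no-undo.

lOk = temp-table ttCategory:read-xml( "FILE", "categories.xml", "EMPTY", ?, ? ).
lOk = temp-table ttPrice:read-xml( "FILE", "prices.xml", "EMPTY", ?, ? ).

/* If two files ever have to land in the SAME temp-table, use "APPEND"
   instead of "EMPTY" as the read-mode so the second load does not wipe
   out the first. */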

--

If they are all combined in one big file, then I would think about doing some preprocessing using Unix utilities or (gasp) IMPORT/PUT, to split them up first.
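
If it did turn out to be one combined file, a crude IMPORT/PUT split could look something like the sketch below (completely untested; it assumes each section starts with the root element names from your snippets and that tags don't share lines, and you would still have to sort out the <?xml?> declarations per output file):
Code:
/* Untested sketch: split one combined file into three by watching for the
   root element of each section. */
define variable cLine    as character no-undo.
define variable cSection as character no-undo.

define stream sCat.
define stream sProd.
define stream sPrice.

input from "combined.xml".
output stream sCat   to "categories.xml".
output stream sProd  to "products.xml".
output stream sPrice to "prices.xml".

repeat:
  import unformatted cLine.

  if      cLine matches "*<AcuSport_Consumer_Categories*" then cSection = "cat".
  else if cLine matches "*<STEP-ProductInformation*"      then cSection = "prod".
  else if cLine matches "*<FF_IT_PR_initial_load*"        then cSection = "price".

  case cSection:
    when "cat"   then put stream sCat   unformatted cLine skip.
    when "prod"  then put stream sProd  unformatted cLine skip.
    when "price" then put stream sPrice unformatted cLine skip.
  end case.
end.

input close.
output stream sCat   close.
output stream sProd  close.
output stream sPrice close.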

--

FYI, and not to be all particular about it, but 'well formed' has a very specific meaning in XML, which I don't think is what you meant. And there is nothing 'badly formed' or disorganized about a single file containing many different 'records' interspersed together (albeit this is not convenient when using READ-XML). Lots of places do that; it's one of the strengths of XML (IMO) that you can combine different 'records' in many ways.
 

GregTomkins

Active Member
... sorry, just to clarify: if this is really one big file with 3 different types of 'records', one after the other, I would probably split it using an OS utility and then READ-XML. I think SAX is almost for sure wrong for what you are doing. And don't even think about doing your own parsing; there lies madness.
 

Tracy Hall

New Member
Hey Greg, there are 3 DIFFERENT XML files. I am going to have to load each of them, because the product info has the retail price (MSRP) but not the actual selling price. I am also going to use the category file, at least, so I can map the items to my categories.
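
Once I have all three loaded, I am picturing something roughly like this to put them together for the CSV (just my guess at this point -- the temp-table and field names below are made up, and I still have to confirm that Item Code matches PRITEM and that Item Class matches FLCF):
Code:
/* Very rough and untested.  ttProduct, ttPrice and ttCategory and their
   fields are only placeholders, and the join fields are my assumption. */
output to "magento-load.csv".

for each ttProduct:

  find first ttPrice
       where ttPrice.prItem = ttProduct.itemCode no-error.

  find first ttCategory
       where ttCategory.flcf = ttProduct.itemClass no-error.

  export delimiter ","
    ttProduct.itemCode
    ttProduct.itemDescription
    (if available ttPrice    then ttPrice.prPrice   else 0)
    (if available ttCategory then ttCategory.level1 else "")
    (if available ttCategory then ttCategory.level2 else "").
end.

output close.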

Also, I wouldn't think about doing my own parsing, because I am a newbie and just figuring out SAX has taken me a while. (Not that I have it figured out. :))

Thanks for being so helpful both of you!
Tracy
 

Tracy Hall

New Member
Hey Tom,
Do you know of any good examples of READ-XML()? I have been looking for a while now and have not found a great one. I have the section in the documentation on it and tried testing it out with the XML from another example in the doc. I am going to play with it some more, but when I looped through the temp-table and displayed the whole table, I got nothing.

I thought I would ask to see if you knew of a place that had a better example.

Thanks,
Tracy
 

TomBascom

Curmudgeon
I'm a dilettante when it comes to XML, but I do know that Progress is somewhat finicky about the format of the XML. I played around a bit with your example XML; the changes below may or may not all be necessary, but they worked for me:
Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<categories>
<categoriesRow>
<LEVEL1>ACCESSORIES </LEVEL1>
<LEVEL2>CHOKE TUBES/ACCESSORIES </LEVEL2>
<LEVEL3> </LEVEL3>
<FLCF>09016</FLCF>
</categoriesRow>
<categoriesRow>
<LEVEL1>ACCESSORIES </LEVEL1>
<LEVEL2>CHOKE TUBES/ACCESSORIES </LEVEL2>
<LEVEL3> </LEVEL3>
<FLCF>09019</FLCF>
</categoriesRow>
</categories>
And threw together a sample program:
Code:
define temp-table categories no-undo
  field level1 as character
  field level2 as character
  field level3 as character
  field flcf  as character
  index flcf-idx flcf
.

message temp-table categories:read-xml( "file", "categories.xml", "append", ?, ?, ? ).
pause.

for each categories:
  display categories.
end.
 

GregTomkins

Active Member
You don't need to specify the TT definition statically the way Tom did, although that is often done and often useful, and I would never in a million years suggest that Tom's code is less than stellar. It's not mandatory, though, and in some cases it's helpful to let READ-XML create the schema for you. Here is a snip from some code I was working on this morning.

Code:
DEF VAR h_th AS HANDLE NO-UNDO.
DEF VAR h_bh AS HANDLE NO-UNDO.
DEF VAR h_request AS LONGCHAR NO-UNDO. /* holds the XML document; populated elsewhere in the original code */

/* This creates an empty table and then uses READ-XML to both create the schema AND load data into it. In
your case you would replace the LONGCHAR with FILE and h_request with a file name. But you can also set it by
hand (eg. h_request = "<foo><bar>data</bar></foo>") to experiment with it and figure out various nightmares
such as special characters in field names and dissimilar record definitions. */

CREATE TEMP-TABLE h_th.
h_th:READ-XML("LONGCHAR":U, h_request, "MERGE", ?, NO) NO-ERROR.

/* This is a miniature query to prove that it worked, or not. It finds the first record and messages the first two fields.
If you used a static definition like Tom suggested, you could just use a FOR-EACH, which is the #1 best
reason for doing it Tom's way if you can ;) */

CREATE BUFFER h_bh FOR TABLE h_th.
h_bh:FIND-FIRST().
MESSAGE h_bh:BUFFER-FIELD(1):BUFFER-VALUE() h_bh:BUFFER-FIELD(2):BUFFER-VALUE().
 

TomBascom

Curmudgeon
Please, I'm going to be ill...

That is a very good example of doing it totally dynamically.

I think that Tracy may be better served by a static example because she seems to have a situation where she knows what she is going to be working with as far as table and field names go.

I'm fairly fuzzy on how much of the stuff I edited out of the original XML really needed to go. I can imagine that it will be a PITA if she has no control over that format.
 

Tracy Hall

New Member
Hey Tom, I do think the static way will serve me better, especially at my skill level. And YES, it is going to be a PITA, because I don't think I can change things.

I don't understand how much an automated process can do. I am supposed to make this an automated process that updates the prices and inventory daily.
Can you create a cron job to FTP something? I am guessing you can, but if I am the one writing it, probably not. :confused:
If it ends up being a case where I get the file and put it somewhere and the scheduled task runs against it, I can fix the file when I put it there. I am not sure; those are the questions I will ask next.

Thanks for your help.
Tracy
 

TomBascom

Curmudgeon
Yes, you can use cron to automate such a task.

I wouldn't use FTP though -- it's horribly insecure. But that will all depend on whoever is providing this file and what they can support.
 

Tracy Hall

New Member
I believe it is just FTP. I always get mixed up when I am getting data from the other vendor, because we always use SFTP with them, and I forget to change the setting and then it does not work. I have not gotten the credentials for this vendor yet, just a sample, but the address it mentioned was ftp.blahblahblah.com.
 

TomBascom

Curmudgeon
You're going to have to put the password in plain text somewhere. :( So be sure to carefully consider the permissions on that script -- "world read" is probably a bad idea.
 

Tracy Hall

New Member
They may just want me to do it the other way I mentioned for now and schedule it to run at a certain time. I am not sure; I hope so. It is just pictures, what products a vendor carries, and prices, so I guess no one can do that much damage with that info. It isn't customer info or anything like that.

Thanks, have a great weekend.

Tracy
 