[Progress Communities] [Progress OpenEdge ABL] Forum Post: RE: Question about the performance of ABL queries resolved by multiple indexes

Status
Not open for further replies.
S

Simon L. Prinsloo

Guest
Tom is correct. I documented it in http://pugchallenge.org/downloads2015/176_OfCourseItWillTakeTheRightIndex.pdf . I state the rule on slide 13 and proof it in slides 41 to 45. But that is when all criteria is joined with AND, so you make one bracket. The moment you use OR, you create alternative brackets. There is one more brackets than the number of OR operators. Different indexes (and presumably sets of indexes) can be selected per bracket, as can be seen on slides 47-48, as long as the statement can use multiple indexes. But statements that can use only one index will not benefit, as can be seen on slides 49-51, where the result turns out to be quite unexpected. The source code used to make the presentation can also be found at PUG Challenge Americas . Also note that while the theory that equality is better than a range sounds correct, but in practice it might not be. This happens when there is a low distribution of values for the equality part and a high distribution of values in the range. The compiler cannot know what the data distribution will look like, but the persons designing the database and the software must be able to predict it. As an example, I encountered a system where invoices and credit notes were in the same table. They were differentiated with a doc-type field, which could be "INV" or "CRN". Rought 98% of the data was "INV". There was single field indexes for the doc-type field and the inv-date field. Doing a report on sales in a particular month worked well, because the index on inv-date was selected. But then the customer requested that credit notes must be filtered out and suddenly the report became very slow. The reason? After 5 years, a range match on a specific month's data returned 1/60 (1.67%) of the data. Roughly 2% of that result set was irrelevant. But an equality match on doc-type for "INV" returned 98% of the data, and in that result set only 1.67% of the data was relevant. Thus with the indexes available: you would by default read 98% of the data and use 1.67% of the result set. you could force a read of 1.67% of the data and use 98% of the result set. With a compound index on "doc-no, inv-date" you would read 1.63% of the data and the entire result set would be relevant.

Continue reading...
 
Status
Not open for further replies.
Top