efficiently remove spaces

whwar9739

Member
So I have a question on efficiency and don't have a good way to test it.

Which of the below code segments would most efficiently replace multiple spaces with a single space?

Code:
l-txt = TRIM(l-txt).
DO WHILE l-txt MATCHES "*  *":
   l-txt = REPLACE(l-txt, "  ", " ").
END.

OR

Code:
l-txt = TRIM(l-txt).
l-idx = 1.   /* start scanning at the first character */
DO WHILE LENGTH(l-txt) > l-idx:
   IF SUBSTRING(l-txt,l-idx,2) = "  "
   THEN DO:
      SUBSTRING(l-txt,l-idx,2) = " ".
      NEXT.
   END.
   l-idx = l-idx + 1.
END.

Any suggestions or comments would be great.

Thank you,
 

MattKnowles

New Member
My instinct tells me that it's the first example although I have nothing definitive to back this up with. If it were my code then I'd certainly use the first method, probably without the if statement, ie

l-txt = REPLACE(TRIM(l-txt), "  ", " ").

Although, having said this, the 'do while' does ensure that all instances of double spaces are removed. If there are 3 or more consecutive spaces then the above will still leave double spaces.
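
A minimal sketch of that last point, assuming a made-up variable c-demo and a sample string holding a run of three spaces. After one REPLACE pass the check should still report a double space:

Code:
DEFINE VARIABLE c-demo AS CHARACTER NO-UNDO INITIAL "a   b".   /* three spaces */

/* One pass: the first two spaces collapse into one, the third is untouched. */
c-demo = REPLACE(c-demo, "  ", " ").

/* Show the result, its length, and whether a double space is still present. */
MESSAGE
   "[" + c-demo + "]" SKIP
   LENGTH(c-demo) SKIP
   INDEX(c-demo, "  ") > 0
VIEW-AS ALERT-BOX.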
 

whwar9739

Member
My instinct tells me that it's the first example although I have nothing definitive to back this up with. If it were my code then I'd certainly use the first method, probably without the if statement, ie

l-txt = REPLACE(TRIM(l-txt), "  ", " ").


Ahh, but I think what you missed is that both code segments replace 2 spaces with 1 space, so the DO WHILE loop is necessary in either case.
 

Stefan

Well-Known Member
Fastest:

Code:
   l-txt = TRIM( l-txt ).
   DO WHILE INDEX( l-txt, "  " ) > 0:
       l-txt = REPLACE( l-txt, "  ", " " ).
   END.

Tested with the following 'test suite':

Code:
&SCOPED-DEFINE loop 10000

/* The SEQUENCE reference in INITIAL uses up the first value, so the
   timing slots below should be numbered 1 through 5. */
DEF VAR itime AS INT NO-UNDO EXTENT 5 INITIAL {&sequence}.
DEF VAR ii AS INT NO-UNDO.
DEF VAR ctxt AS CHAR NO-UNDO INITIAL "1  2 3   4 5  1    2   ".

DEF VAR l-txt AS CHAR NO-UNDO.
DEF VAR l-idx AS INT NO-UNDO.

/* Test 1: empty loop, to measure the loop overhead itself. */
itime[{&sequence}] = ETIME.

DO ii = 1 TO {&loop}:
END.

/* Test 2: DO WHILE with MATCHES. */
itime[{&sequence}] = ETIME.

DO ii = 1 TO {&loop}:

   l-txt = TRIM(ctxt).
   DO WHILE l-txt MATCHES "*  *":
      l-txt = REPLACE(l-txt, "  ", " ").
   END.

END.

/* Test 3: character-by-character scan with SUBSTRING. */
itime[{&sequence}] = ETIME.

DO ii = 1 TO {&loop}:

   l-idx = 1.

   l-txt = TRIM(ctxt).
   DO WHILE LENGTH(l-txt) > l-idx:
      IF SUBSTRING(l-txt,l-idx,2) = "  " THEN DO:
         SUBSTRING(l-txt,l-idx,2) = " ".
         NEXT.
      END.
      l-idx = l-idx + 1.
   END.

END.

/* Test 4: DO WHILE with INDEX. */
itime[{&sequence}] = ETIME.

DO ii = 1 TO {&loop}:

   l-txt = TRIM( ctxt ).
   DO WHILE INDEX( l-txt, "  " ) > 0:
      l-txt = REPLACE( l-txt, "  ", " " ).
   END.

END.

itime[{&sequence}] = ETIME.

/* Elapsed milliseconds per test, in the order above. */
MESSAGE
   itime[2] - itime[1] SKIP
   itime[3] - itime[2] SKIP
   itime[4] - itime[3] SKIP
   itime[5] - itime[4]
VIEW-AS ALERT-BOX.

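Results on 10.2B04 32-bit running on a Windows 7 x64 laptop (elapsed milliseconds for the empty loop, the MATCHES loop, the SUBSTRING loop and the INDEX loop, in that order):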

---------------------------
Message
---------------------------
5
72
302
28
---------------------------
OK
---------------------------
 

whwar9739

Member
So I tried running it multiple times with just the first and last options, where the only difference is the DO WHILE condition, and the results varied: sometimes the MATCHES loop is faster, sometimes the INDEX > 0 loop is faster.

I am going to try running it multiple times and save off the results and see if one is consistently faster than the other.
 

Stefan

Well-Known Member
Also what is the &sequence and where is it defined?
It's a built-in preprocessor name; see the help file / documentation:

Representing a unique integer value that is sequentially generated each time the SEQUENCE preprocessor name is referenced. When a compilation begins, the value of {&SEQUENCE} is 0; each time {&SEQUENCE} is referenced, the value increases by 1. To store the value of a reference to SEQUENCE, you must define another preprocessor name as {&SEQUENCE} at the point in your code you want the value retained.

It allows 'tests' to be inserted without having to reorder extents manually.
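
A minimal sketch of the behaviour, going only by the documentation quoted here. Each reference is expanded at compile time, so a procedure containing nothing but this statement should display 0, 1 and 2:

Code:
/* Three references in a row; each one expands to the next integer. */
MESSAGE {&SEQUENCE} {&SEQUENCE} {&SEQUENCE} VIEW-AS ALERT-BOX.

In the test suite the first reference sits in the INITIAL clause, which is why the timing slots start at 1.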
 

Stefan

Well-Known Member
So I tried running it multiple times with just the first and last options, where the only difference is the DO WHILE condition, and the results varied: sometimes the MATCHES loop is faster, sometimes the INDEX > 0 loop is faster.

I am going to try running it multiple times and save off the results and see if one is consistently faster than the other.

I would assume that the actual data influences the results, but I would expect the internals of the INDEX function to be simpler than those of the MATCHES operator, and therefore expect INDEX to always be quicker.
 

Stefan

Well-Known Member
So I tried running it multiple times with just the first and last options, where the only difference is the DO WHILE condition, and the results varied: sometimes the MATCHES loop is faster, sometimes the INDEX > 0 loop is faster.

I am going to try running it multiple times and save off the results and see if one is consistently faster than the other.

Wow, bizarre... Running ChUI or GUI on my laptop, INDEX consistently beats the crap out of MATCHES, but on 10.2B x64 on CentOS 5.5 it's a pretty even match, with MATCHES sometimes slightly beating INDEX...

And I'm getting the same (unexpected) results on a Win 7 x64 remote desktop...

Oh well, the difference is marginal, so use whichever reads best to you.
 

whwar9739

Member
I am currently running the program you posted 1000 times, saving the results of each run, and will then count the number of times MATCHES beats INDEX. Waiting for the results...


The results are in: in 973 of the 1000 runs the MATCHES logic beat the INDEX logic. Granted, it was never by much.
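
For anyone who wants to repeat the comparison, a loop along these lines would produce that kind of tally. This is only a sketch, not the program actually used for the numbers above; all variable names are made up, and the run and iteration counts are simply taken from the thread:

Code:
DEFINE VARIABLE c-src          AS CHARACTER NO-UNDO INITIAL "1  2 3   4 5  1    2   ".
DEFINE VARIABLE c-txt          AS CHARACTER NO-UNDO.
DEFINE VARIABLE i-run          AS INTEGER   NO-UNDO.
DEFINE VARIABLE i-iter         AS INTEGER   NO-UNDO.
DEFINE VARIABLE i-start        AS INTEGER   NO-UNDO.
DEFINE VARIABLE i-matches-ms   AS INTEGER   NO-UNDO.
DEFINE VARIABLE i-index-ms     AS INTEGER   NO-UNDO.
DEFINE VARIABLE i-matches-wins AS INTEGER   NO-UNDO.

DO i-run = 1 TO 1000:

   /* Time the MATCHES variant. */
   i-start = ETIME.
   DO i-iter = 1 TO 10000:
      c-txt = TRIM(c-src).
      DO WHILE c-txt MATCHES "*  *":
         c-txt = REPLACE(c-txt, "  ", " ").
      END.
   END.
   i-matches-ms = ETIME - i-start.

   /* Time the INDEX variant. */
   i-start = ETIME.
   DO i-iter = 1 TO 10000:
      c-txt = TRIM(c-src).
      DO WHILE INDEX(c-txt, "  ") > 0:
         c-txt = REPLACE(c-txt, "  ", " ").
      END.
   END.
   i-index-ms = ETIME - i-start.

   /* Count only strict wins; ties go to neither side. */
   IF i-matches-ms < i-index-ms THEN
      i-matches-wins = i-matches-wins + 1.

END.

MESSAGE i-matches-wins "of 1000 runs won by MATCHES" VIEW-AS ALERT-BOX.

Since MATCHES always runs first here, swapping the order of the two timed blocks is a cheap way to check that the ordering itself is not skewing the tally.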
 

Stefan

Well-Known Member
Just for fun, one that stems from my Z80 coding days:

Code:
l-txt = TRIM( l-txt ).
DO WHILE INDEX( l-txt, "  " ) > 0:
   l-txt = REPLACE( l-txt, "  ", " " ).
   l-txt = REPLACE( l-txt, "  ", " " ).
END.

Set the number of repeated REPLACE statements according to how long a run of spaces you normally expect.
 