 Mirror

Towards a more accurate sort order (Views: 101)

 Problem/Question/Abstract:Sorting Addresses is a pain at the best of times, especially when a client supplies bad data (You may define clear fields in your DB, but when the data comes in, does it fit easily??) This attempts to resolve this issueAnswer:unit AddrSortOrder;{The custom sort order is used to deal with the fact that thehouse and flat numbers are sorted as strings. They are stored asstrings to allow things like '150-175' as a house number, or '3a',or perhaps even simply a flat 'A'.The need for a custom sort order is caused by the fact that with anordinary ASCII sort order '4' will appear after '30'. This is notdesirable behaviour.This approach to fix this problem is to look for the first numberin the string (if there is one) and then use this as some kind ofprimary sort order. The rest of the sorting will then be done onthe remaining characters (with preceding and trailing spacesstripped out), based on the ASCII value of their upper-case varients. Potential problems caused by this approach include(but are not limited to) the use of accented characters willpossibly cause strange orderings and furthermore, if there is a blockof flats with three floors A, B, C for example then supposing theflats on those floors are A1, A2, A3, B1, B2, B3 then the orderingof records will not be ideal - this approach will sort them asA1, B1, A2, B2, A3, B3. This behaviour is regrettable, butacceptable - we cannot tell that it is not flat A on floor 1 forexample. It's unlikely that we will be able to find a sort orderthat always produces ideal results.Some examples of sorted lists (not all ideal):EXAMPLE 1       EXAMPLE 2        EXAMPLE 3  Flat 1          1                 A  Flat 2          -2                B  3               2-4               C  3B              3a                1  Flat 3A         5                 2}interfaceuses SysUtils;function CalcSortIndex(NumStr: string): double;implementationfunction CalcSortIndex(NumStr: string): double;var  strlength, i, j, tmp: integer;  found: boolean;  numpart, strpart, divisor: double;  choppedstr: string;begin  //This function will return the sort index value for the string passed  strlength := length(NumStr);  if strlength = 0 then  begin    result := 0;    exit;  end;  found := false;  //split the string into a 'number' and a 'string' part..  //initialise  choppedstr := numstr;  numpart := 0;  //Locate the first digit (if there)  for i := 1 to strlength do  begin    if numstr[i] in ['0'..'9'] then    begin      found := true; //First digit found!!      break;    end;  end; //for i..  if found then  begin    //now get the to the end of the digits..    found := false;    for j := i to strlength do    begin      if not (numstr[j] in ['0'..'9']) then      begin        found := true; //end of digits found        break;      end;    end; //for j..    //Separate out the string parts    if found then    begin      //Number was embedded..      val(copy(numstr, i, j - i), numpart, tmp);      Delete(choppedstr, i, j - i);    end    else    begin      //Number went to the end of the string      val(copy(numstr, i, strlength), numpart, tmp);      Delete(choppedstr, i, strlength);    end;  end;  choppedstr := Uppercase(trim(choppedstr));  strlength := length(choppedstr);  //evaluate a number for the remaining part of the string  strpart := 0;  divisor := 1;  for i := 1 to strlength do  begin    divisor := divisor / 256;    //convert from Char to single using a variant conversion    strpart := strpart + (ord(choppedstr[i]) * divisor);  end;  //All done, return the value  result := numpart + strpart;end;end.NB a version of this Algorithm for MSSQL7 is also posted (Title "Towards a more accurate sort order in MSSQL7")

<< Back to main page