Towards a more accurate sort order (Views: 101)


Sorting Addresses is a pain at the best of times, especially when a client supplies bad data (You may define clear fields in your DB, but when the data comes in, does it fit easily??)
This attempts to resolve this issue


unit AddrSortOrder;

{The custom sort order is used to deal with the fact that the
house and flat numbers are sorted as strings. They are stored as
strings to allow things like '150-175' as a house number, or '3a',
or perhaps even simply a flat 'A'.
The need for a custom sort order is caused by the fact that with an
ordinary ASCII sort order '4' will appear after '30'. This is not
desirable behaviour.

This approach to fix this problem is to look for the first number
in the string (if there is one) and then use this as some kind of
primary sort order. The rest of the sorting will then be done on
the remaining characters (with preceding and trailing spaces
stripped out), based on the ASCII value of their upper-
case varients. Potential problems caused by this approach include
(but are not limited to) the use of accented characters will
possibly cause strange orderings and furthermore, if there is a block
of flats with three floors A, B, C for example then supposing the
flats on those floors are A1, A2, A3, B1, B2, B3 then the ordering
of records will not be ideal - this approach will sort them as
A1, B1, A2, B2, A3, B3. This behaviour is regrettable, but
acceptable - we cannot tell that it is not flat A on floor 1 for
example. It's unlikely that we will be able to find a sort order
that always produces ideal results.
Some examples of sorted lists (not all ideal):
EXAMPLE 1       EXAMPLE 2        EXAMPLE 3
  Flat 1          1                 A
  Flat 2          -2                B
  3               2-4               C
  3B              3a                1
  Flat 3A         5                 2


uses SysUtils;

function CalcSortIndex(NumStr: string): double;


function CalcSortIndex(NumStr: string): double;
  strlength, i, j, tmp: integer;
  found: boolean;
  numpart, strpart, divisor: double;
  choppedstr: string;
  //This function will return the sort index value for the string passed

  strlength := length(NumStr);
  if strlength = 0 then
    result := 0;

  found := false;

  //split the string into a 'number' and a 'string' part..

  choppedstr := numstr;
  numpart := 0;

  //Locate the first digit (if there)
  for i := 1 to strlength do
    if numstr[i] in ['0'..'9'] then
      found := true; //First digit found!!
  end; //for i..

  if found then
    //now get the to the end of the digits..
    found := false;
    for j := i to strlength do
      if not (numstr[j] in ['0'..'9']) then
        found := true; //end of digits found
    end; //for j..

    //Separate out the string parts
    if found then
      //Number was embedded..
      val(copy(numstr, i, j - i), numpart, tmp);
      Delete(choppedstr, i, j - i);
      //Number went to the end of the string
      val(copy(numstr, i, strlength), numpart, tmp);
      Delete(choppedstr, i, strlength);

  choppedstr := Uppercase(trim(choppedstr));
  strlength := length(choppedstr);

  //evaluate a number for the remaining part of the string
  strpart := 0;
  divisor := 1;

  for i := 1 to strlength do
    divisor := divisor / 256;
    //convert from Char to single using a variant conversion
    strpart := strpart + (ord(choppedstr[i]) * divisor);

  //All done, return the value
  result := numpart + strpart;


NB a version of this Algorithm for MSSQL7 is also posted (Title "Towards a more accurate sort order in MSSQL7")

<< Back to main page