Results 1 to 7 of 7
  1. #1
    Join Date
    May 2008
    Posts
    277

    Unanswered: Sorting street addresses

    Consider the following table of street addresses:

    Code:
    CREATE TABLE address (
        address_id INT PRIMARY KEY,
        street_num INT NOT NULL,
        street_dir VARCHAR(2) NOT NULL,
        street_name VARCHAR(255) NOT NULL,
        ...
    );
    I'm faced with having to order objects by address, some of which span multiple addresses. My strategy (as of now) is to select the min and max address of each object and sort on the address range, something like:

    Code:
    SELECT
        object_id,
        MIN(street_name || ' ' || street_dir || ' ' || street_num) AS min_address,
        MAX(street_name || ' ' || street_dir || ' ' || street_num) AS max_addresss
    FROM 
        object
        INNER JOIN object_address USING (object_id)
        INNER JOIN address USING (address_id)
    ORDER BY
        min_address,
        max_address,
        object_id
    Unfortunately, there are at least two problems with this:
    • In order to get the sort to work, the address is formatted wrong.
    • Addresses still don't sort correctly if the number of digits in street_num varies. For example, '900 N Main' is "greater" than '1000 N Main'.

    Any help on how to do this correctly would be greatly appreciated. Many thanks.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Ah, you've run into one of the many issues in dealing with addresses! They are lovely things when viewed from a database perspective, aren't they???

    First of all, in your sample DDL, street_num is an INT. As long as the number isn't too large for an INT, it will sort correctly.

    Next, and probably more important is that not all "street_num" values are numbers, and even the numeric values in street addresses don't always sort the way that integers do because there are all kinds of "postal exceptions" that require special handling. Unless you want to try to concoct a "universal address normalizer" which has been a holy grail in the postal industry for as long as there has been a postal industry, this is a bad job that only gets worse as you continue to work on it.

    I would very strongly recommend that you consider using one of the services that does postal address to geo-locator translation. These services aren't cheap, but they are much less expensive that trying to do the job yourself!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by Pat Phelan View Post
    I would very strongly recommend that you consider using one of the services that does postal address to geo-locator translation. These services aren't cheap, but they are much less expensive that trying to do the job yourself!
    Completely agree... cleaning up addresses is not something to do in house. If you're doing business with someone, any address you fail to parse is a lost customer.

    But... when you run your data through a geo-locator service, do some sanity checks on it. For example, if you have latitude and longitude pairs, check that they are within the northern, southern, eastern and western boundaries of the continental US. Plug in addresses you know and make sure the output makes sense. Look for anomalous answers and demand that your vendor explain them.

    '900 N Main' is "greater" than '1000 N Main'
    If all you want is sane sorting of numbers within strings, you might (depending on your DBMS) be able to create a collation with numeric sorting enabled.

    Mimer's docs has an example of doing this. (I haven't tried it myself.)

    Code:
     SQL>-- Numeric order [Numeric On]
     SQL>CREATE COLLATION numeric FROM eor USING '[Numeric On]';
     SQL>SELECT * FROM alphanum ORDER BY codes COLLATE numeric;
     
     CODES
     ==========
     A1
     A3
     A23
     A123
     A234
    A hackish way is to create another field and then pad any numbers to be at least, say, 10 digits long. This will force embedded numbers to sort properly. If your DBMS supports regular expressions, it shouldn't be too painful to implement.

  4. #4
    Join Date
    May 2008
    Posts
    277
    Thanks for the responses. Yeah, QA-ing addresses is like trying to nail jello to the wall. Fortunately(?), I'm only worried about getting objects properly sorted by their (already-assigned) addresses.

    Quote Originally Posted by sco08y View Post
    If all you want is sane sorting of numbers within strings, you might (depending on your DBMS) be able to create a collation with numeric sorting enabled.
    I guess the sorting itself isn't really what I'm having problems with -- if I keep the fields split out, I can just use them to get the order I want. The problem is getting objects that span multiple address to show up once in my report. So instead of something like this:

    Code:
    address      | object
    -------------+-------
    990 Main St  | 1
    995 Main St  | 2
    995 Main St  | 6
    1000 Main St | 3
    1000 Main St | 6
    1005 Main St | 4
    1005 Main St | 6
    1010 Main St | 5
    I need something like:

    Code:
    start_address | stop_address | object
    --------------+--------------+-------
    990 Main St   | 990 Main St  | 1
    995 Main St   | 995 Main St  | 2
    995 Main St   | 1005 Main St | 6
    1000 Main St  | 1000 Main St | 3
    1005 Main St  | 1005 Main St | 4
    1010 Main St  | 1010 Main St | 5
    or more accurately:
    Code:
    start_num | start_dir | start_name | stop_num | stop_dir | stop_name | object
    ----------+-----------+------------+----------+----------+-----------+-------
    990       | N         | Main St    | 990      | N        | Main St   | 1
    ...
    In order for this, I think I need something like a MIN/MAX function, but I can't use that over multiple fields.

    I realize there are some embedded assumptions here about sequential addresses and their geographical/physical relationship with other ... but for the moment, it seems to be a reasonably safe one, and I'm not sure how better to display the information, other than something like:

    Code:
    address(es)                             | object
    ----------------------------------------+-------
    990 Main St                             | 1
    995 Main St                             | 2
    995 Main St, 1000 Main St, 1005 Main St | 6
    1000 Main St                            | 3
    1005 Main St                            | 4
    1010 Main St                            | 5
    I guess this would actually be ideal, but it seems to completely break any attempt at sorting.

    If all you want is sane sorting of numbers within strings, you might (depending on your DBMS) be able to create a collation with numeric sorting enabled.
    That might help, but as far as I can tell, the DBMS I'm using (PostgreSQL) doesn't have this. And I don't know if it would actually work for something like:
    Code:
     995 Main St
    1000 Main St
     995 Wall St
    1000 Wall St
    vs
    Code:
     995 Main St
     995 Wall St
    1000 Main St
    1000 Wall St
    Yeah, addresses suck.
    Last edited by futurity; 05-21-10 at 13:48.

  5. #5
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by futurity View Post
    That might help, but as far as I can tell, the DBMS I'm using (PostgreSQL) doesn't have this. And I don't know if it would actually work for something like:
    Code:
     995 Main St
    1000 Main St
     995 Wall St
    1000 Wall St
    vs
    Code:
     995 Main St
     995 Wall St
    1000 Main St
    1000 Wall St
    This should do it:
    Code:
    select *
    from objects
    order by cast(regexp_replace(address, '[^0-9]+', '') as integer)

  6. #6
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by futurity View Post
    I need something like:
    Code:
    start_address | stop_address | object
    --------------+--------------+-------
    990 Main St   | 990 Main St  | 1
    995 Main St   | 995 Main St  | 2
    995 Main St   | 1005 Main St | 6
    1000 Main St  | 1000 Main St | 3
    1005 Main St  | 1005 Main St | 4
    1010 Main St  | 1010 Main St | 5
    Not sure if I comletely understood everything, but this seems to do the trick:
    Code:
    SELECT DISTINCT first_value(address) OVER W AS start_address,
                    last_value(address) OVER W AS end_address, 
                    object_id
    FROM objects
    WINDOW W AS (PARTITION BY object_id ORDER BY cast(regexp_replace(address, '[^0-9]+', '') as integer) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER BY OBJECT_ID

  7. #7
    Join Date
    May 2008
    Posts
    277
    Thank you for the responses, shammat. Unfortunately I'm stuck with version 8.3 of PostgreSQL, which doesn't have window functions.

    Quote Originally Posted by shammat View Post
    Not sure if I comletely understood everything
    I guess I sorta rambled there; sorry.

    Here's the output I'm looking for:
    Code:
    start_num | start_dir | start_name | stop_num | stop_dir | stop_name | object
    ----------+-----------+------------+----------+----------+-----------+-------
    990       | N         | Main St    | 990      | N        | Main St   | 1
    995       | N         | Main St    | 995      | N        | Main St   | 2
    995       | N         | Main St    | 1005     | N        | Main St   | 6
    1000      | N         | Main St    | 1000     | N        | Main St   | 3
    1005      | N         | Main St    | 1005     | N        | Main St   | 4
    1010      | N         | Main St    | 1010     | N        | Main St   | 5
    This would actually be ideal, but I don't believe I can get this to sort properly, so I'll settle for the above:
    Code:
    addresses                                     | object
    ----------------------------------------------+-------
    990 N Main St                                 | 1
    995 N Main St                                 | 2
    995 N Main St, 1000 N Main St, 1005 N Main St | 6
    1000 N Main St                                | 3
    1005 N Main St                                | 4
    1010 N Main St                                | 5
    Thanks.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •