Page 1 of 2 12 LastLast
Results 1 to 15 of 24

Thread: Join by name

  1. #1
    Join Date
    Apr 2002
    Posts
    168

    Unanswered: Join by name

    I have two tables that consist of information from 2 different resources which do not share common IDs. Therefore, I need to join them by name. It only has one field name, which is full name. I am wondering how I can join this two tables so I can get a good match as a start.

    How do I do the query ? I am guessing I need to use sort of like statement.



    Thanks

  2. #2
    Join Date
    Feb 2002
    Posts
    403
    Perhaps you could first query the Full Name field to split it, into first and second name for each table. At least this will reduce errors to only people with the same first and last name.

  3. #3
    Join Date
    Apr 2002
    Posts
    168
    Thanks for the reply.

    This is not a person's name, but company name.

    I am not sure how to write the SQL for joining the two tables.

  4. #4
    Join Date
    Feb 2002
    Posts
    403
    In that case there is a structural error in the tables (assuming you are not importing them).

    To fix this first I would Run a duplicate query on both tables, to eliminate duplicates.

    Then I would set up a new table Company. With two fields CompanyID and CompanyName.

    I would append the company name from the old tables into the new table.

    Finally I would add a column to the two query tables companyID, and would look up the ID from Company table and update the two tables to match the new ID value.

    This would enable joining on CompanyID.

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    dynamictiger, i kind of got the impression that milan's tables came from different sources ("resources") and this is not a structural error -- the need to consolidate disparate data sources is something that happens all the time, e.g. in data warehousing

    milan, are you going to try to resolve "near" matches? e.g. 'XYX Company' does not match 'XYZ Co.' but they are probably the same company

    do you want to do this only one time, or do you want to repeat it regularly? for example, to bring in a fresh version of each of the two tables every month

    do you want to consolidate the two tables into one as dynamictiger is suggesting? is one of the tables a trusted master table and the other an unverified new source? the reason i ask is, if you do find a "near" match, how do you decide which of the two names to keep?

    as for your question, yes, you can run a query to match by name, that part is relatively trivial

    select table1.companyname
    from table1
    inner join table2
    on table1.companyname = table2.companyname

    but those are exact matches, and the only way that i know how to resolve "near" matches is by looking at a sorted list, which you can get by

    select companyname, 'table1'
    from table1
    union all
    select companyname, 'table2'
    from table2
    order by 1, 2

    back in my mainframe days, we had a utility program we called IEHIBALL for this ("eyeball," get it?)

    nowadays there are commercial software products you can obtain to do the resolution of "near" matches


    rudy

  6. #6
    Join Date
    Apr 2002
    Posts
    168
    The issue in my case is that the two tables contain information about companies from different sources. For example :


    Table 1

    Company Name Company Revenue
    ABC $100


    Table 2

    Company Name Company Employees
    ABC Ltd 15

    Essentially, the two tables contain different information about ABC company, so I want to merge them. However, one has ABC and the other has ABC Ltd. So, I certainly cannot just merge them like usual. So, probably need to use some clever string comparison etc so it can quite match, which I am not sure what it is and how to write in the SQL.


    Any suggestion ?

  7. #7
    Join Date
    Apr 2002
    Posts
    168
    Thanks r937, you are correct.

    The two tables are updated each month. I want to do full join. For example :


    Table 1

    Company Name Company Revenue
    ABC $500
    XYZ $200

    Company Name Company Employee
    ABC Ltd 10
    DEF 15


    So the result will be :

    Company Name Company Revenue Company Employee
    ABC or ABC Ltd $500 10
    XYZ $200
    DEF 15


    Certainly I can't get 100% match like using ID. But, 80-90% will be good enough. So what I need is a good string comparison method.

    Any suggestion ?







    Do you know what soundex does in the code below ?



    SELECT whatever
    FROM table1 INNER JOIN table2
    ON myFunction(table1.Name, table2.Name)



    where myFunction( x, y) is a VBA function, in a standard module, that returns TRUE is x is
    similar to y, FALSE otherwise. Use the VBA code to define what you meant by "similar", like a
    soundex subroutine:


    Public Function myFunction( x As String, y As String) As Boolean
    myFunction = ( Soundex(x) = Soundex(y) )
    End Function

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    soundex and similar functions will cause "false positives" and waste more time than they save

    if this is to be repeatable, you should begin by declaring a "good names" table, which will have a key and a good name for every company

    then, you develop an "incoming names" table, which have all the names from both of the incoming tables, and each such name will have a foreign key to the good names table

    so if your original table1 has
    ABC Co.
    XYZ Ltd

    and if your original table2 has
    ABC
    XYZ Compnay

    then your "good names" table will have
    45 ABC Corporation
    73 XYZ Company Limited

    and your "incoming names" table will have
    ABC Co. 45
    XYZ Ltd 73
    ABC 45
    XYZ Compnay 73

    and you would then do joins through the incoming names table to the good names table

    simple, no?

    and how do you determine which names to use?

    by hand

    it will take work

    also, each month, you may have new names arriving in table 1 or table2, so you have to have a report that catches those (i.e. where name not already in incoming names table)

    does any of this make sense?

    sorry, you will not be able to run some routine or function and expect it to figure out which names are the same

    rudy

  9. #9
    Join Date
    Apr 2002
    Posts
    168
    Thanks for the reply.

    I still don't quite understand some of your stuff.

    How can I go from table 1 and table 2 to "good names" table ?

    In your example with "incoming" table, it seems that you append those two tables. But for example, ABC should have different information, one is $ and one is number.

    What field do I use to join "good names" and "incoming" tables ?


    Well, I think the DB design above is good for long term reusabilility. However, right now, I desperately need to find a good way to join those two tables so they can match quite good since I need to do some analysis about the data very soon, for example, revenue per employee etc. I was hoping to get some sort of string operations that can do this (e.g. soundex), although it might not be perfect.

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    well, you can use Left(companyname,6)

    join the tables based on the fact that the company names will likely be identical through the first 6 or whatever characters

    from table1
    inner join table2
    where left(table1.companyname,6)
    = left(table2.companyname,6)

    CAUTION: you are going to get some false matches

    for example, if both tables have rows for
    Philanthropy Company
    Philanderers Unlimited

    then you are going to get 4 matched results, of which two will be false


    rudy

  11. #11
    Join Date
    Apr 2002
    Posts
    168
    Thanks.

    But I don't this method will be sustainable since I believe there will be a lof of false matches returned and people will question on the number of characters chosen.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    exactly

    this is why i said long ago that you must do the matches by inspection

    select companyname, 'table1'
    from table1
    union all
    select companyname, 'table2'
    from table2
    order by 1, 2

  13. #13
    Join Date
    Apr 2002
    Posts
    168
    Thanks.

    What does order 1,2 mean ?

    What you mean by inspection is manual inspection ? If yes, wow, I have thousands of records, probably take forever to do that

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    order by 1,2 means order by the first column, then the second

    the two columns in the union are companyname and source table (a literal)

    yes, it will take a while, but you will learn a lot about what you're doing

    and also, that's why they pay us database guys the big bucks


  15. #15
    Join Date
    Apr 2002
    Posts
    168
    So in my case, your query result will be like :

    ABC
    ABC Ltd
    DEF
    XYZ

    Is this correct ?

    An then the next step is, I need to manually check which ones are actually the same ? For example, ABC and ABC Ltd.

    The problem with this is that when other people use this or import a new data, he/she might not be interested in doing the manual check.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •