I have two tables that consist of information from 2 different resources which do not share common IDs. Therefore, I need to join them by name. It only has one field name, which is full name. I am wondering how I can join this two tables so I can get a good match as a start.
How do I do the query ? I am guessing I need to use sort of like statement.
dynamictiger, i kind of got the impression that milan's tables came from different sources ("resources") and this is not a structural error -- the need to consolidate disparate data sources is something that happens all the time, e.g. in data warehousing
milan, are you going to try to resolve "near" matches? e.g. 'XYX Company' does not match 'XYZ Co.' but they are probably the same company
do you want to do this only one time, or do you want to repeat it regularly? for example, to bring in a fresh version of each of the two tables every month
do you want to consolidate the two tables into one as dynamictiger is suggesting? is one of the tables a trusted master table and the other an unverified new source? the reason i ask is, if you do find a "near" match, how do you decide which of the two names to keep?
as for your question, yes, you can run a query to match by name, that part is relatively trivial
inner join table2
on table1.companyname = table2.companyname
but those are exact matches, and the only way that i know how to resolve "near" matches is by looking at a sorted list, which you can get by
select companyname, 'table1'
select companyname, 'table2'
order by 1, 2
back in my mainframe days, we had a utility program we called IEHIBALL for this ("eyeball," get it?)
nowadays there are commercial software products you can obtain to do the resolution of "near" matches
The issue in my case is that the two tables contain information about companies from different sources. For example :
Company Name Company Revenue
Company Name Company Employees
ABC Ltd 15
Essentially, the two tables contain different information about ABC company, so I want to merge them. However, one has ABC and the other has ABC Ltd. So, I certainly cannot just merge them like usual. So, probably need to use some clever string comparison etc so it can quite match, which I am not sure what it is and how to write in the SQL.
The two tables are updated each month. I want to do full join. For example :
Company Name Company Revenue
Company Name Company Employee
ABC Ltd 10
So the result will be :
Company Name Company Revenue Company Employee
ABC or ABC Ltd $500 10
Certainly I can't get 100% match like using ID. But, 80-90% will be good enough. So what I need is a good string comparison method.
Any suggestion ?
Do you know what soundex does in the code below ?
FROM table1 INNER JOIN table2
ON myFunction(table1.Name, table2.Name)
where myFunction( x, y) is a VBA function, in a standard module, that returns TRUE is x is
similar to y, FALSE otherwise. Use the VBA code to define what you meant by "similar", like a
Public Function myFunction( x As String, y As String) As Boolean
myFunction = ( Soundex(x) = Soundex(y) )
I still don't quite understand some of your stuff.
How can I go from table 1 and table 2 to "good names" table ?
In your example with "incoming" table, it seems that you append those two tables. But for example, ABC should have different information, one is $ and one is number.
What field do I use to join "good names" and "incoming" tables ?
Well, I think the DB design above is good for long term reusabilility. However, right now, I desperately need to find a good way to join those two tables so they can match quite good since I need to do some analysis about the data very soon, for example, revenue per employee etc. I was hoping to get some sort of string operations that can do this (e.g. soundex), although it might not be perfect.