If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Extracting main domains

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-15-05, 10:56
bstjean bstjean is offline
Registered User
 
Join Date: Sep 2002
Location: Montreal, Canada
Posts: 219
Extracting main domains

Hi folks!

I have a table of domain name from which I want to extract the main domains. I am using MySQL 3.23.58 so subselects can't be used.

For instance:

I have a topleveldomains table
1 qc.ca
2 on.ca
3 ca
4 com
5 gouv.ca

I have a domains table
1 toto.qc.ca
2 toto.ca
3 xyz.com
4 mysql.qc.ca
5 city.toronto.on.ca

I would like to get:
toto
xyz
mysql
city.toronto

My problem is that some top level domains are "substring" of others... For instance, ca is a substring of qc.ca so in that case, my join matches
the following:

mysql.qc.ca ---> ca
mysql.qc.ca ---> qc.ca

So I end up thinking that I found 2 main domains: mysql.qc and mysql. Also note that a main domain can also contain a period (city.toronto).

Any idea?

Thanks in advance!
Reply With Quote
  #2 (permalink)  
Old 01-15-05, 11:56
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
the problem lies in your topleveldomains table

rows 1 and 3 are the cause

perhaps what you need is to run two queries to do the matching -- the first will check for any 2-part top level (e.g. qc.ca), while the second will check for any 1-part top level that weren't already matched by the first query

without subqueries, you'll need a UNION and for the second query you'll need a LEFT OUTER JOIN

however, there's still a problem

you say you want "toto" as the result, which leads me to believe that you don't care that two completely different sites were matched against two completely different topleveldomains if you only want to see "toto" once in the results

therefore you can use UNION ALL and skip the LEFT OUTER JOIN in the second query

does that make sense?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On