| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

03-19-04, 13:42
|
|
Registered User
|
|
Join Date: Mar 2004
Posts: 3
|
|
|
Problems with a rewritten subselect query
|
|
Question about a complex query (rewritten subselect)
A bit of background: I have a PHP/MySQL site that allows users to search an academic journal's database of related scholarship. Users can search by title, author, categories, and so on. The table structure is setup so that each title can have 0 to inifinte matching records in corresponding tables (authors, categories, etc.).
My problem is building a query in MySQL that will find items by up to 6 categories using 'AND'. So, the results would be all the items in category 1 and in category 2 and in category 3 and so on. In SQL Server, I would just run a subselect query. But in MySQL 4.0.16, I have to use joins exclusively, so I get a query something like this:
-------------------
SELECT t.*
FROM titlesearch AS t, titlecat AS cat1, titlecat AS cat2, titlecat as cat3, titlecat as cat4, titlecat AS cat5, titlecat AS cat6
WHERE (t.search LIKE '%searchterms%') AND cat1.catid = 'cat1' AND cat2.catid = 'cat2' AND cat3.catid = 'cat3' AND cat4.catid = 'cat4' AND cat5.catid = 'cat5' AND cat6.catid = 'cat6' AND t.tid = cat1.tid AND t.tid = cat2.tid AND t.tid = cat3.tid AND t.tid = cat4.tid AND t.tid = cat5.tid AND t.tid = cat6.tid
-------------------
Naturally, MySQL freaks out when I try to run this query, ultimately just trying to run the query until I kill it, as there are too many joins. It seems like 4 joins is the limit before things begin to go sour.
So my question is does anyone have any suggestions about how to perform this sort of query? Is the best solution to flatten the table and have a query something like the following:
-------------------
SELECT t.*
FROM titlesearch AS t
WHERE (t.search LIKE '%searchterms%') AND (t.cat LIKE '%cat1%') AND (t.cat LIKE '%cat2%') AND ...
-------------------
The source data for the MySQL data is actually a SQL Server database. SQL server is used for all data entry and storage and then exported to MySQL, therefore denormalizing the MySQL structure isn't a big deal in terms of data integrity.
I've googled the death out of this issue and have been told to rewrite my subselects as joins (which isn't working), to switch to PostgreSQL (unfortunately not an option at this time), to upgrade to the alpha 4.1 mysql (also not an option for a production system).
Many thanks in advance for any advice you can offer!
|
|

03-19-04, 14:37
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
I'd like to reiterate the suggestions about PostgreSQL, or better yet using MS-SQL. If you must use MySQL, there is still a solution.
Assuming that every title has at least one category, and that you can make unnecessary category parameters equal to the Cat1 parameter, then I'd use:
PHP Code:
SELECT t.*
FROM titlesearch t
JOIN titlecat AS t1 ON (t1.tid = t.tid)
JOIN titlecat AS t2 ON (t2.tid = t.tid)
JOIN titlecat AS t3 ON (t3.tid = t.tid)
JOIN titlecat AS t4 ON (t4.tid = t.tid)
JOIN titlecat AS t5 ON (t5.tid = t.tid)
JOIN titlecat AS t6 ON (t6.tid = t.tid)
WHERE t1.catid = 'Cat1'
AND t2.catid = 'Cat2'
AND t3.catid = 'Cat3'
AND t4.catid = 'Cat4'
AND t5.catid = 'Cat5'
AND t6.catid = 'Cat6'
I don't have your test data to test it, but I think it should run pretty nicely.
-PatP
|
|

03-19-04, 15:05
|
|
Registered User
|
|
Join Date: Mar 2004
Posts: 3
|
|
|
|
Thanks, Pat! That works great!
I'm too dependent on subselects for this sort of thing, so I was a bit baffled by how to get the joins to do exactly what I needed in this situation. But you've cleared it up for me--many thanks!
I was also wondering if you or anyone else knew if too many joins will slow a query down? I vaguely remember being told by a db admin several years ago to keep joins to a minimum of 4 or less.
<edit>
About PostgreSQL: I may go ahead and install PostgreSQL on our server this weekend and move over to it or at least give it a go, as it seems to be a better solution in the long run. My only concerns were those that I've heard from others: Postgre is slower and more likely to corrupt data--but perhaps this only applies to older versions? (MySQL 4.1 and 5 seem to be promising developments, if they ever make it into production releases).
</edit>
|
Last edited by cookiemonster; 03-19-04 at 15:16.
|

03-19-04, 15:13
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
The number of joins that will slow down a query is very installation specific. In most cases somewhere between 20 and 30 tables is safe if you are running current releases of software on reasonable hardware.
Obviously that is load dependant too. If you get 5000 people running a 20 table query, the box is likely to start to smoke!
-PatP
|
|

03-20-04, 08:00
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
number of joins: 1
PHP Code:
select t.tid
, t.search
from titlesearch as t
inner
join titlecat as c
on t.tid = c.tid
where t.search like '%searchterms%'
and c.catid in ('cat1','cat2','cat3'
,'cat4','cat5','cat6')
group
by t.tid
, t.search
having count(*) = 6
this solution also lets you easily implement a change in the search criteria such as "must be in at least 4 of the 6 selected categories"
|
|

03-20-04, 12:04
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
Rudy's solution is elegant, and I'd actually thought of proposing it. The problem that I've run into is that sometimes you'll have "dirty" data, where you might have six connectoids between a title and cat1, but no other applicable connectoids. In that case, the simpler code that I proposed works as requested, while the more complex code returns a false positive.
This is a design choice. The more complex code allows many neat features to be added easily, but it requires you to be much more careful to keep either your data or your logic clean!
-PatP
|
|

03-20-04, 12:43
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
design choice?
there is no dirty data if you follow some very simple guidelines, such as declaring a primary key for the "connectoid" so that you cannot connect the same title to the same category more than once
create table titlecat
( tid integer not null
, foreign key (tid) references titles (tid)
, catid integer not null
, foreign key (catid) references categories (catid)
, primary key (tid,catid)
)
of course, many people miss this obvious solution because they've gone and blindly declared the primary key as an autonumber (a totally useless design choice)
|
|

03-20-04, 23:40
|
|
Registered User
|
|
Join Date: Mar 2004
Posts: 3
|
|
Thank you Rudy and Pat for your suggestions and help with this! I really appreciate it.
I've tested both of the proposed solutions, and they both work just fine. However, it does seem that single join group query is much quicker in retrieving results, as well as being easy to code and maintain.
The design of titlecat table is exactly as Rudy outlines, so there is no possibility of a title being assigned to the same category multiple times as the primary key is a compound key of the tid and catid.
So, I think the less joins, the better, as there are still a few more tables that will need to be searched. All of the possible search parameters are as follows:
words in title,
author,
year of publication (2: from and to),
categories (6: 4 of subject, 2 chronological),
isbn (1),
language (1), and
document type (1).
(I am having a bit of trouble allowing the user to select a boolean operator between the categories, so titles in cat 1 and cat 2 or cat3. I just can't seem to get my PHP script to put the parenthesis in the right places...)
I'll play around with it a bit more to see if any thing strange crops up in the results. And if anyone has any other suggestions, I'd be delighted to get them.
Thanks again for your help!
|
|

03-21-04, 05:59
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
Quote:
Originally posted by cookiemonster
(I am having a bit of trouble allowing the user to select a boolean operator between the categories, so titles in cat 1 and cat 2 or cat3. I just can't seem to get my PHP script to put the parenthesis in the right places...)
|
that would indeed complicated matters
remember, you can test the existence of certain categories in the WHERE clause, but test the combinations only in the HAVING
|
|

03-21-04, 11:24
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,606
|
|
Quote:
Originally posted by r937
design choice?
there is no dirty data if you follow some very simple guidelines, such as declaring a primary key for the "connectoid" so that you cannot connect the same title to the same category more than once
create table titlecat
( tid integer not null
, foreign key (tid) references titles (tid)
, catid integer not null
, foreign key (catid) references categories (catid)
, primary key (tid,catid)
)
of course, many people miss this obvious solution because they've gone and blindly declared the primary key as an autonumber (a totally useless design choice)
|
I agree wholeheartedly that there shouldn't be any reason to worry about bad data, but I have to live in the real world and bad data is often a problem for systems that I'm asked to clean up after the fact.
Mr. Celko and I have beat the issue of surrogate keys to death, more than once. We finally agreed to call a truce, because the way the I use them meets or exceeds his design criteria in every way, and he couldn't provide a natural key solution that was less than twice as expensive. While surrogate keys aren't for everybody, if used correctly they are sometime the only practical solution.
As in almost every case where you and I disagree, we are both preaching to the choir from different pulpits. We have the same basic intentions, but we disagree on a few fine points that usually only crop up in very large, complex situations.
-PatP
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|