Unanswered: Performance hit if I use SELECT DISTINCT?
We use an ASP/MS SQL 2000 system to send out our mass e-mailing to about 3,500 subscribers (and the list is growing). There are some duplicate entries in the DB and I was thinking about using this code
SELECT DISTINCT email FROM Subscribers
to remove the duplicates (at least until we can get around to cleaning up the data and then putting up new subscriber form to prevent duplicate entries).
I was wondering, though... Will this have a significant impact on our performance? I mean, that's a lot of e-mail addresses to process and I don't want to bog our system down unnecessarily.
Distinct does have an overhead but against 3500 records it should go unnoticed I would expect. Suck it an see - test both queries and see how they hold up.
If you are only selecting email and email is indexed (nonclustered) then you have a covered query (the data comes from the index). These values are not duplicated so I would imagine (never tested) that there would be 0 overhead for this. Someone could confirm or correct this. Either way - the overhead should be minimal.
Thanks, I also don't think there will be a hit on this small a recordset, but I wanted to ask. As for indexing and other modifications... not today. I need a 30-second solution to this problem created in someone else's old code. These issues will be addressed in our next upgrade, which should be done next month.