just because it is 99% of the cost that does not indicate that it was excessive. it just means that out of 100% of total cost required to run the query, (parsing resolving compiling (serial and parallel plans) and execution)
that the execution was 99% of the total operation. this indicates that figuring it out isnt taking a long time but doing it is.
now I believe that when you say 99% of the cost you mean in the Graphical execution plan, then that is still justifiable considering the amount of work that sql has to do to perform a dual sorted join.
sometimes if a merge join is warranted, sql server will sort unsorted columns because the optimizer feels that the merge join is more beneficial to the query than the cost of not performing a merge join.
if one of the columns involved is a foriegn key and it is not indexed you might want to consider a nonclustered index on this column.
this will presort the data in the column and speed up the merge.
the Graphic EX will probably still indicate the 99% (of the process time)but your IO and or execution time might decrease.
Just read up on "Understanding Merge Joins" in BOL. Sounds like in your case, the data is not sorted on the merge keys going into the query. Also, the fact that you are dealing with millions of rows. I have only seen the merge join a few times, but every time has been with 1 million rows or more.
Just for grins (because I doubt it will work), what is the result of throwing an order by on the subquery? The order by should be the same order as the join, and ideally there should be an index on the table you are joining to with that same order.
And this is just the thing I would need to have someone else look at myself, because you never know what is going to jump out at a different pair of eyes.
Another pair of eyes .. i am not gonna get .... am working at client's place all alone on my laptop ... away from my office ... and as I said before in the YAK Corral... that Shark tank story looks a lot like mine ...