# Thread: Need help on GROUPING sets of rows and comparison

1. Registered User
Join Date
Jun 2004
Location
Far away from heaven
Posts
194

## Unanswered: Need help on GROUPING sets of rows and comparison

Hi folks. Hope all the gurus including Brett,Pat Phalen, RdjBarov, r937 are fine.
been so long to ask stupid question. Here's the question and i really need help on this.

i have data that tracks patterns of bus stops from one point to another.
like point a, to point b, point b to point c forms one pattern.
point a to point c , point c to point b should be a different pattern.

create table #journeypatterns (patternid int ,points varchar(100))
go
insert #journeypatterns
select 1 ,'a' union all select 1 ,'b' union all select 1,'c'
union all select 2,'a' union all select 2,'c' union all select 2,'b'
union all select 3 ,'a' union all select 3 ,'b' union all select 3,'c'

select * from #journeypatterns
patternid points
1 a
1 b
1 c
2 a
2 c
2 b
3 a
3 b
3 c
what i want is to get unique pattern value of sequence of points by grouping on patternid. if the sequence of points change, i need a unique value against that pattern.
like for patternid 1, sequence of points a,bc for example should be abc.
for patternid 2, sequence of points a,c,b for example should be acb.
again patternid 3, sequence of points a,bc for example should be abc.

i tried CHECKSUM_AGG which brutally failed in production because the checksum values for each single point when summed produce SAME result for different patterns.
select checksum_agg(binary_checksum(points)) ,patternid
from #journeypatterns
group by patternid
96 1
96 2
96 3

here patternid 2 should be different because sequence is acb. i know checksum is not the right approach for what i need.

I NEED A GENERIC FUNCTION, that marks the pattern differences, my ultimate goal was to create a procedure, whom a patternid should be passed, and it would result the NEXT patternid in the table which has the SAME ORDER OF point sequences.

now folks, i can do this holding all data into a temp table and write a cursor to traverse through each patternid and concatenate the sequence of points.
BUT, using this approach is the ugliest, as it has slow down the process badly and boss is not happy with the performance. the table holds a lot of data.
I NEED a query rather than a cursor on the fly to resolve this.
Here's the query i am using to get the current sequence of a pattern and then i have to search all sequences similarly against it.
declare @patternid int
set @patternid =1
declare @sequence [varchar] (100)
declare @id varchar(10)
declare cr_sequence cursor fast_forward for select points from #journeypatterns where patternid=@patternid
open cr_sequence
fetch next from cr_sequence into @id
while @@fetch_status = 0
begin
select @sequence = isnull(@sequence,'')+@id
fetch next from cr_sequence into @id
end
print @sequence -- next i have code to find the similar sequence for another patternid.... which is not mentioned here but is similar
Last edited by thebeginner; 01-24-07 at 11:20.

2. SQL Consultant
Join Date
Apr 2002
Location
Posts
20,002
you mentioned a temp table, which would've been my first approach

produce a temp table which concatenates the points in sequence

whoops, what sequence???

i'm afraid you're in it up to your ears, my friend

the data for these two patterns is identical --

patternid points
1 a
1 b
1 c
2 a
2 c
2 b

that's because, as brett will confirm, the physical order of data in a database has no meaning

3. Registered User
Join Date
Jun 2004
Location
Far away from heaven
Posts
194
Here you are; thanks for reply.
For Business logic, the data for these two patterns is NOT identical.
pattern 1 has A->B->C
Pattern 2 has A->C->B
so i need a way that when i group the rows on pattern, i get a unique value depending upon the sequence.
concatenating in a temp table is very overwhelming; wonder what, i have to do this for over 50,000 records to compare a patternid's sequence.

4. Registered User
Join Date
Mar 2005
Location
Netherlands
Posts
280
Originally Posted by r937
that's because, as brett will confirm, the physical order of data in a database has no meaning
This is the most important sentence in r937's reply! Logically you may have different patterns, technically, the way you have implemented it, you have not.

Your CURSOR-solution may seem to work but there is absolutly no guarantee the result will always be in the same order. The only way to get guarantees is to use ORDER BY but you don't have column that you can use for ordering (within a paternid).

The best way is to make a third column in which the order is defined. I think this will also make it easier to make the query you want.

5. Registered User
Join Date
Jun 2004
Location
Far away from heaven
Posts
194
Hi.
Patternid+points is a clusteredindex, so order is defined when data gets inserted.
Again, the concern was that this approach is taking too much time; this comparison is required many times and loading these all data to temptable to get concatenation is weird.

6. Registered User
Join Date
Mar 2005
Location
Netherlands
Posts
280
Originally Posted by thebeginner
Patternid+points is a clusteredindex, so order is defined when data gets inserted.
The clustered index is to help SQL Server look up the required rows not to sort the output for you. The order in which the data is returned is not guaranteed if you don't use an ORDER BY (although in some case a GROUP BY will also always give the same order)!

One possibility when the order is changed is when another process is already getting the required data. "Your" process will hook-up to the point the other process is busy, thus getting the last half of the data first. When they're finished your process will get the remaining first half (I think this is the round-robin method). You will get the data in the order it is read. And this is by no means the only way your order can get jumbled.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•