Results 1 to 6 of 6
  1. #1
    Join Date
    Jun 2004
    Location
    Far away from heaven
    Posts
    194

    Unanswered: Need help on GROUPING sets of rows and comparison

    Hi folks. Hope all the gurus including Brett,Pat Phalen, RdjBarov, r937 are fine.
    been so long to ask stupid question. Here's the question and i really need help on this.

    i have data that tracks patterns of bus stops from one point to another.
    like point a, to point b, point b to point c forms one pattern.
    point a to point c , point c to point b should be a different pattern.

    create table #journeypatterns (patternid int ,points varchar(100))
    go
    insert #journeypatterns
    select 1 ,'a' union all select 1 ,'b' union all select 1,'c'
    union all select 2,'a' union all select 2,'c' union all select 2,'b'
    union all select 3 ,'a' union all select 3 ,'b' union all select 3,'c'


    select * from #journeypatterns
    patternid points
    1 a
    1 b
    1 c
    2 a
    2 c
    2 b
    3 a
    3 b
    3 c
    what i want is to get unique pattern value of sequence of points by grouping on patternid. if the sequence of points change, i need a unique value against that pattern.
    like for patternid 1, sequence of points a,bc for example should be abc.
    for patternid 2, sequence of points a,c,b for example should be acb.
    again patternid 3, sequence of points a,bc for example should be abc.

    i tried CHECKSUM_AGG which brutally failed in production because the checksum values for each single point when summed produce SAME result for different patterns.
    select checksum_agg(binary_checksum(points)) ,patternid
    from #journeypatterns
    group by patternid
    96 1
    96 2
    96 3

    here patternid 2 should be different because sequence is acb. i know checksum is not the right approach for what i need.

    I NEED A GENERIC FUNCTION, that marks the pattern differences, my ultimate goal was to create a procedure, whom a patternid should be passed, and it would result the NEXT patternid in the table which has the SAME ORDER OF point sequences.

    now folks, i can do this holding all data into a temp table and write a cursor to traverse through each patternid and concatenate the sequence of points.
    BUT, using this approach is the ugliest, as it has slow down the process badly and boss is not happy with the performance. the table holds a lot of data.
    I NEED a query rather than a cursor on the fly to resolve this.
    Here's the query i am using to get the current sequence of a pattern and then i have to search all sequences similarly against it.
    declare @patternid int
    set @patternid =1
    declare @sequence [varchar] (100)
    declare @id varchar(10)
    declare cr_sequence cursor fast_forward for select points from #journeypatterns where patternid=@patternid
    open cr_sequence
    fetch next from cr_sequence into @id
    while @@fetch_status = 0
    begin
    select @sequence = isnull(@sequence,'')+@id
    fetch next from cr_sequence into @id
    end
    print @sequence -- next i have code to find the similar sequence for another patternid.... which is not mentioned here but is similar
    Last edited by thebeginner; 01-24-07 at 11:20.
    Cheers!





    !Direction is More Important than Speed!

    http://talatnauman.blogspot.com/

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you mentioned a temp table, which would've been my first approach

    produce a temp table which concatenates the points in sequence

    whoops, what sequence???

    i'm afraid you're in it up to your ears, my friend

    the data for these two patterns is identical --

    patternid points
    1 a
    1 b
    1 c
    2 a
    2 c
    2 b

    that's because, as brett will confirm, the physical order of data in a database has no meaning
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jun 2004
    Location
    Far away from heaven
    Posts
    194
    Here you are; thanks for reply.
    For Business logic, the data for these two patterns is NOT identical.
    pattern 1 has A->B->C
    Pattern 2 has A->C->B
    so i need a way that when i group the rows on pattern, i get a unique value depending upon the sequence.
    concatenating in a temp table is very overwhelming; wonder what, i have to do this for over 50,000 records to compare a patternid's sequence.
    Cheers!





    !Direction is More Important than Speed!

    http://talatnauman.blogspot.com/

  4. #4
    Join Date
    Mar 2005
    Location
    Netherlands
    Posts
    280
    Quote Originally Posted by r937
    that's because, as brett will confirm, the physical order of data in a database has no meaning
    This is the most important sentence in r937's reply! Logically you may have different patterns, technically, the way you have implemented it, you have not.

    Your CURSOR-solution may seem to work but there is absolutly no guarantee the result will always be in the same order. The only way to get guarantees is to use ORDER BY but you don't have column that you can use for ordering (within a paternid).

    The best way is to make a third column in which the order is defined. I think this will also make it easier to make the query you want.

  5. #5
    Join Date
    Jun 2004
    Location
    Far away from heaven
    Posts
    194
    Hi.
    Patternid+points is a clusteredindex, so order is defined when data gets inserted.
    Again, the concern was that this approach is taking too much time; this comparison is required many times and loading these all data to temptable to get concatenation is weird.
    Cheers!





    !Direction is More Important than Speed!

    http://talatnauman.blogspot.com/

  6. #6
    Join Date
    Mar 2005
    Location
    Netherlands
    Posts
    280
    Quote Originally Posted by thebeginner
    Patternid+points is a clusteredindex, so order is defined when data gets inserted.
    The clustered index is to help SQL Server look up the required rows not to sort the output for you. The order in which the data is returned is not guaranteed if you don't use an ORDER BY (although in some case a GROUP BY will also always give the same order)!

    One possibility when the order is changed is when another process is already getting the required data. "Your" process will hook-up to the point the other process is busy, thus getting the last half of the data first. When they're finished your process will get the remaining first half (I think this is the round-robin method). You will get the data in the order it is read. And this is by no means the only way your order can get jumbled.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •