Results 1 to 7 of 7
  1. #1
    Join Date
    Oct 2011
    Posts
    11

    Unanswered: GROUP BY on blob

    Yes, I know its not possible

    Here's my situation. I'm trying to add DB2 support to a CMS. I have everything working (with help from people here) except for 1 issue. I thought about multiple ways to address the issue but at this point, its time to ask for help. All the DBs that the CMS officially supports allow for GROUP BY on blob fields.

    Here's what sample output looks like with MySQL ('variables' being a blob field):

    > SELECT w.message AS message, w.variables AS variables, COUNT(wid) AS count FROM watchdog w WHERE (w.type = 'search') GROUP BY message, variables ORDER BY count DESC LIMIT 30 OFFSET 0;
    +---------------------------+---------------------------------------------------------+-------+
    | message | variables | count |
    +---------------------------+---------------------------------------------------------+-------+
    | Searched %type for %keys. | a:2:{s:5:"%keys";s:4:"test";s:5:"%type";s:7:"Conte nt";} | 1 |
    +---------------------------+---------------------------------------------------------+-------+
    1 row in set (0.00 sec)

    In reality, the 'variables' column probably wouldn't require the blob type 99.9% of the time. But I can't in good faith create the field as varchar (the deprecation of long varchar isn't helping my cause either!).

    So my question is what can be done about this. I saw a previous post by Knut where he said an external UDF that did something with the column was the way to go. I don't have the luxury of requiring external functions so anything needed would need to be in the support driver I'm writing.

    I do have the ability to modify the query. I can't modify the CMS schema but I can use custom datatypes, functions, etc.

    Any ideas or suggestions are appreciated.

  2. #2
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    If you can't modify the query and you can't use external functions, it is going to be tough.

    You could write a SQL-bodied function, which iterates over the BLOB content and uses SUBSTR() to do something with the bytes. For example, you could apply the HEX() function to each byte and concatenate everything to a string up to a certain length. However, you'd still have to change the query to use this function.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  3. #3
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    One workaround could be to calculate a hash (MD5 or other) for the LOB column content, store it in the row, and use the hash value for grouping.

  4. #4
    Join Date
    Oct 2011
    Posts
    11

    Thanks for the feedback

    Thanks for the feedback guys.

    Stolz - I CAN modify the query. All queries are built through a constructor. I can pretty much do anything I like in terms of modification. I can create SQL functions for use but I don't WANT use any external functions (its not mandated or anything). Ideally, you should be able to drop in the driver and install without any additional requirements.

    n_i - I am actually doing just this. In certain situations a COUNT(DISTINCT(<field>)) is done on what is a big:text equivalent field in other DBs. Without LONG VARCHAR I had to create a custom type:

    Code:
    create type bigtext_t as(content blob, md5sum varchar(32)) mode db2sql
    When an INSERT is done on these columns an md5sum is computed and stored with the type (thanks for helping with the PDO insert stolze!). The transform function returns only the blob portion of the type when queried. On SELECT DISTINCT, the query is modified to append '..md5sum' to the field. From my understanding, you can't use functions in grouping (i.e. - GROUP BY field..md5sum).

    In order to GROUP BY the hash value I would either need to add a column for every blob in a given table or store them all in another table altogether. This would likely be a last resort.

    One thing I did try was to create another structured UDT (blob_t) just for blobs with the "REF USING VARCHAR(xxx)". That didn't work either. Not sure if what I was trying to do was even possible.

  5. #5
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I misread your original post on the ability to adjust the SQL statements. Sorry about that.

    Using an expression like "field..md5sum()" for grouping should work just fine. Did you get some sort of error message with this?

    The REF USING ... is primarily interesting for dereferencing operations. You can store a reference to a value of a structured type somewhere else, dereference this reference and, thus, access the actual structured value. I don't think this would help you much because you'd still have to dereference the value and access the BLOB or hash for the grouping/counting.

    p.s: You are aware that grouping over a hash value is just an approximation because hash values are not guaranteed to be unique?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  6. #6
    Join Date
    Oct 2011
    Posts
    11

    This works

    I was able to solve this dilemma by re-writing the query to:

    Code:
    SELECT BLOB(variables) AS variables, BLOB(message) AS message, count from (
    SELECT CAST(w.variables AS VARCHAR(200)) AS variables, CAST(w.message..content AS VARCHAR(200)) AS message,COUNT(wid) as count 
    FROM watchdog w where (w.type = 'search') 
    group by  CAST(w.variables AS VARCHAR(200)), CAST(w.message..content AS VARCHAR(200))) 
    ORDER BY count DESC LIMIT 30 OFFSET 0
    You were right Stolze, I can use the '..method()' in the GROUP BY.

    If I just used:

    Code:
    SELECT CAST(w.variables AS VARCHAR(200)) AS variables, CAST(w.message..content AS VARCHAR(200)) AS message,COUNT(wid) as count 
    FROM watchdog w where (w.type = 'search') 
    group by  CAST(w.variables AS VARCHAR(200)), CAST(w.message..content AS VARCHAR(200))
    ORDER BY count DESC LIMIT 30 OFFSET 0
    PDO throws the following error:
    SQLSTATE[07006]: Restricted data type attribute violation: -99999 [IBM][CLI Driver] CLI0102E Invalid conversion. SQLSTATE=07006 (SQLFetchScroll[-99999] at /usr/share/php/PDO_IBM-1.3.2/ibm_statement.c:1137)

    So I have to cast the BLOB fields back. This works for my purposes although it will require some pretty ugly re-writing of the query.

    If anyone has any tips or suggestions I'd like to hear them.

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Interestingly, this is not an error from the DB2 server but rather a CLI error complaining about some sort of invalid conversion. It would be helpful to see the code that you use to retrieve the query results. But I guess that is the PHP/DB2 driver? Which driver are you using for that?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •