If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > GROUP BY on blob

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-18-11, 02:53
oshman oshman is offline
Registered User
 
Join Date: Oct 2011
Posts: 11
GROUP BY on blob

Yes, I know its not possible

Here's my situation. I'm trying to add DB2 support to a CMS. I have everything working (with help from people here) except for 1 issue. I thought about multiple ways to address the issue but at this point, its time to ask for help. All the DBs that the CMS officially supports allow for GROUP BY on blob fields.

Here's what sample output looks like with MySQL ('variables' being a blob field):

> SELECT w.message AS message, w.variables AS variables, COUNT(wid) AS count FROM watchdog w WHERE (w.type = 'search') GROUP BY message, variables ORDER BY count DESC LIMIT 30 OFFSET 0;
+---------------------------+---------------------------------------------------------+-------+
| message | variables | count |
+---------------------------+---------------------------------------------------------+-------+
| Searched %type for %keys. | a:2:{s:5:"%keys";s:4:"test";s:5:"%type";s:7:"Conte nt";} | 1 |
+---------------------------+---------------------------------------------------------+-------+
1 row in set (0.00 sec)

In reality, the 'variables' column probably wouldn't require the blob type 99.9% of the time. But I can't in good faith create the field as varchar (the deprecation of long varchar isn't helping my cause either!).

So my question is what can be done about this. I saw a previous post by Knut where he said an external UDF that did something with the column was the way to go. I don't have the luxury of requiring external functions so anything needed would need to be in the support driver I'm writing.

I do have the ability to modify the query. I can't modify the CMS schema but I can use custom datatypes, functions, etc.

Any ideas or suggestions are appreciated.
Reply With Quote
  #2 (permalink)  
Old 11-18-11, 08:30
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
If you can't modify the query and you can't use external functions, it is going to be tough.

You could write a SQL-bodied function, which iterates over the BLOB content and uses SUBSTR() to do something with the bytes. For example, you could apply the HEX() function to each byte and concatenate everything to a string up to a certain length. However, you'd still have to change the query to use this function.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #3 (permalink)  
Old 11-18-11, 16:31
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
One workaround could be to calculate a hash (MD5 or other) for the LOB column content, store it in the row, and use the hash value for grouping.
Reply With Quote
  #4 (permalink)  
Old 11-18-11, 20:01
oshman oshman is offline
Registered User
 
Join Date: Oct 2011
Posts: 11
Thanks for the feedback

Thanks for the feedback guys.

Stolz - I CAN modify the query. All queries are built through a constructor. I can pretty much do anything I like in terms of modification. I can create SQL functions for use but I don't WANT use any external functions (its not mandated or anything). Ideally, you should be able to drop in the driver and install without any additional requirements.

n_i - I am actually doing just this. In certain situations a COUNT(DISTINCT(<field>)) is done on what is a big:text equivalent field in other DBs. Without LONG VARCHAR I had to create a custom type:

Code:
create type bigtext_t as(content blob, md5sum varchar(32)) mode db2sql
When an INSERT is done on these columns an md5sum is computed and stored with the type (thanks for helping with the PDO insert stolze!). The transform function returns only the blob portion of the type when queried. On SELECT DISTINCT, the query is modified to append '..md5sum' to the field. From my understanding, you can't use functions in grouping (i.e. - GROUP BY field..md5sum).

In order to GROUP BY the hash value I would either need to add a column for every blob in a given table or store them all in another table altogether. This would likely be a last resort.

One thing I did try was to create another structured UDT (blob_t) just for blobs with the "REF USING VARCHAR(xxx)". That didn't work either. Not sure if what I was trying to do was even possible.
Reply With Quote
  #5 (permalink)  
Old 11-19-11, 05:55
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
I misread your original post on the ability to adjust the SQL statements. Sorry about that.

Using an expression like "field..md5sum()" for grouping should work just fine. Did you get some sort of error message with this?

The REF USING ... is primarily interesting for dereferencing operations. You can store a reference to a value of a structured type somewhere else, dereference this reference and, thus, access the actual structured value. I don't think this would help you much because you'd still have to dereference the value and access the BLOB or hash for the grouping/counting.

p.s: You are aware that grouping over a hash value is just an approximation because hash values are not guaranteed to be unique?
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #6 (permalink)  
Old 11-26-11, 04:12
oshman oshman is offline
Registered User
 
Join Date: Oct 2011
Posts: 11
This works

I was able to solve this dilemma by re-writing the query to:

Code:
SELECT BLOB(variables) AS variables, BLOB(message) AS message, count from (
SELECT CAST(w.variables AS VARCHAR(200)) AS variables, CAST(w.message..content AS VARCHAR(200)) AS message,COUNT(wid) as count 
FROM watchdog w where (w.type = 'search') 
group by  CAST(w.variables AS VARCHAR(200)), CAST(w.message..content AS VARCHAR(200))) 
ORDER BY count DESC LIMIT 30 OFFSET 0
You were right Stolze, I can use the '..method()' in the GROUP BY.

If I just used:

Code:
SELECT CAST(w.variables AS VARCHAR(200)) AS variables, CAST(w.message..content AS VARCHAR(200)) AS message,COUNT(wid) as count 
FROM watchdog w where (w.type = 'search') 
group by  CAST(w.variables AS VARCHAR(200)), CAST(w.message..content AS VARCHAR(200))
ORDER BY count DESC LIMIT 30 OFFSET 0
PDO throws the following error:
SQLSTATE[07006]: Restricted data type attribute violation: -99999 [IBM][CLI Driver] CLI0102E Invalid conversion. SQLSTATE=07006 (SQLFetchScroll[-99999] at /usr/share/php/PDO_IBM-1.3.2/ibm_statement.c:1137)

So I have to cast the BLOB fields back. This works for my purposes although it will require some pretty ugly re-writing of the query.

If anyone has any tips or suggestions I'd like to hear them.
Reply With Quote
  #7 (permalink)  
Old 11-28-11, 11:19
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Interestingly, this is not an error from the DB2 server but rather a CLI error complaining about some sort of invalid conversion. It would be helpful to see the code that you use to retrieve the query results. But I guess that is the PHP/DB2 driver? Which driver are you using for that?
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On