  #1 (permalink)  
Old 06-26-07, 09:35
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
UDF's crash after new CPU installation on AIX!

Hi all,

Last week we upgraded the CPU of our AIX server. Until then, we had no problems running our UDFs, but we now get SQL0430N error messages ("User defined function "XXX" has abnormally terminated").

The strange thing is that:

1) This happens with all UDFs, not with a particular one

2) This happens quite rarely (about 50 times/day, although the UDFs get called thousands of times every day)

Any idea? Any help is welcome.

- Colargol
  #2 (permalink)  
Old 06-27-07, 04:28
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
Our DBA informed me this morning that we upgraded not only the hardware but also the DB2 version, which most certainly has a bigger impact on the problem mentioned above.

Our new DB2 version is "DB2 8.2 fixpack 7".

Any help is still welcome

- Colargol
  #3 (permalink)  
Old 06-29-07, 03:16
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
My guess is that the UDFs were buggy to begin with. For example, if memory is allocated but not properly initialized, then other parts of the code may use invalid data. The operating system initializes a new data page when one is requested, but not every UDF invocation gets new data pages; some reuse existing ones. Similar issues can arise with variables on the stack. This is just one possibility, though.

Your best bet is to debug your UDF. What you may want to try is to install a signal handler for signal 11 (SIGSEGV) at the start of the UDF; the signal handler can write a core dump or whatever other information you want. At the end of the UDF, you re-install the default signal handler.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
  #4 (permalink)  
Old 06-29-07, 09:49
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
Hi Knut,

Thanks for your answer. This is also what I thought, because out of 318 "UDF crashes", we finally got *one* Java stack trace in db2diag.log yesterday (yes, I forgot to mention that these are Java UDFs). In this stack trace I can clearly see that access to a lazily initialized map fails, although it never should... That still doesn't explain what happened in the other 317 cases.

You suggest I install a signal handler. That would be OK if these were C UDFs... but any idea how I can achieve this in Java? The only possibility I see would be to execute an external process by using the "Runtime.getRuntime().exec()" method... or am I completely wrong?

- Colargol

Last edited by colargol70; 06-29-07 at 09:54.
  #5 (permalink)  
Old 06-29-07, 15:32
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
You were just plain lucky in the other 317 cases and in everything before DB2 V9.

I don't know if you can actually install a signal handler in Java code. Using Runtime.exec() will definitely not work because a signal handler must be installed by the executing process itself. What you could try is:
(1) write a C UDF that installs the signal handler, and then
(2) calls your Java UDF in a nested SQL call, then
(3) remove the signal handler again.
This may fail if DB2 re-installs the signal handler before the Java function is called, or if the Java function is executed in a different process (which is the case anyway, if I'm not mistaken).
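For failures that surface as Java exceptions rather than as a crash of the whole process, something much simpler than a signal handler may already help: wrap the UDF body, write the stack trace to a file of your own, and rethrow. The following is only a sketch under that assumption. `GuardedUdf`, `Body` and the log file location are hypothetical names made up for illustration, and nothing here would catch a JVM that dies outright.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GuardedUdf {
    /** Hypothetical stand-in for the real UDF body. */
    interface Body {
        BigDecimal call() throws Exception;
    }

    /** Runs the body; on any Throwable, appends the stack trace to logFile and rethrows. */
    static BigDecimal run(String udfName, Path logFile, Body body) {
        try {
            return body.call();
        } catch (Throwable t) {
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            append(logFile, "UDF " + udfName + " failed in thread "
                    + Thread.currentThread().getName() + System.lineSeparator() + trace);
            // Rethrow so the caller still sees the failure.
            if (t instanceof RuntimeException) throw (RuntimeException) t;
            if (t instanceof Error) throw (Error) t;
            throw new RuntimeException(t);
        }
    }

    private static void append(Path logFile, String text) {
        try {
            Files.write(logFile, text.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException ignored) {
            // Logging must never mask the original failure.
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("udf-failures", ".log");
        try {
            run("fixChange", log, () -> {
                throw new IllegalArgumentException("No Currency for ID:CHF");
            });
        } catch (IllegalArgumentException expected) {
            System.out.println(new String(Files.readAllBytes(log), StandardCharsets.UTF_8));
        }
    }
}
```

If the failing invocations leave nothing in such a file either, that would point away from the Java code and towards the JVM or the process hosting it.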

p.s: I just don't like Java UDFs because the idea of UDFs is to execute them within a SQL statement that may process many, many rows. Under such conditions, performance is crucial...
  #6 (permalink)  
Old 07-02-07, 02:01
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
Quote:
Originally Posted by stolze
You were just plain lucky in the other 317 cases and in everything before DB2 V9.
Yes and no. I maybe wasn't clear enough

With our previous version of DB2 (8.2.3), we never encountered such problems. We upgraded DB2 with "fixpak 14" on June 22nd and the problems began this day.

Our UDFs get executed thousands of times a day and 99.99% of the time (with our "new" DB2 8.2.7), the SELECT statements succeed.

Between June 22nd and June 29th, SELECT statements using our UDFs failed 318 times with SQLCode SQL0430N. Out of these 318 failures, we only got one Java stack trace in db2diag.log. The other 317 cases didn't bring any useful information...

I hope this clarifies things a little bit more

- Colargol

P.S: We never had performance issues with our Java UDFs. In terms of performance, modern JREs no longer have to be ashamed when compared to C compilers!

Last edited by colargol70; 07-02-07 at 02:23.
  #7 (permalink)  
Old 07-02-07, 14:40
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Maybe you got a newer JVM with the FP install, and that JVM is not very stable... What exactly does the stack trace say?
  #8 (permalink)  
Old 07-04-07, 08:46
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
Quote:
Originally Posted by stolze
Maybe you got a newer JVM with the FP install, and that JVM is not very stable...
Yes, we also did get a new VM. Previous version was 1.4.2.0 and new version is 1.4.2.100. I'll try to investigate in this direction.

Quote:
Originally Posted by stolze
What exactly does the stack trace say?
That no data is found for a certain key value in a hashtable that *should* have been initialized... That means either that the hashtable wasn't successfully initialized (in which case a log entry should have been produced, which is not the case!), or that the hashtable was emptied at some point. I have no valid explanation for either assumption...

As input, here is one of the UDFs that crash the most. This method is in the same class where the hashtable mentioned above is initialized, but it doesn't itself use data from the hashtable:

Code:
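private static final BigDecimal ZERO = new BigDecimal("0");

// Rounds originalValue to the nearest multiple of roundingValue (half up)
// and returns the result with a fixed scale of 6.
public static BigDecimal round(BigDecimal originalValue, BigDecimal roundingValue) {
    BigDecimal ret = null;

    if (originalValue != null) {
        ret = originalValue;

        // A null or zero rounding value means "do not round".
        if (roundingValue != null && roundingValue.compareTo(ZERO) != 0) {
            ret = ret.divide(roundingValue, 0, BigDecimal.ROUND_HALF_UP).multiply(roundingValue);
        }

        // setScale(int) has no rounding mode: it throws ArithmeticException
        // if non-zero digits beyond the sixth decimal place would be dropped.
        ret = ret.setScale(6);
    }

    return ret;
}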
This method is really trivial and I see no place where it could crash by itself. As we don't get any interesting information in db2diag.log, my guess is that it is not the method that is crashing, but the whole VM...
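One Java-level edge case in the method above is worth ruling out: `setScale(6)` without a rounding mode throws an `ArithmeticException` whenever non-zero digits beyond the sixth decimal place would have to be dropped. The sketch below reproduces the same logic in a standalone class (`RoundDemo` is a made-up name, and it uses `RoundingMode` and `signum()` in place of the older constants) to show both the normal case and that edge case. Whether the production data ever carries more than six decimals is an assumption that would need checking, and such an exception would normally leave a Java stack trace, so this probably does not explain the silent failures.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundDemo {
    /** Same logic as the round() UDF above: nearest multiple of roundingValue, scale 6. */
    static BigDecimal round(BigDecimal originalValue, BigDecimal roundingValue) {
        if (originalValue == null) {
            return null;
        }
        BigDecimal ret = originalValue;
        // signum() != 0 is equivalent to compareTo(ZERO) != 0.
        if (roundingValue != null && roundingValue.signum() != 0) {
            ret = ret.divide(roundingValue, 0, RoundingMode.HALF_UP).multiply(roundingValue);
        }
        // No rounding mode given: throws if non-zero digits would be discarded.
        return ret.setScale(6);
    }

    public static void main(String[] args) {
        // 12.34 / 0.05 = 246.8, rounded half up to 247, times 0.05 = 12.35
        System.out.println(round(new BigDecimal("12.34"), new BigDecimal("0.05"))); // prints 12.350000

        try {
            // Eight decimals and no rounding value: setScale(6) cannot drop "89".
            round(new BigDecimal("1.23456789"), null);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```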

The biggest problem with this issue is that it isn't easily reproducible: I took one of the SELECT statements that failed and executed it about 100'000 times without getting any error...

- Colargol
  #9 (permalink)  
Old 07-05-07, 06:19
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by colargol70
Yes, we also did get a new VM. Previous version was 1.4.2.0 and new version is 1.4.2.100. I'll try to investigate in this direction.
I suggest that you revert to 1.4.2.0 and see if the crashes vanish. If yes, then the JVM update introduced bugs (which is what I suspect right now).


Quote:
That no data is found for a certain key value in a hashtable that *should* have been initialized... That means either that the hashtable wasn't successfully initialized (in which case a log entry should have been produced, which is not the case!), or that the hashtable was emptied at some point. I have no valid explanation for either assumption...
Could you possibly post the exact stack trace?
  #10 (permalink)  
Old 07-06-07, 05:34
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
Quote:
Originally Posted by stolze
I suggest that you revert back to 1.4.2.0 and see if the crashes vanish. If yes, then the JVM update introduced bugs (which is what I suspect right now).
We haven't tested this yet.

Quote:
Originally Posted by stolze
Could you possibly post the exact stack trace?
This won't help much, but here we go

Code:
java.lang.IllegalArgumentException: No Currency for ID:CHF
    at dsc.udf.CurrencyUDF.fixChange(CurrencyUDF.java:302)
    at dsc.udf.CurrencyUDF.fixChange(CurrencyUDF.java:257)
We throw this exception if a certain condition occurs. Here is the code:

Code:
Object[] aBaseCurrency = (Object[]) getCurrencyIdToValues().get(aSourceCurrency);

if (aBaseCurrency == null) {
    throw new IllegalArgumentException("No Currency for ID:" + aSourceCurrency);
}
This fails because getCurrencyIdToValues() (which returns a Hashtable) has no value for "CHF". getCurrencyIdToValues() is actually a lazy initializer: if the hashtable is null, it creates and *initializes* it (selects data from the DB and caches it). In *all* cases, data exists for "CHF", so there is no obvious reason why null was returned (i.e. why the hashtable was empty at that moment).
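One way a lazily initialized table can look empty is when a second thread sees the non-null field before the first thread has finished filling it. The real initializer is not shown here, so the following is only a sketch of a lazy initializer that avoids this window: it fills a local map first and publishes it afterwards, inside a synchronized method. `CurrencyCache` and `Loader` are hypothetical names; `Loader` merely stands in for the SELECT that fills the cache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class CurrencyCache {
    /** Hypothetical stand-in for the SELECT that fills the cache. */
    interface Loader {
        Map<String, Object[]> load();
    }

    private static Map<String, Object[]> cache;

    /** Synchronized so that only one thread initializes and all others wait for the result. */
    static synchronized Map<String, Object[]> getCurrencyIdToValues(Loader loader) {
        if (cache == null) {
            // Fill a local map first; other threads cannot see it yet.
            Map<String, Object[]> filled = new HashMap<>(loader.load());
            // Publish only the fully populated, read-only map.
            cache = Collections.unmodifiableMap(filled);
        }
        return cache;
    }

    public static void main(String[] args) throws InterruptedException {
        Loader loader = () -> Collections.singletonMap("CHF", new Object[] {"Swiss franc"});
        Runnable lookup = () -> System.out.println(
                Thread.currentThread().getName() + " sees CHF: "
                        + (getCurrencyIdToValues(loader).get("CHF") != null));
        Thread a = new Thread(lookup, "agent-1");
        Thread b = new Thread(lookup, "agent-2");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

The pattern to look for in the existing code would be the opposite order: assigning the empty table to the static field first and filling it afterwards.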

- Colargol
  #11 (permalink)  
Old 07-06-07, 05:39
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
We finally were able to write a test that allows us to reproduce the error on our test systems!

With this test we quickly found out that the error is related to a DBM configuration parameter we changed recently: INTRA_PARALLEL ("Enable intra-partition parallelism"). With this value set to "NO", the error never occurs. It occurs now that we have changed it to "YES".

This still isn't a good explanation, but at least we're moving forward.

This also "seems" to be related to the fact that most of the SELECT statements that crash use nested UDFs or UDFs that nest DB2 function calls (like "SELECT XXX.MyUDF(DECIMAL(column))...").

- Colargol
  #12 (permalink)  
Old 07-06-07, 07:07
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
First, with INTRA_PARALLEL, you may have concurrent executions of the same UDF.

DB2 uses separate class loaders so that different UDF invocations cannot interfere with each other. But this assumes that your Java code is not installed in any of the system paths. Did you do that by any chance? If so, the JVM bootstrap class loader may load your classes and defeat this protection mechanism. Side effects could mess up your whole implementation.

Also, I would thoroughly review the initialization code. I would add some code to write the content of the hash table to a file in the "get" method. Then you can verify exactly at this point if CHF is really in the hash table or not. If it is, then the "get" method may be in error. If it is not, your initializer has an issue. Maybe the initialization has a problem due to (intra-)parallel statement execution and you don't catch some DB2 warning or error? (Just guessing, though.)
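The dump I have in mind could look roughly like this. It is only a sketch: `CacheDump` and the log file location are made up, and you would call `dump(...)` right before the lookup of "CHF". It records the thread name as well, since with INTRA_PARALLEL you want to know which invocation saw which content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeSet;

final class CacheDump {
    /** One line: which key was looked up, whether it is present, and all keys (sorted). */
    static String describe(Map<String, ?> cache, String lookedUpKey) {
        // Hold the table's own lock so a concurrent writer cannot disturb the iteration.
        synchronized (cache) {
            return "thread=" + Thread.currentThread().getName()
                    + " lookup=" + lookedUpKey
                    + " present=" + cache.containsKey(lookedUpKey)
                    + " keys=" + new TreeSet<>(cache.keySet());
        }
    }

    /** Appends the description to the given file; I/O problems are swallowed on purpose. */
    static synchronized void dump(Map<String, ?> cache, String lookedUpKey, Path file) {
        String line = describe(cache, lookedUpKey) + System.lineSeparator();
        try {
            Files.write(file, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException ignored) {
            // Diagnostics must never break the UDF itself.
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object[]> cache = new Hashtable<>();
        cache.put("EUR", new Object[] {"Euro"});
        Path file = Files.createTempFile("currency-cache", ".log");
        dump(cache, "CHF", file);
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

A line with present=false and an empty key list would point at the initializer; present=true followed by the exception would point at the lookup.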
  #13 (permalink)  
Old 07-09-07, 05:46
colargol70 colargol70 is offline
Registered User
 
Join Date: Jun 2007
Posts: 8
Hi Knut,

Quote:
Originally Posted by stolze
DB2 uses separate class loaders so that different UDF invocations cannot interfere with each other. But this assumes that you install your Java code in none of the system paths. Did you do that by any chance? If so, the JVM boot strap class loader may load your classes and screw up this protection mechanism. Side effects could mess up your whole implementation.
Which directories does DB2 consider as being in the system path? We put our UDF JAR files into the "standard" sqllib/function directory.

Quote:
Originally Posted by stolze
Also, I would thoroughly review the initialization code. I would add some code to write the content of the hash table to a file in the "get" method. Then you can verify exactly at this point if CHF is really in the hash table or not. If it is, then the "get" method may be in error. If it is not, your initializer has an issue. Maybe the initialization has a problem due to (intra-)parallel statement execution and you don't catch some DB2 warning or error? (Just guessing, though.)
This is weirder than that: We rewrote our UDF methods, removed all "business" code (including the above mentioned initialization) and now return "new BigDecimal(1.0)" in place of the "real" calculations. Guess what? We still get SQL0430N errors...

We also tried all combinations of "FENCED/NOT FENCED" and "THREADSAFE/NOT THREADSAFE", without any success.

The only thing that helps is to turn off "INTRA_PARALLEL"...

- Colargol
  #14 (permalink)  
Old 07-09-07, 12:00
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by colargol70
Hi Knut,

Which directories does DB2 consider as being in the system path? We put our UDF JAR files into the "standard" sqllib/function directory.
That depends on the platform, of course. The "system path" is fairly restricted: it is a path to which the DB2 instance owner does not have write access, i.e. things like /usr/lib. So putting stuff in sqllib/function/ is fine from that perspective.

Quote:
This is weirder than that: We rewrote our UDF methods, removed all "business" code (including the above mentioned initialization) and now return "new BigDecimal(1.0)" in place of the "real" calculations. Guess what? We still get SQL0430N errors...
Same stack traceback?

Any results from reverting to Java 1.4.2.0?

Quote:
We also tried all combinations of "FENCED/NOT FENCED" and "THREADSAFE/NOT THREADSAFE", without any success.
Java UDFs are always executed in FENCED mode. So switching that doesn't have any impact. I'm not sure about [NOT] THREADSAFE.