  1. #1
    Join Date
    Jun 2007
    Posts
    8

    Unanswered: UDFs crash after new CPU installation on AIX!

    Hi all,

    Last week we upgraded the CPU of our AIX server. Until then, we had no problems running our UDFs, but we now get SQL0430N error messages ("User defined function "XXX" has abnormally terminated.").

    The strange thing is that

    1) This happens with all UDFs, not just a particular one

    2) This happens quite rarely (about 50 times/day, although the UDFs get called thousands of times every day)

    Any ideas? Any help is welcome.

    - Colargol

  2. #2
    Join Date
    Jun 2007
    Posts
    8
    Our DBA informed me this morning that we not only upgraded the hardware but also the DB2 version, which most certainly has a bigger impact on the problem mentioned above.

    Our new DB2 version is "DB2 8.2 fixpack 7".

    Any help is still welcome

    - Colargol

  3. #3
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    My guess is that the UDFs were buggy to begin with. For example, if memory is allocated but not properly initialized, other parts of the code may use invalid data. The operating system initializes each new data page when it is requested, but not every UDF invocation gets fresh data pages; it may reuse existing ones. Similar issues can arise with variables on the stack. This is just one possibility, though.

    Your best bet is to debug your UDF. What you may want to try is to install a signal handler for signal 11 (SIGSEGV) at the start of the UDF; the handler can write a core dump or whatever other information you want. At the end of the UDF, you re-install the default signal handler.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  4. #4
    Join Date
    Jun 2007
    Posts
    8
    Hi Knut,

    Thanks for your answer. This is also what I thought, because out of 318 "UDF crashes", we finally got *one* Java stack trace yesterday (yes, I forgot to mention that these are Java UDFs) in db2diag.log. In this stack trace I can clearly see that access to a lazily initialized map fails, although it never should... That still doesn't explain what happened in the other 317 cases.

    You suggest I install a signal handler. That would be fine if these were C UDFs... but any idea how I can achieve this in Java? The only possibility I see would be to execute an external process using the "Runtime.getRuntime().exec()" method... or am I completely wrong?

    - Colargol
    Last edited by colargol70; 06-29-07 at 10:54.

  5. #5
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    You were just plain lucky in the other 317 cases and in everything before DB2 V9.

    I don't know if you can actually install a signal handler in Java code. Using Runtime.exec() will definitely not work because a signal handler must be installed by the executing process itself. What you could try is:
    (1) write a C UDF that installs the signal handler, and then
    (2) calls your Java UDF in a nested SQL call, then
    (3) remove the signal handler again.
    This may fail if DB2 re-installs the signal handler before the Java function is called, or if the Java function is executed in a different process (which is the case anyway, if I'm not mistaken).

    P.S.: I just don't like Java UDFs because the idea of UDFs is to execute them within a SQL statement that may process many, many rows. Under such conditions, performance is crucial...
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  6. #6
    Join Date
    Jun 2007
    Posts
    8
    Quote Originally Posted by stolze
    You were just plain lucky in the other 317 cases and in everything before DB2 V9.
    Yes and no. Maybe I wasn't clear enough.

    With our previous version of DB2 (8.2.3), we never encountered such problems. We upgraded DB2 with "fixpak 14" on June 22nd, and the problems began that day.

    Our UDFs get executed thousands of times a day, and 99.99% of the time (with our "new" DB2 8.2.7) the SELECT statements succeed.

    Between June 22nd and June 29th, SELECT statements using our UDFs failed 318 times with SQL0430N. Out of these 318 failures, we got only one Java stack trace in db2diag.log. The other 317 cases didn't yield any useful information...

    I hope this clarifies things a bit more.

    - Colargol

    P.S.: We never had performance issues with our Java UDFs. In terms of performance, modern JREs no longer have anything to be ashamed of compared to compiled C!
    Last edited by colargol70; 07-02-07 at 03:23.

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Maybe you got a newer JVM with the FP install, and that JVM is not very stable... What exactly does the stack trace say?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  8. #8
    Join Date
    Jun 2007
    Posts
    8
    Quote Originally Posted by stolze
    Maybe you got a newer JVM with the FP install, and that JVM is not very stable...
    Yes, we also got a new VM. The previous version was 1.4.2.0 and the new version is 1.4.2.100. I'll investigate in that direction.

    Quote Originally Posted by stolze
    What exactly does the stack trace say?
    It says that no data is found for a certain key in a hashtable that *should* have been initialized... That means either the hashtable wasn't successfully initialized (in which case a log entry should have been produced, which is not the case!) or the hashtable was emptied at some point. I have no valid explanation for either assumption...

    Here, for reference, is one of the UDFs that crashes the most. This method is in the same class where the hashtable mentioned above is initialized, but doesn't itself use any data from the hashtable:

    Code:
    private static final BigDecimal ZERO = new BigDecimal("0");

    public static BigDecimal round(BigDecimal originalValue, BigDecimal roundingValue) {
        BigDecimal ret = null;

        if (originalValue != null) {
            ret = originalValue;

            if (roundingValue != null && roundingValue.compareTo(ZERO) != 0) {
                ret = ret.divide(roundingValue, 0, BigDecimal.ROUND_HALF_UP).multiply(roundingValue);
            }

            ret = ret.setScale(6);
        }

        return ret;
    }
    This method is really trivial and I see no place where it could crash by itself. As we don't get any interesting information in db2diag.log, my guess is that it's not the method that is crashing, but the whole VM...
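    For what it's worth, the rounding logic does check out in isolation. A small standalone harness (the class name and sample values here are mine, purely for illustration) behaves as expected:

```java
import java.math.BigDecimal;

public class RoundCheck {
    private static final BigDecimal ZERO = new BigDecimal("0");

    // Same logic as the UDF above, copied for standalone testing.
    public static BigDecimal round(BigDecimal originalValue, BigDecimal roundingValue) {
        BigDecimal ret = null;
        if (originalValue != null) {
            ret = originalValue;
            if (roundingValue != null && roundingValue.compareTo(ZERO) != 0) {
                ret = ret.divide(roundingValue, 0, BigDecimal.ROUND_HALF_UP).multiply(roundingValue);
            }
            ret = ret.setScale(6);
        }
        return ret;
    }

    public static void main(String[] args) {
        // 123.456 to the nearest 0.05: 123.456 / 0.05 = 2469.12 -> 2469 -> 123.45
        System.out.println(round(new BigDecimal("123.456"), new BigDecimal("0.05")));
        // prints 123.450000
    }
}
```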

    The biggest problem with this issue is that it isn't easily reproducible: I took one of the SELECT statements that failed and executed it about 100'000 times without getting any error...

    - Colargol

  9. #9
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by colargol70
    Yes, we also did get a new VM. Previous version was 1.4.2.0 and new version is 1.4.2.100. I'll try to investigate in this direction.
    I suggest that you revert to 1.4.2.0 and see if the crashes vanish. If so, then the JVM update introduced bugs (which is what I suspect right now).


    Quote Originally Posted by colargol70
    It says that no data is found for a certain key in a hashtable that *should* have been initialized... That means either the hashtable wasn't successfully initialized (in which case a log entry should have been produced, which is not the case!) or the hashtable was emptied at some point. I have no valid explanation for either assumption...
    Could you possibly post the exact stack trace?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  10. #10
    Join Date
    Jun 2007
    Posts
    8
    Quote Originally Posted by stolze
    I suggest that you revert to 1.4.2.0 and see if the crashes vanish. If so, then the JVM update introduced bugs (which is what I suspect right now).
    We haven't tested this yet.

    Quote Originally Posted by stolze
    Could you possibly post the exact stack trace?
    This won't help much, but here we go:

    Code:
    java.lang.IllegalArgumentException: No Currency for ID:CHF
        at dsc.udf.CurrencyUDF.fixChange(CurrencyUDF.java:302)
        at dsc.udf.CurrencyUDF.fixChange(CurrencyUDF.java:257)
    We throw this exception when a certain condition occurs. Here is the code:

    Code:
    Object[] aBaseCurrency = (Object[]) getCurrencyIdToValues().get(aSourceCurrency);
    
    if (aBaseCurrency == null) {
        throw new IllegalArgumentException("No Currency for ID:" + aSourceCurrency);
    }
    This fails because getCurrencyIdToValues() (a Hashtable) returns no value for "CHF". getCurrencyIdToValues() is actually a lazy initializer: if the hashtable is null, it creates and *initializes* it (selects data from the DB and caches it). In *all* cases data exists for "CHF", so there is no obvious reason why null was returned (i.e. why the hashtable was empty at that moment).
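    One classic hazard with that pattern under concurrent execution (this is only a guess; the class, method, and field names below are mine, not the real implementation): if the initializer assigns the static field before the table is fully populated, a second caller can see a non-null but still incomplete table and get null back for "CHF". A version that synchronizes the whole check and only publishes the table after loading closes that window:

```java
import java.util.Hashtable;

public class CurrencyCache {
    private static Hashtable currencyIdToValues; // guarded by CurrencyCache.class

    // Lazy initializer: the table is filled locally first and only assigned
    // to the static field once complete, so no caller can ever observe a
    // half-populated table. The synchronized keyword serializes concurrent
    // first calls (e.g. under INTRA_PARALLEL).
    static synchronized Hashtable getCurrencyIdToValues() {
        if (currencyIdToValues == null) {
            Hashtable t = new Hashtable();
            loadFromDatabase(t);    // fill the local table first ...
            currencyIdToValues = t; // ... then publish it
        }
        return currencyIdToValues;
    }

    // Placeholder for the real "select currencies from DB2" code.
    private static void loadFromDatabase(Hashtable t) {
        t.put("CHF", new Object[] { "Swiss Franc" });
    }
}
```

    Note that even with this fix, each fenced-mode process keeps its own copy of the static table, so initialization runs once per process rather than once overall.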

    - Colargol

  11. #11
    Join Date
    Jun 2007
    Posts
    8
    We were finally able to write a test that allows us to reproduce the error on our test systems!

    With this test we quickly found out that the error is related to a DBM configuration parameter we changed recently: INTRA_PARALLEL ("Enable intra-partition parallelism"). With this parameter set to "NO", the error never occurs; it does now that we have changed it to "YES".

    This still isn't a good explanation, but at least we're moving forward.

    This also "seems" to be related to the fact that most of the SELECT statements that crash use nested UDFs, or UDFs that nest DB2 function calls (like "SELECT XXX.MyUDF(DECIMAL(column))...").

    - Colargol

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    First, with INTRA_PARALLEL, you may have concurrent executions of the same UDF.

    DB2 uses separate class loaders so that different UDF invocations cannot interfere with each other. But this assumes that you do not install your Java code in any of the system paths. Did you do that by any chance? If so, the JVM bootstrap class loader may load your classes and defeat this protection mechanism. Side effects could mess up your whole implementation.

    Also, I would thoroughly review the initialization code. I would add some code in the "get" method to write the content of the hash table to a file. Then you can verify at exactly this point whether CHF is really in the hash table or not. If it is, the "get" method may be in error; if it is not, your initializer has an issue. Maybe the initialization has a problem due to (intra-)parallel statement execution and you don't catch some DB2 warning or error? (Just guessing, though.)
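    A minimal sketch of that kind of tracing (the class name and file path are illustrative, not from the real UDF): wrap the lookup so that a miss dumps the table's current keys to a file before returning:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Enumeration;
import java.util.Hashtable;

public class TracedLookup {
    // Wraps the hash table lookup: on a miss, dump the table's current
    // keys so we can see its exact state at the moment of failure.
    static Object getOrDump(Hashtable table, Object key) {
        Object value = table.get(key);
        if (value == null) {
            dumpKeys(table, key);
        }
        return value;
    }

    private static void dumpKeys(Hashtable table, Object missingKey) {
        try {
            // One file per dump avoids interleaved writes under INTRA_PARALLEL.
            String file = "/tmp/udf-trace." + System.currentTimeMillis() + ".log";
            PrintWriter out = new PrintWriter(new FileWriter(file, true));
            out.println("miss for key=" + missingKey + ", table size=" + table.size());
            for (Enumeration e = table.keys(); e.hasMoreElements();) {
                out.println("  key: " + e.nextElement());
            }
            out.close();
        } catch (IOException e) {
            // Tracing must never turn into a new UDF failure.
        }
    }
}
```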
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  13. #13
    Join Date
    Jun 2007
    Posts
    8
    Hi Knut,

    Quote Originally Posted by stolze
    DB2 uses separate class loaders so that different UDF invocations cannot interfere with each other. But this assumes that you install your Java code in none of the system paths. Did you do that by any chance? If so, the JVM boot strap class loader may load your classes and screw up this protection mechanism. Side effects could mess up your whole implementation.
    Which directories does DB2 consider as being in the system path? We put our UDF JAR files into the "standard" sqllib/function directory.

    Quote Originally Posted by stolze
    Also, I would thoroughly review the initialization code. I would add some code to write the content of the hash table to a file in the "get" method. Then you can verify exactly at this point if CHF is really in the hash table or not. If it is, then the "get" method may be in error. If it is not, your initializer has an issue. Maybe the initialization has a problem due to (intra-)parallel statement execution and you don't catch some DB2 warning or error? (Just guessing, though.)
    It's weirder than that: we rewrote our UDF methods, removed all "business" code (including the above-mentioned initialization), and now return "new BigDecimal(1.0)" in place of the "real" calculations. Guess what? We still get SQL0430N errors...
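    For illustration, a stub of the kind described (class and method names are mine): the entire body collapses to a constant, so any remaining SQL0430N cannot originate in the business logic:

```java
import java.math.BigDecimal;

public class StubUDF {
    // Stripped-down UDF: all business logic removed. If SQL0430N still
    // occurs with this body, the crash happens outside our own code.
    public static BigDecimal round(BigDecimal originalValue, BigDecimal roundingValue) {
        return new BigDecimal(1.0);
    }
}
```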

    We also tried all combinations of "FENCED/NOT FENCED" and "THREADSAFE/NOT THREADSAFE", without any success.

    The only thing that helps is to turn off "INTRA_PARALLEL"...

    - Colargol

  14. #14
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by colargol70
    Hi Knut,

    Which directories does DB2 consider as being in the system path? We put our UDF JAR files into the "standard" sqllib/function directory.
    That depends on the platform, of course. The "system path" is fairly restricted: it is a path to which the DB2 instance owner does not have write access, e.g. /usr/lib. So putting things in sqllib/function/ is fine from that perspective.

    Quote Originally Posted by colargol70
    It's weirder than that: we rewrote our UDF methods, removed all "business" code (including the above-mentioned initialization), and now return "new BigDecimal(1.0)" in place of the "real" calculations. Guess what? We still get SQL0430N errors...
    Same stack trace?

    Any results from reverting to Java 1.4.2.0?

    Quote Originally Posted by colargol70
    We also tried all combinations of "FENCED/NOT FENCED" and "THREADSAFE/NOT THREADSAFE", without any success.
    Java UDFs are always executed in FENCED mode, so switching that setting doesn't have any impact. I'm not sure about [NOT] THREADSAFE.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development
