  1. #1
    Join Date
    Jul 2012
    Location
    GA
    Posts
    5

    Unanswered: Partitioned database Headaches -- SQL1229N, SQLSTATE=40504

    Hi guys. I have recently been setting up a DB2 V9.7 partitioned database on two physical nodes (node0 and node1). Each node has several logical partitions. The db2inst1 home directory for both nodes is on the same shared disk, /shared/home/db2inst1. I have also set up all the config files: SSH, /etc/hosts, /etc/services, ~/.rhosts and db2nodes.cfg.


    Here is the problem. When I start the database manager on node0 (which is the master node):
    $ db2start

    07/16/2012 11:18:18 0 0 SQL1063N DB2START processing was successful.
    07/16/2012 11:18:18 1 0 SQL1063N DB2START processing was successful.
    SQL1063N DB2START processing was successful.

    It looks fine, with no communication problems.

    However, when I then try to create a database:

    $ db2 create database tmp

    SQL1229N The current transaction has been rolled back because of a system
    error. SQLSTATE=40504

    And this is what db2diag.log says:

    2012-07-16-11.26.30.729557-240 E323322796E480 LEVEL: Error (OS)
    PID : 30793 TID : 47067820976448 PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 14 EDUNAME: db2fcms 0
    FUNCTION: DB2 UDB, oper system services, sqloPdbQuerySocketErrorStatus, probe:15
    MESSAGE : ZRC=0x810F0077=-2129723273=SQLO_COMM_ERR_EHOSTUNREACH
    "No route to host"
    CALLED : OS, -, getsockopt OSERR: EHOSTUNREACH (113)



    Then I tried to start DB2 on node1, which is the slave node. It failed.

    (node1)$ db2start

    07/16/2012 11:32:51 0 0 SQL6048N A communication error occurred during START or STOP DATABASE MANAGER processing.
    07/16/2012 11:32:53 1 0 SQL1063N DB2START processing was successful.
    SQL6032W Start command processing was attempted on "2" node(s). "1" node(s) were successfully started. "0" node(s) were already started. "1" node(s) could not be started.

    And the db2diag.log file said:

    2012-07-16-11.32.51.086509-240 E323333691E603 LEVEL: Error
    PID : 30367 TID : 46913609250080 PROC : db2start
    INSTANCE: db2inst1 NODE : 000
    FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:110
    MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
    DATA #1 : String, 204 bytes
    The remote shell program terminated prematurely. The most likely causes are either that the DB2RSHCMD registry variable is set to an invalid setting, or the remote command program failed to authenticate.
    DATA #2 : String, 12 bytes
    /usr/bin/ssh

    2012-07-16-11.32.51.114725-240 E323334295E557 LEVEL: Error
    PID : 30367 TID : 46913609250080 PROC : db2start
    INSTANCE: db2inst1 NODE : 000
    FUNCTION: DB2 UDB, oper system services, sqloPdbInitializeRemoteCommand, probe:200
    MESSAGE : ZRC=0x810F0012=-2129723374=SQLO_COMM_ERR "Communication error"
    DATA #1 : String, 25 bytes
    node1.XXX.XXX.XXXXX.edu
    DATA #2 : String, 25 bytes
    node0.XXX.XXX.XXXXX.edu
    DATA #3 : String, 93 bytes
    db2rcmd: Failed to connect I/O socket to node41.clus.cci.emory.edu on port 57326, errno 113.


    It seems to be related to communication between the two nodes. But I can SSH to both without being prompted for authentication, and db2start succeeds on node0. Also, db2_all returns the correct output on both machines.


    Any solutions? All suggestions are greatly appreciated. This has been giving me a headache for a week.


    Alex
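
Both diag entries above end in errno 113 (EHOSTUNREACH), which comes from the operating system rather than from DB2 itself. One way to narrow this down, sketched here in Python, is to try a plain TCP connection from one node to the other on a port other than SSH's. The helper names `classify` and `probe` are made up for this sketch and are not part of DB2.

```python
import errno
import socket


def classify(err):
    """Map a connect() errno to a short, human-readable label."""
    if err == errno.ECONNREFUSED:
        return "refused (host reachable, nothing listening)"
    if err in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "unreachable (routing problem or firewall reject)"
    return "error " + errno.errorcode.get(err, str(err))


def probe(host, port, timeout=3.0):
    """Try one TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except socket.gaierror:
        return "name resolution failed"
    except OSError as exc:
        return classify(exc.errno)
```

For example, running `probe("node1", 60000)` on node0 (60000 being the first DB2_db2inst1 port listed in /etc/services later in this thread) would show whether that port can be reached at all. An "unreachable" result while SSH still works may point to a host firewall rejecting the port rather than a real routing fault, but that is a guess from the symptom, not something the logs confirm.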

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Originally posted by Alexe:
    Each node has several logical partitions
    I don't think that's what your output indicates. Please show db2nodes.cfg and the relevant fragments of /etc/services on both servers.

  3. #3
    Join Date
    Jul 2012
    Location
    GA
    Posts
    5

    config files

    Hi n_i,


    Thank you for your quick reply. Here are the relevant fragments:

    db2nodes.cfg:

    0 node0 0
    1 node1 0

    /etc/hosts:

    192.168.1.159 node0.xxx.xxx.xxxxx.edu node0
    192.168.1.160 node1.xxx.xxx.xxxxx.edu node1

    /etc/services

    DB2_db2inst1 60000/tcp
    DB2_db2inst1_1 60001/tcp
    DB2_db2inst1_2 60002/tcp
    DB2_db2inst1_3 60003/tcp
    DB2_db2inst1_4 60004/tcp
    DB2_db2inst1_5 60005/tcp
    DB2_db2inst1_6 60006/tcp
    DB2_db2inst1_7 60007/tcp
    DB2_db2inst1_8 60008/tcp
    DB2_db2inst1_9 60009/tcp
    DB2_db2inst1_10 60010/tcp
    DB2_db2inst1_11 60011/tcp
    DB2_db2inst1_12 60012/tcp
    DB2_db2inst1_13 60013/tcp
    DB2_db2inst1_14 60014/tcp
    DB2_db2inst1_15 60015/tcp
    DB2_db2inst1_16 60016/tcp
    DB2_db2inst1_17 60017/tcp
    DB2_db2inst1_18 60018/tcp
    DB2_db2inst1_19 60019/tcp
    DB2_db2inst1_20 60020/tcp
    DB2_db2inst1_21 60021/tcp
    DB2_db2inst1_22 60022/tcp
    DB2_db2inst1_23 60023/tcp
    DB2_db2inst1_24 60024/tcp
    DB2_db2inst1_25 60025/tcp
    DB2_db2inst1_26 60026/tcp
    DB2_db2inst1_27 60027/tcp
    DB2_db2inst1_28 60028/tcp
    DB2_db2inst1_29 60029/tcp
    DB2_db2inst1_30 60030/tcp
    DB2_db2inst1_END 60031/tcp
    db2c_db2inst1 50001/tcp

    It is exactly the same on the other node.

    ~/.rhosts:

    node0 db2inst1
    node1 db2inst1


    Actually, we will have 15 logical partitions on each node, but for now I am testing with just one.



    Alex

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Ok, so you don't have 15 logical partitions, just as I thought.

    Use fully qualified host names in db2nodes.cfg, as the manual instructs.
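
Both points above can be checked mechanically. The Python sketch below assumes the usual db2nodes.cfg layout of partition number, host name and logical port (with an optional fourth field that is ignored here). It counts partitions per host and lists host names that contain no dot, i.e. that are not fully qualified. The function names are invented for the sketch.

```python
from collections import Counter


def parse_db2nodes(text):
    """Parse lines of the form '<partition> <hostname> [<logical port>] ...'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        port = int(fields[2]) if len(fields) > 2 else 0
        entries.append((int(fields[0]), fields[1], port))
    return entries


def summarize(entries):
    """Return (partitions per host, sorted host names without a domain part)."""
    per_host = Counter(host for _, host, _ in entries)
    short_names = sorted({host for _, host, _ in entries if "." not in host})
    return per_host, short_names


# The db2nodes.cfg content posted in this thread.
cfg = """\
0 node0 0
1 node1 0
"""

per_host, short_names = summarize(parse_db2nodes(cfg))
print(dict(per_host))  # {'node0': 1, 'node1': 1}
print(short_names)     # ['node0', 'node1']
```

With the posted file, each host carries one partition and both names are short, which matches the two observations in this reply.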

  5. #5
    Join Date
    Jul 2012
    Location
    GA
    Posts
    5
    Thank you, but I don't think that is the issue, because I already set the names in /etc/hosts and the .equiv file. Anyway, I tried it as you said, but it doesn't work.

    Thank you.


    Alex

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Originally posted by Alexe:
    it doesn't work.
    It?

    Originally posted by Alexe:
    DATA #1 : String, 25 bytes
    node1.XXX.XXX.XXXXX.edu
    DATA #2 : String, 25 bytes
    node0.XXX.XXX.XXXXX.edu
    DATA #3 : String, 93 bytes
    db2rcmd: Failed to connect I/O socket to node41.clus.cci.emory.edu on port 57326, errno 113.
    I don't want to waste time trying to find errors in your interpretation of the actual events. Since you chose not to provide your actual configuration I can only wish you best of luck in your troubleshooting.

  7. #7
    Join Date
    Jul 2012
    Location
    GA
    Posts
    5
    Hi n_i,

    Sorry to trouble you. I only wanted to mask my real IP addresses; everything else is the actual configuration. I have just changed db2nodes.cfg to use the fully qualified host names and run db2stop/db2start, but it still fails with the new db2nodes.cfg.

    Anyway thank you for your time.


    Alex

  8. #8
    Join Date
    Jul 2012
    Location
    GA
    Posts
    5
    To be clearer, "node0" in my post means node40.clus.cci.emory.edu and "node1" means node41.clus.cci.emory.edu.

    Sorry if I offended anyone.


    Alex

  9. #9
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    I can only assume that something happened between

    Code:
    07/16/2012 11:18:18 1 0 SQL1063N DB2START processing was successful.
    and

    Code:
    2012-07-16-11.26.30.729557-240 E323322796E480 LEVEL: Error (OS)
    which made communications between the members (nodes) impossible.

    This

    Code:
    07/16/2012 11:32:53 1 0 SQL1063N DB2START processing was successful.
    also tells me that member 1 was stopped (or crashed) sometime between 11:18:18 and 11:32:53, which might explain the communication error at 11.26.30.
    ---
    "It does not work" is not a valid problem statement.

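The timeline argument in the reply above rests on ordering lines from two sources: db2start output and db2diag.log record headers. A minimal Python sketch of that ordering follows. It assumes both timestamps are local time on the same clock, and that the trailing "-240" in the db2diag stamp is a UTC offset in minutes that can be ignored for ordering. The function names are invented for the sketch.

```python
from datetime import datetime


def parse_db2start_line(line):
    """'07/16/2012 11:18:18 1 0 SQL1063N ...' -> (timestamp, rest of line)."""
    date_part, time_part, rest = line.split(None, 2)
    stamp = datetime.strptime(date_part + " " + time_part, "%m/%d/%Y %H:%M:%S")
    return stamp, rest


def parse_db2diag_header(line):
    """'2012-07-16-11.26.30.729557-240 E...' -> (timestamp, rest of line)."""
    stamp, rest = line.split(None, 1)
    # The first 26 characters hold date, time and microseconds; the
    # trailing offset (assumed to be minutes from UTC) is dropped.
    return datetime.strptime(stamp[:26], "%Y-%m-%d-%H.%M.%S.%f"), rest


events = sorted([
    parse_db2start_line("07/16/2012 11:32:53 1 0 SQL1063N DB2START processing was successful."),
    parse_db2diag_header("2012-07-16-11.26.30.729557-240 E323322796E480 LEVEL: Error (OS)"),
    parse_db2start_line("07/16/2012 11:18:18 1 0 SQL1063N DB2START processing was successful."),
])

for stamp, rest in events:
    print(stamp.strftime("%H:%M:%S"), rest)
```

Sorted this way, the OS-level error at 11:26:30 falls between the two successful starts of partition 1, which is the gap the reply points to.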
  10. #10
    Join Date
    Nov 2011
    Posts
    334
    Why does DB2 need to connect to node41.clus.cci.emory.edu?
    I don't see it in any of the config files (/etc/hosts, db2nodes.cfg, .rhosts).
