Results 1 to 12 of 12
  1. #1
    Join Date
    Jun 2008
    Posts
    6

    Post Unanswered: Performance issue on selecting random values using ORDER BY DBMS.RANDOM.VALUE

    Hi,

    Im substituting(replacing) values in column 'NAME' of table 'tableA' with values from column 'STATE' of table 'tableB'
    The Oracle version is 9.2.0.7 release.
    Pls find the code below.

    Code to create tableA:
    Code:
    create table tableA(NAME varchar(500),EMAIL varchar(500))
    insert into tableA(NAME,EMAIL) values(denis,denis@gmail.com)
    insert into tableA(NAME,EMAIL) values(richie,richie@gmail.com)
    insert into tableA(NAME,EMAIL) values(tapay,tapay@gmail.com)
    insert into tableA(NAME,EMAIL) values(rose,rose@gmail.com)
    Code to create tableB:
    Code:
    create table tableB(NAME varchar(500),STATE varchar(500))
    insert into tableB(NAME,STATE) values(ron,alabama)
    insert into tableB(NAME,STATE) values(crown,slovakia)
    insert into tableB(NAME,STATE) values(wang,mississipi)
    insert into tableB(NAME,STATE) values(don,vegas)
    insert into tableB(NAME,STATE) values(sean,detroit)
    Code to update tableA with values of column STATE in tableB:
    Code:
    update tableA set NAME = (SELECT * FROM (SELECT state FROM tableB ORDER BY dbms_random.value) WHERE ROWNUM<2);
    This query works fine. But, the time taken to update 5000 records (ie, replacing 5000 NAMES of tableA with 5000 STATES of tableB) is around 20 minutes
    For 100,000 records it takes more than one hour!!
    I need to speed up the process..as my tables can have upto 1000000 records..

    Pls suggestion a better solution.

    Thanks in advance..

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Maybe a cursor-based solution would be preferable in this case:
    Code:
    declare
       cursor c1 is select rowid as row_id from tableA;
       c1_row c1%rowtype;
       cursor c2 is select state from tableB order by dbms_random.value;
       c2_row c2%rowtype;
    begin
       loop
          fetch c1 into c1_row;
          exit when c1%notfound;
          fetch c2 into c2_row;
          exit when c2%notfound;
          update tableA set name = c2_row.state where rowid = c1_row.row_id;
       end loop;
    end;

  3. #3
    Join Date
    Jun 2008
    Posts
    6

    ORA-01001 invalid cursor

    Andrew,

    I tried using the code that you posted... I get this error 'ORA-01001 invalid cursor' at the line inside the loop

    Code:
    fetch c1 into c1_row;
    Do i need to modify this code...im not sure cause i have not used cursors before..

    Thanks
    Roy

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Sorry, I so rarely use explicit cursors myself I forgot to OPEN them!
    Code:
    declare
       cursor c1 is select rowid as row_id from tableA;
       c1_row c1%rowtype;
       cursor c2 is select state from tableB order by dbms_random.value;
       c2_row c2%rowtype;
    begin
       open c1;
       open c2;
       loop
          fetch c1 into c1_row;
          exit when c1%notfound;
          fetch c2 into c2_row;
          exit when c2%notfound;
          update tableA set name = c2_row.state where rowid = c1_row.row_id;
       end loop;
    end;

  5. #5
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Another potential issue might be if the column your updating is part of an index. Updates are then very slow as they have to update the table and the index and this generates a lot of redo. If this is the case you might be better off creating a new table i.e.

    create table x as
    select..
    from tableA, tableB
    ...

    create index on x

    rename tableA to y

    rename x to tableA

    You could make the table creation nologging but a backup should be taken afterwards.

    Alan

  6. #6
    Join Date
    Jun 2008
    Posts
    6
    Andrew,
    I read that using cursors affects performance due its overhead.

    Alan,
    In case of indexing Creating a new table could be an option..
    But my requirement is to update the current table...suppose there are many columns in the table then more space would be required for this operation.. ??

  7. #7
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Quote Originally Posted by tech_tree199
    I read that using cursors affects performance due its overhead.
    So that's that then? Why not try it and see the performance for yourself?

  8. #8
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Quote Originally Posted by tech_tree199
    Alan,
    In case of indexing Creating a new table could be an option..
    But my requirement is to update the current table...suppose there are many columns in the table then more space would be required for this operation.. ??
    Is the column you are updating part of the index? If it isnt dont bother. If it is then yes you will require extra space as you are in effect making a copy of tableA. NOTE this should be done in a downtime window as for a split second the table will be unavailable.

    Alan

  9. #9
    Join Date
    Jun 2008
    Posts
    6

    my real time scenario

    Quote Originally Posted by andrewst
    So that's that then? Why not try it and see the performance for yourself?
    Andrew
    I did try it by the way...only that i didn reply with the result
    Updating 100,000 records took around 1 min 30 secs....but having that update query inside the loop could be a problem for me...i ll explain the same scenario (the requirement that i first posted) in detail as below...

    This has to be done through a stored procedure and user defined function.

    The table name 'tableA' and its column name 'NAME' is passed to the stored procedure 'myProcedure', wherein i set the names of tableB and its column STATE and call the function 'myFunction' that develops a select query to pick random values from STATE of tableB and finally update the tableA with those values in the stored procedure.
    Pls find the code below.

    Code to create tableA:
    Code:
    create table tableA(NAME varchar(500),EMAIL varchar(500))
    insert into tableA(NAME,EMAIL) values(denis,denis@gmail.com)
    insert into tableA(NAME,EMAIL) values(richie,richie@gmail.com)
    insert into tableA(NAME,EMAIL) values(tapay,tapay@gmail.com)
    insert into tableA(NAME,EMAIL) values(rose,rose@gmail.com)
    Code to create tableB:
    Code:
    create table tableB(NAME varchar(500),STATE varchar(500))
    insert into tableB(NAME,STATE) values(ron,alabama)
    insert into tableB(NAME,STATE) values(crown,slovakia)
    insert into tableB(NAME,STATE) values(wang,mississipi)
    insert into tableB(NAME,STATE) values(don,vegas)
    insert into tableB(NAME,STATE) values(sean,detroit)
    Code of stored Procedure:
    Code:
    CREATE OR REPLACE PROCEDURE myProcedure(
    	   	  		  tablename IN VARCHAR2 ,
    				  columnname IN VARCHAR2 
    				  )
    IS
     	
    	param1 	       VARCHAR2(50);
    	param2 	       VARCHAR2(50);
    	param3 	       VARCHAR2(50);
    	updatequery    VARCHAR2(1000);
    	functionString varchar2(1000);
    BEGIN
    
    	updatequery := 'update '||tablename||' set ';
    	param1 := 'tableB';
    	param2 := 'state';
    	param3 := '';
    	functionString:= 'myFunction('''||param1||''','''||param2||''','''||param3||''')' ;
    	updatequery:=updatequery ||columnname|| '=' ||functionString;
    	Execute immediate updateQuery;
    END;
    Code to create function:
    Code:
    CREATE OR REPLACE FUNCTION myFunction(param1 IN VARCHAR2,param2 IN VARCHAR2,param3 IN VARCHAR2) RETURN varchar2
        AS
    	field1_var   varchar(100);
           	SQL_string VARCHAR2(1000);
    
    BEGIN
    	field1_var:='SELECT * FROM (SELECT '||param2||' FROM '||param1||' ORDER BY dbms_random.value) WHERE ROWNUM<2';
    	Execute immediate field1_var into sql_string; 
    	return sql_string;
    END;
    Code Calling the stored procedure from jdbc:
    Code:
    CallableStatement csmt = con.prepareCall("{call myProcedure(?,?)}");
    csmt.setString(1,'tableA');
    csmt.setString(2,'NAME');
    int result = csmt.executeUpdate();
    The reason why im constructing the select query in myFunction is...i need to call other functions(user defined) too and append the result of those functions to the update query in myProcedure (i haven't included this part in the presented code)....a typical case would be to do different operations on different columns of a table...with a single final update query.

    The update query is just one final query executed in the procedure...In such a case where should i be implementing the cursor...???

    Roy

  10. #10
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    That looks like a design that is sure to perform badly. How about a design that generates a PL/SQL block of code like mine based on some parameters:
    Code:
    declare
       cursor c1 is select rowid as row_id from xxxx;
       c1_row c1%rowtype;
       cursor c2 is select yyyy from zzzz order by dbms_random.value;
       c2_row c2%rowtype;
    begin
       loop
          fetch c1 into c1_row;
          exit when c1%notfound;
          fetch c2 into c2_row;
          exit when c2%notfound;
          update xxxx set nnnn = c2_row.yyyy where rowid = c1_row.row_id;
       end loop;
    end;
    i.e. something like:
    Code:
    function myfun 
       ( p_table1 varchar2
       , p_col1 varchar2
       , p_table2 varchar2
       , p_col2 varchar2
       ) return varchar2
    is
       l_sql long;
    begin
       l_sql := 'declare cursor c1 is select rowid as row_id from ' || p_table1 || ';';
       ... etc.
       return l_sql;
    end;
    Then:
    Code:
    declare
       l_sql long;
    begin
       l_sql := myfun ('tableA', 'name', 'tableB', 'state');
       execute immediate l_sql;
    end;
    /
    I realise that's probably a gross simplification of your needs, but maybe shows a direction to go in?

  11. #11
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Quote Originally Posted by andrewst
    That looks like a design that is sure to perform badly...
    What I meant by that is: you are writing a function (myFunction) that generates a select statement that does a full scan on a table or index containing 1 million rows, sorts them into a random order and returns 1 row. Then you are calling that function 1 million times! (You said both tables can have up to 1 million records). That is 1 trillion logical reads!

    Another issue is that the function is not guaranteed to return a different random value each time. If you shuffle a pack of cards and note the top card 52 times in a row, you are very unlikely indeed to get 52 different results. Of course, it may be that repeated values don't matter for your needs?

  12. #12
    Join Date
    Jun 2008
    Posts
    6
    Quote Originally Posted by andrewst
    What I meant by that is: you are writing a function (myFunction) that generates a select statement that does a full scan on a table or index containing 1 million rows, sorts them into a random order and returns 1 row. Then you are calling that function 1 million times! (You said both tables can have up to 1 million records). That is 1 trillion logical reads!

    Another issue is that the function is not guaranteed to return a different random value each time. If you shuffle a pack of cards and note the top card 52 times in a row, you are very unlikely indeed to get 52 different results. Of course, it may be that repeated values don't matter for your needs?
    Nice..now im getting a precise picture of the performance problem!....1 trillion..that is not wat i need!

    I should try out the other approach you mentioned..ie..doing the entire cursor..select...update stuff in the function itself...and put in some condition statements in the procedure...

    Regarding the issue of reptition....the repeated values are not a concern in this case..
    But i do have another situation where repetition is not allowed, in which i will insert the two values, ie. the assigned STATE values(column2 of tableB) assigned and the corresponding NAME values(column1 of tableA) into another table...I was thinking a where not exists condition would check repetition ....should that do good??

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •