  1. #1
    Join Date
    Jul 2010
    Posts
    21

    Unanswered: python-sqlite to postgres

    Hi,

    I currently want to fetch data from SQLite and insert it into Postgres. I know there is a feature like COPY, but the situation is that the SQLite databases capture data every day, and I have to fetch only the new data and insert it into Postgres. I wrote something like this:

    Code:
    import glob
    import os
    import sqlite3

    # gpconn / gpcurs: connection and cursor to the Postgres database, opened elsewhere

    logs = [(20101001,), (20101002,), (20101003,), (20101004,)]

    for i in logs:
        for name in glob.glob('/mnt/log_%s.sq3.*' % i[0]):
            if os.path.isfile(name):
                try:
                    conn = sqlite3.connect(name)
                    curs = conn.cursor()
                    curs.execute("SELECT * FROM log")
                    rows = curs.fetchall()
                    conn.close()

                    insertDB = "INSERT INTO user VALUES (%s,%s)"
                    cnt = 0
                    for row in rows:
                        cnt = cnt + 1
                        gpcurs.execute(insertDB, row)

                        # commit every 1000 rows
                        if cnt == 1000:
                            gpconn.commit()
                            cnt = 0

                    # commit whatever is left from the last batch
                    if cnt:
                        gpconn.commit()
                except sqlite3.Error:
                    continue    # skip files that cannot be read
    There are actually thousands of SQLite files, approximately 10-20 million rows in total, and it takes more than 6 hours to insert the data into Postgres. Is there any way I can optimise this and load the data faster?

    Thanks in advance.

  2. #2
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    I don't know Python, but can you use batching there?
    In JDBC I can send multiple statements in a single request, which improves performance drastically.

    You could also try to send more than one row with the INSERT statement, but that will require some rework of your code:
    Code:
    INSERT INTO user VALUES (1,1), (2,2), (3,3);
    will insert three rows.
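
    If your Postgres driver is psycopg2 (just a guess on my part, you did not say which one you use), one way to build such a multi-row INSERT from Python is to let the cursor render each row with mogrify() and glue the pieces together. A rough, untested sketch (the two %s placeholders are taken from your code):
    Code:
    # Sketch, assuming psycopg2: one multi-row INSERT per batch of rows.
    # For very large row sets, do this per batch of a few thousand rows.
    values = ",".join(gpcurs.mogrify("(%s,%s)", row).decode() for row in rows)
    gpcurs.execute("INSERT INTO user VALUES " + values)
    gpconn.commit()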

    Also try to commit less often, maybe even do everything in a single transaction (Postgres - similar to Oracle - is faster if you commit less frequently).

    And finally you could write the modified data to a text file and then use COPY to insert the data into Postgres (this assumes you can run everything on the Postgres server).
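
    If writing the file on the server is not an option, psycopg2 (again assuming that is your driver) can also stream the data through COPY ... FROM STDIN with copy_from(). A rough, untested sketch:
    Code:
    import csv
    import io

    # Sketch, assuming psycopg2: push the fetched rows through COPY FROM STDIN.
    # Assumes the values contain no tabs, newlines or NULLs.
    buf = io.StringIO()
    csv.writer(buf, delimiter='\t', lineterminator='\n').writerows(rows)
    buf.seek(0)
    gpcurs.copy_from(buf, 'user', sep='\t')  # table name as in your INSERT; quote it if it really is "user"
    gpconn.commit()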

    Then of course you can tune your PostgreSQL configuration. You might want to increase the following parameters:

    checkpoint_segments
    commit_delay
    synchronous_commit
    wal_buffers
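
    Most of these are changed in postgresql.conf, but synchronous_commit can also be switched off just for the loading session, e.g. from the Python side (sketch, assuming psycopg2):
    Code:
    # Sketch: relax synchronous_commit for this session only.
    # The other parameters listed above live in postgresql.conf.
    gpcurs.execute("SET synchronous_commit TO off")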

  3. #3
    Join Date
    Jul 2010
    Posts
    21
    Hi Shammat:

    Code:
    .
    .
    .
    rows=curs.fetchall()
    insertDB="""INSERT INTO user VALUES (%s,%s)"""
    gpcurs.executemany(insertDB, rows)
    gpconn.commit()
    .
    .
    This code inserts multiple rows in a single statement. Let's say 'rows' has 200k rows; is it okay to commit once?

    Thanks

  4. #4
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by nbtet View Post
    This code inserts multiple rows in a single statement.
    That's what I meant

    Let's say 'rows' has 200k rows; is it okay to commit once?
    From a Postgres point of view - yes.
    The only problem you might get is memory on the computer where you run this program. If Python buffers all the rows in memory, that might be a problem...
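
    If memory does become an issue, one rough way around it (untested, assuming psycopg2 on the Postgres side and the standard sqlite3 module on the SQLite side) is to pull the rows in chunks with fetchmany() instead of fetchall():
    Code:
    # Sketch: stream the SQLite rows in chunks instead of buffering them all.
    curs.execute("SELECT * FROM log")
    while True:
        chunk = curs.fetchmany(10000)   # 10000 is an arbitrary chunk size
        if not chunk:
            break
        gpcurs.executemany("INSERT INTO user VALUES (%s,%s)", chunk)
    gpconn.commit()                     # still a single commit at the end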
