If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > Unix Shell Scripts > Shell script that parses a CSV file, builds sql insert calls, then executes them

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 04-18-06, 15:24
JamesAvery22 JamesAvery22 is offline
Registered User
 
Join Date: Jan 2004
Posts: 70
Shell script that parses a CSV file, builds sql insert calls, then executes them

I am given a CSV file that looks like this

Code:
Key	col1	col2
---------------------
1	asdf	hgsd
2	fasd	gsdh
3	fdsa	dsfg
4	rtw	sdgf
I want to take that data and put each row into an array, say row[@]

Code:
i = 0
cat myfile.csv | while read fileline
do
row[i] = $fileline
i = i + 1
done
I have to join that data with the Key and get another column

So for each I have to do

select col3 from InputTable where KeyCol = Key;

then for each parse row[@] into a col[@]

Code:
insert into OutputTable 
(
	col1,
	col2,
	col3
)
values
(
	col[1],
	col[2],
	col[3]
)
So something like this:

Code:
n = split(row[0], col, ",")

db2 +p -t <<EOF >>  $LOG_FILE

connect to $DB_INSERT user $user using $pw;

values current timestamp;

SET SCHEMA $DB_INSERT_SCHEMA;

insert into $OutputTable 
(
	col1,
	col2,
	col3
)
values
(
	$col[1],
	$col[2],
	select col3 from $InputTable where KeyCol = $col[0]
)
and do that for all row[@]


I just dont understand how to loop through it all. Are there any examples anyone could point me to?

Last edited by JamesAvery22; 04-18-06 at 15:30.
Reply With Quote
  #2 (permalink)  
Old 04-18-06, 16:50
JamesAvery22 JamesAvery22 is offline
Registered User
 
Join Date: Jan 2004
Posts: 70
maybe something like this?

Code:
rowcount = 0
cat $CSV_DIR/$1 | while read fileline
do
rowarray[i] = $fileline
rowcount = $rowcount + 1
done

i = 0
query = ""
while [ "$i" -le "$rowcount" ]
do
	n = split(${rowarray[$i]}, colarray, ",")
	if [ "$n" -ge "2" ]
		then
		query = $query + "\n" + 
		"insert into OutputTable 
		(
			SeqKey,
			ID,
			Col1,
			Col2
		) values ( 
			nextval for " + $DB_INSERT_SCHEMA + ".OutputTable,
			select ID from InputTable where Key = " + ${colarray[0]} + "," + 
			${colarray[1]} + "," + 
			${colarray[2]} + 
		");"
	fi
	i = $i + 1
done

db2 +p -t <<EOF >>  $LOG_FILE

connect to $DB_INSERT user $user using $pw;

values current timestamp;

SET SCHEMA $DB_INSERT_SCHEMA;

$query

commit
;	
values current timestamp;

terminate;
Reply With Quote
  #3 (permalink)  
Old 04-19-06, 02:45
pdreyer pdreyer is offline
Registered User
 
Join Date: May 2005
Location: South Africa
Posts: 1,268
I don’t use db2 and the split command you use is unfamiliar to me. I only know the split command to split a file into pieces

The csv file sample you show does not seem coma delimited so you can read the data directly into variables
i.e. while read kecol col1 col2 junk
yet the split command you use n = split(row[0], col, ",") suggest it is, assuming I interpret it correct


I will not use an array. I’ll build the insert statements into a file then take the file as input. The code below is untested.

Code:
#!/usr/bin/ksh
#…seting your variables…
cat <EOF >insert.sql
connect to $DB_INSERT user $user using $pw;
values current timestamp;
SET SCHEMA $DB_INSERT_SCHEMA;
EOF
cat myfile.csv | while read fileline  
do
  keycol=$(echo $fileline | cut –d’,’ –f1)
  col1=$(echo $fileline | cut –d’,’ –f2)
  col2=$(echo $fileline | cut –d’,’ –f3)
cat <<EOF >>insert.sql
insert into $OutputTable 
(
	col1,
	col2,
	col3
)
values
(
	$col1,
	$col2,
	select col3 from $InputTable where KeyCol = $keycol
);
EOF
Done
db2 +p -t <insert.sql >$LOG_FILE
Reply With Quote
  #4 (permalink)  
Old 04-19-06, 09:58
JamesAvery22 JamesAvery22 is offline
Registered User
 
Join Date: Jan 2004
Posts: 70
I was wondering about the split command. I read about it somewhere but then couldnt find the page again :shrug: I was just reading on how to do the cut command.

Sorry the example CSV file was just a dummy. The real CSV file is comma seperated.

Will the way I guessed at, by trying to execute a bunch of insert statements that are in a string variable not work?
Reply With Quote
  #5 (permalink)  
Old 04-19-06, 12:41
JamesAvery22 JamesAvery22 is offline
Registered User
 
Join Date: Jan 2004
Posts: 70
I tested out some of that code.

Did this

Code:
cat >insert.sql << "EOF"
connect to $DB_INSERT user $user using $pw;
values current timestamp;
SET SCHEMA $DB_INSERT_SCHEMA;
EOF


cat $CSV_DIR/$1 | while read fileline  
do
	colarray[0]=$(echo $fileline | cut -d',' -f1)
	colarray[1]=$(echo $fileline | cut -d',' -f2)
	colarray[2]=$(echo $fileline | cut -d',' -f3)
cat >>insert.sql << "EOF"
insert into OutputTable
(col1,col2,col3,col4) 
values 
(nextval for $DB_INSERT_SCHEMA.VAE_MF_SUBLDGR_SEQ , select col2 from InputTable where Key = ${colarray[0]} , ${colarray[1]} , ${colarray[2]} );
EOF

done
It works but the problem is "${colarray[1]}" is written into insert.sql instead of its value. How can I get the value?


edit------------------

need to remove the quotes around EOF...

http://www.webservertalk.com/archive...-3-952249.html

Looks like I should also use printf if I want it to be faster.

Last edited by JamesAvery22; 04-19-06 at 14:44.
Reply With Quote
  #6 (permalink)  
Old 04-19-06, 15:40
JamesAvery22 JamesAvery22 is offline
Registered User
 
Join Date: Jan 2004
Posts: 70
my tested code:

Code:
printf "%s\n%s\n%s\n" "connect to $DB_INSERT user $user using $pw;" "values current timestamp;" "SET SCHEMA $DB_INSERT_SCHEMA;" >insert.sql

cat $CSV_DIR/$1 | while read fileline  
do
	colarray[0]=$(echo $fileline | cut -d',' -f1)
	colarray[1]=$(echo $fileline | cut -d',' -f2)
	printf "%s\n" "insert into OutputTable (col1,col2,col3) values (nextval for $DB_INSERT_SCHEMA.OUTPUTTABLE_SEQ , (select ID from InputTable where Key = '${colarray[0]}') ,'${colarray[1]}' );" >>insert.sql
done
Thanks for everyones help
Reply With Quote
  #7 (permalink)  
Old 05-02-06, 09:53
JamesAvery22 JamesAvery22 is offline
Registered User
 
Join Date: Jan 2004
Posts: 70
I've been using this script and it works fine with an input file that has <200 lines. Takes about 20 seconds to create the insert.sql. Im trying it on a file with 13k+ lines and it is taking forever. Creating the insert.sql is taking longer than executing it. Is there anyway to do this faster?
Reply With Quote
  #8 (permalink)  
Old 05-03-06, 07:12
pdreyer pdreyer is offline
Registered User
 
Join Date: May 2005
Location: South Africa
Posts: 1,268
Try using sed
Code:
sed "s/\(.*\),\(.*\),\(.*\)/insert into OutputTable (col1,col2,col3) values (nextval for $DB_INSERT_SCHEMA.OUTPUTTABLE_SEQ , (select ID from InputTable where Key = \1) ,'\2' );/" $CSV_DIR/$1  >insert.sql
Reply With Quote
  #9 (permalink)  
Old 03-15-12, 03:43
rameshds rameshds is offline
Registered User
 
Join Date: Nov 2011
Posts: 4
sed to parses a csv file

Suppose we have more than 9 columns in the csv file, then how to creat the insert stmt
As after 9th column, for the 10th column it is picking the 1st columns values and concates the 0 at the end, similarly for 11th column 1st columns values and concates the 1 at the end.

Please suggest what did's to be done to solve these problem.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On