#1
02-12-04, 02:55
reneeb
Registered User
Join Date: Jan 2004
Location: Germany
Posts: 167
split big files into 100 files

Hi!

I have a very large text file (~3.5 GB) that I want to split into 100 smaller files, each containing 1000 complete datasets. I have no idea how to do this.

The file contains several datasets like this:
Code:
ID dummy_id
AC dummy_ic
//
ID dummy2
AC ic2
//
Each dataset ends with a line containing //.


Thanks in advance,

reneeb
__________________
board.perl-community.de - The German Perl-Community
#2
02-12-04, 03:45
aigles
Registered User
Join Date: Jan 2004
Location: Bordeaux, France
Posts: 319
Try adapting this awk script:

Code:
#!/usr/bin/awk -f

NR == 1 {
   if (BASE  == "") BASE  = "output_";
   if (COUNT == "") COUNT = 4;
   UseNewOutFile = 1;
   printf "\nInput File ............. : %s\n", FILENAME;
   printf "Output filename(s) ..... : %s*\n",      BASE;
   printf "Datasets per output file : %d\n", COUNT;
}

UseNewOutFile {
   if (OutFile != "") close(OutFile);
   FileCount    += 1;
   DatasetCount  = 0;
   UseNewOutFile = 0;
   OutFile=sprintf("%s%03d.dat",BASE,FileCount);
}

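{
   # ">" truncates the file when it is first opened, so re-running
   # the script does not append to output left over from an earlier run.
   print $0 > OutFile;
}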

/^\/\// {
   DatasetCount += 1;
   if (DatasetCount == COUNT) UseNewOutFile = 1;
}

END {
   printf "Output file(s) created . : %d\n\n",   FileCount;
}
Create the script (mysplit, for example).
Make the script executable (chmod +x mysplit).
Execute the script:

mysplit [BASE=base] [COUNT=count] input_file
BASE = Output filename prefix (default: output_). Created files: ${BASE}nnn.dat
COUNT = Datasets per output file (default: 4)
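
If the goal is a fixed number of output files rather than a fixed number of datasets per file, COUNT can be derived from the total number of datasets. This is only a minimal sketch: it assumes every dataset ends with a line starting with //, as in the question, and the sample file and the variable names (input, files, total, per_file) are made up for illustration.

```shell
#!/bin/sh
# Build a small sample file with 250 datasets (stand-in for the real input).
input=$(mktemp) || exit 1
awk 'BEGIN { for (i = 1; i <= 250; i++) printf "ID dummy%d\nAC ic%d\n//\n", i, i }' > "$input"

files=4                                       # desired number of output files (hypothetical)
total=$(grep -c '^//' "$input")               # one terminator line per dataset
per_file=$(( (total + files - 1) / files ))   # integer division, rounded up

echo "datasets: $total, COUNT to use: $per_file"
rm -f "$input"
```

Rounding up guarantees at most the requested number of files. The last file may hold fewer datasets, and for some totals fewer files than requested are created.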

Code:
home/jp> mysplit BASE=datasets_ COUNT=1000 datas.txt

Input File ............. : datas.txt
Output filename(s) ..... : datasets_*
Datasets per output file : 1000
Output file(s) created . : 100
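
To check a split like the one above, count the // terminator lines in each output file and compare the sum with the input. The sketch below is self-contained, so it builds a five-dataset sample and splits it with a condensed stand-in for the splitter (not the script above; the names base, count, need_new and seen are invented here). It assumes mktemp -d is available, which is common but not required by POSIX.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sample input: 5 datasets (stand-in for the real file).
awk 'BEGIN { for (i = 1; i <= 5; i++) printf "ID dummy%d\nAC ic%d\n//\n", i, i }' > datas.txt

# Condensed stand-in for the splitter: 2 datasets per output file.
awk -v base=datasets_ -v count=2 '
    BEGIN    { out = sprintf("%s%03d.dat", base, ++n) }
    need_new { close(out); out = sprintf("%s%03d.dat", base, ++n); need_new = 0; seen = 0 }
             { print > out }
    /^\/\//  { if (++seen == count) need_new = 1 }
' datas.txt

# Terminator lines per output file, printed as "name:count".
report=$(grep -c '^//' datasets_*.dat)
echo "$report"

# The per-file counts should add up to the count in the input.
in_total=$(grep -c '^//' datas.txt)
out_total=$(cat datasets_*.dat | grep -c '^//')
echo "input: $in_total, output: $out_total"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Every file except possibly the last should report exactly COUNT datasets, and the two totals should match. On the real data, the same two grep -c commands can be pointed at datas.txt and datasets_*.dat.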
__________________
Jean-Pierre.
#3
02-12-04, 04:02
reneeb
Registered User
Join Date: Jan 2004
Location: Germany
Posts: 167
Thanks a lot. It works fine.