#1 neppolo, 05-03-11, 11:47
random substitution with awk

Dear all,
I am a new user of the forum and of awk.
I have the same problem:
I have a file in the following format:

Si -7.87760000 6.16710000 16.80090000
Si -2.20190000 8.96480000 18.80290000
Si -3.91340000 6.18110000 16.79500000
Si -5.89320000 3.44010000 18.81540000
Si -5.89980000 7.45760000 17.18100000
H -7.84830000 0.67980000 16.84620000
H -7.86760000 4.81220000 18.78920000
Si 2.20250000 8.96490000 18.80280000
Si -0.00010000 6.26880000 16.82620000
Si -1.94640000 3.46010000 18.83720000
H -1.99140000 7.62480000 16.81710000
Si -3.91150000 0.68930000 16.84840000
Si -3.92500000 4.81880000 18.78390000
Si -5.88640000 -2.06440000 18.85090000
H -5.88640000 2.06580000 16.83280000
Si -7.86620000 -4.81050000 16.89460000
Si -7.84800000 -0.67730000 18.83550000
Si 3.91310000 6.18150000 16.79530000
Si 1.94640000 3.46000000 18.83770000
Si 1.99110000 7.62510000 16.81700000

I would like to randomly replace the symbol "Si" with the symbol "Ge" in the first column. Lines that contain the symbol "H" must be left unchanged. I started with this script, but it doesn't work:

#!/usr/bin/awk -f
#
# Usage:
# ./impurity_gen.awk -v NIMP=12 -v SYMB=Ge
#

BEGIN{
    natom=16;
    ### this is the total number of lines containing the symbol "Si"
    nimp=NIMP;
    ### this the number of lines I would like to substitute
    symb=SYMB;
    ### this is the symbol with whom I'd like to substitute Si
    srand()
    for (j = 1; j <= nimp; ++j) {
        # loop to find a not-yet-seen selection
        do {
            select = 1 + int(rand() * natom)
        } while (select in pick)
        pick[j] = select
    }
}

NF != 4 { next }

which_Si = 0
symb_tmp = $1

if ( /Si/ ) {
    which_Si += 1
    do {
        symb_tmp=symb
    } while ( which_Si in pick )

    x=$2; y=$3; z=$4;
    printf "%5s %15.9f %15.9f %15.9f \n", symb, x, y, z
}

Could you please suggest a solution to this problem?
Thank you very much in advance.

#2 mike_bike_kite, 05-03-11, 12:21
The following pseudo code should work and easily translate into awk:
Code:
if line matches Si
    if random number < .5
        replace Si with Ge
__________________
Mike

#3 neppolo, 05-03-11, 12:23
Many thanks for the answer, but could you give me more details? I am not sure I can translate this pseudo code into awk.
Many thanks!

#4 mike_bike_kite, 05-03-11, 14:58
Why are you doing this in awk if you don't program in awk?
Can't you just use another language?
And what does the data actually represent?
__________________
Mike

#5 neppolo, 05-03-11, 16:03
I want to use awk because I know that it is very useful for manipulating text files.
I am a beginner, which is why I posted this thread: I would like to learn something and I am looking for help. The text file is an example of a molecular geometry in the xyz format (I am a PhD student in computational materials science).

#6 kitaman, 05-03-11, 16:27
Using ksh,

Code:
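#!/bin/ksh
# Replace each "Si" with "Ge" with probability 1/2; other lines pass through.
while read symb x y z
do
  if [ "$symb" = "Si" ]
  then
    # RANDOM is a ksh built-in variable; the remainder is 0 or 1
    i=$((RANDOM % 2))
    if [ "$i" -eq 1 ]
    then
      symb=Ge
    fi
  fi
  echo "$symb $x $y $z"
done <input.txt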

#7 mike_bike_kite, 05-03-11, 16:34
Something like this would do:
Code:
/^Si/   {
            # keep Si or swap it for Ge with equal probability
            if ( rand() < .5 ) {
                print $0;
            } else {
                print "Ge", $2, $3, $4;
            }
        }
/^H/    { print $0; }
Sorry kitaman, I didn't see your reply.
__________________
Mike

#8 neppolo, 05-03-11, 19:27
Many thanks for your help and consideration. Unfortunately my problem is not solved yet. Let me try to explain what I would like to do.
For example: my input file contains 20 lines with Si and I would like to change 6 of them to Ge, but in six different ways. In other words, for the same number of changes I need six different configurations.
If I run these scripts, I obtain a different random number every time, and therefore the number of changed lines is different every time.
Moreover, the rand() function gives the same random numbers if I run the script twice.
Do you think it is possible to do this in some way?
Thank you very much again.

#9 kitaman, 05-03-11, 20:13
Look up the srand() function in awk to get a new sequence of random numbers.
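
To get exactly NIMP substitutions, and a different but reproducible configuration per run, one option is to choose the Si line indices up front and pass an explicit seed to srand(). The following is only a minimal sketch under stated assumptions: the wrapper name substitute_random, its argument order, and the awk variable names (nimp, symb, seed, pick) are hypothetical and do not come from the posts above. It reads the file twice, first to count the Si lines and then to rewrite the selected ones. The selected index is stored as the array key (pick[sel]) because awk's "in" operator tests keys, not values.

```sh
#!/bin/sh
# Hypothetical helper (illustration only): replace exactly NIMP randomly
# chosen "Si" lines with SYMB; all other lines are printed unchanged.
# Usage: substitute_random NIMP SYMB SEED FILE
substitute_random() {
    awk -v nimp="$1" -v symb="$2" -v seed="$3" '
    # First pass over the file: count the Si lines.
    NR == FNR { if ($1 == "Si") natom++; next }

    # On the first record of the second pass: choose nimp distinct indices.
    !chosen {
        chosen = 1
        srand(seed)                       # explicit seed: reproducible run
        if (nimp > natom) nimp = natom    # cannot pick more than exist
        for (j = 1; j <= nimp; j++) {
            do {
                sel = 1 + int(rand() * natom)
            } while (sel in pick)         # retry until the index is new
            pick[sel] = 1                 # store the index as the KEY
        }
    }

    # Second pass: number the Si lines and replace the picked ones.
    $1 == "Si" { which_si++; if (which_si in pick) $1 = symb }
    { print }
    ' "$4" "$4"
}
```

For six configurations with six substitutions each, the function could be called with six different seeds, for example (the output file names are made up):

for s in 1 2 3 4 5 6; do substitute_random 6 Ge "$s" input.txt > "config_$s.xyz"; done

Different seeds normally select different lines, although two seeds can in principle produce the same selection, so it is worth comparing the outputs.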