If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > performance problems - need help urgently

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-04-04, 14:57
fadingjava fadingjava is offline
Registered User
 
Join Date: Mar 2004
Posts: 5
performance problems - need help urgently

hi ,

this is driving me nuts!!!!!!. i am running a perl script which takes records from a database, finds unique single words and two, three & four word collocations in them. then for each unique combination found it searches in the Mysql table , if the combination is found an INSERT is made into a frequency table , otherwise if the query returns empty set then the combination is added to the unique words table as well as the frequency table. all of the tables have the combination field declared as Varchar(x) BINARY.it is taking huge amounts of time to do this ? why? imagine this, it took me 8 hours to process 532 single line documents. something is grossly wrong here. can anybody help me with this? should i use INDEXES ? how?
Reply With Quote
  #2 (permalink)  
Old 03-05-04, 03:57
ika ika is offline
Registered User
 
Join Date: Oct 2003
Location: Slovakia
Posts: 482
Re: performance problems - need help urgently

Quote:
Originally posted by fadingjava
hi ,

this is driving me nuts!!!!!!. i am running a perl script which takes records from a database, finds unique single words and two, three & four word collocations in them. then for each unique combination found it searches in the Mysql table , if the combination is found an INSERT is made into a frequency table , otherwise if the query returns empty set then the combination is added to the unique words table as well as the frequency table. all of the tables have the combination field declared as Varchar(x) BINARY.it is taking huge amounts of time to do this ? why? imagine this, it took me 8 hours to process 532 single line documents. something is grossly wrong here. can anybody help me with this? should i use INDEXES ? how?
Speed of queries depends on design of database.
Learn how to create optimal database model.
Learn when and where you can use indexes.
Reply With Quote
  #3 (permalink)  
Old 03-05-04, 10:30
King_Aero King_Aero is offline
Registered User
 
Join Date: Feb 2004
Posts: 6
Try outside of perl as well.
Reply With Quote
  #4 (permalink)  
Old 03-05-04, 22:59
bstjean bstjean is offline
Registered User
 
Join Date: Sep 2002
Location: Montreal, Canada
Posts: 219
Re: performance problems - need help urgently

Quote:
Originally posted by fadingjava
hi ,

this is driving me nuts!!!!!!. i am running a perl script which takes records from a database, finds unique single words and two, three & four word collocations in them. then for each unique combination found it searches in the Mysql table , if the combination is found an INSERT is made into a frequency table , otherwise if the query returns empty set then the combination is added to the unique words table as well as the frequency table. all of the tables have the combination field declared as Varchar(x) BINARY.it is taking huge amounts of time to do this ? why? imagine this, it took me 8 hours to process 532 single line documents. something is grossly wrong here. can anybody help me with this? should i use INDEXES ? how?
Can you provide us with the SHOW CREATE TABLE, SHOW INDEX and EXPLAIN of the queries involved as well as an example of what you're trying to do?
Reply With Quote
  #5 (permalink)  
Old 03-06-04, 01:38
fadingjava fadingjava is offline
Registered User
 
Join Date: Mar 2004
Posts: 5
Re: performance problems - need help urgently

Quote:
Originally posted by bstjean
Can you provide us with the SHOW CREATE TABLE, SHOW INDEX and EXPLAIN of the queries involved as well as an example of what you're trying to do?
i do not have any indexes on the table right now. but here is the show create table

Code:
mysql> show create table single_words;

| single_words | CREATE TABLE `single_words` (
  `word_id` int(10) unsigned NOT NULL auto_increment,
  `word` varchar(30) binary default NULL,
  `in_stoplist` enum('y','n') default 'n',
  PRIMARY KEY  (`word_id`)
) TYPE=MyISAM |
1 row in set (0.00 sec)

mysql> show create table single_frequency;

| single_frequency | CREATE TABLE `single_frequency` (
  `word` varchar(30) binary default NULL,
  `frequency` int(5) unsigned default NULL,
  `word_id` int(10) unsigned NOT NULL default '0',
  `doc_id` int(10) unsigned NOT NULL default '0'
) TYPE=MyISAM |

1 row in set (0.00 sec)

mysql> EXPLAIN select doc from doc where doc_id=5;
+-------+-------+---------------+---------+---------+-------+------+-------+
| table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+-------+-------+---------------+---------+---------+-------+------+-------+
| doc   | const | PRIMARY       | PRIMARY |       4 | const |    1 |       |
+-------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql> EXPLAIN select word_id, word from single_words where word = 'I';
+--------------+------+---------------+------+---------+------+------+-------------+
| table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+--------------+------+---------------+------+---------+------+------+-------------+
| single_words | ALL  | NULL          | NULL |    NULL | NULL | 1431 | Using where |
+--------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.01 sec)
there are two categories of tables the "words" tables are repositories of unique word combinations found in a corpus, while the "frequency" tables contain the number of occurences of a particular combination in the indivdual documents. Each time a new document is processed , all the unique combinations are extracted and counted . if the combination exists in the "words" tables an INSERT is made into the "frequency" table. on the contrary if a combination is not found an INSERT is done into the "words" as well as "frequency" tables. the queries are simple INSERT and SELECT with WHERE option.

an example would be processing a doc containning text say:

"I need urgent help"

that would yield "I","need","urgent","help" as unique single words in the document. i need to find if these already exist in the repositories. the same is done with two word, three word and four word collocations which have an exact table structure.

hope this my problem is clear enough now.

thanks in adavance
Reply With Quote
  #6 (permalink)  
Old 03-06-04, 08:52
bstjean bstjean is offline
Registered User
 
Join Date: Sep 2002
Location: Montreal, Canada
Posts: 219
Re: performance problems - need help urgently

Quote:
Originally posted by fadingjava
i do not have any indexes on the table right now. but here is the show create table

Code:
mysql> EXPLAIN select word_id, word from single_words where word = 'I';
+--------------+------+---------------+------+---------+------+------+-------------+
| table        | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+--------------+------+---------------+------+---------+------+------+-------------+
| single_words | ALL  | NULL          | NULL |    NULL | NULL | 1431 | Using where |
+--------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.01 sec)
Well, from what I see here an index on word would help this query... Right now it performs a full table scan... Probably a prefix index would suffice... Probably no need to index the full column... Something like the following would help:

CREATE INDEX word_idx ON single_words (word(7));

But what you do with your data is still unclear... Could you post a full simple example of data and all queries you execute to process the data?

Hope this helps.
Reply With Quote
  #7 (permalink)  
Old 03-06-04, 11:08
fadingjava fadingjava is offline
Registered User
 
Join Date: Mar 2004
Posts: 5
Code:
	
#connect to the databases	
	my $dbh1 = DBI->connect("dbi:mysql:corpus_stats","sid" , $password);
	my $dbh = DBI->connect("dbi:mysql:concordance","sid" , $password);
	
#select max,min record id from document table for using as loop counter 
	my @id  = $dbh1->selectall_arrayref("select min(rec_id),max(rec_id) from scotts_data");
	
	batch : for ($k= ($id[0][0][0]);$k<= ($id[0][0][1]);$k++) {	
#select the particular records corresponding to loop counter
	
my @string = $dbh1->selectall_arrayref("select rec_id,data from scotts_data where rec_id = $k");
	
	$doc = "$string[0][0][1]";

#split the doc in indivdual words and pass through regex, build required collocations and find unique combinations	
	
@single = split(' ',$doc);

	foreach (@single)
	{
		$_=~ s/[!^~#&*_+=|?,'.;:]//g;
	}

	for ($i = 0; $i <= ($#single-1); $i++)
	{
		push(@double,$single[$i]." ".$single[$i+1]);
	}   

	for ($i = 0; $i <= ($#single-2); $i++)
	{
		push(@triple,$single[$i]." ".$single[$i+1]." ".$single[$i+2]);
	}   
	
	for ($i = 0; $i <= ($#single-3); $i++)
	{
		push(@quad,$single[$i]." ".$single[$i+1]." ".$single[$i+2]." ".$single[$i+3]);
	}   

	@single_uniq = grep { ! $seen{$_} ++ } @single;
	@double_uniq = grep { ! $seendouble{$_} ++ } @double;
	@triple_uniq = grep { ! $seentriple{$_} ++ } @triple;
	@quad_uniq = grep { ! $seenquad{$_} ++ } @quad;
	


# for each unique element element select word,word_id from words table
	
single:for ($j=0;$j<=$#single_uniq;$j++){
	
		$frequency = ($seen{$single_uniq[$j]});
		my @row = $dbh->selectall_arrayref("select word,word_id from single_words where word = '$single_uniq[$j]'");	
		
#if query returns NULL then insert value of unique element and its frequency in words and frequency tables

		if($row[0][0][0] eq "")
			{
			$dbh->do ("insert into single_words values('','$single_uniq[$j]','n')");
			my $max1 = $dbh->selectall_arrayref("select max(word_id) from single_words");
			$dbh->do ("insert into single_frequency values('$single_uniq[$j]',$frequency,'$max1->[0][0]',$string[0][0][0])"); 			
			}			
#if the a value is returned then insert value of unique element and its frequency in only the frequency table
		else
			{
			$dbh->do ("insert into single_frequency values('$row[0][0][0]',$frequency,'$row[0][0][1]',$string[0][0][0])");
			}
		
undef @row;		

	}


here is the part of perl script which deals with single word combinations. it is basically a part of a concordance application used for corpus analysis.
Reply With Quote
  #8 (permalink)  
Old 03-06-04, 12:57
fadingjava fadingjava is offline
Registered User
 
Join Date: Mar 2004
Posts: 5
hi,

bstjean , in one of the threads about a similar questions you suggested:

Quote:
simply put a UNIQUE INDEX on the field that represents the records unique value and just perform an INSERT...the record would simply would not update.......seems to save time
when i do this , it definitely raises an error about duplication of keys. would this way be good to use when i am inserting large volumes of data with possiblity that there might be duplication of key? how can i just ignore this error?
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On