Results 1 to 4 of 4
  1. #1
    Join Date
    Jan 2010
    Posts
    26

    Unanswered: Advanced algorithm of an existing MySQL query

    Hi all,

    I have a table consists of
    Code:
    CREATE TABLE /*!32312 IF NOT EXISTS*/ `bilgiler` (
      `kayitID` int(11) NOT NULL AUTO_INCREMENT,
      `hak_sahibi_id` bigint(15) DEFAULT NULL,
      `cinsiyet` tinyint(1) DEFAULT NULL,
      `dogumtarihi` date DEFAULT NULL,
      `islemtarihi` date DEFAULT NULL,
      `tanikodu` varchar(10) DEFAULT NULL,
      `tedavi_ili` tinyint(3) DEFAULT NULL,
      `dogum_ili` tinyint(3) DEFAULT NULL,
      `ikamet_ili` tinyint(3) DEFAULT NULL,
      PRIMARY KEY (`kayitID`)
    ) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin5;
    and my SQL query is
    Code:
    Select bilgiler.hak_sahibi_id,bilgiler.cinsiyet,date_format(bilgiler.dogumtarihi,'%d.%m.%Y') as dtarihi,date_format(bilgiler.islemtarihi, '%d.%m.%Y') as itarihi,(To_days( bilgiler.islemtarihi ) - TO_DAYS( bilgiler.dogumtarihi )) as difference,bilgiler.tanikodu,bilgiler.dogum_ili,bilgiler.ikamet_ili,bilgiler.tedavi_ili,b.repeat from bilgiler INNER JOIN ( SELECT hak_sahibi_id,year(islemtarihi) AS yyyy,COUNT(*) AS 'repeat' from bilgiler group by hak_sahibi_id,year(islemtarihi) ) as b on b.hak_sahibi_id = bilgiler.hak_sahibi_id and b.yyyy = year(bilgiler.islemtarihi) into outfile 'C:/outcome/results.txt' FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n'
    As you may understand this query writes the existing fields, calculates the difference of two dates as day and writes as "difference", and also writes how many times a client (hak_sahibi_id) has a record in the year section of field "islemtarihi" as "repeat" and exports the results to text file.

    It gives the results perfectly as I want. My problem is the file size is too big. There are 10 million records on this table and code runs very slow. It takes nearly a month with the query.

    Is it possible to write a better code makes the same job faster? Is it possible to make changes on MySQL settings to make this query run faster (currently it is using the default settings of MySQL)?

    Thanks
    telmessos

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by telmessos View Post
    It takes nearly a month with the query.
    a month??!!

    whoa!!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    I would start by making sure that MySQL has access to enough memory to run through 10 million records. Ideally a machine with 4GB of memory if it is a 32 bit operating system. This is something I would not leave to a laptop, or a desktop machine. No idea where MySQL hides its memory configuration, though.

    What sort of machine is this running on?

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by telmessos
    It takes nearly a month with the query.


    Not formatting the code makes it difficult to read but using Turkish/Greek (?) field names and table names is really asking a bit much from us geeks but I'd start by taking the data from :
    Code:
    SELECT hak_sahibi_id,year(islemtarihi) AS yyyy, COUNT(*) AS 'repeat' 
    from   bilgiler 
    group by hak_sahibi_id, year(islemtarihi)
    into a separate table first - lets call it fhgjek. Then add an index to this table on hak_sahibi_id and yyyy. Then add a new field to your original table called yyyy and set the value using something like this:
    Code:
    update bilgiler  set yyyy = year( islemtarihi )
    Then add an index to this table on the fields hak_sahibi_id and yyyy. Then do your query using this new table ie:
    Code:
    Select b.hak_sahibi_id,
              b.cinsiyet,
              date_format(b.dogumtarihi,'%d.%m.%Y') as tarihi,
              date_format(b.islemtarihi, '%d.%m.%Y') as itarihi,
             (To_days( b.islemtarihi ) - TO_DAYS( b.dogumtarihi )) as difference,
             b.tanikodu,
             b.dogum_ili,
             b.ikamet_ili,
             b.tedavi_ili, 
             b.repeat 
    from bilgiler b , fhgjek c
    where b.hak_sahibi_id = c.hak_sahibi_id
              and b.yyyy = c.yyyy
    into outfile 'C:/outcome/results.txt' 
    FIELDS TERMINATED BY '\t' 
    LINES TERMINATED BY '\r\n'
    You could copy the big table into a new table and do all your work here just to make sure you don't damage the original data. Done this way I doubt if the run time will be more than a couple of minutes (if that) - but I can't test it.

    Quote Originally Posted by telmessos
    My problem is the file size is too big.
    Is it the output file that's too big? If so then look up the limit command in MySQL - this should allow you to split the output into multiple files ie something like:
    Code:
    ...
    limit 1,2500000
    ...
    limit 2500001,5000000
    ...
    limit 5000001,7500000
    ...
    limit 7500001,20000000
    Mike

    PS
    Are you sure yyyy is not a reserved word?
    I'm not sure about 'repeat' either
    I obviously haven't tested anything here

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •