I have a table with every position we know for each chromosome
Table 1. Reference of known position - Comparison of the cells together
+------+--------+------+------+------+
| chr | pos | cell1 | cell2| cell3|
+------+--------+------+------+------+
| 2 | 21662 | NULL | NULL | NULL |
| 2 | 21693 | NULL | NULL | NULL |
| 2 | 24250 | NULL | NULL | NULL |
| 2 | 24445 | NULL | NULL | NULL |
| 2 | 24554 | NULL | NULL | NULL |
| 2 | 24800 | NULL | NULL | NULL |
| 2 | 25252 | NULL | NULL | NULL |
| 2 | 25279 | NULL | NULL | NULL |
| 2 | 26634 | NULL | NULL | NULL |
| 2 | 27286 | NULL | NULL | NULL |
| 2 | 27373 | NULL | NULL | NULL |
| 2 | 27793 | NULL | NULL | NULL |
| 2 | 28721 | NULL | NULL | NULL |
| 2 | 30291 | NULL | NULL | NULL |
Table2. Position of cell1 to compare with table1
+------+--------+------+------+
| chr | pos | cell1 | type |
+------+--------+------+------+
| 2 | 21662 | NULL | heterozygote |
| 2 | 21693 | NULL | homozygote |
| 2 | 24250 | NULL | heterozygote |
| 2 | 24445 | NULL | heterozygote |
| 2 | 24554 | NULL | homozygote |
| 2 | 27793 | NULL | homozygote |
| 2 | 28721 | NULL | heterozygote |
GOAL : Everytime that a
CHR and POS match from table2 (cell1) to table 1 (reference position), if we have :
- type = heterozygote --> write 1 in table1.cell1
- type = homozygote --> write 2 in table1.cell1
In order to have something like that :
EXPECTED RESULT:
+------+--------+------+------+------+
| chr | pos | cell1 | cell2| cell3|
+------+--------+------+------+------+
| 2 | 21662 | 1 | NULL | NULL |
| 2 | 21693 | 2 | NULL | NULL |
| 2 | 24250 | 1 | NULL | NULL |
| 2 | 24445 | 1 | NULL | NULL |
| 2 | 24554 | 2 | NULL | NULL |
| 2 | 24800 | NULL | NULL | NULL |
| 2 | 25252 | NULL | NULL | NULL |
| 2 | 25279 | NULL | NULL | NULL |
| 2 | 26634 | NULL | NULL | NULL |
| 2 | 27286 | NULL | NULL | NULL |
| 2 | 27373 | NULL | NULL | NULL |
| 2 | 27793 | 2 | NULL | NULL |
| 2 | 28721 | 1 | NULL | NULL |
| 2 | 30291 | NULL | NULL | NULL |
So here is what I did :
Code:
UPDATE tbl1 CROSS JOIN tbl2 ON
(tbl1.pos = tbl2 AND tbl2.type='heterozygote')
SET tbl1.cell1='1';
UPDATE tbl1 CROSS JOIN tbl2 ON
(tbl1.pos = tbl2 AND tbl2.type='homozygote')
SET tbl1.cell1='2';
There's only one match possible between tbl2 and tbl1, so maybe a cross join wasn't a good idea from the beginning..
My tables are over 300MB each query takes hours to run..
Im looking for a faster solution, any suggestions are welcome ~




