Reason to move to Percona is to use the cluster for high availability and scalability.
We plan to setup an active-active-active cluster with a load balancer by implementing Percona cluster, basic proof of concept is successful.
Reason to move to a new database version is because of the possible character set utf8mb4 in 5.6 in comparison to 5.1 which is not applicable.
This proof of concept is still under investigation, hence this post.
As you can see, the current environment is setup to latin1 character set.
Reason to use character set utf8mb4 is that we need to support Chinese simplified, Chinese traditional and Cyrillic characters and maintain the characters in latin1.
Now, in my opinion, a hell of a job (and you are most probably guessing the question)
We need to convert our complete database to the new character set / collation, what is the best approach?
I've read the blog on charcoll (http://mysql.rjweb.org/doc.php/charcoll)
Nevertheless, this change is quite big for us.
The blog dates back to 2013, conceptual probably nothing changed, nevertheless, this area is changing vastly.
For that, I'd like to ask some questions, to put our heads together for the best appropriate approach and solution.
Any help from you and other fellow forum contributors would be greatly appreciated!
I think it's best to set to "UTF-8 Unicode, utf8mb4_general_ci"
This is the character set known in mySql
The global variables need to be set, I think this is OK, any remarks?
I don't know for sure which collation to use, any advice?
Do we need to change the connection strings in our applications?
In other words, do we need to use specific session parameters while building the connection form the client programs?
What is the best approach?
Update the 3 Percona cluster databases with correct character sets and
- Fix current data while dumping the data from current environment.
or Fix current data while loading data the data in the new environment.
Setup a new Percona cluster identical to character sets and collation as current environment.
Dump the current environment and load the databases in the Percona cluster and update variables and tables afterwards.
Dump current environment, fix current environment, Dump current environment, setup Percona cluster with that information.
As our current character set is latin1, do you foresee problems?
I know that in oracle there is a possibility with the program csscan to check if your current character set can be converted to the new character set.
Is there such a tool in mySql (I've googled, but could not find it).
Are there more things to think about? than just setting and / or fixing the encoding on the database objects?
Are there settings on the operating system or programming level (java, jmx web-services) we should take in consideration?
Please keep in mind that we are having quite a big database (no rocket science, but quite some rows)
To sum up, 9 databases, 172 tables, 230 million records, 100 GB.
Total downtime in the night with our customers is possible for 5 hours maximum.
Is that optimistic?
Do you have any questions or remarks, please share, any help would be appreciated.