KBEC-00385 - Converting a MySQL Database From Latin-1 to UTF-8 Before Upgrading to ElectricFlow 8.0.1

 

This document describes how to convert your MySQL database (DB) from the Latin-1 character set to the Unicode format (UTF-8). This document is adapted from https://docs.moodle.org/33/en/Converting_your_MySQL_database_to_UTF8#Converting_an_empty_database.

The ElectricFlow server requires UTF-8 for better multilingual support. It requires that you store all your data in UTF-8 and case-insensitive collation (utf8_unicode_ci). 

A test case to convert a CHARACTER SET latin1 COLLATE latin1_swedish_ci database to CHARACTER SET utf8 COLLATE utf8_unicode_ci is at the end of this document. The test case files are in the testcase_convert_latin1_charset_to_utf8_and_ci_collation.zip attachment.

Changing the Default MySQL Character Set to UTF-8

You must change MySQL to use UTF-8 as its character set and change your DB to UTF-8. The procedures below also cover making the UTF-8 DB versions using mysqldump.

1. Edit the client section of /etc/my.cnf (the MySQL configuration file) to read:

[client] ... .... ...
default-character-set=utf8 ....

2. Edit the mysqld section in /etc/my.cnf to read:

[mysqld] ... ...
default-character-set=utf8
default-collation=utf8_unicode_ci
character-set-server=utf8
collation-server=utf8_unicode_ci
... ...

After changing your default character set to UTF-8, you will use mysqldump to restore your DB with the --skip-character-set parameter to restore it with your new default UTF-8 character set.

Converting an Empty Database

If you created your DB schema and your DB is still empty, complete the following steps:

1. Log in to your MySQL instance.

2. Run the following query in your DB to convert it to UTF-8:

ALTER DATABASE mydatabasename charset=utf8;

Converting a Database Containing Tables

If your ElectricFlow server already has a populated MySQL DB and you are getting an error, use the following procedures to convert your database.

Linux and MacOS

The following steps show how to create a DB dump, edit the DB dump so that the correct charset and collation are used, and then restore the new DB.

1. Open a terminal window and move to a temporary directory.

2. Dump out the DB:

mysqldump -u username -p password -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql

The arguments in this command clean up the character sets and provide a dump that will not cause problems if you are moving the DB to a different DB server or need to restore the DB on a reverted system. For complete descriptions of these arguments, see https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html.

When you run this command, a DB dump is generated into dump.sql.

3. Next, create a backup of dump.sql by entering:

cp dump.sql dump-fixed.sql

You will modify dump-fixed.sql and will keep dump.sql as a backup.

4. Edit the dump file and correct the incorrect character that have been used. You can use any search-and-replace editor or program such as VIM:

vim dump-fixed.sql
:%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/
:%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/
:wq

The first command modifies the DB's default character set and collation by replacing all instances of DEFAULT CHARACTER SET latin1 with DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci. The second command converts all tables from Latin-1 to UTF-8 by replacing all instances of DEFAULT CHARSET=latin1 with DEFAULT CHARSET=utf8. The third command saves the file and exits the editor.

5. Restore the DB over the top of the existing DB:

mysql -u username -p password < dump-fixed.sql

Examples

mysqldump -u username -p password -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B dbname > dump.sql
cp dump.sql dump-fixed.sql
vim dump-fixed.sql
:%s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci/
:%s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/
:wq
mysql -u username -p password < dump-fixed.sql

or alternatively using sed:

# $1-dbusername $2-password $3-dbname
mysqldump -u$1 -p$2 -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B $3 > dump.sql
sed -i.bak -e 's/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci/' -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/' dump.sql
mysql -u$1 -p$2 < dump.sql

Windows

The following steps show how to create a DB dump, edit it so that the correct character set (utf8) and collation (utf8_unicode_ci) are used, and then restore the new DB . For this, first download Super Sed (Win32 executable, zipped).

1. Open a command window and move to a temporary folder.

2. Enter:

<your_mysql_dir>\bin\mysqldump -u [username] -p [password] -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B [dbname] -r dbdump.sql
ssed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci/" dbdump.sql | ssed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" >dbdump_w.sql
[your mysql dir]\bin\mysql -u [username] -p[password] [dbname] < dbdump_w.sql

For complete descriptions of these arguments, see https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html.

If Special Characters Do Not Import Correctly

Under certain circumstances, when you restore a UTF-8-encoded MySQL dump, international special characters (which you can see correctly using, for example, the VIM editor), might not import correctly. In such cases, you may want to try the following under a Linux or UNIX system:

1. Log in to MySQL.

2. Create a DB with UTF-8 encoding.

3. Import your dump by using the MySQL source command: 

# cd /folder_where_your_dump_is/
# mysql -u your_user -p
> create database yourdb charset=utf8;
> use yourdb;
> SET NAMES 'utf8';
> source db_dump.sql
> quit;

Test Case to Convert a CHARACTER SET latin1 COLLATE latin1_swedish_ci database to CHARACTER SET utf8 COLLATE utf8_unicode_ci

This test case converts a CHARACTER SET latin1 COLLATE latin1_swedish_ci DB to CHARACTER SET utf8 COLLATE utf8_unicode_ci. The test case files are in the testcase_convert_latin1_charset_to_utf8_and_ci_collation.zip attachment.

1. Log in to your MySQL instance.

2. Create a DB that contains DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci:

CREATE DATABASE latin1_test_db DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

3. Switch to that DB:

USE latin1_test_db;

4. Check that the collation in the DB is latin1

show variables like '%collation%';

The output should appear as follows:

collation_connection utf8_general_ci
collation_database latin1_swedish_ci
collation_server utf8_general_ci

The collation_database latin1_swedish_ci line shows the collation.

 

5. Create a table with CHARSET=latin1:

CREATE TABLE `DATABASECHANGELOG` ( 
`ID` varchar(255) NOT NULL,
`AUTHOR` varchar(255) NOT NULL,
`FILENAME` varchar(255) NOT NULL,
`DATEEXECUTED` datetime NOT NULL,
`ORDEREXECUTED` int(11) NOT NULL,
`EXECTYPE` varchar(10) NOT NULL,
`MD5SUM` varchar(35) DEFAULT NULL,
`DESCRIPTION` varchar(255) DEFAULT NULL,
`COMMENTS` varchar(255) DEFAULT NULL,
`TAG` varchar(255) DEFAULT NULL,
`LIQUIBASE` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

6. From an OS command line, enter the following command to export the schema: 

mysql --host=localhost --port=3306 --user=root --password=password --html --execute "select '-----' ;SELECT default_character_set_name, default_collation_name FROM information_schema.SCHEMATA WHERE schema_name = 'latin1_test_db';select '-----'; SELECT T.table_name, T.table_collation, CCSA.character_set_name FROM information_schema.TABLES T, information_schema.COLLATION_CHARACTER_SET_APPLICABILITY CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = 'latin1_test_db'; select '-----'; SELECT table_name, column_name, character_set_name, collation_name FROM information_schema.COLUMNS WHERE character_set_name != 'NULL' AND table_schema = 'latin1_test_db'; select '-----' ;" > 2_schema_latin1_test_db.html

7 Check the CHARACTER SET and collation for the DB, table, and columns.

8. Insert test data to make sure the table is not empty:

insert into databasechangelog(id,author,filename,dateexecuted,orderexecuted,exectype) values('1','suresh','a.txt',sysdate(),1,'xml')

9. From an OS command line, enter the following command to export the dump as mentioned above:

mysqldump --host=localhost --port=3306 --user=root --password=password -c -e --default-character-set=utf8 --single-transaction --skip-set-charset --add-drop-database -B latin1_test_db -r 3_latin1_test_dbdump.sql

10. From an OS command line, enter the following command to fix the character set and collation in the exported file to utf8:

ssed -e "s/DEFAULT CHARACTER SET latin1/DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci/" 3_latin1_test_dbdump.sql | ssed -e "s/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/" > 4_latin1_test_dbdump_now_utf8.sql

11. From an OS command line, enter the following command to import the new dump, which now has the utf8 CHARACTER SET and collation for the DB, table, and columns:

mysql --host=localhost --port=3306 --user=root --password=password --database=latin1_test_db < 4_latin1_test_dbdump_now_utf8.sql

12. Exit the DB and then reconnect and check that the collation in the DB is utf8:

show variables like '%collation%';

The output should appear as follows:

collation_connection utf8_general_ci
collation_database utf8_unicode_ci
collation_server utf8_general_ci

The collation_database utf8_unicode_ci line shows the collation.

13. From an OS command line, enter the following command to export the schema:

mysql --host=localhost --port=3306 --user=root --password=password --html --execute "select '-----' ;SELECT default_character_set_name, default_collation_name FROM information_schema.SCHEMATA WHERE schema_name = 'latin1_test_db';select '-----'; SELECT T.table_name, T.table_collation, CCSA.character_set_name FROM information_schema.TABLES T, information_schema.COLLATION_CHARACTER_SET_APPLICABILITY CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = 'latin1_test_db'; select '-----'; SELECT table_name, column_name, character_set_name, collation_name FROM information_schema.COLUMNS WHERE character_set_name != 'NULL' AND table_schema = 'latin1_test_db'; select '-----' ;" > 5_schema_latin1_test_db_now_utf8.html

14. Check that the CHARACTER SET and collation for the DB, table, and columns are now utf8 and case-insensitive (*_ci) collation.

For More Information

See MySQL full unicode support.

Have more questions? Submit a request

Comments

Powered by Zendesk