Could we help you? Please click the banners. We are young and desperately need the money
It often occurs that an old MySQL database is using an ancient or other charset than UTF-8. There are many ways to convert a database but most of them need the user to execute a lot of SQL commands in order to convert all the data properly.
We (LEXO) have developed a script to change the charset of files within a given directory recursively. The script does that by iterating through all files and change the charset of each file using the "iconv" program on a BA(SH) shell. you can find mor information about this here.
What we now did is extending this script so that it first converts the dump file of a MySQL database into a pure UTF-8 charset (every character in the database will be converted to UTF-8) and to make sure that it will be UTF-8 in the end the database collation and charsets within the file are being set properly to UTF-8 - automatically!
This will explain how the whole process works. It's basically very simple:
mysqldump --opt -Q -u DB_USER -p PASSWORD -h DB_HOSTNAME DB_NAME > sql_dump.sql
./convert_db.sh ./mysqldump
mysql -u DB_USER -p PASSWORD -h DB_HOSTNAME --default-character-set=utf8 DB_NAME < sql_dump.sql
In order for MySQL to properly allow UTF-8 connections you need to set the following settings to your MySQL configuration (usually /etc/mysql/my.cnf):
[mysqld] collation-server = utf8_unicode_ci init-connect='SET NAMES utf8' character-set-server = utf8
Also you need to check that your online application / CMS is set to work on UTF-8 as well. But that's not part of this KB entry.
Search for the keyword UTF on our Knowledge Base to find other articles about making your application ready for UTF-8
See: Howto convert a MySQL/MariaDB database from any charset into UTF-8 - with Notepad++ in Windows
There was a little but within the convert_db.sh script which sometimes generated errors. This has been fixed.