This page describes how to update the character set and collation of Cloud SQL databases.
Overview
When you create a database in Cloud SQL for MySQL, you can specify the character set and collation configuration for a database. If you don't specify custom values for these settings, then the database uses the default value. In MySQL, you can specify custom values for the character set and collation at different object levels, including databases, tables, and columns. For more information about using character set and collation configurations in MySQL, see Character Sets, Collations, Unicode.
If you want to change the character set or collation of your database after you've already created the database and some objects in the database, then you can do so by following the procedure in this document.
This procedure consists of the following steps:
- Check current character set and collation values.
- Determine what level of update to perform.
- Take a backup of the instance.
- Generate rebuild commands for stored database objects.
- Perform updates to your databases, and if necessary, tables and columns.
- Validate the updates and look for errors.
- Recreate your stored database objects.
- Take another backup of the instance.
Check the current character set and collation values
Check the existing values that are configured for character set and collation for your database and database objects. When you check your database, you also need to check the configuration of its tables, columns, and stored objects. You can use the following statements to check your databases and its database objects.
Check your instance configuration
To check the default global values on the instance (server) level, start a MySQL shell and enter the following command:
mysql> SELECT @@character_set_server,@@collation_server;
Example output:
+------------------------+--------------------+
| @@character_set_server | @@collation_server |
+------------------------+--------------------+
| utf8mb4 | utf8mb4_0900_ai_ci |
+------------------------+--------------------+
1 row in set (0.00 sec)
You can check that the default global values are different from the custom values that you'll provide for your database later.
Check your database configuration
To check the existing character set and collation values on the database level, use the following statement. After you run each statement, make a list of all the databases and database objects that you need to update with new character sets or collation values. The list depends on which databases you want to change and how many tables, columns, and database objects require configuration updates.
SELECT * FROM INFORMATION_SCHEMA.SCHEMATA WHERE schema_name NOT IN ('performance_schema', 'mysql', 'sys', 'information_schema');
For example, you receive the following output:
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| CATALOG_NAME | SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME | SQL_PATH | DEFAULT_ENCRYPTION |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| def | test1 | utf8mb4 | utf8mb4_0900_ai_ci | NULL | NO |
| def | test2 | utf8mb4 | utf8mb4_0900_ai_ci | NULL | NO |
| def | test3 | utf8mb4 | utf8mb4_0900_ai_ci | NULL | NO |
| def | test4 | utf8mb4 | utf8mb4_0900_ai_ci | NULL | NO |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
However, you want to change the databases to the following custom character set values:
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| CATALOG_NAME | SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME | SQL_PATH | DEFAULT_ENCRYPTION |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
| def | test1 | ascii | ascii_general_ci | NULL | NO |
| def | test2 | latin1 | latin1_swedish_ci | NULL | NO |
| def | test3 | utf16 | utf16_general_ci | NULL | NO |
| def | test4 | ucs2 | ucs2_general_ci | NULL | NO |
+--------------+-------------+----------------------------+------------------------+----------+--------------------+
Then, in this case, for each database that has a value that you want to change, you need to update each object that resides in the database.
Check the configuration of your tables
To check the existing values for the tables, use the following statement:
SELECT T.table_name, T.table_schema, CCSA.character_set_name, CCSA.collation_name FROM information_schema.`tables` T, information_schema.`collation_character_set_applicability` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema NOT IN ('performance_schema', 'mysql', 'sys','information_schema');
Example output:
+------------+--------------+--------------------+-------------------+
| TABLE_NAME | TABLE_SCHEMA | CHARACTER_SET_NAME | COLLATION_NAME |
+------------+--------------+--------------------+-------------------+
| t1 | test1 | ascii | ascii_general_ci |
| t2 | test2 | latin1 | latin1_swedish_ci |
| t3 | test3 | utf16 | utf16_general_ci |
| t4 | test4 | ucs2 | ucs2_general_ci |
+------------+--------------+--------------------+-------------------+
Check the configuration of your table columns
To check the values for columns, use the following statement:
SELECT table_schema, table_name, column_name, character_set_name, collation_name FROM information_schema.columns WHERE table_schema NOT IN ('information_schema', 'performance_schema', 'sys', 'mysql');
Example output:
+--------------+------------+-------------+--------------------+------------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME |
+--------------+------------+-------------+--------------------+------------------+
| test1 | t1 | c | ascii | ascii_general_ci |
| test1 | t1 | i | NULL | NULL |
| test2 | t2 | j | NULL | NULL |
| test2 | t2 | v | ascii | ascii_general_ci |
| test3 | t3 | k | NULL | NULL |
| test3 | t3 | summary | utf16 | utf16_general_ci |
| test4 | t4 | col | ucs2 | ucs2_general_ci |
| test4 | t4 | p | NULL | NULL |
+--------------+------------+-------------+--------------------+------------------+
Check the active schema or database
To check the details for the active schema or database, use the following statement.
SELECT TABLE_SCHEMA, TABLE_NAME, CCSA.CHARACTER_SET_NAME AS DEFAULT_CHAR_SET, COLUMN_NAME, COLUMN_TYPE, C.CHARACTER_SET_NAME FROM information_schema.TABLES AS T JOIN information_schema.COLUMNS AS C USING (TABLE_SCHEMA, TABLE_NAME) JOIN information_schema.COLLATION_CHARACTER_SET_APPLICABILITY AS CCSA ON (T.TABLE_COLLATION = CCSA.COLLATION_NAME) WHERE TABLE_SCHEMA=SCHEMA() AND C.DATA_TYPE IN ('enum', 'varchar', 'char', 'text', 'mediumtext', 'longtext', 'set' ) ORDER BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME;
Example output:
+------------+
| DATABASE() |
+------------+
| test3 |
+------------+
+--------------+------------+------------------+-------------+-------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | DEFAULT_CHAR_SET | COLUMN_NAME | COLUMN_TYPE | CHARACTER_SET_NAME |
+--------------+------------+------------------+-------------+-------------+--------------------+
| test3 | t3 | utf16 | summary | text | utf16 |
+--------------+------------+------------------+-------------+-------------+--------------------+
Check the configuration of your stored procedures
To check the values for stored procedures, use the following statement:
SHOW PROCEDURE STATUS WHERE db = 'DB_NAME';
Replace DB_NAME with the name of the database.
Example output:
+-------+------+-----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| Db | Name | Type | Definer | Modified | Created | Security_type | Comment | character_set_client | collation_connection | Database Collation |
+-------+------+-----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| test2 | p1 | PROCEDURE | root@% | 2024-08-09 11:47:05 | 2024-08-09 11:47:05 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | latin1_swedish_ci |
| test2 | p2 | PROCEDURE | root@% | 2024-08-09 11:48:36 | 2024-08-09 11:48:36 | DEFINER | | utf8mb4 | utf8mb4_0900_ai_ci | latin1_swedish_ci |
+-------+------+-----------+---------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
To get the definition of a stored procedure, use the following statement:
SHOW CREATE PROCEDURE DB_NAME.SP_NAME;
Replace the following:
- DB_NAME: the name of the MySQL database
- SP_NAME: the name of the stored procedure
If a stored procedure isn't using your intended character set or collation value, then you might need to drop and recreate it with the new character set or collation.
Check database collation mapping
In the previous example output, the stored
procedure is created with latin1_swedish_ci
as the default collation and the
corresponding character set is latin1
. This means all processing happens using
the latin1
character set (unless specific parameters of the procedure are
tagged with a custom character set.) If you change only the database-level character
set and collation, then the database-level update doesn't update the stored
procedure automatically. The procedure continues to operate with the
latin1
character set and the respective collation.
Dropping and recreating the stored procedure ensures that the procedure starts
to use the new character set and new collation of the database that you just updated.
You can find out the mapping between database collation and character set by
querying the information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
table.
Use the following statement as an example:
SELECT CHARACTER_SET_NAME
FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE COLLATION_NAME = (SELECT database_collation FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = "test2" and ROUTINE_NAME = "p1");
Example output:
+--------------------+
| CHARACTER_SET_NAME |
+--------------------+
| latin1 |
+--------------------+
Check the configuration of your triggers
To check the values for triggers, use the following statement as an example.
SHOW TRIGGERS FROM DB_NAME;
Replace DB_NAME with the name of the MySQL database.
To get the definition of a trigger, use the following example statement.
SHOW CREATE trigger DB_NAME.TRIGGER_NAME;
Replace the following:
- DB_NAME: the name of the MySQL database
- TRIGGER_NAME: the name of the trigger
If the trigger isn't using your intended character set or collation value, then you might need to drop and recreate it with the new character set or collation. Check that the trigger is using the correct database collation mapping. For more information about checking database collation, see Check database collation mapping.
Check the configuration of your events
To check the values for events, use the following statement as an example.
SHOW EVENTS FROM DB_NAME;
Replace DB_NAME with the name of the database.
To get the definition of an event, use the following example statement.
SHOW CREATE EVENT DB_NAME.EVENT_NAME;
Replace the following:
- DB_NAME: the name of the database
- EVENT_NAME: the name of the event
If the event isn't using your intended character set or collation value, then you might need to drop and recreate it with the new character set or collation. Check that the event is using the correct database collation mapping. For more information about checking database collation, see Check database collation mapping.
Check the configuration of your functions
To check the values for functions, use the following statement as an example.
SHOW FUNCTION STATUS WHERE db = 'DB_NAME';
Replace DB_NAME with the name of the database.
To get the definition of a function, use the following example statement.
SHOW CREATE FUNCTION DB_NAME.FUNCTION_NAME;
Replace the following:
- DB_NAME: the name of the MySQL database
- FUNCTION_NAME: the name of the function
If the function isn't using your intended character set or collation value, then you might need to drop and recreate it with the new character set or collation. Check that the function is using the correct database collation mapping. For more information about checking database collation, see Check database collation mapping.
I character set or collation. Check that the function is using the correct database collation mapping. For more information about checking database collation, see Check database collation mapping.
Determine what level of updates to perform
After you have checked the character set and collation configuration of your database, you have a list of databases and possibly a list of database objects that you need to update.
The number of tasks you'll need to perform depend on the type of the database objects that require configuration changes.
- Perform database-level updates. You need to update MySQL metadata only.
- Perform database and table-level updates. You need to update MySQL metadata only.
- Perform database, table, and column-level updates. You must update the metadata and rebuild tables and database objects. These updates can take the most amount of time; however, you can use one of the provided example scripts to update all of the tables and columns in a given database.
Take a backup of the instance
Before you do any of these updates, take a backup of your instance. For more information, see Create an on-demand backup. We recommend that you take the backup without any DDL operations running on the instance.
Generate rebuild commands for the stored database objects in your database
If you need to drop and recreate any stored database objects in your databases with the
new character set and collation, then use the mysqldump
tool to generate the
commands now. After you update your database and table columns, you'll use the
generated mysqldump
file to rebuild the stored database objects in your
database. This command generates statements for all stored procedures,
functions, triggers, and events.
Run the following command so you can use the output to recreate all the stored procedures, functions, and events of a particular database later (when you recreate stored database objects).
$ mysqldump -uDBAuser -p -h IP_ADDRESS \ -P 3306 DB_NAME --no-data --no-create-db \ --no-create-info --routines --triggers --events \ --set-gtid-purged=OFF --events > dump_objects.sql
Replace the following:
- IP_ADDRESS: IP address of the Cloud SQL for MySQL instance
- DB_NAME: the name of the MySQL database
Perform updates to your databases, and if necessary, tables and columns.
In this step, you update the character set and collation values of your databases. Also, you might need to rebuild your tables and columns.
Perform database-level updates
If you are performing database-level updates only, then run the following statement to update your database:
ALTER DATABASE DB_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
- DB_NAME: the name of the database to update
- NEW_CHARSET: the new character set to apply to the database
- NEW_COLLATION: the new collation to apply to the database
Perform database and table-level updates
If you are performing database and table-level updates, then do the following:
Run the following statement to update your database:
ALTER DATABASE DB_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
- DB_NAME: the name of the database to update
- NEW_CHARSET: the new character set to apply to the database
- NEW_COLLATION: the new collation to apply to the database
For each table that you want to update, run the following statement:
ALTER TABLE TABLE_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
- TABLE_NAME: the name of the table to update
- NEW_CHARSET: the new character set to apply to the table
- NEW_COLLATION: the new collation to apply to the table
Perform database, table, and column-level updates
At this level, you need to re-encode existing data to comply with the new character set and collation settings. If you are performing a database, table, and column-level updates, then do the following:
Before proceeding with updates to the table columns, review the following considerations:
- The
ALTER TABLE
character set and collation conversion causes the table to rebuild and locks the table. As a result, any active query that tries to access the table is blocked. - The
ALTER TABLE
character set and collation conversion takes time because it fetches the complete table in memory. This operation might increase the latency of any workload running in parallel because the buffer pool pages are being used for the updated table. - If there are queries accessing multiple tables and some of the tables are being updated while others are not yet updated, then the queries might produce inconsistent results during this transaction time.
- The process of running an
ALTER TABLE
character set and collation conversion creates a shadow table, so plan disk usage accordingly. - The
ALTER TABLE
character set and collation conversion also causes any indexes on the table to rebuild. - You don't need to update replica instances separately.
ALTER
,DROP
, andCREATE
commands are replicated automatically. However, running DDL commands can cause replica lag time to increase during the operation. - If you restore any of the databases from a backup taken from before you applied these updates, then you'll need to re-apply the updates.
- The
Check your table for any data tuples that might be incompatible with the new character set. If there are any incompatibilities, then you need to fix them before you update the character set. Otherwise, you'll encounter errors during the conversion process. You can use the following
SELECT CONVERT
statement to validate your data with new character set.SELECT COLUMN_NAME,CONVERT(COLUMN_NAME USING NEW_CHARSET) FROM TABLE_NAME WHERE COLUMN_NAME != CONVERT(COLUMN_NAME USING NEW_CHARSET);
SELECT c,CONVERT(c USING 'ascii') FROM t WHERE c != CONVERT(c USING ascii);
Example output:
+------+--------------------------+ | c | convert(c using 'ascii') | +------+--------------------------+ | é | ? | | é | ? | | £ | ? | +------+--------------------------+
In this example, since the output returns these three tuples, there are three inconsistent values that need to be fixed. Before moving to the next step, examine the output of your query, and fix the data until your query returns zero tuples for all the columns that need conversion. You can also combine multiple columns in the same
SELECT
query.Choose an option for updating the character set and collation for your tables at the column level.
If you want to update the character set for a very specific table column, then follow the steps in the Column-level tab where you only have to re-encode selected columns. Otherwise, follow the steps in the Table-level tab which re-encode the full table.
Table-level
Run the following statement to update your database:
ALTER DATABASE DB_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
- DB_NAME: the name of the database to update
- NEW_CHARSET: the new character set to apply to the database
- NEW_COLLATION: the new collation to apply to the database
For each affected table, run the following statement to update your table:
ALTER TABLE TABLE_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
- TABLE_NAME: the name of the table to update
- NEW_CHARSET: the new character set to apply to the table
- NEW_COLLATION: the new collation to apply to the table
Update at table level so that all columns that are part of the table are updated with the new character set and collation. This option rebuilds the table completely. To use this option, run the following statement:
ALTER TABLE DB_NAME.TABLE_NAME CONVERT TO CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
- TABLE_NAME: the name of the table to update
- NEW_CHARSET: the new character set to apply to the table
- NEW_COLLATION: the new collation to apply to the table
Column-level
Update at the column level. This option also rebuilds the table completely. If multiple columns are updated, then this option can cause multiple rebuilds.
To use this option, run the following statement:
- Run the following statement to update your database:
- DB_NAME: the name of the database to update
- NEW_CHARSET: the new character set to apply to the database
- NEW_COLLATION: the new collation to apply to the database
- For each affected table, run the following statement to update your table:
- TABLE_NAME: the name of the table to update
- NEW_CHARSET: the new character set to apply to the table
- NEW_COLLATION: the new collation to apply to the table
- Run the following statement to update at the column level:
- DB_NAME: the name of the database to update
- TABLE_NAME: the name of the table to update
- COLUMN_DEFINITION: specify an exact column definition that includes constraints. The column definition might carry the old character set and collation values. You must update the column definition with the new values.
- NEW_CHARSET: the new character set to apply to the table column
- NEW_COLLATION: the new collation to apply to the table column
ALTER DATABASE DB_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
ALTER TABLE TABLE_NAME CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
ALTER TABLE DB_NAME.TABLE_NAME MODIFY COLUMN_DEFINITION CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION;
Replace the following:
Repeat this statement for all string columns present in the table that need a character set or collation update.
Validate the updates and look for errors
During the ALTER TABLE
operation, look for any errors that indicate that
existing data in the table isn't compatible with the new character set. For
example:
ERROR 1366 (HY000): Incorrect string value: '\xC3\xA9' for column 'c' at row 1
If you encounter any data compatibility errors, then we recommend that you
check your tables again for any data compatibility errors and fix them before
re-running the ALTER TABLE
statements to re-encode the data.
In addition, during the ALTER TABLE
operation, the tables (the foreign key
table and the referenced table), a foreign key check might fail while the column
undergoes re-encoding. If this happens, then you can disable the foreign key
check (SET FOREIGN_KEY_CHECKS=0;
) and run the update again.
Recreate your stored database objects
After you have updated your database successfully, you can use the generated
mysqldump
file to recreate the stored
database objects with the character set and collation values.
To recreate your stored database objects, do the following:
- Verify that the stored database objects are present on the database.
- Make sure that there's no parallel load that expects the stored database objects to be present during this drop and recreate time.
The
mysqldump
file that you generated with themysqldump
tool doesn't include the statement for dropping triggers. If you need to drop triggers, then open a MySQL terminal to generate theDROP
statement for triggers.mysql> SELECT CONCAT('DROP TRIGGER ', TRIGGER_NAME, ';') AS drop_trigger_statements FROM information_schema.TRIGGERS WHERE TRIGGER_SCHEMA = DB_NAME;
Copy the output.
Run the statement to drop the triggers for your database, either in a MySQL shell or in a script.
After you drop the triggers, apply the dump file that you generated with the
mysqldump
tool. To apply the dump file and recreate the objects, enter the following:$ mysql -uDBAuser -p -h IP_ADDRESS \ -P 3306 DB_NAME < dump_objects.sql
Replace the following:
- IP_ADDRESS: IP address of the Cloud SQL for MySQL instance
- DB_NAME: the name of the database
Take another backup of the instance
After you complete your updates, take another backup of your instance. For more information, see Create an on-demand backup.
Example scripts
If you need to update databases, tables, and columns, then this section provides example scripts to help re-encode your data.
Script to update multiple tables
The following script generates a statement for all the tables in the given database. The statement lets you re-encode existing data to comply with the new character set and collation settings.
SELECT CONCAT("ALTER TABLE ",TABLE_SCHEMA,".",TABLE_NAME," CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION; ", "ALTER TABLE ",TABLE_SCHEMA,".",TABLE_NAME," CONVERT TO CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION; ") AS alter_sql FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'DB_NAME' AND TABLE_TYPE='BASE TABLE';
Replace the following:
- DB_NAME: the name of the database to update
- NEW_CHARSET: the new character set to apply to the table
- NEW_COLLATION: the new collation to apply to the table
Example output:
+--------------------------------------------------------------------------------------------+
|alter_sql
+--------------------------------------------------------------------------------------------+
| ALTER TABLE test1.t1 CHARACTER SET <new-charset> COLLATE <new-collation>;
ALTER TABLE test1.t1 CONVERT TO CHARACTER SET <new-charset> COLLATE <new-collation>; |
+--------------------------------------------------------------------------------------------+
Script to update multiple columns in a table
The following script generates a statement for all the columns in a given table. The statement lets you re-encode existing data to comply with the new character set and collation settings.
Generate
ALTER TABLE
statements for all tables in the databaseSELECT CONCAT("ALTER TABLE ",TABLE_SCHEMA,".",TABLE_NAME," CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION; ") AS alter_table_statements FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'DB_NAME' AND TABLE_TYPE='BASE TABLE';
Replace the following:
- DB_NAME: the name of the database to update
- NEW_CHARSET: the new character set to apply to the table
- NEW_COLLATION: the new collation to apply to the table
Example output:
+------------------------------------------------------------------------------+ | alter_table_statements | +------------------------------------------------------------------------------+ | ALTER TABLE test1.t1 CHARACTER SET <new-charset> COLLATE <new-collation>; | +------------------------------------------------------------------------------+
For every table, generate
ALTER TABLE MODIFY
commands.SELECT CONCAT( "ALTER TABLE ", table_name, " MODIFY ", column_name, 'COLUMN_DEFINITION' , " CHARACTER SET NEW_CHARSET COLLATE NEW_COLLATION ") AS alter_table_column_statements FROM information_schema.columns WHERE table_schema = 'DB_NAME' AND table_name = 'TABLE_NAME' AND data_type IN ('char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext','set','enum');
Replace the following:
- DB_NAME: the name of the database to update
- TABLE_NAME: the name of the table to update
- COLUMN_DEFINITION: after obtaining results from each query, replace with the column definition for each column (column definitions can differ across columns)
- NEW_CHARSET: the new character set to apply to the table column
- NEW_COLLATION: the new collation to apply to the table column
Example output:
+-------------------------------------------------------------------------------------------------------------------------------------+ |alter_table_column_statements | +-------------------------------------------------------------------------------------------------------------------------------------+ | ALTER TABLE t1 MODIFY c <column-definition-replaced-after-obtaining-result-set> CHARACTER SET <new-charset> COLLATE <new-collation> | +-------------------------------------------------------------------------------------------------------------------------------------+