PostgreSQL Accessing MySQL as a Data Source Using mysql_fdw
There are many organizations where front/web-facing applications use MySQL and back end processing uses PostgreSQL®. Any system integration between these applications generally involves the replication—or duplication—of data from system to system. We recently blogged about pg_chameleon which can be used replicate data from MySQL® to PostgreSQL. mysql_fdw can play a key role in eliminating the problem of replicating/duplicating data. In order to eliminate maintaining the same data physically in both postgres and MySQL, we can use mysql_fdw. This allows PostgreSQL to access MySQL tables and to use them as if they are local tables in PostgreSQL. mysql_fdw can be used, too, with Percona Server for MySQL, our drop-in replacement for MySQL.
This post is to showcase how easy it is to set that up and get them working together. We will address a few points that we skipped while discussing about FDWs in general in our previous post
Preparing MySQL for fdw connectivity
On the MySQL server side, we need to set up a user to allow for access to MySQL from the PostgreSQL server side. We recommend Percona Server for MySQL if you are setting it up for the first time.
This user needs to have privileges on the tables which are to be presented as foreign tables in PostgreSQL.
Installing mysql_fdw on PostgreSQL server
Under the hood, MySQL FDW (mysql_fdw) facilitates the use of PostgreSQL server as a client for MySQL Server, which means it can then fetch data from the MySQL database as a client. Obviously, mysql_fdw uses MySQL client libraries. Nowadays, many Linux distributions are packaged with MariaDB® libraries. This works well enough for mysql_fdw to function. If we install mysql_fdw from the PGDG repo, then mariadb-devel.x86_64 packages will be installed alongside other development packages. To switch to Percona packages as client libraries, you need to have the Percona development packages too.
Now we should be able to install the mysql_fdw from PGDG repository:
Connect to the PostgreSQL server where we are going to create the foreign table, and using the command line tool, create mysql_fdw extension:
Create a server definition to point to the MySQL server running on a host machine by specifying the hostname and port:
Now we can create a user mapping. This maps the database user in PostgreSQL to the user on the remote server (MySQL). While creating the user mapping, we need to specify the user credentials for the MySQL server as shown below. For this demonstration, we are using PUBLIC user in PostgreSQL. However, we could use a specific user as an alternative.
Import schema objects
Once we complete the user mapping, we can import the foreign schema.
Or we have the option to import only selected tables from the foreign schema.
This statement says that the tables “EMP” and “DEPT” from the foreign schema named “hrdb” in mysql_server need to be imported into the public schema of the PostgreSQL database.
FDWs in PostgreSQL allow us to import the tables to any schema in postgres.
Let’s create a schema in postgres:
Suppose we need the foreign table to be part of multiple schemas of PostgreSQL. Yes, it is possible.
You might be wondering if there’s a benefit to doing this. Yes, since in a multi-tenant environment, it allows us to centralize many of the master/lookup tables. These can even sit in a remote server, and that can be MySQL as well!.
IMPORTANT: PostgreSQL extensions are database specific. So if you have more than one database inside a PostgreSQL instance/cluster, you have to create a separate fdw extension, foreign server definition and user mapping.
Foreign tables with a subset of columns
Another important property of foreign tables is that you can have a subset of columns if you are not planning to issue DMLs on the remote table. For example MySQL’s famous sample database Sakila contains a table “film” with the following definition
Imagine that we don’t need all of these fields to be available to the PostgreSQL database and its application. In such cases, we can create a foreign table with only the necessary columns in the PostgreSQL side. For example:
The challenges of incompatible syntax and datatypes
There are many syntactical differences between MySQL and PostgreSQL. Consequently, you may need to manually intervene to create foreign tables. For example, MySQL tables accepts definition of enumerations in place, whereas PostgreSQL expects enumeration types to be defined before creating the table like this:
Many such things are not handled perfectly. So it is better to specify them as a text datatype. The same applies to the set datatype.
I’m used to receiving scepticism from people about treating enum and set as text. Well, please don’t forget that we are not storing them in PostgreSQL, the text datatype is just a method for handling input and output from the table. The data is pulled and pushed from the foreign server, which is MySQL, and this converts these texts into the corresponding enumeration before storing them.
IMPORTANT : mysql_fdw has the capability to do data type conversion (casting) automatically behind the scenes when a user fires DML against foreign tables.
Generally, DML against a remote MySQL database from the PostgreSQL side can be quite challenging because of the architecture differences. These impose restrictions, such as the first column of the foreign table must be unique. We will cover these in more depth in a future post.
Handling views on the MySQL side
While foreign tables are not limited tables on the MySQL side, a view can also be mapped as a foreign table. Let’s create a view in the MySQL database.
PostgreSQL can treat this view as a foreign table:
Views on the top of foreign table on PostgreSQL
PostgreSQL allows us to create views on the top of foreign tables. This might even be pointing to a view on the remote MySQL server. Let’s try creating a view using the newly created foreign table v_film.
Materializing the foreign tables (Materialized Views)
One of the key features mysql_fdw implements is the ability to support persistent connections. After query execution, the connection to the remote MySQL database is not dropped. Instead it retains the connection for the next query from the same session. Nevertheless, in some situations, there will be concerns about continuously streaming data from the source database (MySQL) to the destination (PostgreSQL). If you have a frequent need to access data from foreign tables, you could consider the option of materializing the data locally. It is possible to create a materialized view on the top of the foreign table.
Whenever required, we can just refresh the materialized view.
Automated Cleanup
One of the features I love about the FDW framework is its ability to clean up foreign tables in a single shot. This is very useful when we setup foreign table for a temporary purpose, like data migration. At the very top level, we can drop the extension, PostgreSQL will walk through the dependencies and drop those too.
Conclusion
I should concede that the features offered by mysql_fdw are far fewer compared to postgres_fdw. Many of the features are not yet implemented, including column renaming. But the good news is that the key developer and maintainer of mysql_fdw is here with Percona! Hopefully, we will be able to put more effort into implementing some of the missing features. Even so, we can see here that the features implemented so far are powerful enough to support system integration. We can really make the two sing together!
Percona’s support for PostgreSQL
As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here.
nice! If the MYSQL schema changes, how are they propagated? what happens if there’s (new) impedance mismatches (e.g. incompatible syntax / datatypes).
Then you need to drop the PostgreSQL foreign table and create a new one with matched MySQL table. Your data in mysql will be intact by dropping the PostgreSQL foreign table.
1) how do you know you need to do this? can it be automated?
I prefer not to automate that, as we saw in the section “Foreign tables with a subset of columns” in the article, we may not need every data from remote database to be present in PostgreSQL side.
We can pull only minimal information which is required to satisfy our business needs. For example, A payroll system running in PostgreSQL can use essential data from HR system.
Another advantage is many of the changes happening in the source system (MySQL) may not directly impact PostgreSQL side, such as adding new column to table in MySQL which is not required at the PostgreSQL side.
2) what if you have non-trivial amounts of data? seems crazy to re-import everything?
3) if this is done transactionally in Postgres, are you holding table locks? How do you avoid deadlocks/timeouts?
i.e. how practical is all of this?
2) what if you have non-trivial amounts of data? seems crazy to re-import everything?
Data never gets fully “imported”/stored into PostgreSQL side, only table definitions gets imported. PostgreSQL serer acts just like a MySQL client. PostgreSQL pulls only essential data from MySQL side based on query/join optimization.
if query on remote table is very frequent and if you want to avoid streaming data as part of query every time, you can consider the materializing them at PostgreSQL side. Please refer the section “Materializing the foreign tables (Materialized Views)” for the same
3) if this is done transitionally in Postgres, are you holding table locks? How do you avoid deadlocks/timeouts?
As we discussed, PostgreSQL server connects to MySQL server as a client. So automatically it is transaction compliant.
All server side strategies including locking is applicable as PostgreSQL is acting a client. Completely avoiding deadlocks is not possible in any of the transaction systems, however we can avoid the chance by smart designs.
i.e. how practical is all of this?
FDWs are in general, practically used very much. It is key part in many system-integrations and migrations. It is even used as a simple sharding technique. However, It is not well known outside PostgreSQL circles, Just because SQL/MED feature are not present in my databases. Purpose of this blog is to make it more popular