Tutorial for migrating the database used by Django from MySQL to PostgreSQL

  • 2020-05-07 19:58:23
  • OfStack

We have completed the migration of a Django 1.6 app's database from MySQL to PostgreSQL. If your environment is clean, the process is simple: let syncdb or migrate create the tables in PostgreSQL, truncate the newly created tables, and then run the dumpdata and loaddata management commands.
Step 1: create an empty database in your PostgreSQL instance:

CREATE DATABASE dbname OWNER rolename;

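If you are not already inside a psql session, the same statement can be run from the shell. This is only a sketch; it assumes you can connect as a superuser role named postgres, so adjust the host and role to your own setup:

psql -h postgresql.example.com -U postgres -c "CREATE DATABASE dbname OWNER rolename;"
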
Step 2: configure the new database in Django

In settings.py, we configure both databases:

DATABASES = {
  'default': {
    'ENGINE': 'django.db.backends.mysql',
    'NAME': 'dbname',
    'USER': 'dbuser',
    'PASSWORD': 'dbpass',
    'HOST': 'mysql.example.com',
    'PORT': '',
  },
  'postgresql': {
    'ENGINE': 'django.db.backends.postgresql_psycopg2',
    'NAME': 'dbname',
    'USER': 'dbuser',
    'PASSWORD': 'dbpass',
    'HOST': 'postgresql.example.com',
    'PORT': '',
  }
}

This allows us to refer to each database by name and tell any management command which one to operate on.
Step 3: create the tables in the PostgreSQL database


python manage.py syncdb --migrate --no-initial-data --database=postgresql

This runs syncdb and the South migrations against the PostgreSQL database without loading any initial data.
Step 4: truncate the newly created tables

Although we skipped initial data in the previous step, data can still sneak into the tables during creation (for example, through custom signal handlers) without our noticing, so it is safer to truncate the newly created tables anyway. We can generate the SQL script for this and run it:

python manage.py sqlflush --database=postgresql

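Note that sqlflush only prints the truncation SQL; it does not execute anything. One way to actually run it (a sketch, assuming the psql client is installed, since dbshell simply invokes it) is to pipe the output back into dbshell:

python manage.py sqlflush --database=postgresql | python manage.py dbshell --database=postgresql
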
Step 5: dump the data from MySQL to JSON

Django has a dumpdata command with which you can generate a database-independent backup. The default format is JSON.

python manage.py dumpdata --all --natural --indent=4 > dbname.json

The --all flag here makes sure dumpdata uses Django's base manager, in case a custom manager would otherwise filter or modify the records on export; --natural tells Django to serialize foreign keys using natural keys (where available); and --indent=4 simply makes the output more readable.

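For reference, the dump is one flat JSON list of serialized objects, each recording its model, primary key, and field values. A made-up auth.user entry for illustration:

[
  {
    "model": "auth.user",
    "pk": 1,
    "fields": {
      "username": "alice",
      "email": "alice@example.com"
    }
  }
]
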
You may want to leave out the data of particular apps or models, such as celery task logs; in that case you can use the --exclude parameter, for example:


python manage.py dumpdata --all --natural --exclude=djcelery --exclude=search.SearchLog --indent=4 > dbname.json

Step 6: load the JSON data into the PostgreSQL database

python manage.py loaddata dbname.json --database=postgresql

With that, the migration is basically over; now you just have to change your database configuration so that PostgreSQL becomes the default database:

DATABASES = {
  'default': {
    'ENGINE': 'django.db.backends.postgresql_psycopg2',
    'NAME': 'dbname',
    'USER': 'dbuser',
    'PASSWORD': 'dbpass',
    'HOST': 'postgresql.example.com',
    'PORT': '',
  }
}

In our case the database was not that clean: it was originally created for a legacy PHP codebase that we are still getting rid of step by step, so it contains tables outside of Django that the PHP program still uses. To move those over to PostgreSQL as well I used a separate conversion tool, and that part of the process was also fairly simple.
Notes
Django signals

You may want to disable Django signals while loading the data. For example, your application might email users when a database record is created; to avoid bothering them during the import, make sure such handlers are disabled while loaddata runs. One way to handle this is a decorator that checks whether the record is coming from a fixture and, if so, does not let the signal handler fire.
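
A minimal sketch of such a decorator (the usual recipe, not necessarily our exact code): it relies on the fact that loaddata saves fixture objects with raw=True, which the pre_save and post_save signals pass on to their receivers. The send_welcome_email handler below is hypothetical.

from functools import wraps

def disable_for_loaddata(signal_handler):
  """Skip the wrapped signal handler while fixtures are being loaded."""
  @wraps(signal_handler)
  def wrapper(*args, **kwargs):
    # loaddata passes raw=True to pre_save/post_save receivers
    if kwargs.get('raw'):
      return
    return signal_handler(*args, **kwargs)
  return wrapper

@disable_for_loaddata
def send_welcome_email(sender, instance, created, **kwargs):
  # hypothetical handler connected to post_save for the user model
  pass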
Constraints (such as NOT NULL, unique, and foreign keys)

Our migration ran into many problems of this kind: foreign keys pointing at rows that no longer exist in the other table, empty values in fields the model definition does not allow, duplicate values where unique constraints exist, and so on. All of these needed manual intervention; I had to clean them up with hand-written SQL statements. dumpdata and loaddata will detect such problems, so your database must be in a consistent, clean state before you migrate.
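
For example, orphaned foreign keys can be hunted down with a LEFT JOIN before dumping; the table and column names here are hypothetical:

SELECT child.id
FROM child
LEFT JOIN parent ON child.parent_id = parent.id
WHERE child.parent_id IS NOT NULL
  AND parent.id IS NULL;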

Hard-coded primary keys

This was painful: our test suite hard-coded primary keys all over the place, and many tests failed because PostgreSQL does not handle sequences the same way MySQL does. I had to modify over 700 test cases by hand; most of the changes were simple but time-consuming.
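
On a related note: after loaddata, PostgreSQL's sequences still sit at their initial values, so new inserts can collide with the imported primary keys. Django's sqlsequencereset command prints the SQL that fixes them up (appname is a placeholder; run the generated SQL against the database):

python manage.py sqlsequencereset appname --database=postgresql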

Raw SQL statements

Fortunately, we only used raw SQL in one place. Some MySQL functions do not exist in PostgreSQL, so we just swapped them for PostgreSQL functions that do the same thing.
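
A typical example of such a swap (illustrative only, not necessarily the function we actually hit):

-- MySQL only: IFNULL
SELECT IFNULL(nickname, username) FROM profiles;

-- PostgreSQL (and standard SQL): COALESCE does the same job
SELECT COALESCE(nickname, username) FROM profiles;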

Case sensitivity

String comparisons are case-sensitive in PostgreSQL but, under the default collations, not in MySQL. I also had some problems migrating the non-Django tables: an index-creation command referred to the column as id, but the actual field name was Id (with a capital I).
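
In the ORM this usually surfaces in lookups: a filter that matched regardless of case under MySQL's default collation stops matching under PostgreSQL. A sketch using the stock User model:

from django.contrib.auth.models import User

# matched 'Alice' under MySQL's case-insensitive default collation,
# but will not under PostgreSQL
User.objects.filter(username__exact='alice')

# explicitly case-insensitive; behaves the same on both databases
User.objects.filter(username__iexact='alice')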

