One of the ways to import data into MySQL is using the LOAD DATA INFILE. It is a faster method than recovering from a dump, as it’s raw data instead of SQL sentences.
The import time depends on the table engine, for example, MyISAM can be 40 times faster than Innodb. Let’s benchmark this:
Preparation
I’m gonna make some benchmarking using MySQL 5.1.36 (64 bits MacOS X). I’ll need a big table, so I’ll take City from the World Database and create a huge table called “city_huge”:
CREATE TABLE city_huge LIKE CITY;
INSERT INTO city_huge
SELECT NULL, name, CountryCode, District, Population FROM city;
# Run this sentence 100 times,
# so city_huge table will be 100 times bigger than city.
# Tip: use a script, temporary table, stored procedure...
# or tell your monkey to do so.
SELECT COUNT(*) FROM city_huge;
# +----------+
# | COUNT(*) |
# +----------+
# | 407900 |
# +----------+
# Make a table data backup:
SELECT * FROM city_huge INTO OUTFILE 'city_huge.bak';
# Truncate table, so we'll start with an empty table.
TRUNCATE TABLE city_huge;
Direct Import
Let’s import the backup into the city_huge table, using MyISAM, InnoDB and MEMORY:
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
# Query OK, ... (5.85 sec)
# So, that was using MyISAM.
# Let's empty the table and change the engine to InnoDB:
TRUNCATE TABLE city_huge;
ALTER TABLE city_huge ENGINE = InnoDB;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
# Query OK, ... (3 min 59.53 sec)
# With Memory:
TRUNCATE TABLE city_huge;
SET @@max_heap_size= 128 * 1024 * 1024;
ALTER TABLE city_huge ENGINE = MEMORY;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
# Query OK, ... (2.18 sec)
MyISAM0:5.85InnoDB3:59.53MEMORY0:2.18
Wow, MyISAM is almost 40 times faster. And MEMORY is even faster.
Alter Table
Ok, InnoDB is a bit slow, but sometimes you can’t use another storage engine. In those cases, you could import in the other engine and then change the table engine to InnoDB.
That would look like this:
TRUNCATE TABLE city_huge;
ALTER TABLE city_huge ENGINE = MyISAM;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
# Query OK, ... (5.85 sec)
ALTER TABLE city_huge ENGINE = InnoDB;
# Query OK, ... (4 min 11.24 sec)
Ooops, 4 min 17 sec is more than 3 min 59 sec.
Let’s try Memory:
TRUNCATE TABLE city_huge;
SET @@max_heap_size= 128 * 1024 * 1024;
ALTER TABLE city_huge ENGINE = MEMORY;
LOAD DATA INFILE 'city_huge.bak' INTO TABLE city_huge;
# Query OK, ... (2.18 sec)
ALTER TABLE city_huge ENGINE = InnoDB;
# Query OK, ... (3 min 28.39 sec)
Yes, 3 min 31 sec is faster than 4 min. 30 seconds are around 10% faster.
Disclaimer
This benchmark is done using the default configuration, I’m sure that tuning InnoDB will improve the results. Also, I’m not the most accurate benchmarker, so I encourage you to do your own benchmarks.
This solution is not a silver bullet, MEMORY engine needs lots of memory. And this is NOT always the best approach, InnoDB should be fast by it’s own.
Anyway, its funny to discover those not so obvious behaviours :D.