I wrestled with this for some time. The problem lies not in how to load the data, but in how to construct the table to hold it: you must generate a DDL statement to build the table before you can import the data. That becomes particularly tedious if the table has a large number of columns.
Here's a Python script that (almost) does the job:
#!/usr/bin/env python3
import sys
import csv

# get the base file name (and hence table name) from the command line;
# exit with a usage message if no suitable argument was given
if len(sys.argv) < 2:
    sys.exit('Usage: ' + sys.argv[0] + ': input CSV base name (without .csv)')
ifile = sys.argv[1]

# emit the standard invocation
print('create table ' + ifile + ' (')

with open(ifile + '.csv', newline='') as inputfile:
    reader = csv.DictReader(inputfile)
    # only the header matters here, so look at the first row and stop
    for row in reader:
        for item in row.keys():
            # note: every column, including the last, gets a trailing comma (see below)
            print('`' + item + '` TEXT,')
        break
print(')\n')
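For example, given a hypothetical people.csv whose header row is id,name,email, saving the script as, say, csv2ddl.py and running ./csv2ddl.py people prints:

create table people (
`id` TEXT,
`name` TEXT,
`email` TEXT,
)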
The problem it leaves to solve is that the final column definition is still terminated with a comma, and the MySQL parser won't tolerate that.
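One way to close that gap (a minimal sketch, assuming Python 3 and the same base-name convention as above) is to take the column names from the reader's fieldnames and join the definitions, so no trailing comma is ever emitted:

#!/usr/bin/env python3
import sys
import csv

if len(sys.argv) < 2:
    sys.exit('Usage: ' + sys.argv[0] + ': input CSV base name (without .csv)')
table = sys.argv[1]

with open(table + '.csv', newline='') as inputfile:
    # DictReader exposes the header row, in file order, as fieldnames
    columns = csv.DictReader(inputfile).fieldnames

# joining the definitions puts commas only between columns, never after the last
body = ',\n'.join('`' + col + '` TEXT' for col in columns)
print('create table `' + table + '` (\n' + body + '\n);')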
Of course the script also uses the TEXT data type for every column, which is its own problem. The obvious alternative isn't free, though: with several hundred columns, declaring them all as VARCHAR(64) can push the row past MySQL's maximum row size, whereas TEXT columns count only a few bytes each toward that limit.
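If blanket TEXT really isn't acceptable, one way to do better (a rough, illustrative sketch, not part of the original script; the 1000-row sample, the BIGINT/DOUBLE choices, and the VARCHAR widths are all assumptions) is to sample the data and guess a type per column:

#!/usr/bin/env python3
import sys
import csv

def guess_type(values):
    # ignore blanks and missing values; an all-blank column stays TEXT
    values = [v for v in values if v]
    if not values:
        return 'TEXT'
    if all(v.lstrip('-').isdigit() for v in values):
        return 'BIGINT'
    try:
        for v in values:
            float(v)
        return 'DOUBLE'
    except ValueError:
        pass
    longest = max(len(v) for v in values)
    # use the longest sampled value as the width; later rows may well exceed it
    return 'VARCHAR(%d)' % longest if longest <= 255 else 'TEXT'

if len(sys.argv) < 2:
    sys.exit('Usage: ' + sys.argv[0] + ': input CSV base name (without .csv)')
table = sys.argv[1]

with open(table + '.csv', newline='') as f:
    reader = csv.DictReader(f)
    columns = reader.fieldnames
    # only look at the first 1000 data rows
    sample = [row for _, row in zip(range(1000), reader)]

defs = ',\n'.join('`%s` %s' % (c, guess_type([row[c] for row in sample])) for c in columns)
print('create table `%s` (\n%s\n);' % (table, defs))

Sampling keeps this fast on large files, at the cost of occasionally guessing a column narrower than the data that follows it.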
This approach also breaks down once you hit MySQL's maximum column count (a hard limit of 4096 columns per table, and lower still for InnoDB). That's when it's time to move to Hive or HBase if you are able.