[java] Efficient way to do batch INSERTS with JDBC

In my app I need to do a lot of INSERTs. It's a Java app and I am using plain JDBC to execute the queries; the DB is Oracle. I have enabled batching, so it saves me the network latency of executing each query separately, but the queries still run as individual INSERT statements:

insert into some_table (col1, col2) values (val1, val2)
insert into some_table (col1, col2) values (val3, val4)
insert into some_table (col1, col2) values (val5, val6)

I was wondering if the following form of INSERT might be more efficient:

insert into some_table (col1, col2) values (val1, val2), (val3, val4), (val5, val6)

i.e. collapsing multiple INSERTs into one.

Any other tips for making batch INSERTs faster?

Tags: java, sql, performance, jdbc

Answers:


You'll have to benchmark, obviously, but over JDBC issuing multiple inserts will be much faster if you use a PreparedStatement rather than a Statement.
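
As a rough sketch of that point (not from the original answer; the insertRows helper and the String[][] row shape are made up here, while the table and column names come from the question): prepare the statement once, then bind and batch each row, so the driver can reuse the parsed statement and send the rows to the server in one round trip.

// requires java.sql.Connection, java.sql.PreparedStatement, java.sql.SQLException
static void insertRows(Connection connection, String[][] rows) throws SQLException {
    String sql = "INSERT INTO some_table (col1, col2) VALUES (?, ?)";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        for (String[] row : rows) {
            ps.setString(1, row[0]);
            ps.setString(2, row[1]);
            ps.addBatch();      // queue the row locally instead of one round trip per INSERT
        }
        ps.executeBatch();      // send all queued rows in one go
    }
}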


Though the question asks about inserting efficiently into Oracle using JDBC, I'm currently working with DB2 (on an IBM mainframe). Conceptually, inserting is similar, so I thought it might be helpful to show my metrics for:

  • inserting one record at a time

  • inserting a batch of records (very efficient)

Here are the metrics:

1) Inserting one record at a time

public void writeWithCompileQuery(int records) {
    PreparedStatement statement;

    try {
        Connection connection = getDatabaseConnection();
        connection.setAutoCommit(true);

        String compiledQuery = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME)" +
                " VALUES" + "(?, ?, ?, ?, ?)";
        statement = connection.prepareStatement(compiledQuery);

        long start = System.currentTimeMillis();

        for(int index = 1; index <= records; index++) {
            statement.setInt(1, index);
            statement.setString(2, "emp number-"+index);
            statement.setInt(3, index);
            statement.setInt(4, index);
            statement.setString(5, "username");

            long startInternal = System.currentTimeMillis();
            statement.executeUpdate();
            System.out.println("each transaction time taken = " + (System.currentTimeMillis() - startInternal) + " ms");
        }

        long end = System.currentTimeMillis();
        System.out.println("total time taken = " + (end - start) + " ms");
        System.out.println("avg total time taken = " + (end - start)/ records + " ms");

        statement.close();
        connection.close();

    } catch (SQLException ex) {
        System.err.println("SQLException information");
        while (ex != null) {
            System.err.println("Error msg: " + ex.getMessage());
            ex = ex.getNextException();
        }
    }
}

The metrics for 100 transactions:

each transaction time taken = 123 ms
each transaction time taken = 53 ms
each transaction time taken = 48 ms
each transaction time taken = 48 ms
each transaction time taken = 49 ms
each transaction time taken = 49 ms
...
..
.
each transaction time taken = 49 ms
each transaction time taken = 49 ms
total time taken = 4935 ms
avg total time taken = 49 ms

The first transaction takes around 120-150 ms, which includes parsing the query and then executing it; the subsequent transactions take only around 50 ms each. (That is still high, but my database is on a different server, so I need to troubleshoot the network.)

2) Inserting records in a batch (the efficient way), achieved with preparedStatement.executeBatch()

public int[] writeInABatchWithCompiledQuery(int records) {
    PreparedStatement preparedStatement;

    try {
        Connection connection = getDatabaseConnection();
        connection.setAutoCommit(true);

        String compiledQuery = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME)" +
                " VALUES" + "(?, ?, ?, ?, ?)";
        preparedStatement = connection.prepareStatement(compiledQuery);

        for(int index = 1; index <= records; index++) {
            preparedStatement.setInt(1, index);
            preparedStatement.setString(2, "empo number-"+index);
            preparedStatement.setInt(3, index+100);
            preparedStatement.setInt(4, index+200);
            preparedStatement.setString(5, "usernames");
            preparedStatement.addBatch();
        }

        long start = System.currentTimeMillis();
        int[] inserted = preparedStatement.executeBatch();
        long end = System.currentTimeMillis();

        System.out.println("total time taken to insert the batch = " + (end - start) + " ms");
        System.out.println("total time taken = " + (end - start)/records + " s");

        preparedStatement.close();
        connection.close();

        return inserted;

    } catch (SQLException ex) {
        System.err.println("SQLException information");
        while (ex != null) {
            System.err.println("Error msg: " + ex.getMessage());
            ex = ex.getNextException();
        }
        throw new RuntimeException("Error");
    }
}

The metrics for a batch of 100 records:

total time taken to insert the batch = 127 ms

and for a batch of 1000 records:

total time taken to insert the batch = 341 ms

So inserting 100 records drops from ~5000 ms (one transaction at a time) to ~150 ms (one batch of 100 records).

NOTE: my network is quite slow, so the absolute values are inflated, but the relative difference between the two approaches still holds.
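
One more tip beyond the measurements above (a sketch only; it reuses the TESTDB.EMPLOYEE table from this answer, and the writeInABatchInChunks helper and BATCH_SIZE value are made up): for very large inputs, flush the batch every few thousand rows with executeBatch, so the driver does not have to buffer every row in memory before sending anything.

// requires java.sql.Connection, java.sql.PreparedStatement, java.sql.SQLException
static final int BATCH_SIZE = 1000;

static void writeInABatchInChunks(Connection connection, int records) throws SQLException {
    String sql = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME) VALUES (?, ?, ?, ?, ?)";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        for (int index = 1; index <= records; index++) {
            ps.setInt(1, index);
            ps.setString(2, "emp number-" + index);
            ps.setInt(3, index + 100);
            ps.setInt(4, index + 200);
            ps.setString(5, "username");
            ps.addBatch();
            if (index % BATCH_SIZE == 0) {
                ps.executeBatch();   // flush the rows accumulated so far
            }
        }
        ps.executeBatch();           // flush whatever is left
    }
}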


In my code I have no direct access to the preparedStatement, so I cannot use batching; I just pass in the query and a list of parameters. The trick, however, is to build a variable-length INSERT statement and a LinkedList of parameters. The effect is the same as the multi-row INSERT at the top, with a variable number of parameters. See below (error checking omitted), assuming 'myTable' has three updatable fields: f1, f2 and f3.

String[] args = {"A","B","C", "X","Y","Z"}; // etc., input list of triplets
final String QUERY = "INSERT INTO [myTable] (f1,f2,f3) values ";
LinkedList<String> params = new LinkedList<>();
String comma = "";
StringBuilder q = new StringBuilder(QUERY);
for (int nl = 0; nl < args.length; nl += 3) { // args is a flat list of value triplets
    params.add(args[nl]);
    params.add(args[nl+1]);
    params.add(args[nl+2]);
    q.append(comma).append("(?,?,?)");  // one placeholder group per triplet
    comma = ",";
}
int nr = insertIntoDB(q.toString(), params);

in my DBInterface class I have:

int insertIntoDB(String query, LinkedList<String> params) throws SQLException {
    PreparedStatement preparedUPDStmt = connectionSQL.prepareStatement(query);
    int n = 1;
    for (String x : params) {
        preparedUPDStmt.setString(n++, x);
    }
    int updates = preparedUPDStmt.executeUpdate();
    preparedUPDStmt.close();
    return updates;
}

How about using the INSERT ALL statement?

INSERT ALL
  INTO table_name VALUES ()
  INTO table_name VALUES ()
  ...
SELECT statement;

I remember that the final SELECT statement is mandatory for the request to succeed (a dummy subquery such as SELECT 1 FROM DUAL is commonly used), though I don't remember why. You might also consider using a PreparedStatement instead; it has lots of advantages!
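
For illustration only (a sketch, not part of the original answer; it reuses the some_table/col1/col2 names from the question and hard-codes the values), an INSERT ALL with bind variables could be prepared and executed like this, with SELECT 1 FROM DUAL playing the role of the mandatory final SELECT:

// requires java.sql.Connection, java.sql.PreparedStatement, java.sql.SQLException
static void insertAllExample(Connection connection) throws SQLException {
    String sql =
        "INSERT ALL" +
        "  INTO some_table (col1, col2) VALUES (?, ?)" +
        "  INTO some_table (col1, col2) VALUES (?, ?)" +
        "  INTO some_table (col1, col2) VALUES (?, ?)" +
        " SELECT 1 FROM DUAL";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setString(1, "val1"); ps.setString(2, "val2");
        ps.setString(3, "val3"); ps.setString(4, "val4");
        ps.setString(5, "val5"); ps.setString(6, "val6");
        ps.executeUpdate();      // inserts all three rows with a single statement
    }
}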

Farid


This is a mix of the two previous answers:

  PreparedStatement ps = c.prepareStatement("INSERT INTO employees VALUES (?, ?)");

  ps.setString(1, "John");
  ps.setString(2,"Doe");
  ps.addBatch();

  ps.clearParameters();
  ps.setString(1, "Dave");
  ps.setString(2,"Smith");
  ps.addBatch();

  ps.clearParameters();
  int[] results = ps.executeBatch();

Using a PreparedStatement can be MUCH slower than a Statement if you only run a few iterations. To gain a performance benefit from a PreparedStatement over a Statement, you need to use it in a loop with roughly 50 or more iterations.


You can use addBatch and executeBatch for batch inserts in Java. See the example: Batch Insert In Java


SQLite: the above answers are all correct, but SQLite is a little different. Nothing helps much there; even putting the inserts in a batch (sometimes) does not improve performance. In that case, try disabling auto-commit and committing by hand once you are done. (Warning: when multiple connections write at the same time, these operations can clash.)

// connect(), yourList and compiledQuery you have to implement/define beforehand
try (Connection conn = connect()) {
    conn.setAutoCommit(false);                // commit manually once at the end
    PreparedStatement pstmt = conn.prepareStatement(compiledQuery);
    for (Object o : yourList) {
        pstmt.setString(1, o.toString());     // bind the single parameter of compiledQuery
        pstmt.executeUpdate();
        pstmt.getGeneratedKeys();             // if you need the generated keys
    }
    pstmt.close();
    conn.commit();
}

For MySQL, you can use the rewriteBatchedStatements connection parameter to make the batch insert even faster.

You can read about the parameter here: MySQL and JDBC with rewriteBatchedStatements=true
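
As a sketch of how that might be wired up (the host, database name and credentials below are placeholders): the flag is set on the MySQL Connector/J JDBC URL, and the driver then rewrites a batch of single-row INSERTs into multi-row INSERT ... VALUES (...), (...) statements on the wire.

// requires java.sql.Connection, java.sql.DriverManager, java.sql.SQLException
static Connection openConnection() throws SQLException {
    String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
    return DriverManager.getConnection(url, "user", "password");
}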


The Statement gives you the following option:

Statement stmt = con.createStatement();

stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");

// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();
