Query optimization with Hibernate

5 min readMar 25, 2022

By: David Vellé & Nelson Jumbo

ORM’s are widely used today because they help abstracting from relational data models and focus on the code, however, as anything in software, comes with some trade-offs. In this article, we are going to focus on some optimization problems that you can run into when using one of the most popular implementations for Java projects: Hibernate.

Before we start, we want to clarify that this is not a general tips and tricks about hibernate (there are already great articles about that topic). We are going to expand on some points that may be controversial, or at least opinionated because there are developers with a strong opinion in favor or against the use of ORMs.

Why is optimization important?

We are living in a time with “infinite” cloud resources and using an ORM that helps us to not think about the underlying database. So, why do I have to care about optimizing anything? As you are probably aware, resources are not really infinite (even in the cloud!), particularly relational databases struggle to scale horizontally, and sometimes the abstraction makes wrong assumptions and do really non performant queries.

In an environment where your application is suffering from insufficient resources it becomes really important to understand what Hibernate is doing behind the scenes; that will help to identify low performant queries.

On the other hand, having a more proactive approach also could help, like having guidelines about how hibernate works and what practices will decrease the application performance, so we have that as a baseline that will help to avoid such problems in the future.

Batch updates

Imagine you have to ingest a CSV file with 600k rows, read all of them and then insert the data in your local relational database, usually you would do this by sending the data row by row to the database, but this will cause a roundtrip to the database for every row, increasing the data ingestion time.

Doing batch updates is a good way to improve performance by picking small chunks of data, like 100 rows on each iteration and sending that chunk altogether to the database, reducing the round trips to the database.

Hibernate supports them but even if we configure it correctly this feature could be silently disabled if you are using a GenerationType.Identity for your @Id.

So, how do I re-enable them?

Use a different Identity generation strategy. Only GenerationType.Identity will disable the batch updates, so any of the other GenerationTypes will work. The trade-off is that GenerationType.Identity creates an autoincrement column on the database, and that’s usually more efficient than other strategies like using a table or a sequence. The recommendation would be to use:

GenerationType.SEQUENCE: Hibernate will create a new sequence in the Database for you; however you can also create it by yourself.
GenerationType.TABLE: Hibernate will create a table in the Database that keeps track of the next primary key value.

@Entitypublic class AnEntity {@Id@GeneratedValue(strategy = GenerationType.SEQUENCE)@Column(name = “id”)protected Long id;}

Remove the @GeneratedValue annotation. This will disable hibernate’s @Id generation capabilities so you will need to take care of it manually, for example:

Querying a native sequence from the database: This would work pretty much like using GenerationType.IDENTITY but you need to query the sequence manually before doing an insert.
Using an autoincrement column: Like GenerationType.Identity, the database auto increments the id, so you don’t need to keep track of its value, the main difference with GenerationType.Identity is that you no longer get the id automatically after an insert, so if you really need it you’ll need to do it manually. Of course the good news is that this approach does not disable batching.

@Entitypublic class AnEntity {@Id@Column(name = “id”)protected Long id;}

Nullable attributes

When we declare an attribute of an entity by using Java Wrappers like Integer, Long, Double instead of their primitives, we have to deal with nullable values.

This is true for Hibernate too, so we can declare the same column using primitives or their Object counterpart:

@Entitypublic class AnEntity {@Columnprivate long primitiveColumn;@Columnprivate Long objectColumn;}

Nullable attributes can interfere with Hibernate’s query generation, making the database not behave correctly and make full scans.

Why does this happen?

Query statements that use foreign keys when not using custom @Query annotations can lead to this particular problem.

@Entitypublic class AnEntity {@OneToManyprivate Long anotherEntityId;}public interface AnEntityRepository {findByAnotherEntityId(Long anotherEntityId);}

Hibernate tends to compile this type of queries using a LEFT OUTER JOIN using a null value, triggering a full scan on both tables.

SELECT * FROM an_entities aLEFT OUTER JOIN another_entities b ON a.anotherEntityId = b.idWHERE a.anotherEntityId = null;

How do I prevent it?

Check that every parameter that is going to be used in your query is NOT NULL before executing it.

Use primitives instead of objects whenever possible so you don’t have to deal with nulls.

Wrapping up

As seen, sometimes dealing with performance in relational databases using Hibernate can be hard and you have to keep in mind how this ORM is working behind the hood so you can make the appropriate optimizations.

If Hibernate does not fit the necessities of your project, there are other options that you can try likeSpring Data JDBC.

Spring Data JDBC is part of the Spring Data project, it brings you the ability to do custom queries in pure SQL, without having to worry about JPA. The main trade-off is that you need to do the mapping between the Database Result Sets and your POJOS by yourself:

public class jdbcRepositoryImpl {@AutowiredJdbcTemplate jdbcTemplate;public AnObject customQuery() {jdbcTemplate.query(“SELECT * FROM an_object_table”, (resultSet, rowNum) -> {return new AnObject(…);});}}

Also, if you only need simple queries (like CRUD), Spring Data JDBC gives you some utilities to do it without code, pretty similar to what you would do with Hibernate.

public interface jdbcRepository extends CrudRepository<AnObject, Long> {findByAnObjectParameter(Integer anObjectParameter);}

As you can see, using Spring Data JDBC is somewhere in between an ORM like Hibernate and pure low level JDBC giving you more options for your projects.

References

Spring Data JDBC
Java Persistence with Hibernate Second Edition by Christian Bauer, Gavin King, and Gary Gregory

Disclaimer: the statements and opinions expressed in this article are those of the authors and do not necessarily reflect the stances of Thoughtworks.

Want to join Thoughtworks Spain? Apply for our vacancies on our website https://thght.works/3F3T4JA