Java's new keyword, under the hood ☕️

This article looks at Java, as it is a very popular language, but also because its use of the new keyword is seemingly one of the simplest across various languages. That doesn’t mean it doesn’t have any surprising implications though! We’ll dive into the memory model of Java to understand why new acts how it does. I’ll also reference the Oracle Java Language Specification at times to disambiguate.

Ultimately we’ll build up to my favourite surprising Java fact, which is why the following piece of code acts how it does:

Wait up! ⚠️

In this article, we will refer extensively to the stack and the heap as memory concepts. Please check here for some reading links if you need to confirm your understanding before continuing.

A basic understanding of garbage collection is also required.

At a glance

new always allocates a fresh object on the heap
Both arrays and lists are objects
Primitives are never newed
Except when they’re boxed
Boxed types such as Integer cache small values

A cat with a cup of java. Thanks to Saint Sab at Unsplash

At its core, Java’s new keyword means “create a new object, place it on the heap, and give me a handle to it.”

Here I use the word object as a specific technical term: in Java, a type is either primitive or an object.

There are 8 primitives: int, float, boolean, byte, short, long, double and char. If declared in a function, a primitive will usually be stored on the stack, ready to be cleared out when the function completes.

What is an object then? An object is anything created through use of the new keyword — which will always be called on a class. This is the bread and butter of Java:

Here Dog is a class, and dog is an object.
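The snippet this paragraph refers to is not reproduced here, so the following is only a minimal sketch of the idea. The Dog class, its name field and its describe() method are hypothetical, invented purely for illustration.

```java
class NewDogDemo {
    // Hypothetical class, used only for illustration.
    static class Dog {
        private final String name;

        Dog(String name) {
            this.name = name;
        }

        String describe() {
            return name + " is a dog";
        }
    }

    static Dog makeDog() {
        // `new` creates a Dog object on the heap and hands back a reference.
        // The local variable `dog` holds that reference, not the object itself.
        Dog dog = new Dog("Rex");
        return dog;
    }

    public static void main(String[] args) {
        System.out.println(makeDog().describe()); // prints "Rex is a dog"
    }
}
```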

Object vs object ⚽

Note how I’ve been talking about objects, not Objects. Object is a class that every other class in Java must ultimately derive from. If you don’t explicitly choose a class to extend from, Object will be implicitly chosen for you. In this case, Dog extends Object.

All objects are creations of classes, which means that all objects must have all the methods of Object.
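As a rough sketch of what that means in practice, the example below declares a hypothetical Dog class with no extends clause and checks that Object is still its parent, and that Object’s methods can be called on it.

```java
class ObjectParentDemo {
    // Hypothetical class with no explicit `extends` clause.
    static class Dog { }

    static Class<?> parentOfDog() {
        // With no `extends`, the compiler makes Object the superclass.
        return Dog.class.getSuperclass();
    }

    static boolean hasObjectMethods() {
        Dog dog = new Dog();
        // equals() and hashCode() are inherited from Object.
        return dog.equals(dog) && dog.hashCode() == dog.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(parentOfDog() == Object.class); // true
        System.out.println(hasObjectMethods());            // true
    }
}
```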

Boxed primitives (int vs Integer)

Have you ever come across a piece of code where it looks like someone is calling new on a primitive? They’ll actually be calling it on a class which looks like the primitive.

You cannot new a primitive. You can, however, new a class which boxes the primitive, in this case Integer. This copies the int value into an Integer object on the heap. However, in this case the Integer is then immediately and automatically unboxed back to an int for assignment to a.

Here int is a primitive, and Integer is a class that creates an object when we call new on it.
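The original snippet is not shown here, so this is only a sketch that assumes it looked something like int a = new Integer(5). Note that the Integer(int) constructor is deprecated in modern Java; it is used below only because the passage is about calling new on a wrapper class.

```java
class NewIntegerDemo {
    // Integer(int) is deprecated; the warnings are suppressed because the
    // point of this sketch is to show `new` being called on a wrapper class.
    @SuppressWarnings({"deprecation", "removal"})
    static int boxThenUnbox() {
        // new Integer(5) creates a wrapper object on the heap. Assigning it
        // to an int variable automatically unboxes it back to a primitive.
        int a = new Integer(5);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(boxThenUnbox()); // 5
    }
}
```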

If that was a mouthful of a paragraph, let’s break it down a bit further.

Boxing (or autoboxing, when the compiler does it for you) is the practice of wrapping a primitive (like int) in a matching object (like an instance of Integer). These matching classes are known as primitive wrapper classes. This means the primitive is no longer on the stack, and is now owned by the wrapper on the heap. As such, instead of being freed when the function ends, its lifetime is now controlled by garbage collection.

Unboxing is where we convert a primitive wrapper object back to a primitive, either automatically or by calling a method like intValue() on it.

Why is boxing useful, and when would we use it?
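To make the two directions concrete, here is a small sketch showing boxing and unboxing done both explicitly and automatically. The class and method names are made up for the example.

```java
class BoxingDemo {
    static int roundTrip(int primitive) {
        Integer boxedExplicit = Integer.valueOf(primitive); // explicit boxing
        Integer boxedAuto = primitive;                      // autoboxing
        int unboxedExplicit = boxedExplicit.intValue();     // explicit unboxing
        int unboxedAuto = boxedAuto;                        // automatic unboxing
        return unboxedExplicit + unboxedAuto;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(21)); // 42
    }
}
```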

Where possible, prefer primitives: they use less memory, are more cache efficient, and are easier to reason about. An Integer typically uses 16 bytes on a 64-bit JVM, vs an int using only 4.

However, you MUST box primitives in a wrapper class when making a List of some type. You cannot have a List<int>, but you can have a List<Integer>. We’ll talk about why later.
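A minimal sketch of what this looks like in practice, assuming a plain ArrayList is used as the List implementation:

```java
import java.util.ArrayList;
import java.util.List;

class BoxedListDemo {
    static int sumOfBoxed() {
        // List<int> would not compile; the element type must be a class.
        List<Integer> numbers = new ArrayList<>();
        numbers.add(1); // each int is autoboxed to an Integer
        numbers.add(2);
        numbers.add(3);

        int total = 0;
        for (int n : numbers) { // each Integer is unboxed back to an int
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfBoxed()); // 6
    }
}
```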

Are arrays primitives, objects, or something else? 🐒

Let’s first confirm the difference between an array and a list in Java.

An array is a fixed length block of memory where we can place anything we want, as long as we know the size of it. This includes both primitives and objects. Arrays are almost always allocated on the heap, not the stack, in contrast to, say, C++.

A list is some collection of objects, where the size can grow and shrink as we add values to it and remove them. Again, objects here is used technically: you cannot have a list of primitives.

So, are arrays (or lists?) primitives, objects, or something else entirely? Take a look at the new keyword used in there — this means we can tell it’s an object.

Above we talked about objects vs Objects. An array’s parent class is Object, so it gets all the methods of that too, such as .equals() and .toString().
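The array snippet referred to above is not reproduced here, so the following sketch uses a hypothetical int array to show the same point. It also shows one consequence worth knowing: arrays inherit equals() from Object unchanged, so it compares identity rather than contents.

```java
import java.util.Arrays;

class ArrayIsObjectDemo {
    static boolean arrayIsObject() {
        int[] numbers = new int[3]; // created with `new`
        Object asObject = numbers;  // legal: every array is an Object
        return asObject instanceof int[]
                && numbers.getClass().getSuperclass() == Object.class;
    }

    static boolean equalsComparesIdentity() {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        // a.equals(b) is false because a and b are different objects;
        // Arrays.equals compares the contents instead.
        return !a.equals(b) && Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(arrayIsObject());          // true
        System.out.println(equalsComparesIdentity()); // true
    }
}
```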

What about strings?

Strings in Java are not primitives — though you do not need new to make one, the language has special support to convert a string literal to a heap allocated String.

“A string literal is a reference to an instance of class String.”
- Oracle Java Language Specification

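Note that internally, Java reuses the same String object for string literals that have the same value: the Java Language Specification guarantees that identical literals are interned and refer to the same instance. As Strings are immutable, this is a safe optimization. Strings built at run time get no such guarantee; they only share an instance if you explicitly call intern() on them.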

Why can’t I have a list of primitives?

This is mostly a legacy issue. This StackOverflow answer covers it really well.

Essentially, when making a List<Dog>, Java compiles it down to a List<Object> so it can generically perform its list operations (this is known as type erasure). All objects inherit from Object, but no primitive does.

If you want a list-like interface for primitive types, you can use a library such as Trove, which provides the TIntArrayList class.
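One way to observe this erasure is to compare the runtime classes of two lists with different element types. This is a small sketch; Dog is a hypothetical empty class.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Hypothetical class, used only as an element type.
    static class Dog { }

    static boolean sameRuntimeClass() {
        List<Dog> dogs = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // The element type is erased at compile time, so at run time
        // both lists are plain ArrayLists holding Object references.
        return dogs.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```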

Does Java do anything clever to re-use objects if I call new? 🤓

No. Every time you call new, you will always get a new fresh object. Guaranteed.

Isn’t calling new all the time really expensive?

No (and yes). Java’s entire core is built around using new when you need to, so the language is highly optimized for the scenario anyway.

Let’s quickly talk about garbage collection and the heap. In Java, our accessible heap is cut into two major sections: Young Generation, and Old Generation. (There is more nuance than this, but this is the key concept you need to know.)

When you call new, the newly created object is put into the Young Generation. This area of memory is regularly garbage collected. The belief is that most objects that are created are likely to be thrown away quickly and never used again (like temporary variables).

If an object survives enough Young Generation collections, it is moved to Old Generation. This area is garbage collected far less regularly, the idea being that if an object has stuck around for a while, it’s likely to stick around for much longer, and doesn’t need to be checked so much.

However, this doesn’t mean you can constantly call new whenever you want: remember that the fastest code is that which isn’t run. The same applies to new.

Okay, what’s going on with that ridiculous example at the beginning?

Just before, I mentioned that calling new guarantees a brand new object. This is true even of Strings, as seen in this example:

String s = "abc"; String t = "abc"; shows that both s and t point at the same point in memory. But String s1 = new String("abc") points to its own area in memory.

Notice however that s and t share the same piece of memory. It makes sense to reuse memory where we can.
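The same comparison can be written as a short sketch, using the variable names s, t and s1 from the example above:

```java
class StringIdentityDemo {
    static boolean literalsShareInstance() {
        String s = "abc";
        String t = "abc";
        return s == t; // true: both literals refer to the same String object
    }

    static boolean newStringIsDistinct() {
        String s = "abc";
        String s1 = new String("abc");
        // s1 is a separate object, although its contents are the same.
        return s != s1 && s.equals(s1);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance()); // true
        System.out.println(newStringIsDistinct());   // true
    }
}
```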

That mindset is behind the Java Integer Cache. Let’s look at the code example in full:

The key to figuring out what is happening here can be seen in the difference between line 7 and line 19. In Java, == does not mean “are these things equal?”, it means:

If these two values are primitives, do they share the same primitive value?
If these two values are objects, do they occupy the same space in memory?

As such, if we have two new Integers, they will not == each other, as we know from above that they will both be given a new piece of memory to own. Instead, we want to call the Object method, .equals(), which Integer has overridden to compare the actual value.
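A minimal sketch of that difference, using two separately created Integer objects. The deprecated Integer(int) constructor is used deliberately, since the point is what happens when new is called.

```java
class IntegerEqualityDemo {
    // Integer(int) is deprecated; it is used here only to force two
    // separate objects to be created with `new`.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean[] compare() {
        Integer a = new Integer(1000);
        Integer b = new Integer(1000);
        boolean sameReference = (a == b); // false: two separate heap objects
        boolean sameValue = a.equals(b);  // true: Integer.equals compares values
        return new boolean[] { sameReference, sameValue };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println(result[0]); // false
        System.out.println(result[1]); // true
    }
}
```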

Okay, so if that’s the case — why does Integer 127 == Integer 127? 🦒

Above, we talked about how most objects are thrown away immediately, and those that stick around are likely to stay for a long time and be used a lot.

Java holds the belief that if you’re going to use an integer number, in an overwhelming number of cases, it’ll be small (between -128 and 127). It also believes you’re likely to throw these numbers away and re-use them a lot. If you’re going to be making a large amount of predictable Integer objects, why not simply cache them?

Instead of repeatedly creating new Integer(1), which we know is 16 bytes, putting it in Young Generation and then throwing it away, wasting both CPU and memory, why not keep it around in a special part of memory?

When you write Integer a = 127;, the compiler autoboxes the value by calling Integer.valueOf(127), which first checks the Integer cache to see if an object for that value already exists. Since you did not use the new keyword, it doesn’t have to guarantee a new piece of memory!

If the value is reused, when we use == we check the memory addresses of the two values — and since they’re the same, return true.
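The article’s full example is not reproduced here, so the following is only a sketch of the cache behaviour described above. It relies on values from -128 to 127 always being cached when autoboxed. For values outside that range, == is usually false with default JVM settings, but the cache’s upper bound can be raised, so the sketch makes no promise about that result.

```java
class IntegerCacheDemo {
    static boolean smallValuesShareInstance() {
        Integer a = 127; // autoboxed via Integer.valueOf(127)
        Integer b = 127;
        return a == b;   // true: 127 is inside the guaranteed cache range
    }

    static boolean largeValuesStillEqual() {
        Integer c = 128;
        Integer d = 128;
        // c == d depends on the JVM's cache settings, so compare the
        // values with equals() rather than the references with ==.
        return c.equals(d);
    }

    public static void main(String[] args) {
        System.out.println(smallValuesShareInstance()); // true
        System.out.println(largeValuesStillEqual());    // true
    }
}
```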

Conclusion 🕯️

Java’s new keyword itself is fairly straightforward, with no hidden side effects. However, Java’s memory model and other legacy decisions can produce effects that seem surprising in the context of how unambiguous new is.

A common mistake for developers moving between Java, C++ and JavaScript is to believe that the new keyword behaves the same in each. Java has a new keyword to make the transition seem simpler for C++ developers, but there is no matching delete keyword, which is critical for C++ memory management. Similarly, JavaScript has a delete keyword but it has nothing to do with new or memory management!

Happy newing!