At a glance
This article looks at Java: it is a very popular language, and its use of the new keyword is one of the seemingly simplest across various languages. That doesn’t mean it doesn’t have any surprising implications though! We’ll dive into the memory model of Java to understand why new acts how it does. I’ll also reference the Oracle Java Language Specification at times to disambiguate.
Ultimately we’ll build up to my favourite surprising Java fact, which is why the following piece of code acts how it does:
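As a minimal sketch of that kind of code (the class and method names are illustrative, not taken from the original listing):

```java
class IntegerSurprise {
    // Two Integers autoboxed from the same small constant, compared by reference.
    static boolean smallValuesShareReference() {
        Integer a = 127;
        Integer b = 127;
        return a == b;
    }

    // The same comparison, one step outside the range -128 to 127.
    static boolean largeValuesShareReference() {
        Integer c = 128;
        Integer d = 128;
        return c == d;
    }

    public static void main(String[] args) {
        System.out.println(smallValuesShareReference()); // prints true
        // Typically false on a default JVM, but the language does not guarantee it.
        System.out.println(largeValuesShareReference());
    }
}
```

The only difference between the two methods is the number being boxed, yet the comparisons usually disagree.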
Wait up! ⚠️
In this article, we will be referring extensively to the stack and the heap as memory concepts. Please check here for some reading links if you need to confirm your understanding before continuing.
A basic understanding of garbage collection is also required.
- new always allocates a fresh object on the heap
- Both arrays and lists are objects
- Primitives are never newed
  - Except when they’re boxed
- Boxed types such as Integer cache small values
At its core, Java’s new keyword means “create a new object, place it on the heap, and give me a handle to it.”
Here I use the word object as a specific technical term: in Java, a type is either primitive or an object.
There are 8 primitives: int, float, boolean, byte, short, long, double and char. If declared in a function, a primitive will usually be stored on the stack, ready to be cleared out when the function completes.
What is an object then? An object is anything created through use of the new keyword, which will always be called on a class. This is the bread and butter of Java:
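A minimal sketch, assuming a hypothetical Dog class with a single name field (the field, the getter and the name "Rex" are made up for illustration):

```java
class Dog {
    private final String name;

    Dog(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class DogDemo {
    public static void main(String[] args) {
        Dog dog = new Dog("Rex");                  // new creates a Dog object on the heap
        System.out.println(dog.getName());         // prints Rex
        System.out.println(dog instanceof Object); // prints true
    }
}
```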
Here Dog is a class, and dog is an object.
Note how I’ve been talking about objects, not Objects. Object is a class that every other class in Java must ultimately derive from. If you don’t explicitly choose a class to extend from, Object will be implicitly chosen for you. In this case, Dog extends Object.
All objects are instances of classes, which means that all objects must have all the methods of Object.
Have you ever come across a piece of code where it looks like someone is calling new on a primitive? They’ll actually be calling it on a class which looks like the primitive.
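Code of that shape might look like the sketch below. The class and method names are illustrative. Note that the Integer(int) constructor has been deprecated since Java 9 in favour of Integer.valueOf, so a modern compiler will warn about it unless the warning is suppressed.

```java
class NewOnWrapper {
    // Integer(int) is deprecated; it is used here only to show new applied to a wrapper class.
    @SuppressWarnings({"deprecation", "removal"})
    static int boxThenUnbox(int value) {
        int a = new Integer(value); // creates an Integer object, then auto-unboxes it into a
        return a;
    }

    public static void main(String[] args) {
        System.out.println(boxThenUnbox(5)); // prints 5
    }
}
```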
You cannot new a primitive. You can, however, new a class which boxes the primitive: in this case, Integer. This copies the value of the int into a new Integer object on the heap. However, in this case we then immediately and automatically unbox the Integer back to an int for assignment to a.
Here int is a primitive, and Integer is a class that creates an object when we call new on it.
If that was a mouthful of a paragraph, let’s break it down a bit further.
Boxing (or autoboxing, when the compiler does it for you) is the practice of wrapping a primitive (like int) in a matching object (like an instance of Integer). These are known as primitive wrapper classes. This means the value no longer lives only on the stack: a copy is now owned by the wrapper on the heap. As such, instead of being freed when the function ends, its lifetime is now controlled by garbage collection.
Unboxing is where we convert a primitive wrapper object back to a primitive, either automatically, or by calling a method like intValue() on it.
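Both directions can be written explicitly or left to the compiler. A minimal sketch (the class and method names are illustrative):

```java
class BoxingDemo {
    // Explicit boxing: wrap the primitive in an Integer.
    static Integer box(int primitive) {
        return Integer.valueOf(primitive);
    }

    // Explicit unboxing: read the primitive back out of the wrapper.
    static int unbox(Integer wrapper) {
        return wrapper.intValue();
    }

    public static void main(String[] args) {
        Integer boxed = 42;   // autoboxing: the compiler inserts the boxing for us
        int unboxed = boxed;  // auto-unboxing: the compiler inserts the unboxing for us

        System.out.println(box(7));       // prints 7
        System.out.println(unbox(boxed)); // prints 42
        System.out.println(unboxed);      // prints 42
    }
}
```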
Where possible, prefer primitives: they use less memory, are more cache efficient, and are easier to reason about. An Integer typically takes 16 bytes on a 64-bit JVM, vs an int using only 4.
However, you MUST box primitives in a wrapper class when making a List of some type. You cannot have a List<int>, but you can have a List<Integer>. We’ll talk about why later.
Let’s first confirm the difference between an array and a list in Java.
An array is a fixed length block of memory where we can place anything we want, as long as we know the size of it. This includes both primitives and objects. Arrays are almost always allocated on the heap, not the stack — this is in contrast to say C++.
A list is a collection of objects whose size can grow and shrink as we add and remove values. Again, objects here is used technically: you cannot have a list of primitives.
So, are arrays (or lists?) primitives, objects, or something else entirely?
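A minimal sketch of how each is typically created (the class, method and variable names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class ArrayAndListDemo {
    static int[] makeArray(int length) {
        return new int[length]; // fixed length; every element starts at 0
    }

    static List<String> makeList() {
        return new ArrayList<>(); // starts empty and grows as values are added
    }

    public static void main(String[] args) {
        int[] numbers = makeArray(3);
        List<String> names = makeList();
        names.add("Rex");

        Object arrayAsObject = numbers; // compiles: an array can be treated as an Object
        Object listAsObject = names;    // compiles: so can a list

        System.out.println(numbers.length);                     // prints 3
        System.out.println(names.size());                       // prints 1
        System.out.println(arrayAsObject.getClass().isArray()); // prints true
        System.out.println(listAsObject instanceof List);       // prints true
    }
}
```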
Take a look at the new keyword used in there: this tells us it’s an object.
Above we talked about objects vs Objects. An array’s parent class is Object, so it gets all the methods of that too, such as equals() and toString().
Strings in Java are not primitives. Though you do not need new to make one, the language has special support to convert a string literal to a heap-allocated String.
“A string literal is a reference to an instance of class String.”
- Oracle Java Language Specification
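Note that internally, the JVM can reuse the same piece of memory when two or more Strings have the same value. As Strings are immutable, this is a safe optimization. For string literals this reuse is guaranteed, because the specification requires identical literals to refer to the same String instance. For Strings built at run time it is not, unless you call intern() yourself.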
So why can’t you have a List of primitives? This is mostly a legacy issue. This StackOverflow answer covers it really well.
Essentially, when you make a List<Dog>, the compiler erases the type parameter, so at run time the list simply holds Object references and can perform its operations generically. All objects inherit from Object; primitives do not.
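One way to observe this erasure is to compare the runtime classes of two lists with different element types. A minimal sketch (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Compares the runtime classes of two lists, whatever their element types.
    static boolean sameRuntimeClass(List<?> first, List<?> second) {
        return first.getClass() == second.getClass();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        // Both are plain ArrayList at run time; the element type is gone.
        System.out.println(sameRuntimeClass(names, numbers)); // prints true
    }
}
```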
If you want a list-like interface over primitive types, you can use a library such as Trove, which provides classes like TIntArrayList.
Can new ever hand back an object that already exists? No. Every time you call new, you will always get a fresh object. Guaranteed.
Is new slow, then? No (and yes). Java’s entire core is built around using new when you need to, so the language is highly optimized for the scenario anyway.
Let’s quickly talk about garbage collection and the heap. In Java, our accessible heap is cut into two major sections: Young Generation, and Old Generation. (There is more nuance than this, but this is the key concept you need to know.)
When you call new, the newly created object is put into the Young Generation. This area of memory is regularly garbage collected. The belief is that most objects that are created are likely to be thrown away quickly and never used again (like temporary variables).
If an object is not evicted from Young Generation after garbage collection, it is moved to Old Generation. This is garbage collected far less regularly — the idea being that if an object has stuck around for a while, it’s likely to stick around for much longer, and doesn’t need to be checked so much.
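If you are curious how your own JVM divides its heap, the standard management API can list the heap memory pools. This is only a sketch: the class name is illustrative, and the pool names it prints depend entirely on which garbage collector the JVM is running, so they may not map neatly onto the two generations described above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapPoolsDemo {
    // Collects the names of the memory pools that belong to the heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JVM and collector, so no expected output is shown here.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```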
However, this doesn’t mean you can constantly call new whenever you want: remember that the fastest code is that which isn’t run. The same applies to new.
Just before, I mentioned that calling new guarantees a brand new object. This is true even of Strings, as seen in this example:
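A minimal sketch of such an example (the class and method names, and the third variable u, are illustrative):

```java
class StringIdentityDemo {
    // Two identical literals refer to the same String instance.
    static boolean literalsShareInstance() {
        String s = "hello";
        String t = "hello";
        return s == t;
    }

    // new String(...) always produces a distinct object with equal contents.
    static boolean newStringIsDistinct() {
        String s = "hello";
        String u = new String("hello");
        return s != u && s.equals(u);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance()); // prints true
        System.out.println(newStringIsDistinct());   // prints true
    }
}
```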
Notice however that s and t share the same piece of memory. It makes sense to reuse memory where we can.
That mindset is behind the Java Integer Cache. Let’s look at the code example in full:
The key to figuring out what is happening here can be seen in the difference between line 7 and line 19. In Java, == does not mean “are these things equal?”, it means:
- If these two values are primitives, do they share the same primitive value?
- If these two values are objects, do they occupy the same space in memory?
As such, if we have two new Integers, they will not == each other, as we know from above that they will both be given a new piece of memory to own. Instead, we want to call the Object method, .equals(), which Integer has overridden to compare the actual value.
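The two rules, and the equals() alternative, can be sketched as follows. The class and method names are illustrative, and the deprecated Integer(int) constructor is used deliberately, purely to force two distinct objects.

```java
class EqualityDemo {
    // Two separately new-ed Integers, compared by reference.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean sameReference(int value) {
        Integer first = new Integer(value);
        Integer second = new Integer(value);
        return first == second;
    }

    // The same two objects, compared by the int value they wrap.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean sameValue(int value) {
        Integer first = new Integer(value);
        Integer second = new Integer(value);
        return first.equals(second);
    }

    public static void main(String[] args) {
        int x = 5;
        int y = 5;
        System.out.println(x == y);           // prints true: primitives compare by value
        System.out.println(sameReference(5)); // prints false: two different objects
        System.out.println(sameValue(5));     // prints true: equals() compares the values
    }
}
```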
Above, we talked about how most objects are thrown away immediately, and those that stick around are likely to stay for a long time and be used a lot.
Java holds the belief that if you’re going to use an integer number, in an overwhelming number of cases, it’ll be small (between -128 and 127). It also believes you’re likely to throw these numbers away and re-use them a lot. If you’re going to be making a large number of predictable Integer objects, why not simply cache them?
Instead of remaking new Integer(1), which we know takes about 16 bytes, putting it in Young Generation many times, then throwing it away, wasting both CPU and memory, why not keep it around in a special part of memory?
When you write Integer a = 127;, Java first checks its Integer cache to see if an object for that value already exists. Since you did not use the new keyword, it doesn’t have to guarantee a new piece of memory!
If the value is reused, when we use == we check the memory addresses of the two values, and since they’re the same, return true.
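In practice, the compiler turns an assignment like Integer a = 127; into a call to Integer.valueOf(127), and that method is where the cache lives. Its documentation promises caching for -128 to 127 and allows, but does not require, caching beyond that. A minimal sketch of probing it directly (the class and method names are illustrative):

```java
class IntegerCacheDemo {
    // Asks valueOf for the same value twice and compares the references.
    static boolean valueOfReturnsSameObject(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(valueOfReturnsSameObject(127));  // prints true
        System.out.println(valueOfReturnsSameObject(-128)); // prints true
        // Outside the documented range: typically false, but a JVM may cache more.
        System.out.println(valueOfReturnsSameObject(128));
    }
}
```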
Java’s new keyword itself is fairly straightforward, with no hidden side effects. However, Java’s memory model and other legacy decisions can produce effects that seem surprising in the context of how unambiguous new is.
A common mistake for developers moving between Java, C++ and JavaScript is to believe that the new keyword works the same way in each. Java has a new keyword to make the transition seem simpler for C++ developers; however, there is no matching delete keyword, which is critical for C++ memory management. Similarly, JavaScript has a delete keyword but it has nothing to do with new or memory management!
Happy newing!