
Java vs C++: Trading UB for Semantic Memory Leaks (Same Problem, Different Punishment for Failure)

Hoca

C++ vs Java: UB vs Semantic Memory Leaks

For a long while, quite a few people (mostly from academia and/or Java programming teams) faithfully believed in a horrible misperception along the lines of “Garbage-collected programs cannot possibly leak memory” (or at the very least along the lines of “it is fundamentally more difficult to have a memory leak in a garbage-collected program”, which the public readily translates into the former) [GC-FAQ][C2-GC]. This is in spite of the fact that memory leaks in Java were discussed at least as early as 1999 [Lycklama99], and are often discussed in about the same places as the misperception above [C2-MemoryLeaksGC].

However, the reality of {most|quite a few|some}1 real-world Java programs being horrible memory-eaters over time was knocking on the door more and more persistently, and by 2017 at least the opinion leaders had come to the understanding that [Sor17][Paraschiv17][Java8docs.MemLeaks][etc. etc. etc.]

there ARE memory leaks in Java

1 pick one depending on the camp you’re in, but don’t forget about Eclipse and OpenHAB



Syntactic vs Semantic Memory Leaks​


The problem with the misperception above comes from a subtle difference between what is known as “syntactic memory leaks” and “semantic memory leaks” (named “loiterers” in [Lycklama99]). Sure, any half-decent garbage collector will ensure that unreachable objects are cleaned up2; however, while all unreachable objects are useless,

not all useless objects are unreachable

It is fairly common to call objects which are unreachable but still present in the program syntactic memory leaks, and objects which are useless but still reachable semantic memory leaks.

So far so good, but now we have to observe that as an end-user of the program, I do not care about unreachability – not at all; what I do care about is the program not going into swap after half a day of use. And as practice shows, even with all the unreachable objects being removed (i.e. even if there are no syntactic memory leaks), those semantic memory leaks can easily cause that dreaded swapping.
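To make “useless but still reachable” concrete, here is a minimal sketch of a stack whose pop leaves a stale reference behind in its backing array. The class and method names (LoiteringStack, popLeaky, popClean, staleSlot) are made up for illustration and do not come from any of the referenced sources.

```java
import java.util.Arrays;

// Hypothetical illustration of a loiterer; bounds checks are omitted for brevity.
final class LoiteringStack {
    private Object[] slots = new Object[4];
    private int size = 0;

    void push(Object item) {
        if (size == slots.length) {
            slots = Arrays.copyOf(slots, size * 2);
        }
        slots[size++] = item;
    }

    // Leaky: the popped object is useless to the stack, yet slots[size]
    // still refers to it, so the collector has to keep it alive.
    Object popLeaky() {
        return slots[--size];
    }

    // Clean: clearing the slot drops the stack's own reference, so the
    // object becomes collectable once the caller lets go of it too.
    Object popClean() {
        Object item = slots[--size];
        slots[size] = null;
        return item;
    }

    // Exposed only so the demo can inspect the slot just above the top.
    Object staleSlot() {
        return slots[size];
    }
}

final class LoiteringStackDemo {
    public static void main(String[] args) {
        LoiteringStack stack = new LoiteringStack();

        stack.push(new byte[1_000_000]);
        stack.popLeaky();
        System.out.println("after popLeaky, slot still set: " + (stack.staleSlot() != null));

        stack.push(new byte[1_000_000]);
        stack.popClean();
        System.out.println("after popClean, slot still set: " + (stack.staleSlot() != null));
    }
}
```

After popLeaky() the megabyte array can never be used through the stack again, but it stays reachable from the backing array until that slot happens to be overwritten; this is exactly a useless-but-reachable object.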


2 actually, it is “are eventually cleaned up”, but in the true spirit of being nice to those-already-suffering we will forget about this word “eventually” for the time being



Semantic Memory Leaks in Java​


There are quite a few common scenarios in which memory leaks can appear in Java (see, for example, the classification in [Lycklama99]), but most of them3 boil down either to forgetting to remove a reference-to-an-item from some collection, or to forgetting to set a no-longer-needed reference to null. Indeed, if we keep something-useless within a collection, or keep a reference to a no-longer-needed object without any chance of using this reference again – we do have a semantic memory leak.
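The first of these two scenarios (an entry that is never removed from a collection) can be sketched as follows. SessionRegistry and its methods are hypothetical names invented for this example; the only point is that a “close” which forgets the remove() leaves every buffer reachable via the map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry which keeps one buffer per open session.
final class SessionRegistry {
    private final Map<String, byte[]> buffers = new HashMap<>();

    void open(String sessionId) {
        buffers.put(sessionId, new byte[1024]);
    }

    // Leaky: the session is over, but its entry (and buffer) stays in the map.
    void closeLeaky(String sessionId) {
        // nothing is removed here, which is the bug being illustrated
    }

    // Clean: removing the entry is the collection-level analogue of "x = null".
    void close(String sessionId) {
        buffers.remove(sessionId);
    }

    int liveBuffers() {
        return buffers.size();
    }
}

final class SessionRegistryDemo {
    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        for (int i = 0; i < 100; i++) {
            registry.open("session-" + i);
        }

        for (int i = 0; i < 100; i++) {
            registry.closeLeaky("session-" + i);
        }
        System.out.println("buffers after leaky close: " + registry.liveBuffers());

        for (int i = 0; i < 100; i++) {
            registry.close("session-" + i);
        }
        System.out.println("buffers after proper close: " + registry.liveBuffers());
    }
}
```

In a long-running server the leaky variant grows with every session ever opened, while no garbage collector is allowed to touch any of those buffers.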

Some authors tend to oversimplify the latter problem to something like “hey, let’s just be careful with mutable static data”; however, with all due disrespect to mutable static/global data (yes, this includes singletons), I have to say that the problem of semantic memory leaks is NOT restricted to statics (in fact, statics are just a special case of an existing-but-never-used reference). For example, even if I put the non-nulled reference onto the stack, it won’t be released until the stack frame holding it is gone – which, depending on the application, can easily last pretty much until death of the app do us part.

One such example is an object with a reference held by the main() function. More generally – as soon as we have any kind of top-level loop, such as an event loop – all the objects held for us by the event loop, including all the objects reachable via references coming from any such object, DO need their references null’ed manually to prevent them from becoming semantic memory leaks.
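A rough sketch of this event-loop case, with no statics anywhere: the hypothetical EventLoop below owns a scratch buffer, and because the loop object lives for as long as the application runs, the buffer stays reachable until some event null’s it explicitly. All names here are assumptions made for the sake of the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical top-level loop; anything it references lives as long as it does.
final class EventLoop {
    private final Queue<Runnable> events = new ArrayDeque<>();
    private byte[] scratch; // loop-owned state, not static

    void post(Runnable event) {
        events.add(event);
    }

    void loadScratch() {
        scratch = new byte[1_000_000];
    }

    // Manual release: without this call the buffer loiters for the
    // whole lifetime of the loop, even if nobody ever reads it again.
    void dropScratch() {
        scratch = null;
    }

    boolean holdsScratch() {
        return scratch != null;
    }

    void runUntilEmpty() {
        Runnable event;
        while ((event = events.poll()) != null) {
            event.run();
        }
    }
}

final class EventLoopDemo {
    public static void main(String[] args) {
        EventLoop loop = new EventLoop();

        loop.post(loop::loadScratch);
        loop.runUntilEmpty();
        System.out.println("scratch held after load: " + loop.holdsScratch());

        loop.post(loop::dropScratch);
        loop.runUntilEmpty();
        System.out.println("scratch held after drop: " + loop.holdsScratch());
    }
}
```

A real loop would block waiting for events rather than return when the queue is empty; the early return is only there to keep the demo finite.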


3 saving for JVM peculiarities or esoteric stuff such as ClassLoaders



What about C/C++?​


So, in Java, we DO need to use x = null; to avoid semantic memory leaks. But this is an exact equivalent of the explicit delete which we have to do in C/C++(!), albeit for a different reason (to avoid dangling pointers)!

Let’s compare the following three pieces of code:


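//pre-C++11 C++
#include <stddef.h> //NULL (nullptr is not available before C++11)
#include <stdint.h> //uint8_t

struct State {
    uint8_t* data;

    //without this, ~State() could delete[] an uninitialized pointer
    State() : data(NULL) {}

    void addData() {
        data = new uint8_t[1000000];
        //do something with data
    }
    void removeData() {
        delete [] data;
        data = NULL;//(*)
    }
    ~State() {
        delete [] data;//deleting a null pointer is a no-op
    }
};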



//post-C++11 C++ (make_unique and digit separators need C++14)
#include <cstdint>
#include <memory>

struct State {
    std::unique_ptr<uint8_t[]> data;

    void addData() {
        data = std::make_unique<uint8_t[]>(1'000'000);
        //do something with data
    }
    void removeData() {
        data.reset();//(*)
    }
};



//JAVA
class State {
    byte[] data;

    void addData() {
        data = new byte[1000000];
        //do something with data
    }
    void removeData() {
        data = null;//(*)
    }
}

From my current perspective, these three pieces of code are semantically identical (i.e. the only difference is in syntax – which TBH is not too different either).

Are They Really Identical? Well, Not Exactly…​


In spite of these striking similarities between what can be seen as “safe and memory-leak-free code” under two supposedly-very-different-in-this-regard programming languages, there is still a major difference.

Specifically, if we forget to assign null to data in the line marked with (*) (or to call reset() for post-C++11 C++), the effects will be different:

  • pre-C++11 C++ punishes accessing already-deleted data (or deleting it a second time, as ~State() would do here) with Undefined Behavior (UB) – which in this case will translate at best into a crash <ouch! />, and at worst into data being corrupted <double-ouch! />
  • Java is significantly more lenient in this regard, and forgotten data = null is punished only with the semantic memory leak.
    • OTOH, it is this lenience which leads to Java programs with semantic memory leaks being ubiquitous: a C++ program which crashes has an obvious bug, which is much more likely to be fixed than a Java program with a semantic memory leak (among other things, memory leaks are often not obvious until somebody runs the program for many hours – which rarely happens in routine testing 🙁).
    • Moreover, in Java there is a chance that an instance of some other class still refers to data even after we null’ed it here. From what I have seen, such hidden references are a major source of semantic memory leaks in complicated real-world Java programs.
  • post-C++11 C++ behaves much more like Java in this regard.
    • It is still quite different from Java because C++’s unique_ptr<> is guaranteed to be the only owning reference to the data object. This eliminates those Java-like hidden references, which in turn greatly reduces the chances of us having a semantic memory leak. However, under C++ any such hidden (non-owning) reference will become a dangling pointer, causing once again the dreaded UB/crash/memory corruption <ouch! />.
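The hidden-reference case from the Java bullet above can be sketched as follows. State mirrors the Java snippet earlier in this post; Preview is a hypothetical second class, invented here, which quietly keeps its own reference to the same array.

```java
// Mirrors the Java State class shown earlier in this post.
final class State {
    byte[] data;

    void addData() {
        data = new byte[1_000_000];
    }

    void removeData() {
        data = null; // (*) done correctly here, and still not enough
    }
}

// Hypothetical class holding a hidden second reference to the same array.
final class Preview {
    private final byte[] source;

    Preview(byte[] source) {
        this.source = source;
    }

    int sourceLength() {
        return source.length;
    }
}

final class HiddenReferenceDemo {
    public static void main(String[] args) {
        State state = new State();
        state.addData();
        Preview preview = new Preview(state.data);

        state.removeData();

        // State has let go, but the megabyte stays reachable via Preview.
        System.out.println("state.data is null: " + (state.data == null));
        System.out.println("bytes still reachable via preview: " + preview.sourceLength());
    }
}
```

In C++ with a raw pointer inside the equivalent of Preview, the same sequence would leave that pointer dangling after reset(); in Java it silently keeps the array alive instead.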

Summary​


Attempting to summarize my ranting above:

  • Code which can be considered ‘good’ memory-wise (is safe both from crashes and memory leaks) is strikingly similar under C++ and Java.
    • Yes, contrary to what-lots-of-the-books tend to tell us, even when programming in Java we DO have to think about memory management (hey, one can argue that data = null IS manual memory management).
  • However, IF we deviate from such ‘good’ code practices, different programming languages will punish us differently (in C++ it can be a crash or memory corruption, in Java it can be a semantic memory leak).
  • In other words, when moving from C++ into Java we’re trading crashes for memory leaks.
    • OTOH, as memory leaks are not AS obvious as crashes, they have a tendency to survive longer (often MUCH longer). In other words, when moving from C++ to Java, we tend to trade A FEW crashes for A LOT of memory leaks, which BTW tends to be consistent with whatever personal experience / anecdotal evidence I have. I am not going to argue whether it is a good trade-off or not; what is IMNSHO more important is that the semantics of good code are about the same regardless of the Java/C++ choice. Dixi.

References​



[GC-FAQ] “GC FAQ”

[C2-GC] “Garbage Collection”

[Lycklama99] Ed Lycklama, “Does Java™ Technology Have Memory Leaks?”

[C2-MemoryLeaksGC] “Memory Leak Using Garbage Collection”

[Sor17] Vladimir Sor, “Memory Leaks: Fallacies and Misconceptions”

[Paraschiv17] Eugen Paraschiv, “How Memory Leaks Happen in a Java Application”

[Java8docs.MemLeaks] “Debug a Memory Leak Using Java Flight Recorder”, Oracle

[etc. etc. etc.] Google for 'Java memory leak'


Acknowledgement​


Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.
