Why did dlclose not unload the library?

December 9, 2023

Recently, at work, we were debugging an issue where dlclose was not unloading the library. You might wonder why that even matters - if your library contains any global symbols, then their state will persist across a dlopen, dlclose sequence. In other words, your library isn’t starting from a clean state every time.

In the case we were debugging, this issue manifested in the following fashion: We had two libraries, libA and libB, and libA dynamically depends on libB. When you dlopen libA, it also implicitly loads libB since it is a dependency. So when you dlclose libA, you would expect not only libA to be unloaded but also libB (since it was brought in purely due to dlopening libA).

Instead, what was strangely happening was that libA was being unloaded from the process address space but not libB. The next time you load libA & it talks to libB, libA is starting from a clean slate but libB is not. This caused an initialization function in libB to fail since it was already initialized.
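To make the failure concrete, here is a minimal sketch of the kind of one-shot initialization guard that breaks when a library stays mapped. The names g_initialized and libb_init are hypothetical; the real initialization code in libB is not shown in this post and may look quite different.

```cpp
// Hypothetical stand-in for libB's global state. It lives in the library's
// data segment, so it is only reset if the library is really unmapped and
// mapped again.
static bool g_initialized = false;

// Hypothetical init entry point: returns 0 on success and -1 if the library
// believes it has already been initialized.
int libb_init() {
    if (g_initialized) {
        return -1;  // second call without an unload in between
    }
    g_initialized = true;
    return 0;
}
```

If libB is never unloaded, the freshly reloaded libA calls libb_init a second time in the same process and gets the error path.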

To give you a little bit more context, libA is a Rust library. libB is a C++ library. We’re dlopen’ing libA from another C++ codebase progA. What made this issue even more mysterious was that when we turned on logging through an environment variable in libA, the issue stopped happening. Ah, a Heisenbug!

Skip down to The dlclose unmap condition if you just want to know about dlclose. But first, the story!

A debugging tale

It all started with some strange symptoms - a Rust construction routine was running multiple times in libA. But it was wrapped in a lazy_static, so it should only be initialized once!

The suspicion was immediately on the library somehow being loaded multiple times. Though only one copy of the symbols should be present due to the way dlopen works, the various failure cases around visibility (including modes like RTLD_LOCAL) were always a bit confusing to me. We first started by confirming that the addresses of the static in both calls are the same. Normally, if you had two copies of the symbols being updated & out of sync, you would expect the addresses to be different. But the addresses were the same.

After a couple of false starts, we put a breakpoint in dlopen & lo & behold, the library was indeed being dlopen’ed multiple times. That still didn’t explain why it was crashing inside the initialization routine in libB.

Now the next question was whether libA was being dlclose’ed between the dlopen calls. Indeed, it was. That explains why the static initialization routine was running multiple times - the library had been unloaded & so was starting from a clean slate.

But surely then all of its dependencies would also be starting from a clean slate? It seems they’re not, as libB remembers that it has been initialized.

I wrote a quick program to confirm this suspicion - all it does is dlopen libA, call dlclose on it & then try to obtain a symbol from both libA and libB using dlsym.

The symbol from libA doesn’t resolve but the symbol from libB does. This indicates that libA was unloaded but libB was not. Clearly, there’s something weird going on. I later discovered that there’s a much easier way to get this information & much more: the LD_DEBUG environment variable.
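A probe along these lines can be sketched as follows. This is not the original program: instead of dlsym, it asks the loader directly whether a library is still mapped by using RTLD_NOLOAD, which makes dlopen return a handle only if the library is already loaded. The helper names is_resident and probe_after_close are made up for illustration, and "libA.so" / "libB.so" in the usage note are placeholders for the real file names.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Returns true if `name` is currently loaded in this process.
// With RTLD_NOLOAD, dlopen never loads anything new: it returns a handle
// only when the library is already resident, and NULL otherwise.
bool is_resident(const char* name) {
    void* handle = dlopen(name, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == nullptr) {
        return false;
    }
    dlclose(handle);  // drop the extra reference the probe itself just took
    return true;
}

// Opens and immediately closes `lib`, then reports whether `lib` and its
// dependency `dep` are still resident. Returns false if `lib` cannot be
// opened at all.
bool probe_after_close(const char* lib, const char* dep) {
    void* handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", lib, err ? err : "unknown error");
        return false;
    }
    dlclose(handle);
    std::printf("%s resident after dlclose: %s\n", lib, is_resident(lib) ? "yes" : "no");
    std::printf("%s resident after dlclose: %s\n", dep, is_resident(dep) ? "yes" : "no");
    return true;
}
```

Calling probe_after_close("libA.so", "libB.so") from main would, in the situation described here, be expected to report "no" for libA and "yes" for libB. On older glibc versions you may need to link with -ldl.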

After reading various Stack Overflow answers, putting a breakpoint in dlclose & tracing through the unload path, I learned some interesting facts about when a library is not unloaded even after dlclose.

The dlclose unmap condition

Now, POSIX does not guarantee that dlclose unloads the library. There’s the obvious case where the library is still being used by something else (i.e. its reference count was greater than 1). But what else can prevent a library from being unloaded? This snippet of code in glibc summarizes the conditions:

  /* Check whether this object is still used.  */
      if (l->l_type == lt_loaded
	  && l->l_direct_opencount == 0
	  && !l->l_nodelete_active
	  /* See CONCURRENCY NOTES in cxa_thread_atexit_impl.c to know why
	     acquire is sufficient and correct.  */
	  && atomic_load_acquire (&l->l_tls_dtor_count) == 0
	  && !l->l_map_used)

Namely, the library won’t be unloaded if:

Its reference count was greater than 1, so something else still holds it open.
The NODELETE flag is set on the library - this can be specified in the ELF file itself using a linker flag (-z nodelete) or passed as a flag to dlopen (RTLD_NODELETE). Interestingly, these are not the only cases: For example, if you’re using C++ and have a templated static or a static inside an inline function, then that symbol is marked as STB_GNU_UNIQUE. Any library containing a symbol with this marker is automatically marked as NODELETE.

libstdc++.so contains many such symbols, which is why it is marked as NODELETE. This means that once libstdc++.so is loaded into a process’ address space, it cannot be unloaded even after dlclose.

Interestingly, it seems this marker also ignores the RTLD_LOCAL flag given when you dlopen a library - it will load those symbols globally.

You can use nm to inspect the binary - if any symbol is marked with ‘u’ (lowercase), it is marked as STB_GNU_UNIQUE.
The library has registered thread local storage destructors that have not yet run - This was actually the reason for the bug. Initially I thought that, since libB used libstdc++.so & libstdc++.so was marked as NODELETE due to the presence of STB_GNU_UNIQUE symbols, the flag somehow propagated over to libB & it too was marked as NODELETE. That was not the case - tracing through dlclose showed that libB was not marked as NODELETE. It did, however, have thread local storage destructors registered.

When are thread local storage destructors run? When the thread exits. libB (which was written in C++) was calling into some Rust functions which further called into some Rust library functions, which use thread local storage & so register a thread local storage destructor using __cxa_thread_atexit_impl.

On issuing dlclose to libA, there’s nothing that makes the threads that ran libB’s code exit. libA doesn’t have any such destructors registered so it is unloaded normally, but libB is not, as it has pending destructors (which would never run as the thread never exits).
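The timing of thread local storage destructors can be seen in a small C++ analogue. In the story the destructors came from Rust code called by libB; the sketch below uses a C++ thread_local object instead, and the names Tracker, use_tls and destroyed_after_worker are made up for illustration. With GCC or Clang on glibc, this kind of registration typically ends up in __cxa_thread_atexit_impl, which is what bumps the counter that the glibc check reads.

```cpp
#include <atomic>
#include <thread>

// Counts how many Tracker objects have been destroyed so far.
std::atomic<int> g_destroyed{0};

// A type with a non-trivial destructor, so a thread_local instance of it
// needs a per-thread destructor to be registered.
struct Tracker {
    ~Tracker() { g_destroyed.fetch_add(1); }
    void touch() {}
};

void use_tls() {
    // The first pass through this declaration on a given thread initializes
    // the object and registers its destructor to run when that thread exits.
    thread_local Tracker tracker;
    tracker.touch();
}

// Runs use_tls on a short-lived worker thread and returns how many Tracker
// destructors ran as a result of that thread exiting.
int destroyed_after_worker() {
    const int before = g_destroyed.load();
    std::thread worker(use_tls);
    worker.join();  // the destructor runs during thread exit, before join returns
    return g_destroyed.load() - before;
}
```

The destructor only runs because the worker thread exits. A long-lived thread that had called use_tls would keep its destructor pending indefinitely, which is the situation libB was in.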

There’s still one more thing - why did it all work when logging was enabled through an environment variable? Well, the crate used for logging by libA (env_logger) actually uses thread local storage & so registers a TLS destructor. So libA also doesn’t get unloaded on dlclose, which means the state of libA & libB stays consistent.

If you’re facing similar problems, do use the LD_DEBUG envvar to debug the behavior of the dynamic loader (LD_DEBUG=help lists the available categories). It tells you when it loads a library, where it is searching for the library & when it unloads the library. It even tells you if a library is being marked as NODELETE for some reason. Note that it doesn’t tell you anything about thread local storage destructors, so you should put a breakpoint in _dl_close in glibc to see whether you have any such destructors registered & pending.