Function Poisoning in C++

Published September 4, 2018 - 6 Comments

Today’s guest post is written by Federico Kircheis, a (mainly C++) developer in Berlin, always looking for ways to improve himself and for interesting problems to solve. Federico talks to us about a little-known compiler feature that could have an impact on how you design code: function poisoning.

The gcc compiler has an interesting pragma that I rediscovered four years after first noticing it: #pragma GCC poison.

It works as follows: if there is an identifier that you want to prohibit in your source code, you can “poison” it, in order to get a compile error if that identifier appears in your codebase.

For example:

    #include <stdio.h>

    #pragma GCC poison puts

    int main() {
        puts("a");
    }

will not compile, and you’ll get an error message such as:

    error: attempt to use poisoned "puts"

I thought it was a nice trick, but for a long time I did not see how I could use it. After four years, I found some compelling use cases.

A seemingly useless feature

This pragma accepts a list of space-separated words. It makes no distinction between functions, macros, classes, keywords or anything else. It therefore cannot single out individual overloads, and it is unaware of namespaces.
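A minimal sketch of this token-level behavior, assuming two hypothetical namespaces old_api and new_api that both define a function called load_config (none of these names come from a real library). Once the bare identifier is poisoned, qualifying it with a namespace makes no difference, so any wrapper has to be defined before the pragma:

```cpp
#include <cassert>
#include <string>

// Hypothetical APIs, invented for this sketch.
namespace old_api {
inline int load_config(const char* text) { return text[0] - '0'; }
}
namespace new_api {
inline int load_config(const std::string& text) {
    return text.empty() ? -1 : text[0] - '0';
}
}

// Wrapper defined *before* the pragma, while the identifier is still usable.
inline int read_config(const std::string& text) {
    return new_api::load_config(text);
}

// Several identifiers can be listed, separated by spaces.
#pragma GCC poison load_config save_config

// From here on every spelling of the token is rejected, whatever the
// namespace or overload:
//   old_api::load_config("1");               // error: poisoned "load_config"
//   new_api::load_config(std::string{"1"});  // same error
```

Only read_config remains callable after the pragma, which is the effect the rest of this post builds on.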

Another downside of #pragma GCC poison is that there might be a place in our codebase where we would want to make an exception. Unfortunately, there is no way to undo the pragma locally. I hoped there would be some verbose method like

    #include <stdio.h>

    #pragma GCC poison puts

    int main() {
    #pragma GCC bless begin puts  // hypothetical: no such pragma exists
        puts("a");
    #pragma GCC bless end puts
    }

It would have made the intent clear that this place is an exception. There seems to be no way to accomplish something like that. Once an identifier gets poisoned, you cannot use it anymore.

It is possible to provide some sort of backdoor by creating, for example, an alias, or by encapsulating the identifier in another function, as long as this happens before the identifier gets poisoned:

    #include <stdio.h>

    void puts_f(const char* s) { puts(s); }
    #define puts_m puts

    #pragma GCC poison puts

    int main() {
        puts_f("s");
        puts_m("s");
    }

What I also did not realize at first is that #pragma GCC poison applies only to the current translation unit. It therefore has the same scope as a macro.

I could not see a great benefit, and so I nearly forgot that this compiler-specific feature is available.

Use cases for poisoning functions

But after leaving it for four years to collect dust in the back of my mind, I ran into use cases where function poisoning makes it possible to write more expressive and safer code. Let’s see some of them.

Even when we program mainly in C++ and not C, many libraries provide only a C interface, for example OpenSSL, zlib, the Win32 and Win64 API, system functions, and so on.

All those APIs are pure C. Most of them return error codes. They return pointers that own memory and sometimes pointers that do not; they take pointers that own memory and sometimes pointers that do not. And instead of overloads, they provide sets of functions that take arguments of different types to do the same logical thing (look, for example, at the fabs, fabsf, fabsl, cabsf, cabs, cabsl, abs, labs, … functions).

After tracking down some memory-related issues, I realized that very often, since C++ is more expressive, it would be very convenient to hide (from myself and from the people working with me) all (or just many) C functions that allocate memory, and replace them with something more RAII-friendly.

For example, consider the function:

    foo* get_foo_from_bar(bar*);

It allocates memory, but this is not clearly stated in the documentation, and you might notice it only if you already know the function, or use some memory analyzers.

Even if the function were documented very well, most of the time we read the code, not the accompanying documentation, so the allocation is still easy to overlook. The pointer could point somewhere inside the internal structure of bar, so it is not obvious from the signature of the function that we are allocating.

But even if it were obvious, because the function had a name that strongly suggests an allocation, like foo* create_foo_from_bar(bar*), we would still need to pay attention to where and how the returned value is going to be used.

It does not seem to be something difficult, but resource leaks happen all the time, especially in a big codebase.

Wouldn’t it be better if we could write our own create_foo_from_bar that returns a smart pointer such as std::unique_ptr, and ensure that get_foo_from_bar is not available? This way, creating a memory leak needs to be an explicit action.

This is where I realized I could use #pragma GCC poison.

Poisoning bad resource management

Ideally, in our code, when using a third-party library with a C interface, we would define something like

    struct foo_deleter {
        void operator()(foo* h) {
            // foo_destroy provided by the 3rd party library as function, macro, ...
            foo_destroy(h);
        }
    };

    using unique_foo = std::unique_ptr<foo, foo_deleter>;

and use it like

    // create_foo provided by the 3rd party library as function, macro, ...
    unique_foo h{create_foo()};

This way, the compiler helps us to get resource management done right. But we still need to remember, every time, to store the result of create_foo inside our unique_foo.

So let’s use #pragma GCC poison to our advantage:

    struct foo_deleter {
        void operator()(foo* h) {
            foo_destroy(h);
        }
    };

    using unique_foo = std::unique_ptr<foo, foo_deleter>;

    inline unique_foo create_unique_foo() {
        // we do not have poisoned create_foo yet!
        return unique_foo{create_foo()};
    }

    #pragma GCC poison create_foo

This way, the compiler will help us even more. And we need to remember to encapsulate the return value of create_foo only once!

    // unable to call create_foo, we can only call ...
    auto h = create_unique_foo();

Of course, we do not need #pragma GCC poison for writing create_unique_foo. We use it to enforce the usage of create_unique_foo instead of create_foo. Otherwise we would have, as before, the burden of checking manually whether we are storing owning pointers in some std::unique_ptr-like structure.
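To see the whole pattern compile on its own, here is a sketch with a stand-in for the third-party library. The foo struct, the malloc-based create_foo and foo_destroy, and the live_foos counter are assumptions made up for this illustration; a real C library would supply its own declarations in a header.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// --- Stand-in for a third-party C API (hypothetical) ---
struct foo { int value; };
static int live_foos = 0;  // number of foo objects not yet destroyed

foo* create_foo() {
    foo* f = static_cast<foo*>(std::malloc(sizeof(foo)));
    if (f) { f->value = 42; ++live_foos; }
    return f;
}
void foo_destroy(foo* f) {
    if (f) { --live_foos; std::free(f); }
}

// --- Our C++ layer ---
struct foo_deleter {
    void operator()(foo* f) const { foo_destroy(f); }
};
using unique_foo = std::unique_ptr<foo, foo_deleter>;

inline unique_foo create_unique_foo() {
    return unique_foo{create_foo()};  // last permitted use of create_foo
}

#pragma GCC poison create_foo

// Creates a foo in an inner scope and reports how many are still alive.
inline int live_foos_after_scope() {
    {
        unique_foo h = create_unique_foo();
        assert(h && h->value == 42);
    }  // h goes out of scope here, so foo_destroy runs
    return live_foos;
}
```

After the pragma, writing create_foo() anywhere in this translation unit is a compile error, so the owning wrapper is the only way left to obtain a foo.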

A minor downside of this approach is that create_unique_foo cannot be declared in a header file and implemented in a .cpp file, because once the identifier is poisoned we can no longer write the implementation. (Actually we can: we just need to ensure that the #pragma GCC poison directive does not appear in the translation unit where we define create_unique_foo.) I believe this is only a minor issue since, given our set of constraints, many functions will simply call one or more functions without adding any logic. They are therefore good candidates for inlining, even though the compiler does not decide to inline a function based on the inline keyword.

But what if we need the raw pointer returned from create_foo because we are going to pass it to a function of this external C library? And what if this function is going to take ownership of the pointer?

It means that instead of writing

    bar(create_foo());

we will need to write

    bar(create_unique_foo().release());

This has the benefit of making the intent clearer. It tells the reader that the function bar will handle the memory, and not that we might have forgotten to call foo_destroy.
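The hand-over can be sketched as follows. Everything in the stand-in API is hypothetical: here bar simply remembers the pointer it was given in a variable called adopted and destroys the previous one, which is one plausible meaning of “taking ownership”.

```cpp
#include <cassert>
#include <memory>

// --- Hypothetical stand-in for the C library ---
struct foo { int value; };
static foo* adopted = nullptr;  // the object currently owned by the library

foo* create_foo() { return new foo{7}; }
void foo_destroy(foo* f) { delete f; }
void bar(foo* f) {  // takes ownership of f
    foo_destroy(adopted);
    adopted = f;
}

// --- Our C++ layer ---
struct foo_deleter {
    void operator()(foo* f) const { foo_destroy(f); }
};
using unique_foo = std::unique_ptr<foo, foo_deleter>;
inline unique_foo create_unique_foo() { return unique_foo{create_foo()}; }

#pragma GCC poison create_foo

// release() gives up ownership explicitly; bar() assumes it.
inline int hand_over() {
    bar(create_unique_foo().release());
    return adopted->value;
}
```

The call to release() is the visible marker that ownership leaves our code at this point.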

Removing deprecated features

This is a simple one. Keywords like register no longer have any meaning in C++ (it used to, and you might still find it in some pre-C++11 codebases). Some classes and functions were also deprecated in newer standards, like std::auto_ptr, std::strstream or std::random_shuffle.

We can use #pragma GCC poison to prohibit all of them in our codebase.

And since it works on tokens, there is no need to include the definition of std::random_shuffle in order to disallow it. This means we can use #pragma GCC poison random_shuffle in every codebase, with every C++ version.

Other keywords, like throw used as an exception specification, were mostly deprecated too. However, throw is also used for throwing exceptions, so we cannot poison it.
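As a sketch, a project-wide ban list for deprecated names could look like the following. The selection of identifiers and the helper function shuffled are only examples. The standard headers are included before the pragma because they may still mention the old names themselves.

```cpp
// Standard headers first: they may still declare the deprecated names.
#include <algorithm>
#include <cassert>
#include <memory>
#include <random>
#include <vector>

// Ban list (the selection is only an example).
#pragma GCC poison auto_ptr random_shuffle strstream

// std::shuffle is the standard replacement for std::random_shuffle.
inline std::vector<int> shuffled(std::vector<int> values, unsigned seed) {
    std::mt19937 engine{seed};
    std::shuffle(values.begin(), values.end(), engine);
    return values;
}

// Both of these would now be rejected:
//   std::random_shuffle(v.begin(), v.end());  // error: poisoned "random_shuffle"
//   std::auto_ptr<int> p;                     // error: poisoned "auto_ptr"
```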

Improving type-safety

Resource management is not the only place where the C++ programming language is more expressive compared to C. Writing generic functions is another area where in C++ we have better tools at our disposal. It would be possible, for example, to prohibit std::qsort in favor of std::sort, std::bsearch in favor of std::binary_search or other algorithms and functions like std::copy over std::memcpy.

Yes, poisoning something from the standard library seems like a bold move. But in our codebase, we do not have the same backward-compatibility concerns that the ISO C++ committee has, and we want to improve the quality of our code and reduce the chance of making common errors.
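To make the type-safety argument concrete, here is a small side-by-side sketch (the function names are made up for this example). The qsort comparator receives untyped pointers and has to cast them back, while std::sort knows the element type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: nothing stops us from casting to the wrong type here,
// or from passing the wrong element size to qsort below.
inline int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

inline void sort_c_style(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
}

// C++ style: element type and size are deduced by the compiler.
inline void sort_cpp_style(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}

// A project adopting the ban would place, after all of its includes:
//   #pragma GCC poison qsort bsearch
```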

For example, one of the most common errors with memset is writing memset(&t, sizeof(t), 0) instead of memset(&t, 0, sizeof(t)). And since memset takes a void*, it is possible to pass a pointer to the wrong kind of data (something that is not trivially copyable), which leads to undefined behavior. This error could be caught at compile time, but it is not.

Consider this fillmem function that could replace memset in a safer way:

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <memory>
    #include <type_traits>

    template <class T>
    void fillmem(T* t, int val, std::size_t size) {
        static_assert(std::is_trivially_copyable<T>::value, "will trigger UB when calling memset on it");
        std::memset(t, val, size);
    }

    template <typename T, class = typename std::enable_if<!std::is_pointer<T>::value>::type>
    void fillmem(T& t, int val = 0, std::size_t size = sizeof(T)) {
        static_assert(std::is_trivially_copyable<T>::value, "will trigger UB when calling memset on it");
        assert(size <= sizeof(T));
        fillmem(std::addressof(t), val, size);
    }

    // catches calls where value and size have been swapped
    template <class T>
    void fillmem(T&, std::size_t, int) = delete;

    #pragma GCC poison memset

The advantages of fillmem are that, like bzero (even though it has been deprecated), it reduces the chance of making mistakes, and that it makes the most common operation simple.
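A usage sketch may help to see what the three overloads buy us. The definitions are repeated so that the snippet compiles on its own. The point struct and the demo function are assumptions added for illustration, and the poison pragma is left out here to keep the example independent of include order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

template <class T>
void fillmem(T* t, int val, std::size_t size) {
    static_assert(std::is_trivially_copyable<T>::value, "will trigger UB when calling memset on it");
    std::memset(t, val, size);
}

template <typename T, class = typename std::enable_if<!std::is_pointer<T>::value>::type>
void fillmem(T& t, int val = 0, std::size_t size = sizeof(T)) {
    static_assert(std::is_trivially_copyable<T>::value, "will trigger UB when calling memset on it");
    assert(size <= sizeof(T));
    fillmem(std::addressof(t), val, size);
}

template <class T>
void fillmem(T&, std::size_t, int) = delete;

struct point { int x; int y; };  // hypothetical example type

inline bool demo() {
    point p{1, 2};
    fillmem(p);          // most common case: zero the whole object

    unsigned char buf[4];
    fillmem(buf, 0xFF);  // the size defaults to sizeof(buf)

    // fillmem(p, sizeof(p), 0);  // swapped arguments select the deleted overload
    return p.x == 0 && p.y == 0 && buf[0] == 0xFF && buf[3] == 0xFF;
}
```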

Actually there is no need to use memset for implementing fillmem. You can use an STL algorithm instead, such as std::fill_n:

    // The pointer overload is declared first so that the call inside the
    // reference overload finds it for every T, including built-in types.
    template <class T>
    void fillmem(T* t, int val, std::size_t size) {
        static_assert(std::is_trivially_copyable<T>::value, "will trigger UB when calling memset on it");
        std::fill_n(reinterpret_cast<unsigned char*>(t), size, val);
    }

    template <typename T, class = typename std::enable_if<!std::is_pointer<T>::value>::type>
    void fillmem(T& t, int val = 0, std::size_t size = sizeof(T)) {
        static_assert(std::is_trivially_copyable<T>::value, "will trigger UB when calling memset on it");
        assert(size <= sizeof(T));
        fillmem(&t, val, size);
    }

    template <class T>
    void fillmem(T&, std::size_t, int) = delete;

With optimizations enabled (even just -O1), GCC and clang generate exactly the same assembly for both versions. Since passing a null pointer to std::memset is not allowed (even with size == 0), using std::fill_n ensures consistent and defined behaviour on all platforms.

The same holds for std::memcpy and std::memmove.

And just to make one thing clear: I do not think there are any valid use cases for any of the std::mem* functions. They can all be replaced by a standard algorithm or language construct. For example, instead of writing:

    struct foo {
        // some data
    };

    foo f;
    std::memset(&f, 0, sizeof(f));

we should directly write:

    struct foo {
        // some data
    };

    foo f{};

And therefore we wouldn’t even have to provide alternatives like fillmem to those functions.
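Two such replacements, sketched with made-up types and function names: value-initialization instead of memset, and std::copy instead of memcpy.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct settings {  // hypothetical aggregate
    int width;
    int height;
    double scale;
};

// Instead of: settings s; std::memset(&s, 0, sizeof(s));
inline settings zeroed_settings() {
    return settings{};  // value-initialization zeroes every member
}

// Instead of: std::memcpy(dst.data(), src.data(), sizeof(src));
inline std::array<int, 4> copy_of(const std::array<int, 4>& src) {
    std::array<int, 4> dst{};
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}
```

Neither function needs a size in bytes, which removes one of the classic sources of error with the mem* family.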

A more general concept: banning a function

Since no one can ever use a function again after it has been poisoned, we need to provide an alternative that suits all needs. Otherwise, it will lead to unmaintainable code. There should never be a reason to use the old functions. Never.

We need to provide a pretty strong guarantee.

I’ve tried to come up with some guidelines in order to avoid banning functions that would later turn out to be necessary.

This is the banning policy I’m using to decide whether I might want to ban a function from my codebase:

You might ban a function f if there exists a strict replacement, or if there are no valid use cases for f.

I’m using the term “ban” and not “poison” because I do not want to restrict myself to the compiler-specific pragma. Banning a function might simply mean deleting it, if it is a function that we wrote ourselves. It does not always have to be something coming from an external library.

It is also always possible to resort to external tools to ensure that a function is not used in our codebase. A simple script calling grep might do the job in many cases, even if you need to pay attention to comments and to code that does not get compiled, or only conditionally.

The banning policy is not very clear when it says “no valid use cases” and “strict replacement” (“strict replacement” is a term I made up, more on that later). The problem is that it is very difficult to list all valid use cases, and those also depend on the environment.

Some issues that might be relevant, but are, strictly speaking, not part of the programming language:

compile-time constraints (additional include header, linking, …)
non-conforming compilers
size of generated executables (you might prefer void* over a template, or qsort over std::sort to try to reduce it)
documentation
and surely other things too

Whether a use case is valid or not depends on your project and goals. I tried to come up with a definition of “strict replacement”, to provide a guideline for when it is safe to ban a function in favor of another.

A function g is a strict replacement of a function f of a library l if

g provides clear benefits over f.
g can act as a drop-in replacement for f, which means
- it can interact with the library l without writing more than one line of glue code that has no particular drawbacks.
- updating f to g in the working codebase is a trivial operation.
- the cost of removing f is not too high.
g does not have any drawback compared to f, in particular
- it does not add any measurable runtime overhead compared to f.
- it does not add any new dependency
- it cannot be less type-safe, exception-safe or thread-safe
- it cannot introduce new kinds of programming errors
g does not reduce readability or hide intent compared to f
- there is no need to document what g does, since it should do the same as f; only its benefits need documenting, if those are not clear to everyone

And therefore, if g is a strict replacement of f, we can apply the banning policy on f in our codebase.

A non-strict replacement could be a g for which not every point holds, but only some of them. It might need a new dependency, have some overhead and so on, but it might be acceptable for the current project, where other properties are more important.

For example, std::unique_ptr is very often a strict replacement for owning raw pointers that satisfies all those constraints:

it is compatible with the surrounding code since the contained pointer is accessible.
it is orthogonal to the error strategy.
it has been designed with the zero-overhead principle in mind.
it’s part of the standard library, so it incurs no additional dependency (even if an additional include might be necessary).
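The points above can be checked on a small sketch. The names free_deleter, unique_cstr and duplicate are invented for this example. The size check relies on how mainstream standard libraries store a stateless deleter, which is an implementation detail and not a guarantee of the standard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter for memory obtained from std::malloc.
struct free_deleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using unique_cstr = std::unique_ptr<char, free_deleter>;

// Assumption: holds for the common implementations, not mandated by the standard.
static_assert(sizeof(unique_cstr) == sizeof(char*), "no size overhead expected");

// Returns an owning copy of a C string; the result is empty if malloc fails.
inline unique_cstr duplicate(const char* text) {
    const std::size_t size = std::strlen(text) + 1;
    unique_cstr copy{static_cast<char*>(std::malloc(size))};
    if (copy) {
        std::copy_n(text, size, copy.get());
    }
    return copy;
}
```

The contained pointer stays accessible through get(), so the result can still be passed to C functions such as std::strlen, and a null result can be turned into whatever error strategy the project uses.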

Possible drawbacks of function poisoning

So, function poisoning works. It has its advantages, but also some drawbacks. Here are three of them:

a) It is not a standard feature, and as such, it is not implemented on all compilers. MSVC, for example, does not seem to have an equivalent functionality.

That’s unfortunate, because with the Windows API such a technique would be very valuable. Maybe there are other compiler-specific techniques to get a similar behavior that I do not know of (please drop a comment if you know one!).

b) The error message is correct, but far from ideal. It explains that an identifier has been poisoned, but not where and why the identifier has been poisoned.

Therefore, if you are in some project that you do not know very well, you might have some difficulty finding the function that you should use instead of create_foo().

c) As already mentioned, this pragma works on identifiers, and has no notion of functions, classes or namespaces. This means that it is not possible to prohibit only some overloads, or only functions from a specific namespace.

This is not a problem when working with C interfaces, where you want to provide a better C++ alternative, but if you are dealing with C++ code you might want to consider fixing the offending functions. Indeed, without overloading and namespaces, poisoning is arguably easier to use on C functions.
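For C++ code that we own, one possible way of “fixing the offending function” is to delete just the unwanted overload, which the pragma cannot express. This is only a sketch: the namespace project and the function length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace project {  // hypothetical namespace

// The overload we want to keep.
inline std::size_t length(const std::string& s) { return s.size(); }

// The overload we want to ban: only this signature is rejected,
// and the compiler reports it as a use of a deleted function.
std::size_t length(const char*) = delete;

}  // namespace project

// project::length("abc");               // error: use of deleted function
// project::length(std::string{"abc"});  // fine
```

Unlike poisoning, this respects namespaces and overload resolution, but it requires being able to edit the declarations.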

Where we should not use function poisoning

What we have done is change the public API of a library (standard or third-party, it does not matter). This can be risky because we are not in control of that API. As long as those changes are limited to our project, it provides some benefits, and the possible issues are limited.

The worst that can happen when using #pragma GCC poison is that some code won’t compile. It means that we need to change that code (we can, it’s ours, because the changes were limited to our project), or that we need to delete the pragma. And if we remove the pragma, we lose some compile-time guarantees, but the meaning of code that compiles does not change.

The use of function poisoning needs to be local to our project! You do not want to tell people that are going to use your library that they need to adapt their codebase because you have deleted some functions of another library that they are using too!

For example, the following snippet won’t compile:

    #pragma GCC poison new
    #include <memory>

new is used inside <memory>, at least for providing std::make_unique and std::make_shared. We can avoid this problem by including <memory> before our pragma. Additional includes will then work because of the header guard, since the pragma does not take into account code that won’t get compiled, i.e. both

    #include <memory>
    #pragma GCC poison new
    #include <memory>

and

    #pragma GCC poison foo
    #if 0
    int foo;
    #endif

will compile.

There are some exceptions, for example <cassert> has no header guards, but otherwise it will work with the majority of headers, even if they are using the #pragma once extension.

Nevertheless, the solution is very brittle, since other system headers might be using new and they have not been included yet. Our code might fail to compile again. Since we want to ban an identifier from our codebase, and not from the system headers or third-party libraries or clients that will use our library, it’s better to just keep this policy local.
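One way to keep the policy local is to put the pragma after all includes of each source file. A sketch of such a layout, where make_counter is a made-up function and the choice of headers is only an example:

```cpp
// All system and third-party headers come first ...
#include <cassert>
#include <memory>

// ... and the project-local ban comes last, so that it only affects
// the code written below this line.
#pragma GCC poison new

// std::make_unique still works: its definition was read before the pragma.
inline std::unique_ptr<int> make_counter(int start) {
    return std::make_unique<int>(start);
}

// int* raw = new int{0};  // error: attempt to use poisoned "new"
```

Any header included after the pragma that mentions the poisoned identifier would break the build, which is exactly the brittleness described above.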

Apart from that, in order to avoid confusion and complaints inside your codebase, refer to the ban and strict replacement policy: there should never be a reason to use the old API.

Last, but not least: If you are following such a guideline, and are working with a library that provides a C++ interface, you might contact the author and propose your alternative functions. This way you’ll not need to maintain an alternate API for your third-party libraries.

When working with a C library, it might not be possible to do the same, since lots of the techniques we can use to enhance an API (destructor, overloads, …) are not available to the C language, but you might be able to convince the library author to provide a tiny C++ wrapper.

Federico Kircheis is a (mainly C++) developer in Berlin, always looking for ways to improve himself and for interesting problems to solve.