[c++] What are Aggregates and PODs and how/why are they special?

In C++11, POD was basically split into two different axes: triviality and layout. Triviality is about the relationship between an object's conceptual value and the bits of data within its storage. Layout is about... well, the layout of an object's subobjects. Only class types have layout, while all types have triviality relationships.

So here is what the triviality axis is about:

  1. Non-trivially copyable: The value of objects of such types may be more than just the binary data that are stored directly within the object.

    For example, unique_ptr<T> stores a T*; that is the totality of the binary data within the object. But that's not the totality of the value of a unique_ptr<T>. A unique_ptr<T> stores either a nullptr or a pointer to an object whose lifetime is managed by the unique_ptr<T> instance. That management is part of the value of a unique_ptr<T>. And that value is not part of the binary data of the object; it is created by the various member functions of that object.

    For example, to assign nullptr to a unique_ptr<T> is to do more than just change the bits stored in the object. Such an assignment must destroy any object managed by the unique_ptr. Manipulating the internal storage of a unique_ptr without going through its member functions would damage this mechanism: changing its internal T* without destroying the object it currently manages would violate the conceptual value that the object possesses.

  2. Trivially copyable: The value of such an object is exactly and only the contents of its binary storage. This is what makes it reasonable to allow copying that binary storage to be equivalent to copying the object itself.

    The specific rules that define trivial copyability (trivial destructor, trivial/deleted copy/move constructors/assignment) are what is required for a type to be binary-value-only. An object's destructor can participate in defining the "value" of an object, as in the case with unique_ptr. If that destructor is trivial, then it doesn't participate in defining the object's value.

    Specialized copy/move operations also can participate in an object's value. unique_ptr's move constructor modifies the source of the move operation by null-ing it out. This is what ensures that the value of a unique_ptr is unique. Trivial copy/move operations mean that such object value shenanigans are not being played, so the object's value can only be the binary data it stores.

  3. Trivial: An object of such a type is considered to have a functional value for any bits that it stores. Trivial copyability defines the meaning of an object's data store as being just that data. But a type that is merely trivially copyable can still control how data gets there (to some extent): it can have default member initializers and/or a default constructor that ensures that a particular member always has a particular value. The conceptual value of such an object can thus be restricted to a subset of the binary data that it could store. A trivial type gives up that control too.

    Performing default initialization on a type that has a trivial default constructor will leave that object with completely uninitialized values. As such, a type with a trivial default constructor is logically valid with any binary data in its data storage.
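The three categories above can be probed with the standard type traits. What follows is only a sketch, assuming a C++17 compiler; Plain, Defaulted, copy_bytes and check_copy_bytes are hypothetical names made up for illustration, and "trivial" is spelled out as "trivially copyable plus trivially default constructible" rather than taken from a single trait.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical types, for illustration only.
struct Plain {       // trivial: any bit pattern is a usable value
    int x;
    float y;
};

struct Defaulted {   // trivially copyable, but not trivial:
    int x = 42;      // the default member initializers make the
    float y = 1.0f;  // default constructor non-trivial
};

static_assert(std::is_trivially_copyable_v<Plain>);
static_assert(std::is_trivially_default_constructible_v<Plain>);

static_assert(std::is_trivially_copyable_v<Defaulted>);
static_assert(!std::is_trivially_default_constructible_v<Defaulted>);

// unique_ptr's value includes ownership, so it is not trivially copyable.
static_assert(!std::is_trivially_copyable_v<std::unique_ptr<int>>);

// For a trivially copyable type, copying the bytes copies the value.
inline Defaulted copy_bytes(const Defaulted& src) {
    Defaulted dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// Small self-check of the claim above.
inline void check_copy_bytes() {
    Defaulted a;
    a.x = 7;
    Defaulted b = copy_bytes(a);
    assert(b.x == 7 && b.y == a.y);
}
```

Doing the same memcpy on a unique_ptr<T> would duplicate the stored T* without duplicating the ownership, which is exactly the kind of damage described in item 1.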

The layout axis is really quite simple. Compilers are given a lot of leeway in deciding how the subobjects of a class are stored within the class's storage. However, there are some cases where this leeway is not necessary, and having more rigid ordering guarantees is useful.

Such types are standard layout types. The C++ standard doesn't actually say much about what that layout is specifically. It basically says three things about standard layout types:

  1. The first subobject is at the same address as the object itself.

  2. You can use offsetof to get a byte offset from the outer object to one of its member subobjects.

  3. unions get to play some games with accessing subobjects through an inactive member of a union if the active member is (at least partially) using the same layout as the inactive one being accessed.
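Those three guarantees can be sketched in code. This is a minimal illustration assuming a C++17 compiler; Header, Packet, Either and the helper functions are hypothetical names, not anything the standard defines.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical standard layout types sharing a common initial sequence (int tag).
struct Header {
    int tag;
};

struct Packet {
    int tag;
    double payload;
};

static_assert(std::is_standard_layout_v<Header>);
static_assert(std::is_standard_layout_v<Packet>);

// 1. The first member lives at the same address as the object.
inline bool first_member_at_object_address(const Packet& p) {
    return static_cast<const void*>(&p) == static_cast<const void*>(&p.tag);
}

// 2. offsetof yields the byte offset of a member within the object.
inline bool payload_found_via_offsetof(const Packet& p) {
    const auto* bytes = reinterpret_cast<const unsigned char*>(&p);
    return bytes + offsetof(Packet, payload) ==
           reinterpret_cast<const unsigned char*>(&p.payload);
}

// 3. Through a union, the common initial sequence may be read
//    via the inactive member.
union Either {
    Header header;
    Packet packet;
};

inline int tag_through_inactive_member(int tag) {
    Either e;
    e.packet = Packet{tag, 0.0};  // packet is now the active member
    return e.header.tag;          // header is inactive, but tag is shared
}

// Small self-check of the three points.
inline void check_layout_guarantees() {
    Packet p{1, 2.0};
    assert(first_member_at_object_address(p));
    assert(payload_found_via_offsetof(p));
    assert(tag_through_inactive_member(9) == 9);
}
```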

Compilers generally permit standard layout objects to map to struct types with the same members in C. But there is no statement of that in the C++ standard; that's just what compilers feel like doing.

POD is basically a useless term at this point. It is just the intersection of triviality (the value is only its binary data, and any binary data is a valid value) and standard layout (the order of its subobjects is more well-defined). One can infer from such things that the type is C-like and could map to similar C objects. But the standard has no statements to that effect.
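Since the standard defines a POD class as one that is both trivial and standard layout, the two axes can be checked separately. The following is a sketch assuming C++17; is_pod_like, CLike and Mixed are hypothetical names, and "trivial" is again spelled out with two traits.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical helper: the two axes combined.
template <class T>
constexpr bool is_pod_like =
    std::is_trivially_copyable_v<T> &&
    std::is_trivially_default_constructible_v<T> &&
    std::is_standard_layout_v<T>;

// C-like struct: passes on both axes.
struct CLike {
    int id;
    char name[16];
};

// Trivial, but fails the layout axis (mixed access control).
class Mixed {
public:
    int a;
    int get_b() const { return b; }
private:
    int b;
};

static_assert(is_pod_like<CLike>);
static_assert(!is_pod_like<Mixed>);
static_assert(!is_pod_like<std::unique_ptr<int>>);  // fails the triviality axis
```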


Can you please elaborate on the following rules:

I'll try:

a) standard-layout classes must have all non-static data members with the same access control

That's simple: all non-static data members must be public, all private, or all protected. You can't have some public and some private.
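As a quick sketch of that rule (assuming C++17; the class names are made up for illustration), the trait agrees:

```cpp
#include <type_traits>

// Hypothetical classes, for illustration only.
struct AllPublic {
    int a;
    int b;
};

class AllPrivate {
    int a;
    int b;
public:
    int sum() const { return a + b; }
};

class MixedAccess {
public:
    int a;
    int get_b() const { return b; }
private:
    int b;  // different access control than a
};

static_assert(std::is_standard_layout_v<AllPublic>);
static_assert(std::is_standard_layout_v<AllPrivate>);
static_assert(!std::is_standard_layout_v<MixedAccess>);
```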

The reasoning for them goes to the reasoning for having a distinction between "standard layout" and "not standard layout" at all. Namely, to give the compiler the freedom to choose how to put things into memory. It's not just about vtable pointers.

Back when they standardized C++ in 1998, they had to basically predict how people would implement it. While they had quite a bit of implementation experience with various flavors of C++, they weren't certain about things. So they decided to be cautious: give the compilers as much freedom as possible.

That's why the definition of POD in C++98 is so strict. It gave C++ compilers great latitude on member layout for most classes. Basically, POD types were intended to be special cases, something you specifically wrote for a reason.

When C++11 was being worked on, they had a lot more experience with compilers. And they realized that... C++ compiler writers are really lazy. They had all this freedom, but they didn't do anything with it.

The rules of standard layout are more or less codifying common practice: most compilers didn't really have to change much if anything at all to implement them (outside of maybe some stuff for the corresponding type traits).

Now, when it came to public/private, things are different. The freedom to reorder which members are public vs. private actually can matter to the compiler, particularly in debugging builds. And since the point of standard layout is that there is compatibility with other languages, you can't have the layout be different in debug vs. release.

Then there's the fact that it doesn't really hurt the user. If you're making an encapsulated class, odds are good that all of your data members will be private anyway. You generally don't expose public data members on fully encapsulated types. So this would only be a problem for those few users who do want to do that, who want that division.

So it's no big loss.

b) only one class in the whole inheritance tree can have non-static data members,

The reason for this one comes back to why they standardized standard layout again: common practice.

There's no common practice when it comes to having two members of an inheritance tree that actually store things. Some put the base class before the derived, others do it the other way. Which way do you order the members if they come from two base classes? And so on. Compilers diverge greatly on these questions.

Also, thanks to the zero/one/infinity rule, once you say you can have two classes with members, you can say as many as you want. This requires adding a lot of layout rules for how to handle this. You have to say how multiple inheritance works, which classes put their data before other classes, etc. That's a lot of rules, for very little material gain.
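A short sketch of the "only one class with data" rule, assuming C++17 and using made-up class names:

```cpp
#include <type_traits>

// Hypothetical classes, for illustration only.
struct Empty {};
struct Data  { int x; };
struct Other { int z; };

struct DataInBase : Data {};                // only the base stores data
struct DataInDerived : Empty { int y; };    // only the derived class stores data
struct DataInBoth : Data { int y; };        // base and derived both store data
struct DataInTwoBases : Data, Other {};     // two bases both store data

static_assert(std::is_standard_layout_v<DataInBase>);
static_assert(std::is_standard_layout_v<DataInDerived>);
static_assert(!std::is_standard_layout_v<DataInBoth>);
static_assert(!std::is_standard_layout_v<DataInTwoBases>);
```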

You can't make everything that doesn't have virtual functions and a default constructor standard layout.

and the first non-static data member cannot be of a base class type (this could break aliasing rules).

I can't really speak to this one. I'm not educated enough in C++'s aliasing rules to really understand it. But it has something to do with the fact that the base member will share the same address as the base class itself. That is:

struct Base {};
struct Derived : Base { Base b; };

Derived d;
// Does the base class subobject share an address with the member?
bool same_address = (static_cast<Base*>(&d) == &d.b);

And that's probably against C++'s aliasing rules. In some way.
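For what it's worth, here is a small probe of that case, as a sketch assuming C++17. The part I am sure of: the language requires two distinct subobjects of the same type to have distinct addresses, so the compiler cannot put the member at the start of the object next to the empty base. Holder and the function name are hypothetical, added only for contrast.

```cpp
#include <cassert>
#include <type_traits>

struct Base {};
struct Derived : Base { Base b; };  // first member has the base class type
struct Holder { Base b; };          // hypothetical contrast: same member, no base

static_assert(!std::is_standard_layout_v<Derived>);
static_assert(std::is_standard_layout_v<Holder>);

// The base subobject and the member are distinct objects of the same
// type, so they are not allowed to share an address.
inline bool base_and_member_share_address() {
    Derived d;
    return static_cast<Base*>(&d) == &d.b;
}

inline void check_distinct_addresses() {
    assert(!base_and_member_share_address());
}
```

So for Derived, "the first member is at the same address as the object" cannot hold, which fits with the type being excluded from standard layout.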

However, consider this: how useful could the ability to do this ever actually be? Since only one class can have non-static data members, Derived must be that class (since it has a Base as a member). So Base must be empty (of data). And if Base is empty, as well as a base class... why have a data member of it at all?

Since Base is empty, it has no state. So any non-static member functions will do what they do based on their parameters, not their this pointer.

So again: no big loss.
