Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Unneeded?) Memory bloat in fair::mq::shmem::Message #477

Open
ktf opened this issue Jul 24, 2023 · 11 comments
Open

(Unneeded?) Memory bloat in fair::mq::shmem::Message #477

ktf opened this issue Jul 24, 2023 · 11 comments
Assignees

Comments

@ktf
Copy link
Contributor

ktf commented Jul 24, 2023

While doing some profiling I just noticed that the fair::mq::shmem::Message has a fQueued boolean in the middle of it, introducing some unneeded extra padding. If I naively move it to the MetaHeader class, I immediately go from 96 bytes to 88.

I also suspect one could gain quite some extra memory with a few extra reasonable limitations. For example:

  • fAlignment at the moment is a size_t, while it could easily be a char storing log2 (assuming no one needs alignment which is not a power of 2).
  • size and hint could be limited to 48 bits, leaving space for the above mentioned fAlignment, fQueued, fManaged to be stored there.
  • removing the hint (which at the moment it's not actually used for anything, AFAICT) altogether one could get down to 72 bytes.

All in all, I think one could get down below 64 bytes by having something like the following:

  struct  __attribute__((__packed__)) Message : public fair::mq::Message // Extra 16 bytes from this
  {
    Manager* fManager;
    mutable UnmanagedRegion* fRegionPtr;
    mutable char* fLocalPtr;
    int16_t fRegionId;
    int16_t fSegmentId;
    size_t fSize: 48;
    size_t fHandle: 48;
    size_t fShared: 48;
    size_t fAlignment: 14;
    size_t fManaged: 1;
    size_t fQueued: 1;
  };

and it could probably be even less if one used handles for fManager and the the other pointers. Do you think there is any space to evolve the managed region to go into this direction?

@rbx
Copy link
Member

rbx commented Jul 24, 2023

The MetaHeader part contains data that will be transferred to a different process, while other members of shmem Message class are only valid/meaningful locally per process.

struct Message
{
    Manager& fManager; // local ptr to the manager
    bool fQueued; // is the message queued? (can the buffer removed when dtor is called).
    MetaHeader fMeta; // contains all the "transferrable" info
    size_t fAlignment; 
    mutable UnmanagedRegion* fRegionPtr; // used in case of unmanaged region messages
    mutable char* fLocalPtr; // local data ptr, for received messages obtained from the handle
}

struct MetaHeader
{
    size_t fSize;
    size_t fHint; // this is used for unmanaged region messages to store user provided info, which is returned when message can be freed. Afaik this is used at least by FLP/readout.
    boost::interprocess::managed_shared_memory::handle_t fHandle; // interprocess handle, convertable to local ptr
    mutable boost::interprocess::managed_shared_memory::handle_t fShared; // used for the ref count when msg is shared
    uint16_t fRegionId; // unmanaged region id
    mutable uint16_t fSegmentId; // segment id
    bool fManaged; // managed/unmanaged
};

Both should be minimal size, but I would say having the Metaheader part smaller is more important.

fQueued boolean in the middle of it, introducing some unneeded extra padding. If I naively move it to the MetaHeader class, I immediately go from 96 bytes to 88

Logically it belongs to local data, not shared data. For this and similar suggestions one could think to create the final MetaData header only immediately before transfer, then there would be more freedom to resize Message. I guess this line would have to be a lil bit more complicated then: https://github.com/FairRootGroup/FairMQ/blob/master/fairmq/shmem/Socket.h#L134

fAlignment at the moment is a size_t, while it could easily be a char storing log2 (assuming no one needs alignment which is not a power of 2).

Sounds good!

size and hint could be limited to 48 bits, leaving space for the above mentioned fAlignment, fQueued, fManaged to be stored there.

Would 48 be enough for current and future use cases? Hint is 64 bits as per requirement from FLP/readout.

it could probably be even less if one used handles for fManager and the the other pointers

Which handles do you mean? ptrs and boost interproccess handles are both 64, as they are supposed to be convertible into each other.

@ktf
Copy link
Contributor Author

ktf commented Jul 24, 2023

Both should be minimal size, but I would say having the MetaHeader part smaller is more important.
Logically it belongs to local data, not shared data. For this and similar suggestions one could think to create the final MetaData header only immediately before transfer, then there would be more freedom to resize Message.

Ok, I understand the difference now. What you propose indeed makes sense to me.

Sounds good!

Indeed, if you do that, than it's enough to have fQueued shuffled after fAlignment.

Would 48 be enough for current and future use cases? Hint is 64 bits as per requirement from FLP/readout.

As far as I know, there is no x86 hardware which supports more than 48 bits of virtual address space. Maybe this should be architecture dependent? However Sylvain should probably comment here.

Which handles do you mean? ptrs and boost interproccess handles are both 64, as they are supposed to be convertible into each other.

What I mean is that right now you have:

    Manager* fManager;
    mutable UnmanagedRegion* fRegionPtr;

However you could, for example, decide that regions are known to a given Manager (i.e. the latter keeps a list of them) and change mutable UnmanagedRegion* fRegionPtr to e.g. int8_t fRegionInManagerIndex adding an indirection and giving to the manager the task to retrieve the actual region pointer. It would save you a pointer.

@ktf
Copy link
Contributor Author

ktf commented Aug 22, 2023

So how do we proceed here? Do you plan to attack this any time soon?

@rbx
Copy link
Member

rbx commented Aug 22, 2023

I will likely come to this sometime in September.

@ktf
Copy link
Contributor Author

ktf commented Aug 22, 2023

Ok, thank you!

@rbx rbx self-assigned this Sep 19, 2023
@rbx
Copy link
Member

rbx commented Oct 18, 2023

The need to store the alignment value is for use in Message::SetUsedSize, which may need it in an edge-case.

Instead of storing it with each message, we could also accept it as a parameter to SetUsedSize.

What do you think?

@ktf
Copy link
Contributor Author

ktf commented Oct 18, 2023

Sounds reasonable to me. Indeed I could not find any invocation of SetUsedSize in the framework and only one in the old TOFCompressor, which indeed is known in advance so its not needed in any case.

Maybe you could determine minimum alignment from the current pointer (and cap it to, say, page boundaries).

@ktf
Copy link
Contributor Author

ktf commented Oct 18, 2023

basically:

size_t newAlignment = 1 << min(__builtin_ctz(oldPtr), 12);

@rbx
Copy link
Member

rbx commented Oct 18, 2023

Sounds good, then that could be done when no alignment is provided to SetUsedSize. That way no need to break the interface, alignment argument to SetUsedSize will be optional.

Then I would do this and not store alignment in shmem::Message at all.

@ktf
Copy link
Contributor Author

ktf commented Oct 18, 2023

Fully support this.

@rbx
Copy link
Member

rbx commented Oct 24, 2023

The current state in the dev branch is:

  • the alignment member is removed.
  • content of the MetaHeader member is moved directly into Message and sorted:
Manager& fManager;
mutable UnmanagedRegion* fRegionPtr = nullptr;
mutable char* fLocalPtr = nullptr;
size_t fSize = 0;
size_t fHint = 0;
boost::interprocess::managed_shared_memory::handle_t fHandle = -1;
mutable boost::interprocess::managed_shared_memory::handle_t fShared = -1; 
uint16_t fRegionId = 0;
mutable uint16_t fSegmentId;
bool fManaged = true;
bool fQueued = false;

This (plus the fTransport ptr from the parent and the this ptr) brings the size to 80 bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants