[core] Converted GlobControlLock to a read-write lock. #3014

Draft

maxsharabayko wants to merge 1 commit into Haivision:master from maxsharabayko:develop/shared-group-ctrl-lock

Conversation

@maxsharabayko (Collaborator)

CUDTUnited::m_GlobControlLock is now a read-write lock (srt::sync::SharedMutex).

CUDTUnited::m_GlobControlLock guards all member containers of CUDTUnited:

  • CUDTUnited::m_Groups - add/remove under exclusive lock; access under non-exclusive (shared) lock.

  • CUDTUnited::m_Sockets - add/remove under exclusive lock; access under non-exclusive (shared) lock.

  • CUDTUnited::m_PeerRec - records sockets from peers to avoid repeated connection requests; add/remove under exclusive lock; access under non-exclusive (shared) lock.

  • CUDTUnited::m_ClosedSockets - add/remove under exclusive lock; access under non-exclusive (shared) lock.

  • CUDTUnited::m_mMultiplexer - add/remove and access under exclusive lock. ❗

Addresses #2393.
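For illustration, here is a minimal sketch of the shared/exclusive pattern described above, using std::shared_mutex (C++17) as a stand-in for srt::sync::SharedMutex; the SocketRegistry type and its members are illustrative, not actual SRT code:

#include <map>
#include <mutex>
#include <shared_mutex>

struct Socket; // placeholder for srt::CUDTSocket

class SocketRegistry // plays the role of CUDTUnited
{
    std::shared_mutex m_lock;          // stands in for m_GlobControlLock
    std::map<int, Socket*> m_sockets;  // stands in for m_Sockets

public:
    // Insert/remove mutate the container: exclusive (writer) lock.
    void add(int id, Socket* s)
    {
        std::unique_lock<std::shared_mutex> guard(m_lock);
        m_sockets[id] = s;
    }

    // Lookup only reads the container: shared (reader) lock, so many
    // API calls can locate sockets concurrently without serializing.
    Socket* find(int id)
    {
        std::shared_lock<std::shared_mutex> guard(m_lock);
        auto it = m_sockets.find(id);
        return it == m_sockets.end() ? nullptr : it->second;
    }
};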

@maxsharabayko maxsharabayko added the labels Type: Maintenance (work required to maintain or clean up the code) and [core] (Area: changes in SRT library core) on Aug 21, 2024
@maxsharabayko maxsharabayko added this to the v1.6.0 milestone Aug 21, 2024
@maxsharabayko maxsharabayko requested a review from ethouris August 21, 2024 15:10
setupMutex(m_GCStopLock, "GCStop");
setupCond(m_GCStopCond, "GCStop");
setupMutex(m_GlobControlLock, "GlobControl");
setupMutex(m_IDLock, "ID");
Collaborator

I think we need to decide something about these calls. With the current locking implementation they are not needed and have been left here to allow implementation of the mutex tracker, which is still hanging around in one of the PRs. So either we add an appropriate empty function with that name to cover the stub, or we'll need to remove them all.
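A sketch of the "empty function" option mentioned here, under the assumption that the stub would simply accept the SharedMutex and ignore the name until the mutex tracker lands (the signature is assumed, not taken from the tracker PR):

// No-op overload covering the setupMutex() call sites for SharedMutex
// until the mutex tracker is merged (assumed signature).
inline void setupMutex(SharedMutex& /*mtx*/, const char* /*name*/) {}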

Collaborator Author

There was a problem with the initialization of a condition variable. POSIX differs in how you initialize one at global/static scope versus at runtime; that's what setupCond does on POSIX builds. And there was a crash when that initialization was done in the constructor.
setupMutex is indeed not needed, except for the mutex tracker.

void Condition::init()
{
    pthread_condattr_t* attr = NULL;
#if SRT_SYNC_CLOCK == SRT_SYNC_CLOCK_GETTIME_MONOTONIC
    // Bind the condition variable to CLOCK_MONOTONIC. This attribute can
    // only be set via runtime initialization, not via the static
    // PTHREAD_COND_INITIALIZER.
    pthread_condattr_t  CondAttribs;
    pthread_condattr_init(&CondAttribs);
    pthread_condattr_setclock(&CondAttribs, CLOCK_MONOTONIC);
    attr = &CondAttribs;
#endif
    const int res = pthread_cond_init(&m_cv, attr);
    if (res != 0)
        throw std::runtime_error("pthread_cond_init monotonic failed");
}
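For contrast, the static-scope half of the remark above: at global/static scope POSIX offers only the static initializer, which cannot request CLOCK_MONOTONIC, hence the runtime initialization shown here.

// Global/static scope: default clock only; no way to set a clock attribute.
pthread_cond_t g_cv = PTHREAD_COND_INITIALIZER;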

leaveCS(m_GlobControlLock);
s->core().closeInternal();
enterCS(m_GlobControlLock);
HLOGC(smlog.Debug, log << "GC/removeSocket: DELETING SOCKET @" << u);
Collaborator

That was here to fix the problem of locking m_ConnectionLock after m_GlobControlLock. Why is that no longer necessary?

Collaborator Author

The GlobControl "resource" is now held in a shared manner, protecting the containers from modification, but no exclusive mutex is being held. Hence there is no need for this unlock-lock trick.
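A rough before/after sketch of this exchange, taking the lock-order detail (m_ConnectionLock may not be acquired while m_GlobControlLock is held exclusively) from the review comment above:

// Before: m_GlobControlLock was an exclusive mutex, so it had to be
// dropped around the call that takes m_ConnectionLock inside.
leaveCS(m_GlobControlLock);
s->core().closeInternal();
enterCS(m_GlobControlLock);

// After: m_GlobControlLock is only held in shared mode here, which
// (per the reply above) no longer conflicts with m_ConnectionLock.
s->core().closeInternal();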

#endif

sync::Mutex m_GlobControlLock; // used to synchronize UDT API
/// Guards all member containers of `CUDTUnited`. Protect UDT API from data races.
Collaborator

UDT???

Collaborator Author

Suggested change
/// Guards all member containers of `CUDTUnited`. Protect UDT API from data races.
/// Guards all member containers of `CUDTUnited`. Protect SRT API from data races.


sync::Mutex m_GlobControlLock; // used to synchronize UDT API
/// Guards all member containers of `CUDTUnited`. Protect UDT API from data races.
/// Non-exclusive lock prohibits changes of containers (insert/remove), but allows modifications
Collaborator

"Shared lock" would be not unconfusing.

Collaborator Author

Suggested change
/// Non-exclusive lock prohibits changes of containers (insert/remove), but allows modifications
/// A shared (non-exclusive) lock prohibits changes of containers (insert/remove), but allows modifications

CUDTGroup* g = m_parent->m_GroupOf;
if (g)
{
// TODO: Lock this group `g`??
Collaborator

Yes, that's likely a good idea. Also, the shared lock suffices for that, and acquireSocketsGroup could be used here (that is, the constructor overload of GroupKeeper that uses this call).
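A hedged sketch of that suggestion; GroupKeeper and acquireSocketsGroup are named in the comment above, but the exact constructor overload and the group member are assumptions:

// Keep the group "busy" for the duration of the access instead of
// dereferencing m_parent->m_GroupOf directly (signatures assumed).
CUDTUnited::GroupKeeper keeper(CUDT::uglobal(), m_parent); // overload using acquireSocketsGroup()
if (keeper.group)
{
    // ... safe to use keeper.group here; released on scope exit ...
}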

// m_RecvAckLock is ordered AFTER m_GlobControlLock, so this can only
// be done now that m_RecvAckLock is unlocked.
ScopedLock glock (uglobal().m_GlobControlLock);
SharedLock glock (uglobal().m_GlobControlLock);
Collaborator

In this place you can consider using GroupKeeper instead of SharedLock on m_GlobControlLock - if possible.

Collaborator

@ethouris ethouris left a comment

One comment describes a serious problem here; please fix. The others are just points to consider.

}

leaveCS(CUDT::uglobal().m_GlobControlLock);
CUDT::uglobal().m_GlobControlLock.lock_shared();
Collaborator

Shouldn't this be unlock_shared()?

Collaborator Author

Agh, right.

Suggested change
CUDT::uglobal().m_GlobControlLock.lock_shared();
CUDT::uglobal().m_GlobControlLock.unlock_shared();

@ethouris ethouris mentioned this pull request Mar 13, 2025
@isotherm

isotherm commented Mar 25, 2025

Greetings,

I arrived here due to a contention problem with many sockets (330+ receive sockets on a 48-core Xeon running Linux). This creates 1300+ threads: network receive, TsbPd, RcvQ, SndQ. According to perf, this global lock was the main problem, with CPU usage at 100%, mostly spent in futex spinlocks.

Therefore I cherry-picked this change onto 1.5.4 to verify the results. However, performance actually dropped: perf showed even more futex contention. I was able to resolve this by replacing the SharedMutex implementation with pthread_rwlock_* (or std::shared_mutex). These implementations do not require a futex once a reader already holds the lock, since that can be done with atomics. Using either of these library shared mutexes, CPU usage drops to ~45%, and we could receive on all sockets without loss.

I am trying to fix this problem now with an eye on the future, so I have some questions about this changeset:

  1. The SharedMutex API appears to mirror std::shared_mutex. Will there be a way to enable the more performant implementation (pthread or C++17) at compile time (see the sketch after this list)? Would you welcome a pull request to that effect? (Note: SharedMutex::getReaderCount, used only in tests, wouldn't be supported with either library implementation.)
  2. Is the global mutex needed to serialize more than access to the global maps, or can we further improve, for example:
    • Using concurrent data structures, for example a map that has locks per tree branch?
    • Using wait-free or lock-free structures, such as a lock-free hash map?
    • Using read-copy-update (RCU), since reads happen thousands of times per second per socket, while opening and closing sockets, and other socket events, are quite rare?
  3. Has there been any general consideration of how to address performance scaling? Is there anything on the roadmap that I can work toward and/or submit pull requests toward? Since I will be generating patches, I would prefer them to align with future goals, where possible.
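For question 1, a minimal sketch of what such a compile-time switch could look like; the SRT_USE_STD_SHARED_MUTEX guard is a hypothetical flag, and only the lock/unlock surface discussed in this thread is covered:

#if defined(SRT_USE_STD_SHARED_MUTEX) // hypothetical build flag (C++17 builds only)
#include <shared_mutex>

namespace srt {
namespace sync {

// Same surface as the existing srt::sync::SharedMutex, but backed by the
// standard library, whose reader path avoids the futex contention
// measured above.
class SharedMutex
{
public:
    void lock()          { m_mtx.lock(); }
    void unlock()        { m_mtx.unlock(); }
    void lock_shared()   { m_mtx.lock_shared(); }
    void unlock_shared() { m_mtx.unlock_shared(); }

private:
    std::shared_mutex m_mtx;
};

} // namespace sync
} // namespace srt
#endif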

Note: all the threads are still taking significant CPU with frequent wake-ups (see e.g. #3094), but that is mostly unrelated to the global lock.

@ethouris (Collaborator)

ethouris commented Aug 4, 2025

Hi @isotherm, sorry it took so long to get back to this review.

This PR is going to be abandoned because it can no longer be processed; however, the changes provided here are currently merged into the dev branch, which is going to become the future 1.6.0 release.

std::shared_mutex is used as the implementation of srt::sync::SharedMutex if you build with --use-stdc++-sync and --use-c++-std=c++17, which isn't the default, but is supported. This will not be merged into 1.5.5 yet, but it is in the development branch for the future 1.6.0 version.

We will probably have some lock-free containers in the future, but m_GlobControlLock isn't as simple as just changing the implementation of the sockets container. This mutex keeps the whole system consistent: for example, removing a socket from m_Sockets and putting it into m_ClosedSockets must be done non-preemptively, and there is no way to do that with lock-free algorithms.
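To illustrate the consistency argument with a hedged sketch: the member names appear in this thread, but the helper itself is illustrative, not SRT code:

// The socket must never be observable in both maps (or in neither), so
// both updates happen inside one exclusive critical section.
void CUDTUnited::removeSocketSketch(SRTSOCKET id) // hypothetical helper
{
    m_GlobControlLock.lock(); // exclusive: blocks all shared readers
    sockets_t::iterator it = m_Sockets.find(id);
    if (it != m_Sockets.end())
    {
        m_ClosedSockets[id] = it->second; // both changes become visible
        m_Sockets.erase(it);              //   together, never one by one
    }
    m_GlobControlLock.unlock();
}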

I will be very interested in how the performance has changed in the dev branch towards the 1.5.5 candidate. My last measurements were optimistic, but the perspective of others is very welcome.

@ethouris ethouris modified the milestones: v1.6.0, Parking Lot Aug 4, 2025