Skip to content

Support TCP for protocol messages#3636

Draft
softins wants to merge 2 commits into
jamulussoftware:mainfrom
softins:tcp-protocol
Draft

Support TCP for protocol messages#3636
softins wants to merge 2 commits into
jamulussoftware:mainfrom
softins:tcp-protocol

Conversation

@softins

@softins softins commented Mar 11, 2026

Copy link
Copy Markdown
Member

Short description of changes

Support fallback to TCP for protocol messages, in order to overcome potential loss of large messages due to UDP fragmentation. Currently an incomplete draft, for comment as development continues.

CHANGELOG: Client/Server: Support TCP fallback for protocol messages.

Context: Fixes an issue?

Discussed in issue #3242.

Does this change need documentation? What needs to be documented and how?

It will need documentation once design and development are complete. Particularly need to explain the firewall requirements for a server or directory.

Status of this Pull Request

Incomplete, still under development. Main server side complete and working. Client side development in progress. Complete and ready for review and testing. Still marked draft as it needs some of the debug messages to be commented out before merging.

What is missing until this pull request can be merged?

A lot of testing of both server and client. Intended for Jamulus 4.0.0.

Checklist

  • I've verified that this Pull Request follows the general code principles
  • I tested my code and it does what I want
  • My code follows the style guide
  • I waited some time after this Pull Request was opened and all GitHub checks completed without errors.
  • I've filled all the content above

@softins softins added this to the Release 4.0.0 milestone Mar 11, 2026
@softins softins self-assigned this Mar 11, 2026
@softins

softins commented Mar 11, 2026

Copy link
Copy Markdown
Member Author

So far, this implements the server side of the design described here and here

@softins softins force-pushed the tcp-protocol branch 4 times, most recently from 5e1a658 to 0ae51e2 Compare March 16, 2026 13:05
@softins softins linked an issue Mar 16, 2026 that may be closed by this pull request
@softins softins added the feature request Feature request label Mar 16, 2026
@softins softins force-pushed the tcp-protocol branch 3 times, most recently from 7ad1d1f to d939e5b Compare March 26, 2026 17:38
@softins

softins commented Mar 28, 2026

Copy link
Copy Markdown
Member Author

So the next stage of implementation has been achieved: client-side support in the Connect dialog.

  1. If the server list has not been received via UDP when the associated message indicating TCP support has arrived, the client will retry fetching the server list over TCP.
  2. If the client list for a server has not been received via UDP when the associated message indicating TCP support has arrived, the client will retry fetching the client list over TCP, and will continue to use TCP for that server while the Connect dialog is open.
  3. A directory or server that does not have TCP support will not send the TCP supported message, and will continue to be handled as in current versions.
  4. If the server list or client list is successfully received over UDP, there is no need for the client to try TCP.

It has been tested by using nft to drop outbound Jamulus UDP messages with a specific message ID, to simulate loss due to fragmentation.

Examples for a directory-enabled server running on port 22120:

  • drop UDP server list: nft add rule inet filter output udp sport 22120 @ih,16,16 0xee03 drop
  • drop UDP client list: nft add rule inet filter output udp sport 22120 @ih,16,16 0xf503 drop
  • drop UDP "TCP supported" msg: nft add rule inet filter output udp sport 22120 @ih,16,16 0xfb03 drop

Note that nft rules require network byte order (big-endian), but Jamulus IDs are little-endian:

  • CLM_SERVER_LIST = 1006 = 0x03ee => 0xee03 (LE byte order)
  • CLM_RED_SERVER_LIST = 1018 = 0x03fa => 0xfa03 (LE byte order)
  • CLM_CONN_CLIENTS_LIST = 1013 = 0x03f5 => 0xf503 (LE byte order)
  • CLM_TCP_SUPPORTED = 1019 = 0x03fb => 0xfb03 (LE byte order)

@softins

softins commented Mar 28, 2026

Copy link
Copy Markdown
Member Author

The next step is to try implementing the connected-mode TCP described here

@ann0see ann0see self-requested a review April 7, 2026 14:51
Comment thread src/tcpserver.h
Comment thread src/main.cpp
bool bUseTranslation = true;
bool bCustomPortNumberGiven = false;
bool bEnableIPv6 = false;
bool bEnableTcp = false;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we'll have a long time for the 4.0 release, I'd enable it by default soon (of course once we've tested that the basics work)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I disagree. It's a server-only option, and most servers operators will not need to enable TCP support. Only those running large directories or large servers will need to, and they also need to understand and configure their firewall requirements.

TCP support in the client will indeed be enabled by default, but will only take effect when talking to a directory or server that has enabled it.

If a server operator enables TCP without having configured their firewall correctly, client users could have problems as the server would advertise TCP support to the client, but the client could be unable to connect.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not give an error message or fallback procedure in case the TCP connection timed out?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm sure we can. I haven't yet tested that scenario.

But it doesn't negate my view that server-side TCP support needs to be an explicit option.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread src/connectdlg.cpp Outdated
Comment thread src/connectdlg.cpp Outdated
@ann0see ann0see added the bug Something isn't working label Apr 9, 2026
@github-project-automation github-project-automation Bot moved this to Triage in Tracking Apr 9, 2026
@ann0see ann0see moved this from Triage to In Progress in Tracking Apr 9, 2026
@softins

softins commented Apr 9, 2026

Copy link
Copy Markdown
Member Author

Well I've finished implementing everything I intended to, for directory, server and client, so it's ready for reviewing and trying out, as and when time permits (post 3.12.0).

I have a private directory and server built and running with TCP support, at newjam.softins.co.uk on the standard port 22124.

In order to demonstrate the use of TCP in a new client's connect dialog, it will be necessary to use custom firewall filters on the client end to temporarily drop incoming UDP Jamulus protocol messages containing a server list or connected clients list.

There is full forward and backward compatibility between clients and servers built with TCP support and older versions.

@softins softins marked this pull request as ready for review April 9, 2026 22:48
@softins softins marked this pull request as draft April 10, 2026 06:30
@softins

softins commented Apr 10, 2026

Copy link
Copy Markdown
Member Author

Keeping as draft, because it will need quite a few debug messages removed before merging.

Comment thread docs/TCP.md

If a server were to offer TCP to the client, but the server's firewall didn't allow the incoming TCP connection, the client request for TCP would wait until its request times out.

This has to be the responsibility of the server/directory operator, and is why TCP operation must be controlled by a command-line option, rather than always enabled. The operator should only enable TCP in the Jamulus server if they know their environment has been configured to support it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to disagree with it being a reason for not always enabling it, as stated. Things should carry on working exactly as they would without it -- except an extra exchange happens that fails. Enabling by default IPv6 does the same.

Comment thread docs/TCP.md

This has to be the responsibility of the server/directory operator, and is why TCP operation must be controlled by a command-line option, rather than always enabled. The operator should only enable TCP in the Jamulus server if they know their environment has been configured to support it.

Most operators of small servers of directories will not need to be concerned with TCP at all. _The only server operators who will need to enable TCP support are those running large directories (e.g. Volker, Peter) or those running a large server designed to support many simultaneous client connections._

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is the real reason. I think it should be explain like this primarily, rather than on the technicalities.

It's pointless doing it for everyone. It doesn't matter, it's just pointless.

Comment thread docs/TCP.md

The reason for using a TCP connection in an active session is just to provide a reliable path for delivering a list of connected clients that could be large and subject to fragmentation (if it is sent over UDP). So the established TCP connection is only used to deliver client lists, and not other protocol messages.

Therefore, if the server has an active TCP connection from the client, it will use the connectionless `CLM_CONN_CLIENTS_LIST` message to deliver updates for the connected client list. If there is no active TCP connection, updates will be delivered using the connected-mode `CONN_CLIENTS_LIST` over UDP as at present.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth noting at this point (as it's in the flow and notable) why the two messages are different - i.e. why it doesn't bother to use the CONN_CLIENTS_LIST.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add something. The reason is that CONN_CLIENTS_LIST messages are numbered and acked, to provide reliable delivery or retry over UDP. That's not needed over reliable TCP and would complicate things, so a simple CLM_CONN_CLIENTS_LIST is used instead.

@softins

softins commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

I have a few things to change here:

  • Change the values of CLM_CLIENT_ID and CLM_TCP_SUPPORTED, since the values I used have now been taken by server features and (hopefully) welcome message.
  • Add a TCP bit value to the server features.
  • This one will please @pljones: I am going to try and push the retry-with-TCP logic and handling of CLM_TCP_SUPPORTED down to CClient, so that the connect dialog doesn't see or control it, but just gets delivered the resulting message, whether it was obtained by UDP or TCP. It was planning how to implement it for JSONRPC that convinced me to try that.

Comment thread src/protocol.cpp
@softins softins force-pushed the tcp-protocol branch 2 times, most recently from c78767e to 1ac1a02 Compare June 26, 2026 19:56
@pljones

pljones commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Probably worth squashing now. How close do you reckon it is to safe to use on a private Directory and Server for testing?

It really would be good to write up a set of test cases, with the logging enabled to show it's working right. Things like explaining how to test over a LAN... (I'd not have a clue.)

@softins

softins commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

Probably worth squashing now. How close do you reckon it is to safe to use on a private Directory and Server for testing?

It really would be good to write up a set of test cases, with the logging enabled to show it's working right. Things like explaining how to test over a LAN... (I'd not have a clue.)

Well I have an outstanding significant structural change to make on the client side, but am on holiday for two weeks from tomorrow, and I don't know if I will get time to dip into it while away.

It certainly should work in its current state, but maybe better to wait until I'm back in circulation.

@softins

softins commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

In the meantime, if anyone wants to test, you can set up a server and a client on the same LAN, or even the same machine. The server should also be a directory, using -e localhost, and should also enable TCP with --enabletcp. By default, communication between client and server while in the Connect Dialog will not fall back to TCP, as the server list and client list will get through ok on UDP. The fallback can be tested using firewall rules to suppress specific UDP responses. Please see https://github.com/softins/jamulus-tcp-tests for a set of scripts that can do this on a Linux system with nft (I wasn't confident that these should go in the main Jamulus repo).

@softins

softins commented Jun 26, 2026

Copy link
Copy Markdown
Member Author

As mentioned in an earlier comment, I have a directory at newjam.softins.co.uk port 22124 running the current version of code for this PR.

@softins

softins commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

I have reworked the client side to do all the UDP/TCP selection at the CClient level, and there are no longer any changes to clientdlg or connectdlg, and none needed to clientrpc. @pljones, I think this should be much more to your liking.

Instead of storing the UDP/TCP mode for a particular server in the connect dialog, it now uses a QHash within CClient - in fact separate ones for fetching the list of servers from a directory, and the list of connected clients from a server.

I've also squashed the multitude of commits into a single commit (although there is now a second to follow it).

I still need to remove or comment out a lot of debug messages which were intended more for proving to myself the logic during development, than they were for production.

Comment thread src/client.cpp
QTcpSocket* pSocket = new QTcpSocket ( this );

// timer for TCP connect timeout because Qt defaults to 30 seconds
// and we want it to be 3 seconds (TCP_CONNECT_TIMEOUT_MS)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs clarifying: nowhere in the close neighbourhood is TCP_CONNECT_TIMEOUT_MS mentioned. (I can see it's kicked off after all the set up bits have set things up - maybe just mention that.)

@pljones

pljones commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Very nice. It makes it more like "here's another way the to handle protocol messages".

Comment thread src/channel.cpp
{
qDebug() << "- sending client list via TCP";

ConnLessProtocol.CreateCLConnClientsListMes ( InetAddr, vecChanInfo, pTcpConnection );

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make clear why the connectionless protocol is used here - I suppose since TCP handles the session, it's enough.

Comment thread src/client.cpp
switch ( eFetchMode )
{
case CFM_UDP_REQUEST:
qWarning() << "Unsatisfied Client List request via UDP for" << InetAddr.toString();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
qWarning() << "Unsatisfied Client List request via UDP for" << InetAddr.toString();
qWarning() << "Unsatisfied Client List request via UDP for " << InetAddr.toString();

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't needed. qWarning() and its relatives automatically put a space between << items.

Comment thread src/client.cpp
ConnLessProtocol.CreateCLReqConnClientsListMes ( InetAddr, PROTO_UDP );
break;
case CFM_TCP_REQUEST:
qWarning() << "Unsatisfied Client List request via TCP for" << InetAddr.toString() << "(switching back to UDP)";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
qWarning() << "Unsatisfied Client List request via TCP for" << InetAddr.toString() << "(switching back to UDP)";
qWarning() << "Unsatisfied Client List request via TCP for " << InetAddr.toString() << "(switching back to UDP)";

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, as above

Comment thread src/client.cpp
{
if ( pendingClientList.value ( InetAddr ) == CFM_UDP_REQUEST )
{
qDebug() << "- UDP client list not received from" << InetAddr.toString() << "- retrying via TCP";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to log this in production.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. I was actually considering removing it, as it can be quite noisy if a directory contains servers that are either not configured properly or very distant.

Comment thread src/protocol.cpp
- PROTMESSID_CLM_CLIENT_ID


- PROTMESSID_CLM_CLIENT_ID: Sends the client's channel ID back to the server

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any security implications if the client sends a wrong id?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I can think of, because I made sure the server will only accept it if it comes from the correct IP address. Otherwise it's ignored.

Comment thread src/protocol.cpp
}

bool CProtocol::EvaluateCLReqConnClientsListMes ( const CHostAddress& InetAddr )
bool CProtocol::EvaluateCLReqConnClientsListMes ( const CHostAddress& InetAddr, CTcpConnection* pTcpConnection )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we need to pass CTcpConnection that often and cannot just pass it once to the class?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we have a separate TCP connection with each different peer.

Comment thread src/serverlist.cpp
{
// send empty message to keep NAT port open at registered server
pConnLessProtocol->CreateCLEmptyMes ( ServerList[iIdx].HostAddr );
pConnLessProtocol->CreateCLEmptyMes ( ServerList[iIdx].HostAddr, nullptr );

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateCLEmptyMes could also have a default argument == nullptr I suppose.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, I'll check.

@ann0see ann0see left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@ann0see

ann0see commented Jul 2, 2026

Copy link
Copy Markdown
Member

I disagree. The CLM_TCP_SUPPORTED message is not just to let the client know we support TCP. It also enables the client to know that it didn't receive the expected message via UDP that was sent immediately before it.

Then we should probably rename it to express that it is not just TCP support but rather that it request a TCP connection due to network problems.

@softins

softins commented Jul 2, 2026

Copy link
Copy Markdown
Member Author

I disagree. The CLM_TCP_SUPPORTED message is not just to let the client know we support TCP. It also enables the client to know that it didn't receive the expected message via UDP that was sent immediately before it.

Then we should probably rename it to express that it is not just TCP support but rather that it request a TCP connection due to network problems.

But it doesn't mean that. The message is still sent and received if there are no network problems. It is the client that decides what to do with it, by inferring network problems only if CLM_TCP_SUPPORTED is received without being preceded by the expected server or client list. It is only telling the client that the server can support TCP, not that TCP must be used.

@pljones

pljones commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

It's more "TCP is available as an alternative to the UDP response you were expecting (<this one>) - ask and you'll get it reliably".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working feature request Feature request needs documentation PRs requiring documentation changes or additions

Projects

Status: Waiting on Team

Development

Successfully merging this pull request may close these issues.

Support TCP for protocol messages

3 participants