Race Condition - Connection is terminated (Nextcloud)

Hey folks,

I am stuck with Collabora Online hitting a race condition where the connection is “just closed”. I start the container like this:

docker run -t -d -p 127.0.0.1:9980:9980 -e 'aliasgroup1=https://XXX.com:443' --restart always --cap-add MKNOD collabora/code

I have 127.0.0.1:9980 proxied to the outside over HTTPS: apache2 acts as the reverse proxy and also talks HTTPS to the Collabora server (WebSockets included).

When trying to open a document, this happens in the logs of nextcloud:

nextcloud-nc-1     | 172.17.0.2 - - [14/Oct/2024:12:50:11 +0000] "GET /index.php/apps/richdocuments/wopi/files/429_ocfbhm7lx6ve?access_token=TOKEN&access_token_ttl=0 HTTP/1.1" 200 2392 "-" "COOLWSD HTTP Agent 24.04.7.2"
nextcloud-nc-1     | 172.17.0.2 - - [14/Oct/2024:12:50:11 +0000] "GET /index.php/apps/richdocuments/wopi/files/429_ocfbhm7lx6ve/contents?access_token=TOKEN&access_token_ttl=0 HTTP/1.1" 200 14419 "-" "COOLWSD HTTP Agent 24.04.7.2"

172.17.0.2 is the Docker IP of the Collabora container (on the same host).

Collabora itself prints this:

sh: 1: /usr/bin/coolmount: Operation not permitted
sh: 1: /usr/bin/coolmount: Operation not permitted
sh: 1: /usr/bin/coolmount: Operation not permitted
wsd-00001-00055 2024-10-14 12:50:11.705704 +0000 [ docbroker_002 ] WRN  File system of [/opt/cool/child-roots/1-c5c05716/.] is dangerously low on disk space.| wsd/COOLWSD.cpp:433
frk-00028-00028 2024-10-14 12:50:11.707396 +0000 [ forkit ] WRN  The systemplate directory [/opt/cool/systemplate] is read-only, and at least [/opt/cool/systemplate//etc/hosts] is out-of-date. Will have to copy sysTemplate to jails. To restore optimal performance, make sure the files in [/opt/cool/systemplate/etc] are up-to-date.| common/JailUtil.cpp:585
wsd-00001-00055 2024-10-14 12:50:11.801447 +0000 [ docbroker_002 ] WRN  File system of [/opt/cool/child-roots/1-c5c05716/.] is dangerously low on disk space.| wsd/COOLWSD.cpp:433
wsd-00001-00055 2024-10-14 12:50:12.040789 +0000 [ docbroker_002 ] WRN  ToClient-00c: Unusual race - attempts to transition from SessionState::WAIT_DISCONNECT to SessionState::LIVE| wsd/ClientSession.cpp:144
wsd-00001-00055 2024-10-14 12:50:12.056160 +0000 [ docbroker_002 ] ERR  #34: Read failed, have 0 buffered bytes (ECONNRESET: Connection reset by peer)| net/Socket.hpp:1156

I cannot get Collabora to respond to Nextcloud correctly, and I wasn't able to find anything online that mentions “Unusual race” or “SessionState::WAIT_DISCONNECT” at all.

If you have any ideas, please let me know. Thank you!

Hi @MatKaplinski, welcome to the Collabora Online forums!

Which version are you using?

Here are some points that may help; please go through them:

  1. coolmount Permission Issue:
    The error /usr/bin/coolmount: Operation not permitted suggests that the container does not have sufficient permissions to execute the coolmount utility. You might want to try running the container with additional capabilities to see if this resolves the issue. For example, try adding --cap-add SYS_ADMIN to your Docker run command:

    docker run -t -d -p 127.0.0.1:9980:9980 -e 'aliasgroup1=https://XXX.com:443' --restart always --cap-add MKNOD --cap-add SYS_ADMIN collabora/code
    

    Make sure your system is secure before adding SYS_ADMIN, as it grants additional privileges to the container.

  2. Disk Space Warning:
    The warning File system of [/opt/cool/child-roots/1-c5c05716/.] is dangerously low on disk space indicates that the container might be running out of disk space. Verify the available space on the host system, especially the partition where Docker stores its volumes. You can do this with:

    df -h
    

    Freeing up space or expanding the partition could resolve this issue.

  3. Read-Only Systemplate Directory:
    The message The systemplate directory [/opt/cool/systemplate] is read-only suggests that Collabora is having trouble updating essential files. You may need to ensure that these directories can be properly mounted and updated by the container. Try adding a writable volume to your Docker run command:

    docker run -t -d -p 127.0.0.1:9980:9980 -e 'aliasgroup1=https://XXX.com:443' --restart always --cap-add MKNOD -v /path/to/systemplate:/opt/cool/systemplate collabora/code
    

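    Replace /path/to/systemplate with a writable location on your host. Note that a bind mount hides whatever the image ships at that path, so the host directory needs to contain a populated systemplate already.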
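
For the disk space check in point 2, a small sketch that reports the free space on the file system holding Docker's data directory. The fallback to /var/lib/docker is an assumption about a default install, and free_kib is a hypothetical helper name, not part of Docker or Collabora:

```shell
#!/bin/sh
# Print the free space, in KiB, of the file system that holds a path.
# -P forces the portable one-line-per-file-system output format.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Ask Docker where it stores its data; fall back to /var/lib/docker
# (assumed default) and then to / if that cannot be determined.
docker_root=$(docker info --format '{{ .DockerRootDir }}' 2>/dev/null || true)
docker_root=${docker_root:-/var/lib/docker}
[ -d "$docker_root" ] || docker_root=/

echo "Free space under $docker_root: $(free_kib "$docker_root") KiB"
```

This only looks at one file system; if the systemplate or child-roots live on a separate mount, call free_kib on those paths as well.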

ATB,
Darshan

Hi Darshan,

Thanks for your quick reply!

I now have the following command:
docker run -t -d -p 127.0.0.1:9980:9980 --restart always --cap-add SYS_ADMIN --cap-add MKNOD -v /mnt/HC_Volume_101450284/collabora/systemplate/:/opt/cool/systemplate collabora/code:24.04.6.2.1

The disk was indeed running low, but it still had 2 GB left, so I have now moved the Docker data root onto a separate disk/mount in my VM.

Now Collabora no longer starts up completely.
The end of the log looks like this:

wsd-00001-00023 2024-10-16 13:32:50.416630 +0000 [ prisoner_poll ] TRC  #16: Read 3 bytes in addition to 0 buffered bytes| net/Socket.hpp:1167
wsd-00001-00023 2024-10-16 13:32:50.416662 +0000 [ prisoner_poll ] TRC  #16: Incoming data buffer 3 bytes, read result: 3, events: 0x1 (not closed)| net/Socket.hpp:1357
wsd-00001-00023 2024-10-16 13:32:50.416674 +0000 [ prisoner_poll ] TRC  #16: Incoming WebSocket data of 3 bytes: 8A 01 00  | ...| net/WebSocketHandler.hpp:349
wsd-00001-00023 2024-10-16 13:32:50.416741 +0000 [ prisoner_poll ] TRC  #16: Incoming WebSocket frame code 10, fin? true, mask? false, payload length: 1, residual socket data: 0 bytes| net/WebSocketHandler.hpp:363
wsd-00001-00023 2024-10-16 13:32:50.416753 +0000 [ prisoner_poll ] TRC  #16: Pong received: 1211 microseconds| net/WebSocketHandler.hpp:391
wsd-00001-00023 2024-10-16 13:32:50.416765 +0000 [ prisoner_poll ] TRC  #15: setupPollFds getPollEvents: 0x1| net/Socket.hpp:883
wsd-00001-00023 2024-10-16 13:32:50.416773 +0000 [ prisoner_poll ] TRC  #16: setupPollFds getPollEvents: 0x1| net/Socket.hpp:883
wsd-00001-00001 2024-10-16 13:32:36.310939 +0000 [ coolwsd ] INF  Waiting for a new child for a max of 20000ms| wsd/COOLWSD.cpp:4404
wsd-00001-00001 2024-10-16 13:32:56.311301 +0000 [ coolwsd ] INF  Waiting for a new child for a max of 20000ms| wsd/COOLWSD.cpp:4404

That's why I pinned version 24.04.6.2.1 in the docker run command (according to a GitHub issue this once solved it), but it does not work for me yet.

I also cannot reach the instance on 127.0.0.1:9980 anymore, whereas before I could get an “OK” via curl.
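
The “OK” check mentioned above can be turned into a small polling loop, which helps tell a slow start from a container that never comes up. This is a rough sketch: wait_for_ok is a hypothetical helper, and using https with -k assumes the container serves a self-signed certificate on port 9980.

```shell
#!/bin/sh
# Poll a URL until its body is exactly "OK" or the attempts run out.
# Usage: wait_for_ok URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ok() {
    url=$1
    attempts=${2:-10}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -k accepts a self-signed certificate (assumption about the setup);
        # --max-time keeps a hung container from blocking the loop.
        body=$(curl -ks --max-time 5 "$url" 2>/dev/null || true)
        if [ "$body" = "OK" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example call on the Docker host:
#   wait_for_ok "https://127.0.0.1:9980/" 10 2 && echo "coolwsd answers"
```

A non-zero exit status after all attempts means the service never answered with “OK”, which matches the “Waiting for a new child” loop in the log.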

@MatKaplinski I see.

SessionState race condition: the “Unusual race - attempts to transition from SessionState::WAIT_DISCONNECT to SessionState::LIVE” warning together with the “ECONNRESET: Connection reset by peer” error might indicate network issues or mismatched expectations between the reverse proxy and Collabora. Check your Apache reverse proxy configuration to ensure that:

  • WebSockets are correctly handled.
  • The ProxyPass settings properly account for Collabora connections, e.g.:

# The WebSocket rule comes first: Apache applies the first ProxyPass rule that matches
ProxyPassMatch /cool/(.*)/ws ws://127.0.0.1:9980/cool/$1/ws
ProxyPass / http://127.0.0.1:9980/ retry=0
ProxyPassReverse / http://127.0.0.1:9980/
  • Check for any limitations, timeouts, or additional settings that could be causing the connection to drop.
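
One quick thing to rule out for the checklist above is a missing Apache module. The sketch below reads the module list on stdin and prints any module from a short list that is not loaded. The list (proxy, proxy_http, proxy_wstunnel, ssl) is an assumption about what a setup like this typically needs, and missing_proxy_modules is a hypothetical helper name:

```shell
#!/bin/sh
# Read "apachectl -M" style output on stdin and print every module from
# the list below that does not appear in it. Prints nothing when all of
# them are loaded.
missing_proxy_modules() {
    loaded=$(cat)
    for mod in proxy_module proxy_http_module proxy_wstunnel_module ssl_module; do
        printf '%s\n' "$loaded" | grep -qw "$mod" || echo "$mod"
    done
}

# Typical use on the proxy host (the binary may be called apache2ctl):
#   apachectl -M 2>/dev/null | missing_proxy_modules
```

An empty result only says the modules are loaded; it does not validate the ProxyPass rules themselves.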

@MatKaplinski, could you update to the latest version? Is that possible?

I guess 24.04.6.2 is not the latest one…

So the issue with the race condition just disappeared, I assume due to the elevated privileges.
However, the bigger problem is that it doesn't start: as I said, the container isn't reachable at all.

I now have the container freshly started with the newest image, and the first lines where I can find “Fail” or “Error” are the following:

Failed to unmount [/opt/cool/child-roots/]| common/JailUtil.cpp:204

Failed to unmount [/opt/cool/child-roots/1-d5a7d6f1/cool_test_mount]| common/JailUtil.cpp:204

Failed to link [/etc/passwd] -> [/opt/cool/systemplate//etc/passwd] (No such file or directory). Will copy and disable linking dynamic system files [...]

Error while copying from /etc/passwd to /opt/cool/systemplate//etc/passwd2TuOl1MP0alz: Failed to open dest /opt/cool/systemplate//etc/passwd2TuOl1MP0alz

The “Error while copying from /etc/passwd” line is the first log output at level “ERR”.
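
Finding the first ERR-level line by eye gets tedious at trace level. A minimal sketch of a filter, assuming the log layout shown in this thread (thread name in square brackets followed by the level); first_err is a hypothetical helper name:

```shell
#!/bin/sh
# Print the first line on stdin whose level field is ERR, i.e. the text
# "ERR" directly after the closing bracket of the thread name.
first_err() {
    awk '/\] +ERR / { print; exit }'
}

# Example (CONTAINER is a placeholder for the container name or ID):
#   docker logs CONTAINER 2>&1 | first_err
```

Lines that only contain the word “Failed” but are logged at another level are skipped on purpose, so this complements rather than replaces a plain text search.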

First, the docker command now in use is:

docker run -t -d -p 127.0.0.1:9980:9980 --restart always --cap-add SYS_ADMIN --cap-add MKNOD -v /mnt/HC_Volume_101450284/collabora/systemplate/:/opt/cool/systemplate -v /etc/timezone:/etc/timezone:ro -v /etc/localtime:/etc/localtime:ro collabora/code
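
The two failures on /opt/cool/systemplate//etc/passwd above are what one would expect if the bind-mounted host directory is empty, because a bind mount hides the copy shipped in the image. That is a guess from the log, not a confirmed diagnosis. A small sketch to check the host side before starting the container; systemplate_populated is a hypothetical helper, and treating an existing etc/ subdirectory as “populated” is a simplifying assumption:

```shell
#!/bin/sh
# Return 0 when DIR contains an etc/ subdirectory, which is where the
# log shows coolwsd trying to link or copy /etc/passwd.
systemplate_populated() {
    dir=$1
    [ -d "$dir/etc" ]
}

# Example on the Docker host, with the path from the docker run command:
#   systemplate_populated /mnt/HC_Volume_101450284/collabora/systemplate \
#       || echo "host directory has no etc/ and will shadow the image's systemplate"
```

If the check fails, dropping the -v mount for /opt/cool/systemplate is the simplest way to test whether the container starts with the image's own copy.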

And, this is the apache config for collabora:

SSLProxyEngine on
SSLProxyVerify none
SSLProxyCheckPeerName off
SSLProxyCheckPeerCN off
SSLProxyCheckPeerExpire off
 # static html, js, images, etc. served from coolwsd
 # browser is the client part of Collabora Online
 ProxyPass           /browser https://127.0.0.1:9980/browser retry=0
 ProxyPassReverse    /browser https://127.0.0.1:9980/browser


 # WOPI discovery URL
 ProxyPass           /hosting/discovery https://127.0.0.1:9980/hosting/discovery retry=0
 ProxyPassReverse    /hosting/discovery https://127.0.0.1:9980/hosting/discovery


 # Capabilities
 ProxyPass           /hosting/capabilities https://127.0.0.1:9980/hosting/capabilities retry=0
 ProxyPassReverse    /hosting/capabilities https://127.0.0.1:9980/hosting/capabilities

 # Main websocket
 ProxyPassMatch      "/cool/(.*)/ws$"      wss://127.0.0.1:9980/cool/$1/ws nocanon


 # Admin Console websocket
 ProxyPass           /cool/adminws wss://127.0.0.1:9980/cool/adminws


 # Download as, Fullscreen presentation and Image upload operations
 ProxyPass           /cool https://127.0.0.1:9980/cool
 ProxyPassReverse    /cool https://127.0.0.1:9980/cool
 # Compatibility with integrations that use the /lool/convert-to endpoint
 ProxyPass           /lool https://127.0.0.1:9980/cool
 ProxyPassReverse    /lool https://127.0.0.1:9980/cool

@MatKaplinski, I recommend configuring your reverse proxy following the guidelines provided in this documentation: Reverse Proxy with Apache 2. The SDK includes detailed examples and explanations.

If the issue persists after updating the settings, feel free to reach out!

Cheers,
Darshan