Collabora failed to save document opened from Nextcloud

I have a really strange issue after migrating from one physical server to another.

My whole setup is Docker-based and the Docker images are the same on both servers. The installation is a replica from one machine to the other (same domain, as the old server will be retired soon).

The reverse proxy is Traefik which is connected through a docker network called web to NGINX in front of Nextcloud (php-fpm) and Collabora CODE.

In Nextcloud, the Office app shows that everything is fine, and I cleared the WOPI allow list for testing purposes.

I can open any document at first, but on the first save I get an error message saying that the document cannot be saved and that I should check the permissions. After that, any document I try to open fails with the Unauthorized WOPI Host error.

I can see this in the Collabora logs:

 
kit-00135-00135 2026-06-16 17:51:17.617206 [ kitbroker_017 ] WRN  #24: Socket still open post onDisconnect(), forced shutdown.|net/Socket.hpp:1298
kit-00406-00406 2026-06-16 17:51:17.617959 [ kitbgsv_00d_001 ] WRN  #23: Socket still open post onDisconnect(), forced shutdown.|net/Socket.hpp:1298
kit-00135-00135 2026-06-16 17:51:17.837347 [ kitbroker_017 ] WRN  #23: Background save process disconnected but not terminated 406|kit/KitWebSocket.cpp:378
kit-00135-00135 2026-06-16 17:51:17.837378 [ kitbroker_017 ] WRN  #23: Socket still open post onDisconnect(), forced shutdown.|net/Socket.hpp:1298
wsd-00001-00387 2026-06-16 17:51:17.855133 [ docbroker_017 ] ERR  Unexpected response to WOPI::PutFile. Cannot upload file to WOPI storage uri [https://nextcloud.domain.com/index.php/apps/richdocuments/wopi/files/317393_oc5qbla4yldx/contents?access_token=ZZGIU5KI5e4PqOrJlNfycsGQUbjSj4k3&access_token_ttl=0]: No response received. Connection terminated or timed-out.|wsd/wopi/WopiStorage.cpp:1046
wsd-00001-00387 2026-06-16 17:51:17.855152 [ docbroker_017 ] ERR  Failed to upload docKey [https%3A%2F%2Fnextcloud.domain.com%3A443%2Findex.php%2Fapps%2Frichdocuments%2Fwopi%2Ffiles%2F317393_oc5qbla4yldx] to URI [https://nextcloud.domain.com/index.php/apps/richdocuments/wopi/files/317393_oc5qbla4yldx?access_token=ZZGIU5KI5e4PqOrJlNfycsGQUbjSj4k3&access_token_ttl=0]. Notifying client.|wsd/DocumentBroker.cpp:3360
wsd-00001-00387 2026-06-16 17:51:17.856629 [ docbroker_017 ] ERR  #127: WOPI::CheckFileInfo returned 0 (Unknown)  for URI [https://nextcloud.domain.com/index.php/apps/richdocuments/wopi/files/317393_oc5qbla4yldx?access_token=ZZGIU5KI5e4PqOrJlNfycsGQUbjSj4k3&access_token_ttl=0]. Headers:         Body: []|wsd/wopi/CheckFileInfo.cpp:98
wsd-00001-00387 2026-06-16 17:51:17.856642 [ docbroker_017 ] ERR  #127: Failed or timed-out CheckFileInfo [https://nextcloud.domain.com/index.php/apps/richdocuments/wopi/files/317393_oc5qbla4yldx?access_token=ZZGIU5KI5e4PqOrJlNfycsGQUbjSj4k3&access_token_ttl=0]|wsd/wopi/CheckFileInfo.cpp:112

The first 4 lines are also present in the log of a working instance, so it is not related to the save issue.
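Since the WRN lines appear on both instances, it can help to strip them and compare only the ERR lines. A minimal sketch, assuming a POSIX shell with awk and the log layout shown above (process, date, time, `[`, thread, `]`, level); `errors_only` is a hypothetical helper name, not part of Collabora:

```shell
# Keep only ERR lines from Collabora log text read on stdin.
# Assumes the layout seen above: process, date, time, "[", thread, "]", level.
errors_only() {
  awk '$4 == "[" && $6 == "]" && $7 == "ERR"'
}

# Example use (not run here):
#   docker logs <collabora_container> 2>&1 | errors_only
```

Running the same filter on both servers makes it easier to see which errors only occur on the new one.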

Even more strange, if I re-launch Collabora on the old server (and change the DNS to have the URL pointing back to this old server), everything is working fine without changing anything on my Nextcloud instance.

The log on the working instance at save:

kit-00037-00037 2026-06-16 17:48:23.641594 [ kitbroker_001 ] WRN  #25: Socket still open post onDisconnect(), forced shutdown.|net/Socket.hpp:1298
kit-00055-00055 2026-06-16 17:48:23.642068 [ kitbgsv_008_001 ] WRN  #24: Socket still open post onDisconnect(), forced shutdown.|net/Socket.hpp:1298
kit-00037-00037 2026-06-16 17:48:23.807339 [ kitbroker_001 ] WRN  #24: Background save process disconnected but not terminated 55|kit/KitWebSocket.cpp:378
kit-00037-00037 2026-06-16 17:48:23.807375 [ kitbroker_001 ] WRN  #24: Socket still open post onDisconnect(), forced shutdown.|net/Socket.hpp:1298

I cannot figure out what could be the difference as my setup has been working for years like that.

For sure, I cannot keep the old server just to have a functional Collabora instance :rofl:

Any idea what to look for to figure out the reason for the issue?

While continuing my investigation, I tried several other things:

  1. Connect Nextcloud to other Collabora instances (not on the same physical server) => working fine
  2. Connect another Nextcloud (hosted on a different machine) to Collabora => seems OK
  3. Sometimes with the initial setup (Nextcloud and Collabora on the same new machine), I can edit some files without any issue, but after 2 or 3 files I get either the save error or the WOPI error

I even completely uninstalled docker-ce and containerd and removed all associated directories, but nothing changed.

This randomness is quite strange. Is there any tracing I can enable to try to figure out what the issue could be?

The only alternative I can see so far is to host Collabora on another machine :man_shrugging: But that would only be a workaround and a waste of resources as the new server is powerful enough to host it.

hi @doc75

Thanks for the really thorough write-up — the follow-up details actually help a lot. Two things in particular stand out: it works fine for a few files and then fails, and pointing your Nextcloud at other Collabora instances works perfectly. That combination strongly suggests the problem isn’t Collabora itself but the network path from the CODE container back to Nextcloud on this specific new host. The “Unauthorized WOPI Host” you see afterwards is almost certainly a downstream symptom — once a PutFile/CheckFileInfo times out, the session state goes bad and later opens get rejected. So I’d ignore the WOPI allowlist for now and chase the connectivity instead.

Here are three things to capture that should pin it down:

1. Loop a request to Nextcloud from inside the CODE container to catch the intermittent failure:

docker exec -it <collabora_container> bash
getent hosts nextcloud.domain.com    # where does it actually resolve to?
for i in $(seq 1 20); do
  curl -sS -o /dev/null -w "%{http_code} %{time_total}s\n" \
    https://nextcloud.domain.com/status.php
done

If some requests hang or time out while others return 200, that confirms it’s network-layer.

2. Test for an MTU mismatch — send a large packet with the don’t-fragment bit set:

docker exec -it <collabora_container> ping -M do -s 1472 nextcloud.domain.com

If small packets pass but large ones drop, that’s your culprit. It would also explain the symptom perfectly: opening a doc sends small requests, but PutFile uploads the whole document as a big payload — so opens succeed and saves fail. Fix is setting the mtu on your docker network (or the daemon) to match the host.

3. Suspect hairpin NAT. Since same-machine fails but external Collabora works, the container is likely routing out to your public IP and back in through Traefik, and the new host handles that hairpin differently than the old one did. The cleanest fix is to keep that traffic on the host entirely with extra_hosts:

extra_hosts:
  - "nextcloud.domain.com:172.x.x.x"   # internal IP of your proxy / nextcloud container

That way the CODE container resolves Nextcloud to the internal address and never leaves the host.

My money’s on #2 or #3 given the “works for a couple files then dies” pattern.

Thanks for the hints :folded_hands: . I was away from keyboard this week, but I will look into it carefully and get back with the information gathered.

I found some time to run the tests.
Here are the results.

`getent hosts nextcloud.domain.com` returns `2001:xxxx:xxxx:xxxx::1 machine.domain.com nextcloud.domain.com`

The curl loop against status.php returns:

200 0.086982s
200 0.052043s
200 0.089986s
200 0.057560s
200 0.048608s
200 0.080195s
200 0.061344s
200 0.093282s
200 0.079749s
200 0.062167s
200 0.074365s
200 0.080013s
200 0.090614s
200 0.068656s
200 0.053457s
200 0.058816s
200 0.077725s
200 0.076787s
200 0.084326s
200 0.054671s
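For longer runs, the same output can be reduced to a one-line summary. A minimal sketch, assuming a POSIX shell with awk; `summarize_probe` is a hypothetical helper name and the 1-second "slow" threshold is an arbitrary assumption. Keep in mind that status.php is a small GET, so a clean run does not necessarily say anything about a large upload such as PutFile.

```shell
# Summarize "<http_code> <seconds>s" lines, such as the curl loop output.
# Counts responses that are not 200 and responses slower than a threshold.
summarize_probe() {
  # $1 = slow threshold in seconds (assumed default: 1)
  awk -v limit="${1:-1}" '
    $1 ~ /^[0-9][0-9][0-9]$/ {
      total++
      t = $2
      sub(/s$/, "", t)              # drop the trailing "s" from the time
      if ($1 != "200") failed++
      if (t + 0 > limit) slow++
    }
    END { printf "total=%d failed=%d slow=%d\n", total, failed, slow }
  '
}

# Example use (not run here); --max-time makes a hung request show up as 000:
#   for i in $(seq 1 200); do
#     curl -sS -o /dev/null --max-time 10 -w "%{http_code} %{time_total}s\n" \
#       https://nextcloud.domain.com/status.php
#   done | summarize_probe 1
```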

The ping test with the don't-fragment bit set returns:

PING machine.domain.com (127.0.1.1) 1472(1500) bytes of data.
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=1 ttl=64 time=0.026 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=2 ttl=64 time=0.031 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=3 ttl=64 time=0.054 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=4 ttl=64 time=0.052 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=5 ttl=64 time=0.046 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=6 ttl=64 time=0.043 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=7 ttl=64 time=0.047 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=8 ttl=64 time=0.046 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=9 ttl=64 time=0.062 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=10 ttl=64 time=0.048 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=11 ttl=64 time=0.039 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=12 ttl=64 time=0.054 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=13 ttl=64 time=0.045 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=14 ttl=64 time=0.045 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=15 ttl=64 time=0.046 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=16 ttl=64 time=0.044 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=17 ttl=64 time=0.045 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=18 ttl=64 time=0.045 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=19 ttl=64 time=0.051 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=20 ttl=64 time=0.049 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=21 ttl=64 time=0.050 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=22 ttl=64 time=0.049 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=23 ttl=64 time=0.052 ms
1480 bytes from machine.domain.com (127.0.1.1): icmp_seq=24 ttl=64 time=0.049 ms
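The `1472(1500)` in the first line is the ICMP payload plus 28 bytes of IPv4 and ICMP headers, which is why 1472 is the right size for a 1500-byte MTU. The target here resolved to 127.0.1.1, a loopback address, so this particular run may not exercise the Docker network path at all. A minimal sketch of the arithmetic for other MTU values, assuming a POSIX shell; `ping_payload` is a hypothetical helper name and `eth0` in the usage comment is an assumed interface name:

```shell
# Largest ICMP payload that fits in one unfragmented packet for a given MTU.
# IPv4: 20-byte IP header + 8-byte ICMP header   = 28 bytes of overhead.
# IPv6: 40-byte IP header + 8-byte ICMPv6 header = 48 bytes of overhead.
ping_payload() {
  mtu=$1
  family=${2:-4}          # 4 (default) or 6
  if [ "$family" = 6 ]; then
    echo $((mtu - 48))
  else
    echo $((mtu - 28))
  fi
}

# Example use inside the container (not run here; eth0 is an assumption):
#   mtu=$(cat /sys/class/net/eth0/mtu)
#   ping -M do -c 4 -s "$(ping_payload "$mtu")" nextcloud.domain.com
```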

For the extra_hosts test, I added both an IPv4 and an IPv6 entry, because with only the IPv4 entry getent still returned the machine's IPv6 address:

extra_hosts:
  - "nextcloud.domain.com:172.18.0.5"
  - "nextcloud.domain.com:fd00:dead:beef::5"

But doing that does not solve the issue, and it gives me the following output:

$ getent hosts nextcloud.domain.com
fd00:dead:beef::5 nextcloud.domain.com

$ for i in $(seq 1 20); do curl -sS -o /dev/null -w "%{http_code} %{time_total}s\n" https://nextcloud.domain.com/status.php; done
curl: (7) Failed to connect to nextcloud.domain.com port 443 after 0 ms: Could not connect to server
000 0.000758s
...

I guess these last errors occur because the container at that address does not handle SSL itself: it sits behind Traefik, which is the SSL endpoint.

@darshan, if that gives you any idea of what I could do next, do not hesitate :wink:
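One way to test that guess without editing extra_hosts again is curl's `--resolve` option, which pins a hostname to a chosen address for a single request while keeping the original hostname for TLS. A minimal sketch, assuming a POSIX shell inside the CODE container; `resolve_entry` and `probe_pinned` are hypothetical helper names, and 172.18.0.2 in the usage comment is a placeholder for Traefik's internal address, not a value taken from this setup:

```shell
# Build the value for curl's --resolve option: host:port:address.
resolve_entry() {
  # $1 = hostname, $2 = port, $3 = IPv4 address to pin the name to
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Request status.php over HTTPS with the hostname pinned to one address.
probe_pinned() {
  host=$1
  addr=$2
  curl -sS -o /dev/null --max-time 10 \
    --resolve "$(resolve_entry "$host" 443 "$addr")" \
    -w '%{http_code} %{time_total}s\n' \
    "https://$host/status.php"
}

# Example use (not run here; the address is a placeholder):
#   probe_pinned nextcloud.domain.com 172.18.0.2
```

If the request succeeds against the proxy's address but fails against the Nextcloud container's address, that would be consistent with only Traefik listening on port 443.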