CODE Docker container starts up extremely slowly

I’ve started testing the CODE Docker container, and I find it painfully slow to start. I launch it with docker-compose, and it takes up to 20 minutes before it is ready to accept connections. Most of the time apparently goes into creating symlinks, e.g.:

code         | kit-00039-00037 2021-11-02 15:46:25.645672 [ kit_spare_001 ] INF  Linking file "/opt/collaboraoffice6.4/share/template/common/wizard/styles/night.ots" to "/opt/lool/child-roots/ArjnoTSdKxGcSeC0/lo/share/template/common/wizard/styles/night.ots"| kit/Kit.cpp:257

and similar lines. From time to time, these messages also appear in the logs:

code         | wsd-00001-00036 2021-11-02 15:46:09.047874 [ prisoner_poll ] TRC  Poll completed with 0 live polls max (2985842us)(timedout)| net/Socket.cpp:346
code         | wsd-00001-00036 2021-11-02 15:46:09.048035 [ prisoner_poll ] TRC  #20: Sending  ping.| ./net/WebSocketHandler.hpp:528
code         | wsd-00001-00036 2021-11-02 15:46:09.048052 [ prisoner_poll ] TRC  WebSocketHandler::sendFrame: Writing to #20 1 bytes in addition to 0 bytes buffered.| ./net/WebSocketHandler.hpp:704
code         | wsd-00001-00036 2021-11-02 15:46:09.048070 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:327
code         | wsd-00001-00036 2021-11-02 15:46:09.048089 [ prisoner_poll ] TRC  Poll completed with 1 live polls max (5000000us)| net/Socket.cpp:346
code         | wsd-00001-00036 2021-11-02 15:46:09.048117 [ prisoner_poll ] TRC  #20: Incoming data buffer 0 bytes, closeSocket? false| ./net/Socket.hpp:1197
code         | wsd-00001-00036 2021-11-02 15:46:09.048189 [ prisoner_poll ] TRC  #20: Wrote outgoing data 3 bytes of 3 buffered bytes.| ./net/Socket.hpp:1275
code         | wsd-00001-00036 2021-11-02 15:46:09.048203 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:327
code         | wsd-00001-00036 2021-11-02 15:46:09.048330 [ prisoner_poll ] TRC  Poll completed with 1 live polls max (5000000us)| net/Socket.cpp:346
code         | wsd-00001-00036 2021-11-02 15:46:09.048356 [ prisoner_poll ] TRC  #20: Incoming data buffer 3 bytes, closeSocket? false| ./net/Socket.hpp:1197
code         | wsd-00001-00036 2021-11-02 15:46:09.048369 [ prisoner_poll ] TRC  #20: Incoming WebSocket data of 3 bytes: 8a 01 00  | ...| ./net/WebSocketHandler.hpp:317
code         | wsd-00001-00036 2021-11-02 15:46:09.048400 [ prisoner_poll ] TRC  #20: Incoming WebSocket frame code 10, fin? true, mask? false, payload length: 1, residual socket data: 0 bytes.| ./net/WebSocketHandler.hpp:331
code         | wsd-00001-00036 2021-11-02 15:46:09.048412 [ prisoner_poll ] TRC  #20: Pong received: 381 microseconds| ./net/WebSocketHandler.hpp:356
code         | wsd-00001-00036 2021-11-02 15:46:09.048423 [ prisoner_poll ] TRC  ppoll start, timeoutMicroS: 5000000 size 2| net/Socket.cpp:327

I’ve tried several configuration tweaks, but none of them reduced the startup time. This is my current docker-compose service definition:

code:
    container_name: code
    image: collabora/code
    restart: always
    environment:
      - password=??????
      - username=??????
      - domain=nextcloud
      - dictionaries=en
      - extra_params="--o:ssl.enable=false"
    cap_add:
      - MKNOD
    expose:
      - 9980
    networks:
      - net

Am I doing something wrong, or is such an endless startup time really the intended behavior?

Wow, it should start up in a handful of seconds. Try turning on full trace logging and sending us the log; hopefully the timestamps will show what is going on. We do need to move files somewhere close so they can be copied, and we need quite a number of capabilities to be able to work sensibly:

setcap cap_fowner,cap_chown,cap_mknod,cap_sys_chroot=ep /usr/bin/loolforkit
setcap cap_sys_admin=ep /usr/bin/loolmount

If you want fast bind-mounting, you’ll want loolmount enabled too.

Thanks!

Thanks for your reply. How do I set the capabilities you mentioned in the docker-compose file? I guess something like:

cap_add:
  - MKNOD
  - FOWNER
  - CHOWN
  - SYS_CHROOT
  - SYS_ADMIN

Is that correct?

By the way, I also tried to run the container as privileged by adding:

privileged: true

in order to fix this, but it had no impact on the startup time.

I restarted the container with the additional parameters described in the previous post, but it still took ages. At this address you can find the logs of the startup process. It seems to me that most of the time is spent creating symlinks: there are a lot of them, and each one takes about 100-150 ms according to the logs. But there must be some overhead in how the startup script creates the links, since creating a test symlink inside the container took just 3 ms:

root@9bed2b6e99f3:/# time ln -s start-collabora-online.sh testlink

real    0m0.003s
user    0m0.002s
sys     0m0.001s

This is interesting: the per-file logging only kicks in because the linking is so slow.

kit-00043-00041 2021-11-03 06:56:22.848290 [ kit_spare_001 ] INF Jail path: /opt/lool/child-roots/nga9SP6lb0ulsOBD/| kit/Kit.cpp:2388
kit-00043-00041 2021-11-03 06:56:22.848525 [ kit_spare_001 ] INF Mounting is disabled, will link/copy /opt/lool/systemplate → /opt/lool/child-roots/nga9SP6lb0ulsOBD/| kit/Kit.cpp:2460
kit-00043-00041 2021-11-03 06:56:22.848594 [ kit_spare_001 ] INF linkOrCopy all from [/opt/lool/systemplate] to [/opt/lool/child-roots/nga9SP6lb0ulsOBD/].| kit/Kit.cpp:428
kit-00043-00041 2021-11-03 06:56:22.848664 [ kit_spare_001 ] TRC nftw: Skipping redundant path: /opt/lool/systemplate| kit/Kit.cpp:323
kit-00043-00041 2021-11-03 06:56:26.223104 [ kit_spare_001 ] WRN Linking/copying files from /opt/lool/systemplate to /opt/lool/child-roots/nga9SP6lb0ulsOBD/ is taking too much time. Enabling verbose link/copy logging.| kit/Kit.cpp:335

It would be great to know which filesystems your system uses for the jail paths there.

I believe we have a (still pending) patch from Gabriel that detects a stacking filesystem and, in that case, first copies the files to linkable/ and then hard-links from there, since this is (apparently) much faster than making the filesystem do an underlying copy for each hard link.

That should be easy to do by removing the initial hard-link attempt:

    // first try a simple hard-link
    if (link(fpath, newPath.c_str()) == 0)
        return;

in linkOrCopyFile.

Gabriel?

Ah - I found it:

kit/Kit.cpp: bool detectSlowStackingFileSystem(const std::string &directory)

This is only in the master branch of online, i.e. CO 2021.

Any chance you can test with the latest alpha release of CO 2021 to see if it improves things? If not, we may need to detect a new filesystem magic number here and adapt, and/or back-port that patch to cp-6.4 if it works well.

Thanks!

The fix is only in master; you just need to backport it to whichever version you need. I tested it locally on 6.4.10 and it works. This is the commit: "Optimize for overlayfs by forcing an initial copy to linkable/" (CollaboraOnline/online@4794476 on GitHub).

I created a pull request to port it to 6.4: "Optimize for overlayfs by forcing an initial copy to linkable/" by gmasei11 (Pull Request #3555 in CollaboraOnline/online on GitHub).

Thanks to both of you for your support. I’m currently using the collabora/code image (image ID: 30827c1421b8), which was the latest one when I pulled it a couple of days ago. If I’m correct, it ships loolwsd 6.4.13 (git hash: 078e8b8). The easiest way for me to test the fix would be an updated Docker image: is one available, or will one be?

I tried to build the docker image by myself as explained in online/docker/README, but the code compilation gets stuck at:

cd /tmp/online/docker/from-source/builddir/core && ./g -f clone
Submodule 'translations' (https://gerrit.libreoffice.org/translations) registered for path 'translations'
Submodule 'dictionaries' (https://gerrit.libreoffice.org/dictionaries) registered for path 'dictionaries'
Cloning into '/tmp/online/docker/from-source/builddir/core/translations'...

since it seems that https://gerrit.libreoffice.org/translations does not exist.

Edit: I think I fixed that by replacing gerrit with git in the submodule URLs, but now it turns out that I can’t build the Docker image, since the build script wants to compile on my host system (Arch Linux, which is not supported) rather than inside a container. A weird way to build a Docker image…

I tried the freshly released image for 6.4.14.2, and it does fix the problem. @mmeeks @gmasei11 thanks a lot for your support!

@snack You’re welcome! :+1: