Documents created in Collabora Online rejected by Microsoft Exchange attachment scanners

Summary

Documents created or edited in Collabora Online are sometimes rejected by Microsoft Exchange servers when sent as email attachments. The issue appears to be caused by empty XML elements and other minor structural differences in the OOXML output that trigger Exchange’s attachment scanning/filtering.

Environment

  • Collabora Online (CODE) via Nextcloud integration

  • Recipients using Microsoft Exchange / Office 365

Symptoms

  • Emails with .docx attachments are rejected or attachments are stripped

  • The same content saved from Microsoft Word is accepted

  • No issues opening the documents in Word, LibreOffice, or Collabora itself

Root Cause Analysis

After unpacking and comparing affected .docx files with Word-generated equivalents, the issue appears to be empty run property elements scattered throughout the XML:

```xml
<!-- Collabora generates these empty elements -->
<w:rPr></w:rPr>

<!-- Word either omits them entirely or includes content -->
```

These appear in:

  • word/document.xml

  • word/styles.xml

  • word/numbering.xml

  • word/footer*.xml

Although these elements are technically valid XML, Exchange’s scanner appears to flag them as malformed or suspicious.
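To repeat the comparison above on other files, a small helper that counts empty `<w:rPr>`/`<w:pPr>` elements in each XML part may be useful. This is a sketch, not part of the original report: the file name `count_empty_props.php` and both function names are hypothetical, and it assumes PHP 8.0+ with the zip extension.

```php
<?php
// Sketch: count empty <w:rPr>/<w:pPr> elements per XML part of a .docx.
// Hypothetical helper, not part of the original report. Requires PHP 8.0+ and ext-zip.
// Usage: php count_empty_props.php file.docx

// Count empty property containers in one XML part: both the paired form
// <w:rPr></w:rPr> (whitespace allowed inside) and the self-closing <w:rPr/>.
function count_empty_props(string $xml): array
{
    $counts = [];
    foreach (['rPr', 'pPr'] as $tag) {
        $pattern = '/<w:' . $tag . '>\s*<\/w:' . $tag . '>|<w:' . $tag . '\s*\/>/';
        $counts[$tag] = (int) preg_match_all($pattern, $xml);
    }
    return $counts;
}

// Scan every .xml part in the archive and report only the parts that contain any.
function scan_docx(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Could not open $path");
    }
    $report = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === false || !str_ends_with($name, '.xml')) {
            continue;
        }
        $counts = count_empty_props((string) $zip->getFromIndex($i));
        if (array_sum($counts) > 0) {
            $report[$name] = $counts;
        }
    }
    $zip->close();
    return $report;
}

// Only run the scan when a file is given on the command line.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (scan_docx($argv[1]) as $name => $c) {
        printf("%-28s empty rPr: %4d  empty pPr: %4d\n", $name, $c['rPr'], $c['pPr']);
    }
}
```

Running it on a Collabora-saved file and on the same content saved from Word should show whether the empty elements occur only in the former.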

Workaround

We’ve created a PHP script that post-processes .docx files by:

  1. Opening the .docx as a ZIP archive

  2. Removing empty <w:rPr></w:rPr> elements from XML files

  3. Cleaning other minor issues (BOM markers, null bytes, orphaned bookmarks)

After processing, documents pass through Exchange without issues.
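As an alternative to regular expressions for step 2, the same cleanup can be done on the parsed XML with PHP's DOM extension. That approach cannot break tag structure, and it makes it easy to keep the `<w:rPr>`/`<w:pPr>` child that the schema requires inside tracked-change elements (`w:rPrChange`, `w:pPrChange`). A minimal sketch; `strip_empty_props()` is a hypothetical helper name, and it assumes each part uses the standard WordprocessingML main namespace.

```php
<?php
// Sketch: remove empty w:rPr / w:pPr elements via DOM + XPath instead of regex.
// Assumption: parts use the standard WordprocessingML main namespace.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

function strip_empty_props(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = true;
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Part is not well-formed XML');
    }
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('w', W_NS);

    // "Empty" = no child elements, no attributes, no non-whitespace text.
    // Elements directly inside w:rPrChange / w:pPrChange are skipped because
    // the schema requires them there, even when empty.
    $query = '//w:rPr[not(*) and not(@*) and normalize-space(.) = "" and not(parent::w:rPrChange)]'
           . ' | //w:pPr[not(*) and not(@*) and normalize-space(.) = "" and not(parent::w:pPrChange)]';

    // Copy the result to an array before detaching nodes from the tree.
    foreach (iterator_to_array($xp->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }
    return $doc->saveXML();
}
```

In the scrubber, each XML part read from the archive would be passed through `strip_empty_props()` before `addFromString()`. One pass removes only elements that are empty in the original input, so a `<w:pPr>` that becomes empty after its `<w:rPr>` is removed would need a second pass. Re-serialization can also change byte-level details such as whitespace and the XML declaration line; Word ignores these, but whether they matter to Exchange's scanner has not been tested.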

Suggested Fix

Could the OOXML writer in Collabora/LibreOffice be modified to either:

  1. Omit empty <w:rPr> elements entirely when serializing, or

  2. Add a “strict compatibility mode” export option that produces cleaner XML?

Happy to share our cleaning script or sample documents that reproduce the issue if that would help with debugging.

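```php
#!/usr/bin/env php
<?php
/**
 * DOCX Scrubber - Fixes Collabora/LibreOffice documents for Exchange compatibility
 *
 * Problem: Documents created in Collabora Online are sometimes rejected by
 * Microsoft Exchange servers due to empty XML elements and other minor
 * structural differences in the OOXML output.
 *
 * Usage: php scrub_docx.php input.docx [output.docx]
 *
 * If no output file is specified, creates input_scrubbed.docx next to the input.
 * Requires PHP 8.0+ (str_ends_with, str_contains) and the zip extension.
 *
 * License: MIT
 */

// Print an error to STDERR and exit non-zero (die("...") would exit with status 0).
function fail(string $message): void
{
    fwrite(STDERR, "Error: $message\n");
    exit(1);
}

// preg_replace() returns null on failure (e.g. backtrack limit on a large part);
// keep the previous content instead of writing an empty part.
function safe_replace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        fwrite(STDERR, "Warning: $pattern failed: " . preg_last_error_msg() . "\n");
        return $subject;
    }
    return $result;
}

if (php_sapi_name() !== 'cli') {
    die("This script must be run from the command line\n");
}

if ($argc < 2) {
    echo "DOCX Scrubber - Fix Collabora documents for Exchange compatibility\n\n";
    echo "Usage: php scrub_docx.php input.docx [output.docx]\n";
    exit(1);
}

$inputFile = $argv[1];
$outputFile = $argv[2] ?? pathinfo($inputFile, PATHINFO_DIRNAME) . '/'
                        . pathinfo($inputFile, PATHINFO_FILENAME) . '_scrubbed.docx';

if (!file_exists($inputFile)) {
    fail("File '$inputFile' not found");
}

if (!class_exists('ZipArchive')) {
    fail('PHP ZipArchive extension required');
}

echo "Input:  $inputFile\n";
echo "Output: $outputFile\n\n";

// Open source docx
$zip = new ZipArchive();
if ($zip->open($inputFile) !== true) {
    fail('Could not open input file');
}

// Start from a copy so untouched parts (images, etc.) are kept as-is;
// only modified XML parts are overwritten below.
if (!copy($inputFile, $outputFile)) {
    $zip->close();
    fail('Could not create output file');
}
$zipOut = new ZipArchive();
if ($zipOut->open($outputFile, ZipArchive::CREATE) !== true) {
    $zip->close();
    fail('Could not open output file');
}

$changes = [];

// Process each file in the archive
for ($i = 0; $i < $zip->numFiles; $i++) {
    $name = $zip->getNameIndex($i);

    // Only process XML parts (including .rels relationship files)
    if ($name === false || (!str_ends_with($name, '.xml') && !str_ends_with($name, '.rels'))) {
        continue;
    }

    $content = $zip->getFromIndex($i);
    if ($content === false) {
        fwrite(STDERR, "Warning: could not read $name, leaving it unchanged\n");
        continue;
    }
    $original = $content;

    // 1. Remove empty run properties (main culprit for Exchange rejection).
    //    Inside w:rPrChange / w:pPrChange (tracked formatting changes) the schema
    //    requires the child element, so collapse it to <w:rPr/> / <w:pPr/> instead.
    $content = safe_replace('/(<w:(rPr|pPr)Change\b[^>]*>\s*)<w:\2>\s*<\/w:\2>/', '$1<w:$2/>', $content);
    $content = safe_replace('/<w:rPr>\s*<\/w:rPr>/', '', $content);

    // 2. Remove empty paragraph properties (after step 1, so a pPr that only
    //    held an empty rPr is removed too)
    $content = safe_replace('/<w:pPr>\s*<\/w:pPr>/', '', $content);

    // 3. Remove LibreOffice-specific elements, then the namespace declaration,
    //    but only if no lo: prefix is still in use (an unbound prefix breaks the XML)
    $content = safe_replace('/<lo:[^>]*\/>/', '', $content);
    $content = safe_replace('/<lo:[^>]*>.*?<\/lo:[^>]*>/s', '', $content);
    if (!preg_match('/[<\s]lo:/', $content)) {
        $content = safe_replace('/\s+xmlns:lo="[^"]*"/', '', $content);
    }

    // 4. Remove UTF-8 BOM if present
    $content = safe_replace('/^\xEF\xBB\xBF/', '', $content);

    // 5. Remove null bytes
    $content = str_replace("\x00", '', $content);

    // 6. Fix malformed XML declaration (leading whitespace)
    $content = safe_replace('/^\s+(<\?xml)/', '$1', $content);

    // 7. Normalize line endings
    $content = str_replace(["\r\n", "\r"], "\n", $content);

    // 8. Remove bookmark ends that have no matching bookmark start
    if (str_contains($name, 'document.xml')) {
        preg_match_all('/<w:bookmarkStart[^>]*w:id="([^"]*)"[^>]*\/>/', $content, $starts);
        preg_match_all('/<w:bookmarkEnd[^>]*w:id="([^"]*)"[^>]*\/>/', $content, $ends);
        $startIds = array_flip($starts[1] ?? []);
        foreach (array_unique($ends[1] ?? []) as $id) {
            if (!isset($startIds[$id])) {
                $content = safe_replace('/<w:bookmarkEnd[^>]*w:id="' . preg_quote($id, '/') . '"[^>]*\/>/', '', $content);
            }
        }
    }

    // Overwrite the part in the output archive only if something changed
    if ($content !== $original) {
        $changes[] = $name;
        $zipOut->addFromString($name, $content);
    }
}

$zip->close();
// ZipArchive writes pending changes on close(), so check its result
if (!$zipOut->close()) {
    fail('Could not write output file');
}

if (empty($changes)) {
    echo "No changes needed - file appears clean.\n";
} else {
    echo "Cleaned " . count($changes) . " file(s):\n";
    foreach ($changes as $f) {
        echo "  - $f\n";
    }
}

echo "\nOutput: $outputFile\n";
```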

Hi @dala

Thanks for sharing the detailed report.

I have asked the team about the issue; I’ll get back to you once I have some insight into this interop problem.

Nice to see how you built the workaround. Awesome :tada:

Thanks
Darshan

Thanks… I ended up adding it to the right-click context menu in Nextcloud for now, but it seems like something that could be considered for the default Save, Save As and/or Download actions.

@dala Looks great. May I ask how you’re using Collabora Online? I’m curious how it’s integrated: are you using it with Nextcloud?

Yes, we use Collabora Online in Nextcloud


The worst part is that when an email is sent with the empty elements, Exchange silently drops the message without notifying the sender or the recipient, so it took a while to figure out what was happening.