Summary
Documents created or edited in Collabora Online are sometimes rejected by Microsoft Exchange servers when sent as email attachments. The issue appears to be caused by empty XML elements and other minor structural differences in the OOXML output that trigger Exchange’s attachment scanning/filtering.
Environment
- Collabora Online (CODE) via Nextcloud integration
- Recipients using Microsoft Exchange / Office 365
Symptoms
- Emails with .docx attachments are rejected or attachments are stripped
- Same document content saved from Microsoft Word is accepted
- No issues opening the documents in Word, LibreOffice, or Collabora itself
Root Cause Analysis
After unpacking and comparing affected .docx files with Word-generated equivalents, the issue appears to be empty run property elements scattered throughout the XML:
xml
<!-- Collabora generates these empty elements -->
<w:rPr></w:rPr>
<!-- Word either omits them entirely or includes content -->
These appear in:
- word/document.xml
- word/styles.xml
- word/numbering.xml
- word/footer*.xml
Although these empty elements are valid XML, Exchange’s scanner appears to flag them as malformed or suspicious.
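To make the comparison repeatable, the check can be scripted. The following is a minimal sketch, assuming the PHP zip extension is available; the function names `countEmptyElements` and `scanDocx` are made up for illustration and are not part of any existing tool. It reports how many empty `<w:rPr></w:rPr>` pairs each XML part of a .docx contains.

```php
<?php
declare(strict_types=1);

/**
 * Count empty <w:TAG></w:TAG> pairs (only whitespace between the tags)
 * in the XML of a single package part.
 */
function countEmptyElements(string $xml, string $tag = 'rPr'): int
{
    $quoted  = preg_quote($tag, '/');
    $pattern = '/<w:' . $quoted . '>\s*<\/w:' . $quoted . '>/';
    return (int) preg_match_all($pattern, $xml);
}

/**
 * Return [part name => count] for every .xml part of a .docx
 * that contains at least one empty run property element.
 */
function scanDocx(string $docxPath): array
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Could not open $docxPath");
    }
    $report = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if ($name === false || !str_ends_with($name, '.xml')) {
            continue;
        }
        $xml = $zip->getFromIndex($i);
        if ($xml === false) {
            continue;
        }
        $count = countEmptyElements($xml);
        if ($count > 0) {
            $report[$name] = $count;
        }
    }
    $zip->close();
    return $report;
}
```

Running `print_r(scanDocx('collabora.docx'))` and the same call on a Word-saved copy (both file names are placeholders) should show in which parts the two outputs differ.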
Workaround
We’ve created a PHP script that post-processes .docx files by:
- Opening the .docx as a ZIP archive
- Removing empty <w:rPr></w:rPr> elements from XML files
- Cleaning other minor issues (BOM markers, null bytes, orphaned bookmarks)
After processing, documents pass through Exchange without issues.
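If regex matching on raw XML proves brittle, the same removal can be done with a namespace-aware parser. This is only a sketch, assuming the PHP dom extension is installed; `stripEmptyProperties` is a hypothetical name. Unlike the regex approach it also removes the self-closing form `<w:rPr/>`, although whether that form triggers Exchange filtering has not been established here.

```php
<?php
declare(strict_types=1);

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Remove w:rPr and then w:pPr elements that have no attributes,
 * no child elements and no non-whitespace text.
 */
function stripEmptyProperties(string $xml): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Part is not well-formed XML');
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', W_NS);

    // rPr first, so that a pPr left empty by the first pass is caught by the second.
    foreach (['rPr', 'pPr'] as $localName) {
        $query = "//w:{$localName}[not(*) and not(@*) and normalize-space(.) = '']";
        // Copy the result into an array before removing nodes from the tree.
        foreach (iterator_to_array($xpath->query($query)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    $out = $doc->saveXML();
    if ($out === false) {
        throw new RuntimeException('Could not serialize XML');
    }
    return $out;
}
```

One trade-off: `saveXML()` re-serializes the whole part, so the output may differ from the input in more than the removed elements (for example in the XML declaration), whereas the regex approach leaves all other bytes untouched.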
Suggested Fix
Could the OOXML writer in Collabora/LibreOffice be modified to either:
- Omit empty <w:rPr> elements entirely when serializing, or
- Add a “strict compatibility mode” export option that produces cleaner XML?
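To illustrate the first option, here is a toy sketch of the requested behaviour: build the property children first and write the `<w:rPr>` wrapper only if something ended up inside it. This is not the LibreOffice writer (which is C++) and says nothing about how that code is structured; `serializeRun` and its `$props` format are invented purely for the example.

```php
<?php
declare(strict_types=1);

/**
 * Serialize one text run.
 * $props maps property element names (e.g. 'w:b') to a boolean "enabled" flag.
 */
function serializeRun(string $text, array $props = []): string
{
    // Collect the property children before deciding whether to emit the wrapper.
    $children = '';
    foreach ($props as $element => $enabled) {
        if ($enabled) {
            $children .= "<$element/>";
        }
    }

    $xml = '<w:r>';
    if ($children !== '') {
        // The wrapper is written only when it would not be empty.
        $xml .= "<w:rPr>$children</w:rPr>";
    }
    $xml .= '<w:t>' . htmlspecialchars($text, ENT_XML1 | ENT_NOQUOTES, 'UTF-8') . '</w:t>';
    return $xml . '</w:r>';
}
```

With this shape, a run without formatting serializes with no `<w:rPr>` at all, which matches what Word is described as doing above.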
Happy to share our cleaning script or sample documents that reproduce the issue if that would help with debugging.
#!/usr/bin/env php
<?php
/**
 * DOCX Scrubber - Fixes Collabora/LibreOffice documents for Exchange compatibility
 *
 * Problem: Documents created in Collabora Online are sometimes rejected by
 * Microsoft Exchange servers due to empty XML elements and other minor
 * structural differences in the OOXML output.
 *
 * Usage: php scrub_docx.php input.docx [output.docx]
 *
 * If no output file is specified, creates input_scrubbed.docx
 *
 * Requires: PHP 8.0+ (str_ends_with, str_contains) and the zip extension
 *
 * License: MIT
 */
if (php_sapi_name() !== 'cli') {
    die("This script must be run from the command line\n");
}
if ($argc < 2) {
    echo "DOCX Scrubber - Fix Collabora documents for Exchange compatibility\n\n";
    echo "Usage: php scrub_docx.php input.docx [output.docx]\n";
    exit(1);
}
$inputFile = $argv[1];
$outputFile = $argv[2] ?? pathinfo($inputFile, PATHINFO_DIRNAME) . '/'
    . pathinfo($inputFile, PATHINFO_FILENAME) . '_scrubbed.docx';
// is_file() also rejects directories; errors go to STDERR with a non-zero exit code
if (!is_file($inputFile)) {
    fwrite(STDERR, "Error: File '$inputFile' not found\n");
    exit(1);
}
if (!class_exists('ZipArchive')) {
    fwrite(STDERR, "Error: PHP ZipArchive extension required\n");
    exit(1);
}
echo "Input: $inputFile\n";
echo "Output: $outputFile\n\n";
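// Open source docx
$zip = new ZipArchive();
if ($zip->open($inputFile) !== true) {
    fwrite(STDERR, "Error: Could not open input file\n");
    exit(1);
}
// Create the output docx as a copy of the input; only changed entries are overwritten below
if (!copy($inputFile, $outputFile)) {
    $zip->close();
    fwrite(STDERR, "Error: Could not copy input to '$outputFile'\n");
    exit(1);
}
$zipOut = new ZipArchive();
if ($zipOut->open($outputFile, ZipArchive::CREATE) !== true) {
    $zip->close();
    fwrite(STDERR, "Error: Could not create output file\n");
    exit(1);
}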
$changes = [];
// Process each file in the archive
for ($i = 0; $i < $zip->numFiles; $i++) {
    $name = $zip->getNameIndex($i);
    $content = $zip->getFromIndex($i);
    if ($name === false || $content === false) {
        continue; // unreadable entry: keep the copied original untouched
    }
    // Only process XML parts and relationship (.rels) files
    if (str_ends_with($name, '.xml') || str_ends_with($name, '.rels')) {
        $original = $content;
        // 1. Remove empty run properties (main culprit for Exchange rejection)
        $content = preg_replace('/<w:rPr>\s*<\/w:rPr>/', '', $content);
        // 2. Remove empty paragraph properties
        $content = preg_replace('/<w:pPr>\s*<\/w:pPr>/', '', $content);
        // 3. Remove LibreOffice-specific namespaces and elements
        $content = preg_replace('/\s+xmlns:lo="[^"]*"/', '', $content);
        $content = preg_replace('/<lo:[^>]*\/>/', '', $content);
        $content = preg_replace('/<lo:[^>]*>.*?<\/lo:[^>]*>/s', '', $content);
        // 4. Remove UTF-8 BOM if present
        $content = preg_replace('/^\xEF\xBB\xBF/', '', $content);
        // 5. Remove null bytes
        $content = str_replace("\x00", '', $content);
        // 6. Fix malformed XML declaration (leading whitespace)
        $content = preg_replace('/^[\s]+(<\?xml)/', '$1', $content);
        // 7. Normalize line endings
        $content = str_replace(["\r\n", "\r"], "\n", $content);
        // 8. Remove orphaned bookmark ends (a bookmarkEnd whose id has no bookmarkStart)
        if (str_contains($name, 'document.xml')) {
            preg_match_all('/<w:bookmarkStart[^>]*w:id="([^"]*)"[^>]*\/>/', $content, $starts);
            preg_match_all('/<w:bookmarkEnd[^>]*w:id="([^"]*)"[^>]*\/>/', $content, $ends);
            $startIds = $starts[1] ?? [];
            foreach (array_unique($ends[1] ?? []) as $id) {
                // Strict comparison: ids are strings, so "1" must not match "01"
                if (!in_array($id, $startIds, true)) {
                    $content = preg_replace('/<w:bookmarkEnd[^>]*w:id="' . preg_quote($id, '/') . '"[^>]*\/>/', '', $content);
                }
            }
        }
        // Overwrite the entry in the output archive only if something changed
        if ($content !== $original) {
            $changes[] = $name;
            $zipOut->addFromString($name, $content);
        }
    }
}
$zip->close();
$zipOut->close();
if (empty($changes)) {
    echo "No changes needed - file appears clean.\n";
} else {
    echo "Cleaned " . count($changes) . " file(s):\n";
    foreach ($changes as $f) {
        echo " - $f\n";
    }
}
echo "\nOutput: $outputFile\n";
