Saturday 17 June 2017

Bash script to update rows in a table in an MS Word file

In this article I'll explore something that, for a long time, I thought could not be done correctly: updating an MS Word document on Linux with a shell script, without OpenOffice or any similar software. An MS Word .docx document is essentially a zip archive containing a set of XML files and some other data. We can extract the archive on Linux with unzip to obtain the XML files, and pull the text out of the XML with sed. Inserting text into a Word document, however, is a different story altogether.
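
As a quick illustration of the extraction half, here is a minimal sketch that strips the tags from the document XML so that only the text remains. The function name xml_to_text and the sample XML are my own, made up for illustration, and the \n in the replacement assumes GNU sed.

```shell
#!/bin/bash
# xml_to_text is a hypothetical helper, not part of any existing tool.
# It reads document XML on stdin and prints the text, one paragraph per line.
xml_to_text() {
    # Turn each paragraph end into a newline (GNU sed), then drop every tag.
    sed -e 's#</w:p>#\n#g' -e 's#<[^>]*>##g'
}

# Made-up sample standing in for the body of word/document.xml.
sample='<w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:p><w:r><w:t>World</w:t></w:r></w:p>'
printf '%s' "$sample" | xml_to_text
```

On a real file the XML can be piped straight out of the archive, for example with unzip -p report.docx word/document.xml | xml_to_text, where report.docx is a placeholder name.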

The scenario I'm about to demonstrate is this: I have a Word file with a table in it, and I want to add a new row to that table.

Given below is the screenshot of the document content.


Now I wish to add a row to this table and fill in each of its cells. Given below is the script to do it.

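#!/bin/bash
# Append a row to the table in a .docx file by editing word/document.xml.
# Expects the row template text2insert to already exist in /tmp/append.

echo "Enter absolute path of MS Word .docx file:"
read -r DOC_LOCATION

DOC_FILE=$(basename "$DOC_LOCATION")

# Make sure the working directory exists; otherwise cp would create a
# plain file named /tmp/XML instead of copying into a directory.
mkdir -p /tmp/XML
cp "$DOC_LOCATION" /tmp/XML/ || exit 1

if [ ! -f /tmp/append/text2insert ]; then
    echo "Row template /tmp/append/text2insert not found"
    exit 1
fi

cd /tmp/XML || exit 1

# -o overwrites files left over from an earlier run without prompting.
unzip -q -o "$DOC_FILE" || exit 1

cp word/document.xml /tmp/append/

cd /tmp/append || exit 1

# NEW_USER is used instead of USER, which the login environment already sets.
echo "Enter user name that was created:"
read -r NEW_USER

echo "Enter ticket number:"
read -r TNO

echo "Enter approver name:"
read -r APP

echo "Enter your full name:"
read -r ADMIN

# Fill the placeholders in the row template and save the result as text2add.
sed -e "s#USERNAME#${NEW_USER}#" -e "s#TICKET#${TNO}#" -e "s#APPROVER#${APP}#" -e "s#UNIX#${ADMIN}#" text2insert > text2add

# Insert the new row between the last row's closing tag and the table's closing tag.
sed -i "s#</w:tr></w:tbl>#</w:tr>$(cat text2add)</w:tbl>#" document.xml

mv -f document.xml /tmp/XML/word/

cd /tmp/XML || exit 1

# Remove the old archive, then zip the extracted tree back into a .docx.
rm -f "$DOC_FILE"
zip -q -r "$DOC_FILE" *

echo "Updated document is available at /tmp/XML/${DOC_FILE}"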


The script relies on a file named text2insert, which holds the XML for a single table row with placeholder values (USERNAME, TICKET, APPROVER and UNIX) inside its cell tags.
When the script is run, the user is prompted to enter some input values. The placeholders in text2insert are replaced with those inputs, and the result is written to a new file named text2add. The text2insert file itself is left unchanged.
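
One caveat with placing raw input into a sed replacement: characters such as &, \ and # have a special meaning there, and & and < also need escaping inside XML text. The sketch below shows one way the placeholder substitution could be hardened. The helper names xml_escape, sed_escape and fill_placeholder are hypothetical and are not part of the script above.

```shell
#!/bin/bash
# All three helpers are hypothetical names used only for illustration.

# Replace the characters that XML text cannot contain literally.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Backslash-escape what is special in a sed replacement that uses '#' as the delimiter.
sed_escape() {
    printf '%s' "$1" | sed -e 's/[\\&#]/\\&/g'
}

# Usage: fill_placeholder TEMPLATE PLACEHOLDER VALUE
fill_placeholder() {
    local value
    value=$(sed_escape "$(xml_escape "$3")")
    printf '%s\n' "$1" | sed "s#$2#${value}#"
}

# Prints <w:t>R&amp;D #1</w:t>
fill_placeholder '<w:t>APPROVER</w:t>' APPROVER 'R&D #1'
```

Escaping for XML first and for sed second matters here, because the XML step itself introduces & characters that sed would otherwise treat as "the matched text".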

The content of the text2add file is then inserted into the Word document's XML file between the closing tag of the last row and the closing tag of the table, thereby adding an entire row of text.
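
To see the insertion step in isolation, here is a small sketch that applies the same kind of substitution to a tiny hand-written table rather than a real document.xml. The function name append_row and the simplified sample XML are assumptions for illustration. Note that the pattern only matches where a row's closing tag is immediately followed by the table's closing tag, and that without the g flag sed changes only the first such match on each line.

```shell
#!/bin/bash
# append_row is a hypothetical helper: it reads table XML on stdin and
# writes it back out with the row given as $1 added as the last row.
append_row() {
    local row
    # Escape \, & and # so the row is taken literally in the replacement.
    row=$(printf '%s' "$1" | sed -e 's/[\\&#]/\\&/g')
    sed "s#</w:tr></w:tbl>#</w:tr>${row}</w:tbl>#"
}

# Simplified sample; real Word cells carry more markup than this.
table='<w:tbl><w:tr><w:tc>old</w:tc></w:tr></w:tbl>'
printf '%s\n' "$table" | append_row '<w:tr><w:tc>new</w:tc></w:tr>'
```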

Here is the text within text2insert file.

[root@cent7 append]# cat text2insert
<w:tr><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>USERNAME</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>TICKET</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>APPROVER</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>UNIX</w:t></w:r></w:p></w:tc></w:tr>


This is a screenshot of a run of the script:


After the script completes execution the content of text2add is as follows:

[root@cent7 append]# cat text2add
<w:tr><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>forth</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>Q779911</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>The TSM</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>Sahil Suri</w:t></w:r></w:p></w:tc></w:tr>


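The script does not build a new document.xml after adding the row. Instead, sed edits the extracted document.xml in place, and the result is zipped back into a .docx file under /tmp/XML. The file at the original path is left untouched, because the script works on a copy.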
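
A quick sanity check is to count the row tags in the document XML before and after a run. The sketch below assumes a grep that supports -o (GNU grep does); count_rows is a hypothetical helper name and the sample XML is made up.

```shell
#!/bin/bash
# count_rows is a hypothetical helper: it reads document XML on stdin and
# prints how many <w:tr> opening tags it contains.
count_rows() {
    # '<w:tr[ >]' matches <w:tr> and <w:tr ...> but not <w:trPr>.
    grep -o '<w:tr[ >]' | wc -l | tr -d ' '
}

# Made-up sample with two rows and one row-properties tag.
sample='<w:tbl><w:tr><w:trPr/></w:tr><w:tr w:rsidR="00A1"></w:tr></w:tbl>'
printf '%s\n' "$sample" | count_rows
```

Against a real file this could be run as unzip -p yourfile.docx word/document.xml | count_rows, where yourfile.docx is a placeholder. The count should go up by one after the script has run.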

After running the script, the updated document looks like this:




I hope that this has been an intuitive read and that it reinforces the philosophy that where there is a shell, there is a way.
