In this article I'll explore something that, for a long time, I thought could not be done correctly: updating an MS Word document on Linux with a shell script, without OpenOffice or any similar software. A .docx file is essentially a zip archive containing a set of XML files and some other data. On Linux we can unpack it with unzip to obtain the XML files, and extract the text from the main XML file with sed. Inserting text into a Word document, however, is a different story altogether.
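To make the "extract the text with sed" part concrete, here is a minimal sketch. The helper name xml_to_text is my own, and it assumes GNU sed (where \n in the replacement inserts a newline). It only strips tags, so XML entities such as &amp;amp; are left undecoded.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script in this article): read
# WordprocessingML on stdin and print its visible text, one paragraph per line.
# Assumes GNU sed, where \n in the replacement inserts a newline.
xml_to_text() {
    # First end each paragraph with a newline, then drop every remaining tag.
    sed -e 's#</w:p>#\n#g' -e 's#<[^>]*>##g'
}

# Tiny stand-in for the content of word/document.xml
sample='<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:p><w:r><w:t>World</w:t></w:r></w:p></w:body>'

# Prints Hello and World on separate lines
printf '%s\n' "$sample" | xml_to_text
```

On a real document the same function can be fed with unzip -p file.docx word/document.xml, which writes that archive member to standard output. Each table cell holds its own paragraph, so every cell ends up on a separate line.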
The scenario I'm about to demonstrate is this: I have a Word file with a table in it, and I want to add a new row to that table.
Below is a screenshot of the document content.
I want to fill in every cell of the new row, and the script below does it.
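#!/bin/bash
# Append one row to the table in a .docx file by editing word/document.xml.
# Expects the row template text2insert to be present in /tmp/append.
set -e
# Make sure both work directories exist; otherwise cp would create a plain file named /tmp/XML.
mkdir -p /tmp/XML /tmp/append
echo "Enter absolute path of MS Word .docx file:"
read -r DOC_LOCATION
DOC_FILE=$(basename "$DOC_LOCATION")
# Work on a copy of the document and unpack it.
cp "$DOC_LOCATION" /tmp/XML
cd /tmp/XML
unzip -o -q "$DOC_FILE"
cp /tmp/XML/word/document.xml /tmp/append
cd /tmp/append
echo "Enter user name that was created:"
read -r NEW_USER    # not USER, which the shell already sets to the login name
echo "Enter ticket number:"
read -r TNO
echo "Enter approver name:"
read -r APP
echo "Enter your full name:"
read -r ADMIN
# Fill the placeholders in the row template.
# The values must not contain #, & or backslashes, which are special in the sed replacement.
sed -e "s#USERNAME#${NEW_USER}#" -e "s#TICKET#${TNO}#" -e "s#APPROVER#${APP}#" -e "s#UNIX#${ADMIN}#" text2insert > text2add
# Insert the new row between the end of the last row and the end of the table.
sed -i "s#</w:tr></w:tbl>#</w:tr>$(cat text2add)</w:tbl>#" document.xml
# Put the edited XML back and rebuild the archive under the original file name.
mv -f document.xml /tmp/XML/word
cd /tmp/XML
rm -f "$DOC_FILE"
zip -r "$DOC_FILE" * > /dev/null
echo "Updated document is available at location /tmp/XML/"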
The script relies on a file named text2insert, which contains the XML tags for one table row with placeholder text values (USERNAME, TICKET, APPROVER and UNIX) inside the tags.
When the script is run, the user is prompted for some input values. Based on those inputs, the placeholders in text2insert are replaced and the result is written to a new file, text2add.
The content of text2add is then inserted into the Word document's XML file between the closing tag of the last row and the closing tag of the table, thereby adding an entire row of text.
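One caveat with this substitution step: a value that contains #, & or a backslash is special inside the sed replacement, and a value that contains & or < would produce invalid XML. The sketch below shows one way to guard against both. The function names xml_escape, sed_escape and fill_placeholder are my own and are not part of the script, and it assumes single-line values.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the script in this article).

# Escape the characters that are special in XML text. & is handled first so
# that the & introduced by &lt; and &gt; is not escaped a second time.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Escape what is special in a sed replacement that uses # as the delimiter:
# the backslash, & (the whole match) and # itself.
sed_escape() {
    printf '%s' "$1" | sed -e 's/[\\&#]/\\&/g'
}

# Replace placeholder $1 with value $2 in the template read from stdin.
fill_placeholder() {
    local name=$1 value
    value=$(sed_escape "$(xml_escape "$2")")
    sed -e "s#${name}#${value}#"
}

# Prints <w:t>R&amp;D #42 &lt;urgent&gt;</w:t>
printf '%s\n' '<w:t>TICKET</w:t>' | fill_placeholder TICKET 'R&D #42 <urgent>'
```

Several placeholders can be filled by chaining calls in a pipeline, one fill_placeholder per field.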
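The insertion itself is easiest to see on a tiny table. This is a minimal sketch of the same substitution, reading from standard input instead of editing document.xml in place. The function name append_row is hypothetical, and the row markup is assumed to be a single line that contains nothing special to sed (#, & or backslashes).

```shell
#!/bin/bash
# Hypothetical helper: append the row markup given in $1 to a table read from
# stdin. The match is anchored on "end of last row + end of table", so the new
# row lands just before </w:tbl>.
append_row() {
    sed -e "s#</w:tr></w:tbl>#</w:tr>${1}</w:tbl>#"
}

table='<w:tbl><w:tr><w:tc><w:p><w:r><w:t>old</w:t></w:r></w:p></w:tc></w:tr></w:tbl>'
new_row='<w:tr><w:tc><w:p><w:r><w:t>new</w:t></w:r></w:p></w:tc></w:tr>'

# Prints the table with the new row added after the existing one
printf '%s\n' "$table" | append_row "$new_row"
```

Without the g flag, sed replaces only the first match on each line. A document with more than one table on the same line of XML would therefore get the new row in the first table only.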
Here is the text within the text2insert file.
[root@cent7 append]# cat text2insert
<w:tr><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>USERNAME</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>TICKET</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>APPROVER</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>UNIX</w:t></w:r></w:p></w:tc></w:tr>
This is a screenshot of a run of the script:
After the script completes execution the content of text2add is as follows:
[root@cent7 append]# cat text2add
<w:tr><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>forth</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>Q779911</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>The TSM</w:t></w:r></w:p></w:tc><w:tc><w:tcPr><w:tcW w:type="dxa"/></w:tcPr><w:p ><w:r><w:t>Sahil Suri</w:t></w:r></w:p></w:tc></w:tr>
The script does not build a new document from scratch after adding the row. Instead, it edits document.xml in place with sed and zips the result back up under the original file name in /tmp/XML.
After running the script, the updated document looks like this:
I hope this has been an informative read and that it reinforces the philosophy that where there is a shell, there is a way.