Generating OfficeOpenXML documents in 5 minutes

by admin

Often you need to generate an OpenXML report on the server from an ASP.NET application.
There are several familiar ways to do this:

  1. "Find it, link it, use it": search Google for a library that generates docx or xlsx, plug it in, figure it out, and generate the document. Familiar, but slow.
  2. "Horror": use COM automation. This is not recommended: it requires Microsoft Office installed on the server, is not thread-safe, is not x64-friendly, and is generally old-fashioned.
  3. "Hell": figure out the format, assemble the XML by hand, and zip it. Brutal.
  4. "The Microsoft way": this is the approach explained below.

    A small introduction

    OfficeOpenXML is the format Word and Excel save documents in by default: docx and xlsx. The file is a zip archive. You can rename it to .zip, open it with an archiver, and examine what’s inside.
    Reports in OOXML are readable and editable with the usual tools. I wouldn’t recommend limiting serious applications to this particular format, but I suggest supporting it.
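
    The archive structure described above can also be inspected from code. The following is a minimal sketch, not part of the original walkthrough: it assumes .NET 4.5 or later (on .NET Framework, with references to System.IO.Compression and System.IO.Compression.FileSystem), and the class name ListParts and the command-line argument are made up for illustration.

    ```csharp
    using System;
    using System.IO.Compression;

    // Hypothetical helper: prints every entry stored inside a docx or xlsx file.
    class ListParts {
        static void Main(string[] args) {
            // args[0] is assumed to be the path to an existing .docx or .xlsx file
            using (ZipArchive archive = ZipFile.OpenRead(args[0])) {
                foreach (ZipArchiveEntry entry in archive.Entries) {
                    // FullName is the path inside the archive, e.g. word/document.xml
                    Console.WriteLine("{0} ({1} bytes)", entry.FullName, entry.Length);
                }
            }
        }
    }
    ```

    For a Word document the output should include word/document.xml, the part that holds the body text.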

    Preparing

    We will need OpenXMLSDKTool: download it from the Microsoft site and install it.

    Here we go

    Launch the Open XML SDK 2.0 Productivity Tool. The tool is very simple and performs two small but important operations:

    • Generate code from a document
    • Compare documents at the XML level

    But first things first.

    Code generation

    Load the document into the tool and click "Reflect Code".
    On the left is the structure of the document: the same files that are present in the archive, together with a representation of their contents.
    Nodes in the tree can be selected. On the right you see the selected node’s content as XML and the code that generates that particular piece. My example shows one paragraph from the body of the document, which lives in word/document.xml.
    If you select the root of the tree (the document itself), you get the code for the whole document.

    Now let’s use this code
    1. Create a project in Visual Studio. A simple C# console application will do.
    2. Add a reference to the DocumentFormat.OpenXml assembly.
      On my machine it is in the GAC. If you don’t want to install it there, you can reference the assembly file directly. It can be downloaded separately, from the same page as OpenXMLSDKTool, under the link OpenXMLSDKv2.msi.
    3. Add a reference to WindowsBase.
    4. Add a file named "GeneratedClass.cs".
    5. Copy the code from the tool’s ReflectedCode box into it.
    6. Save the file, close it, and go to Program.cs.
    7. Write the body of the Main method:

      new GeneratedCode.GeneratedClass().CreatePackage(@"D:\Temp\Output.docx");

    8. Run the program.

    That’s all. The code to generate the document is ready. The generated document will look exactly like the one you saved from Word. Quick, isn’t it?

    What’s inside?

    What’s inside the generated class?
    First, there is a single public method:

    public void CreatePackage(string filePath) {
        using (WordprocessingDocument package = WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document)) {
            CreateParts(package);
        }
    }

    This is where the text that will be in the document is inserted:

    private void GenerateMainDocumentPart1Content(MainDocumentPart mainDocumentPart1) {
        // ... (excerpt: the rest of the generated method is omitted)
        Run run2 = new Run() { RsidRunProperties = "00184031" };
        Text text2 = new Text();
        text2.Text = "Calculus of predicates, by definition, philosophically derives structuralism, changing the familiar reality."; // o.O what kind of weed was Yandex smoking?
        // ...
    }

    As the names of the private methods suggest, an OpenXML document consists of parts, and a separate method is generated for each part.
    The most inquisitive readers, of course, smiled evilly and inserted a picture into the document.
    Pictures are stored directly in this same file, as base64, here:

    #region Binary Data
    //...
    #endregion

    Tie a bow

    Refactoring images and replacing static content with dynamic content will be left to the reader as an exercise.
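
    As a starting point for that exercise, here is a minimal sketch of building the document body from data instead of generated literals. It is not the code the tool produces: the class name DynamicReport, the method name Create, and the lines parameter are hypothetical, and the document contains plain paragraphs only, without the styles, fonts, and settings parts that the generated code recreates.

    ```csharp
    using System.Collections.Generic;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    // Hypothetical helper: writes one paragraph per input string.
    public static class DynamicReport {
        public static void Create(string filePath, IEnumerable<string> lines) {
            using (WordprocessingDocument package =
                   WordprocessingDocument.Create(filePath, WordprocessingDocumentType.Document)) {
                MainDocumentPart mainPart = package.AddMainDocumentPart();

                // Each string becomes Paragraph > Run > Text, the same
                // element nesting the generated code uses for static text.
                Body body = new Body();
                foreach (string line in lines) {
                    body.Append(new Paragraph(new Run(new Text(line))));
                }

                mainPart.Document = new Document(body);
                mainPart.Document.Save();
            }
        }
    }
    ```

    In a real report you would more likely keep the generated class and replace only the string literals assigned to the Text elements with values passed in from your data.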
    And here is a method that returns the document as a byte array rather than a file, so it can be sent to the client from ASP.NET without temporary files:

    public byte[] CreatePackageAsBytes() {
        using (var mstm = new MemoryStream()) {
            // Disposing the package writes the finished document into the stream
            using (WordprocessingDocument package = WordprocessingDocument.Create(mstm, WordprocessingDocumentType.Document)) {
                CreateParts(package);
            }
            // ToArray copies the whole buffer; no explicit Flush or Close is needed
            return mstm.ToArray();
        }
    }

    That’s it, the code to generate the report in docx format is ready.
    Now we just need to replace the static content with dynamic content (we didn’t do all this just to serve the same document every time, right?) and add a "Download in Word format" link to the page.
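
    One way to wire up such a download link is a plain HTTP handler. This is only a sketch under assumptions: the handler name ReportHandler and the file name Report.docx are made up, the handler still has to be registered in your application, and it presumes the CreatePackageAsBytes method shown above has been added to the generated class.

    ```csharp
    using System.Web;

    // Hypothetical handler that streams the generated report to the browser.
    public class ReportHandler : IHttpHandler {
        // The handler keeps no per-request state, so one instance can be reused.
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context) {
            // Assumes CreatePackageAsBytes was added to the generated class
            byte[] bytes = new GeneratedCode.GeneratedClass().CreatePackageAsBytes();

            // MIME type for docx files
            context.Response.ContentType =
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
            // Ask the browser to download the response instead of displaying it
            context.Response.AddHeader("Content-Disposition", "attachment; filename=Report.docx");
            context.Response.BinaryWrite(bytes);
        }
    }
    ```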

    Document comparison

    So we’ve generated code from the document, added a lot of data to it, refactored it, and deployed it to production. Now we need to change the font and the text in the report. How do we do that? There is a lot of code, and searching through it takes a long time.
    It turns out to be very simple: we can use the document comparison feature.

    1. Put the old and new documents side by side.
    2. Open the Open XML SDK Productivity Tool and choose "Compare files…".
    3. Open the files and click OK. The tool shows the result of the comparison.
      Click a line with a file name to see the differences in that part.
      Under More Options you choose what to ignore when comparing.
      View Part Code shows the code for the part of the XML you are looking at.
      This makes it easy to match the XML with the code.

    By the way, this feature is also very handy if you’re just getting acquainted with the OpenXML format: add something to your document and see what has changed. Useful for those who chose the third method mentioned at the beginning of the article (assembling the XML by hand).

    Evidence

    • It works with xlsx just as it does with docx
    • A graph or chart inside a docx is handled fine
    • This is just a wrapper over the System.IO.Packaging library
    • You don’t need anything on the server except this library
    • No problems with x64
    • Performance is top notch
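
    To illustrate the first point, here is a minimal hand-written sketch of a workbook with a single text cell. It is not output from the Productivity Tool: the class name MinimalWorkbook, the sheet name "Report", and the parameters are hypothetical, and a real report would normally start from code reflected from an existing xlsx.

    ```csharp
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    // Hypothetical helper: creates an xlsx with one sheet and one text cell (A1).
    public static class MinimalWorkbook {
        public static void Create(string filePath, string cellText) {
            using (SpreadsheetDocument package =
                   SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook)) {
                WorkbookPart workbookPart = package.AddWorkbookPart();
                workbookPart.Workbook = new Workbook();

                // An inline string keeps the text in the cell itself,
                // so no shared string table part is needed.
                Cell cell = new Cell {
                    CellReference = "A1",
                    DataType = CellValues.InlineString,
                    InlineString = new InlineString(new Text(cellText))
                };
                Row row = new Row(cell) { RowIndex = 1U };

                WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
                worksheetPart.Worksheet = new Worksheet(new SheetData(row));
                worksheetPart.Worksheet.Save();

                // Register the worksheet in the workbook under a visible name
                Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
                sheets.Append(new Sheet {
                    Id = workbookPart.GetIdOfPart(worksheetPart),
                    SheetId = 1U,
                    Name = "Report"
                });
                workbookPart.Workbook.Save();
            }
        }
    }
    ```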

    Conclusions

    I think DocumentFormat.OpenXml is a good choice for report generation in web applications, and the SDK Productivity Tool will save you a lot of time.

    What to read

    About the OpenXML SDK: msdn.microsoft.com/en-us/library/bb448854(office.14).aspx
    About OpenXML (if anyone is not familiar with it): en.wikipedia.org/wiki/Office_Open_XML
    Good luck! Thanks for your attention.
