Biswajit Banerjee

Input files: Reading XML

Read XML in C++

Mechanics research codes are typically written by graduate students who aim to get their work done as quickly as possible. These codes are not meant to last beyond the publication of a few related papers. As a result, typical input files have the form

   0.000000e+00   -1.000000e+00   0.000000e+00   100   1   50
   6
   1 0
   1 -1 0 0 0 0 25
   1 0
   2 1 0 0 100 0 25
   1 0
   3 0 -1 0 50 -1 25
   1 0
   4 0 1 0 50 1 25
   1 0
   5 0 0 -1 50 0 0
   1 0
   6 0 0 1 50 0 50

These files have the advantage that they can be read in quickly using an input file stream and the code for doing that can be written in minutes.

The problem

But sometimes these code last beyond the tenure of the student, and some other person (or the student her/himself) has to try to understand the format of the input file. That process can take hours and sometimes days, especially if the format has not been documented and the next person has to dig into the code to understand what each of the numbers means.

The XML Approach

One potential approach is to use XML to mark-up the input file which can be transformed into:

  <?xml version='1.0' encoding='ISO-8859-1' ?>
  <Boundary>
    <!-- Container limits -->
    <containerMin>  [0.0,  -1.0,  0.0] </containerMin>
    <containerMax>  [100.0, 1.0, 50.0] </containerMax>
    <!-- Internal boundaries -->
    <boundary type="plane" id="1">
      <direction> [-1.0, 0.0, 0.0] </direction>
      <position> [100.0, 0.0, 25.0] </position>
    </boundary>
    <boundary type="plane" id="2">
      <direction> [1.0, 0.0, 0.0] </direction>
      <position> [100.0, 0.0, 25.0] </position>
    </boundary>
    <boundary type="plane" id="3">
      <direction> [0.0, -1.0, 0.0] </direction>
      <position> [50.0, -1.0, 25.0] </position>
    </boundary>
    <boundary type="plane" id="4">
      <direction> [0.0, 1.0, 0.0] </direction>
      <position> [50.0, 1.0, 25.0] </position>
    </boundary>
    <boundary type="plane" id="5">
      <direction> [0.0, 0.0, -1.0] </direction>
      <position> [50.0, 0.0, 0.0] </position>
    </boundary>
    <boundary type="plane" id="6">
      <direction> [0.0, 0.0, 1.0] </direction>
      <position> [50.0, 0.0, 50.0] </position>
    </boundary>
  </Boundary>

Though this file is more verbose that the original, the user does not have to read the source code to understand the format of the file and what the various numbers mean. The challenge for mechanics researchers is parsing and reading that sort of format. Below I’ll show you how that’s done quickly and easily with modern C++.

Reading XML

For production codes such as Vaango we use full featured libraries such as Xerces or LibXml2. However, for research codes, there are a number of simpler alternatives. I prefer the header-only style provided by ZenXml.

The reader code is quite straightforward. Let’s encapsulate it in the class BoundaryReader.

First, the header BoundaryReader.h

#ifndef BOUNDARY_READER_H
#define BOUNDARY_READER_H

class BoundaryReader
{
public:
  BoundaryReader() = default;
  ~BoundaryReader() = default;
  bool readXML(const std::string& inputFileName) const; 
private:
  BoundaryReader(BoundaryReader const&) = delete; // don't implement
  void operator=(BoundaryReader const&) = delete; // don't implement
};

#endif

and then the implementation BoundaryReader.cc

#include <BoundaryReader.h>
#include <zenxml/xml.h>
bool
BoundaryReader::readXML(const std::string& inputFileName) const 
{
  // Read the input file
  zen::XmlDoc doc;
  try {
    std::cout << "Input file name= " << inputFileName << "\n";
    doc = zen::load(inputFileName);
  } catch (const zen::XmlFileError& err) {
    std::cout << "*ERROR** Could not read input file " << inputFileName << "\n";
    std::cout << "    Error # = " << err.lastError << "\n";
    return false;
  } catch (const zen::XmlParsingError& err) {
    std::cout << "*ERROR** Could not read input file " << inputFileName << "\n";
    std::cout << "    Parse Error in line: " << err.row + 1
              << " col: " << err.col << "\n";
    return false;
  }

  // Load the document into input proxy for easier element access
  zen::XmlIn ps(doc);

  // Read the boundary information
  auto boundary_ps = ps["Boundary"];
  if (!boundary_ps) {
    std::cout << "**ERROR** <Boundary> tag not found. \n";
    return false;
  }

  // Read the container dimensions
  std::string vecStr;
  if (!boundary_ps["containerMin"](vecStr)) {
    std::cout
      << "**ERROR** Container min. position not found in boundary geometry\n";
    std::cout << "  Add the <containerMin> [x, y, z] </containerMin> tag.";
    return false;
  }
  Vec boxMin = Vec::fromString(vecStr); // Convert the string [x, y, z]

  if (!boundary_ps["containerMax"](vecStr)) {
    ....
    return false;
  }
  Vec boxMax = Vec::fromString(vecStr); // Convert the string [x, y, z]

  // Read each boundary
  for (auto bound_ps = boundary_ps["boundary"]; bound_ps; bound_ps.next()) {
    std::string boundaryType;
    bound_ps.attribute("type", boundaryType);
    BoundaryId id;
    bound_ps.attribute("id", id);
    if (!bound_ps["direction"](vecStr)) {
      ....
      return false;
    }
    Vec direction = Vec::fromString(vecStr); // Convert the string into a vector
    if (!bound_ps["position"](vecStr)) {
      ....
      return false;
    }
    Vec position = Vec::fromString(vecStr); // Convert the string into a vector
  }

  return true;
}

Of course, if you want to use the data that you have read, you will have to save them. That’s left as an exercise for the reader. As you can see, reading a XML file can be quite straightforward and you have checks on the validity of your file along each step of the reading process.

Conclusion

I recommend shifting the XML or some other similar structured format for the input files in your research codes. However, you may think that the XML format is too verbose. In that case I’d suggest using JSON and in a future post I’ll show you how to read JSON in C++.

If you have questions/comments/corrections, please contact banerjee at parresianz dot com dot zen (without the dot zen).

 