Want to contribute? Fork us in GitHub!

XMIR, a Quick Tour

XMIR is a dialect of XML, which we use to represent a parsed EO program. It is a pretty simple format, which has a few important tricks, which I share below in this blog post. You may also want to check our schema: XMIR.xsd.

Consider this simple EO program that prints "Hello, world!":

[] > app
  [x] > foo
    QQ.io.stdout > @
      QQ.txt.sprintf
        "Hello, %s\n"
        x
  foo > @
    "world!"

If you parse it using Syntax class from eo-parser (without any optimizations), you will get this XMIR:

<program ms="401" name="test-1"
  time="2022-11-25T11:50:45.419417Z" version="1.0-SNAPSHOT">
  <listing>[] &gt; app
  [x] &gt; foo
    QQ.io.stdout &gt; @
      QQ.txt.sprintf
        "Hello, %s\n"
        x
  foo &gt; @
    "world!"</listing>
  <errors/>
  <sheets/>
  <objects>
    <o abstract="" line="1" name="app" pos="0">
      <o abstract="" line="2" name="foo" pos="2">
        <o line="2" name="x" pos="3"/>
        <o base="QQ" line="3" pos="4"/>
        <o base=".io" line="3" method="" pos="6"/>
        <o base=".stdout" line="3" method="" name="@" pos="9">
          <o base="QQ" line="4" pos="6"/>
          <o base=".txt" line="4" method="" pos="8"/>
          <o base=".sprintf" line="4" method="" pos="12">
            <o base="string" data="string" line="5" pos="8">Hello, %s\n</o>
            <o base="x" line="6" pos="8"/>
          </o>
        </o>
      </o>
      <o base="foo" line="7" name="@" pos="2">
        <o base="string" data="string" line="8" pos="4">world!</o>
      </o>
    </o>
  </objects>
</program>

The element <program> is the root element and it will always be there. It has a few mandatory attributes:

The <listing> element contains the source code of the EO program, which was parsed, without any modifiations, “as is.”

Errors and Warnings

The <errors> element is always there, but it may either be empty, which means that the parser didn’t find any errors, or may have a few <error> elements. Usually, such elements are added there by optimizers and other software that does post-processing of the XMIR file. For example, it may look like this:

<program>
  [..]
  <errors>
    <error severity="warning" line="3">There is an extra bracket</error>
    <error severity="error" line="12">The object 'x' is not found</error>
  </errors>
</program>

The errors with the warning severity may more or less safely be ignored. The errors with the error severity will lead to failures in further compilation and processing. There could also be elements with the critical severity, which must stop the processing of the document immediately.

Sheets

The <sheets> element will rarely be empty. It contains a list of all post-processors that were applied to the document after is parsing. We process our XMIR documents using dozens of XSL stylesheets. That’s why the name of the XML element. You may find something like this over there:

<program>
  [..]
  <sheets>
    <sheet>not-empty-atoms</sheet>
    <sheet>middle-varargs</sheet>
    <sheet>duplicate-names</sheet>
    <sheet>many-free-attributes</sheet>
    [...]
  </sheets>
</program>

The names you see in the <sheet> elements are the names of the files. For example, not-empty-atoms represents the not-empty-atoms.xsl file in the objectionary/eo GitHub repository.

Metas

There may be an optional element <metas> with a list of <meta> elements. For example, if my source code would have this meta at the 3rd line of the source file:

+alias foo com.example.foo

I would see the following in my XMIR:

<program>
  [..]
   <metas>
    <meta line="3">
      <head>alias</head>
      <tail>foo com.example.foo</tail>
      <part>foo</part>
      <part>com.example.foo</part>
    </meta>
    [..]
  </metas>
</program>

Each <meta> element contains parts of the meta. The <head> contains everything that goes after the + until the first space. The <tail> contains everything after the first space. There could be a number of <part> elements, each of which containing the parts of the <tail> separated by spaces.

Objects

The <objects/> element contains object, as they are found in the source code, where each objects is represented by the <o> element. Each <o> element may have a few optional attributes:

There could be other attributes too, but they are even more optional and supplementary.

Data Objects

Data literals found in the source code are presented with <o> XML elements that contain the data, for example:

<o base="string" data="string" line="8" pos="4">world!</o>

The value of the data attribute is the “type” of the data found in the sources. It may be one of the following five: string, float, int, bool, and bytes. For example:

<o data="string">Hello, 大家!</o>
<o data="float">3.141593</o>
<o data="int">65535</o>
<o data="bool">TRUE</o>
<o data="bytes">world!</o>

Methods

Our parser is top-down. Technically, this means that it reads the source file line by line, starting from the top and going to the bottom. When it meets this source code:

a.times > x
  b

The following XMIR is generated (I removed all the meta information and show you only the block with <objects>):

<o base="a"/>
<o base=".times" method="" name="x">
  <o base="b"/>
</o>

Then, we apply wrap-method-calls.xsl tranformation and the XMIR is tranformed to:

<o base=".times" name="x">
  <o base="a"/>
  <o base="b"/>
</o>

Now, the dot in front of the base attribute tells us that the first <o> kid XML element is the object to which the dot notation refers. Also, pay attention that the name attribute was moved to the right place.

References

If you apply add-refs.xsl optimization XSL stylesheet to the document above, an additional attribute ref will be added to some <o> elements. This attribute is added when it’s obvious from the available information what object is referred by the attribute base. For example, if this is the code in EO:

[] > foo
  42 > a
  a.plus 1 > @

Then, the following XMIR will be created by the parser:

<o abstract="" name="foo" line="1">
  <o name="a" data="int" line="2">42</o>
  <o name="@" base=".plus" line="3">
    <o base="a" line="3">1</o>
    <o data="int" line="4">1</o>
  </o>
</o>

After the application of add-refs.xsl, the XMIR will look like this (pay attention the attribute ref added to <o base="a">):

<o abstract="" name="foo" line="1">
  <o name="a" data="int" line="2">42</o>
  <o name="@" base=".plus" line="3">
    <o base="a" line="3" ref="2">1</o> <!-- here! -->
    <o data="int" line="4">1</o>
  </o>
</o>

Using the information from ref attributes you can distinguish base references to objects that are not visible in the current document (they don’t have ref attribute) from objects that are present in the current document (they have ref attribute with the value equal to the number of the line where the object is found).

Locators

If you apply set-locators.xsl optimization XSL stylesheet to the following XMIR document:

<o base=".times" name="x">
  <o base="a"/>
  <o base="b"/>
</o>

You will get additional attribute loc added to each <o> element:

```xml
<o base=".times" name="x" loc="Φ.x">
  <o base="a" loc="Φ.x.ρ"/>
  <o base="b" loc="Φ.x.α0"/>
</o>

Locators are absolute and unique coordinates of any object in the entire object “Universe.”


This description of XMIR is not complete. If you want me to explain something else in more details, please post a message below this blog post and I will add the content.