Histograms and Graphs with XForms

Steven Pemberton, CWI Amsterdam

Version: 2018-04-11

Introduction

I have some data:

<y>9</y><y>15</y><y>11</y><y>6</y><y>5</y><y>10</y><y>8</y>
  <y>8</y><y>3</y><y>12</y><y>14</y><y>9</y><y>16</y><y>14</y>

and I want to see that data as a histogram. Something like this:

Source

Yes, that is in XForms. Try playing with it. Edit values. Delete values. Add new values. Go wild.

Now to show you how to do it.

SVG

The histogram is displayed using SVG, driven from the data in the XForms. It is a series of rectangles, one per data point, of equal width, and of a height that depends on the value of the datapoint.

We load the data:

<instance id="data" src="data.xml"/>

and bind how we get to the data values that we want:

<bind id="values" ref="instance('data')/y"/>
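For orientation, here is a minimal sketch of what data.xml might look like. The root element name "data" is an assumption made for illustration; all that the expression instance('data')/y requires is that the y elements are direct children of whatever the root element is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- data.xml: the root element name "data" is hypothetical;
     the y values are the sample data from the introduction -->
<data>
   <y>9</y><y>15</y><y>11</y><y>6</y><y>5</y><y>10</y><y>8</y>
   <y>8</y><y>3</y><y>12</y><y>14</y><y>9</y><y>16</y><y>14</y>
</data>
```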

We are going to create values to drive the histogram, so we will collect them in a new instance:

<instance id="hist">
   <data xmlns="">...</data>
</instance>

The first value we need is how many data values there are:

<bind ref="n" calculate="count(bind('values'))"/>

(This is XForms 2.0; with earlier versions, you have to expand the bind:)

<bind ref="n" calculate="count(instance('data')/y)"/>

We are going to allocate a space of 100 × 100 for the histogram. "100 what?" you ask. It doesn't matter: SVG lets you scale (it's what the S stands for). Think of the 100 as 100%. Since there are n values, we will split the horizontal space into n vertical bars, giving each a width of 100 ÷ n units:

<bind ref="width" type="double" calculate="100 div ../n"/>

The vertical space is distributed over the values. If they are all positive, the space will be distributed from 0 to max. If they are all negative, the space will be distributed from min to 0. And if there are both negative and positive values, the space will be distributed from min to max:

<bind ref="min" calculate="min(bind('values'))"/>
<bind ref="max" calculate="max(bind('values'))"/>

That's easy. Now the range.

<bind ref="rmin" calculate="if(../min &lt; 0, ../min, 0)"/>
<bind ref="rmax" calculate="if(../max &gt; 0, ../max, 0)"/>
<bind ref="range" calculate="../rmax - ../rmin"/>

This is the range that the vertical height is distributed over:

<bind ref="vscale" type="double" calculate="100 div ../range"/>

So if max is 3 and min is -2, range is 5; if max is 3 and min is 1, range is 3; if max is -1 and min is -4, range is 4.

Wait! Why am I telling you that? Try it for yourself:

Source
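As a worked check, this is roughly what the calculated instance would hold for the fourteen sample values from the introduction. The order of the elements is an assumption, and the exact digits shown for width will depend on the processor.

```xml
<!-- Snapshot of the 'hist' instance after recalculation (sample data) -->
<data xmlns="">
   <n>14</n>
   <width>7.142857142857143</width>  <!-- 100 div 14 -->
   <min>3</min>
   <max>16</max>
   <rmin>0</rmin>                    <!-- min is not below 0, so start at 0 -->
   <rmax>16</rmax>                   <!-- max is above 0, so end at max -->
   <range>16</range>                 <!-- rmax - rmin -->
   <vscale>6.25</vscale>             <!-- 100 div 16 -->
</data>
```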

"But what if there are no values? What if all the values are zero?" I hear you cry. Good. Glad you're paying attention. If there is no data then indeed several of these values get odd values (go on, try it, delete all the values). "NaN" is "Not a Number", and "Infinity" is, well, infinity. But no fear, if there are no values, the histogram will be empty, and we won't get called upon to use any of the values.

On the other hand, if all the data points are zero (go on, try it), most of the calculated values will also be zero, but one, the vertical scale, will be infinite. We would end up trying to draw boxes of height zero scaled by infinity, which is NaN. So we do have to catch that case:

<bind ref="vscale" type="double" calculate="if(../range=0, 1, 100 div ../range)"/>

(It doesn't matter what value we use, since zero times anything is still zero.)

There's one other case it would be good to catch: if any of the data values is empty, or not a number for any other reason, then max and min return NaN (try it on the values above). We fix that by changing the calculation for them to only select those values that are numbers:

<bind ref="min" calculate="min(bind('values')[string(number(.)) != 'NaN'])"/>
<bind ref="max" calculate="max(bind('values')[string(number(.)) != 'NaN'])"/>
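Putting the pieces so far together, the model might look something like the sketch below. Several things here are assumptions rather than part of the steps above: the child elements listed in the hist instance (named after the binds), the outer bind that makes the relative refs resolve against the hist instance rather than the first instance, the use of the expanded instance('data')/y form so that it does not depend on the XForms 2.0 bind() function, and the NaN test spelled as an explicit string comparison.

```xml
<model xmlns="http://www.w3.org/2002/xforms">
   <!-- The data to plot -->
   <instance id="data" src="data.xml"/>

   <!-- Calculated values; element names assumed from the binds below -->
   <instance id="hist">
      <data xmlns="">
         <n/><width/><min/><max/><rmin/><rmax/><range/><vscale/>
      </data>
   </instance>

   <bind id="values" ref="instance('data')/y"/>

   <!-- Outer bind (an assumption) so that ref="n" etc. resolve against 'hist' -->
   <bind ref="instance('hist')">
      <bind ref="n"      calculate="count(instance('data')/y)"/>
      <bind ref="width"  type="double" calculate="100 div ../n"/>
      <!-- Only values that convert to a number take part in min and max -->
      <bind ref="min"    calculate="min(instance('data')/y[string(number(.)) != 'NaN'])"/>
      <bind ref="max"    calculate="max(instance('data')/y[string(number(.)) != 'NaN'])"/>
      <!-- The range always includes zero -->
      <bind ref="rmin"   calculate="if(../min &lt; 0, ../min, 0)"/>
      <bind ref="rmax"   calculate="if(../max &gt; 0, ../max, 0)"/>
      <bind ref="range"  calculate="../rmax - ../rmin"/>
      <!-- Guard against dividing by zero when every value is zero -->
      <bind ref="vscale" type="double"
            calculate="if(../range = 0, 1, 100 div ../range)"/>
   </bind>
</model>
```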

OK. Now we have enough to be able to draw the histogram.

The Histogram

As was said, we're going to draw a series of rectangles, one for each data value, each of the same width, and of a height depending on the value itself. Roughly speaking like this:

<svg ...some svg attributes here...>
   <xf:repeat bind="values">
      <rect width="{instance('hist')/width}"
            height="{...}"
            x="{...}"
            y="{...}"
            />
   </xf:repeat>
</svg>

We would like to say:

height="{instance('hist')/vscale * .}"

but alas, for some unfathomable reason, SVG does not permit negative heights, so we have to do a bit more work:

height="{instance('hist')/vscale * if(. &lt; 0, 0 - ., .)}"

Before telling you how to calculate x and y, you have to know something about how SVG works:

  1. The point (0,0) is by default at the top left corner.
  2. The positive horizontal direction is to the right, negative to the left.
  3. The positive vertical direction is downwards, and negative upwards.
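A small static SVG, independent of XForms, illustrates these three rules. The viewBox, sizes and colours are arbitrary choices for the illustration.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     viewBox="0 0 100 100" width="200" height="200">
   <!-- At the origin: appears in the top left corner -->
   <rect x="0" y="0" width="10" height="10" fill="red"/>
   <!-- Larger x moves to the right; larger y moves DOWN, not up -->
   <rect x="50" y="80" width="10" height="10" fill="blue"/>
</svg>
```

The blue square ends up to the right of and below the red one, near the bottom of the drawing.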

So, x is easy to calculate. The first rectangle will be at position 0; the next at 1×width; the next at 2×width, and so on.

Each item in a repeat has a position from 1 to n, so we have to subtract 1 to get 0 to n-1:

x="{(position()-1)*instance('hist')/width}"

So now we have a row of rectangles, of the correct height, of the correct width, at the correct horizontal position:

Source

Now to fix the vertical position.

The base of the tallest (positive) bar is the zero line. All positive values should line up with their bases there; therefore we need to add a little to their y value to push them down. That amount is rmax less the value of the datapoint. Negative values need to start at that baseline, so their y value is just rmax. In both cases the amount is in data units, so it is multiplied by vscale. (I should point out that if SVG had allowed negative heights, both calculations would have been the same; that's the power of generalisation...)

y="{instance('hist')/vscale * 
         if(. &lt; 0, instance('hist')/rmax, 
                      instance('hist')/rmax - .) }"

Giving the final result:

Source
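Assembled in one place, the drawing part might look like the following sketch. The svg attributes (viewBox, preserveAspectRatio, width and height) are assumptions, chosen so that the 100 × 100 coordinate space is stretched over whatever size is asked for; whether attribute value templates and repeats are accepted inside SVG depends on the XForms processor.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xf="http://www.w3.org/2002/xforms"
     viewBox="0 0 100 100" preserveAspectRatio="none"
     width="400" height="200">
   <!-- One rectangle per data value -->
   <xf:repeat bind="values">
      <rect width="{instance('hist')/width}"
            height="{instance('hist')/vscale * if(. &lt; 0, 0 - ., .)}"
            x="{(position()-1)*instance('hist')/width}"
            y="{instance('hist')/vscale *
                  if(. &lt; 0, instance('hist')/rmax,
                               instance('hist')/rmax - .)}"/>
   </xf:repeat>
</svg>
```

As a check with the sample data (rmax 16, vscale 6.25): the first value, 9, gets a height of 6.25 × 9 = 56.25 and a y of 6.25 × (16 − 9) = 43.75, so its base sits at 43.75 + 56.25 = 100, the bottom of the drawing, which is where the zero line is when there are no negative values.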

There you have it, a histogram with XForms. About 25 lines of XForms, depending on exactly what you count.

Turning it into a Graph

Actually, now that we have all those values at our disposal, it doesn't take much to turn the histogram into a graph.

This time, instead of drawing a box for each value, we will draw a line from each value to the next.

So we will repeat over all values except the last (which has no next):

<xf:repeat ref="bind('values')[position() != last()]">

and within the repeat draw a line from that value to the next:

<line x1="{(position() - 1) * instance('hist')/width}"
      x2="{(position()    ) * instance('hist')/width}" 
      y1="{instance('hist')/vscale * (instance('hist')/rmax - .)}"
      y2="{instance('hist')/vscale * (instance('hist')/rmax - following-sibling::*[1])}"
      class="line"/>

and draw (outside of the repeat of course) two axes:

<line x1="0"   y1="{instance('hist')/rmax * instance('hist')/vscale}"
      x2="100" y2="{instance('hist')/rmax * instance('hist')/vscale}" 
      class="axis"/>
<line x1="0" y1="0" 
      x2="0" y2="100" class="axis"/>

and it looks like this (try making one of the values negative):

Source
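For reference, the graph pieces could be combined as in the sketch below. The svg attributes and the style rules for the "line" and "axis" classes are assumptions: SVG lines have no stroke by default, so some styling of that kind is needed for anything to be visible. The next value is selected with following-sibling::*[1], so that only the immediately following element is used.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xf="http://www.w3.org/2002/xforms"
     viewBox="0 0 100 100" preserveAspectRatio="none"
     width="400" height="200">
   <!-- Hypothetical styling; colours and widths are arbitrary -->
   <style>
      .line { stroke: blue;  stroke-width: 0.5 }
      .axis { stroke: black; stroke-width: 0.5 }
   </style>

   <!-- Horizontal axis at the zero line, vertical axis at the left edge -->
   <line x1="0"   y1="{instance('hist')/rmax * instance('hist')/vscale}"
         x2="100" y2="{instance('hist')/rmax * instance('hist')/vscale}"
         class="axis"/>
   <line x1="0" y1="0" x2="0" y2="100" class="axis"/>

   <!-- One segment from each value to the one after it -->
   <xf:repeat ref="bind('values')[position() != last()]">
      <line x1="{(position() - 1) * instance('hist')/width}"
            x2="{(position()    ) * instance('hist')/width}"
            y1="{instance('hist')/vscale * (instance('hist')/rmax - .)}"
            y2="{instance('hist')/vscale * (instance('hist')/rmax - following-sibling::*[1])}"
            class="line"/>
   </xf:repeat>
</svg>
```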

If you look very closely (or delete all but two or three of the numbers), you will see that there is some extra space at the right-hand side. That is because n values give only n − 1 line segments, while the horizontal space is still being divided into n parts. That can be fixed by changing the calculation for width to:

<bind ref="width"  calculate="100 div (../n - 1)"/>

If we graph data where the values are further from zero, with small differences between the values, we get something like this:

Source

In such cases, we may prefer to graph over the range of actual values, if the difference between values is what interests us.

It's a simple change: we just change the definition of rmin and rmax, which define the range of values we are drawing over:

<bind ref="rmin"   calculate="../min"/>
<bind ref="rmax"   calculate="../max"/>

which gives us, for the same data:

Source

which reveals much more clearly that the values are descending.

Of course, it doesn't have to be either/or; it can be made a choice in the code whether to include zero in the display. Add a new value to the display data:

<include0>true</include0>
  ...
<bind ref="include0" type="boolean"/>

Change how rmin and rmax are calculated:

<bind ref="rmin"     calculate="if(../min &lt; 0 or ../include0=false(), ../min, 0)"/>
<bind ref="rmax"     calculate="if(../max &gt; 0 or ../include0=false(), ../max, 0)"/>
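To see what the switch does to the numbers, here is a snapshot of the calculated values for made-up data with a minimum of 95 and a maximum of 105. The values and the element order are hypothetical; only the element names come from the binds.

```xml
<!-- 'hist' instance with include0 set to false; min and max are made-up -->
<data xmlns="">
   <include0>false</include0>
   <min>95</min>
   <max>105</max>
   <rmin>95</rmin>      <!-- would be 0 with include0 = true -->
   <rmax>105</rmax>     <!-- the same either way, since max is above 0 -->
   <range>10</range>    <!-- would be 105 with include0 = true -->
   <vscale>10</vscale>  <!-- 100 div 10; about 0.95 with include0 = true -->
</data>
```

With zero excluded, a difference of one unit in the data becomes ten units on the display instead of a little under one, which is why the differences between values become so much easier to see.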

And add an input to change whether you want zero to be included or not:

<input ref="instance('hist')/include0"><label>zero</label></input>

Giving:

Source