Surfaces Represented by Irregular Data.

Copyright (c) Susan Laflin. August 1999.

This data has been recorded as (x,y,z) triads dotted about over the surface in an irregular or random manner.

If the data points are much closer together in one part of the area than in another, it may be very difficult to reconstruct the surface in the sparser areas to a satisfactary degree of accuracy. Sometimes it is known that few points have been recorded because the surface is `flat and uninteresting' in this area, and in this case the interpolation formulae can be based on the assumption that the surface is flat in this area. It is very dangerous to make this assumption when it has not been stated by the people making the measurements - it might have been an extremely undulating field with a great many interesting features and the lack of data was because an unfriendly bull objected to further measurements !

There are two main groups of methods for dealing with irregular data. The first group uses some method of surface fitting to estimate the values at the grid points of a rectangular mesh superimposed over the surface, and then uses these values to reconstruct the surface in the same way as regular data. Ideally, the calculated values at the actual data points should be compared with the original data to give an estimate of the accuracy of the surface, but many of the commercial packages provided for this problem appear to omit this test. One such package available at many Universities is the Ginosurf package and its documentation does not mention any provision for such a check.

Example of Irregular Data

The other group of methods joins the data points in a triangular mesh and uses suitable interpolation formulae which are valid over a triangle. These are closely based on the original data points, and so you are certain that the resulting surface passes through the actual data points. These methods should give a better approximation to the surface.

Appearance of the Output

If the irregular data have been fitted to a rectangular mesh, then the output has the same appearance as for regular data. Otherwise the data is joined up in a triangular mesh and some forms of output will show this. The mosaic method will result in polygonal patches, starting with the original data and reducing in size as the extra points are inserted. The contours will look the same, but will have been calculated from triangular surface patches. The three-dimensional histogram have will polygonal pillars and the projective drawing will show a triangular mesh over the surface instead of a rectangular one. The size of the triangles will vary according to the distance between data points, so this output may look untidy compared with the regular case, especially if the variation in size is very large. This form of data will require a general hidden-line removal algorithm without any of the short cuts which can be used for the regular data.

Fitting to a Mesh

For this approach, a rectangular mesh is superimposed over the area covered by the data and the data-values are used to estimate spot-heights at the mesh points. Once these spot-heights have been calculated, this array may be used as input for any of the methods in the previous chapter. There are several methods which adopt this approach and they differ in the way in which they carry out the fitting to the rectangular mesh. Two of them will be discussed here.

Fitting a mesh to Irregular Data

Paraboloid Surface

This is the method used in the Ginosurf package and is illustrated in the above figure. Each spot-height is fitted independently of the others and is calculated as the centre of a paraboloid surface. The data-points lying within a circle of radius s, centred on the mesh point, are used to fit the surface and the value of s is chosen to be the smallest value which will contain at least four data-points.

Since the data-points closest to the mesh-point should have the greatest effect on the shape of the surface and hence the height of the mesh point, a weight is attached to each data point, given by the equation:

w = (s-d)*(s-d)/d

where d is the distance from the mesh-point to that data-point. Then a weighted least-squares fit is carried out to find the best-fitting paraboloid surface. The equation of this surface is used to estimate the spot-height at the mesh-point.

The equation of a paraboloid surface of revolution is given by

d2 = 4a(h - h0)

where h is the height of the surface, h0 is the height of the centre of the surface, and d is the distance from the centre of the surface. i.e.

d2 = x2 + y2

The form of the surface is shown in the next figure and the centre of the surface corresponds to the required mesh point, so that the value of h0 gives the spot-height at this point.

Paraboloid Surface of Revolution

To carry out a least-squares fitting, it is necessary to minimise the sum of the squares of the errors.
If E is this sum then we have,

E = sum [ w*(z-h)2]

E = sum [ w*(z - d2/4a - h0)]

Different surfaces are given by different values of a and h0 and so to find the surface which minimises E it is necessary to set the partial derivatives of E with respect to a and h0 to zero.
This corresponds to the equations:

dE/da = 0

dE/da=sum [ 2w*(z-d2/4a - h0)] [d2/4a2]

and

dE/dh0 = 0

dE/dh0 = sum [ 2w*(z-d2/4a - h0)]

which have to be solved to give the value of h0 for that particular mesh-point.

If the paraboloid is not a surface of revolution, then presumably its intersection with a plane z = constant will be an ellipse and so

d2 = x2 + k*y2

This gives an additional variable k and corresponding condition dE/dk = 0 and the resulting three equations are sufficient to calculate the three unknowns. Once again, the only value of interest is h0. This method does not necessarily treat all the data-points in the same way, since some may be used to calculate several spot-heights and others used only once.

Spline Function

To design a method which treats all data-points as being of equal importance and validity, requires some method of fitting all the data to a complete surface covering the whole area. One such method was developed by P.Dierckx and described in the I.M.A. Journal of Numerical Analysis in July 1981. Those who wish for a full description are referred to that paper, but a brief outline is given here.

The surface chosen was the two-dimensional spline function, given by the equation,

s(x,y) = sum sum [ Mi(x)*Nj(y)]

where z approximates to s(x,y) for each data-point (x,y,z) . The splines Mi(x) and Nj(y) are zero outside the rectangle R(i,j) and so the knot-points of the spline are the mesh-points of the fitted grid. Once the best-fitting surface has been found for the given set of data, this equation may be used to generate contours or any other representation. Not only does this give values of the spot-heights at the mesh points, but it also provides the equations of the surface patches within the rectangles of the grid and these equations should be used for any interpolation which is required.

Equation of a Surface

Sometimes you may wish to use these methods to depict the shape of a surface given by a mathematical formula. In such a case, the Ginosurf package may not be very satisfactory because it uses the equation to calculate the spot-heights over the grid and then applies interpolation formulae to calculate intermediate values. It would be much better to use the equation of the surface to calculate these intermediate values, so that no information is lost. In some cases, Ginosurf requires a very large number of grid-points and a very fine mesh to give an acceptable result.

Triangulation Methods

The next figure shows a set of data points joined by red lines which form a triangular mesh over the whole area. Lines such as the blue line are not acceptable since they introduce extra intersection points for which a spot height value is not known.

Triangulated Data

While it would be possible to join all the points in pairs and then remove those lines which cause additional intersections (probably retaining the shorter of an intersecting pair of lines) this is a very inefficient and time-consuming method.

Once the data has been successfully triangulated, the results can be stored in arrays. The initial data is probably stored in an array of coordinates, giving the [x,y,z] values for each recorded data point. The triangles will need to be recorded as an array of integers pointing to the values in the array of coordinates which correspond to the three vertices of the triangle. If you wish to be able to track a contour through the array of data, you will need to know which are the neighbouring triangles. This can be achieved by three more integers which point to other entries in the array of triangles. Neighbour N1 lies opposite vertex V1 and these two triangles have side V2V3 in common (see the figure below), and this information may be used to move through the array of triangles. Alternatively the array can be scanned from top to bottom and this will ensure that all triangles are examined. Unlike the regular case, this does not necessarily pass from one triangle to an adjacent one.

Triangles and Neighbours

Delaunay Triangulation

If the area containing all the points for which Pi is the closest data-point are enclosed by a boundary, this will be a polygon surrounding the point Pi. If all the data points are treated in this way, the resulting diagram (shown in the next figure) is known as the `Voronoi Tessellation' or the `Dirichlet Tessellation'.

Voronoi Tessalation

If Pi and Pj are adjacent points in this tessellation, then the line joining Pi to Pj is the perpendicular bisector of the side common to the polygons for Pi and Pj, and the points which lie along these sides are equidistant from Pi and Pj. If you join each Pi to all neighbouring points whose polygons have a side in common with the polygon for Pi, then the result (illustrated in the next figure) is the `Delaunay Triangulation' which is an adequate solution to the problem.

Delaunay Triangulation.

Now it is appropriate to derive an algorithm for the Delaunay Triangulation. If there is at least one path from Pi to Pj such that either Pi or Pj is the nearest data point to every point along this path, then Pi and Pj should be joined by a straight line. When all these lines have been found, the triangulation will be complete.

Consider all the data points in pairs, so for each Pi, take all other points Pj and consider them in turn. For the current pair Pi and Pj, calculate m, the mid-point of the line joining Pi to Pj. Draw the circle, centre m, which passes through Pi and Pj, and check to see whether any other data point Pk lies inside or on the boundary of this circle. If it is empty of all other data points, then the line Pi Pj is part of the Delaunay Triangulation. If each half of the circle (semi-circles lying on either side of Pi Pj) contains at least one other data point, then the line Pi Pj is not part of the triangulation.

Circles for PiPj

If one semi-circle is empty and the other contains at least one data point, then further tests are needed since Pi Pj may still belong to the Delaunay Triangulation. (Remember that a point on the line Pi Pj counts as being in both semi-circles and a point on the circumference counts as being inside the appropriate semi-circle.)

Now suppose one semi-circle is empty and the other is not. For each Pk, construct a circle passing through the three points Pi, Pj and Pk. Calculate d, the distance between the centre of this circle and the point m. If there is more than one such Pk, consider the point and circle for which d is maximum. Now check whether any other data points lie anywhere in or on the circle through Pi, Pj and Pk. If there are no other data points in this circle, then there is a path from Pi to Pj which satisfies the condition and so the line PiPj does belong to the Delaunay Triangulation. Conversely if there is at least one other data point inside or on the boundary of this circle, then PiPj does not belong to the Delaunay Triangulation.

The basic algorithm for this method takes the following form:

For each point, i = 1 to n

Consider each line PiPj, for j = 1 to n and j not equal to i.

For every other point Pk, k = 1 to n, and k not equal to i or j
Test remaining points against the circle.

This includes three nested loops, the outer one with n values, the next with n-1 values and the innermost with n-2 values. This is very time-consuming and is not necessary. The first improvement is to note that the result for PiPj is also the result for Pj Pi and so this need only be tested once. Consequently the second loop can be written j = i+1 to n, which will about halve the computation.

Excluding Unnecessary Data-points

The second improvement avoids testing values of k for which Pk is too far from Pi or Pj to have any effect. This requires some ordering of the data points, for example into increasing order of x. (It is not possible to order on both x and y in this case). Then if the circle for PiPj has centre (mx,my) and radius r, then the array can be checked to find k1 and k2 such that
xk is less than (mx-r) whenever k is less than k1.
and
xk is greater than (mx+r) whenever k is greater than k2.
and the innermost loop need only test those points for which k lies between k1 and k2.

Once these limits on k have been set, these points may be tested to exclude all those which cannot possibly lie within the circle. For values of k from k1 to k2, test the y-values and make a list of those which satisfy the condition

(my-r) less than yk less than (my+r)

They will not occur in any particular order, which is why it is necessary to make a list rather than selecting a range of values of k. If this list only contains i and j, then the line PiPj belongs to the Delaunay Triangulation. Otherwise there is a comparatively small number of points which still need to be tested against the circle, as described earlier.

Once the Delaunay Triangulation is complete, the lines which have been found will need to be joined up into triangles and the neighbouring triangles identified. These can then be used as input to the various methods of surface representation.

Contouring

For either the contour following method or the triangle-by-triangle method, it is possible to use linear interpolation across the triangles that have been found. This will give a quick and easy representation of the contours. If smooth curves are required, then it is necessary to derive interpolation formulae which are valid over a triangle and which give smooth joins across the sides of adjacent triangles. This problem has been discussed by Powell and Sabin in a paper published in the ACM Transactions on Mathematical Software in 1977 and students are referred to this for the details.

Mosaic Method

This method will once again surround each data-point with an area shaded in with the correct colour corresponding to the height of the data point. The area will now be the interior of the Voronoi polygon surrounding that point and will need to be calculated from the triangulation. It will need to use the fact that the side of the polygon is the perpendicular bisector of the side of the triangle and move systematically round each vertex evaluating the equations of the sides of the polygon. Calculation of the intersections of adjacent sides will then give the vertices for use in the fill-area function. Although the ideas are simple, this is a time-consuming problem which needs carefully designed software.

To reduce the size of the mosaic will now require more computation than in the regular case. For each triangle, the mid-point of each side may be entered as a new vertex whose spot-height is equal to the average of the values at the end points. Inserting lines joining these new vertices will divide the original triangle into four new triangles and this may be repeated for each of the original triangles. The Voronoi Tessellation must then be calculated for all these triangles and the mosaic output produced. The quantity of calculation will become prohibitive after one or two such iterations.

Alternatively the surface may be coloured in, scan-line by scan-line, using a formula for interpolation over a triangle. This will result in output very similar to that obtained for regular data and for contours filled with suitable colours from either regular or irregular data.