9.4.2. RSM Available Algorithms

There are many interpolation methods: some are based on geometrical nearness, some on statistics, and some on basis functions or on artificial neural networks.

The following algorithms are currently available in modeFRONTIER™:

9.4.2.1. User Defined Response Surface

In this panel the user can provide their own response surface. The selected output variable can be defined using mathematical relationships.

This method is very useful if the user already knows a mathematical expression that is a good model for the output variable.

Figure 9.59. User RSM Panel

The user must enter a valid mathematical expression in the User RSM text field. The expression can be composed with the help of the Expression Editor; click on the Expression Editor icon to open it.

A list of all available mathematical and logical operators is given in this manual.

Errors and warnings generated by this method are shown in the Log Panel.

9.4.2.2. Singular Value Decomposition

The Singular Value Decomposition (SVD) method should always be the first method tried. Many functions are usually suitable for interpolating a set of points, but the data analysis principle of parsimony suggests fitting simple functions first.

In this method (least sum of squares, LSS), the unknown parameters β1, β2, ..., βn of the regression function are estimated by finding the numeric values that minimise the sum of the squared deviations between the observed responses and the functional portion of the model.

Mathematically, the least sum-of-squares criterion that is minimised to obtain the parameter estimates is:

\[ Q = \sum_{i=1}^{m} \left[ y_i - f(x_i; \beta_1, \ldots, \beta_n) \right]^2 \]

where y_i is the observed response of design i, x_i is the corresponding input and m is the number of designs in the training set.

To make this more concrete, consider the straight-line model:

\[ y = \beta_1 + \beta_2 x + \varepsilon \]

For this model the least squares estimates of the parameters are computed by minimising

\[ Q = \sum_{i=1}^{m} \left[ y_i - (\beta_1 + \beta_2 x_i) \right]^2 \]

This is done by:

  1. taking partial derivatives of Q with respect to β1 and β2

  2. setting each partial derivative equal to zero

  3. solving the resulting system of two equations with two unknowns
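
The three steps above can be sketched in code. The following is a minimal illustration, assuming the straight-line model y = β1 + β2·x with β1 as the intercept and β2 as the slope. The function names fit_line and sum_of_squares are hypothetical and are not part of modeFRONTIER.

```python
def fit_line(xs, ys):
    """Least-squares estimates (beta1, beta2) for y = beta1 + beta2 * x."""
    m = len(xs)
    if m < 2 or m != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # Setting dQ/dbeta1 = 0 and dQ/dbeta2 = 0 gives two linear equations;
    # solving them yields the closed form below.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta2 = sxy / sxx
    beta1 = y_mean - beta2 * x_mean
    return beta1, beta2


def sum_of_squares(xs, ys, beta1, beta2):
    """The criterion Q: sum of squared deviations from the fitted line."""
    return sum((y - (beta1 + beta2 * x)) ** 2 for x, y in zip(xs, ys))


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

The same idea extends to the linear, quadratic and exponential basis functions, where the system of equations is larger and is solved numerically.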

Figure 9.60. Singular Value Decomposition Panel when two input variables are present in the Work Flow

In SVD the maximum size of the training set is fixed at 2000 designs. The minimum number of designs required depends on the selected basis function and on the number of input variables: it is exactly the number of unknown parameters.

More precisely:

  • Linear polynomials require at least N + 1 designs

  • Quadratic polynomials require at least (1+N)*N/2 + N + 1 designs

  • Exponential functions require at least (1+N)*N/2 + N + 1 designs

where N is the number of input variables.
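
The minimum training-set sizes listed above can be computed directly. A small sketch follows; the function name min_designs and the basis labels are illustrative assumptions, not modeFRONTIER identifiers.

```python
def min_designs(n_inputs, basis):
    """Minimum number of designs for an SVD fit with N input variables."""
    if n_inputs < 1:
        raise ValueError("at least one input variable is required")
    if basis == "linear":
        return n_inputs + 1
    if basis in ("quadratic", "exponential"):
        # (1+N)*N/2 second-order terms, N first-order terms, 1 constant.
        return (1 + n_inputs) * n_inputs // 2 + n_inputs + 1
    raise ValueError("unknown basis: " + repr(basis))


print(min_designs(2, "linear"), min_designs(2, "quadratic"))  # 3 6
```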

Two parameters can be defined for this algorithm:

  1. Training set: the set of designs used for the singular value decomposition. It is possible to choose between All designs database and Only marked designs. See Design Selection for how to mark and unmark designs in the Design Space panel.

  2. Algorithm Type: the basis function. It is currently possible to choose between:

    • SVD Linear

    • SVD Quadratic

    • SVD Exponential

9.4.2.3. K-Nearest

K-nearest is a statistical method that interpolates the function value at an unknown point using a weighted average of the known values at the k nearest sample points:

\[ F'(x_0) = \sum_{i=1}^{k} \lambda_i \, F(x_i) \]

where F'(x0) is the interpolated value at x0, F(xi) is the measured value at the neighbouring point xi (i = 1, ..., k) and λi are the weights.

The weighting uses an inverse distance method:

\[ \lambda_i = \frac{d_i^{-p}}{\sum_{j=1}^{k} d_j^{-p}} \]

where d_i is the distance between x0 and xi and p is the inverse distance exponent.
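
A minimal sketch of K-nearest interpolation with inverse distance weighting is given below. It assumes Euclidean distance and weights normalised to sum to one, and it returns the sample value directly when the query point coincides with a sample. The function name knn_idw and its arguments are hypothetical.

```python
import math


def knn_idw(samples, x0, k=5, p=2.0):
    """Interpolate at x0 from the k nearest samples.

    samples: list of (point, value) pairs, each point a sequence of floats.
    p: inverse distance exponent.
    """
    if k < 1 or k > len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Keep the k samples closest to x0.
    nearest = sorted(
        ((math.dist(point, x0), value) for point, value in samples),
        key=lambda pair: pair[0],
    )[:k]
    # A zero distance would give an infinite weight: return that sample.
    for distance, value in nearest:
        if distance == 0.0:
            return value
    raw = [distance ** -p for distance, _ in nearest]
    total = sum(raw)
    return sum(w / total * value for w, (_, value) in zip(raw, nearest))
```

A larger exponent p gives more influence to the closest neighbours; a larger k averages over more points and therefore smooths more.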

Figure 9.61. K-nearest Example (K=5)

This method has the property of smoothing the data, reducing peaks and dips.

Figure 9.62. K-nearest Panel

Three parameters can be defined for this algorithm:

  1. Training set: the set of designs used for training. It is possible to choose between "All designs" database and "Only marked designs". See Design Selection for how to mark and unmark designs in the Design Space panel.

  2. K-nearest designs: the number of neighbouring points to consider (k in the previous formula). The maximum number of neighbours is 100.

  3. Inverse distance exponent (P): the exponent p in the previous formula.

The size of the training set must be equal to or greater than the number of neighbours (K).

9.4.2.4. Kriging

Kriging is a statistical tool developed by Matheron (1963) and named in honour of D.G. Krige. The method uses a variogram to express the spatial variation, and it minimises the error of the predicted values, which are estimated from their spatial distribution.

Kriging is associated with the acronym BLUE (Best Linear Unbiased Estimator).

An estimator is said to be a best linear unbiased estimator (BLUE) if:

  1. it is a linear estimator (it can be expressed as a linear combination of the sample observations)

  2. it is unbiased (the mean of the error is 0)

  3. no other linear unbiased estimator has a smaller variance.

A BLUE is not necessarily the best estimator, since there may well be some non-linear estimator with a smaller sampling variance than the BLUE. In many situations, however, the efficient estimator may be so difficult to find that we have to be satisfied with the BLUE (if the BLUE can be obtained).

The basic premise of kriging interpolation is that every unknown point can be estimated by the weighted sum of the known points. The matrix of the covariances of all the sample points in the search neighbourhood operates to take into account data redundancy.

Two points that are close to each other in one direction and have a high covariance are redundant: together they should be given only about as much weight as a single point at the same distance in the opposite direction. After inverting the covariance matrix C, which rescales the covariances, a large covariance becomes a small weight.

The matrices therefore take care of both the clustering of the data points and the distance between the unknown point and the sample points, measured in variogram space rather than just Euclidean space. Combining the two gives the kriging weights.

Generally the estimation of an unknown point only takes a limited range of the known values into consideration (K-nearest). This is done for two reasons:

  1. known values at a great distance from the unknown point are unlikely to be of great benefit to the accuracy of the estimate

  2. the operation is less expensive
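
The way kriging weights follow from a linear system can be illustrated with a short sketch. It is not the modeFRONTIER implementation: it assumes ordinary kriging (weights constrained to sum to one), a power-law semivariogram γ(h) = h^exponent, and that the points passed in are already the selected neighbours. The names solve and kriging_predict are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (duplicate points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kriging_predict(points, values, x0, exponent=1.5):
    """Return (prediction, weights) at x0 for the given neighbours."""
    def gamma(a, b):
        # Assumed semivariogram model: power law of the distance.
        return math.dist(a, b) ** exponent

    n = len(points)
    # Semivariances between samples, bordered by the sum-to-one constraint.
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, x0) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights
```

With this formulation the prediction at a sample point reproduces the sample value, and clustered points share their weight instead of each counting fully.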

Figure 9.63. Kriging Panel

Only a few parameters must be defined for this algorithm:

  1. Training set: the set of designs used for training. It is possible to choose between "All designs" database and "Only marked designs". See Design Selection for how to mark and unmark designs in the Design Space panel.

  2. K-nearest Designs: the number of neighbouring points to consider. The maximum number of neighbours is 100.

  3. Semivariance Exponent: the exponent of the semivariance, which is a measure of the degree of spatial dependence between designs. The magnitude of the semivariance between points depends on the distance between them. The Automatic checkbox lets the algorithm choose the most appropriate Semivariance Exponent value, using an optimisation algorithm to find the best exponent for the training designs.

  4. Random Sequence Seed: this value is significant only when the Semivariance Exponent is set to Automatic. Changing this value allows different runs to be executed. The seed is an integer used for sequence repeatability: if two Kriging response surfaces are evaluated with the same seed, they return identical results. If the seed value is 0, the sequence is automatically seeded with a value based on the machine clock.

  5. Training cycles: this value is significant if and only if the Automatic checkbox is active, and it defines the maximum size of the run.

The size of the Training set must be equal to or greater than the number of selected neighbours (K-nearest Designs).

9.4.2.5. Parametric

In this panel the user can provide their own response surface. The selected Output Variable can be defined using mathematical relationships.

This method is very useful if the user already knows a mathematical expression that fits for the Output Variable. The algebraic function can include unknown parameters. The training algorithm will calculate the values of the parameters that yield the best fit between the Output Variable being modeled and the RSM.

Figure 9.64. Parametric Panel

The user must specify:

  1. Param RSM: the user must enter a valid mathematical expression here. The expression can be composed with the help of the Expression Editor; click on the Expression Editor icon to open it. The algebraic expression can contain many unknown parameters. The above example shows a second-order polynomial expression with 5 unknown parameters (A, B, C, D and F): A+B*x+C*x^2+D*y+F*y^2. Note that E is not used in the parametric expression, since it represents Euler's number (the base of the natural logarithms).

  2. Training set: the set of designs used for training. It is possible to choose between All designs database and Only marked designs. See Design Selection for how to mark and unmark designs in the Design Table.

  3. Training Cycles: the training algorithm has only a few tuning parameters; this one defines the maximum number of steps the training algorithm uses to calculate the unknown parameters of the expression. The larger the number of training cycles, the more accurate the training will be, but the training time increases as well.

  4. Error Type: the type of error to be minimised. It is possible to choose between Absolute and Relative(%) errors.

  5. Parameters Bounds: the user can decide whether the parameter bounds should be considered Fixed or not. If they are fixed, the results always respect the lower and upper bounds. Otherwise, the bounds are only considered for initialisation purposes.

  6. RSM Parameters: in this table, the user has to specify the range (lower/upper bound) and the base of all the unknown parameters. The training algorithm will try to find, within those ranges, the parameter values that let the Param RSM best fit the design database.
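
The interplay of error type, parameter bounds, training cycles and seed can be illustrated with a small sketch. The manual does not state which training algorithm is used, so a plain seeded random search is swapped in here purely as a placeholder. The expression is the five-parameter example from item 1, the error is assumed to be averaged over the designs, and all function names are hypothetical.

```python
import random


def param_rsm(x, y, params):
    """Example expression A + B*x + C*x^2 + D*y + F*y^2."""
    a, b, c, d, f = (params[name] for name in "ABCDF")
    return a + b * x + c * x ** 2 + d * y + f * y ** 2


def mean_error(designs, params, error_type="absolute"):
    """Average absolute or relative (%) error over (x, y, observed) designs."""
    if error_type not in ("absolute", "relative"):
        raise ValueError("error_type must be 'absolute' or 'relative'")
    total = 0.0
    for x, y, observed in designs:
        error = abs(param_rsm(x, y, params) - observed)
        if error_type == "relative":
            if observed == 0:
                raise ValueError("relative error undefined for observed = 0")
            error = 100.0 * error / abs(observed)
        total += error
    return total / len(designs)


def fit_random_search(designs, bounds, cycles=1000, seed=1,
                      error_type="absolute"):
    """Placeholder training loop: sample parameters inside fixed bounds."""
    rng = random.Random(seed)  # same seed, same sequence of candidates
    best, best_error = None, float("inf")
    for _ in range(cycles):
        candidate = {name: rng.uniform(lo, hi)
                     for name, (lo, hi) in bounds.items()}
        error = mean_error(designs, candidate, error_type)
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error
```

Because every candidate is drawn inside the bounds, this sketch corresponds to the Fixed setting. With the same seed, a run with more cycles can never end with a larger error than a shorter run.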

N.B. A list of all available mathematical and logical operators can be found in this manual. Errors and warnings generated by this RSM algorithm are shown in the Log Panel.

9.4.2.6. Gaussian Processes

Over the last decade there has been a growing interest in the Bayesian approach to regression problems, using both neural networks and Gaussian process prediction.

The Bayesian approach is based upon the expression of knowledge in terms of probability distributions.

A Gaussian Process is a powerful regression model specified by parameterised mean and covariance functions.

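This method defines a prior P(f | A) over the space of possible functions f used to model the data, where A is some set of hyperparameters. It also defines a prior P(ν | B) over the noise, where ν is some appropriate noise vector and B is a set of hyperparameters. These parameters are known as Automatic Relevance Determination (ARD) hyperparameters and they are determined automatically using the training data. The likelihood of the data is given by:

\[ P(\mathbf{t}_N \mid \{\mathbf{x}_n\}, A, B) = \int df \, d\boldsymbol{\nu} \; P(\mathbf{t}_N \mid \{\mathbf{x}_n\}, f, \boldsymbol{\nu}) \, P(f \mid A) \, P(\boldsymbol{\nu} \mid B) \]

where t_N = (t_1, ..., t_N) is the vector of observed outputs at the inputs {x_n}. Now if we define the vector t_{N+1} = (t_1, ..., t_N, t_{N+1}), then we can write down the conditional distribution of t_{N+1}:

\[ P(t_{N+1} \mid \mathbf{t}_N, \{\mathbf{x}_n\}, \mathbf{x}_{N+1}, A, B) = \frac{P(\mathbf{t}_{N+1} \mid \{\mathbf{x}_n\}, \mathbf{x}_{N+1}, A, B)}{P(\mathbf{t}_N \mid \{\mathbf{x}_n\}, A, B)} \]

and hence we can use the conditional distribution to make predictions about t_{N+1}.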
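
When both priors are Gaussian, the conditional distribution used for prediction is itself Gaussian and has a closed form. The sketch below illustrates this under several assumptions that are not taken from the manual: a squared-exponential covariance with one length scale per input variable (a common ARD form), independent Gaussian noise, a zero mean function, and hyperparameters that are already fixed rather than trained. The names ard_kernel, solve and gp_predict are hypothetical.

```python
import math


def ard_kernel(a, b, length_scales, signal_var=1.0):
    """Squared-exponential covariance, one length scale per input."""
    s = sum(((ai - bi) / ls) ** 2
            for ai, bi, ls in zip(a, b, length_scales))
    return signal_var * math.exp(-0.5 * s)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gp_predict(xs, ts, x_new, length_scales, noise_var=1e-8):
    """Mean and variance of the Gaussian conditional for the new output."""
    n = len(xs)
    # Covariance of the observed outputs: kernel plus noise on the diagonal.
    c = [[ard_kernel(xs[i], xs[j], length_scales)
          + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [ard_kernel(x, x_new, length_scales) for x in xs]
    kappa = ard_kernel(x_new, x_new, length_scales) + noise_var
    mean = sum(ki * ai for ki, ai in zip(k, solve(c, ts)))
    variance = kappa - sum(ki * vi for ki, vi in zip(k, solve(c, k)))
    return mean, variance
```

Near the training points the predictive variance is small; far from them the mean falls back to zero and the variance approaches the prior variance. A very short length scale for one input makes the model highly sensitive to that input.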

Figure 9.65. Gaussian Processes Panel

This method is best suited for non-polynomial responses. The maximum size of the training set is fixed at 500 valid designs. The minimum number of designs required depends on the number of input variables and is exactly N + 4 (where N is the number of input variables). The RSM Log Panel provides errors and suggestions for fine-tuning the approximation.

Several parameters can be defined for this algorithm:

  1. Training set: the set of designs used for Gaussian Processes training. It is possible to choose between All designs database and Only marked designs. See Design Selection for how to mark and unmark designs in the Design Space panel.

  2. Random Sequence Seed: an integer from 0 to 999 used for repeatability. If the same seed and parameters are used, an RSM evaluation can be exactly reproduced. If the seed value is 0, the sequence is automatically seeded with a value based on the current time.

  3. Training cycle: an integer from 1 to 1024 that fixes the maximum number of training cycles for the Gaussian process approximation.

  4. Algorithm Type: the method used to determine the ARD hyperparameters. The ARD parameters can be found by Maximising the Likelihood function or by Minimising the Interpolation Errors. The second method is usually more accurate near the training points, but it can lead to overfitting, especially when the number of input variables is large with respect to the size of the training set. Overfitting is dangerous because it can easily produce wild predictions.

  5. Error Type: the type of error considered during the run. It is possible to choose between Relative(%) and Absolute.

  6. Max Average Error: the average error acceptable in the training set.

Note: The algorithm will stop only when it has found a solution that respects the requested errors. It is always possible to stop the run by clicking on the Stop RSM button. When the Gaussian Processes algorithm is stopped the best approximation will be kept.

References

  • [1] MacKay D.J.C. : Introduction to Gaussian Process, Cavendish Laboratory, Cambridge, United Kingdom.

  • [2] Gibbs M., MacKay D.J.C. : Efficient Implementation of Gaussian Process, Cavendish Laboratory, Cambridge, United Kingdom (May 1997).

  • [3] Sundararajan S., Keerthi S.S. : Predictive Approaches for Choosing Hyperparameters in Gaussian Processes.

9.4.2.7. Neural Net

To approximate the output variables with higher accuracy, a non-linear model should be used, but with more than about 10 variables the interpolation problem becomes nearly intractable.

A possible solution to this problem is the use of a Neural Network, which can be considered one of the best approaches for approximating a generic function.

Figure 9.66. Neural Net Panel

The maximum number of training designs is fixed at 200.

Several parameters can be defined for this algorithm:

  1. Training set: the set of designs used for Neural Net training. It is possible to choose between All designs database and Only marked designs. See Design Selection for how to mark and unmark designs in the Design Space panel.

  2. Random Sequence Seed: an integer from 0 to 9999 used for repeatability. If the same seed and parameters are used, a Neural Net evaluation can be exactly reproduced. If the seed value is 0, the sequence is automatically seeded with a value based on the current time.

  3. Number of Nodes: a positive integer from 4 to 20.

  4. Number of Iterations: a positive integer from 100 to 20,000.

  5. Error Type: the type of error considered during the run. It is possible to choose between Relative(%) and Absolute.

  6. Max Error: the largest error acceptable in the training set.

  7. Max Average Error: the average error acceptable in the training set.

  8. Output Filter: it is possible to choose between On and Off.
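
How the Error Type, Max Error and Max Average Error settings could be evaluated on a training set is sketched below. The manual does not give the exact definitions, so this assumes that the error of a design is the absolute difference between predicted and observed value (optionally expressed as a percentage of the observed value) and that both thresholds must be met. The function names are hypothetical.

```python
def training_errors(predicted, observed, error_type="absolute"):
    """Return (largest error, average error) over the training set."""
    if error_type not in ("absolute", "relative"):
        raise ValueError("error_type must be 'absolute' or 'relative'")
    if not observed or len(predicted) != len(observed):
        raise ValueError("need two non-empty lists of equal length")
    errors = []
    for p, o in zip(predicted, observed):
        error = abs(p - o)
        if error_type == "relative":
            if o == 0:
                raise ValueError("relative error undefined for observed = 0")
            error = 100.0 * error / abs(o)  # percentage of the observed value
        errors.append(error)
    return max(errors), sum(errors) / len(errors)


def targets_met(predicted, observed, max_error, max_average_error,
                error_type="absolute"):
    """True when both the worst and the average error are acceptable."""
    worst, average = training_errors(predicted, observed, error_type)
    return worst <= max_error and average <= max_average_error
```

A training loop could call targets_met after each iteration and stop early once it returns True, or stop when the configured number of iterations is reached.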

It is always possible to stop the run by clicking on the Stop RSM button.

