Machine Learning in Civil Engineering - Surrogate Models
Welcome to the second tutorial in our Machine Learning in Civil Engineering series. In this tutorial, authored by Ehsan Es'haghi, we'll explore the predictive power and computational efficiency of surrogate models. If you haven't done so already, check out the first tutorial in this series on sensitivity analysis and layout optimisation. That said, you can still read this tutorial as a standalone piece.
Traditional numerical analysis methods like Finite Element Analysis (FEA) require solving large systems of equations, which becomes computationally expensive for large-scale problems. Here, we introduce the fundamental concept of a surrogate model, an approximation technique that reduces computational time by predicting outcomes without needing to solve the equations directly.
📂 To follow along, make sure to download the Jupyter Notebook (linked above) to run locally as you read through the tutorial.
1. Introduction
In the previous tutorial in this series on sensitivity analysis and layout optimisation, we examined an efficient method for computing the derivative of compliance with respect to element sizes. Despite using advanced sensitivity analysis techniques like the adjoint method, we still face the challenge of solving a system of $n$ equations with $n$ unknowns in each iteration, which has $\mathcal{O}(n^3)$ computational complexity.
As the number of nodes in the structure increases, this can become impractical. Additionally, while global compliance is a popular objective due to its simplicity, it is often impractical for real-world problems.
Moreover, optimising a structural design might involve relocating nodes, changing joint types, and other modifications. Taking derivatives of even a straightforward objective like global compliance with respect to these variables is highly complex.
To address these challenges, we will explore a machine learning approach based on developing a so-called surrogate model to estimate solutions to this system of equations more efficiently.
2. Formal Problem Definition
Imagine you are presented with a static load vector $F$ and a structure characterised by a stiffness matrix $K$. Calculating the nodal displacements $U$ within the structure involves solving $n$ equations for $n$ unknowns, where $n$ represents the product of the number of nodes and the number of degrees of freedom per node, with the matrix $K$ having dimensions $n \times n$.
The most formidable aspect of this task is matrix inversion, $K^{-1}$, a process with a computational complexity of $\mathcal{O}(n^3)$. This implies that doubling the number of nodes results in an eightfold increase in computational operations!
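To get a feel for this cubic scaling, the short sketch below (an illustrative experiment, not part of the accompanying notebook) times NumPy's dense solver on two randomly generated symmetric positive definite systems, one twice the size of the other; on most machines the larger system takes roughly eight times longer.

import time
import numpy as np

def time_dense_solve(n, repeats=3):
    # Time a dense solve of an n x n SPD system - an illustrative stand-in for K U = F
    rng = np.random.default_rng(42)
    A = rng.standard_normal((n, n))
    K = A @ A.T + n * np.eye(n)  # symmetric positive definite by construction
    F = rng.standard_normal(n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.linalg.solve(K, F)  # the O(n^3) factorisation dominates the cost
        best = min(best, time.perf_counter() - t0)
    return best

t_1 = time_dense_solve(1000)
t_2 = time_dense_solve(2000)
print(f"n=1000: {t_1:.3f} s | n=2000: {t_2:.3f} s | ratio ≈ {t_2 / t_1:.1f}")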
Finite Element Analysis (FEA) software, which can feature thousands or even millions of nodes, utilises advanced linear algebra techniques to efficiently compute this inversion. These techniques capitalise on certain properties of stiffness matrices:
- Most entries in a stiffness matrix are zero, making these matrices sparse.
- Given linear elastic materials and isotropy, the stiffness matrix can be assumed to be positive definite.
These characteristics allow for the optimisation of the matrix inversion operation, significantly enhancing both computational and memory efficiency. This enables simulations of building deflection involving millions of nodes to be performed in under 15 minutes on a standard laptop.
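As a rough illustration of how these properties are exploited in practice (a minimal sketch assuming SciPy is available, not code from the notebook), the toy example below solves a large tridiagonal, symmetric positive definite system with a sparse solver; the equivalent dense matrix would occupy roughly 80 GB of memory and be impractical to invert directly.

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 100_000
# Tridiagonal SPD matrix - a toy stand-in for a sparse stiffness matrix
K = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
F = np.ones(n)

U = spsolve(K, F)  # exploits sparsity; the dense equivalent would need ~80 GB of memory
print(U[:5])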
Another approach to reduce analysis time in mesh-based FEA software is to increase the mesh size, thereby reducing the dimension of $K$ but at the cost of losing accuracy. This presents a constant trade-off between accuracy and efficiency. In this tutorial, we will explore a different approach using the burgeoning field of machine learning.
As we discussed, $U = f(K, F) = K^{-1}F$ is a function that takes the matrix $K$ and the vector $F$ as inputs and returns the nodal displacement vector $U$. The challenge with this function is its rapidly increasing computation time as $n$ grows.
Throughout this series of tutorials, we will aim to design a function $\hat{f}$ with a much lower time complexity than $f$ while still providing an acceptable approximation. Such a function is known as a surrogate model. Here is a concise yet suitably broad definition of a surrogate model from Wikipedia, which nicely articulates the concept:
A surrogate model, also known as a metamodel or a response surface, is an engineering method used when an outcome of interest cannot be easily measured or computed directly, necessitating an approximate model or a substitute for it.
This tutorial will focus on applying machine learning techniques to this problem. To begin, we'll cover some machine learning fundamentals. The next tutorial in this series will introduce Artificial Neural Networks (ANNs), particularly Graph Neural Networks (GNNs), which are well-suited for our specific challenges.
While promising, it's crucial to acknowledge that this research area is not fully mature, and surrogate models do not yet match the precision of traditional FEA methods. Our exploration will illuminate both the potential and the current limitations of machine learning in this exciting interdisciplinary domain.
In the next section, we’ll explore the development of a surrogate model by considering the behaviour of a simple truss under the action of a single point load.
3. Effect of Truss Element Cross-sectional Area on Nodal Displacement
Consider the simple 2D truss subject to a single point load, shown in Fig 1.
Fig 1. 2D pin-jointed truss structure with a single point load.
This example structure is taken from the EngineeringSkills course The Direct Stiffness Method for Truss Analysis with Python.
Readers not familiar with the stiffness method for 2D truss analysis (and/or its implementation in Python) can refer to this course for a refresher.
In the following, we will examine the relationship between the vertical displacement of node 0 and the cross-sectional area of element C. For clarity, we'll adopt the following notation:
- Area of element C: $A_c$
- Vertical displacement of node 0: $U_0$
From this point onwards, to effectively follow the tutorial you'll need to download and run the Jupyter Notebook. Download it at the top of this page (make sure to log in first). You'll see reference to various functions below. These functions are defined in the notebook and not included here for brevity.
When you run the accompanying Jupyter Notebook, you’ll see a slider that allows you to visualise the deflected shape of the structure for different values of $A_c$, Fig. 2. We utilised the direct stiffness method (via OpenSeesPy - see this tutorial for a detailed explanation) to calculate the displacement, keeping the area of all other elements constant.
Fig 2. Variation of nodal displacement with element cross-sectional area.
Let's record $U_0$ for different variations of $A_c$, Fig 3.
Fig 3. Plot of nodal displacement $U_0$ versus area of element C, $A_c$.
As we can see, the vertical displacement decreases as we increase the cross-sectional area. This relationship is clearly nonlinear. It aligns with expectations because the displacement approaches zero without crossing it. In simpler terms, node 0 moves upward as we enlarge element C, but it never rises above its original, undeformed position.
If we consider a single mass-spring system, the relationship between stiffness $k$ and displacement $u$ under a force $F$ is straightforward:

$$u = \frac{F}{k}$$
We anticipate a similar relationship in our scenario because the truss geometry, the external load, and the areas of all other elements remain constant, with only one non-zero entry in our force vector. If we had non-zero external loads on other nodes as well, you would still observe a similar pattern between $A_c$ and $U_0$.
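To test this intuition, the short sketch below (an illustrative aside using `scipy.optimize.curve_fit`, which is not otherwise used in this tutorial) fits a spring-like reciprocal model $U_0 \approx \alpha / A_c + \beta$ to the values tabulated in the next section; you should find it tracks the observations very closely.

import numpy as np
from scipy.optimize import curve_fit

# Areas and corresponding displacements, as tabulated in the next section
A_c = np.array([0.0010, 0.0018, 0.0026, 0.0034, 0.0042, 0.0050,
                0.0058, 0.0066, 0.0074, 0.0082, 0.0090])
U_0 = np.array([0.029333, 0.023408, 0.021128, 0.019922, 0.019175, 0.018667,
                0.018299, 0.018020, 0.017802, 0.017626, 0.017482])

def spring_like(a, alpha, beta):
    # Reciprocal model motivated by u = F/k for a single spring
    return alpha / a + beta

(alpha, beta), _ = curve_fit(spring_like, A_c, U_0)
print(f"U_0 ≈ {alpha:.3e}/A_c + {beta:.4f}")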
In a general case, imagine we're given a truss and a force vector $F$ and asked to compute nodal displacements for thousands of different combinations of element sizes (thousands of possible area vectors $A$).
In general,

$$U = K(A)^{-1}F$$

If we could somehow exclude the vector $A$ from the computation of $K^{-1}$, then we could perform the matrix inversion just once and subsequently compute $U$ for any vector $A$ through matrix multiplication, which has a computational cost of only $\mathcal{O}(n^2)$.
Although there are solutions based on advanced linear algebra, such as the Sherman–Morrison formula and methods of sensitivity analysis, to address this specific problem, we'll focus on our machine learning journey and not delve into those approaches here.
Now, let's investigate if we can uncover a simple linear relationship between $A_c$ and $U_0$.
4. Displacement Prediction
Let’s store the areas (`variations`) and displacements (`displacements`) in a dataframe. If you’re not familiar with dataframes, think of them as a user-friendly table object.
import pandas as pd

df = pd.DataFrame({"A_c": variations, "U_0": displacements})  # variations and displacements are computed in the notebook
|    | A_c    | U_0      |
|---:|-------:|---------:|
| 0  | 0.0010 | 0.029333 |
| 1  | 0.0018 | 0.023408 |
| 2  | 0.0026 | 0.021128 |
| 3  | 0.0034 | 0.019922 |
| 4  | 0.0042 | 0.019175 |
| 5  | 0.0050 | 0.018667 |
| 6  | 0.0058 | 0.018299 |
| 7  | 0.0066 | 0.018020 |
| 8  | 0.0074 | 0.017802 |
| 9  | 0.0082 | 0.017626 |
| 10 | 0.0090 | 0.017482 |
We can attempt to fit a linear model to the observations shown in Fig 3 above. By dragging the sliders (in the Jupyter Notebook) to adjust the intercept and slope of the linear model, we can observe the distance between the line and the real displacements, shown by dashed lines, Fig 4.
Fig 4. Linear model and associated error with respect to the observed data points.
Each blue dot is called a data point or a sample. The dashed lines represent the error of our estimation with respect to each sample. To obtain the most accurate linear relationship between $A_c$ and $U_0$ with respect to all the samples, we need to find a line that minimises the sum of the absolute errors. As you play with the line, you can monitor the sum of absolute errors, which is denoted simply as Error.
For an arbitrary line $\hat{U}_0 = w A_c + b$, we can compute the sum of absolute errors as follows:

$$\text{Error} = \sum_{i=1}^{N} \left| U_0^{(i)} - \left( w A_c^{(i)} + b \right) \right|$$

Where:
- $N$ is the number of samples.
- $A_c$ is a vector that contains the variations of area ($A_c^{(i)}$ is its $i$-th entry).
- $U_0$ is a vector that contains the corresponding displacements ($U_0^{(i)}$ is its $i$-th entry).
- $b$ is the intercept.
- $w$ is the slope.
For instance, let us calculate the sum of absolute errors for the line $\hat{U}_0 = -1.54 A_c + 0.03$.
df["estimation"] = -1.54 * df["A_c"] + 0.03
df["error"] = df["U_0"] - df["estimation"]
df["abs_error"] = np.abs(df["error"])
display(df.head())
sum_ebs_error = df["abs_error"].sum()
print("sum of absolute estimation error: ", np.round(sum_ebs_error,4))
This gives…
|   | A_c    | U_0      | estimation | error     | abs_error |
|--:|-------:|---------:|-----------:|----------:|----------:|
| 0 | 0.0010 | 0.029333 | 0.028460   | 0.000873  | 0.000873  |
| 1 | 0.0018 | 0.023408 | 0.027228   | -0.003820 | 0.003820  |
| 2 | 0.0026 | 0.021128 | 0.025996   | -0.004868 | 0.004868  |
| 3 | 0.0034 | 0.019922 | 0.024764   | -0.004842 | 0.004842  |
| 4 | 0.0042 | 0.019175 | 0.023532   | -0.004357 | 0.004357  |
sum of absolute estimation error: 0.0294
We compute the average of the errors to establish a metric for the effectiveness of our estimator that is independent of the number of data points. Moreover, for technical reasons (squared errors are smooth and easier to work with mathematically), we utilise the root mean square error (RMSE) instead of the absolute value of the error.
5. Error Minimisation
Given the samples, and treating all truss parameters except the area of element C as known and constant, the error can be modelled as a function $E(w, b) : \mathbb{R}^2 \to \mathbb{R}$. This function takes the slope $w$ and intercept $b$ as inputs and produces a real number, which represents the RMSE (root mean squared error).
In a more rigorous format, the RMSE can be expressed as:

$$\text{RMSE}(w, b) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( U_0^{(i)} - \left( w A_c^{(i)} + b \right) \right)^2}$$
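As a quick sanity check (a minimal sketch reusing the dataframe built above), we can evaluate this expression for the candidate line from earlier, $w = -1.54$, $b = 0.03$:

import numpy as np

def rmse(w, b):
    # Root mean square error of the line w*A_c + b over all samples in df
    residuals = df["U_0"] - (w * df["A_c"] + b)
    return np.sqrt(np.mean(residuals ** 2))

print(f"RMSE for w=-1.54, b=0.03: {rmse(-1.54, 0.03):.5f}")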
Let's observe how RMSE varies for different sets of slopes and intercepts. We expect to see a quadratic function.
Fig 5. How different intercepts and slopes influence the estimation error.
In Fig 6. below, you can observe the contour plot of estimation errors as a function of slope and intercept.
Fig 6. Contour plot of estimation errors as a function of slope and intercept
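A surface and contour plot like those in Figs 5 and 6 can be reproduced with a simple brute-force grid evaluation. The sketch below is illustrative only; the slope and intercept ranges are assumptions, not necessarily the values used to generate the figures above.

import numpy as np
import matplotlib.pyplot as plt

def rmse(w, b):
    # Same helper as above: RMSE of the line w*A_c + b over the samples in df
    residuals = df["U_0"] - (w * df["A_c"] + b)
    return np.sqrt(np.mean(residuals ** 2))

slopes = np.linspace(-4.0, 2.0, 100)       # candidate slopes (illustrative range)
intercepts = np.linspace(0.0, 0.06, 100)   # candidate intercepts (illustrative range)
W, B = np.meshgrid(slopes, intercepts)

# Evaluate the RMSE for every slope/intercept combination on the grid
errors = np.array([[rmse(w, b) for w in slopes] for b in intercepts])

# The grid point with the smallest error is our brute-force estimate
i, j = np.unravel_index(np.argmin(errors), errors.shape)
print(f"brute-force minimum: slope ≈ {slopes[j]:.2f}, intercept ≈ {intercepts[i]:.3f}")

plt.contourf(W, B, errors, levels=30, cmap="viridis")
plt.colorbar(label="RMSE")
plt.xlabel("slope")
plt.ylabel("intercept")
plt.show()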
6. Linear Regression Fit for Nodal Displacement Prediction
So far, our approach has been to evaluate the error for numerous combinations of slope and intercept and then select the combination that minimises the estimation error.
However, an alternative method involves utilising the gradient of this error function to identify the optimal parameters. This process of fitting a linear function to a dataset is known as linear regression.
When the squared-error (RMSE) objective is used, the method is termed linear or ordinary least squares: it fits a linear function to the data by minimising the squared estimation error. Rather than delving into the intricacies of least squares or other objective functions, we'll utilise the implementation provided by scikit-learn (sklearn).
scikit-learn (`sklearn`) provides machine learning algorithms for data analysis and modelling.
from sklearn.linear_model import LinearRegression
X = df["A_c"].values.reshape(-1,1) # Extract the "A_c" column from df, reshape it into a 2D array
y = df["U_0"].values # Extract the "U_0" column from df as the target variable
reg = LinearRegression().fit(X, y) # Fit a linear regression model to the data (X vs y)
print(f"fitted line: {reg.coef_[0]:.3f}*x + {reg.intercept_:.3f} ")
fitted line: -1.103*x + 0.026
As you can observe, the fitted line closely resembles what we obtained through brute force searching, Fig 6.
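As a quick check (a minimal sketch using `sklearn.metrics`, reusing `X`, `y`, and `reg` from above), we can also compute the RMSE of the fitted line and compare it against the error surface explored earlier:

import numpy as np
from sklearn.metrics import mean_squared_error

y_pred = reg.predict(X)                             # predictions of the fitted line
rmse_fit = np.sqrt(mean_squared_error(y, y_pred))   # root mean square error of the fit
print(f"RMSE of the least-squares fit: {rmse_fit:.5f}")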
7. Polynomial Regression Fit for Nodal Displacement Prediction
Let's take a step forward and endeavour to fit a second-order polynomial to our data. Our estimator will take the form $\hat{U}_0 = a A_c^2 + b A_c + c$. In this scenario, our error function will operate from $\mathbb{R}^3$ to $\mathbb{R}$, since we must determine three parameters, $a$, $b$, and $c$, that minimise the error.
In the code block below, `sklearn` is used for preprocessing (via `PolynomialFeatures`) and model fitting (via `LinearRegression`). These features make it easy to apply machine learning models to datasets, such as the polynomial regression in this case.
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
X = df["A_c"].values.reshape(-1,1) # Extract "A_c" column from df, reshape it into a 2D array
y = df["U_0"].values # Extract "U_0" column from df as the target variable
poly = PolynomialFeatures(degree=2) # Create a PolynomialFeatures object to generate 2nd-degree polynomial terms
X_poly = poly.fit_transform(X) # Transform X into polynomial features (adds x^2 term)
# Fitting linear regression model
reg = LinearRegression().fit(X_poly, y) # Fit the linear regression model to the polynomial features and target values
df["estimation"] = reg.predict(X_poly) # Store the predicted values in a new column "estimation" in df
# Plotting
fig, ax = plt.subplots(figsize=(12, 8))
plt.scatter(df["A_c"].values, df["U_0"].values) # Scatter plot of original data points (A_c vs U_0)
plt.plot(df["A_c"].values, df["estimation"].values) # Plot the fitted polynomial curve
plt.xlabel(f"Area of Edge C")
plt.ylabel(f"Vertical Displacement of Node 0")
print(f"fitted polynomial: ({reg.coef_[2]:.3f}*x^2) + ({reg.coef_[1]:.3f}*x) + {reg.intercept_:.3f} ")
fitted polynomial: (287.575*x^2) + (-3.978*x) + 0.031
Fig 7. Fitted second-degree polynomial regression curve.
8. Categorical Target
This time, assume that instead of estimating the exact displacement of node 0, we want to make sure that the displacement is below 19 mm. In other words, provided the displacement is below 19 mm, we do not care about its exact value. Let’s prepare our dataset with this new requirement.
df['below_19_mm'] = df['U_0'] < 0.019
df
|    | A_c    | U_0      | estimation | error     | abs_error | below_19_mm |
|---:|-------:|---------:|-----------:|----------:|----------:|-------------|
| 0  | 0.0010 | 0.029333 | 0.027250   | 0.000873  | 0.000873  | False |
| 1  | 0.0018 | 0.023408 | 0.024711   | -0.003820 | 0.003820  | False |
| 2  | 0.0026 | 0.021128 | 0.022541   | -0.004868 | 0.004868  | False |
| 3  | 0.0034 | 0.019922 | 0.020738   | -0.004842 | 0.004842  | False |
| 4  | 0.0042 | 0.019175 | 0.019304   | -0.004357 | 0.004357  | False |
| 5  | 0.0050 | 0.018667 | 0.018238   | -0.003633 | 0.003633  | True |
| 6  | 0.0058 | 0.018299 | 0.017540   | -0.002769 | 0.002769  | True |
| 7  | 0.0066 | 0.018020 | 0.017210   | -0.001816 | 0.001816  | True |
| 8  | 0.0074 | 0.017802 | 0.017248   | -0.000802 | 0.000802  | True |
| 9  | 0.0082 | 0.017626 | 0.017654   | 0.000254  | 0.000254  | True |
| 10 | 0.0090 | 0.017482 | 0.018428   | 0.001342  | 0.001342  | True |
Now we can visualise our categorised data:
import seaborn as sns
import matplotlib.pyplot as plt

# Use the 'hue' argument to colour the points by the categorical variable
sns.lmplot(x="A_c", y="U_0", data=df, fit_reg=False, hue='below_19_mm', legend=True)
plt.axvline(x=0.0045, color='red', linestyle='--')
plt.xlim([df['A_c'].min() - 0.001, df['A_c'].max() + 0.001])
plt.ylim([df['U_0'].min() - 0.008, df['U_0'].max() + 0.008])
plt.xlabel("Cross-sectional Area of Element C" )
plt.ylabel("Vertical Displacement of Node 0")
plt.show()
Fig 8. Plot of nodal displacement versus area of element C, with data categorised by whether the displacement is below 19 mm.
In this example, instead of looking for a function to predict the exact value of $U_0$ given $A_c$, which is called regression, we are looking for a function that, given $A_c$, estimates whether our target variable (below_19_mm) will be true or false.
From the samples, you can observe that if $A_c$ is above approximately 0.0045, $U_0$ will be below 19 mm. Therefore, our classifier function has a very simple form:

$$\hat{y}(A_c) = \begin{cases} \text{True} & \text{if } A_c > 0.0045 \\ \text{False} & \text{otherwise} \end{cases}$$
This problem is called classification, and the vertical dashed red line in the image represents the simplest classifier function. In other words, the decision boundary is zero-dimensional (a single point on the $A_c$ axis), since the classifier takes a scalar as input.
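For completeness, a threshold like this can also be learned from the data rather than read off the plot. The sketch below is illustrative only (decision trees are not covered in this tutorial): a depth-one decision tree, or "stump", learns exactly this kind of single-threshold rule from `A_c`.

from sklearn.tree import DecisionTreeClassifier

X = df["A_c"].values.reshape(-1, 1)  # single feature: cross-sectional area
y = df["below_19_mm"].values         # binary target: True/False

# A depth-1 decision tree learns a single split threshold on A_c
clf = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(f"learned decision threshold: A_c ≈ {clf.tree_.threshold[0]:.4f}")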
If we have more than one independent variable (the input is a vector instead of a scalar), you can convince yourself that the classifier's decision boundary will be a line or, more generally, a curve, instead of a single constant value.
Fig 9. Example of a linear classification function.
Fig 10. Example of a non-linear classification function.
You can extend your imagination to higher dimensions for both classification and regression. For instance, with two features, the regression function will be a 2D plane, etc.
9. Wrapping up and what next?
In this tutorial, we focused on estimating the displacement of a single node under a simple load vector while holding all parameters constant except for the area of one element. This problem is the simplest version of regression, called simple linear regression, or regression with a single explanatory variable.
We can go beyond simple regression and fit a function that maps more features, such as the size of all elements, to the displacement of a node. In higher dimensions, our function will be a hyperplane.
Then we changed our target to predict whether the displacement stays below a threshold for different sizes of element C. So our target variable was a categorical (True/False) variable, instead of a continuous variable.
We said that this is called a classification problem. Just like regression, we can extend classifier functions to map higher dimensional feature spaces to a class.
Both of these problems are subcategories of a more general type of problem in machine learning called supervised learning. In supervised learning, the goal is to learn a function that maps input features to an output label using labeled training data.
The model learns from the data and can make predictions on new, unseen data. Therefore, surrogate models can usually (not always) be defined in terms of a supervised learning problem.
As you're aware, adjusting the position of a node alters numerous angles and lengths within the structure with a knock-on influence on the stiffness matrix, making it considerably more difficult to predict the deflection of the truss accurately.
In the next tutorial, we delve into the complexity of this problem and explore why graph neural networks are particularly well-suited for it. We will introduce an advanced method that uses neural networks to estimate solutions for an $n \times n$ system of equations with an acceptable approximation and extremely fast performance.