Latent Component Gaussian Process (LCGP): replicated 1D illustration#
The experiment compares LCGP behavior under three replication designs and two training modes:
| Axis | Options |
|---|---|
| Replication design | uniform, skewed, hotspot |
| Training mode | replicated-data reduction (`rep`), full data (`full`) |
| Figure type | output predictions (`y`), latent GPs (`g`) |
The notebook is deterministic: each (case, submethod) run receives its own fixed random seed.
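The seeding scheme can be sketched as follows. The formula mirrors the one used in the main loop of this notebook. The helper name `run_seed` is hypothetical, and the use of `np.random.default_rng` is an assumption, since the notebook does not show how `build_dataset` consumes the seed.

```python
import numpy as np

BASE_SEED = 42
CASES = (1, 2, 3)
SUBMETHODS = ("rep", "full")


def run_seed(case: int, submethod: str, base_seed: int = BASE_SEED) -> int:
    """Seed for one (case, submethod) run (hypothetical helper)."""
    return base_seed + 100 * case + (0 if submethod == "rep" else 1)


seeds = {(c, s): run_seed(c, s) for c in CASES for s in SUBMETHODS}

# Every run gets a distinct seed, so no two runs share a random stream.
assert len(set(seeds.values())) == len(seeds)

# Assumption: the data generator draws from a NumPy Generator.
# Re-creating it from the same seed reproduces the same draws.
first = np.random.default_rng(seeds[(1, "rep")]).normal(size=3)
second = np.random.default_rng(seeds[(1, "rep")]).normal(size=3)
assert np.array_equal(first, second)
```

Spacing cases by 100 and submethods by 1 keeps the seeds unique as long as there are fewer than 100 submethods per case.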
Execution requirements#
This page expects the following packages to be available in the JupyterBook build environment or installed at runtime:
lcgp
pandas
matplotlib
tensorflow-probability[tf]
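A quick way to confirm that these packages are importable before running the rest of the page is sketched below. It uses only the standard library. The helper name `missing_modules` is hypothetical, and the import name `tensorflow_probability` is the usual module name for the `tensorflow-probability` distribution.

```python
from importlib.util import find_spec

# Import names for the packages listed above; tensorflow-probability is
# imported as tensorflow_probability.
REQUIRED_MODULES = ("lcgp", "pandas", "matplotlib", "tensorflow_probability")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Running this check in the first cell makes a missing dependency visible as a short message rather than as an import traceback later in the page.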
Imports and global configuration#
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Markdown, display

from call_model import LCGPRun
from lcgp import evaluation
# All options executed by this JupyterBook page.
CASES = (1, 2, 3)
SUBMETHODS = ("rep", "full")
PLOT_MODES = ("y", "g")
BASE_SEED = 42
RESULTS_ROOT = Path("results_figure_jupyterbook")
RESULTS_ROOT.mkdir(parents=True, exist_ok=True)
CASE_LABELS = {
    1: "Uniform replication",
    2: "Skewed replication in [0.20, 0.45]",
    3: "Hotspot replication at selected x locations",
}
True function#
The input is one-dimensional, \(x \in [0,1]\), while the response has three output dimensions:
\[ y(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{bmatrix}. \]
The three outputs share the same scalar input but have different shapes. This gives a compact multi-output regression problem where replication can affect both mean estimation and uncertainty quantification.
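To make the role of replication concrete, the sketch below collapses replicated observations to unique inputs, per-location means, and replicate counts. Averaging `r` replicates with noise variance `sigma^2` gives a mean with variance `sigma^2 / r`, which is why the replication pattern matters for uncertainty. This is an illustrative NumPy sketch only: the helper `reduce_replicates`, the `(N, p)` data layout, and the toy values are assumptions, and it is not necessarily what LCGP's `rep` mode does internally.

```python
import numpy as np


def reduce_replicates(x, y):
    """Collapse replicated inputs to unique x, per-location means, and counts.

    x: shape (N,) inputs, possibly repeated.
    y: shape (N, p) outputs, one row per observation (assumed layout).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_unique, inverse, counts = np.unique(
        x, return_inverse=True, return_counts=True
    )
    sums = np.zeros((x_unique.size, y.shape[1]))
    np.add.at(sums, inverse, y)      # accumulate rows that share an x
    y_bar = sums / counts[:, None]   # replicate means per unique x
    return x_unique, y_bar, counts


# Toy data (made up): x = 0.2 is observed three times, x = 0.7 once.
x = np.array([0.2, 0.7, 0.2, 0.2])
y = np.array([[1.0, 0.0, 2.0],
              [4.0, 5.0, 6.0],
              [2.0, 0.0, 2.0],
              [3.0, 0.0, 2.0]])
x_u, y_bar, r = reduce_replicates(x, y)
```

Here `r` plays the same role as the replicate counts reported in the diagnostics of the main loop.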
Replicated data generators#
Comparing multiple scenarios of replications#
summary_rows: list[dict] = []
metric_rows: list[dict] = []
diagnostic_rows: list[dict] = []

for case in CASES:
    for submethod in SUBMETHODS:
        # Fixed seed that is unique to this (case, submethod) pair.
        seed = BASE_SEED + 100 * case + (0 if submethod == "rep" else 1)
        xtrain, ytrain, xtest, ytrue = build_dataset(case, seed=seed)

        # Describe the replication design of the training set.
        rep_summary = summarize_replication(case, submethod, xtrain, seed)
        summary_rows.append({
            "case": rep_summary.case,
            "case label": CASE_LABELS[case],
            "submethod": rep_summary.submethod,
            "N total": rep_summary.n_total,
            "unique x": rep_summary.n_unique,
            "rep min": rep_summary.rep_min,
            "rep mean": rep_summary.rep_mean,
            "rep max": rep_summary.rep_max,
            "seed": rep_summary.seed,
        })
        display(Markdown(f"### Case {case}: {CASE_LABELS[case]} — `{submethod}`"))
        display(pd.DataFrame([summary_rows[-1]]))

        # Fit LCGP and predict on the test grid.
        modelrun, predmean, ypredvar, yconfvar, elapsed = fit_lcgp(
            xtrain=xtrain,
            ytrain=ytrain,
            xtest=xtest,
            ytrue=ytrue,
            submethod=submethod,
            runno=f"case_{case}_{submethod}",
        )

        # Predictive accuracy and uncertainty metrics.
        metrics = evaluate_prediction(ytrue, predmean, yconfvar)
        metric_rows.append({
            "case": case,
            "case label": CASE_LABELS[case],
            "submethod": submethod,
            "training time (s)": elapsed,
            **metrics,
        })
        display(pd.DataFrame([metric_rows[-1]]))

        # Fitted-parameter diagnostics from the underlying model.
        mdl = modelrun.model
        lLmb, lLmb0, lsigma2s, lnugGPs = mdl.get_param()
        r = np.asarray(mdl.r.numpy())
        diagnostic_rows.append({
            "case": case,
            "case label": CASE_LABELS[case],
            "submethod": submethod,
            "diag_D": np.asarray(mdl.diag_D.numpy()).round(4).tolist(),
            "phi^T phi diag": np.diag(mdl.phi.numpy().T @ mdl.phi.numpy()).round(4).tolist(),
            "lengthscales": [np.asarray(lLmb[k].numpy()).round(4).tolist() for k in range(lLmb.shape[0])],
            "noise std fitted": np.sqrt(np.exp(lsigma2s.numpy())).round(4).tolist(),
            "rep avg/min/max": f"{np.mean(r):.2f} / {np.min(r)} / {np.max(r)}",
        })
        display(pd.DataFrame([diagnostic_rows[-1]]))

        # Save and show both figure types for this run.
        run_dir = RESULTS_ROOT / f"case_{case}_{submethod}"
        run_dir.mkdir(parents=True, exist_ok=True)
        fig_y = plot_output_predictions(
            case=case,
            submethod=submethod,
            xtrain=xtrain,
            ytrain=ytrain,
            xtest=xtest,
            ytrue=ytrue,
            predmean=predmean,
            yconfvar=yconfvar,
            outfile=run_dir / "lcgp_output.png",
        )
        plt.show()
        fig_g = plot_latent_gps(
            case=case,
            submethod=submethod,
            modelrun=modelrun,
            xtest=xtest,
            outfile=run_dir / "lcgp_latent.png",
        )
        plt.show()
        display(Markdown(f"Saved figures to `{run_dir.as_posix()}/`."))
Summary tables#
replication_summary = pd.DataFrame(summary_rows)
metric_summary = pd.DataFrame(metric_rows)
diagnostic_summary = pd.DataFrame(diagnostic_rows)
display(Markdown("### Replication design summary"))
display(replication_summary)
display(Markdown("### Predictive performance summary"))
display(
    metric_summary.sort_values(["case", "submethod"])
    .style.format({
        "training time (s)": "{:.2f}",
        "RMSE": "{:.4f}",
        "NRMSE": "{:.4f}",
        "95% PI coverage": "{:.3f}",
        "95% PI width": "{:.4f}",
        "DSS": "{:.4f}",
    })
)
display(Markdown("### Fitted diagnostic summary"))
display(diagnostic_summary)
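For readers interpreting the metric columns in the predictive performance summary, the sketch below shows one common set of definitions. It is not the notebook's `evaluate_prediction`, whose implementation is not shown on this page. The function name `summarize_metrics`, the range-based NRMSE normalization, and the Gaussian 1.96 multiplier for the 95% interval are all assumptions.

```python
import numpy as np


def summarize_metrics(ytrue, predmean, predvar):
    """Illustrative metric definitions; all arrays must share one shape."""
    ytrue = np.asarray(ytrue, dtype=float)
    predmean = np.asarray(predmean, dtype=float)
    predvar = np.asarray(predvar, dtype=float)

    err = ytrue - predmean
    rmse = np.sqrt(np.mean(err ** 2))
    # Assumption: normalize RMSE by the range of the true values.
    nrmse = rmse / (ytrue.max() - ytrue.min())

    # Assumption: Gaussian 95% interval, mean +/- 1.96 standard deviations.
    half_width = 1.96 * np.sqrt(predvar)
    coverage = np.mean(np.abs(err) <= half_width)
    width = np.mean(2.0 * half_width)

    # Dawid-Sebastiani score averaged over all points (lower is better).
    dss = np.mean(err ** 2 / predvar + np.log(predvar))

    return {
        "RMSE": rmse,
        "NRMSE": nrmse,
        "95% PI coverage": coverage,
        "95% PI width": width,
        "DSS": dss,
    }


# Toy check with made-up values: constant error 0.1, variance 0.04.
ytrue = np.array([0.0, 1.0, 2.0, 3.0])
toy = summarize_metrics(ytrue, ytrue + 0.1, np.full(4, 0.04))
```

Coverage near 0.95 together with a small interval width indicates well-calibrated, sharp predictions; DSS combines both aspects in a single score.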