Skip to article frontmatterSkip to article content

3. Oscillations in Cloud Fraction

University of British Columbia

In this notebook, we repeat the GP regression analysis for the timeseries of cloud fraction fcf_c, or the changes in the fraction of the model domain covered by clouds.

Refer to Oh and Austin, 2024 for more insights.

# Set up local project directory path
from pathlib import Path
from tqdm.notebook import tqdm
import sys

project_src = Path('../src').resolve().as_posix()
sys.path.insert(0, project_src)

# try:
#     import lib.config
# except Exception:
#     raise Exception("Issue with dynamic import")

# config = lib.config.read_config()
# pwd = Path(config["pwd"])
src = Path('../output')
# Import modules needed for the rest of the notebook
import numpy as np
import xarray as xr

import sklearn.preprocessing as skl_prep
from statsmodels.nonparametric import smoothers_lowess as sl

# Pytorch + GP
import torch, gpytorch
import matplotlib as mpl
from matplotlib import pyplot as plt

import seaborn as sns

sns.set_context('notebook')
sns.set_style('ticks', 
    {
        'axes.grid': True, 
        'axes.linewidth': '1',
        'grid.color': '0.5',
        'grid.linestyle': u':',
        'legend.frameon': True,
    })
xkcd = sns.xkcd_rgb


rc_opt = {
    "figure.dpi": 300,
    "font.family": "serif",
    "font.size": 15,
    "text.usetex": False,
    'text.latex.preamble': r"\usepackage{libertine} \usepackage[libertine]{newtxmath}",
}
mpl.rcParams.update(rc_opt)
# CUDA setup
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')

if device.type == 'cuda':
    print(f"Using {torch.cuda.get_device_name(0)}")
else:
    print(f"Using {device}")
Using mps
p = Path("../nc/CGILS_1728x576x192_25m_1s_ent.nc")

with xr.open_dataset(p) as df:
    y = np.array(df.CLDSHD[1440:2160])
    x = np.arange(len(y))

x_full = torch.tensor(x[:], dtype=torch.double)
y_full = torch.tensor(y[:], dtype=torch.double)

x_tr = torch.tensor(x[:600], dtype=torch.double)
y_tr = torch.tensor(y[:600], dtype=torch.double)
# Plot
fig, ax = plt.subplots(1, 1, figsize = (12, 6))

ax.plot(x_full[:600], y_full[:600], 'o-', label="Training")
ax.plot(x_full[600:], y_full[600:], 'o-', label="Testing")

ax.set_xlabel("Time [hr]", fontsize=14)
ax.set_ylabel("Cloud Fraction $f_c$", fontsize=14)

ax.tick_params(axis='both', which='major', labelsize=14)
<Figure size 3600x1800 with 1 Axes>

At this point, it is difficult to say if there is any oscillation in the timeseries (I’d be guessing). There is too much noise, and the general trend makes it difficult to see if there is an oscillatory motion.

I have a feeling if we tested the timeseries of cloud cover found in other studies, we might be able to find periodic motions in the cloud size distribution. That would be an interesting study, and I have already done so for BOMEX, GCSSARM, as well as CGILS. Both mass flux and cloud cover demonstrate periodic changes that can be identified using GP regression (more on this later).

GP Smoothing

Let’s define our GP model. The prior distribution is not given here, and if I need to, I’d just set it to a uniform distribution (e.g. a uniform probability between 1 and 180 minutes). There is a bit of trial and error involved here, since a good hyperparameter depends on the data and how much smoothing you want to apply.

Here we use RBF + periodic kernel. We can stick to the RBF kernel, which works well for uniform timeseries, but a periodic kernel keeps the estimate somewhat manageable. Depending on the sampling location, sometimes we are stuck in the middle of the fast transition, and GP interprets it as a quick rise (or fall) in the slope of the cloud size distribution. Because we know this generally does not happen so quickly, I added the periodic kernel.

class SmoothGPModel(gpytorch.models.ExactGP):
    def __init__(self, x_tr, y_tr, likelihood):
        super(SmoothGPModel, self).__init__(x_tr, y_tr, likelihood)
        
        self.mean_module = gpytorch.means.ConstantMean()
        
        self.cov_module = gpytorch.kernels.AdditiveKernel(
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel()),
        )

    def forward(self, x):
        x_mean = self.mean_module(x)
        x_cov = self.cov_module(x)
        
        return gpytorch.distributions.MultivariateNormal(x_mean, x_cov)
# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood(
    noise_constraint = gpytorch.constraints.Positive(),
    noise_prior = gpytorch.priors.NormalPrior(1e-3, 1e-1),
)
model = SmoothGPModel(x_tr, y_tr, likelihood)

# Initialize kernels
model.cov_module.kernels[0].base_kernel.initialize(
    lengthscale = torch.tensor(50.)
)

model.cov_module.kernels[1].base_kernel.initialize(
    lengthscale = torch.tensor(50.),
    period_length = torch.tensor(95.),
)

model.double()
SmoothGPModel( (likelihood): GaussianLikelihood( (noise_covar): HomoskedasticNoise( (noise_prior): NormalPrior() (raw_noise_constraint): Positive() ) ) (mean_module): ConstantMean() (cov_module): AdditiveKernel( (kernels): ModuleList( (0): ScaleKernel( (base_kernel): RBFKernel( (raw_lengthscale_constraint): Positive() ) (raw_outputscale_constraint): Positive() ) (1): ScaleKernel( (base_kernel): PeriodicKernel( (raw_lengthscale_constraint): Positive() (raw_period_length_constraint): Positive() ) (raw_outputscale_constraint): Positive() ) ) ) )
# Find optimal model hyperparameters
if device.type == 'cuda':
    model = model.to(device)
    likelihood = likelihood.to(device)
    
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([{'params': model.parameters()},], lr=1e-1)

mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

max_iter = 250
losses = []

p_bar = tqdm(range(max_iter))
for i in p_bar:
    optimizer.zero_grad()
    output = model(x_tr)
    loss = -mll(output, y_tr)
    losses += [loss.item()]
    loss.backward()

    p_status = f"{i + 1:3d}/{max_iter:3d} Loss: {loss.item():.3f}"
    p_bar.set_description(p_status)
    
    optimizer.step()
Loading...
# Model parameters
for param_name, param in model.named_parameters():
    print(f"{param_name:42} \t {param.item()}")
likelihood.noise_covar.raw_noise           	 -12.219345103858062
mean_module.raw_constant                   	 0.21342715339007723
cov_module.kernels.0.raw_outputscale       	 2.9800742737821473
cov_module.kernels.0.base_kernel.raw_lengthscale 	 30.709179086390435
cov_module.kernels.1.raw_outputscale       	 5.100203046950103
cov_module.kernels.1.base_kernel.raw_lengthscale 	 28.847522721217562
cov_module.kernels.1.base_kernel.raw_period_length 	 91.3306575618754

The learning rate looks good. It may be a bit too fast (only 200 epochs), but a slower learning rate does not improve the result.

fig, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.plot(np.arange(len(losses)), losses)
<Figure size 3600x1800 with 1 Axes>
dt = 120

# Make predictions by feeding model through likelihood
x_test = torch.arange(x_tr.size(dim=0) + dt, dtype=torch.double)
if device.type == 'cuda':
    x_test = x_test.to(device)

model.eval()
likelihood.eval()

with torch.no_grad():
    observed = likelihood(model(x_test))
    y_mean = observed.mean
    lower, upper = observed.confidence_region()
# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad():
    x_test = torch.arange(len(x_tr), dtype=torch.double)
    observed = likelihood(model(x_test))

with torch.no_grad():
    # Initialize plot
    fig, ax = plt.subplots(1, 1, figsize=(12, 6))

    lower, upper = observed.confidence_region()

    ax.plot(x_tr.numpy(), y_tr.numpy(), 'o-')
    ax.plot(x_test.numpy(), observed.mean.numpy())
    ax.fill_between(x_test.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)

    ax.set_xlabel("Time [hr]")
    ax.set_ylabel("Cloud Fraction")

    ax.legend(['Observed Data', 'Mean', 'Confidence'])
/Users/lorenoh/.local/opt/mamba/envs/stats/lib/python3.10/site-packages/gpytorch/models/exact_gp.py:284: GPInputWarning: The input matches the stored training data. Did you forget to call model.train()?
  warnings.warn(
<Figure size 3600x1800 with 1 Axes>

It may look like I am being too aggressive with the smoothing, but that’s because of the long-term trend in the timeseries and the uncertainty is higher because of it. Of course, I want to remain faithful to the (albeit noisy) data. Reducing the length scales for the two kernels definitely help achieve a much better fit, but it is easy to overfit the dataset while not improving the quality of the regression (and especially the periodicity estimation). Overfitting is definitely more harmful than noise.

The hyperparameters used aren’t meant to be accurate, as they are driven by noisy data. The goal here is to smooth out the noise and local variations to obtain a more general trend.

At times, depending on the data, GP regression will pick up a long-term trend (e.g. over 2-3 hours). In that case, we will have to repeat the regression until no periodic motions can be observed. For the slope of cloud size distribution, fortunately for me, it is not needed.

fig, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.plot(x_test.numpy(), np.gradient(observed.mean.numpy()), '*-')

y_tr = torch.tensor(np.gradient(observed.mean), dtype=torch.double)
x_tr = torch.tensor(np.arange(len(y_tr)), dtype=torch.double)
<Figure size 3600x1800 with 1 Axes>

What do we do with the smooth regression? I used an RBF kernel because it is infinitely differentiable, which is useful. There are a lot of different ways to make a timeseries suitable for regression, but here I find a simply differentiation very powerful in isolating the periodic motion in the timeseries. This also makes the average roughly zero, across, which means that this periodic motion is independent of the general, long-term downward trend seen above. This has an added benefit of making the GP regression much easier, since the only parameter we need to worry about at this point is the periodicity.

GP Regression with Periodic Kernel

Let’s re-define the model. Now, we will only use a single periodic kernel with relatively smaller noise. The lengthscale parameter does not seem to matter too much, and I can use a 90-minute periodicity as a prior based on the results from the previous GP regression.

class PeriodicGPModel(gpytorch.models.ExactGP):
    def __init__(self, x_tr, y_tr, likelihood):
        super(PeriodicGPModel, self).__init__(x_tr, y_tr, likelihood)
        
        self.mean_module = gpytorch.means.ZeroMean()
        
        self.cov_module = gpytorch.kernels.ScaleKernel(
            base_kernel = gpytorch.kernels.PeriodicKernel()
        )

    def forward(self, x):
        x_mean = self.mean_module(x)
        x_cov = self.cov_module(x)

        return gpytorch.distributions.MultivariateNormal(x_mean, x_cov)
# Initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood(
    noise_constraint = gpytorch.constraints.Positive(),
    noise_prior = gpytorch.priors.UniformPrior(0, 1),
)
model = PeriodicGPModel(x_tr, y_tr, likelihood)

model.cov_module.base_kernel.initialize(
    lengthscale = torch.tensor(50.),
    period_length = torch.tensor(90.),
)

model.double()
PeriodicGPModel( (likelihood): GaussianLikelihood( (noise_covar): HomoskedasticNoise( (noise_prior): UniformPrior(low: 0.0, high: 1.0) (raw_noise_constraint): Positive() ) ) (mean_module): ZeroMean() (cov_module): ScaleKernel( (base_kernel): PeriodicKernel( (raw_lengthscale_constraint): Positive() (raw_period_length_constraint): Positive() ) (raw_outputscale_constraint): Positive() ) )
# Model parameters
for param_name, param in model.named_parameters():
    print(f"{param_name:48} \t {param.item()}")
    
print(f"\n {model.cov_module.base_kernel}")
likelihood.noise_covar.raw_noise                 	 0.0
cov_module.raw_outputscale                       	 0.0
cov_module.base_kernel.raw_lengthscale           	 50.0
cov_module.base_kernel.raw_period_length         	 90.0

 PeriodicKernel(
  (raw_lengthscale_constraint): Positive()
  (raw_period_length_constraint): Positive()
)
# Find optimal model hyperparameters
if device.type == 'cuda':
    model = model.to(device)
    likelihood = likelihood.to(device)

model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)

mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

max_iter = 200
losses = []

p_bar = tqdm(range(max_iter))
for i in p_bar:
    optimizer.zero_grad()
    output = model(x_tr)
    loss = -mll(output, y_tr)
    losses += [loss.item()]
    loss.backward()

    period = model.cov_module.base_kernel.raw_period_length.item()

    p_status = f"{i + 1:3d}/{max_iter:3d} Loss: {loss.item():.3f} Period: {period:.3f}"
    p_bar.set_description(p_status)
    
    optimizer.step()
Loading...
# Model parameters
for param_name, param in model.named_parameters():
    print(f"{param_name:42} \t {param.item()}")
likelihood.noise_covar.raw_noise           	 -13.048218942243954
cov_module.raw_outputscale                 	 -9.666979147993407
cov_module.base_kernel.raw_lengthscale     	 54.83489100779897
cov_module.base_kernel.raw_period_length   	 94.58361411570908
fig, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.plot(np.arange(len(losses)), losses)
<Figure size 3600x1800 with 1 Axes>
dt = 120

# Make predictions by feeding model through likelihood
x_test = torch.arange(x_tr.size(dim=0) + dt, dtype=torch.double)
if device.type == 'cuda':
    x_test = x_test.to(device)

model.eval()
likelihood.eval()

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed = likelihood(model(x_test))
    y_mean = observed.mean
    lower, upper = observed.confidence_region()
if device.type == 'cuda':
    model = model.cpu()
    likelihood = likelihood.cpu()

    x_tr = x_tr.cpu()
    y_tr = y_tr.cpu()

    x_test = x_test.cpu()
    y_mean = y_mean.cpu()
    lower = lower.cpu()
    upper = upper.cpu()

# Initialize plot
fig, ax = plt.subplots(1, 1, figsize=(12, 6))

ax.plot(x_tr / 60, y_tr, 'o-')
ax.plot(x_test / 60, y_mean)
ax.fill_between(x_test / 60, lower, upper, alpha=0.5)

ax.legend(['Observed Data', 'Mean', 'Confidence'])

ax.set_xlabel("Time [Hr]", fontsize=14)
ax.set_ylabel("$\partial b / \partial t$", fontsize=14)

ax.tick_params(axis='both', which='major', labelsize=14)
<Figure size 3600x1800 with 1 Axes>

The mean posterier distribution seem to match well with the smooth timeseries. The mean posterier distribution has a single period (92 minutes), which means that the actual data shifts slightly between oscillations. Previously using scikit-learn’s GP model, I ended up using a multi-periodic kernel (e.g. 45 minutes and 75 minute periods). How can we be sure that it is indeed local variation, not an underlying oscillation?

One thing we can do is to use the fact that we know the timeseries oscillates mainly every 92 minutes. So, I can simply segment the smooth timeseries (not the posterier distribution, of course) at every 92 minutes. If there is an underlying oscillation, we will see that in the following figure.

# Redefine training domain
x_tr_rep = np.mod(x_tr, 93)

# Plot
fig, ax = plt.subplots(1, 1, figsize = (12, 6))
ax.plot(x_tr_rep, y_tr, '*')
<Figure size 3600x1800 with 1 Axes>

And now we can be sure that there is no underlying oscillation here.

But how do we show this works? It needs a bit of work, unfortunately. Since the original timeseries is noisy and non-uniform, we need to get rid of the general trend. I run a simple Lowess (which calculates moving average/regression) to make sure that the timeseries is (semi-)stable. Then the timeseries is scaled using the min-max normalization.

The same process is done for the GP posterier distribution, except for local regression because we only used a periodic kernel, which has no general trend (just a simple oscillating motion). Comparing the two should give us a good idea about how well the GP regression works for periodicity detection.

y_pr = y_mean.numpy()
y_pr = np.cumsum(y_pr - np.mean(y_pr, axis=0))

y_sl = y_full[:720] - sl.lowess(y_full[:720], x_full[:720], return_sorted=False, frac=0.3)

x_c = np.arange(len(y_full)) / 60
x_t = np.arange(len(x_test)) / 60


# Plot
fig, ax = plt.subplots(1, 1, figsize=(12, 6), dpi=300)

ax.plot(x_c,
    skl_prep.minmax_scale(y_sl[:, None], axis=0),
    "o-",
    label="Timeseries")
ax.plot(
    x_c,
    skl_prep.minmax_scale(y_pr[:, None], axis=0),
    "-",
    lw=2,
    label="93-minute Oscillation",
)

ax.axvspan(0, 9, alpha=0.2, color="C0", label="Traning Set")
ax.axvspan(9, 12, alpha=0.2, color="C1", label="Testing Set")

ax.legend(loc=0, ncol=2, frameon=True, fontsize=15)

ax.set_ylim([-0.1, 1.2])

ax.set_xlabel("Time [hr]", fontsize=16)
ax.set_ylabel("Normalized Cloud Fraction $f_c$", fontsize=16)

ax.tick_params(axis="both", which="major", labelsize=16)
<Figure size 3600x1800 with 1 Axes>

That looks great.

References
  1. Oh, G., & Austin, P. H. (2024). Quantifying the Oscillatory Evolution of Simulated Boundary-Layer Cloud Fields Using Gaussian Process Regression. 10.5194/egusphere-2024-352