In this problem set we also present some bonus problems, which will be treated the same way as in Problem Set 1.
In this task, you will solve the problem of semantic segmentation of the attributes of a person's face. You will work with the CelebAMask-HQ dataset. You are free to design and implement the required steps on your own. However, there are several constraints and recommendations:
according to the reported mean Dice score on the provided test split.
In the deep prior framework, we aim to learn a posterior distribution $p(z,\theta|\hat{y})$ over a latent variable $z$ and the parameters $\theta$ of a generative network $f_\theta$ such that, given an observation $\hat{y}$ (e.g., an image of a face), we can sample a plausible latent variable $z$ and generate a corresponding output image $y = f_\theta(z)$ that is similar to the observation. The conditional distribution of $y$ given $\hat{y}$ can be written as an integral over $z$ and $\theta$:
$$p(y|\hat{y}) = \int p(y|z,\theta,\hat{y})\,p(z,\theta|\hat{y})\,dz\,d\theta$$

Unfortunately, in practice, it is difficult to compute this integral exactly, since we don't know the conditional distribution $p(y|z,\theta,\hat{y})$ and the posterior distribution $p(z,\theta|\hat{y})$. Instead, we can use an optimization approach to learn the prior distribution $p(z,\theta)$ and the network $f_\theta(z)$ that best approximate the true conditional distribution.
Specifically, we first assume that the latent variable $z$ is independent of the observation $\hat{y}$, i.e., $p(z,\theta|\hat{y}) = p(\theta|\hat{y})p(z)$. This allows us to rewrite the conditional distribution of $y$ as:
$$p(y|\hat{y}) = \int p(y|z,\theta,\hat{y})\,p(\theta|\hat{y})\,p(z)\,dz\,d\theta$$

Next, we use an optimization approach to learn the parameters $\theta$ of the network to minimize the expected distance between the generated output $y=f_\theta(z)$ and the observation $\hat{y}$:
$$\min_\theta E(f_\theta(z),\hat{y})$$

To perform this optimization, we can apply a stochastic gradient descent approach where we sample random values of $z$ from the prior distribution $p(z)$ and compute the gradients of the loss function with respect to $\theta$ using backpropagation. This way, we can iteratively update the parameters of the network to improve its ability to generate realistic outputs given the observed data.
This approach can be used for denoising, inpainting, and super-resolution. The basic architecture for these tasks is a UNet, which we will use in our experiments.
import torch
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import torchvision
from torchvision import transforms
import gdown
from tqdm import tqdm
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
from torchmetrics import TotalVariation as TV
class Encoder_Block(torch.nn.Module):
def __init__(self,inp_channels,out_channels):
super().__init__()
self.model = torch.nn.Sequential(
torch.nn.Conv2d(inp_channels,out_channels,kernel_size=3,padding=1),
torch.nn.BatchNorm2d(out_channels),
torch.nn.ReLU(),
torch.nn.Conv2d(out_channels,out_channels,kernel_size=3,padding=1),
torch.nn.BatchNorm2d(out_channels),
torch.nn.ReLU(),
)
self.downsample = torch.nn.MaxPool2d(2)
def forward(self,x):
int_out = self.model(x)
return self.downsample(int_out), int_out
class Decoder_Block(torch.nn.Module):
def __init__(self,inp_channels,out_channels):
super().__init__()
self.upsample = torch.nn.ConvTranspose2d(inp_channels,out_channels,kernel_size=2,stride=2)
self.model = torch.nn.Sequential(
torch.nn.Conv2d(inp_channels,out_channels,kernel_size=3,padding=1),
torch.nn.BatchNorm2d(out_channels),
torch.nn.ReLU(),
torch.nn.Conv2d(out_channels,out_channels,kernel_size=3,padding=1),
torch.nn.BatchNorm2d(out_channels),
torch.nn.ReLU(),
)
def forward(self,x,enc_x):
x = self.upsample(x)
x = torch.cat([x,enc_x],dim=1)
return self.model(x)
class Unet(torch.nn.Module):
def __init__(self,inc,outc,hidden_size=16):
super().__init__()
self.Encoder = torch.nn.ModuleList([
Encoder_Block(inc,hidden_size),
Encoder_Block(hidden_size,hidden_size*2),
Encoder_Block(hidden_size*2,hidden_size*4),
Encoder_Block(hidden_size*4,hidden_size*8),
])
self.bottleneck = torch.nn.Sequential(
torch.nn.Conv2d(hidden_size*8,hidden_size*16,kernel_size=1),
torch.nn.BatchNorm2d(hidden_size*16),
torch.nn.ReLU(),
torch.nn.Conv2d(hidden_size*16,hidden_size*16,kernel_size=1),
torch.nn.BatchNorm2d(hidden_size*16),
torch.nn.ReLU()
)
self.Decoder = torch.nn.ModuleList([
Decoder_Block(hidden_size*16,hidden_size*8),
Decoder_Block(hidden_size*8,hidden_size*4),
Decoder_Block(hidden_size*4,hidden_size*2),
Decoder_Block(hidden_size*2,hidden_size*1),
])
self.last_layer = torch.nn.Sequential(
torch.nn.Conv2d(hidden_size,outc,kernel_size=3,padding="same"),
torch.nn.Sigmoid()
)
def forward(self,x):
enc_xs = []
for module in self.Encoder:
x, enc_x= module(x)
enc_xs.append(enc_x)
enc_xs = enc_xs[::-1]
x = self.bottleneck(x)
for i,module in enumerate(self.Decoder):
x = module(x,enc_xs[i])
return self.last_layer(x)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sharing_link = "https://drive.google.com/file/d/1QMZ9_XdFRfj-arUvW_hlG5Mw8vjzTLsU/view?usp=share_link"
gdown.download(url=sharing_link, output="./data.zip", quiet=False, fuzzy=True)
!unzip ./data.zip
transform = transforms.Compose([
transforms.ToTensor(),
])
img = transform(Image.open("./data/denoising/F16_GT.png"))[None].to(device)
noise_strength = 0.1
corrupted_img = (img + torch.randn_like(img)*noise_strength).clamp(0,1)
transforms.ToPILImage()(torchvision.utils.make_grid(torch.cat([corrupted_img,img],dim=0),nrow=2,normalize=True))
Task: Implement an optimization function that takes a model, an input tensor $z$, a corrupted image, the original image, and the number of iterations as arguments.
Steps:
We have provided baseline values for hyperparameters in this and future tasks, but feel free to make adjustments if needed.
def optimization(model,z,corrupted_img,orig_img,iters,criterion=torch.nn.MSELoss(),reg_noise=0.01):
# your code is here
# your code is here
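For reference, here is a minimal sketch of a single optimization step under the scheme described above. The argument names follow the skeleton's signature; the perturbation of $z$ by `reg_noise` is one common deep-prior trick, and the optimizer choice is an assumption, not a requirement.

def dip_step(model, z, corrupted_img, optimizer,
             criterion=torch.nn.MSELoss(), reg_noise=0.01):
    # Illustrative sketch, not the required implementation. `optimizer` is
    # assumed to be, e.g., Adam over model.parameters(); `z` is the fixed
    # random input tensor, perturbed slightly at every step.
    z_perturbed = z + reg_noise * torch.randn_like(z)
    out = model(z_perturbed)              # y = f_theta(z)
    loss = criterion(out, corrupted_img)  # E(f_theta(z), y_hat)
    optimizer.zero_grad()
    loss.backward()                       # gradients w.r.t. theta
    optimizer.step()
    return loss.item()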
One way to prevent the issues observed above is to terminate the training process at an optimal point. However, determining the ideal number of iterations can be challenging. Therefore, your objective is to:
# your code is here: find an appropriate number of iterations
def optimization_modified(model,z,noised_img,orig_img,iters,criterion=torch.nn.MSELoss()):
# duplicate your previous code and stop the optimization according to your stopping criterion
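One possible stopping criterion, sketched below under assumptions: since the noise strength used to corrupt the image is known here, the discrepancy principle suggests stopping once the reconstruction error against the corrupted image reaches the noise floor $\sigma^2$; any further decrease means the network has started fitting the noise itself. The function name and threshold are illustrative, not prescribed.

def reached_noise_floor(loss_value, noise_strength=0.1):
    # Discrepancy principle (a sketch): with additive Gaussian noise of known
    # strength sigma, the MSE against the corrupted image cannot honestly drop
    # below sigma^2 without fitting the noise, so stop at that point.
    return loss_value <= noise_strength ** 2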
Another way to address the loss increasing over iterations is to adapt the optimization procedure itself. In the explanations above we already mentioned that the goal of deep prior is to evaluate the integral:
$$p(y|\hat{y}) = \int p(y|z,\theta,\hat{y})\,p(\theta|\hat{y})\,p(z)\,dz\,d\theta$$

Instead of directly solving the integral, we can use the Markov chain Monte Carlo (MCMC) method to estimate the posterior. This involves generating a sequence of correlated samples from the target distribution, which converge to the true posterior over iterations. However, the MCMC approach can be slow and inefficient; hence, gradient-based optimization with noise can be a more effective solution. This involves minimizing the objective function while injecting noise into the gradient updates:
$$\theta_{i+1} = \theta_{i} + \mathrm{lr}\cdot\Delta_{i}^{\mathrm{standard}} + s\cdot\mathrm{lr}\cdot\epsilon$$

where $\Delta_{i}^{\mathrm{standard}}$ is the standard optimization update, $\mathrm{lr}$ is the learning rate, $\epsilon \sim N(0,1)$, and $s$ is the strength of the added noise.
More details of how and why this works are described in the paper.
def SGLD(model,z,corrupted_img,orig_img,iters,criterion=torch.nn.MSELoss(),reg_noise=0.01):
# your code is here
# your code is here
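The noisy update above can be realized by adding a Gaussian perturbation to the parameters after each standard optimizer step. A minimal sketch, assuming `lr` and `s` are the learning rate and noise strength from the formula:

def sgld_noise_step(model, lr, s):
    # Perturb every parameter by s * lr * eps with eps ~ N(0, 1), matching
    # the update rule theta <- theta + lr * delta + s * lr * eps.
    with torch.no_grad():
        for p in model.parameters():
            p.add_(s * lr * torch.randn_like(p))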
Deep Image Prior is a versatile tool that can also be applied to an important task such as image inpainting. Here the challenge lies in the inability to obtain values for the pixels that are damaged, as defined by the mask. Thus, during training it is essential to apply the mask to the generated image, since the damaged values are unavailable for comparison. Furthermore, to address this task more effectively, you will try a more complex model called AttentionUNet. It is similar to the UNet model in structure, but includes attention blocks in the decoder part.
For the AttentionUNet implementation you only need to implement the Attention layer. Attention should scale the hidden output of an encoder block, which is also an input to the corresponding decoder block, in order to draw more "attention" to particular image parts. Below is the pipeline you should implement:
Input: $skip$ of size [batch,skip_channels,H_skip,W_skip], $x$ of size [batch,x_channels,H_x,W_x]
Output: $skip^*$ of size [batch,skip_channels,H_skip,W_skip]
class Attention(torch.nn.Module):
def __init__(self,skip_channels, x_channels):
super().__init__()
# your code is here
def forward(self,skip,x):
# your code is here
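# For reference, below is a sketch of one common attention-gate design
# (additive attention in the style of Attention U-Net, Oktay et al., 2018).
# It is only one possible realization; the class name and projection sizes
# are illustrative assumptions, not the required solution.
class AttentionGateSketch(torch.nn.Module):
    def __init__(self, skip_channels, x_channels):
        super().__init__()
        # Project both inputs to a common channel count before combining.
        self.W_skip = torch.nn.Conv2d(skip_channels, skip_channels, kernel_size=1)
        self.W_x = torch.nn.Conv2d(x_channels, skip_channels, kernel_size=1)
        self.psi = torch.nn.Conv2d(skip_channels, 1, kernel_size=1)
    def forward(self, skip, x):
        # Bring x to the spatial size of skip, combine additively, and squash
        # to a [0, 1] map that rescales skip; output has the shape of skip.
        x_up = torch.nn.functional.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=False)
        att = torch.sigmoid(self.psi(torch.relu(self.W_skip(skip) + self.W_x(x_up))))
        return skip * att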
class Decoder_Block_With_Attention(torch.nn.Module):
def __init__(self,inp_channels,out_channels):
super().__init__()
self.upsample = torch.nn.ConvTranspose2d(inp_channels,out_channels,kernel_size=2,stride=2)
self.model = torch.nn.Sequential(
torch.nn.Conv2d(inp_channels,out_channels,kernel_size=3,padding=1),
torch.nn.BatchNorm2d(out_channels),
torch.nn.ReLU(),
torch.nn.Conv2d(out_channels,out_channels,kernel_size=3,padding=1),
torch.nn.BatchNorm2d(out_channels),
torch.nn.ReLU(),
)
self.attention = Attention(out_channels,inp_channels)
def forward(self,x,enc_x):
enc_x = self.attention(enc_x,x)
x = self.upsample(x)
x = torch.cat([x,enc_x],dim=1)
return self.model(x)
class AttentionUnet(torch.nn.Module):
def __init__(self,inc,outc,hidden_size=16):
super().__init__()
self.Encoder = torch.nn.ModuleList([
Encoder_Block(inc,hidden_size),
Encoder_Block(hidden_size,hidden_size*2),
Encoder_Block(hidden_size*2,hidden_size*4),
Encoder_Block(hidden_size*4,hidden_size*8),
])
self.bottleneck = torch.nn.Sequential(
torch.nn.Conv2d(hidden_size*8,hidden_size*16,kernel_size=1),
torch.nn.BatchNorm2d(hidden_size*16),
torch.nn.ReLU(),
torch.nn.Conv2d(hidden_size*16,hidden_size*16,kernel_size=1),
torch.nn.BatchNorm2d(hidden_size*16),
torch.nn.ReLU()
)
self.Decoder = torch.nn.ModuleList([
Decoder_Block_With_Attention(hidden_size*16,hidden_size*8),
Decoder_Block_With_Attention(hidden_size*8,hidden_size*4),
Decoder_Block_With_Attention(hidden_size*4,hidden_size*2),
Decoder_Block_With_Attention(hidden_size*2,hidden_size*1),
])
self.last_layer = torch.nn.Sequential(
torch.nn.Conv2d(hidden_size,outc,kernel_size=3,padding="same"),
torch.nn.Sigmoid()
)
def forward(self,x):
enc_xs = []
for module in self.Encoder:
x, enc_x= module(x)
enc_xs.append(enc_x)
enc_xs = enc_xs[::-1]
x = self.bottleneck(x)
for i,module in enumerate(self.Decoder):
x = module(x,enc_xs[i])
return self.last_layer(x)
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(256),
transforms.ToTensor()
])
img = transform(Image.open("./data/inpainting/library.png"))[None].to(device)
img_mask = transform(Image.open("./data/inpainting/library_mask.png"))[None].to(device)
corrupted_img = img * img_mask
transforms.ToPILImage()(torchvision.utils.make_grid(torch.cat([corrupted_img,img],dim=0),nrow=2,normalize=True))
Bonus task (4 pts): try to find optimal hyperparameters (strength of the added noise $s$, learning rate, reg_noise, number of iterations).
def optimization_inpainting(model,z,corrupted_img,mask,orig_img,iters,criterion=torch.nn.MSELoss(),reg_noise=0.03):
#your code is here
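# As explained above, the inpainting loss must be evaluated only where the
# mask marks pixels as known. A minimal sketch of such a masked objective
# (the function name is illustrative):
def masked_mse(out, corrupted_img, mask):
    # Zero out the unknown region in both images before comparing.
    return torch.nn.functional.mse_loss(out * mask, corrupted_img * mask)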
nc = 2
model_without_attention = Unet(nc,3,hidden_size=16).to(device)
z = torch.stack(torch.meshgrid(torch.arange(img.size(2))/img.size(2), torch.arange(img.size(3))/img.size(3), indexing="ij"))[None].to(device)
# your code is here
nc = 2
model_with_attention = AttentionUnet(nc,3,hidden_size=16).to(device)
z = torch.stack(torch.meshgrid(torch.arange(img.size(2))/img.size(2), torch.arange(img.size(3))/img.size(3), indexing="ij"))[None].to(device)
# your code is here
This problem requires uploading two CSV files along with the solution notebook. Please compress these three files into a zip archive and upload it to Canvas.
Natural language generation (NLG) is a well-known research problem concerned with generating textual descriptions of structured data, such as tables, as output. Compared to machine translation, where the goal is to completely convert an input sentence into another language, NLG requires overcoming two different challenges: deciding what to say, by selecting a relevant subset of the input data to describe, and deciding how to say it, by generating text that flows and reads naturally.
In this task you will need to generate table descriptions and titles for the dataset that can be downloaded from the link. Your inference pipeline should receive a `.csv` file and output two strings: the table description `text` and the table title `title`. As the solution to this task you should complete the `submission.csv` and `submission_reranking.csv` files as shown below and report the links to your finetuned checkpoints.
import pandas as pd
data = pd.read_csv('./train.csv', index_col=0)
data.head(5)
| | text | title |
|---|---|---|
| 871923758931292416 | This statistic presents the global revenue of ... | Omnicom Group 's revenue from 2006 to 2019 ( i... |
| 12713542298181105208 | This statistic shows the number of hotel and s... | Number of hotel and similar accommodation esta... |
| 5796511258704617257 | In 2019 , just 2.5 percent of all private wage... | Unemployment rate in the U.S. broadcasting ind... |
| 14629703118053421010 | This statistic displays the benefits of using ... | If a “connected device” had the following... |
| 14801098692472737046 | The statistic shows global gross domestic prod... | Global gross domestic product ( GDP ) at curre... |
sample = pd.read_csv('./data/1056174336234335.csv', index_col=0)
sample
| | Response | Share of respondents |
|---|---|---|
| 0 | Number of employees have already decreased | 20% |
| 1 | Number of employees will definitely decrease | 12% |
| 2 | Number of employees will most probably decrease | 25% |
| 3 | Number of emplyees will not change | 22% |
| 4 | Will hire new employees | 21% |
submission = pd.read_csv('./submission.csv', index_col=0)
submission.head(5)
| | text | title |
|---|---|---|
| 11686934923934967220 | NaN | NaN |
| 1615881324134991229 | NaN | NaN |
| 3501526718627373188 | NaN | NaN |
| 6452964031584956810 | NaN | NaN |
| 12416016809428991249 | NaN | NaN |
Finetune the `t5-base` checkpoint (paper). In order to handle the two types of output, test the usage of prefixes for the T5 model. The following metrics should be reported:
Using the best checkpoint from above, prepare the submission file `submission.csv`, where the index is a table caption from the `data` folder, and report the link to your finetuned checkpoint.
# Your code is here
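A minimal sketch of how task prefixes can steer a single T5 model toward the two required outputs. The prefix strings, the table linearization, and the generation settings here are assumptions, not a prescribed interface.

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("t5-base")

def generate_with_prefix(prefix, table_as_text, max_length=64):
    # Prepend the task prefix so one model can produce either output type.
    inputs = tokenizer(prefix + table_as_text, return_tensors="pt", truncation=True)
    output_ids = t5.generate(**inputs, max_length=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (hypothetical prefixes; `linearized_table` is a string built from the CSV):
# text = generate_with_prefix("describe: ", linearized_table)
# title = generate_with_prefix("title: ", linearized_table)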
Under maximum-likelihood training, an ideal model assigns all probability mass to the reference summary. During inference, however, the model must generate the output conditioned on its own, possibly erroneous, previous steps. This train/inference mismatch can hurt performance, a phenomenon often called exposure bias. One way to address it is to require our model to accurately predict the ranking order of a set of most likely candidates via an additional contrastive loss term:
$$L(x, y) = -LogLikelihood(x, y) + L_{contrastive}(x, y)$$

where
$$ L_{contrastive}(x, y) = \sum_i\sum_{j < i}\max(0, f(s_i(x)) - f(s_j(x)) + \alpha_{ij}) $$

where $\alpha_{ij} = \alpha \cdot (i - j)$ is a margin, $s_i$ and $s_j$ are different candidates (generated by beam search) such that, for the selected ranking function $r$, $r(s_j, y) > r(s_i, y)$, and $f(s)$ is a length-normalized estimated log-probability:
$$ f(s) = \frac{\sum_{t} LogProb(s_t \mid s_{<t}, x)}{|s|}, $$

where $|s|$ is the length of the candidate $s$.
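A minimal sketch of the contrastive term above, assuming the candidates are already sorted best-first by the ranking function $r$ and that `scores` holds the length-normalized log-probabilities $f(s_i)$ in that order:

import torch

def contrastive_loss(scores, alpha=0.01):
    # scores: 1-D tensor; scores[j] belongs to a better candidate than
    # scores[i] whenever j < i. alpha scales the rank-dependent margin.
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i):
            loss = loss + torch.clamp(scores[i] - scores[j] + alpha * (i - j), min=0)
    return loss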
Your task is to fine-tune the model with the reranking-aware loss using BERTScore as the ranking function $r$, perform a hyperparameter search for the margin scaling factor $\alpha$ using BERTScore as the objective, report metrics for the best case (SacreBLEU, ROUGE-L, METEOR, BERTScore), prepare the submission file `submission_reranking.csv`, and report the link to your finetuned checkpoint.
# Your code is here