LSTM Variational AutoEncoder (LSTM-Sequence-VAE)
A PyTorch Implementation of Generating Sentences from a Continuous Space by Bowman et al. 2015.
Introduction
This is a PyTorch implementation of Generating Sentences from a Continuous Space by Bowman et al. (2015), in which an LSTM-based VAE is trained on the Penn Treebank dataset.
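For reference, the training objective of a sentence VAE of this kind is the evidence lower bound (ELBO). The notation below is the standard one from the VAE literature and is not taken from this repository's code: x = (x_1, ..., x_T) is a sentence, z its latent code, q_φ(z | x) the distribution produced by the LSTM encoder and p_θ(x | z) the LSTM decoder, which generates the sentence word by word.

```latex
\begin{aligned}
\log p_\theta(x) &\ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big), \\
p(z) &= \mathcal{N}(0, I), \qquad
p_\theta(x \mid z) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}, z).
\end{aligned}
```

The first term rewards accurate reconstruction of the sentence; the second keeps the encoder's distribution close to the standard Gaussian prior, which is what makes sampling z from N(0, I) at inference time meaningful.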
Setup
The code uses pipenv as a virtual environment and package manager. To run the code, you only need to install the necessary dependencies. Open a terminal and type:
git clone https://github.com/Khamies/Sequence-VAE.git
cd Sequence-VAE
pipenv install
You should then be ready to play with the code and build upon it!
Run the code
- To train the model, run:
python main.py
- To train the model with specific arguments, run:
python main.py --batch_size=64
The following command-line arguments are available:
- Batch size: --batch_size
- bptt: --bptt
- Learning rate: --lr
- Embedding size: --embed_size
- Hidden size: --hidden_size
- Latent size: --latent_size
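As an illustration of how flags written in the `--name=value` form are parsed, here is a minimal argparse sketch that mirrors the argument names listed above. It is hypothetical: the argument types and help strings are assumptions, and the real parser in main.py may use different types and its own default values (unset flags are simply None here).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; types are assumed,
    # and no defaults are set, so omitted flags come back as None.
    parser = argparse.ArgumentParser(description="LSTM-VAE training (sketch)")
    parser.add_argument("--batch_size", type=int, help="sentences per batch")
    parser.add_argument("--bptt", type=int, help="bptt value (assumed: sequence length)")
    parser.add_argument("--lr", type=float, help="learning rate")
    parser.add_argument("--embed_size", type=int, help="word embedding size")
    parser.add_argument("--hidden_size", type=int, help="LSTM hidden size")
    parser.add_argument("--latent_size", type=int, help="size of the latent code z")
    return parser


# Usage: both "--flag=value" and "--flag value" forms are accepted by argparse.
args = build_parser().parse_args(["--batch_size=64"])
print(args.batch_size, args.lr)  # prints "64 None"
```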
Training
The model is trained for 30 epochs using Adam as the optimizer with a learning rate of 0.001. Here are the results from training the LSTM-VAE model:
- KL loss
- Reconstruction loss
- KL loss vs. reconstruction loss
- ELBO loss
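The three curves are related: the ELBO loss is the negative ELBO, i.e. the reconstruction loss plus the KL loss. The sketch below computes these quantities for a single sentence. It is written in plain Python rather than PyTorch so it runs without extra dependencies, and it assumes a diagonal Gaussian encoder that outputs a mean `mu` and log-variance `logvar`, which is the usual parameterization but is not confirmed by this README. It also omits the KL cost annealing described by Bowman et al., since the README does not say whether it is used here.

```python
import math
from typing import Sequence


def reconstruction_loss(token_probs: Sequence[Sequence[float]],
                        targets: Sequence[int]) -> float:
    """Summed negative log-likelihood of the target words.

    token_probs[t] is the decoder's softmax distribution over the vocabulary
    at step t; targets[t] is the vocabulary index of the true word at step t.
    """
    return -sum(math.log(probs[y]) for probs, y in zip(token_probs, targets))


def kl_loss(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def elbo_loss(token_probs, targets, mu, logvar) -> float:
    """Negative ELBO: reconstruction loss + KL loss (no annealing weight)."""
    return reconstruction_loss(token_probs, targets) + kl_loss(mu, logvar)


# Usage: a one-word "sentence" predicted with probability 0.5 and a 1-D latent
# whose posterior is N(1, 1) gives log(2) + 0.5.
print(elbo_loss([[0.5, 0.5]], [0], [1.0], [0.0]))
```

When the KL term collapses to zero, the encoder's distribution equals the prior and the decoder ignores z; comparing the KL and reconstruction curves is a quick way to check for this.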
Inference
1. Sample Generation
Here are samples generated by the model. We randomly sampled two latent codes z from a standard Gaussian distribution, used “like” as the start of the sentence (sos), and fed them to the decoder. The generated sentences are:
- like other countries such as alex powers a former wang marketer
- like design and artists have been by how many
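The generation procedure above can be sketched as follows: draw z from the standard Gaussian prior, then decode word by word starting from the given first word until an end-of-sentence token or a length limit. This is a framework-free sketch, not the repository's code. The `step` function is a hypothetical stand-in for one LSTM decoder step (how z enters the LSTM is hidden inside it), the `<eos>` token name is assumed, and greedy (argmax) decoding is one choice; the repository may sample from the softmax instead.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical decoder interface: (z, previous word, recurrent state)
# -> (score for each candidate next word, new recurrent state).
DecoderStep = Callable[[Sequence[float], str, object], Tuple[Dict[str, float], object]]


def sample_prior(latent_size: int, rng: random.Random) -> List[float]:
    """Draw a latent code z ~ N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_size)]


def greedy_decode(step: DecoderStep, z: Sequence[float], start_word: str,
                  eos: str = "<eos>", max_len: int = 20) -> List[str]:
    """Decode a sentence beginning with start_word, always taking the best word."""
    words = [start_word]
    prev, state = start_word, None
    for _ in range(max_len):
        scores, state = step(z, prev, state)
        prev = max(scores, key=scores.get)  # highest-scoring next word
        if prev == eos:
            break
        words.append(prev)
    return words


# Usage with a toy decoder that follows a fixed word chain and ignores z.
def toy_step(z, prev, state):
    chain = {"like": "other", "other": "countries"}
    return {chain.get(prev, "<eos>"): 1.0, "filler": 0.0}, state


z = sample_prior(latent_size=16, rng=random.Random(0))
print(" ".join(greedy_decode(toy_step, z, "like")))  # prints "like other countries"
```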
2. Interpolation
The word “President” was used as the start of the sentences. We randomly generated two sentences and interpolated between them.
- Sentence 1: President bush veto power changes meant to be a great number
- Sentence 2: President bush veto power opposed to the president of the house
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president of the house.
bush veto power opposed to the president ' s council.
bush veto power opposed to the president ' s council.
bush veto power opposed to the president ' s council.
bush veto power opposed to the president ' s council.
bush veto power opposed to the president ' s council.
bush veto power that kind of <unk> of natural gas.
bush veto power changes to keep the <unk> and that.
bush veto power changes to keep the <unk> and that.
bush veto power changes that is in a telephone to.
bush veto power changes that is in a telephone to.
bush veto power changes meant to be a great number.
bush veto power changes meant to be a great number.
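Assuming straight-line (linear) interpolation between the two latent codes, the points that get decoded can be computed as below; the listing above shows 20 decoded sentences including both endpoints. This is a plain-Python sketch and the function name is hypothetical. Each point would then be decoded with “President” as the start word, so neighbouring points that fall in the same region of the latent space plausibly decode to the same sentence, which would explain the repeated lines.

```python
from typing import List, Sequence


def interpolate(z1: Sequence[float], z2: Sequence[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points from z1 to z2, both endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z1) != len(z2):
        raise ValueError("latent codes must have the same size")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0
        points.append([(1.0 - t) * a + t * b for a, b in zip(z1, z2)])
    return points


# Usage: three points between two 2-D codes.
print(interpolate([0.0, 0.0], [1.0, 2.0], 3))  # prints [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
```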
Play with the model
To play with the model, a Jupyter notebook is provided; you can find it here.
Citation
@misc{Khamies2021SequenceVAE,
author = {Khamies, Waleed},
title = {PyTorch Implementation of Generating Sentences from a Continuous Space by Bowman et al. 2015},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Khamies/Sequence-VAE}},
}
Acknowledgement
- This work has been inspired by Sentence-VAE, whose data preprocessing pipeline is used here.