Adversarial Video Generation, machine and deep learning

Spring 2019-2020

Not specified

Topic description

In this laboratory work, we study a machine learning problem, video frame prediction,
through an experiment that explores the paper “Deep multi-scale video prediction beyond
mean square error”. The algorithm in this paper uses a multi-scale network and
generative adversarial training, and it introduces different loss functions, which are
used to evaluate our results.

The fundamental idea of AI came from the Dartmouth Summer Research Project on
Artificial Intelligence in 1956. Since then, many AI-related publications and research
results have appeared. Over the last 10 years, the AI sector has flourished with new
products and research, not only because computers have improved but also because the
conditions for processing and storing big data have become better.
In 2011, Microsoft introduced deep learning technology in a commercial speech recognition
product, and in 2012 members of the Stanford AI lab introduced software that identifies
objects with almost twice the accuracy of its competitor. Since then, many types of
AI-based models have been introduced and built into the tools we use every day. Another
important innovation is video frame prediction, in other words, the prediction of
future actions based on training video data.
Object recognition and future action prediction are fundamental problems in computer
vision, and predicting the outcome of physical interactions is a critical challenge for
devices and applications such as robots, autonomous cars, and drones. It has become
possible to use an artificial intelligence model that learns to predict physical
actions from unlabeled raw video data. Learning to predict physical movements is
challenging because physical interactions tend to be complex and stochastic, and
learning from raw video requires handling the high dimensionality of image pixels and
the partial observability of object motion in videos. Several video frame prediction
studies use different neural networks, which are computational models that work
similarly to the human nervous system. We use several kinds of artificial neural
networks with different purposes. These networks are built from mathematical operations
and a set of parameters that determine the output.
The model in the paper we studied combines two components: a multi-scale network and
generative adversarial training, the latter adopted from
“Generative adversarial networks. NIPS 2014”.
The paper builds on a convolutional network with rectified linear units (ReLU).
Convolutional networks are the networks most commonly used to analyze visual imagery,
but they have several weaknesses. The paper addresses these weaknesses by combining
the convolutional network with multi-scale networks.

Convolutional network: consists of one or more convolutional layers. During training
and testing, each input image passes through these convolutional layers with filters
(kernels) and pooling, as well as fully connected layers and a softmax function that
classifies the input image. The first layer in a convolutional network is always a
convolutional layer. The computer sees the input image as an array of pixel values, and
in this layer features are extracted using a kernel matrix (filter) whose depth has to
be the same as the depth of the input. Due to the limited size of the kernels,
convolutions can only capture short-range dependencies. In the model under study, this
issue is solved using a multi-scale network.
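To illustrate why a limited kernel size restricts convolutions to short-range
dependencies, the following minimal sketch computes the receptive field of a stack of
convolutional layers, that is, how many input pixels along one axis can influence a
single output value. The function name and the layer configuration are assumptions
chosen for illustration and are not taken from the studied paper.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in input pixels, along one axis) of stacked conv layers."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    field = 1   # a single output value initially "sees" one input pixel
    jump = 1    # distance in input pixels between neighbouring outputs
    for kernel, stride in zip(kernel_sizes, strides):
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Hypothetical example: four 3x3 convolutions with stride 1.
print(receptive_field([3, 3, 3, 3]))  # 9
```

With these assumed layers, each output pixel depends on a window only 9 pixels wide,
so motion over larger distances is hard to capture at a single resolution.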
Multi-scale network: this kind of network makes a series of predictions from the input
at several scales, from coarse to fine.
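The coarse-to-fine idea can be sketched as follows, assuming a scheme in which the
prediction made at a coarse scale is upsampled and then refined at the next finer
scale by adding a residual. The helper names and the `toy_generator` function are
hypothetical stand-ins, not the trained networks of the studied paper, and frames are
plain 2D lists of grayscale values to keep the sketch self-contained.

```python
def downsample(image, factor):
    """Average-pool a 2D image (list of rows) by an integer factor."""
    height, width = len(image) // factor, len(image[0]) // factor
    area = factor * factor
    return [
        [
            sum(image[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / area
            for x in range(width)
        ]
        for y in range(height)
    ]


def upsample(image, factor=2):
    """Nearest-neighbour upsampling of a 2D image by an integer factor."""
    return [
        [image[y // factor][x // factor] for x in range(len(image[0]) * factor)]
        for y in range(len(image) * factor)
    ]


def toy_generator(frame, coarse_guess):
    """Hypothetical stand-in for a trained per-scale network.

    It returns a residual that moves the coarse guess halfway towards the
    input frame; a real model would be a learned convolutional network.
    """
    return [
        [0.5 * (f - g) for f, g in zip(frame_row, guess_row)]
        for frame_row, guess_row in zip(frame, coarse_guess)
    ]


def multi_scale_predict(last_frame, factors=(4, 2, 1)):
    """Predict a frame coarse-to-fine; each factor must be half the previous."""
    prediction = None
    for factor in factors:
        frame_k = downsample(last_frame, factor)
        if prediction is None:
            # Coarsest scale: start from an all-zero guess.
            guess = [[0.0] * len(frame_k[0]) for _ in frame_k]
        else:
            # Finer scales: start from the upsampled coarser prediction.
            guess = upsample(prediction, 2)
        residual = toy_generator(frame_k, guess)
        prediction = [
            [g + r for g, r in zip(guess_row, residual_row)]
            for guess_row, residual_row in zip(guess, residual)
        ]
    return prediction


frame = [[1.0] * 8 for _ in range(8)]
predicted = multi_scale_predict(frame)
print(len(predicted), len(predicted[0]))  # 8 8
```

Because the coarsest scale works on a strongly downsampled frame, a small kernel there
covers a large region of the original image, which is how this scheme compensates for
the short range of a single convolution.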
Adversarial training: generative models are estimated via an adversarial process in
which two models are trained simultaneously: a generative model G that captures the
data distribution, and a discriminative model D that estimates the probability that a
sample came from the training data rather than from G. The training procedure for G is
to maximize the probability of D making a mistake [6].
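The two objectives can be written down as a minimal sketch using binary cross-entropy
on the output of D. The function names are assumptions for illustration, and the
generator loss shown is the commonly used "non-saturating" variant (G is rewarded
when D labels its samples as real), which may differ in detail from the loss used in
the experiment.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy between a predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """D should output 1 for training samples and 0 for samples from G."""
    return bce(d_real, 1) + bce(d_fake, 0)


def generator_loss(d_fake):
    """G is rewarded when D mistakes its samples for training data."""
    return bce(d_fake, 1)


# d_real / d_fake are D's probabilities for a real and a generated sample.
print(discriminator_loss(d_real=0.9, d_fake=0.1))
print(generator_loss(d_fake=0.1))
```

In training, the two losses are minimized in alternation: one update of D with G
fixed, then one update of G with D fixed.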
Image gradient difference loss (GDL): another strategy to sharpen the image
prediction is to directly penalize the differences between the image gradients of the
prediction and those of the target frame in the generative loss function.
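A minimal sketch of such a loss, assuming grayscale frames stored as 2D lists and
image gradients approximated by absolute differences between neighbouring pixels. The
function name and the exponent parameter `alpha` are illustrative choices; the exact
form used in the experiment may differ.

```python
def gdl_loss(target, predicted, alpha=1):
    """Sum over pixels of |gradient(target) - gradient(predicted)| ** alpha."""
    height, width = len(target), len(target[0])
    loss = 0.0
    for i in range(height):
        for j in range(width):
            if i > 0:  # vertical neighbour difference
                grad_t = abs(target[i][j] - target[i - 1][j])
                grad_p = abs(predicted[i][j] - predicted[i - 1][j])
                loss += abs(grad_t - grad_p) ** alpha
            if j > 0:  # horizontal neighbour difference
                grad_t = abs(target[i][j] - target[i][j - 1])
                grad_p = abs(predicted[i][j] - predicted[i][j - 1])
                loss += abs(grad_t - grad_p) ** alpha
    return loss


sharp = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
blurry = [[0.0, 0.25, 0.75, 1.0],
          [0.0, 0.25, 0.75, 1.0]]
print(gdl_loss(sharp, blurry))  # 2.0
print(gdl_loss(sharp, sharp))   # 0.0
```

The blurry prediction smears the edge over several pixels, so its gradients differ
from those of the sharp target and the loss is positive, while a prediction with the
same edges scores zero.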
Results and Analysis
We ran the experiments on the main dataset used in [1] and also tested on a new
dataset [3].
For training and testing, we took 100 video samples each from the Atari game dataset;
each video consists of 1200-2500 frames (160x210). We trained our model on 20k samples
and tested it every 5k steps.
[Figure: input frames]

We tested on two Atari game datasets, but the predictions show little movement because
of the frame granularity of the training data. In other words, an object's movement is
spread over too many frames, so a single generated frame cannot show much predicted
motion. Therefore, we also tried predicting 10 frames.
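One common way to predict several frames with a one-frame predictor is to feed each
generated frame back in as input. The sketch below assumes this recursive scheme; the
source does not state how the 10-frame test was carried out, and `predict_next` is a
hypothetical stand-in for the trained model (here frames are plain numbers so the
example stays self-contained).

```python
def rollout(history, predict_next, num_steps=10):
    """Predict num_steps frames by feeding each prediction back as input."""
    window = list(history)
    outputs = []
    for _ in range(num_steps):
        frame = predict_next(window)
        outputs.append(frame)
        window = window[1:] + [frame]  # drop the oldest frame, append the newest
    return outputs


def linear_extrapolation(window):
    """Toy predictor: continue the trend of the last two frames."""
    return 2 * window[-1] - window[-2]


print(rollout([0, 1, 2, 3], linear_extrapolation, num_steps=10))
# [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
```

With a learned model, errors made in early frames are fed back into later steps, so
prediction quality typically degrades as the horizon grows.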

[2] Deep Multi-Scale Video Prediction Beyond Mean Square Error by Michael Mathieu, Camille Couprie, Yann LeCun
[3] https://github.com/yobibyte/atarigrandchallenge.git






  • machine learning, deep learning, GANs

Maximum group size: 3 people