This is another straightforward example of deep learning in Keras. Let’s summarize what is going on here:

  • IMDB reviews are bits of text consisting of words (duh!), but all these words are converted to numbers. Moreover, the number assigned to a word reflects how often it occurs: a small number means the word occurs more often than a word with a large number.
  • reviews have different lengths, so everything is padded or truncated to the same length (here 5000); see the data-preparation sketch after this list
  • part of the reviews is set aside for testing (here 33% of the whole)
  • only the 500 most frequent words are kept in the vocabulary
  • the input is hence a sequence of 5000 word indices, which is a large and sparse representation, so an embedding is applied that maps every word index to a dense vector of size 32, turning each review into a 5000×32 matrix. Word embeddings are a topic of their own and you should have a look at the GloVe site for instance.
  • the one-dimensional convolution, which slides its filters along the word positions, is well explained in this article
  • the max-pooling picks the maximum out of each neighborhood and is a way to reduce the dimension
  • flattening a matrix simply puts the rows of the matrix one after another, yielding a single long vector
  • the dense layers are the usual deep learning layers where the weights are trained. Note that we want a yes/no classification, so the last dense layer has dimension one (typically with a sigmoid activation); see the model sketch after this list
  • the whole network has a fairly low accuracy but is trained in a minute, which is altogether not too bad
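
To make these steps concrete, here is a minimal preprocessing sketch using the standard Keras IMDB loader. The vocabulary of 500 words, the common length of 5000 and the 33% test split are the numbers quoted above; the seed and the variable names are just illustrative choices.

```python
import numpy as np
from keras.datasets import imdb
from keras.preprocessing import sequence

top_words = 500    # keep only the 500 most frequent words
max_length = 5000  # pad/truncate every review to this length

# Reviews arrive as sequences of integers; a small index means a frequent word.
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

# Pool everything, shuffle, and re-split so that 33% is used for testing.
X = np.concatenate((X_train, X_test))
y = np.concatenate((y_train, y_test))
idx = np.random.RandomState(7).permutation(len(X))
X, y = X[idx], y[idx]
split = int(len(X) * 0.67)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Shorter reviews are zero-padded, longer ones truncated.
X_train = sequence.pad_sequences(X_train, maxlen=max_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_length)
```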
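Continuing from that snippet, the network itself could look like the sketch below. The embedding/convolution/pooling/flatten/dense stack is the one described above, but the filter count, the kernel size, the 250-unit hidden layer and the training settings are assumptions, not taken verbatim from the code.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential()
# Map each of the 5000 word indices to a dense 32-dim vector: a review becomes a 5000 x 32 matrix.
model.add(Embedding(top_words, 32, input_length=max_length))
# Slide 32 filters of width 3 along the word positions (filter count and width are assumptions).
model.add(Conv1D(32, 3, padding='same', activation='relu'))
# Keep the maximum of each pair of neighbouring positions, halving the length.
model.add(MaxPooling1D(pool_size=2))
# Put the rows of the resulting matrix one after another into a single vector.
model.add(Flatten())
# The usual dense layers; the last one has dimension one for the yes/no decision.
model.add(Dense(250, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=2, batch_size=128)
```

Calling model.summary() is a handy way to check the shape coming out of each stage.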

The beauty of Keras really lies in the ease of adding and changing things. It runs on top of Theano and TensorFlow, so you get a delicious framework with all you could wish for.
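
To illustrate that ease: swapping the convolutional part of the sketch above for a recurrent layer is a matter of replacing three lines (the 100 LSTM units are again an arbitrary choice, not taken from the post).

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_length = 500, 5000  # same settings as before

# The Conv1D/MaxPooling1D/Flatten trio is replaced by a single LSTM layer;
# everything else stays exactly the same.
model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```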