TensorFlow is leading the pack, but MXNet has its virtues. Among other things, the Gluon API and the integration with Mathematica are interesting. While it’s great to have a standard, it would be even better if all frameworks joined in, so one would not have to remember tons of subtle differences in APIs and parameters. As the saying goes, the great thing about standards is that there are so many to choose from.

The code below is an explicit implementation of linear regression with Gluon. The most annoying thing when going back and forth between TensorFlow and MXNet is the NDArray namespace, a close twin of NumPy. Things would have been easier if NumPy had been adopted by both frameworks.
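To illustrate the point, here is a minimal sketch of the kind of near-but-not-quite overlap that makes switching tedious. The NumPy side runs as written; the comments note the NDArray spelling of each call (as used in the Gluon code below) rather than a definitive API mapping:

```python
import numpy as np

# NumPy and NDArray share many names but differ in small ways.
x = np.random.normal(size=(4, 2))        # NDArray: nd.random_normal(shape=(4, 2))
m = x.mean()                             # NDArray: nd.mean(x).asscalar() --
                                         # a scalar must be pulled out explicitly
y = 2 * x[:, 0] - 3.4 * x[:, 1] + 4.2    # the slicing syntax, at least, matches
print(type(m), y.shape)
```

Note the keyword difference (`size=` vs. `shape=`) and the extra `.asscalar()` step: each detail is trivial on its own, but they add up when you maintain code in both frameworks.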

Whether MXNet is the right choice for your business is a long chat on its own. Like all pro-and-con discussions on Stack Overflow, things quickly turn into religious arguments. Personally, I don’t favor any framework but try to find solutions for my customers.

```python
# Linear regression with MXNet
import mxnet as mx
from mxnet import nd, autograd, gluon
import matplotlib.pyplot as plt
%matplotlib inline

data_ctx = mx.cpu()
model_ctx = mx.cpu()

num_inputs = 2
num_outputs = 1
num_examples = 1000

def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

X = nd.random_normal(shape=(num_examples, num_inputs))
noise = 0.01 * nd.random_normal(shape=(num_examples,))
y = real_fn(X) + noise

batch_size = 4
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y),
                                   batch_size=batch_size, shuffle=True)

net = gluon.nn.Dense(1, in_units=2)
print(net.weight)
print(net.bias)
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)
net(nd.array([[0, 1]]))

square_loss = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})

epochs = 150
loss_sequence = []
num_batches = num_examples / batch_size

for e in range(epochs):
    cumulative_loss = 0
    # inner loop over mini-batches
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx)
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
        cumulative_loss += nd.mean(loss).asscalar()
    print("Epoch %s, loss: %s" % (e, cumulative_loss / num_examples))
    loss_sequence.append(cumulative_loss)

plt.figure(num=None, figsize=(8, 6))
plt.plot(loss_sequence)
# Adding some bells and whistles to the plot
plt.grid(True, which="both")
plt.xlabel('epoch', fontsize=14)
plt.ylabel('average loss', fontsize=14)

params = net.collect_params()  # this returns a ParameterDict
print('The type of "params" is a ', type(params))

# A ParameterDict is a dictionary of Parameter class objects;
# therefore, here is how we can read off the parameters from it.
for param in params.values():
    print(param.name, param.data())
```
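Since the target function is known, there is an easy sanity check for what the trained network should converge to: ordinary least squares on the same synthetic data recovers weights close to 2 and -3.4 and a bias close to 4.2. A quick sketch in plain NumPy (the seed and generator are my own choices, not part of the original code):

```python
import numpy as np

# Regenerate the synthetic data from the snippet above, in NumPy.
rng = np.random.default_rng(0)
num_examples = 1000
X = rng.normal(size=(num_examples, 2))
noise = 0.01 * rng.normal(size=(num_examples,))
y = 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2 + noise

# Append a column of ones so the bias is fitted alongside the weights,
# then solve the least-squares problem in closed form.
Xb = np.hstack([X, np.ones((num_examples, 1))])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
print(coef)  # approximately [2, -3.4, 4.2]
```

If the Gluon training loop ends up with parameters far from these values, the learning rate or the number of epochs is the first place to look.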