TensorFlow is leading the pack, but MXNet has its virtues. Among other things, the Gluon standard and the integration with Mathematica are interesting. While it’s great to have a standard, it would be even better if all frameworks joined in, so that one would not have to remember tons of subtle differences in APIs and parameters. As the saying goes, the great thing about standards is that there are so many to choose from.

The code below is an explicit implementation of linear regression with Gluon. The most annoying thing when going back and forth between TensorFlow and MXNet is the NDArray namespace, a close twin of NumPy. Things would have been easier if both frameworks had simply adopted NumPy.
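Because NDArray mirrors NumPy so closely, the same model can be written in plain NumPy to make explicit what the Gluon pieces do. `Dense(1)` computes `X @ w + b`, `L2Loss` is half the squared error, and `trainer.step(batch_size)` applies an SGD update with the gradient divided by the batch size. The sketch below is a NumPy-only illustration, not what MXNet executes internally. It assumes a larger learning rate (0.01) and fewer epochs (20) than the Gluon listing so that it converges quickly, starts the bias at zero, and uses a hypothetical helper name, `train_sgd`.

```python
import numpy as np

# Sketch only (NumPy, not MXNet): mirrors the Gluon linear regression by hand.
# Assumptions of this sketch: lr=0.01 and 20 epochs (the Gluon listing uses
# 0.0001 and 150), the bias starts at 0, and `train_sgd` is a hypothetical name.

num_inputs, num_examples, batch_size = 2, 1000, 4
true_w = np.array([2.0, -3.4])
true_b = 4.2

# Same synthetic data as real_fn(X) + noise in the Gluon listing.
rng = np.random.default_rng(0)
X = rng.normal(size=(num_examples, num_inputs))
y = X @ true_w + true_b + 0.01 * rng.normal(size=num_examples)


def train_sgd(X, y, lr=0.01, epochs=20, batch_size=4, seed=0):
    """Mini-batch SGD on the per-example loss 0.5 * (X @ w + b - y)**2."""
    gen = np.random.default_rng(seed)
    w = gen.normal(size=X.shape[1])  # weights drawn from N(0, 1)
    b = 0.0
    losses = []
    for _ in range(epochs):
        order = gen.permutation(len(X))  # reshuffle every epoch
        total = 0.0
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            err = X[batch] @ w + b - y[batch]  # prediction minus label
            total += np.sum(0.5 * err ** 2)  # same per-example formula as L2Loss
            # Gradient of the summed batch loss divided by the batch size,
            # i.e. the normalization that trainer.step(batch_size) applies.
            w = w - lr * (X[batch].T @ err) / len(batch)
            b = b - lr * err.sum() / len(batch)
        losses.append(total / len(X))  # average loss per example
    return w, b, losses


w, b, losses = train_sgd(X, y)

# Closed-form least squares on [X, 1] as a reference point.
A = np.column_stack([X, np.ones(num_examples)])
coef = np.linalg.lstsq(A, y, rcond=None)[0]

print("SGD:   w =", w, "b =", b)
print("lstsq:", coef)
```

Both estimates should land close to `w = [2, -3.4]` and `b = 4.2`. Comparing against the closed-form fit is a cheap way to check an SGD training loop before blaming the framework.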

Whether MXNet is the right choice for your business is a long discussion in itself. Like all pro/contra discussions on Stack Overflow, things quickly turn into religious arguments. Personally, I don’t favor any framework but try to find solutions (for my customer).


# Linear regression with MXNet

import mxnet as mx
from mxnet import nd, autograd, gluon
import matplotlib.pyplot as plt
%matplotlib inline

data_ctx = mx.cpu()
model_ctx = mx.cpu()

num_inputs = 2
num_outputs = 1
num_examples = 1000

def real_fn(X):
    return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2

X = nd.random_normal(shape=(num_examples, num_inputs))
noise = 0.01 * nd.random_normal(shape=(num_examples,))
y = real_fn(X) + noise

batch_size = 4
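# Wrap the arrays in a dataset and iterate over shuffled mini-batches
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y), batch_size=batch_size, shuffle=True)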

net = gluon.nn.Dense(1, in_units=2)


net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)


square_loss = gluon.loss.L2Loss()

trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})

epochs = 150
loss_sequence = []
num_batches = num_examples / batch_size

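for e in range(epochs):
    cumulative_loss = 0
    # inner loop over the mini-batches
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(model_ctx)
        label = label.as_in_context(model_ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()              # compute the gradients
        trainer.step(batch_size)     # SGD update, gradients normalized by 1/batch_size
        cumulative_loss += nd.sum(loss).asscalar()
    print("Epoch %s, loss: %s" % (e, cumulative_loss / num_examples))
    loss_sequence.append(cumulative_loss / num_examples)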

plt.figure(num=None, figsize=(8, 6))
plt.plot(loss_sequence)

# Adding some bells and whistles to the plot
plt.grid(True, which="both")
plt.xlabel('epoch', fontsize=14)
plt.ylabel('average loss', fontsize=14)

params = net.collect_params() # this returns a ParameterDict

print('The type of "params" is a ',type(params))

# A ParameterDict is a dictionary of Parameter class objects
# therefore, here is how we can read off the parameters from it.

for param in params.values():