Training and evaluating neural networks flexibly and transparently

Priyansi & Victor & Taras



Priyansi @PriyansiCS Undergrad working on revamping PyTorch-Ignite's docs and managing the community
Victor @vfdev-5Software Engineer at Quansight working on AI-related open source projects
Taras @trsvchnOpenSource Enthusiast with MS degree in Biology


  1. What is PyTorch ?
  2. What is PyTorch-Ignite ?
  3. Quick-start PyTorch-Ignite example
  4. Convert PyTorch to PyTorch+Ignite
  5. About the PyTorch-Ignite project

PyTorch in a nutshell

import torch
import torch.nn as nn

device = "cuda"

class MyNN(nn.Module):
    def __init__(self):
        super(MyNN, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.Linear(512, 512),
            nn.Linear(512, 10)

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = MyNN().to(device)
  • tensor manipulations (device: CPUs, GPUs, TPUs)
  • NN components, optimizers, loss functions
  • Distributed computations
  • Profiling
  • other cool features …
  • Domain libraries: vision, text, audio
  • Rich ecosystem

Quick-start ML with PyTorch

Computer Vision example with Fashion MNIST

Problem: 1 - how to classify images ?

model(image) -> predicted label

2 - How measure model performances ?

predicted labels vs correct labels

Quick-start ML with PyTorch

  • Setup training and testing data
from import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose

# Setup training/test data
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, transform=ToTensor())

batch_size = 64

# Create data loaders
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# Optionally, for debugging:
for X, y in test_dataloader:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)

# Output:
# Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
# Shape of y:  torch.Size([64]) torch.int64

Quick-start ML with PyTorch

  • Create a model
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.Linear(512, 512),
            nn.Linear(512, 10)

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)

Quick-start ML with PyTorch

  • Model training
    • Loss function: cross-entropy
    • Optimization with Stochastic Gradient Descent
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train(dataloader, model, loss_fn, optimizer):
    for X, y in dataloader:
        X, y =,
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation

def test(dataloader, model, loss_fn):
    # code to compute and print average loss and accuracy

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)

Why using PyTorch without Ignite is suboptimal ?

For NN training and evaluation:

  • PyTorch gives only “low”-level building components
  • Common bricks to code in any user project:
    • metrics
    • checkpointing, best model saving, early stopping, …
    • logging to experiment tracking systems
    • code adaptation for device (e.g. GPU, XLA)
  • Pure PyTorch code

model = Net()
train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.8)
criterion = torch.nn.NLLLoss()

max_epochs = 10
validate_every = 100
checkpoint_every = 100

def validate(model, val_loader):
    model = model.eval()
    num_correct = 0
    num_examples = 0
    for batch in val_loader:
        input, target = batch
        output = model(input)
        correct = torch.eq(torch.round(output).type(target.type()), target).view(-1)
        num_correct += torch.sum(correct).item()
        num_examples += correct.shape[0]
    return num_correct / num_examples

def checkpoint(model, optimizer, checkpoint_dir):
    # ...

def save_best_model(model, current_accuracy, best_accuracy):
    # ...

iteration = 0
best_accuracy = 0.0

for epoch in range(max_epochs):
    for batch in train_loader:
        model = model.train()
        input, target = batch
        output = model(input)
        loss = criterion(output, target)

        if iteration % validate_every == 0:
            binary_accuracy = validate(model, val_loader)
            print("After {} iterations, binary accuracy = {:.2f}"
                  .format(iteration, binary_accuracy))
            save_best_model(model, binary_accuracy, best_accuracy)

        if iteration % checkpoint_every == 0:
            checkpoint(model, optimizer, checkpoint_dir)
        iteration += 1

PyTorch-Ignite: what and why? πŸ€”

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

def train_step(engine, batch):
  #  ... any training logic ...
  return batch_loss

trainer = Engine(train_step)

# Compose your pipeline ..., max_epochs=100)

metrics = {
  "precision": Precision(),
  "recall": Recall()

evaluator = create_supervised_evaluator(

def run_evaluation():

handler = ModelCheckpoint(
  '/tmp/models', 'checkpoint'
  {'model': model}

Key concepts in a nutshell

PyTorch-Ignite is about:

  1. Engine and Event System
  2. Out-of-the-box metrics to easily evaluate models
  3. Built-in handlers to compose training pipeline
  4. Distributed Training support

What makes PyTorch-Ignite unique ?

  • Composable and interoperable components
  • Simple and understandable code
  • Open-source community involvement

How PyTorch-Ignite makes user’s live easier ?

With PyTorch-Ignite:

  • Less code than pure PyTorch while ensuring maximum control and simplicity
  • Easily get more refactored and structured code
  • Extensible API for metrics, experiment managers, and other components
  • Same code for non-distributed and distributed configs

Quick-Start Example πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

Let’s train a MNIST classifier with PyTorch-Ignite!

⬇️ Installation ⬇️

Install PyTorch and TorchVision

$ pip install torch torchvision

Install PyTorch-Ignite

via pip πŸ“¦

$ pip install pytorch-ignite

or conda 🐍

$ conda install ignite -c pytorch

πŸ“¦ Imports πŸ“¦

import torch
from torch import nn
from import DataLoader
from torchvision.datasets import MNIST
from torchvision.models import resnet18
from torchvision.transforms import Compose, Normalize, ToTensor

from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
from ignite.handlers import ModelCheckpoint
from ignite.contrib.handlers import TensorboardLogger

Start with a PyTorch code

Set up the dataflow, define a model (adapted ResNet18), a loss and an optimizer.
data_transform = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])

train_dataset = MNIST(download=True, root=".", transform=data_transform, train=True)
val_dataset = MNIST(download=True, root=".", transform=data_transform, train=False)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=256, shuffle=False)

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.model = resnet18(num_classes=10)
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.model(x)

device = "cuda"
model = Net().to(device)

optimizer = torch.optim.RMSprop(model.parameters(), lr=0.005)
criterion = nn.CrossEntropyLoss()

It’s time for PyTorch-Ignite! πŸ”₯

trainer = create_supervised_trainer(model, optimizer, criterion, device)

val_metrics = {
    "accuracy": Accuracy(),
    "loss": Loss(criterion)

evaluator = create_supervised_evaluator(model, metrics=val_metrics, device=device)
  • trainer engine to train the model
  • evaluator engine to compute metrics on validation set + save the best models

Add handlers for logging the progress

def log_training_loss(engine):
    print(f"Epoch[{engine.state.epoch}], Iter[{engine.state.iteration}] Loss: {engine.state.output:.2f}")

def log_validation_results(trainer):
    metrics = evaluator.state.metrics
    print(f"Validation Results - Epoch[{trainer.state.epoch}] "
          f"Avg accuracy: {metrics['accuracy']:.2f} "
          f"Avg loss: {metrics['loss']:.2f}")

Add ModelCheckpoint handler with accuracy as a score function

model_checkpoint = ModelCheckpoint(
    score_function=lambda e: e.state.metrics["accuracy"],

evaluator.add_event_handler(Events.COMPLETED, model_checkpoint, {"model": model})

Add Tensorboard Logger

tb_logger = TensorboardLogger(log_dir="tb-logger")

    output_transform=lambda loss: {"batch_loss": loss},


πŸš€Liftoff!πŸš€, max_epochs=5)
Epoch[1], Iter[100] Loss: 0.19
Epoch[1], Iter[200] Loss: 0.13
Epoch[1], Iter[300] Loss: 0.08
Epoch[1], Iter[400] Loss: 0.11
Training Results - Epoch[1] Avg accuracy: 0.97 Avg loss: 0.09
Validation Results - Epoch[1] Avg accuracy: 0.97 Avg loss: 0.08
Epoch[5], Iter[1900] Loss: 0.02
Epoch[5], Iter[2000] Loss: 0.11
Epoch[5], Iter[2100] Loss: 0.05
Epoch[5], Iter[2200] Loss: 0.02
Epoch[5], Iter[2300] Loss: 0.01
Training Results - Epoch[5] Avg accuracy: 0.99 Avg loss: 0.02
Validation Results - Epoch[5] Avg accuracy: 0.99 Avg loss: 0.03

Inspect results in Tensorboard

Complete code

PyTorch-Ignite Code-Generator

  • What is Code-Generator?: web app to quickly produce quick-start python code for common training tasks in deep learning.

  • Why to use Code-Generator?: start working on a task without rewriting everything from scratch.

Any questions before we go on ?

The Big Picture

Engine and Event System

  • Engine

    • Loops on user data
    • Applies an arbitrary user function on batches
  • Event system

    • Customizable event collections
    • Triggers handlers attached to events
In its simpliest form:
while epoch < max_epochs:

    for batch in data:
        output = train_step(batch)


Simplified training and validation loop

No more coding for/while loops on epochs and iterations. Users instantiate engines and run them.

from ignite.engine import Engine, Events, create_supervised_evaluator
from ignite.metrics import Accuracy

# Setup training engine:
def train_step(engine, batch):
    # Users can do whatever they need on a single iteration
    # Eg. forward/backward pass for any number of models, optimizers, etc.
    # ...

trainer = Engine(train_step)

# Setup single model evaluation engine
evaluator = create_supervised_evaluator(model, metrics={"accuracy": Accuracy()})

def validation():
    state =
    # print computed metrics
    print(trainer.state.epoch, state.metrics)

# Run model's validation at the end of each epoch
trainer.add_event_handler(Events.EPOCH_COMPLETED, validation)

# Start the training, max_epochs=100)

Power of Events & Handlers πŸš€

1. Execute any number of functions whenever you wish

Handlers can be any function: e.g. lambda, simple function, class method, etc.

trainer.add_event_handler(Events.STARTED, lambda _: print("Start training"))

# attach handler with args, kwargs
mydata = [1, 2, 3, 4]
logger = ...

def on_training_ended(data):
    print(f"Training is ended. mydata={data}")
    # User can use variables from another scope"Training is ended")

trainer.add_event_handler(Events.COMPLETED, on_training_ended, mydata)
# call any number of functions on a single event
trainer.add_event_handler(Events.COMPLETED, lambda engine: print(engine.state.times))

def log_something(engine):

Power of Events & Handlers

2. Built-in events filtering and stacking

# run the validation every 5 epochs
def run_validation():
    # run validation

@trainer.on(Events.COMPLETED | Events.EPOCH_COMPLETED(every=10))
def run_another_validation():
    # ...

# change some training variable once on 20th epoch
def change_training_variable():
    # ...

# Trigger handler with customly defined frequency
def log_gradients():
    # ...

Power of Events & Handlers

3. Custom events to go beyond standard events

from ignite.engine import EventEnum

# Define custom events
class BackpropEvents(EventEnum):
    BACKWARD_STARTED = 'backward_started'
    BACKWARD_COMPLETED = 'backward_completed'
    OPTIM_STEP_COMPLETED = 'optim_step_completed'

def train_step(engine, batch):
    # ...
    loss = criterion(y_pred, y)
    # ...

trainer = Engine(train_step)

def function_before_backprop(engine):
    # ...

Out-of-the-box metrics πŸ“ˆ

50+ distributed ready out-of-the-box metrics to easily evaluate models.

  • Dedicated to many Deep Learning tasks
  • Easily composable to assemble a custom metric
  • Easily extendable to create custom metrics
precision = Precision(average=False)
recall = Recall(average=False)
F1_per_class = (precision * recall * 2 / (precision + recall))
F1_mean = F1_per_class.mean()  # torch mean method
F1_mean.attach(engine, "F1")

Built-in Handlers

  • Logging to experiment tracking systems
  • Checkpointing,
  • Early stopping
  • Profiling
  • Parameter scheduling
  • etc.
# model checkpoint handler
checkpoint = ModelCheckpoint('/tmp/ckpts', 'training')
trainer.add_event_handler(Events.EPOCH_COMPLETED(every=2), handler, {'model': model})

# early stopping handler
def score_function(engine):
    val_loss = engine.state.metrics['acc']
    return val_loss
es = EarlyStopping(3, score_function, trainer)
evaluator.add_event_handler(Events.COMPLETED, handler)

# Piecewise linear parameter scheduler
scheduler = PiecewiseLinear(optimizer, 'lr', [(10, 0.5), (20, 0.45), (21, 0.3), (30, 0.1), (40, 0.1)])
trainer.add_event_handler(Events.ITERATION_STARTED, scheduler)

# TensorBoard logger: batch loss, metrics
tb_logger = TensorboardLogger(log_dir="tb-logger")
    trainer, event_name=Events.ITERATION_COMPLETED(every=100), tag="training",
    output_transform=lambda loss: {"batch_loss": loss},

    evaluator, event_name=Events.EPOCH_COMPLETED,
    tag="training", metric_names="all",

Distributed Training support

Run the same code across all supported backends seamlessly

  • Backends from native torch distributed configuration: nccl, gloo, mpi
  • Horovod framework with gloo or nccl communication backend
  • XLA on TPUs via pytorch/xla
import ignite.distributed as idist

def training(local_rank, *args, **kwargs):
    dataloder_train = idist.auto_dataloder(dataset, ...)

    model = ...
    model = idist.auto_model(model)

    optimizer = ...
    optimizer = idist.auto_optimizer(optimizer)

backend = 'nccl'  # or 'gloo', 'horovod', 'xla-tpu' or None
with idist.Parallel(backend) as parallel:

Distributed Training support

Distributed launchers

Handle distributed launchers with the same code

  • torch.multiprocessing.spawn
  • torch.distributed.launch
  • horovodrun
  • slurm

Distributed Training support

Unified Distributed API

  • High-level helper methods

    • idist.auto_model()
    • idist.auto_optim()
    • idist.auto_dataloader()
  • Collective operations

    • all_reduce, all_gather, and more

Any questions before we go on ?

πŸ”₯ Convert PyTorch to Ignite ❀️‍πŸ”₯

How to translate pure PyTorch code to PyTorch+Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Native PyTorch β†’ PyTorch + Ignite

Any questions before we go on ?

About “PyTorch-Ignite” project

Community-driven open source and NumFOCUS Affiliated Project

maintained by volunteers in the PyTorch community:

@vfdev-5, @ydcjeff, @KickItLikeShika, @sdesrozis, @alykhantejani, @anmolsjoshi,
@trsvchn, @Moh-Yakoub, ..., @fco-dv, @gucifer, @Priyansi, ...

o1 o2 o3

With the support of:

Projects using PyTorch-Ignite

More details here:

Community Engagement

  • Google Summer of Code 2021

    • Mentored two great students (Ahmed and Arpan)
  • Google Season of Docs 2021

    • Working with great tech writer (Priyansi)
  • Hacktoberfest 2020 and 2021

  • PyData Global Mentored Sprint 2020 and coming up 2021 (End of October)

  • Our new website development

  • PyTorch-Ignite Code-Generator project

  • Public meetings on Discord, open to everyone

Stay tuned for upcoming events …

Hacktoberfest 2021

The repositories participating:


How it works:

  • Read our project’s Contributing Guidelines
  • Check out and pick an issue from the list of “Help-wanted” issues
    • Comment out that you would like to tackle the issue
  • Any questions on the issue => reach out to us on GH or Discord for more guidance

Join the PyTorch-Ignite Community

We are looking for motivated contributors to help out with the project.

o1 Everyone is welcome to contribute o2


How to start:

Thanks for your attention !



Follow us on

and check out our new website: