Data module

class pynews.data.NewsDataset(path, vocab_size=3000)

Dataset of articles from different newspapers. The class inherits from torch.utils.data.Dataset.

save_vecorizer(filename='vectorizer.pickle')

Save the text_vectorizer used to transform the dataset into bag-of-words tensors. The file is written in binary pickle format.

Parameters: filename (str, optional) – Name of the text vectorizer file. The default is “vectorizer.pickle”.
Return type: None

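The binary pickle round trip described above can be sketched as follows. This is only an illustration: `vectorizer_state`, `save_pickle`, `load_pickle` and the temporary path are hypothetical stand-ins, since the actual text_vectorizer object and its attributes are not documented here.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted vectorizer: a word-to-column mapping.
vectorizer_state = {"vocabulary": {"economy": 0, "election": 1, "football": 2}}


def save_pickle(obj, filename="vectorizer.pickle"):
    # "wb" opens the file in binary mode, which the pickle format requires.
    with open(filename, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle(filename="vectorizer.pickle"):
    # "rb" mirrors the binary mode used when saving.
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# Write to a temporary directory so the example leaves no file behind in the
# working directory.
path = os.path.join(tempfile.mkdtemp(), "vectorizer.pickle")
save_pickle(vectorizer_state, path)
restored = load_pickle(path)
```

Reloading the same file at inference time keeps the word-to-column mapping identical to the one used during training. Pickle files should only be loaded from trusted sources.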
tokenizer(sample)

Tokenize a text and extract only the words.

Parameters: sample (list) – Text sample to tokenize.
Returns: sample – Tokenized text, containing only words.
Return type: list
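One possible way to keep only the words of a text is a regular expression, sketched below. `tokenize_words` is a hypothetical helper, not the library's tokenizer: it assumes the input is a single string (the documented parameter type is a list), lowercases it, and keeps ASCII letters only.

```python
import re

# Runs of ASCII letters; digits, punctuation and symbols are dropped.
WORD_PATTERN = re.compile(r"[a-z]+")


def tokenize_words(text):
    # Lowercase first so that "Stocks" and "stocks" map to the same token.
    return WORD_PATTERN.findall(text.lower())


tokens = tokenize_words("Stocks fell 3% on Monday, analysts said.")
# tokens == ["stocks", "fell", "on", "monday", "analysts", "said"]
```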

Model module

class pynews.model.NewsModel(*layers_size)

Newspaper Model.

forward(inputs)

Predict the outputs from the given inputs. A ReLU activation is applied after every layer from the inputs up to the last hidden layer; the mapping from the last hidden layer to the outputs is linear, with no activation.

Parameters: inputs (torch tensor) – Inputs tensor.
Returns: outputs – Predicted tensor.
Return type: torch tensor
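The forward pass described above can be sketched as a small feed-forward network. `FeedForwardSketch` is a hypothetical re-creation, not the NewsModel source: it assumes that `layers_size` lists the layer widths in order (input, hidden layers, output), and the sizes used in the example are arbitrary.

```python
import torch
from torch import nn


class FeedForwardSketch(nn.Module):
    """Hypothetical re-creation of the described behaviour."""

    def __init__(self, *layers_size):
        super().__init__()
        # One Linear layer per consecutive pair of sizes (assumed meaning
        # of layers_size).
        self.layers = nn.ModuleList(
            [
                nn.Linear(n_in, n_out)
                for n_in, n_out in zip(layers_size[:-1], layers_size[1:])
            ]
        )

    def forward(self, inputs):
        hidden = inputs
        # ReLU after every layer except the last one.
        for layer in self.layers[:-1]:
            hidden = torch.relu(layer(hidden))
        # The final layer is purely linear and returns raw scores.
        return self.layers[-1](hidden)


# 3000 matches the default vocab_size; 64 hidden units and 5 classes are
# arbitrary example values.
model = FeedForwardSketch(3000, 64, 5)
outputs = model(torch.zeros(2, 3000))
```

Because the last layer is linear, the outputs are unnormalized scores, which is the form a loss such as `torch.nn.CrossEntropyLoss` expects.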

Trainer module

class pynews.trainer.Trainer(model, train_loader, model_name='model')

Train a model with a dataset.

run(criterion, optimizer, epochs, lr)

Run the trainer. Call this method to train the model on the training data.

Parameters:
  • criterion (torch.nn.modules.loss) – The loss function.
  • optimizer (torch.optim) – Method to optimize the model.
  • epochs (int) – Number of training epochs.
  • lr (float) – Learning rate.
Returns:

train_losses – List of size epochs, containing

Return type:

list
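A training loop with this signature could look roughly like the sketch below. `run_sketch` is hypothetical and rests on two assumptions: that `optimizer` is an optimizer class (such as `torch.optim.SGD`) instantiated with `lr`, and that the returned list holds the mean training loss of each epoch. The documentation above does not specify the contents of the list.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_sketch(model, train_loader, criterion, optimizer, epochs, lr):
    # Assumption: `optimizer` is a class, so it is instantiated here.
    optim = optimizer(model.parameters(), lr=lr)
    train_losses = []
    model.train()
    for _ in range(epochs):
        epoch_loss = 0.0
        for inputs, labels in train_loader:
            optim.zero_grad()                # reset gradients from the last step
            loss = criterion(model(inputs), labels)
            loss.backward()                  # backpropagate
            optim.step()                     # update the weights
            epoch_loss += loss.item()
        # Assumption: one entry per epoch, the mean loss over its batches.
        train_losses.append(epoch_loss / len(train_loader))
    return train_losses


# Tiny random dataset, only to show the call.
torch.manual_seed(0)
features = torch.rand(8, 4)
labels = torch.randint(0, 2, (8,))
loader = DataLoader(TensorDataset(features, labels), batch_size=4)
losses = run_sketch(
    nn.Linear(4, 2), loader, nn.CrossEntropyLoss(), torch.optim.SGD,
    epochs=3, lr=0.1,
)
```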

Eval module

pynews.eval.analyze_confusion_matrix(confusion_matrix)

Analyse a confusion matrix by printing the True Positive (TP), False Positive (FP), … counts, as well as the specificity and sensitivity, for each class.

Parameters: confusion_matrix (torch tensor of size (number_of_classes, number_of_classes)) – The confusion matrix computed on the test data.
Return type: None

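The per-class quantities can be derived from a confusion matrix as sketched below. `per_class_stats` is a hypothetical helper that returns the values instead of printing them. It assumes rows are gold labels and columns are predictions, and it works on nested lists, which a tensor provides through `tolist()`.

```python
def per_class_stats(confusion_matrix):
    n = len(confusion_matrix)
    total = sum(sum(row) for row in confusion_matrix)
    stats = []
    for k in range(n):
        tp = confusion_matrix[k][k]
        # Row k: samples whose gold label is k (assumed convention).
        fn = sum(confusion_matrix[k]) - tp
        # Column k: samples predicted as k.
        fp = sum(confusion_matrix[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        stats.append(
            {
                "tp": tp,
                "fp": fp,
                "fn": fn,
                "tn": tn,
                # Sensitivity (recall): share of class-k samples that were found.
                "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
                # Specificity: share of other-class samples not labelled k.
                "specificity": tn / (tn + fp) if tn + fp else float("nan"),
            }
        )
    return stats


matrix = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 0, 4],
]
stats = per_class_stats(matrix)
```

For class 1 in this example, TP = 3, FN = 3, FP = 1 and TN = 9, so the sensitivity is 3/6 = 0.5 and the specificity is 9/10 = 0.9.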
pynews.eval.eval_func(batched_data, model)

Evaluate the model on the test data.

Parameters:
  • batched_data (torch DataLoader) – The loaded test data.
  • model (torch Model) – The PyTorch model to evaluate.
Returns:

  • accuracy (float) – Global accuracy of the model.
  • predicted (torch tensor) – Predicted output.
  • gold_label (torch tensor) – Ground-truth labels.
  • confusion_matrix (torch tensor) – Confusion matrix.
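An evaluation routine returning these four values might be structured as in the sketch below. `eval_sketch` is hypothetical: the extra `num_classes` argument is not part of the documented signature, the predicted class is assumed to be the argmax of the model outputs, and the confusion matrix is assumed to use rows for gold labels and columns for predictions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def eval_sketch(batched_data, model, num_classes):
    model.eval()
    predicted, gold_label = [], []
    # No gradients are needed for evaluation.
    with torch.no_grad():
        for inputs, labels in batched_data:
            # Assumption: the predicted class is the highest-scoring output.
            predicted.append(model(inputs).argmax(dim=1))
            gold_label.append(labels)
    predicted = torch.cat(predicted)
    gold_label = torch.cat(gold_label)
    accuracy = (predicted == gold_label).float().mean().item()
    # Assumed convention: rows are gold labels, columns are predictions.
    confusion_matrix = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for gold, pred in zip(gold_label.tolist(), predicted.tolist()):
        confusion_matrix[gold, pred] += 1
    return accuracy, predicted, gold_label, confusion_matrix


# Tiny random test set and an untrained model, only to show the call.
torch.manual_seed(0)
loader = DataLoader(
    TensorDataset(torch.rand(6, 4), torch.randint(0, 3, (6,))), batch_size=2
)
accuracy, predicted, gold_label, confusion_matrix = eval_sketch(
    loader, nn.Linear(4, 3), num_classes=3
)
```

The diagonal of the confusion matrix counts the correct predictions, so its sum divided by the number of samples equals the global accuracy.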