Data module

class pynews.data.NewsDataset(path, vocab_size=3000)

Dataset of articles from different newspapers. The class inherits from torch.utils.data.Dataset.

save_vecorizer(filename='vectorizer.pickle')

Save the text_vectorizer used to transform the dataset into bag-of-words tensors. The file is written in binary pickle format.

Parameters:	filename (str, optional) – Name of the text vectorizer file. The default is “vectorizer.pickle”.
Return type:	None
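
The pickle round trip described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `save_vectorizer` and `load_vectorizer` are invented for the example, and a plain dictionary stands in for whatever object text_vectorizer actually holds.

```python
import os
import pickle
import tempfile


def save_vectorizer(vectorizer, filename="vectorizer.pickle"):
    # "wb" opens the file in binary mode, which pickle requires.
    with open(filename, "wb") as handle:
        pickle.dump(vectorizer, handle)


def load_vectorizer(filename="vectorizer.pickle"):
    # Hypothetical counterpart used to restore the saved object.
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for a fitted vectorizer: word -> column index.
toy_vectorizer = {"economy": 0, "election": 1, "football": 2}

# Round trip through a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vectorizer.pickle")
    save_vectorizer(toy_vectorizer, path)
    restored = load_vectorizer(path)
```

Reloading the saved object at inference time keeps the vocabulary indices identical to those used during training. Unpickling can execute arbitrary code, so only load files from a trusted source.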

tokenizer(sample)

Tokenize a text and extract only the words.

Parameters:	sample (list) – Text sample to tokenize.
Returns:	sample – Tokenized text, containing only words.
Return type:	list
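
A word-only tokenizer of the kind described above might look like the sketch below. It is an assumption-laden illustration: the function name `tokenize_words`, the lowercasing step, and the ASCII-letter pattern are choices made for the example, and it accepts a plain string rather than the list documented for the real method.

```python
import re

# One or more ASCII letters; digits and punctuation act as separators.
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def tokenize_words(text):
    """Return the lowercase words found in ``text``."""
    return WORD_PATTERN.findall(text.lower())


tokens = tokenize_words("Markets fell 3% on Monday, analysts said.")
```

This pattern drops accented letters; replacing it with `[^\W\d_]+` would keep Unicode letters while still excluding digits and underscores.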

Model module

class pynews.model.NewsModel(*layers_size)

Newspaper Model.

forward(inputs)

Predict the outputs from the given inputs. A ReLU activation is applied after every layer from the inputs through the last hidden layer; the final layer, from the last hidden layer to the outputs, is linear with no activation.

Parameters:	inputs (torch tensor) – Inputs tensor.
Returns:	outputs – Predicted tensor.
Return type:	torch tensor
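
The forward pass described above (ReLU on the hidden layers, a linear output layer) can be sketched as follows. `TinyMLP` is a hypothetical class written for illustration; how NewsModel actually builds its layers from *layers_size is an assumption here, as are the sizes used in the usage lines.

```python
import torch
from torch import nn


class TinyMLP(nn.Module):
    """Hypothetical feed-forward network built from a list of layer sizes."""

    def __init__(self, *layers_size):
        super().__init__()
        # One linear layer per consecutive pair of sizes.
        self.layers = nn.ModuleList(
            [
                nn.Linear(n_in, n_out)
                for n_in, n_out in zip(layers_size[:-1], layers_size[1:])
            ]
        )

    def forward(self, inputs):
        outputs = inputs
        # ReLU after every layer except the last one.
        for layer in self.layers[:-1]:
            outputs = torch.relu(layer(outputs))
        # The final layer stays linear and returns raw scores (logits).
        return self.layers[-1](outputs)


model = TinyMLP(3000, 128, 6)  # sizes chosen only for illustration
logits = model(torch.zeros(4, 3000))  # batch of 4 bag-of-words vectors
```

Leaving the last layer linear is the usual choice when the loss function, such as `nn.CrossEntropyLoss`, applies the softmax itself.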

Trainer module

class pynews.trainer.Trainer(model, train_loader, model_name='model')

Train a model with a dataset.

run(criterion, optimizer, epochs, lr)

Method to run the trainer. Use this function to train your model with your data.

Parameters:

criterion (torch.nn.modules.loss) – The loss function.
optimizer (torch.optim) – Method to optimize the model.
epochs (int) – Number of iterations.
lr (float) – Learning rate.

Returns:	train_losses – List of size epochs, containing the training loss for each epoch.
Return type:	list
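
A training loop matching the signature above could look like the following sketch. Several points are assumptions rather than documented behaviour: that optimizer is passed as a class and instantiated with lr inside the loop, that one mean loss value is recorded per epoch, and the name `run_training` itself. The toy data and model exist only to make the example runnable.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def run_training(model, train_loader, criterion, optimizer_cls, epochs, lr):
    # Assumption: the optimizer arrives as a class and is built here with lr.
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    train_losses = []
    model.train()
    for _ in range(epochs):
        epoch_loss = 0.0
        for inputs, targets in train_loader:
            optimizer.zero_grad()  # clear gradients from the previous batch
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        # Record the mean batch loss of this epoch.
        train_losses.append(epoch_loss / len(train_loader))
    return train_losses


torch.manual_seed(0)
# Random toy data: 32 samples, 10 features, 3 classes.
dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 3, (32,)))
loader = DataLoader(dataset, batch_size=8)
toy_model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 3))
losses = run_training(
    toy_model, loader, nn.CrossEntropyLoss(), torch.optim.SGD, epochs=3, lr=0.1
)
```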

Eval module

pynews.eval.analyze_confusion_matrix(confusion_matrix)

Analyse a confusion matrix by printing the true positives (TP), false positives (FP), …, as well as the specificity and sensitivity, for each class.

Parameters:	confusion_matrix (torch tensor of size (number_of_classes, number_of_classes)) – The confusion matrix computed on the test data.
Return type:	None
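
The per-class quantities mentioned above can be derived from a confusion matrix as in this sketch. It assumes rows index the true class and columns the predicted class, which the documentation does not state, and it uses nested lists so that it runs without PyTorch (a tensor can be converted with `confusion_matrix.tolist()`). The name `per_class_stats` is hypothetical, and the function returns the values instead of printing them.

```python
def per_class_stats(confusion_matrix):
    """Return TP, FP, FN, TN, sensitivity and specificity for each class.

    Assumes rows are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in confusion_matrix)
    stats = []
    for k in range(len(confusion_matrix)):
        tp = confusion_matrix[k][k]
        # Samples of class k that were predicted as another class.
        fn = sum(confusion_matrix[k]) - tp
        # Samples of other classes that were predicted as class k.
        fp = sum(row[k] for row in confusion_matrix) - tp
        tn = total - tp - fn - fp
        stats.append(
            {
                "TP": tp,
                "FP": fp,
                "FN": fn,
                "TN": tn,
                "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
                "specificity": tn / (tn + fp) if tn + fp else 0.0,
            }
        )
    return stats


matrix = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 8],
]
stats = per_class_stats(matrix)
```

For class 0 in this toy matrix, sensitivity is 5/6 (five of the six true samples were found) and specificity is 17/19 (two of the nineteen samples from other classes were wrongly assigned to it).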

pynews.eval.eval_func(batched_data, model)

Evaluate the model on the test data.

Parameters:

batched_data (torch DataLoader) – The loaded test data.
model (torch Model) – The PyTorch model to evaluate.

Returns:

accuracy (float) – Global accuracy of the model.
predicted (torch tensor) – Predicted output.
gold_label (torch tensor) – Ground-truth output.
confusion_matrix (torch tensor) – Confusion matrix.
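
An evaluation routine returning the four values listed above might be sketched as follows. The function name `evaluate`, the extra `num_classes` argument, and the row/column convention of the confusion matrix are assumptions made for the example; the toy data and model exist only to make it runnable.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(batched_data, model, num_classes):
    model.eval()
    predicted, gold_label = [], []
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, targets in batched_data:
            # The predicted class is the index of the highest score.
            predicted.append(model(inputs).argmax(dim=1))
            gold_label.append(targets)
    predicted = torch.cat(predicted)
    gold_label = torch.cat(gold_label)

    # Assumed convention: rows are true classes, columns are predictions.
    confusion_matrix = torch.zeros(num_classes, num_classes, dtype=torch.long)
    for gold, pred in zip(gold_label.tolist(), predicted.tolist()):
        confusion_matrix[gold, pred] += 1

    accuracy = (predicted == gold_label).float().mean().item()
    return accuracy, predicted, gold_label, confusion_matrix


torch.manual_seed(0)
# Random toy data: 12 samples, 5 features, 3 classes.
test_set = TensorDataset(torch.randn(12, 5), torch.randint(0, 3, (12,)))
test_loader = DataLoader(test_set, batch_size=4)
accuracy, predicted, gold_label, confusion_matrix = evaluate(
    test_loader, nn.Linear(5, 3), num_classes=3
)
```

The diagonal of the confusion matrix counts the correct predictions, so its sum divided by the number of samples equals the accuracy.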