Data module
-
class pynews.data.NewsDataset(path, vocab_size=3000)
Dataset class for different newspapers. The class inherits from torch.utils.data.Dataset.
-
save_vecorizer(filename='vectorizer.pickle')
Save the text_vectorizer used to transform the dataset into bag-of-words tensors. The file is saved in binary pickle format.
Parameters: filename (str, optional) – Name of the text vectorizer file. The default is “vectorizer.pickle”.
Return type: None
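As a minimal sketch of what saving an object in binary pickle format can look like, assuming the vectorizer is any picklable Python object. The helper names `save_vectorizer` and `load_vectorizer` and the dictionary stand-in are hypothetical illustrations, not pynews code:

```python
import os
import pickle
import tempfile


def save_vectorizer(vectorizer, filename="vectorizer.pickle"):
    """Write a picklable object to `filename` in binary mode."""
    with open(filename, "wb") as handle:
        pickle.dump(vectorizer, handle)


def load_vectorizer(filename="vectorizer.pickle"):
    """Read back an object previously written by save_vectorizer."""
    with open(filename, "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in object (a real vectorizer would go here).
stand_in = {"vocabulary": {"news": 0, "paper": 1}}
path = os.path.join(tempfile.mkdtemp(), "vectorizer.pickle")
save_vectorizer(stand_in, path)
restored = load_vectorizer(path)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.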
-
tokenizer(sample)
Tokenize a text and extract only the words.
Parameters: sample (list) – Text sample to tokenize.
Returns: sample – Tokenized text, containing only words.
Return type: list
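One possible way to keep only the words of a text is a regular expression that matches runs of letters. The sketch below assumes the input is a single string and lowercases it. Both are assumptions of this example, and `tokenize_words` is a hypothetical name, not the pynews method:

```python
import re

# One or more Unicode letters: word characters minus digits and underscore.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def tokenize_words(text):
    """Return the lowercase words of `text`, dropping digits and punctuation."""
    return WORD_PATTERN.findall(text.lower())


tokens = tokenize_words("Breaking: 3 new stories, today!")
```

With this pattern, an apostrophe splits a word in two, so a contraction such as "it's" yields two tokens.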
-
Model module
-
class pynews.model.NewsModel(*layers_size)
Newspaper model.
-
forward(inputs)
Predict the outputs from the given inputs. ReLU activations are applied from the inputs through the last hidden layer; the mapping from the last hidden layer to the outputs is linear.
Parameters: inputs (torch tensor) – Inputs tensor.
Returns: outputs – Predicted tensor.
Return type: torch tensor
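The computation described above can be sketched in plain Python, written without torch so the arithmetic stays visible. This is an illustration under assumptions, not the NewsModel implementation: the `(weights, bias)` layer representation and all function names are hypothetical.

```python
def relu(vector):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights, bias):
    """Compute weights @ vector + bias, with weights given as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def forward(inputs, layers):
    """Apply ReLU after every layer except the last, which stays linear."""
    hidden = inputs
    for weights, bias in layers[:-1]:
        hidden = relu(linear(hidden, weights, bias))
    out_weights, out_bias = layers[-1]
    return linear(hidden, out_weights, out_bias)


# Two inputs -> two hidden units -> one output.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer (ReLU applied)
    ([[1.0, 2.0]], [-5.0]),                    # output layer (linear only)
]
outputs = forward([3.0, 1.0], layers)
```

Because the last layer has no ReLU, the output can be negative. These raw scores are what a loss function would receive.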
-
Trainer module
-
class pynews.trainer.Trainer(model, train_loader, model_name='model')
Train a model with a dataset.
-
run(criterion, optimizer, epochs, lr)
Run the trainer. Use this method to train your model on your data.
Parameters: - criterion (torch.nn.modules.loss) – The loss function.
- optimizer (torch.optim) – Method to optimize the model.
- epochs (int) – Number of iterations.
- lr (float) – Learning rate.
Returns: train_losses – List of size epochs, containing
Return type: list
-
Eval module
-
pynews.eval.analyze_confusion_matrix(confusion_matrix)
Analyse a confusion matrix by printing the true positives (TP), false positives (FP), … as well as the specificity and sensitivity for each class.
Parameters: confusion_matrix (torch tensor of size (number_of_classes, number_of_classes)) – The confusion matrix computed on the test data.
Return type: None
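As a rough sketch of how per-class counts, sensitivity and specificity can be derived from a square confusion matrix. It uses nested lists rather than a torch tensor and assumes rows index the gold class and columns the predicted class. That orientation and the name `per_class_metrics` are assumptions of this example, not documented pynews behaviour:

```python
def per_class_metrics(confusion):
    """Return one dict of TP/FP/FN/TN, sensitivity and specificity per class.

    Assumes confusion[i][j] counts samples of gold class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # gold k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # predicted k, gold other
        tn = total - tp - fn - fp
        metrics.append({
            "TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        })
    return metrics


results = per_class_metrics([[5, 1], [2, 4]])
for index, values in enumerate(results):
    print(f"class {index}: {values}")
```

Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A class with no gold samples, or no negatives, gives NaN here instead of a division error.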
-
pynews.eval.eval_func(batched_data, model)
Evaluate the model on the test data.
Parameters: - batched_data (torch DataLoader) – The loaded test data.
- model (torch Model) – The PyTorch model to evaluate.
Returns: - accuracy (float) – Global accuracy of the model.
- predicted (torch tensor) – Predicted output.
- gold_label (torch tensor) – Ground-truth output.
- confusion_matrix (torch tensor) – Confusion matrix.
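To illustrate how the returned accuracy and confusion matrix relate to the predicted and gold labels, here is a small plain-Python sketch. It assumes labels are integer class indices and that rows index the gold class. The name `confusion_and_accuracy` is hypothetical, and this is not the eval_func implementation:

```python
def confusion_and_accuracy(gold_labels, predicted, num_classes):
    """Build a confusion matrix (rows: gold, columns: predicted) and accuracy."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for gold, pred in zip(gold_labels, predicted):
        matrix[gold][pred] += 1
    # Correct predictions lie on the diagonal.
    correct = sum(matrix[k][k] for k in range(num_classes))
    accuracy = correct / len(gold_labels) if gold_labels else 0.0
    return accuracy, matrix


accuracy, matrix = confusion_and_accuracy(
    gold_labels=[0, 0, 1, 1, 2],
    predicted=[0, 1, 1, 1, 0],
    num_classes=3,
)
```

The global accuracy is the diagonal sum divided by the number of samples, so it can be recomputed from the confusion matrix alone.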