Multiple-Choice Question Answering in HuggingFace
Unveiling the power of question answering
Modern natural language processing models perform strongly on question answering (QA) tasks. In this post, we use the HuggingFace Transformers and Datasets libraries to tackle a multiple-choice question answering task.
Specifically, we fine-tune a pre-trained BERT model on a multiple-choice question dataset using the Trainer API. This adapts the bidirectional representations BERT learned during pre-training to our target task. With a classification head added on top, the model learns textual patterns that help it pick the correct option from the set of candidate answers for each question. We then evaluate performance by measuring accuracy on the held-out test set.
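To make the "pick one option out of several" step concrete, here is a minimal NumPy sketch of how per-option scores are typically turned into a prediction and an accuracy figure. It assumes the classification head emits one scalar logit per (question, answer option) pair, flattened over the batch, which is how multiple-choice heads in Transformers generally work. The names `choice_probabilities` and `multiple_choice_accuracy` are hypothetical helpers, not part of the HuggingFace API; the second one only mimics the `(logits, labels)` shape that a Trainer `compute_metrics` hook receives.

```python
import numpy as np


def choice_probabilities(flat_logits, num_choices):
    """Reshape flat per-pair logits to (num_questions, num_choices) and softmax each row."""
    logits = np.asarray(flat_logits, dtype=np.float64).reshape(-1, num_choices)
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def multiple_choice_accuracy(eval_pred):
    """Accuracy from a (logits, labels) pair, in the style of a compute_metrics hook.

    Assumes logits already have shape (num_questions, num_choices).
    """
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # index of the highest-scoring option
    return {"accuracy": float((predictions == np.asarray(labels)).mean())}


# Two questions with four answer options each, flattened as the head would score them.
flat_scores = [0.1, 2.0, -1.0, 0.3,
               1.5, 0.2, 0.1, 0.0]
probs = choice_probabilities(flat_scores, num_choices=4)
print(probs.argmax(axis=1))                                 # [1 0]
print(multiple_choice_accuracy((probs, np.array([1, 3]))))  # {'accuracy': 0.5}
```

Because softmax is monotonic, taking the argmax of the raw logits gives the same prediction; the softmax only matters when you want per-option probabilities. During training, the loss is usually a cross-entropy over each row of these reshaped scores.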
The Transformers library makes it quick to experiment with different model architectures, tokenizers, and training approaches. In this analysis, we walk through a step-by-step recipe for achieving competitive performance on multiple-choice QA with HuggingFace Transformers.
First step: Install and import libraries
The first step is to install and import the libraries. To install them, use the pip install command as follows:
!pip install datasets transformers[torch] --quiet
Then import the necessary libraries:
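# Numerical and data-handling utilities
import numpy as np
import pandas as pd
import os
import json
# PyTorch building blocks for the model and data pipeline
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
# HuggingFace Transformers: model output type, tokenizer, training loop, collators
from transformers.modeling_outputs import SequenceClassifierOutput
from transformers import (
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    set_seed,
    DataCollatorWithPadding,
    DefaultDataCollator,
)
# HuggingFace Datasets
from datasets import load_dataset
try:
    from datasets import load_metric  # only exists in datasets < 3.0
except ImportError:
    # load_metric was removed in datasets 3.0; fall back to the separate
    # `evaluate` library (pip install evaluate), whose load() plays the same role
    from evaluate import load as load_metric
# Standard-library helpers for typed data classes
from dataclasses import dataclass, field
from typing import Optional, Union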
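Because the pip command above does not pin versions and these libraries change between releases, it can help to confirm which versions were actually installed before going further. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for pkg, ver in installed_versions(["datasets", "transformers", "torch"]).items():
    print(f"{pkg}: {ver or 'not installed'}")
```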
Second step: Load the dataset
In the second step, we load the train and test data. We use the CODAH dataset, which is released under the ODC-By (Open Data Commons Attribution) license and is therefore available for commercial use [1].
from datasets import load_dataset
codah = load_dataset("codah", "codah")
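Each CODAH example pairs one question prompt with several candidate answers and the index of the correct one. A common way to feed this to a multiple-choice model is to repeat the prompt once per candidate, tokenize the (prompt, candidate) pairs, and regroup the results per question. The sketch below shows only the pairing and regrouping in plain Python. The column names `question_propmt`, `candidate_answers` and `correct_answer_idx` are assumed from the dataset card and should be checked against `codah["train"].features` before use; `to_choice_pairs` and `regroup` are hypothetical helpers, and the toy batch is invented for illustration.

```python
def to_choice_pairs(batch, prompt_key="question_propmt",
                    choices_key="candidate_answers", label_key="correct_answer_idx"):
    """Flatten a batch of questions into parallel lists of prompt and candidate strings.

    The default column names are assumptions about the CODAH schema.
    """
    first, second, counts = [], [], []
    for prompt, candidates in zip(batch[prompt_key], batch[choices_key]):
        first.extend([prompt] * len(candidates))  # repeat the prompt once per option
        second.extend(candidates)
        counts.append(len(candidates))            # remember how many options each question had
    return {"first": first, "second": second, "counts": counts,
            "labels": list(batch[label_key])}


def regroup(flat_items, counts):
    """Undo the flattening: split a flat list back into one sub-list per question."""
    grouped, start = [], 0
    for n in counts:
        grouped.append(flat_items[start:start + n])
        start += n
    return grouped


# Toy batch in the assumed column layout (invented, not real CODAH data).
toy_batch = {
    "question_propmt": ["The kettle whistled, so she"],
    "candidate_answers": [["poured the tea.", "planted a tree.",
                           "fixed the car.", "went swimming."]],
    "correct_answer_idx": [0],
}
pairs = to_choice_pairs(toy_batch)
print(len(pairs["first"]), pairs["labels"])  # 4 [0]
```

With a real tokenizer, the two lists would typically be passed as a text pair, e.g. `tokenizer(pairs["first"], pairs["second"], truncation=True)`, and `regroup` applied to each returned field such as `input_ids`, so that every question ends up with one tokenized sequence per answer option.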