An imbalanced dataset is one whose classes are not represented equally, which is quite common in practice: fraud detection and the prediction of rare adverse drug reactions are typical examples. Two common ways of dealing with imbalanced datasets are oversampling and class weighting.

### Oversampling

Oversampling alters the dataset itself to remove the imbalance, for example by increasing the number of minority-class observations until the dataset is balanced.
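
As a minimal sketch of the idea, assuming a toy list of labels (the names `labels` and `oversample_indices` are hypothetical and not part of any library), random oversampling can be written with the standard library alone. Minority indices are re-drawn with replacement until every class has as many entries as the largest one.

```python
import random
from collections import Counter


def oversample_indices(labels, seed=0):
    """Return indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)  # keep every original sample
        # re-draw existing samples (with replacement) to fill the gap
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(balanced)
    return balanced


labels = [0] * 8 + [1] * 2  # toy 8:2 imbalance
balanced = oversample_indices(labels)
balanced_counts = Counter(labels[i] for i in balanced)
print(balanced_counts)
```

Both classes end up with eight entries each; the minority samples simply appear several times.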

### Class weight

Class weighting assigns a weight to each class and places more emphasis on the minority classes, so that the resulting classifier learns equally from all classes. The class weights are incorporated into the loss function. In this section, I’ll discuss oversampling.
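
Although the rest of this section focuses on oversampling, here is a rough sketch of the class-weight approach for comparison. It assumes inverse-frequency weights and passes them through the `weight` argument of `torch.nn.CrossEntropyLoss`; the toy labels and random logits are made up for illustration only.

```python
import torch
import torch.nn as nn

# toy labels with an 8:2 imbalance (illustrative data only)
labels = torch.tensor([0] * 8 + [1] * 2)

# inverse-frequency class weights: the rarer class gets the larger weight
class_count = torch.bincount(labels).float()  # samples per class
class_weight = 1.0 / class_count

# each sample's loss term is scaled by the weight of its class
criterion = nn.CrossEntropyLoss(weight=class_weight)

torch.manual_seed(0)
logits = torch.randn(len(labels), 2)  # stand-in for model outputs
loss = criterion(logits, labels)
print(class_weight, loss.item())
```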

## Sampler

We use something called samplers for oversampling. Even though we did not use samplers explicitly, PyTorch has been using them for us internally. When we set shuffle=False, the DataLoader uses a SequentialSampler, which yields the indices from zero to the length of the dataset minus one, in order. When shuffle=True, it uses a RandomSampler.
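
One way to check this is to inspect the `sampler` attribute of a `DataLoader`. The sketch below assumes a tiny `TensorDataset` built only for this check; the names `ordered_loader` and `shuffled_loader` are illustrative.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset

data = TensorDataset(torch.arange(10).float())

ordered_loader = DataLoader(data, batch_size=5, shuffle=False)
shuffled_loader = DataLoader(data, batch_size=5, shuffle=True)

# the DataLoader picked a sampler for us based on the shuffle flag
print(type(ordered_loader.sampler).__name__)
print(type(shuffled_loader.sampler).__name__)
```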

### SequentialSampler

Let’s see what the SequentialSampler does by calling it directly. We’ll create a SequentialSampler and collect all the indices it returns: a sequence from zero up to the dataset size minus one.
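
A short sketch of that call, assuming a plain Python list as the data source (the sampler only needs the length of whatever it is given):

```python
from torch.utils.data import SequentialSampler

data_source = list(range(100, 110))  # ten items; their values do not matter
sequential_indices = list(SequentialSampler(data_source))
print(sequential_indices)
```

The sampler returns positions in the data source, not the items themselves, so the indices start at zero regardless of the stored values.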

### RandomSampler

The RandomSampler yields the indices from 0 to the length of the dataset minus one in random order. By default it samples without replacement, so no index is repeated within a pass. These are the two samplers that we end up using by default.
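
The no-repeat behaviour can be checked with a small sketch, again assuming a plain list as the data source. The seed is set only so that repeated runs behave the same way.

```python
import torch
from torch.utils.data import RandomSampler

torch.manual_seed(0)
data_source = list(range(10))
random_indices = list(RandomSampler(data_source))

print(random_indices)
# every index appears exactly once, only the order is shuffled
print(sorted(random_indices) == list(range(10)))
```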

### WeightedRandomSampler

If you have a class imbalance, use a WeightedRandomSampler so that every class is drawn with equal probability. In effect, each class is given an equal share of the total sampling weight.

I created a dummy dataset with a class imbalance ratio of 8:2.

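```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split

numSample = 1000
batch_size = 100

# 1000 samples with 10 random features each
# (torch.FloatTensor(numSample, 10) would leave the values uninitialized)
sample = torch.rand(numSample, 10)

# 80% of the targets are class 0 and 20% are class 1
zero = np.zeros(int(numSample * 0.8), dtype=np.int32)
one = np.ones(int(numSample * 0.2), dtype=np.int32)
target = np.hstack((zero, one))

dataset = sample.numpy()

# split dataset into train and test sets, preserving the class ratio
x_train, x_test, y_train, y_test = train_test_split(
    dataset, target, test_size=0.25, random_state=42, stratify=target, shuffle=True
)
```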

Now that we have a dataset, we’re going to use the WeightedRandomSampler. The first step is to compute a weight for each class.

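```python
from collections import Counter

# number of training samples per class
count = Counter(y_train)
class_count = np.array([count[0], count[1]])

# inverse-frequency weights: the rarer the class, the larger its weight
weight = 1. / class_count
print(weight)
```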

So far we have only specified the class weights. Next we create the sample weights: for each sample in the training set, its weight is set to the weight of the class it belongs to.

```python
# look up the class weight for every training label
samples_weight = np.array([weight[t] for t in y_train])
samples_weight = torch.from_numpy(samples_weight)
```

The weights tensor must have the same length as the number of samples: WeightedRandomSampler draws elements according to the weights passed to it, so you need to provide one weight value for each sample in your Dataset.

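```python
from torch.utils.data import WeightedRandomSampler

# one weight per sample; draw as many samples as the training set contains
sampler = WeightedRandomSampler(samples_weight, len(samples_weight))
```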

With the sample weights in place, we create our sampler. This is our `WeightedRandomSampler`, to which we pass the sample weights and the number of samples to draw (`num_samples`), here equal to the length of our dataset. We can also specify `replacement=True` or `replacement=False`. If we set it to False, each example is drawn at most once when we iterate through the entire dataset.

When we’re dealing with an imbalanced dataset and using oversampling, we always want `replacement=True`, because minority samples have to be drawn more than once to balance the classes.

By default, the `WeightedRandomSampler` uses `replacement=True`. In this case, the samples in a batch are not necessarily unique.
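
To illustrate why replacement matters, here is a self-contained sketch on a toy label array with the same 8:2 ratio (the helper `drawn_class_counts` is hypothetical, written only for this comparison). With inverse-frequency weights, class 0 carries a total weight of 800 × 1/800 = 1 and class 1 carries 200 × 1/200 = 1, so with replacement each draw lands in either class with probability 0.5. Without replacement, drawing as many samples as the dataset contains can only return every index once, so the original imbalance comes back.

```python
from collections import Counter

import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

torch.manual_seed(0)

labels = np.array([0] * 800 + [1] * 200)  # toy 8:2 imbalance
class_weight = 1.0 / np.bincount(labels)  # inverse class frequency
sample_weight = torch.from_numpy(class_weight[labels])  # one weight per sample


def drawn_class_counts(replacement):
    """Count how often each class is drawn in one pass over the sampler."""
    sampler = WeightedRandomSampler(
        sample_weight, len(sample_weight), replacement=replacement
    )
    return Counter(labels[list(sampler)].tolist())


with_replacement = drawn_class_counts(True)      # roughly balanced
without_replacement = drawn_class_counts(False)  # original 800 / 200 split
print(with_replacement, without_replacement)
```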

```python
trainDataset = torch.utils.data.TensorDataset(torch.FloatTensor(x_train), torch.LongTensor(y_train.astype(int)))
validDataset = torch.utils.data.TensorDataset(torch.FloatTensor(x_test), torch.LongTensor(y_test.astype(int)))