Models

Segmentation Models

class DeepLabV3[source]

Bases: BaseModel

DeepLabV3 model from torchvision with custom number of classes.

__init__(num_classes, pretrained=True, backbone='resnet50', aux_loss=True, dropout_p=0.1, **kwargs)[source]
Parameters:
  • num_classes (int) – Number of classes

  • pretrained (bool) – Whether to use pretrained backbone

  • backbone (str) – Backbone network (‘resnet50’ or ‘resnet101’)

  • aux_loss (bool) – Whether to use auxiliary loss

  • dropout_p (float) – Dropout probability (0.0 means no dropout)

forward(x)[source]

Forward pass.

Parameters:

x (Tensor)

Return type:

Dict[str, Tensor]

get_loss(predictions, target)[source]

Calculate segmentation loss.

Parameters:
  • predictions – Model outputs, as returned by forward()

  • target – Ground truth segmentation masks

Return type:

Dict[str, Tensor]
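Since the constructor exposes aux_loss, the loss presumably combines a main term with an auxiliary one. The following is a minimal sketch under stated assumptions, not the documented implementation: torchvision's DeepLabV3 returns its outputs under the keys 'out' and 'aux', but the helper name combine_seg_losses, the default aux_weight of 0.5, and the use of the key 'seg_loss' for this class are all hypothetical.

```python
from typing import Dict


def combine_seg_losses(losses: Dict[str, float], aux_weight: float = 0.5) -> Dict[str, float]:
    """Combine per-head losses into one value (hypothetical helper).

    `losses` maps head names to scalar losses: 'out' for the main head and,
    optionally, 'aux' for the auxiliary head. The output key 'seg_loss' and
    the default `aux_weight` are assumptions made for illustration.
    """
    total = losses["out"]
    if "aux" in losses:
        # Down-weight the auxiliary head so the main head dominates training.
        total += aux_weight * losses["aux"]
    return {"seg_loss": total}


with_aux = combine_seg_losses({"out": 0.8, "aux": 1.2})
without_aux = combine_seg_losses({"out": 0.8})
```

With aux_loss=False there would be no 'aux' entry, and the sketch falls back to the main term alone.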

predict(x)[source]

Make prediction for inference.

Parameters:

x (Tensor)

Return type:

Tensor

class DeepLabV3Plus[source]

Bases: BaseModel

DeepLabV3+ architecture for semantic segmentation.

__init__(num_classes, backbone='resnet50', pretrained=True, output_stride=16, dropout_p=0.1, **kwargs)[source]
Parameters:
  • num_classes (int) – Number of output classes

  • backbone (str) – Backbone network (‘resnet50’ or ‘resnet101’)

  • pretrained (bool) – Whether to use pretrained backbone

  • output_stride (int) – Output stride of the encoder (16 or 8)

  • dropout_p (float) – Dropout probability (0.0 means no dropout)
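To make the effect of output_stride concrete, the sketch below computes the spatial size of the encoder output for a given input size. It assumes a ResNet-style encoder whose padded stride-2 layers round up, so the size is ceil(input / output_stride). The helper name encoder_feature_size is hypothetical and not part of this API.

```python
import math
from typing import Tuple


def encoder_feature_size(height: int, width: int, output_stride: int = 16) -> Tuple[int, int]:
    """Spatial size of the encoder output for a given input size.

    Assumes every downsampling step rounds up (padded stride-2 layers, as in
    ResNet), which gives ceil(size / output_stride) overall.
    """
    if output_stride not in (8, 16):
        raise ValueError("output_stride must be 8 or 16")
    return math.ceil(height / output_stride), math.ceil(width / output_stride)


size_os16 = encoder_feature_size(512, 512, output_stride=16)  # (32, 32)
size_os8 = encoder_feature_size(512, 512, output_stride=8)    # (64, 64)
```

An output stride of 8 keeps a feature map twice as large in each dimension, which generally preserves finer detail at the cost of more memory and compute.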

forward(x)[source]

Forward pass.

Parameters:

x (Tensor)

Return type:

Dict[str, Tensor]

get_loss(predictions, target)[source]

Calculate segmentation loss.

Parameters:
  • predictions – Model outputs, as returned by forward()

  • target – Ground truth segmentation masks

Return type:

Dict[str, Tensor]

predict(x)[source]

Make prediction for inference.

Parameters:

x (Tensor)

Return type:

Tensor

class UNet[source]

Bases: BaseModel

U-Net architecture for semantic segmentation.

__init__(num_classes, in_channels=3, features=64, bilinear=True, dropout_p=0.0)[source]
Parameters:
  • num_classes (int) – Number of output classes

  • in_channels (int) – Number of input channels (3 for RGB)

  • features (int) – Number of features in first layer (doubles in each down step)

  • bilinear (bool) – Whether to use bilinear upsampling or transposed convolutions

  • dropout_p (float) – Dropout probability (0.0 means no dropout)
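As an illustration of how features sets the width of the network, the sketch below lists the channel count at each encoder level when the width doubles at every down step. The depth of four down steps follows the original U-Net design and is an assumption here, as is the helper name unet_channel_widths.

```python
from typing import List


def unet_channel_widths(features: int = 64, depth: int = 4) -> List[int]:
    """Channel width at each encoder level, doubling at every down step.

    `depth` is the number of down steps (assumed to be 4, as in the
    original U-Net); the returned list has depth + 1 entries.
    """
    return [features * 2 ** level for level in range(depth + 1)]


widths = unet_channel_widths(features=64)  # [64, 128, 256, 512, 1024]
```

Halving features to 32 halves every entry, which is a quick way to shrink the model for small datasets or limited memory.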

forward(x)[source]

Forward pass.

Parameters:

x (Tensor) – Input tensor of shape (N, C, H, W)

Returns:

Dictionary containing output logits under key ‘out’

Return type:

Dict[str, Tensor]

get_loss(predictions, target)[source]

Calculate segmentation loss.

Parameters:
  • predictions (Dict[str, Tensor]) – Dictionary containing model outputs

  • target (Tensor) – Ground truth segmentation masks

Returns:

Dictionary containing the loss value under key ‘seg_loss’

Return type:

Dict[str, Tensor]
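The interface above can be sketched as follows. The loss function itself is not documented here, so per-pixel cross-entropy is an assumption, and the function name get_loss_sketch is hypothetical. Plain Python lists stand in for tensors to keep the sketch dependency-free; with PyTorch, torch.nn.functional.cross_entropy computes the same quantity.

```python
import math
from typing import Dict, List

Logits = List[List[List[float]]]  # one image, indexed as [class][row][col]
Mask = List[List[int]]            # class indices, indexed as [row][col]


def get_loss_sketch(predictions: Dict[str, Logits], target: Mask) -> Dict[str, float]:
    """Mean per-pixel cross-entropy between predictions['out'] and target."""
    logits = predictions["out"]
    total, count = 0.0, 0
    for r, row in enumerate(target):
        for c, label in enumerate(row):
            scores = [plane[r][c] for plane in logits]
            # Numerically stable log-sum-exp over the class scores.
            peak = max(scores)
            log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_norm - scores[label]
            count += 1
    return {"seg_loss": total / count}


# Two classes, a 1x2 image, and uniform logits: the loss equals log(2).
uniform = {"out": [[[0.0, 0.0]], [[0.0, 0.0]]]}
loss = get_loss_sketch(uniform, [[0, 1]])
```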

predict(x)[source]

Make prediction for inference.

Parameters:

x (Tensor) – Input tensor of shape (N, C, H, W)

Return type:

Tensor
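What predict returns is not specified beyond its type. A common convention, assumed here, is a mask of class indices obtained by taking the argmax over the class dimension of the logits. The sketch below shows that step on plain Python lists for a single image; the helper name argmax_mask is hypothetical, and with PyTorch the equivalent for a batch is torch.argmax(logits, dim=1).

```python
from typing import List


def argmax_mask(logits: List[List[List[float]]]) -> List[List[int]]:
    """Return the highest-scoring class index for every pixel.

    `logits` is indexed as [class][row][col]; the result is [row][col].
    """
    num_classes = len(logits)
    num_rows, num_cols = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda k: logits[k][r][c]) for c in range(num_cols)]
        for r in range(num_rows)
    ]


logits = [
    [[2.0, 0.1]],  # class 0 scores for a 1x2 image
    [[0.5, 1.5]],  # class 1 scores
]
mask = argmax_mask(logits)  # [[0, 1]]
```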
