Models

Segmentation Models

class DeepLabV3[source]

Bases: BaseModel

DeepLabV3 model from torchvision with custom number of classes.

__init__(num_classes, pretrained=True, backbone='resnet50', aux_loss=True, dropout_p=0.1, **kwargs)[source]

Parameters:

num_classes (int) – Number of classes
pretrained (bool) – Whether to use pretrained backbone
backbone (str) – Backbone network (‘resnet50’ or ‘resnet101’)
aux_loss (bool) – Whether to use auxiliary loss
dropout_p (float) – Dropout probability (0.0 means no dropout)
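
As a rough illustration of what aux_loss=True implies: torchvision's DeepLabV3 returns an extra 'aux' output alongside 'out' when the auxiliary head is enabled, and training code commonly adds a down-weighted auxiliary loss to the main one. The sketch below is not this class's get_loss; the helper name combine_losses and the weight of 0.5 are assumptions chosen for illustration, since the weighting used here is not documented.

```python
def combine_losses(losses, aux_weight=0.5):
    """Combine main and auxiliary segmentation losses into one scalar.

    losses maps output names to loss values; the 'aux' entry is optional.
    aux_weight is an assumed value, not taken from this library.
    """
    total = losses["out"]
    if "aux" in losses:
        # The auxiliary head only regularises training, so it is down-weighted.
        total = total + aux_weight * losses["aux"]
    return total


print(combine_losses({"out": 0.8, "aux": 0.4}))  # main plus weighted auxiliary term
print(combine_losses({"out": 0.8}))              # no auxiliary head: main loss only
```

With aux_loss=False there is no 'aux' entry, so the same helper falls back to the main loss.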

forward(x)[source]

Forward pass.

Parameters: x (Tensor)
Return type: Dict[str, Tensor]

get_loss(predictions, target)[source]

Calculate segmentation loss.

Parameters:

predictions (Dict[str, Tensor])
target (Tensor)

Return type:

Dict[str, Tensor]

predict(x)[source]

Make prediction for inference.

Parameters: x (Tensor)
Return type: Tensor

class DeepLabV3Plus[source]

Bases: BaseModel

DeepLabV3+ architecture for semantic segmentation.

__init__(num_classes, backbone='resnet50', pretrained=True, output_stride=16, dropout_p=0.1, **kwargs)[source]

Parameters:

num_classes (int) – Number of output classes
backbone (str) – Backbone network (‘resnet50’ or ‘resnet101’)
pretrained (bool) – Whether to use pretrained backbone
output_stride (int) – Output stride of the encoder (16 or 8)
dropout_p (float) – Dropout probability (0.0 means no dropout)
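
The output stride is the ratio between the input resolution and the resolution of the encoder's final feature map, so output_stride=8 gives a denser feature map than 16 at a higher memory and compute cost. A minimal sketch of that arithmetic follows; encoder_output_size is a hypothetical helper, and it assumes the input height and width are divisible by the stride.

```python
def encoder_output_size(height, width, output_stride=16):
    """Spatial size of the encoder feature map for a given input size.

    Assumes height and width are divisible by output_stride.
    """
    if output_stride not in (8, 16):
        raise ValueError("output_stride must be 8 or 16")
    if height % output_stride or width % output_stride:
        raise ValueError("input size must be divisible by output_stride")
    return height // output_stride, width // output_stride


for stride in (16, 8):
    print(stride, encoder_output_size(512, 512, stride))
```

For a 512 x 512 input this gives a 32 x 32 feature map at stride 16 and a 64 x 64 feature map at stride 8.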

forward(x)[source]

Forward pass.

Parameters: x (Tensor)
Return type: Dict[str, Tensor]

get_loss(predictions, target)[source]

Calculate segmentation loss.

Parameters:

predictions (Dict[str, Tensor])
target (Tensor)

Return type:

Dict[str, Tensor]

predict(x)[source]

Make prediction for inference.

Parameters: x (Tensor)
Return type: Tensor

class UNet[source]

Bases: BaseModel

U-Net architecture for semantic segmentation.

__init__(num_classes, in_channels=3, features=64, bilinear=True, dropout_p=0.0)[source]

Parameters:

num_classes (int) – Number of output classes
in_channels (int) – Number of input channels (3 for RGB)
features (int) – Number of features in first layer (doubles in each down step)
bilinear (bool) – Whether to use bilinear upsampling or transposed convolutions
dropout_p (float) – Dropout probability (0.0 means no dropout)
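
To make the effect of the features parameter concrete, the sketch below lists the channel width after the first layer and after each down step, assuming plain doubling. The helper unet_encoder_channels and the depth of four down steps are assumptions (four matches the original U-Net design); this implementation's actual depth and bottleneck width are not stated here.

```python
def unet_encoder_channels(features=64, depth=4):
    """Channel width after the first layer and after each down step.

    depth (the number of down steps) is an assumption, not taken from
    this library.
    """
    return [features * 2 ** step for step in range(depth + 1)]


print(unet_encoder_channels())    # [64, 128, 256, 512, 1024]
print(unet_encoder_channels(32))  # [32, 64, 128, 256, 512]
```

Halving features roughly quarters the parameter count of each convolution, since both its input and output widths shrink.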

forward(x)[source]

Forward pass.

Parameters: x (Tensor) – Input tensor of shape (N, C, H, W)
Returns: Dictionary containing output logits under key ‘out’
Return type: Dict[str, Tensor]

get_loss(predictions, target)[source]

Calculate segmentation loss.

Parameters:

predictions (Dict[str, Tensor]) – Dictionary containing model outputs
target (Tensor) – Ground truth segmentation masks

Returns:

Dictionary containing the loss value under key ‘seg_loss’

Return type:

Dict[str, Tensor]
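
The documented contract (logits in under 'out', a loss out under 'seg_loss') can be sketched in plain Python. The sketch assumes the loss is a mean per-pixel cross-entropy, which is a common choice for segmentation but is not confirmed by this reference; pixel_cross_entropy and get_loss_sketch are hypothetical names, and nested lists stand in for tensors.

```python
import math


def pixel_cross_entropy(logits, target):
    """Mean cross-entropy over pixels.

    logits: per-pixel class scores, shape (num_pixels, num_classes)
    target: ground-truth class indices, shape (num_pixels,)
    """
    total = 0.0
    for scores, label in zip(logits, target):
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[label]
    return total / len(target)


def get_loss_sketch(predictions, target):
    """Mimic the documented return shape: a dict with key 'seg_loss'."""
    return {"seg_loss": pixel_cross_entropy(predictions["out"], target)}


predictions = {"out": [[2.0, 0.0], [0.0, 0.0]]}  # two pixels, two classes
print(get_loss_sketch(predictions, [0, 1]))
```

Returning a dictionary rather than a bare scalar leaves room for additional named loss terms without changing the caller.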

predict(x)[source]

Make prediction for inference.

Parameters: x (Tensor)
Return type: Tensor
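
What predict returns is not specified beyond its type. A common convention, assumed here, is to take the arg-max over the class dimension of the 'out' logits, producing one class index per pixel. The sketch below shows that step for a single image using nested lists in place of tensors; predict_sketch is a hypothetical name.

```python
def predict_sketch(logits):
    """Turn per-class logits into a label map.

    logits: nested list of shape (C, H, W) for a single image.
    Returns an (H, W) nested list of class indices.
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


logits = [
    [[0.1, 2.0], [0.3, 0.2]],  # class 0 scores
    [[1.5, 0.5], [0.1, 0.9]],  # class 1 scores
]
print(predict_sketch(logits))  # [[1, 0], [0, 1]]
```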
