Prediction Alignment

Align model predictions to account for coordinate changes from variants.

Overview

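Indels and structural variants change sequence length, so bin i of the reference prediction no longer covers the same genomic position as bin i of the alternate prediction, and a direct bin-by-bin comparison is invalid. align_predictions_by_coordinate() inserts NaN bins so that both outputs stay in genomic coordinate correspondence.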
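
To make the idea concrete, here is a toy sketch of NaN-bin insertion on a 1D array. The helper insert_nan_bins is hypothetical and is not part of supremo_lite. It assumes the result is trimmed from the right to keep the original length; the library may handle window edges differently.

```python
import numpy as np

def insert_nan_bins(pred, start_bin, n_bins):
    """Toy helper (hypothetical, not the library's implementation).

    Inserts n_bins NaN bins at start_bin along the last axis, then trims
    the result back to the original number of bins (assumption: trim right).
    """
    pred = np.asarray(pred, dtype=float)
    gap = np.full(pred.shape[:-1] + (n_bins,), np.nan)
    padded = np.concatenate(
        [pred[..., :start_bin], gap, pred[..., start_bin:]], axis=-1
    )
    return padded[..., : pred.shape[-1]]

# Eight reference bins with values 0..7; pretend an insertion spans 2 bins at bin 3.
ref = np.arange(8, dtype=float)
ref_aligned = insert_nan_bins(ref, start_bin=3, n_bins=2)

# Bins 3 and 4 are now NaN, the old bins 3..5 shift right, and bins 6..7 are trimmed.
print(ref_aligned)
```

After this step, position k in ref_aligned and position k in the alternate prediction refer to the same genomic bin, and the NaN bins mark positions that have no reference counterpart.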

Function Signature

align_predictions_by_coordinate(
    ref_pred,               # Reference predictions (array/tensor)
    alt_pred,               # Alternate predictions (array/tensor)
    metadata,               # Variant metadata dict
    prediction_type,        # "1D" or "2D"
    bin_size,               # int: Model bin size
    crop_length,            # int: Model crop length (REQUIRED)
    diag_offset=0,          # int: Diagonal offset for 2D (default 0)
    matrix_size=None        # int: Required for 2D predictions
) -> tuple[array/tensor, array/tensor]

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| ref_pred | array/tensor | required | Reference predictions |
| alt_pred | array/tensor | required | Alternate predictions |
| metadata | dict | required | Variant metadata (from get_alt_ref_sequences()) |
| prediction_type | str | required | "1D" or "2D" |
| bin_size | int | required | Model binning resolution (bp per prediction) |
| crop_length | int | required | Bases cropped from each edge by model |
| diag_offset | int | 0 | Diagonal bins masked (2D only) |
| matrix_size | int | None | Square matrix size (required for 2D) |

Alignment Strategies by Variant Type

| Variant | Strategy |
|---|---|
| SNV | No alignment needed (same length) |
| INS | NaN bins in reference where insertion occurs |
| DEL | NaN bins in alternate where deletion occurs |
| DUP | Same as INS (duplication adds sequence) |
| INV | 1D: mask inverted bins; 2D: cross-pattern masking (rows + columns) |
| BND | Chimeric reference comparison |

:::{tip}
For details on how variants are classified into these types, see the Variant Classification Flow Chart.
:::
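
The cross-pattern masking used for inversions in 2D can be pictured with a small sketch. The helper cross_mask below is hypothetical and only illustrates the rows-plus-columns pattern; it does not reproduce how supremo_lite derives the affected bins from the variant metadata.

```python
import numpy as np

def cross_mask(matrix, start_bin, end_bin):
    """Hypothetical helper: set rows and columns [start_bin, end_bin) to NaN."""
    out = np.array(matrix, dtype=float)  # work on a copy
    out[start_bin:end_bin, :] = np.nan   # every contact from an affected bin
    out[:, start_bin:end_bin] = np.nan   # every contact to an affected bin
    return out

# A 6x6 toy contact map where bins 2 and 3 overlap an inversion.
contact = np.ones((6, 6))
masked = cross_mask(contact, start_bin=2, end_bin=4)
print(masked)
```

Only contacts between two unaffected bins keep their values, which is why the masked region looks like a cross over the matrix.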

Model Architecture

Binning: Models predict at a lower resolution than the input sequence. Example: bin_size=8 means one prediction per 8 bp.

Edge Cropping: Models may crop the edges of the input window. Example: crop_length=10 removes 10 bp from each end.

Diagonal Masking (2D only): Contact maps often mask the diagonal. Example: diag_offset=2 masks 2 bins from the diagonal.
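
These three parameters determine the output shape. The sketch below uses the formula (seq_len - 2*crop_length) // bin_size from the 2D example on this page. The helper names are hypothetical, and the diagonal mask assumes that diag_offset=2 hides entries with |i - j| < 2; the library's exact convention may differ.

```python
import numpy as np

def n_output_bins(seq_len, bin_size, crop_length):
    """Number of prediction bins after cropping both edges (hypothetical helper)."""
    return (seq_len - 2 * crop_length) // bin_size

def diagonal_mask(matrix_size, diag_offset):
    """Boolean mask of near-diagonal entries.

    Assumption: entries with |i - j| < diag_offset count as masked.
    """
    i, j = np.indices((matrix_size, matrix_size))
    return np.abs(i - j) < diag_offset

bins = n_output_bins(seq_len=200, bin_size=8, crop_length=10)
mask = diagonal_mask(matrix_size=bins, diag_offset=2)

print(bins)             # bins per 1D track, and rows/columns per 2D map
print(int(mask.sum()))  # number of near-diagonal entries under the assumption above
```

With seq_len=200, bin_size=8, and crop_length=10 this gives 22 bins, which matches the (2, 22) shape noted in the 1D example.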

Examples

Basic 1D Alignment

import supremo_lite as sl
from supremo_lite.mock_models import TestModel

# Generate sequences (reference and variants are assumed to be defined already)
results = list(sl.get_alt_ref_sequences(
    reference_fn=reference,
    variants_fn=variants,
    seq_len=200
))
alt_seqs, ref_seqs, metadata = results[0]

# Run 1D model
model = TestModel(n_targets=2, bin_size=8, crop_length=10)
ref_preds = model(ref_seqs)
alt_preds = model(alt_seqs)

# Align predictions for first variant
ref_aligned, alt_aligned = sl.align_predictions_by_coordinate(
    ref_pred=ref_preds[0],      # Shape: (2, 22) [2 targets, 22 bins]
    alt_pred=alt_preds[0],
    metadata=metadata[0],
    prediction_type="1D",
    bin_size=8,
    crop_length=10
)

2D Contact Map Alignment

from supremo_lite.mock_models import TestModel2D

# Reuses ref_seqs, alt_seqs, and metadata from the 1D example
model_2d = TestModel2D(n_targets=1, bin_size=8, crop_length=10, diag_offset=2)
ref_preds_2d = model_2d(ref_seqs)
alt_preds_2d = model_2d(alt_seqs)

matrix_size = (200 - 2*10) // 8  # (seq_len - 2*crop_length) // bin_size = 22

ref_aligned_2d, alt_aligned_2d = sl.align_predictions_by_coordinate(
    ref_pred=ref_preds_2d[0, 0],
    alt_pred=alt_preds_2d[0, 0],
    metadata=metadata[0],
    prediction_type="2D",
    bin_size=8,
    crop_length=10,
    diag_offset=2,
    matrix_size=matrix_size
)

Interpreting Results

NaN values mark bins affected by indels, masked diagonal bins (2D), or cross-pattern masking (inversions).

Use NaN-aware functions so that these bins are ignored instead of turning the whole result into NaN:

import numpy as np
diff = alt_aligned - ref_aligned
mean_effect = np.nanmean(np.abs(diff))
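
A slightly fuller sketch of the same idea is shown below. It uses small hand-made arrays (ref_toy and alt_toy, both hypothetical stand-ins for aligned outputs) so it runs without a model, and it also reports how many bins were actually comparable.

```python
import numpy as np

# Toy aligned predictions: 2 targets x 4 bins, with bin 2 masked as NaN.
ref_toy = np.array([[1.0, 2.0, np.nan, 4.0],
                    [0.5, 0.5, np.nan, 0.5]])
alt_toy = np.array([[1.5, 2.0, np.nan, 3.0],
                    [0.5, 1.5, np.nan, 0.5]])

diff = alt_toy - ref_toy            # NaN wherever either input is NaN
valid = ~np.isnan(diff)             # bins that can be compared

per_target_mean = np.nanmean(np.abs(diff), axis=-1)  # mean |effect| per target
max_effect = np.nanmax(np.abs(diff))                  # largest single-bin change
valid_fraction = valid.mean()                         # share of comparable bins

print(per_target_mean, max_effect, valid_fraction)
```

Reporting valid_fraction next to the summary statistic helps when comparing variants, since a mean taken over very few comparable bins is less reliable than one taken over most of the window.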

Troubleshooting

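Shape mismatch: Check that bin_size, crop_length, and matrix_size (2D) match the model, and that metadata belongs to the same variant as the predictions.

All NaN output: Usually caused by an incorrect crop_length or bin_size, or by a variant that lies outside the prediction window.

Unexpected masking: Inversions produce a cross pattern of NaN rows and columns (2D); indels produce NaN stripes.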
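
A quick way to triage these cases is to summarize shapes and NaN coverage before computing any scores. The helper describe_alignment below is a hypothetical sketch, not a supremo_lite function, and it assumes the aligned outputs can be converted with np.asarray (for example NumPy arrays or CPU tensors).

```python
import numpy as np

def describe_alignment(ref_aligned, alt_aligned):
    """Hypothetical diagnostic: report shape agreement and NaN coverage."""
    ref = np.asarray(ref_aligned, dtype=float)
    alt = np.asarray(alt_aligned, dtype=float)
    return {
        "shapes_match": ref.shape == alt.shape,
        "ref_nan_fraction": float(np.isnan(ref).mean()),
        "alt_nan_fraction": float(np.isnan(alt).mean()),
        "all_nan": bool(np.isnan(ref).all() or np.isnan(alt).all()),
    }

# Toy inputs: one masked bin in the reference, none in the alternate.
report = describe_alignment([1.0, np.nan, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(report)
```

A report with all_nan set to True points toward the parameter checks listed above, while a small NaN fraction in only one of the two arrays is what an indel would be expected to produce.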

See Also