Estimate a steady-state molecular concentration field — estimate_concentration

Solves the screened Poisson (diffusion-clearance) PDE over a spatial mask to produce a continuous concentration field from discrete transcript or protein detection coordinates:

Usage

estimate_concentration_field(
  mask,
  transcript_coords,
  D = 1,
  lambda = 0.1,
  production_rate = 1,
  diffusion_length = NULL,
  weight_col = NULL,
  method = "fd",
  grid_resolution = 256L,
  boundary_condition = "dirichlet",
  fd_solver = "direct",
  sor_omega = 1.7,
  sor_tol = 1e-06,
  sor_max_iter = 1500L,
  green_r_min_factor = 0.5,
  kde_bandwidth = NULL,
  include_external = FALSE,
  normalize = FALSE,
  log_transform = FALSE,
  clip_negative = TRUE,
  n_cores = 1L,
  return_sources = TRUE,
  plot = FALSE,
  verbose = TRUE
)

Arguments

mask

An sfc or sf polygon object defining the tissue boundary. Typically the output of TissueMask::fit_spatial_mask(). Multi-polygon objects (holes, islands) are supported.

transcript_coords

A data.frame or matrix with columns x and y (or first two columns used if names differ) giving transcript or detection coordinates in the same coordinate system as mask.

D

Diffusion coefficient (arbitrary spatial units squared per time). Controls the amplitude of the field ($C \propto 1/D$); the spatial pattern depends only on diffusion_length. Default 1.0.

lambda

First-order clearance rate. Must be > 0 unless diffusion_length is supplied. Default 0.1.

production_rate

Scalar multiplier applied to all source weights. If weight_col is NULL each transcript contributes production_rate units of source strength. Default 1.0.

diffusion_length

Optional. If supplied, overrides D and lambda and sets $L$ directly. Useful for sweeping the spatial scale without changing amplitude.

weight_col

Optional character string naming a column in transcript_coords to use as per-transcript weights (e.g. "umi"). Weights are multiplied by production_rate.

method

Solver. One of:

"fd": Finite-difference sparse linear system (default). Supports exact Dirichlet or Neumann boundary conditions. Recommended for most use cases.
"green": FFT convolution with the analytic 2-D Green's function $K_0(\kappa r)$. Ignores domain boundary; fast for parameter sweeps.
"kde": Gaussian kernel density estimate. Phenomenological baseline; no physical boundary conditions.

grid_resolution

Integer. Number of grid cells per axis (the grid is grid_resolution x grid_resolution). Default 256L.

boundary_condition

"dirichlet" (concentration zero at boundary) or "neumann" (zero normal flux; molecules reflected). Only applies when method = "fd". Default "dirichlet".

fd_solver

"direct" (LU factorisation via Matrix::solve()) or "iterative" (red-black SOR). Direct is exact and fast for grid_resolution <= 400; iterative avoids the dense LU at very high resolution. Only applies when method = "fd". Default "direct".

sor_omega

SOR relaxation parameter (0 < omega < 2). Default 1.7.

sor_tol

Convergence tolerance for SOR. Default 1e-6.

sor_max_iter

Maximum SOR iterations. Default 1500L.

green_r_min_factor

Regularisation for the Green's function singularity at r = 0; sets r_min = green_r_min_factor * h. Default 0.5.

kde_bandwidth

Bandwidth for the KDE smoother. Defaults to L when method = "kde".

include_external

Logical. If TRUE, transcripts outside mask are included as sources. Default FALSE.

normalize

Logical. If TRUE, rescales the field to [0, 1] after solving. Default FALSE.

log_transform

Logical. If TRUE, applies log1p() to the field after solving (and after clip_negative). Default FALSE.

clip_negative

Logical. If TRUE, sets small negative values (solver artefacts) to zero. Default TRUE.

n_cores

Integer. Number of cores for mask rasterisation. Parallel rasterisation is available on Unix-like systems only. Default 1L.

return_sources

Logical. Include the binned source density matrix in the return value. Default TRUE.

plot

Logical. If TRUE and ggplot2 is installed, print a quick field plot. Default FALSE.

verbose

Logical. Print progress messages. Default TRUE.

Value

A named list with elements:

field: N x N numeric matrix; NA outside the mask.
x, y: Grid centre coordinates (length N each).
hx, hy: Grid cell width and height.
mask: N x N logical matrix (TRUE = inside mask).
sources: N x N binned source density (or NULL when return_sources = FALSE).
params: List of all resolved input parameters.
diagnostics: Timing and cell/transcript counts.

Details

$$D \nabla^2 C(\mathbf{x}) - \lambda C(\mathbf{x}) + s(\mathbf{x}) = 0$$

The key derived quantity is the diffusion length $L = \sqrt{D/\lambda}$, which sets the characteristic decay distance from each source. Three solvers are available: "fd" (finite-difference, recommended), "green" (Green's function via FFT), and "kde" (Gaussian kernel density, phenomenological).

Examples

library(sf)
#> Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
set.seed(1)
sq   <- st_polygon(list(cbind(c(0,10,10,0,0), c(0,0,10,10,0))))
mask <- st_sfc(sq)
tc   <- data.frame(x = runif(80, 1, 9), y = runif(80, 1, 9))
res  <- estimate_concentration_field(mask, tc, D = 1, lambda = 0.2,
                                     grid_resolution = 64L, verbose = FALSE)
str(res$field)
#>  num [1:64, 1:64] NA NA NA NA NA NA NA NA NA NA ...