Posterior Flows for Diffusion Models

diffusion

posterior sampling

guidance

A concise overview of posterior sampling methods for diffusion and flow models

Published

May 31, 2026

The goal of this note is to clarify a few recent approaches to posterior sampling in diffusion and flow models, in particular GLASS Flows [1], Diamond Maps [2], Meta Flow Maps [3], and Stochastic Few-Step Models [4]. All of them are concerned, in one form or another, with sampling from conditional laws of the form p_{s \mid t}(x_s \mid x_t) induced by a diffusion process.

The exposition follows useful discussions with Yazid Janati and Badr Moufad, which helped clarify the relation between these constructions.

The main point is simple. Posterior sampling appears as soon as one wants to do something more structured than unconditional generation: guidance, editing, inverse problems, search, or alignment. In principle, sampling from p_{s \mid t}(\cdot \mid x_t) or p_{0 \mid t}(\cdot \mid x_t) is not hard: one can simply start at x_t and run the corresponding reverse-time stochastic dynamics. The problem is that this remains a long and costly stochastic sampler.

This is the real motivation for posterior flows. The goal is not merely to sample from the posterior somehow, but to replace this expensive stochastic conditional sampler by something more efficient: ideally a deterministic sampler, or at least an ODE that can be run in fewer steps, and ultimately a conditional flow map that can be amortized across many posterior queries.

Once this is recognized, the methods in [1, 2, 3, 4] can be organized around three high-level strategies:

train a conditional posterior model directly;
build an auxiliary diffusion whose terminal law is the desired posterior, while reusing the original denoiser;
directly follow the exact bridge dynamics associated with the original diffusion, and then distill them.

Posterior flow maps are the amortized version of these approaches.

1. Setup

Consider a forward diffusion process (X_t)_{0 \le t \le T} defined by dX_t = b(X_t,t)\,dt + g(t)\,dW_t.

Assume that, for every 0 \le s < t \le T, the transition kernel is Gaussian: q_{t \mid s}(x_t \mid x_s) = \mathcal N(x_t; \alpha_{t \mid s} x_s, \sigma_{t \mid s}^2 I). In particular, q_{t \mid 0}(x_t \mid x_0) = \mathcal N(x_t; \alpha_t x_0, \sigma_t^2 I). We write p_t for the marginal law of X_t.

The usual sampling objective is to start from the simple law p_T and recover p_0 by integrating the reverse-time dynamics dX_t = \left[b(X_t,t) - g(t)^2 \nabla \log p_t(X_t)\right]dt + g(t)\,d\bar W_t, or the associated probability-flow ODE dX_t = \left[b(X_t,t) - \frac12 g(t)^2 \nabla \log p_t(X_t)\right]dt, with the equation solved backward from T to 0.
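To make the backward integration concrete, here is a minimal sketch of the probability-flow ODE with explicit Euler steps. It assumes a toy setting that is not taken from the references: b = 0 and g = 1 (so \alpha_t = 1 and \sigma_t^2 = t), and 1D Gaussian data p_0 = \mathcal N(m, v_0), for which p_t = \mathcal N(m, v_0 + t) and the score is exact. In a real model the exact score would be replaced by a learned one; the function names are illustrative.

```python
import numpy as np

def score(x, t, m, v0):
    # Exact score of p_t = N(m, v0 + t) for the toy forward process dX_t = dW_t.
    return -(x - m) / (v0 + t)

def pf_ode_sample(x_T, T, m, v0, n_steps=2000):
    # Probability-flow ODE dX_t = [b - 0.5 * g^2 * score] dt with b = 0, g = 1,
    # integrated backward from T to 0 with explicit Euler steps.
    x = np.array(x_T, dtype=float)
    ts = np.linspace(T, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        drift = -0.5 * score(x, t_cur, m, v0)
        x = x + (t_next - t_cur) * drift  # negative time step
    return x

# For this toy the exact map is x_0 = m + (x_T - m) * sqrt(v0 / (v0 + T)).
x_T = np.array([1.0, 1.0 + np.sqrt(2.5)])
x_0 = pf_ode_sample(x_T, T=2.0, m=1.0, v0=0.5)
```

Because the flow is deterministic, x_T = m is a fixed point, and x_T = m + \sqrt{v_0 + T} is carried to approximately m + \sqrt{v_0}; the remaining discrepancy is Euler discretization error.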

At this level, the only quantity one needs is the score \nabla \log p_t.

2. One example where posterior sampling appears

In many applications, the real objective is not to sample from p_0, but from a tilted or conditioned version of it. A standard example is inference-time guidance. Given a reward function r_0, define p_0^r(x_0) \propto p_0(x_0)\exp(r_0(x_0)). If h_t(x_t) = \mathbb E\left[\exp(r_0(X_0)) \mid X_t = x_t\right], then the reverse-time dynamics of the tilted model are obtained by a Doob h-transform: dX_t = \left[b(X_t,t) - g(t)^2 \nabla \log p_t(X_t) - g(t)^2 \nabla \log h_t(X_t)\right]dt + g(t)\,d\bar W_t. I discuss this point of view in more detail in my other post on Doob’s h-transform.
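A minimal Euler-Maruyama sketch of these guided dynamics, on an assumed toy where everything is explicit: b = 0, g = 1, p_0 = \mathcal N(0,1) and a linear reward r_0(x) = \lambda x. Then p_t = \mathcal N(0, 1+t), \nabla \log h_t(x) = \lambda/(1+t), and the tilted target is p_0^r = \mathcal N(\lambda, 1). The sampler is started from the tilted marginal p_T^r \propto p_T h_T = \mathcal N(\lambda, 1+T); this matters here because the toy forward process does not forget its initial law. In realistic models \nabla \log h_t has no closed form.

```python
import numpy as np

def guided_reverse_sde(lam, T=1.0, n_paths=20_000, n_steps=1000, seed=0):
    # Toy assumptions: b = 0, g = 1, p_0 = N(0, 1), reward r_0(x) = lam * x,
    # so p_t = N(0, 1 + t) and grad log h_t(x) = lam / (1 + t) exactly.
    rng = np.random.default_rng(seed)
    # Start from the tilted marginal p_T h_T, which is N(lam, 1 + T) here.
    X = lam + np.sqrt(1.0 + T) * rng.normal(size=n_paths)
    h = T / n_steps
    t = T
    for _ in range(n_steps):
        score = -X / (1.0 + t)              # grad log p_t
        guidance = lam / (1.0 + t)          # grad log h_t
        drift = -(score + guidance)         # b - g^2 (score + guidance) with b = 0, g = 1
        # Euler-Maruyama step backward in time: t -> t - h
        X = X - h * drift + np.sqrt(h) * rng.normal(size=n_paths)
        t -= h
    return X

X0 = guided_reverse_sde(lam=1.5)  # samples should follow N(1.5, 1)
```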

So the new difficulty is the additional guidance term \nabla \log h_t(x_t). In the present linear-Gaussian setting, it can be written as a difference of posterior means: \nabla \log h_t(x_t) = \frac{\alpha_t}{\sigma_t^2} \left( \mathbb E_{p^r_{0 \mid t}(\cdot \mid x_t)}[X_0] - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right), where p^r_{0 \mid t}(x_0 \mid x_t) \propto p_{0 \mid t}(x_0 \mid x_t)\exp(r_0(x_0)).

This identity shows one motivation for posterior sampling. Even when the final objective is guidance, one is naturally led to the conditional laws p_{0 \mid t}(\cdot \mid x_t) and, more generally, p_{s \mid t}(\cdot \mid x_t).

At a conceptual level, this already solves the problem: once x_t is fixed, one can sample from the corresponding posterior by running conditional reverse-time dynamics. What is missing is an efficient implementation. If this has to be done by a long stochastic trajectory every time a new x_t is given, then posterior sampling remains expensive. The rest of the post is about ways of turning this basic observation into faster samplers, deterministic samplers, and finally amortized maps.

Computation of the guidance term

Indeed, h_t(x_t) = \int \exp(r_0(x_0))\,p_{0 \mid t}(x_0 \mid x_t)\,dx_0. Differentiating under the integral sign yields \begin{aligned} \nabla h_t(x_t) &= \int \exp(r_0(x_0))\,\nabla p_{0 \mid t}(x_0 \mid x_t)\,dx_0 \\ &= \int \exp(r_0(x_0))\,p_{0 \mid t}(x_0 \mid x_t)\,\nabla \log p_{0 \mid t}(x_0 \mid x_t)\,dx_0. \end{aligned} Moreover, \begin{aligned} \nabla \log p_{0 \mid t}(x_0 \mid x_t) &= \nabla \log q_{t \mid 0}(x_t \mid x_0) - \nabla \log p_t(x_t) \\ &= \left(\frac{\alpha_t}{\sigma_t^2}x_0 - \frac{1}{\sigma_t^2}x_t\right) - \left( \frac{\alpha_t}{\sigma_t^2}\mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] - \frac{1}{\sigma_t^2}x_t \right) \\ &= \frac{\alpha_t}{\sigma_t^2} \left( x_0 - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right). \end{aligned} Therefore, \begin{aligned} \nabla h_t(x_t) &= \frac{\alpha_t}{\sigma_t^2} \int \exp(r_0(x_0))\,p_{0 \mid t}(x_0 \mid x_t) \left( x_0 - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right)dx_0 \\ &= h_t(x_t)\frac{\alpha_t}{\sigma_t^2} \left( \mathbb E_{p^r_{0 \mid t}(\cdot \mid x_t)}[X_0] - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right). \end{aligned} Dividing by h_t(x_t) gives the formula above.
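The identity can be checked numerically. The derivation only differentiates the Gaussian likelihood in x_t, so the sketch below assumes a discrete p_0 on a few atoms, which makes every posterior expectation an exact finite sum. The kernel values \alpha_t = 0.7, \sigma_t = 0.8 and the reward are arbitrary choices for illustration. The sketch compares a central finite difference of \log h_t with the difference of posterior means.

```python
import numpy as np

# Toy assumption: p_0 is discrete, so posterior expectations are finite sums.
atoms = np.array([-2.0, -0.5, 1.0, 3.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])
alpha_t, sigma_t = 0.7, 0.8

def reward(x0):
    return np.sin(x0) + 0.3 * x0  # arbitrary smooth reward r_0

def posterior_weights(x_t, tilt):
    # p_{0|t}(x0 | x_t) ∝ p_0(x0) q_{t|0}(x_t | x0), optionally tilted by exp(r_0).
    log_w = np.log(probs) - (x_t - alpha_t * atoms) ** 2 / (2 * sigma_t**2)
    if tilt:
        log_w = log_w + reward(atoms)
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def log_h(x_t):
    # h_t(x_t) = E[exp(r_0(X_0)) | X_t = x_t]
    return np.log(np.sum(posterior_weights(x_t, tilt=False) * np.exp(reward(atoms))))

def grad_log_h_identity(x_t):
    # alpha_t / sigma_t^2 * (tilted posterior mean - posterior mean)
    mean_tilted = np.sum(posterior_weights(x_t, tilt=True) * atoms)
    mean_plain = np.sum(posterior_weights(x_t, tilt=False) * atoms)
    return alpha_t / sigma_t**2 * (mean_tilted - mean_plain)

x_t, eps = 0.4, 1e-5
finite_diff = (log_h(x_t + eps) - log_h(x_t - eps)) / (2 * eps)
```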

3. Three approaches to posterior sampling

The first approach is to train the posterior model directly. The second is to reuse the original model by embedding the posterior into an auxiliary diffusion. The third is to observe that the posterior dynamics are already implicit in the original reverse SDE, and to work directly with those bridge dynamics. The first is the most straightforward from a machine-learning perspective. The second is the most explicit reuse of the original denoiser. The third is the cleanest conceptually, and in some sense the true baseline behind all posterior-flow constructions.

3.1 A direct conditioning approach

The most immediate idea is to forget about analytic reductions and simply train the posterior model directly. If the target is the family p_{s \mid t}(\cdot \mid x_t), then one may train a diffusion model or a flow-matching model whose inputs include the conditioning variable x_t. This is the perspective taken in [3].

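For instance, specialized to the family p_{0 \mid t}(\cdot \mid x_t), one may sample X_0 \sim p_0 and X_t \sim q_{t \mid 0}(\cdot \mid X_0), form an auxiliary noisy copy \bar X_u = \bar \alpha_u X_0 + \bar \sigma_u Z with Z \sim \mathcal N(0,I) independent of X_t, and train a conditional denoiser \hat D_\theta(\bar x_u; u,t,x_t) \approx \mathbb E[X_0 \mid \bar X_u = \bar x_u, X_t = x_t] by minimizing a standard denoising objective such as \min_\theta \mathbb E\left[\|\hat D_\theta(\bar X_u; u,t,X_t) - X_0\|^2\right]. The auxiliary noise must be independent of X_t: if one instead fed the intermediate state X_s of the same forward trajectory, the Markov property of the forward process would give \mathbb E[X_0 \mid X_s, X_t] = \mathbb E[X_0 \mid X_s], and the model would simply relearn the unconditional denoiser. The same idea can be implemented in a flow-matching parameterization. The interpretation is clear: instead of deriving the posterior sampler from the unconditional model, we learn the posterior family directly. Once such a conditional denoiser or conditional vector field is available, one can use Tweedie's formula to recover the conditional score and integrate the corresponding probability-flow ODE to obtain a deterministic sampler.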
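A quick sanity check, on an assumed 1D Gaussian toy with X_0 \sim \mathcal N(0,1), of which noisy input a conditional denoiser should see. For zero-mean jointly Gaussian variables the least-squares linear fit is the conditional mean, so the fitted coefficient on X_t shows whether X_t carries extra information. When the noisy input is the intermediate state X_s of the same forward trajectory, the Markov property makes that coefficient vanish; with an auxiliary noisy copy of X_0 whose noise is independent of X_t, it does not. All coefficients below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Assumed forward coefficients, consistent with the Markov chain X_0 -> X_s -> X_t
a_s, s_s, a_t, s_t = 0.9, 0.5, 0.6, 0.9
a_ts = a_t / a_s
s_ts = np.sqrt(s_t**2 - a_ts**2 * s_s**2)

x0 = rng.normal(size=n)                      # X_0 ~ N(0, 1)
xs = a_s * x0 + s_s * rng.normal(size=n)     # X_s given X_0
xt = a_ts * xs + s_ts * rng.normal(size=n)   # X_t given X_s

# Auxiliary noisy copy of X_0 whose noise is independent of X_t (assumed coefficients)
xu = 0.8 * x0 + 0.6 * rng.normal(size=n)

def fit_linear_denoiser(noisy, cond, target):
    # Least-squares fit target ≈ c[0] * noisy + c[1] * cond. For zero-mean jointly
    # Gaussian variables this recovers E[target | noisy, cond] up to sampling error.
    features = np.stack([noisy, cond], axis=1)
    coeffs, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coeffs

coef_traj = fit_linear_denoiser(xs, xt, x0)  # noisy input from the same trajectory
coef_aux = fit_linear_denoiser(xu, xt, x0)   # independent auxiliary noise
```

Here coef_traj[1] comes out close to zero, while coef_aux[1] is close to its population value of about 0.23.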

Its drawback is also clear. It requires training a new conditional model for the whole posterior family. If one already has a strong unconditional denoiser, this may feel wasteful. The next two approaches are precisely ways of reusing the original model more directly.

3.2 Auxiliary posterior diffusion

Fix s < t and x_t. The idea is to regard \bar p_0^{s,t} = p_{s \mid t}(\cdot \mid x_t) as the initial law of a new diffusion, and then to run a standard reverse-time sampler for that auxiliary model. Concretely, one introduces a forward kernel \bar q_{r \mid 0}(\bar x_r \mid \bar x_0) = \mathcal N(\bar x_r; \bar \alpha_r \bar x_0, \bar \sigma_r^2 I) ending at a simple Gaussian law.

The key question is whether the auxiliary denoiser can be computed from the original denoiser. In the linear-Gaussian setting, the answer is yes: after Gaussian conjugation, the auxiliary denoiser is again an evaluation of the original denoiser, but at a modified effective observation.

More precisely, if D(x;\alpha,\sigma) := \mathbb E[X_0 \mid \alpha X_0 + \sigma Z = x], then there exist effective parameters (\hat x,\hat \alpha,\hat \sigma), obtained by Gaussian conjugation, such that \mathbb E_{\bar X_0 \sim \bar p_{0 \mid r}^{s,t}(\cdot \mid \bar x_r)}[\bar X_0] = \tilde \sigma_{s \mid t,0}^2 \left( \frac{\alpha_s}{\sigma_s^2}D(\hat x; \hat \alpha, \hat \sigma) + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right).

This is the main structural fact behind [1]. The auxiliary sampler does not need a new denoiser from scratch; it reuses the old one through an explicit Gaussian update.

Therefore:

the posterior problem is reduced to ordinary diffusion sampling on an auxiliary process;
the original denoiser can still be used;
deterministic sampling is available through the probability-flow ODE of the auxiliary diffusion.

But there is also a cost. The auxiliary diffusion is designed so that its terminal law is p_{s \mid t}(\cdot \mid x_t); its intermediate marginals are not the true posterior marginals p_{r \mid t}(\cdot \mid x_t). So one is solving the right endpoint problem by following the wrong path.

Derivation of the auxiliary denoiser

Define \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r) := \frac{ q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s) }{ \int q_{s \mid 0,t}(y \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid y)\,dy }, set \omega_r(x_0;x_t,\bar x_r) := \int q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s, and define \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r) := \frac{ \omega_r(x_0;x_t,\bar x_r)p_{0 \mid t}(x_0 \mid x_t) }{ \int \omega_r(y;x_t,\bar x_r)p_{0 \mid t}(y \mid x_t)\,dy }. By construction, q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s) = \omega_r(x_0;x_t,\bar x_r)\tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r).

Hence \begin{aligned} \mathbb E_{\bar X_0 \sim \bar p_{0 \mid r}^{s,t}(\cdot \mid \bar x_r)}[\bar X_0] &= \int x_s \bar p_{0 \mid r}^{s,t}(x_s \mid \bar x_r)\,dx_s \\ &= \frac{ \int x_s p_{s \mid t}(x_s \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s }{ \int p_{s \mid t}(x_s \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s } \\ &= \frac{ \int_{x_s,x_0} x_s q_{s \mid 0,t}(x_s \mid x_0,x_t)p_{0 \mid t}(x_0 \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s\,dx_0 }{ \int_{x_s,x_0} q_{s \mid 0,t}(x_s \mid x_0,x_t)p_{0 \mid t}(x_0 \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s\,dx_0 } \\ &= \int_{x_0} \left( \int_{x_s} x_s \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r)\,dx_s \right) \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r)\,dx_0. \end{aligned}

The bridge distribution q_{s \mid 0,t}(x_s \mid x_0,x_t) is Gaussian: q_{s \mid 0,t}(x_s \mid x_0,x_t) = \mathcal N\left( x_s; \gamma_{s \mid 0,t}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right), \gamma_{s \mid 0,t}^2 I \right), with \gamma_{s \mid 0,t}^{-2} = \sigma_s^{-2} + \alpha_{t \mid s}^2 \sigma_{t \mid s}^{-2}.

Conjugating with \bar q_{r \mid 0}(\bar x_r \mid x_s) gives \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r) = \mathcal N(x_s; \tilde \mu_{s \mid t,0}(x_t,x_0,\bar x_r), \tilde \sigma_{s \mid t,0}^2 I), where \tilde \sigma_{s \mid t,0}^{-2} = \sigma_s^{-2} + \alpha_{t \mid s}^2 \sigma_{t \mid s}^{-2} + \bar \alpha_r^2 \bar \sigma_r^{-2} and \tilde \mu_{s \mid t,0}(x_t,x_0,\bar x_r) = \tilde \sigma_{s \mid t,0}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right). Therefore \mathbb E_{x_s \sim \tilde q_{s \mid t,0}(\cdot \mid x_t,x_0,\bar x_r)}[x_s] = \tilde \sigma_{s \mid t,0}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right).

Similarly, \omega_r(x_0;x_t,\bar x_r) = \mathcal N\left( \bar x_r; \bar \alpha_r \gamma_{s \mid 0,t}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right), (\bar \sigma_r^2 + \bar \alpha_r^2 \gamma_{s \mid 0,t}^2)I \right). Writing V_{r,s,t} := \bar \sigma_r^2 + \bar \alpha_r^2 \gamma_{s \mid 0,t}^2, this implies \omega_r(x_0;x_t,\bar x_r) \propto \exp\left( -\frac12 \frac{\bar \alpha_r^2 \gamma_{s \mid 0,t}^4 \alpha_s^2}{\sigma_s^4 V_{r,s,t}} \|x_0\|^2 + \left\langle \frac{\bar \alpha_r \gamma_{s \mid 0,t}^2 \alpha_s}{\sigma_s^2 V_{r,s,t}} \left( \bar x_r - \bar \alpha_r \gamma_{s \mid 0,t}^2 \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right), x_0 \right\rangle \right).

Since p_{0 \mid t}(x_0 \mid x_t) \propto p_0(x_0) \exp\left( -\frac12 \frac{\alpha_t^2}{\sigma_t^2}\|x_0\|^2 + \left\langle \frac{\alpha_t}{\sigma_t^2}x_t, x_0 \right\rangle \right), it follows that \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r) \propto p_0(x_0) \exp\left( -\frac12 a_{r,s,t}\|x_0\|^2 + \langle b_{r,s,t}(x_t,\bar x_r), x_0 \rangle \right), where a_{r,s,t} := \frac{\alpha_t^2}{\sigma_t^2} + \frac{\bar \alpha_r^2 \gamma_{s \mid 0,t}^4 \alpha_s^2}{\sigma_s^4 V_{r,s,t}} and b_{r,s,t}(x_t,\bar x_r) := \frac{\alpha_t}{\sigma_t^2}x_t + \frac{\bar \alpha_r \gamma_{s \mid 0,t}^2 \alpha_s}{\sigma_s^2 V_{r,s,t}} \left( \bar x_r - \bar \alpha_r \gamma_{s \mid 0,t}^2 \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right).

Therefore \tilde p_{0 \mid t,r} is itself the posterior distribution associated with a single artificial Gaussian observation. If (\hat \alpha, \hat \sigma, \hat x) satisfy \hat \alpha^2 / \hat \sigma^2 = a_{r,s,t} and (\hat \alpha / \hat \sigma^2)\hat x = b_{r,s,t}(x_t,\bar x_r), then \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r) = p_0(x_0 \mid \hat \alpha X_0 + \hat \sigma Z = \hat x), so \mathbb E_{x_0 \sim \tilde p_{0 \mid t,r}(\cdot \mid x_t,\bar x_r)}[x_0] = D(\hat x; \hat \alpha, \hat \sigma), which yields the final formula.
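The final formula can be checked against brute-force Gaussian conditioning. The sketch assumes 1D Gaussian data p_0 = \mathcal N(m, v_0), for which the original denoiser D has a closed form, and illustrative coefficients chosen so that \alpha_t = \alpha_{t \mid s}\alpha_s and \sigma_t^2 = \alpha_{t \mid s}^2\sigma_s^2 + \sigma_{t \mid s}^2. It fixes \hat \sigma = 1, which is allowed because only \hat \alpha^2/\hat \sigma^2 and (\hat \alpha/\hat \sigma^2)\hat x are constrained.

```python
import numpy as np

# Assumed toy: p_0 = N(m, v0), so D(x; alpha, sigma) = E[X_0 | alpha X_0 + sigma Z = x] is explicit.
m, v0 = 0.5, 2.0

def D(x, alpha, sigma):
    prec = 1.0 / v0 + alpha**2 / sigma**2
    return (m / v0 + alpha * x / sigma**2) / prec

# Illustrative forward coefficients consistent with the chain 0 -> s -> t
a_s, s_s, a_t, s_t = 0.9, 0.5, 0.6, 0.9
a_ts = a_t / a_s
v_ts = s_t**2 - a_ts**2 * s_s**2     # sigma_{t|s}^2
a_r, s_r = 0.7, 0.8                  # auxiliary kernel: bar alpha_r, bar sigma_r

def auxiliary_denoiser(x_t, xbar_r):
    gamma2 = 1.0 / (1.0 / s_s**2 + a_ts**2 / v_ts)          # gamma_{s|0,t}^2
    tilde_var = 1.0 / (1.0 / gamma2 + a_r**2 / s_r**2)      # tilde sigma_{s|t,0}^2
    V = s_r**2 + a_r**2 * gamma2
    k = a_r * gamma2 * a_s / (s_s**2 * V)
    # Coefficients a_{r,s,t} and b_{r,s,t} of the reweighted posterior over x_0
    a_coef = a_t**2 / s_t**2 + k * a_r * gamma2 * a_s / s_s**2
    b_coef = a_t / s_t**2 * x_t + k * (xbar_r - a_r * gamma2 * a_ts / v_ts * x_t)
    # Effective observation with hat_sigma = 1: hat_alpha^2 = a, hat_alpha * hat_x = b
    hat_alpha = np.sqrt(a_coef)
    hat_x = b_coef / hat_alpha
    d = D(hat_x, hat_alpha, 1.0)
    return tilde_var * (a_s / s_s**2 * d + a_ts / v_ts * x_t + a_r / s_r**2 * xbar_r)

def brute_force(x_t, xbar_r):
    # p_{s|t}(. | x_t) = N(mu, c) by direct Gaussian conditioning
    var_s = a_s**2 * v0 + s_s**2
    cov_st = a_ts * var_s
    var_t = a_ts**2 * var_s + v_ts
    mu = a_s * m + cov_st / var_t * (x_t - a_t * m)
    c = var_s - cov_st**2 / var_t
    # Conjugate with the auxiliary observation xbar_r = a_r x_s + s_r z
    return (mu / c + a_r * xbar_r / s_r**2) / (1.0 / c + a_r**2 / s_r**2)
```

The data distribution enters auxiliary_denoiser only through D, which is the point of the construction: with a learned denoiser in place of the closed form, the same function would give the auxiliary denoiser without retraining.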

GLASS transitions

The GLASS transitions of [1] do not change the logic of the construction. They amount to replacing the standard bridge q_{s \mid 0,t} by another Gaussian bridge, which is best viewed as a reparameterization of the DDIM family rather than as a genuinely new conditional law.

So the conclusion is unchanged: once the bridge remains Gaussian in x_s, the same conjugation argument applies, and the auxiliary denoiser is again recovered from the original denoiser after modifying the effective coefficients.

3.3 Exact bridge dynamics

The third approach is more direct, and in some sense conceptually prior to the others. Instead of building a new diffusion whose endpoint is the desired posterior, one asks for a process whose marginals are exactly p_{r \mid t}(\cdot \mid x_t) \qquad \text{for all } r < t.

This is the perspective taken in [4]. Start from the reverse SDE of the original diffusion, but initialize it at time t from the point mass \bar p_t = \delta_x. Then, for every r < t, the marginal is precisely p_{r \mid t}(\cdot \mid x_t = x).

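From there, the probability-flow ODE is immediate: dX_r = \left[ b(X_r,r) - \frac12 g(r)^2 \nabla \log p_r(X_r) + \frac12 g(r)^2 \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid X_r) \right]dr. The two terms enter with opposite signs: the unconditional score keeps the usual probability-flow factor, while the bridge term, integrated backward from near t, pushes trajectories away from x and spreads the point mass into the posterior.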

This is appealing for two reasons. First, the path itself is now the correct posterior path, not just the terminal law. Second, the correction is explicit: it is the usual score plus a Gaussian bridge term.

Most importantly, this shows that posterior sampling is already available without any new training: one can simply restart the reverse SDE at x_t. The real issue is efficiency. This remains a stochastic many-step sampler, and if one wants fast inference then one would prefer to distill it into a deterministic ODE or a learned posterior flow map.
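The no-training baseline is easy to write down. The sketch below restarts an Euler-Maruyama discretization of the reverse SDE at x_t = x on an assumed toy (b = 0, g = 1, p_0 = \mathcal N(m, v_0)), where the exact posterior p_{r \mid t}(\cdot \mid x) is Gaussian with mean m + \frac{v_0 + r}{v_0 + t}(x - m) and variance \frac{(v_0 + r)(t - r)}{v_0 + t}. Unlike the ODE, the SDE can start exactly at the point mass, since its drift only involves the unconditional score. The function name is illustrative.

```python
import numpy as np

def reverse_sde_posterior(x, t, r0, m, v0, n_paths=20_000, n_steps=500, seed=0):
    # Toy assumption: b = 0, g = 1, p_0 = N(m, v0), so p_r = N(m, v0 + r) has an exact score.
    rng = np.random.default_rng(seed)
    X = np.full(n_paths, float(x))        # every path starts at the point mass delta_x at time t
    h = (t - r0) / n_steps
    r = t
    for _ in range(n_steps):
        score = -(X - m) / (v0 + r)
        drift = -score                    # b - g^2 * score with b = 0, g = 1
        # Euler-Maruyama step backward in time: r -> r - h
        X = X - h * drift + np.sqrt(h) * rng.normal(size=n_paths)
        r -= h
    return X

# Exact posterior at r0 = 0.5: mean 0.8, variance 1/3
X = reverse_sde_posterior(x=1.2, t=1.0, r0=0.5, m=0.0, v0=0.5)
```

Each posterior query costs n_steps score evaluations per path, which is the cost that posterior flows aim to remove.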

The caveat is that the conditioning term becomes singular at r=t, because the initial law is a Dirac mass. In practice, one starts from t-\varepsilon rather than exactly from t.

Derivation of the bridge ODE

If the reverse SDE dX_r = \left[b(X_r,r) - g(r)^2 \nabla \log p_r(X_r)\right]dr + g(r)\,d\bar W_r is solved backward for r < t and started from \bar p_t = \delta_x, then \bar p_r(x_r) = \int p_{r \mid t}(x_r \mid x_t)\bar p_t(x_t)\,dx_t = p_{r \mid t}(x_r \mid x_t = x).

By Bayes’ rule, p_{r \mid t}(x_r \mid x_t) = \frac{p_r(x_r)q_{t \mid r}(x_t \mid x_r)}{p_t(x_t)}, hence \nabla \log \bar p_r(x_r) = \nabla \log p_{r \mid t}(x_r \mid x_t = x) = \nabla \log p_r(x_r) + \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid x_r).

Since q_{t \mid r}(x_t \mid x_r) = \mathcal N(x_t; \alpha_{t \mid r}x_r, \sigma_{t \mid r}^2 I), \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid x_r) = \frac{\alpha_{t \mid r}}{\sigma_{t \mid r}^2}(x - \alpha_{t \mid r}x_r).

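The reverse SDE keeps the unconditional drift b - g^2 \nabla \log p_r, while its marginals are \bar p_r. A deterministic ODE with the same marginals is obtained by adding \frac12 g(r)^2 \nabla \log \bar p_r to this drift, just as b - g^2 \nabla \log p_r + \frac12 g^2 \nabla \log p_r gives back the probability-flow ODE of Section 1. Substituting the conditional score above yields the bridge probability-flow ODE dX_r = \left[ b(X_r,r) - \frac12 g(r)^2 \nabla \log p_r(X_r) + \frac12 g(r)^2 \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid X_r) \right]dr.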
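A minimal check of the bridge ODE on an assumed toy (b = 0, g = 1, p_0 = \mathcal N(m, v_0), so \alpha_{t \mid r} = 1 and \sigma_{t \mid r}^2 = t - r). The drift used is b - \frac12 g^2 \nabla \log p_r + \frac12 g^2 \nabla_{x_r}\log q_{t \mid r}(x \mid x_r), started at t - \varepsilon to avoid the singularity. Since the exact bridge marginals are Gaussian, the flow should carry the posterior mean, and the point one standard deviation above it, at time t - \varepsilon to the corresponding points at the final time. The `sign` argument is a hypothetical switch that flips the bridge term for comparison.

```python
import numpy as np

def bridge_ode(x_start, x, t, r_start, r_end, m, v0, n_steps=20_000, sign=1.0):
    # Toy: b = 0, g = 1, p_0 = N(m, v0), so p_r = N(m, v0 + r),
    # alpha_{t|r} = 1 and sigma_{t|r}^2 = t - r.
    X = np.array(x_start, dtype=float)
    rs = np.linspace(r_start, r_end, n_steps + 1)
    for r_cur, r_next in zip(rs[:-1], rs[1:]):
        score = -(X - m) / (v0 + r_cur)        # unconditional score grad log p_r
        bridge = (x - X) / (t - r_cur)         # grad_{x_r} log q_{t|r}(x | x_r)
        drift = -0.5 * score + sign * 0.5 * bridge
        X = X + (r_next - r_cur) * drift       # explicit Euler step, r decreasing
    return X

def exact_bridge(r, x, t, m, v0):
    # Mean and standard deviation of p_{r|t}(. | x_t = x) in the toy model
    mean = m + (v0 + r) / (v0 + t) * (x - m)
    std = np.sqrt((v0 + r) * (t - r) / (v0 + t))
    return mean, std

m, v0, t, x, eps, r_end = 0.0, 0.5, 1.0, 1.2, 1e-2, 0.5
mu0, sd0 = exact_bridge(t - eps, x, t, m, v0)
mu1, sd1 = exact_bridge(r_end, x, t, m, v0)
start = np.array([mu0, mu0 + sd0])
out = bridge_ode(start, x, t, t - eps, r_end, m, v0)
out_wrong = bridge_ode(start, x, t, t - eps, r_end, m, v0, sign=-1.0)
```

With sign=-1.0 the bridge term pulls trajectories toward x as r decreases, so the spread shrinks instead of growing to the posterior standard deviation.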

4. Posterior flow maps

The three previous approaches still require integrating a costly SDE or ODE for every new posterior query. Posterior flow maps amortize this cost. One learns a map that takes (s,t,x_t), together with auxiliary randomness if needed, and directly outputs an approximation of a sample from p_{s \mid t}(\cdot \mid x_t).

This is the viewpoint adopted in [2, 3]. Since the target law depends on (s,t,x_t), the map itself must depend on these variables as inputs. This is the reason for the “meta” viewpoint emphasized in [3].

4.1 Meta Flow Maps, a direct approach

The most direct version of this idea is to learn the posterior flow map itself. This is the flow-map analogue of the direct conditional approach of Section 3.1. Instead of learning a conditional denoiser and then integrating a conditional SDE or ODE, one learns a map that directly transports a simple conditional reference variable to an approximation of p_{s \mid t}(\cdot \mid x_t).

This is the point of view emphasized in [3]. The conditioning variable x_t is part of the input, and the learned map is meant to approximate the whole posterior family at once.

Concretely, specialized to the posterior family p_{0 \mid t}(\cdot \mid x_t), one parameterizes the map as \hat X_{s,u}(\bar x; t,x_t) = \bar x + (u-s)\hat v_{s,u}(\bar x; t,x_t), where s,u \in [0,1] are now interpolation times rather than diffusion times. Training pairs are generated by sampling X_0 \sim p_0 and then X_t from the forward process given X_0, so that conditionally on X_t = x_t one has X_0 \sim p_{0 \mid t}(\cdot \mid x_t). One then draws an independent Gaussian base variable Z \sim \mathcal N(0,I) and defines \bar X_s = \alpha_s Z + \beta_s X_0, where (\alpha_s, \beta_s) are interpolation coefficients with \alpha_0 = \beta_1 = 1 and \alpha_1 = \beta_0 = 0 (not the forward-process coefficients of Section 1). The model is trained with \mathcal L_{\mathrm{diag}}(\hat v) := \int_0^1 \mathbb E \left[ \left\| \hat v_{s,s}(\bar X_s; t,X_t) - \left(\dot \alpha_s Z + \dot \beta_s X_0\right) \right\|^2 \right] \,ds, together with a standard consistency or self-distillation loss on the off-diagonal map \hat X_{s,u}.

The point of this construction is that, for each fixed x_t, the conditional law of \bar X_s given X_t = x_t interpolates from the tractable Gaussian base law at s=0 to the target posterior p_{0 \mid t}(\cdot \mid x_t) at s=1. The Gaussian variable Z is the source of randomness that lets the map represent the full posterior rather than a single point estimate. The diagonal loss therefore learns the conditional drift of exactly the posterior-targeting path one wants, and the consistency term turns it into a few-step conditional flow map.
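A minimal sketch of the diagonal loss on an assumed 1D toy: p_0 = \mathcal N(0,1), X_t = 0.8 X_0 + 0.6\varepsilon, and the linear interpolant \alpha_s = 1 - s, \beta_s = s (the schedule used in [3] may differ). Instead of a trained network, it plugs in the exact conditional drift \mathbb E[\dot \alpha_s Z + \dot \beta_s X_0 \mid \bar X_s, X_t], which is available in closed form because p_{0 \mid t}(\cdot \mid x_t) is Gaussian here, and compares it with the best drift that ignores x_t. The helper name exact_drift is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a_fwd, s_fwd = 0.8, 0.6                   # assumed forward coefficients: X_t = 0.8 X_0 + 0.6 eps

x0 = rng.normal(size=n)                   # X_0 ~ p_0 = N(0, 1)
xt = a_fwd * x0 + s_fwd * rng.normal(size=n)
z = rng.normal(size=n)                    # independent Gaussian base variable Z
s = rng.uniform(size=n)                   # interpolation time s ~ U[0, 1]

# Assumed linear interpolant: alpha_s = 1 - s, beta_s = s
xbar = (1.0 - s) * z + s * x0
target = -z + x0                          # dot(alpha)_s * Z + dot(beta)_s * X_0

def exact_drift(xbar, s, mu, c):
    # E[X_0 - Z | Xbar_s = xbar] when X_0 ~ N(mu, c) independently of Z ~ N(0, 1)
    var = (1.0 - s) ** 2 + s**2 * c
    cov = s * c - (1.0 - s)
    return mu + cov / var * (xbar - s * mu)

# Posterior p_{0|t}(. | x_t) = N(mu, c) for standard Gaussian data
mu_post = a_fwd * xt / (a_fwd**2 + s_fwd**2)
c_post = s_fwd**2 / (a_fwd**2 + s_fwd**2)

loss_conditional = np.mean((exact_drift(xbar, s, mu_post, c_post) - target) ** 2)
loss_unconditional = np.mean((exact_drift(xbar, s, 0.0, 1.0) - target) ** 2)
```

The gap between the two losses is the information carried by the conditioning input x_t; a map without that input cannot close it, which is why the map must take x_t as an argument.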


However, as in Section 3.1, this approach does not leverage the original unconditional model, and the new model must take the conditioning variables (t, x_t) as additional inputs.

4.2 Diamond Maps, distilling posterior dynamics

There is another route, closer to Sections 3.2 and 3.3. Instead of learning the posterior family from scratch, one can first write down a posterior sampler, for instance the exact bridge dynamics or an auxiliary posterior diffusion, and then distill that sampler into a conditional flow map.

This is conceptually attractive because it separates two questions:

what the correct posterior dynamics are;
how they can be compressed into a fast deterministic sampler.

4.3 Value-function viewpoint

[2] makes a second observation, which is more subtle and perhaps more interesting: a standard flow map can also be used for value-function estimation.

Let h_t(x_t) = \mathbb E\left[e^{r_0(X_0)} \mid X_t = x_t\right], and suppose we have a deterministic flow map \hat X_0^{t'} from time t' to time 0, with t' > t. If the flow map is exact, then it pushes p_{t'} to p_0, and after renoising from x_t to x_{t'} one obtains a stochastic estimator of \log h_t(x_t).

The high-level recipe is the following:

renoise x_t to a later time t';
map the renoised sample back to time 0 through the learned flow map;
reweight the resulting samples by an importance factor.

This is the idea of weighted Diamond Maps in [2]. The same posterior-flow machinery that gives a fast sampler can also be used to estimate the value function needed for guidance.

Derivation of the reparameterized gradient formula

If the flow map is exact, then \begin{aligned} h_t(x_t) &= \frac{1}{p_t(x_t)} \int e^{r_0(x_0)}q_{t \mid 0}(x_t \mid x_0)p_0(x_0)\,dx_0 \\ &= \frac{1}{p_t(x_t)} \int e^{r_0(\hat X_0^{t'}(x_{t'}))} q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'})) p_{t'}(x_{t'})\,dx_{t'}. \end{aligned} Sampling x_{t'} from q_{t' \mid t}(\cdot \mid x_t) gives \begin{aligned} h_t(x_t) &= \frac{1}{p_t(x_t)} \int e^{r_0(\hat X_0^{t'}(x_{t'}))} q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'})) \frac{p_{t'}(x_{t'})}{q_{t' \mid t}(x_{t'} \mid x_t)} q_{t' \mid t}(x_{t'} \mid x_t)\,dx_{t'} \\ &= \frac{1}{p_t(x_t)} \mathbb E_{x_{t'} \sim q_{t' \mid t}(\cdot \mid x_t)} \left[ \exp\left(v_{t,t'}(x_{t'},x_t)\right) \right], \end{aligned} where v_{t,t'}(x_{t'},x_t) := r_0(\hat X_0^{t'}(x_{t'})) + \log q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'})) + \log p_{t'}(x_{t'}) - \log q_{t' \mid t}(x_{t'} \mid x_t).

Equivalently, \log h_t(x_t) = \log \mathbb E_{x_{t'} \sim q_{t' \mid t}(\cdot \mid x_t)} \left[ \exp\left(v_{t,t'}(x_{t'},x_t)\right) \right] - \log p_t(x_t).

Introduce the reparameterization x_{t'}(x_t,z) = \alpha_{t' \mid t}x_t + \sigma_{t' \mid t}z, with z \sim \mathcal N(0,I), and define v_{t,t'}(z,x_t) := r_0(\hat X_0^{t'}(x_{t'}(x_t,z))) + \log q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'}(x_t,z))) + \log p_{t'}(x_{t'}(x_t,z)) - \log q_{t' \mid t}(x_{t'}(x_t,z) \mid x_t). Then \log h_t(x_t) = \log \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right) \right] - \log p_t(x_t), so \nabla \log h_t(x_t) = \frac{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right)\nabla_{x_t}v_{t,t'}(z,x_t) \right] }{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right) \right] } - \nabla \log p_t(x_t).

Write x_{t'} = x_{t'}(x_t,z), \hat x_0 = \hat X_0^{t'}(x_{t'}), and J = J\hat X_0^{t'}(x_{t'}). Since \begin{aligned} \log q_{t' \mid t}(x_{t'} \mid x_t) &= C_{t,t'} - \frac{1}{2\sigma_{t' \mid t}^2}\|x_{t'} - \alpha_{t' \mid t}x_t\|^2, \\ \log q_{t' \mid t}(x_{t'}(x_t,z) \mid x_t) &= C_{t,t'} - \frac{1}{2\sigma_{t' \mid t}^2}\|\sigma_{t' \mid t}z\|^2 = C_{t,t'} - \frac12\|z\|^2, \end{aligned} one has \nabla_{x_t}\log q_{t' \mid t}(x_{t'}(x_t,z) \mid x_t) = 0. Therefore \begin{aligned} \nabla_{x_t}v_{t,t'}(z,x_t) &= \alpha_{t' \mid t}J^\top \nabla r_0(\hat x_0) + \nabla_{x_t}\log q_{t \mid 0}(x_t \mid \hat x_0) + \alpha_{t' \mid t}\nabla \log p_{t'}(x_{t'}) \\ &= \frac{\alpha_t}{\sigma_t^2}\hat x_0 - \frac{1}{\sigma_t^2}x_t + \alpha_{t' \mid t}\nabla \log p_{t'}(x_{t'}) \\ &\quad + \alpha_{t' \mid t}J^\top \left( \nabla r_0(\hat x_0) + \frac{\alpha_t}{\sigma_t^2}(x_t - \alpha_t \hat x_0) \right). \end{aligned} This is the form used in Appendix A.9 of [2].

How the importance weights are computed

The exact quantity v_{t,t'}(z,x_t) is intractable because it contains \log p_{t'}(x_{t'}). Following Appendix A.9 of [2], one only identifies it up to an additive term independent of the Monte Carlo sample. Define \gamma_{t,t'}(x_t,x_{t'}) := \left[ \int_0^1 \nabla \log p_{t'}(x_t + u(x_{t'}-x_t))\,du \right]^\top (x_{t'}-x_t), so that \log p_{t'}(x_{t'}) = \log p_{t'}(x_t) + \gamma_{t,t'}(x_t,x_{t'}). Then \begin{aligned} v_{t,t'}(z,x_t) &= \bar v_{t,t'}(z,x_t) + \log p_{t'}(x_t) + C, \\ \bar v_{t,t'}(z,x_t) &:= r_0(\hat x_0) - \frac{1}{2\sigma_t^2}\|x_t - \alpha_t \hat x_0\|^2 + \gamma_{t,t'}(x_t,x_{t'}) + \frac12\|z\|^2, \end{aligned} where C does not depend on z.

Therefore the normalized weights only depend on \bar v_{t,t'}: \frac{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right)A(z,x_t) \right] }{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right) \right] } = \frac{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(\bar v_{t,t'}(z,x_t)\right)A(z,x_t) \right] }{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(\bar v_{t,t'}(z,x_t)\right) \right] }.

Given i.i.d. samples z_1,\dots,z_N \sim \mathcal N(0,I), one uses \nabla \widehat{\log h_t}(x_t) = \sum_{i=1}^N w_i \nabla_{x_t}v_{t,t'}(z_i,x_t) - \nabla \log p_t(x_t), \qquad w_i = \frac{\exp(\bar v_{t,t'}(z_i,x_t))}{\sum_{j=1}^N \exp(\bar v_{t,t'}(z_j,x_t))}.
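A minimal sketch of this self-normalized estimator on an assumed toy where every quantity is exact: b = 0, g = 1 (so \alpha_t = 1, \sigma_t^2 = t, \alpha_{t' \mid t} = 1, \sigma_{t' \mid t}^2 = t' - t), p_0 = \mathcal N(0,1), and a linear reward r_0(x) = \lambda x. The exact flow map is \hat X_0^{t'}(y) = y/\sqrt{1+t'}, and the true answer is \nabla \log h_t(x_t) = \lambda/(1+t) for every x_t. The score integral in \gamma_{t,t'} uses a midpoint rule, which is exact here because the score is linear. The choice t' = 2 with t = 0.5 makes the proposal q_{t' \mid t} wider than the target, which keeps the weights well behaved. In a real model, the flow map would be learned, J^\top(\cdot) would be a vector-Jacobian product, and the scores would come from the denoiser.

```python
import numpy as np

# Toy assumptions: b = 0, g = 1, p_0 = N(0, 1), reward r_0(x) = lam * x.
lam, t, t_prime = 1.0, 0.5, 2.0

def score_p(x, tau):
    return -x / (1.0 + tau)                  # grad log p_tau with p_tau = N(0, 1 + tau)

def flow_map(x_tp):
    # Exact probability-flow map from t' to 0 for this toy, and its (scalar) Jacobian
    J = 1.0 / np.sqrt(1.0 + t_prime)
    return J * x_tp, J

def grad_log_h_estimate(x_t, n=100_000, seed=0, n_quad=16):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x_tp = x_t + np.sqrt(t_prime - t) * z    # renoise: x_{t'} = alpha_{t'|t} x_t + sigma_{t'|t} z
    x0_hat, J = flow_map(x_tp)
    # gamma_{t,t'}: midpoint rule for the score integral along the segment [x_t, x_{t'}]
    u = (np.arange(n_quad) + 0.5) / n_quad
    path = x_t + u[:, None] * (x_tp - x_t)[None, :]
    gamma = score_p(path, t_prime).mean(axis=0) * (x_tp - x_t)
    # Log-weights bar v (up to a z-independent constant), normalized stably
    log_w = lam * x0_hat - (x_t - x0_hat) ** 2 / (2 * t) + gamma + 0.5 * z**2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Pathwise gradient grad_{x_t} v from the reparameterized formula
    grad_v = (x0_hat / t - x_t / t + score_p(x_tp, t_prime)
              + J * (lam + (x_t - x0_hat) / t))
    return np.sum(w * grad_v) - score_p(x_t, t)

est = grad_log_h_estimate(0.3)               # should be close to lam / (1 + t) = 2/3
```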

For t' close to t, one can also use \log p_{t'}(x_{t'}) = \log p_t(x_t) + \log q_{t' \mid t}(x_{t'} \mid x_t) - \log p_{t \mid t'}(x_t \mid x_{t'}), and approximate the reverse kernel p_{t \mid t'}(x_t \mid x_{t'}) by a Gaussian Euler step of the reverse SDE. This gives a local approximation of the weights in terms of the score or denoiser. The main derivation in [2], however, uses the score-integral correction \gamma_{t,t'}.

5. Takeaways

The conceptual picture is then the following.

Guidance naturally turns unconditional diffusion sampling into a posterior-sampling problem.
In principle, posterior sampling is already easy: once x_t is given, one can simply restart the reverse SDE from x_t and sample the corresponding bridge.
The real issue is that this baseline is long, stochastic, and therefore costly.
There are three main strategies: train the posterior model directly, build an auxiliary diffusion whose endpoint is the posterior, or directly follow the exact bridge dynamics of the original diffusion.
Auxiliary posterior diffusions are modular and can reuse the original denoiser, but their intermediate path is artificial.
Exact bridge dynamics have the right marginals at every time, but they are singular at the conditioning time and therefore slightly less convenient numerically.
Posterior flows matter because they turn this expensive conditional SDE sampling problem into a deterministic or few-step sampler, and ultimately into an amortized conditional flow map.
In [2], the same amortized machinery also turns value estimation into a weighted Monte Carlo problem.

References

[1] Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky T. Q. Chen, Yaron Lipman, and Brian Karrer. GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models. arXiv:2509.25170, 2025.

[2] Peter Holderrieth, Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong He, Zeynep Akata, Tommi Jaakkola, Nicholas Matthew Boffi, and Max Simchowitz. Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps. arXiv:2602.05993, 2026.

[3] Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S. Albergo, and Yee Whye Teh. Meta Flow Maps enable scalable reward alignment. arXiv:2601.14430, 2026.

[4] Romeo Passaro, Zander W. Blasingame, Michael M. Bronstein, and Alexander Tong. Stochastic Few-step Models. ICLR 2026 DeLTa Workshop, 2026.