Posterior Flows for Diffusion Models
The goal of this note is to clarify a few recent approaches for posterior sampling in diffusion and flow models, in particular GLASS Flows [1], Diamond Maps [2], Meta Flow Maps [3], and Stochastic Few-Step Models [4]. All of them are concerned, in one form or another, with sampling from conditional laws of the form p_{s \mid t}(x_s \mid x_t) induced by a diffusion process.
The exposition follows interesting discussions with Yazid Janati and Badr Moufad, which greatly helped clarify these constructions.
1. Setup and objective
Consider a forward diffusion process (X_t)_{0 \le t \le T} defined by the SDE dX_t = b(X_t,t)\,dt + g(t)\,dW_t.
Assume that, for every 0 \le s < t \le T, the transition kernel is Gaussian: q_{t \mid s}(x_t \mid x_s) = \mathcal N(x_t; \alpha_{t \mid s} x_s, \sigma_{t \mid s}^2 I). In particular, q_{t \mid 0}(x_t \mid x_0) = \mathcal N(x_t; \alpha_t x_0, \sigma_t^2 I). We write p_t for the marginal law of X_t, p_t(x_t) = \int q_{t \mid 0}(x_t \mid x_0)p_0(x_0)\,dx_0.
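As a quick sanity check on the notation, here is a minimal sketch of how the two-time coefficients relate to the one-time ones. The variance-preserving schedule \alpha_t = e^{-t/2}, \sigma_t^2 = 1 - e^{-t} and the helper names `alpha`, `sigma2`, `transition` are assumptions made for illustration; the relations \alpha_{t \mid s} = \alpha_t / \alpha_s and \sigma_{t \mid s}^2 = \sigma_t^2 - \alpha_{t \mid s}^2 \sigma_s^2 are what composing q_{s \mid 0} with q_{t \mid s} requires.

```python
import numpy as np

def alpha(t):
    # Assumed VP schedule (illustrative choice): alpha_t = exp(-t / 2).
    return np.exp(-0.5 * t)

def sigma2(t):
    # Matching variance for the assumed schedule: sigma_t^2 = 1 - exp(-t).
    return 1.0 - np.exp(-t)

def transition(s, t):
    """Return (alpha_{t|s}, sigma_{t|s}^2) for 0 <= s < t."""
    a_ts = alpha(t) / alpha(s)
    v_ts = sigma2(t) - a_ts**2 * sigma2(s)
    return a_ts, v_ts

s, t = 0.3, 1.2
a_ts, v_ts = transition(s, t)

# Composing q_{s|0} with q_{t|s} should reproduce q_{t|0}.
composed_alpha = a_ts * alpha(s)
composed_var = a_ts**2 * sigma2(s) + v_ts
print(composed_alpha, alpha(t))
print(composed_var, sigma2(t))
```

For this particular schedule the transition variance simplifies to 1 - e^{-(t-s)}, which is a convenient way to check an implementation.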
We assume that p_T is easy to sample from, for instance a centered Gaussian distribution. The usual objective of a diffusion model is then to sample from p_0 starting from p_T.
The reverse-time SDE associated with the forward process is dX_t = \left[b(X_t,t) - g(t)^2 \nabla \log p_t(X_t)\right]dt + g(t)\,d\bar W_t, where the equation is solved backward from T to 0. The associated probability-flow ODE is dX_t = \left[b(X_t,t) - \frac12 g(t)^2 \nabla \log p_t(X_t)\right]dt.
Therefore, once \nabla \log p_t is available, either exactly or through a score model, one can approximately sample from p_0 by discretizing the reverse SDE or the probability-flow ODE.
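To make the discretization concrete, here is a minimal sketch of an Euler scheme for the probability-flow ODE in a toy case where the score is known exactly. The setting is an assumption chosen for illustration: b = 0, g = 1 (so \alpha_t = 1, \sigma_t^2 = t) and p_0 = \mathcal N(m_0, s_0^2), which gives p_t = \mathcal N(m_0, s_0^2 + t). With a learned score model, `score` would be replaced by the network.

```python
import numpy as np

m0, s0, T = 1.5, 0.5, 4.0  # assumed toy values

def score(x, t):
    # Exact score of p_t = N(m0, s0^2 + t) when dX = dW is started from N(m0, s0^2).
    return -(x - m0) / (s0**2 + t)

def pf_ode_sample(x_T, n_steps=2000):
    """Euler integration of dx/dt = -0.5 * score(x, t) from t = T down to t = 0."""
    ts = np.linspace(T, 0.0, n_steps + 1)
    x = np.asarray(x_T, dtype=float).copy()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # (t_next - t_cur) is negative: the ODE is integrated backward in time.
        x = x + (t_next - t_cur) * (-0.5 * score(x, t_cur))
    return x

x_T = np.array([-2.0, 0.0, 3.0])  # in practice these would be draws from p_T
x_0 = pf_ode_sample(x_T)

# For Gaussian marginals the flow is an affine rescaling around m0.
x_0_exact = m0 + (x_T - m0) * s0 / np.sqrt(s0**2 + T)
print(x_0, x_0_exact)
```

In this toy case the flow simply rescales the deviation from m_0 by the ratio of standard deviations, which gives a closed form to compare against.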
2. Motivation for posterior sampling
In many situations, the objective is not to sample from p_0 itself but from a posterior or a tilted version of p_0. A standard example is inference-time guidance. Given a reward function r_0, consider the tilted law p_0^r(x_0) \propto p_0(x_0)\exp(r_0(x_0)). If we define h_t(x_t) = \mathbb E\left[\exp(r_0(X_0)) \mid X_t = x_t\right], then the reverse-time dynamics of the tilted model are obtained by a Doob h-transform: dX_t = \left[b(X_t,t) - g(t)^2 \nabla \log p_t(X_t) - g(t)^2 \nabla \log h_t(X_t)\right]dt + g(t)\,d\bar W_t. I discuss this point of view in more detail in my other post on Doob’s h-transform.
Hence the additional quantity that must be estimated is \nabla \log h_t(x_t). In the present linear-Gaussian setting, this term can be written explicitly in terms of posterior means. Indeed, h_t(x_t) = \int \exp(r_0(x_0))\,p_{0 \mid t}(x_0 \mid x_t)\,dx_0. Differentiating under the integral sign yields \begin{aligned} \nabla h_t(x_t) &= \int \exp(r_0(x_0))\,\nabla p_{0 \mid t}(x_0 \mid x_t)\,dx_0 \\ &= \int \exp(r_0(x_0))\,p_{0 \mid t}(x_0 \mid x_t)\,\nabla \log p_{0 \mid t}(x_0 \mid x_t)\,dx_0. \end{aligned} and \begin{aligned} \nabla \log p_{0 \mid t}(x_0 \mid x_t) &= \nabla \log q_{t \mid 0}(x_t \mid x_0) - \nabla \log p_t(x_t) \\ &= \left(\frac{\alpha_t}{\sigma_t^2}x_0 - \frac{1}{\sigma_t^2}x_t\right) - \left( \frac{\alpha_t}{\sigma_t^2}\mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] - \frac{1}{\sigma_t^2}x_t \right) \\ &= \frac{\alpha_t}{\sigma_t^2} \left( x_0 - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right). \end{aligned} If we define p^r_{0 \mid t}(x_0 \mid x_t) \propto p_{0 \mid t}(x_0 \mid x_t)\exp(r_0(x_0)), then \begin{aligned} \nabla h_t(x_t) &= \frac{\alpha_t}{\sigma_t^2} \int \exp(r_0(x_0))\,p_{0 \mid t}(x_0 \mid x_t) \left( x_0 - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right)dx_0 \\ &= h_t(x_t)\frac{\alpha_t}{\sigma_t^2} \left( \mathbb E_{p^r_{0 \mid t}(\cdot \mid x_t)}[X_0] - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right). \end{aligned} Dividing by h_t(x_t) yields \nabla \log h_t(x_t) = \frac{\alpha_t}{\sigma_t^2} \left( \mathbb E_{p^r_{0 \mid t}(\cdot \mid x_t)}[X_0] - \mathbb E_{p_{0 \mid t}(\cdot \mid x_t)}[X_0] \right).
This makes the role of posterior sampling explicit: even when the final objective is guidance, the relevant computational subproblem is often to sample from p_{0 \mid t}(\cdot \mid x_t), or more generally from p_{s \mid t}(\cdot \mid x_t) for some s < t.
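The identity for \nabla \log h_t can be checked in closed form when everything is Gaussian. The sketch below assumes a scalar prior p_0 = \mathcal N(m_0, s_0^2) and a linear reward r_0(x) = c x, both chosen for illustration; in that case the tilted posterior is the original posterior shifted by c times its variance, and \log h_t is the log moment generating function of the posterior.

```python
import numpy as np

m0, s0 = 0.7, 1.3            # assumed prior p_0 = N(m0, s0^2)
alpha_t, sigma_t = 0.8, 0.6  # assumed noise level at time t
c = 0.9                      # assumed linear reward r_0(x) = c * x

def posterior(x_t):
    """Mean and variance of the Gaussian posterior p_{0|t}(. | x_t)."""
    var = 1.0 / (1.0 / s0**2 + alpha_t**2 / sigma_t**2)
    mean = var * (m0 / s0**2 + alpha_t * x_t / sigma_t**2)
    return mean, var

def log_h(x_t):
    # log E[exp(c X_0) | X_t = x_t] for a Gaussian posterior.
    mean, var = posterior(x_t)
    return c * mean + 0.5 * c**2 * var

def grad_log_h(x_t):
    # Difference-of-posterior-means formula; the tilted posterior is N(mean + c * var, var).
    mean, var = posterior(x_t)
    tilted_mean = mean + c * var
    return alpha_t / sigma_t**2 * (tilted_mean - mean)

x_t, eps = 0.4, 1e-5
finite_diff = (log_h(x_t + eps) - log_h(x_t - eps)) / (2 * eps)
print(grad_log_h(x_t), finite_diff)
```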
3. Approaches for posterior sampling
A first natural idea is to sample from p_{s \mid t}(\cdot \mid x_t) using a diffusion model. More precisely, we consider \bar p_0^{s,t} = p_{s \mid t}(\cdot \mid x_t), and we construct an auxiliary forward diffusion started from \bar p_0^{s,t} and ending at a simple distribution, for instance a standard Gaussian: \bar q_{r \mid 0}(\bar x_r \mid \bar x_0) = \mathcal N(\bar x_r; \bar \alpha_r \bar x_0, \bar \sigma_r^2 I).
In order to reverse this auxiliary diffusion, we need to estimate \nabla \log \bar p_r^{s,t}(\bar x_r), or equivalently, through Tweedie’s formula, the denoiser \mathbb E_{\bar X_0 \sim \bar p_{0 \mid r}^{s,t}(\cdot \mid \bar x_r)}[\bar X_0].
Define first \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r) := \frac{ q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s) }{ \int q_{s \mid 0,t}(y \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid y)\,dy }, set \omega_r(x_0;x_t,\bar x_r) := \int q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s, and define \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r) := \frac{ \omega_r(x_0;x_t,\bar x_r)p_{0 \mid t}(x_0 \mid x_t) }{ \int \omega_r(y;x_t,\bar x_r)p_{0 \mid t}(y \mid x_t)\,dy }. By construction, q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s) = \omega_r(x_0;x_t,\bar x_r)\tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r).
Let us now write the auxiliary denoiser in terms of the original diffusion: \begin{aligned} \mathbb E_{\bar X_0 \sim \bar p_{0 \mid r}^{s,t}(\cdot \mid \bar x_r)}[\bar X_0] &= \int x_s \bar p_{0 \mid r}^{s,t}(x_s \mid \bar x_r)\,d x_s \\ &= \frac{ \int x_s p_{s \mid t}(x_s \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid \bar x_0 = x_s)\,dx_s }{ \int p_{s \mid t}(x_s \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid \bar x_0 = x_s)\,d x_s } \\ &= \frac{ \int_{x_s,x_0} x_s q_{s \mid 0,t}(x_s \mid x_0,x_t)p_{0 \mid t}(x_0 \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s\,dx_0 }{ \int_{x_s,x_0} q_{s \mid 0,t}(x_s \mid x_0,x_t)p_{0 \mid t}(x_0 \mid x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s\,dx_0 } \\ &= \frac{ \int_{x_0} \left( \int_{x_s} x_s q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s \right) p_{0 \mid t}(x_0 \mid x_t)\,dx_0 }{ \int_{x_0} \left( \int_{x_s} q_{s \mid 0,t}(x_s \mid x_0,x_t)\bar q_{r \mid 0}(\bar x_r \mid x_s)\,dx_s \right) p_{0 \mid t}(x_0 \mid x_t)\,dx_0 } \\ &= \frac{ \int_{x_0} \omega_r(x_0;x_t,\bar x_r) \left( \int_{x_s} x_s \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r)\,dx_s \right) p_{0 \mid t}(x_0 \mid x_t)\,dx_0 }{ \int_{x_0} \omega_r(x_0;x_t,\bar x_r)p_{0 \mid t}(x_0 \mid x_t)\,dx_0 } \\ &= \int_{x_0} \left( \int_{x_s} x_s \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r)\,dx_s \right) \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r)\,dx_0 \\ &= \int_{x_0} \mathbb E_{x_s \sim \tilde q_{s \mid t,0}(\cdot \mid x_t,x_0,\bar x_r)}[x_s] \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r)\,dx_0. \end{aligned}
The bridge distribution q_{s \mid 0,t}(x_s \mid x_0,x_t) is Gaussian and its expression is standard, so we write q_{s \mid 0,t}(x_s \mid x_0,x_t) = \mathcal N\left( x_s; \gamma_{s \mid 0,t}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right), \gamma_{s \mid 0,t}^2 I \right), with \gamma_{s \mid 0,t}^{-2} = \sigma_s^{-2} + \alpha_{t \mid s}^2 \sigma_{t \mid s}^{-2}.
We now do the Gaussian conjugation with \bar q_{r \mid 0}(\bar x_r \mid x_s). One obtains \tilde q_{s \mid t,0}(x_s \mid x_t,x_0,\bar x_r) = \mathcal N(x_s; \tilde \mu_{s \mid t,0}(x_t,x_0,\bar x_r), \tilde \sigma_{s \mid t,0}^2 I), with \tilde \sigma_{s \mid t,0}^{-2} = \sigma_s^{-2} + \alpha_{t \mid s}^2 \sigma_{t \mid s}^{-2} + \bar \alpha_r^2 \bar \sigma_r^{-2} and \tilde \mu_{s \mid t,0}(x_t,x_0,\bar x_r) = \tilde \sigma_{s \mid t,0}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right).
Similarly, \omega_r(x_0;x_t,\bar x_r) = \mathcal N\left( \bar x_r; \bar \alpha_r \gamma_{s \mid 0,t}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right), (\bar \sigma_r^2 + \bar \alpha_r^2 \gamma_{s \mid 0,t}^2)I \right).
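The factorization q_{s \mid 0,t}\,\bar q_{r \mid 0} = \omega_r\,\tilde q_{s \mid t,0} can be verified numerically. The following is a minimal scalar sketch; all coefficient values are arbitrary assumptions, and the function names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Assumed scalar coefficients (illustrative values only).
alpha_s, sigma_s2 = 1.0, 0.5    # q_{s|0}
alpha_ts, sigma_ts2 = 1.0, 0.7  # q_{t|s}
abar_r, sbar_r2 = 0.6, 0.64     # auxiliary kernel \bar q_{r|0}

def bridge(x0, xt):
    """Mean and variance of q_{s|0,t}(. | x0, xt)."""
    gamma2 = 1.0 / (1.0 / sigma_s2 + alpha_ts**2 / sigma_ts2)
    mean = gamma2 * (alpha_s / sigma_s2 * x0 + alpha_ts / sigma_ts2 * xt)
    return mean, gamma2

def tilted_bridge(x0, xt, xbar):
    """Mean and variance of \\tilde q_{s|t,0}(. | xt, x0, xbar)."""
    var = 1.0 / (1.0 / sigma_s2 + alpha_ts**2 / sigma_ts2 + abar_r**2 / sbar_r2)
    mean = var * (alpha_s / sigma_s2 * x0 + alpha_ts / sigma_ts2 * xt
                  + abar_r / sbar_r2 * xbar)
    return mean, var

def omega(x0, xt, xbar):
    mean, gamma2 = bridge(x0, xt)
    return normal_pdf(xbar, abar_r * mean, sbar_r2 + abar_r**2 * gamma2)

x0, xt, xbar = 0.3, -1.1, 0.8
xs = np.linspace(-3.0, 3.0, 7)  # test points for x_s

lhs = normal_pdf(xs, *bridge(x0, xt)) * normal_pdf(xbar, abar_r * xs, sbar_r2)
rhs = omega(x0, xt, xbar) * normal_pdf(xs, *tilted_bridge(x0, xt, xbar))
print(np.max(np.abs(lhs - rhs)))
```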
It follows that \mathbb E_{x_s \sim \tilde q_{s \mid t,0}(\cdot \mid x_t,x_0,\bar x_r)}[x_s] = \tilde \sigma_{s \mid t,0}^2 \left( \frac{\alpha_s}{\sigma_s^2}x_0 + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right), while, as a function of x_0, the weight \omega_r(x_0;x_t,\bar x_r) is an unnormalized Gaussian, that is, its logarithm is quadratic in x_0. Writing V_{r,s,t} := \bar \sigma_r^2 + \bar \alpha_r^2 \gamma_{s \mid 0,t}^2, we have, up to a factor independent of x_0, \omega_r(x_0;x_t,\bar x_r) \propto \exp\left( -\frac12 \frac{\bar \alpha_r^2 \gamma_{s \mid 0,t}^4 \alpha_s^2}{\sigma_s^4 V_{r,s,t}} \|x_0\|^2 + \left\langle \frac{\bar \alpha_r \gamma_{s \mid 0,t}^2 \alpha_s}{\sigma_s^2 V_{r,s,t}} \left( \bar x_r - \bar \alpha_r \gamma_{s \mid 0,t}^2 \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right), x_0 \right\rangle \right).
Since p_{0 \mid t}(x_0 \mid x_t) \propto p_0(x_0) \exp\left( -\frac12 \frac{\alpha_t^2}{\sigma_t^2}\|x_0\|^2 + \left\langle \frac{\alpha_t}{\sigma_t^2}x_t, x_0 \right\rangle \right), it follows that \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r) \propto p_0(x_0) \exp\left( -\frac12 a_{r,s,t}\|x_0\|^2 + \langle b_{r,s,t}(x_t,\bar x_r), x_0 \rangle \right), where a_{r,s,t} := \frac{\alpha_t^2}{\sigma_t^2} + \frac{\bar \alpha_r^2 \gamma_{s \mid 0,t}^4 \alpha_s^2}{\sigma_s^4 V_{r,s,t}} and b_{r,s,t}(x_t,\bar x_r) := \frac{\alpha_t}{\sigma_t^2}x_t + \frac{\bar \alpha_r \gamma_{s \mid 0,t}^2 \alpha_s}{\sigma_s^2 V_{r,s,t}} \left( \bar x_r - \bar \alpha_r \gamma_{s \mid 0,t}^2 \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t \right).
Therefore \tilde p_{0 \mid t,r} is itself the posterior distribution associated with a single artificial Gaussian observation. If we choose parameters (\hat \alpha, \hat \sigma, \hat x) such that \hat \alpha^2 / \hat \sigma^2 = a_{r,s,t} and (\hat \alpha / \hat \sigma^2)\hat x = b_{r,s,t}(x_t,\bar x_r), then \tilde p_{0 \mid t,r}(x_0 \mid x_t,\bar x_r) = p_0(x_0 \mid \hat \alpha X_0 + \hat \sigma Z = \hat x), where Z \sim \mathcal N(0,I) is independent of X_0 \sim p_0. Only the two combinations a_{r,s,t} and b_{r,s,t} matter, so one of the three parameters can be fixed freely.
If we denote by D(x;\alpha,\sigma) := \mathbb E[X_0 \mid \alpha X_0 + \sigma Z = x] the original denoiser, then \mathbb E_{x_0 \sim \tilde p_{0 \mid t,r}(\cdot \mid x_t,\bar x_r)}[x_0] = D(\hat x; \hat \alpha, \hat \sigma), and therefore \mathbb E_{\bar X_0 \sim \bar p_{0 \mid r}^{s,t}(\cdot \mid \bar x_r)}[\bar X_0] = \tilde \sigma_{s \mid t,0}^2 \left( \frac{\alpha_s}{\sigma_s^2} D(\hat x; \hat \alpha, \hat \sigma) + \frac{\alpha_{t \mid s}}{\sigma_{t \mid s}^2}x_t + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right).
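The whole chain (effective observation, original denoiser, auxiliary denoiser) can be tested end to end when p_0 is Gaussian, since the auxiliary posterior mean is then available directly. The sketch below is scalar and rests on several assumptions made for illustration: a Gaussian prior, a variance-preserving schedule, the normalization \hat \sigma = 1, and hypothetical function names. With a pretrained denoiser one would instead pick (\hat \alpha, \hat \sigma) on the training schedule.

```python
import numpy as np

m0, s0 = 0.5, 1.2  # assumed Gaussian prior p_0 = N(m0, s0^2)

def alpha(u):
    return np.exp(-0.5 * u)   # assumed VP schedule

def sigma2(u):
    return 1.0 - np.exp(-u)

def denoiser(x, a, sig):
    """D(x; a, sig) = E[X_0 | a X_0 + sig Z = x], closed form for the Gaussian prior."""
    prec = 1.0 / s0**2 + a**2 / sig**2
    return (m0 / s0**2 + a * x / sig**2) / prec

def coefficients(s, t):
    a_s, v_s = alpha(s), sigma2(s)
    a_ts = alpha(t) / a_s
    v_ts = sigma2(t) - a_ts**2 * v_s
    return a_s, v_s, a_ts, v_ts

def aux_denoiser(xbar, x_t, s, t, abar, sbar2):
    """Auxiliary denoiser computed only through the original denoiser."""
    a_s, v_s, a_ts, v_ts = coefficients(s, t)
    a_t, v_t = alpha(t), sigma2(t)
    gamma2 = 1.0 / (1.0 / v_s + a_ts**2 / v_ts)
    V = sbar2 + abar**2 * gamma2
    k = abar * gamma2 * a_s / v_s  # coefficient of x_0 in the mean of omega_r
    a_coef = a_t**2 / v_t + k**2 / V
    b_coef = a_t / v_t * x_t + k / V * (xbar - abar * gamma2 * a_ts / v_ts * x_t)
    # One admissible parameterization (assumption): sigma_hat = 1.
    alpha_hat, sigma_hat = np.sqrt(a_coef), 1.0
    x_hat = b_coef / alpha_hat
    d = denoiser(x_hat, alpha_hat, sigma_hat)
    tilde_var = 1.0 / (1.0 / v_s + a_ts**2 / v_ts + abar**2 / sbar2)
    return tilde_var * (a_s / v_s * d + a_ts / v_ts * x_t + abar / sbar2 * xbar)

def aux_denoiser_direct(xbar, x_t, s, t, abar, sbar2):
    """Same quantity by direct Gaussian conditioning of X_s on (x_t, xbar)."""
    a_s, v_s, a_ts, v_ts = coefficients(s, t)
    mean_s, var_s = a_s * m0, a_s**2 * s0**2 + v_s  # marginal p_s
    prec = 1.0 / var_s + a_ts**2 / v_ts + abar**2 / sbar2
    return (mean_s / var_s + a_ts / v_ts * x_t + abar / sbar2 * xbar) / prec

args = dict(xbar=0.4, x_t=-0.7, s=0.3, t=1.0, abar=0.8, sbar2=0.36)
via_original_denoiser = aux_denoiser(**args)
direct = aux_denoiser_direct(**args)
print(via_original_denoiser, direct)
```

The two values agree up to floating-point error, which is a useful regression test before swapping the closed-form `denoiser` for a learned one.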
Therefore, the denoiser of the auxiliary diffusion can indeed be computed using only the original denoiser. The sampling procedure is then the usual diffusion sampling procedure: one starts from the terminal simple distribution and samples from the reverse transitions of the auxiliary diffusion until time 0.
This remains a stochastic sampling procedure. If one wants a deterministic sampler, one may pass to the probability-flow ODE of the auxiliary diffusion, equivalently to the \eta = 0 limit in DDIM.
In [1], the auxiliary diffusion is not started from a standard Gaussian but from a Gaussian centered at x_t with positive variance. The computations above are modified accordingly, but the general principle is the same.
GLASS transitions
In [1], the authors consider a slightly more general problem. Instead of working with the posterior induced by the canonical DDPM bridge, they work with what they call the GLASS transitions. The important point, however, is that this is not really a different family of transitions: it is simply a reparameterization of the DDIM family.
Concretely, the only change is that the bridge kernel q_{s \mid 0,t}(x_s \mid x_0,x_t) is replaced by another Gaussian conditional law q_{s \mid 0,t}^{\mathrm{glass}}(x_s \mid x_0,x_t) = \mathcal N(x_s; m_{s \mid 0,t}^{\mathrm{glass}}(x_0,x_t), (v_{s \mid 0,t}^{\mathrm{glass}})^2 I), whose mean is still affine in (x_0,x_t).
More precisely, changing the GLASS transition amounts to choosing a particular DDIM bridge parameterization. In the notation of [1], the GLASS parameter can be identified with the usual DDIM noise parameter through a simple change of variables. So, at the level relevant for the present discussion, GLASS transitions should be viewed as DDIM transitions written with a different parameterization rather than as a genuinely new object.
For the derivation above, this means that nothing essential changes. Indeed, the only properties we used were: q_{s \mid 0,t}(\cdot \mid x_0,x_t) \text{ is Gaussian in } x_s, \qquad \bar q_{r \mid 0}(\bar x_r \mid x_s) \text{ is Gaussian in } x_s. Therefore, if we replace q_{s \mid 0,t} by q_{s \mid 0,t}^{\mathrm{glass}}, the same Gaussian conjugation gives \tilde q_{s \mid t,0}^{\mathrm{glass}}(x_s \mid x_t,x_0,\bar x_r) = \mathcal N(x_s; \tilde m_{s \mid t,0}^{\mathrm{glass}}(x_t,x_0,\bar x_r), (\tilde v_{s \mid t,0}^{\mathrm{glass}})^2 I), with (\tilde v_{s \mid t,0}^{\mathrm{glass}})^{-2} = (v_{s \mid 0,t}^{\mathrm{glass}})^{-2} + \bar \alpha_r^2 \bar \sigma_r^{-2}, and \tilde m_{s \mid t,0}^{\mathrm{glass}}(x_t,x_0,\bar x_r) = (\tilde v_{s \mid t,0}^{\mathrm{glass}})^2 \left( \frac{m_{s \mid 0,t}^{\mathrm{glass}}(x_0,x_t)}{(v_{s \mid 0,t}^{\mathrm{glass}})^2} + \frac{\bar \alpha_r}{\bar \sigma_r^2}\bar x_r \right). Similarly, \omega_r^{\mathrm{glass}}(x_0;x_t,\bar x_r) = \mathcal N\left( \bar x_r; \bar \alpha_r m_{s \mid 0,t}^{\mathrm{glass}}(x_0,x_t), \left(\bar \sigma_r^2 + \bar \alpha_r^2 (v_{s \mid 0,t}^{\mathrm{glass}})^2\right)I \right).
Thus the final denoiser formula is exactly the same as before, except that the ordinary bridge coefficients are replaced by the corresponding DDIM, or equivalently GLASS, bridge coefficients. In particular, once the effective posterior over x_0 is rewritten as a single Gaussian observation, the denoiser of the auxiliary diffusion is again obtained by evaluating the original denoiser at a modified input.
4. A simpler and useful alternative
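One important observation is that the intermediate marginals \bar p_r^{s,t} of the auxiliary diffusion constructed above are not posteriors of the original process: they do not coincide with p_{u \mid t}(\cdot \mid x_t) for intermediate times s < u < t. Only the endpoint of the auxiliary diffusion, at auxiliary time r = 0, has the posterior law p_{s \mid t}(\cdot \mid x_t).
Besides being somewhat unnatural, this means that the denoiser is evaluated slightly out of distribution.
A simpler alternative is therefore to directly construct a process whose marginals are exactly p_{r \mid t}(\cdot \mid x_t) for all r < t, where r now denotes a time of the original process rather than the auxiliary time of Section 3. This is the approach taken in [4].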
The starting point is that the reverse SDE dX_r = \left[b(X_r,r) - g(r)^2 \nabla \log p_r(X_r)\right]dr + g(r)\,d\bar W_r, solved backward for r < t and started from \bar p_t = \delta_x, has marginals p_{r \mid t}(\cdot \mid x_t = x) for all r < t.
Indeed, if \bar p_t = \delta_x, then \bar p_r(x_r) = \int p_{r \mid t}(x_r \mid x_t)\bar p_t(x_t)\,dx_t = p_{r \mid t}(x_r \mid x_t = x).
Therefore, in order to obtain the associated probability-flow ODE, it suffices to know the score of this diffusion started from \bar p_t = \delta_x. By Bayes’ rule, p_{r \mid t}(x_r \mid x_t) = \frac{p_r(x_r)q_{t \mid r}(x_t \mid x_r)}{p_t(x_t)}, hence \nabla \log \bar p_r(x_r) = \nabla \log p_{r \mid t}(x_r \mid x_t = x) = \nabla \log p_r(x_r) + \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid x_r).
Since q_{t \mid r}(x_t \mid x_r) = \mathcal N(x_t; \alpha_{t \mid r}x_r, \sigma_{t \mid r}^2 I), the correction term is \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid x_r) = \frac{\alpha_{t \mid r}}{\sigma_{t \mid r}^2}(x - \alpha_{t \mid r}x_r).
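The drift of the reverse SDE involves the unconditional score \nabla \log p_r, not \nabla \log \bar p_r. The probability-flow ODE with the same marginals \bar p_r is obtained by adding \frac12 g(r)^2 \nabla \log \bar p_r to this drift and removing the noise: dX_r = \left[ b(X_r,r) - g(r)^2 \nabla \log p_r(X_r) + \frac12 g(r)^2 \left( \nabla \log p_r(X_r) + \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid X_r) \right) \right]dr = \left[ b(X_r,r) - \frac12 g(r)^2 \nabla \log p_r(X_r) + \frac12 g(r)^2 \nabla_{x_r}\log q_{t \mid r}(x_t = x \mid X_r) \right]dr. In particular, the Gaussian correction term enters with the opposite sign to the score term.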
However, this ODE is not defined at r=t, since \bar p_t = \delta_x and the Gaussian correction term is singular in that limit. This is not a problem in practice: one can start at r=t-\varepsilon for some small \varepsilon > 0, for instance by taking one Euler-Maruyama step of the stochastic dynamics, and then integrate the ODE.
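Here is a minimal sketch of this procedure in a toy case where the posterior is known. It integrates the velocity b - \frac12 g^2 \nabla \log p_r + \frac12 g^2 \nabla_{x_r} \log q_{t \mid r}, that is, the reverse-SDE drift plus \frac12 g^2 \nabla \log \bar p_r. The setting is an assumption chosen for illustration: b = 0, g = 1 and p_0 = \mathcal N(m_0, s_0^2), so that p_{r \mid t}(\cdot \mid x) is Gaussian and the flow must preserve standardized positions. To keep the check deterministic, the starting points at r = t - \varepsilon are placed at fixed standardized positions rather than drawn by a stochastic step.

```python
import numpy as np

m0, s0 = 0.0, 1.0  # assumed prior p_0 = N(m0, s0^2)
t, x = 2.0, 1.5    # conditioning time and value X_t = x

def bridge_moments(r):
    """Mean and variance of p_{r|t}(. | x_t = x) for dX = dW."""
    prec = 1.0 / (s0**2 + r) + 1.0 / (t - r)
    mean = (m0 / (s0**2 + r) + x / (t - r)) / prec
    return mean, 1.0 / prec

def velocity(x_r, r):
    score = -(x_r - m0) / (s0**2 + r)  # grad log p_r
    correction = (x - x_r) / (t - r)   # grad_{x_r} log q_{t|r}(x | x_r)
    return -0.5 * score + 0.5 * correction

def integrate(x_start, r_start, r_end, n_steps=2000):
    """Classical RK4 for dX/dr = velocity(X, r), run from r_start down to r_end."""
    h = (r_end - r_start) / n_steps    # negative step: backward in r
    x_r, r = x_start, r_start
    for _ in range(n_steps):
        k1 = velocity(x_r, r)
        k2 = velocity(x_r + 0.5 * h * k1, r + 0.5 * h)
        k3 = velocity(x_r + 0.5 * h * k2, r + 0.5 * h)
        k4 = velocity(x_r + h * k3, r + h)
        x_r = x_r + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        r = r + h
    return x_r

eps, s = 0.05, 0.5
r0 = t - eps
mean0, var0 = bridge_moments(r0)
z = np.array([-1.0, 0.0, 2.0])       # fixed standardized positions
x_start = mean0 + np.sqrt(var0) * z  # in practice: one stochastic step from x

x_s = integrate(x_start, r0, s)
mean_s, var_s = bridge_moments(s)
x_s_expected = mean_s + np.sqrt(var_s) * z
print(x_s, x_s_expected)
```

Starting strictly before r = t keeps the factor 1 / (t - r) in the correction term bounded along the whole integration path.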
Posterior flow maps
The point of posterior flow maps is to amortize the posterior samplers of the previous sections. Instead of integrating the bridge ODE every time a new conditioning variable (s,t,x_t) is given, one learns a map that takes (s,t,x_t), together with auxiliary randomness if needed, and directly outputs a sample from an approximation of p_{s \mid t}(\cdot \mid x_t). This is the viewpoint adopted in [2, 3].
Since the target law depends on (s,t,x_t), the map itself must depend on these variables as well. In particular, the conditioning variable x_t must be provided as an input, which explains the “meta” terminology in [3].
An additional observation of [2] is that it is also possible to reuse a standard flow map for value-function estimation. Let h_t(x_t) = \mathbb E\left[e^{r_0(X_0)} \mid X_t = x_t\right]. Assume that we have a deterministic flow map \hat X_0^{t'} from time t' to time 0, and let t' > t. If the flow map is exact, then it pushes p_{t'} to p_0, and therefore \begin{aligned} h_t(x_t) &= \frac{1}{p_t(x_t)} \int e^{r_0(x_0)}q_{t \mid 0}(x_t \mid x_0)p_0(x_0)\,dx_0 \\ &= \frac{1}{p_t(x_t)} \int e^{r_0(\hat X_0^{t'}(x_{t'}))} q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'})) p_{t'}(x_{t'})\,dx_{t'}. \end{aligned}
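The change of variables through the flow map can be checked numerically in a toy case. The sketch below assumes b = 0, g = 1, a scalar prior p_0 = \mathcal N(m_0, s_0^2) and a linear reward r_0(x) = c x, all chosen for illustration. In that setting the exact probability-flow map from t' to 0 is affine and h_t has a closed form. The integral over x_{t'} is approximated by a Riemann sum on a uniform grid.

```python
import numpy as np

m0, s0 = 0.3, 0.8      # assumed prior p_0 = N(m0, s0^2)
t, t_prime = 0.5, 1.5  # assumed times with t' > t (alpha = 1, sigma_t^2 = t)
c = 0.7                # assumed linear reward r_0(x) = c * x

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def flow_map(x_tp):
    # Exact probability-flow map from t' to 0 when p_{t'} = N(m0, s0^2 + t').
    return m0 + s0 / np.sqrt(s0**2 + t_prime) * (x_tp - m0)

def h_via_flow_map(x_t, n_grid=20001, half_width=12.0):
    """h_t(x_t) as an integral over x_{t'} ~ p_{t'}, pushed through the flow map."""
    sd = np.sqrt(s0**2 + t_prime)
    grid = np.linspace(m0 - half_width * sd, m0 + half_width * sd, n_grid)
    dx = grid[1] - grid[0]
    x0_hat = flow_map(grid)
    integrand = (np.exp(c * x0_hat)
                 * normal_pdf(x_t, x0_hat, t)  # q_{t|0}(x_t | x0_hat)
                 * normal_pdf(grid, m0, s0**2 + t_prime))  # p_{t'}(x_{t'})
    return np.sum(integrand) * dx / normal_pdf(x_t, m0, s0**2 + t)

def h_exact(x_t):
    # Moment generating function of the Gaussian posterior p_{0|t}(. | x_t).
    var = 1.0 / (1.0 / s0**2 + 1.0 / t)
    mean = var * (m0 / s0**2 + x_t / t)
    return np.exp(c * mean + 0.5 * c**2 * var)

x_t = 0.9
print(h_via_flow_map(x_t), h_exact(x_t))
```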
We now rewrite this integral by sampling x_{t'} from q_{t' \mid t}(\cdot \mid x_t) instead of from p_{t'}. This gives \begin{aligned} h_t(x_t) &= \frac{1}{p_t(x_t)} \int e^{r_0(\hat X_0^{t'}(x_{t'}))} q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'})) \frac{p_{t'}(x_{t'})}{q_{t' \mid t}(x_{t'} \mid x_t)} q_{t' \mid t}(x_{t'} \mid x_t)\,dx_{t'} \\ &= \frac{1}{p_t(x_t)} \mathbb E_{x_{t'} \sim q_{t' \mid t}(\cdot \mid x_t)} \left[ \exp\left(v_{t,t'}(x_{t'},x_t)\right) \right], \end{aligned} where v_{t,t'}(x_{t'},x_t) := r_0(\hat X_0^{t'}(x_{t'})) + \log q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'})) + \log p_{t'}(x_{t'}) - \log q_{t' \mid t}(x_{t'} \mid x_t).
Equivalently, \log h_t(x_t) = \log \mathbb E_{x_{t'} \sim q_{t' \mid t}(\cdot \mid x_t)} \left[ \exp\left(v_{t,t'}(x_{t'},x_t)\right) \right] - \log p_t(x_t).
Introduce the reparameterization x_{t'}(x_t,z) = \alpha_{t' \mid t}x_t + \sigma_{t' \mid t}z, with z \sim \mathcal N(0,I), and write v_{t,t'}(z,x_t) := r_0(\hat X_0^{t'}(x_{t'}(x_t,z))) + \log q_{t \mid 0}(x_t \mid \hat X_0^{t'}(x_{t'}(x_t,z))) + \log p_{t'}(x_{t'}(x_t,z)) - \log q_{t' \mid t}(x_{t'}(x_t,z) \mid x_t). Then \log h_t(x_t) = \log \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right) \right] - \log p_t(x_t), and differentiating under the integral sign yields \nabla \log h_t(x_t) = \frac{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right)\nabla_{x_t}v_{t,t'}(z,x_t) \right] }{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right) \right] } - \nabla \log p_t(x_t).
Write x_{t'} = x_{t'}(x_t,z), \hat x_0 = \hat X_0^{t'}(x_{t'}), and J = J\hat X_0^{t'}(x_{t'}). Since \begin{aligned} \log q_{t' \mid t}(x_{t'} \mid x_t) &= C_{t,t'} - \frac{1}{2\sigma_{t' \mid t}^2}\|x_{t'} - \alpha_{t' \mid t}x_t\|^2, \\ \log q_{t' \mid t}(x_{t'}(x_t,z) \mid x_t) &= C_{t,t'} - \frac{1}{2\sigma_{t' \mid t}^2}\|\sigma_{t' \mid t}z\|^2 = C_{t,t'} - \frac12\|z\|^2, \end{aligned} one has \nabla_{x_t}\log q_{t' \mid t}(x_{t'}(x_t,z) \mid x_t) = 0. Therefore \begin{aligned} \nabla_{x_t}v_{t,t'}(z,x_t) &= \alpha_{t' \mid t}J^\top \nabla r_0(\hat x_0) + \nabla_{x_t}\log q_{t \mid 0}(x_t \mid \hat x_0) + \alpha_{t' \mid t}\nabla \log p_{t'}(x_{t'}) \\ &= \frac{\alpha_t}{\sigma_t^2}\hat x_0 - \frac{1}{\sigma_t^2}x_t + \alpha_{t' \mid t}\nabla \log p_{t'}(x_{t'}) \\ &\quad + \alpha_{t' \mid t}J^\top \left( \nabla r_0(\hat x_0) + \frac{\alpha_t}{\sigma_t^2}(x_t - \alpha_t \hat x_0) \right). \end{aligned}
This is the form used in Appendix A.9 of [2].
Now, we also need to estimate v_{t,t'}(z,x_t) itself, since it determines the importance weights. The exact quantity above is intractable because it contains \log p_{t'}(x_{t'}). Following Appendix A.9 of [2], we only identify it up to an additive term independent of z. Define \gamma_{t,t'}(x_t,x_{t'}) := \left[ \int_0^1 \nabla \log p_{t'}(x_t + u(x_{t'}-x_t))\,du \right]^\top (x_{t'}-x_t), so that \log p_{t'}(x_{t'}) = \log p_{t'}(x_t) + \gamma_{t,t'}(x_t,x_{t'}). Then \begin{aligned} v_{t,t'}(z,x_t) &= \bar v_{t,t'}(z,x_t) + \log p_{t'}(x_t) + C, \\ \bar v_{t,t'}(z,x_t) &:= r_0(\hat x_0) - \frac{1}{2\sigma_t^2}\|x_t - \alpha_t \hat x_0\|^2 + \gamma_{t,t'}(x_t,x_{t'}) + \frac12\|z\|^2, \end{aligned} where C does not depend on z.
Therefore, the normalized weights only depend on \bar v_{t,t'}, since the missing term \log p_{t'}(x_t)+C cancels out: \frac{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right)A(z,x_t) \right] }{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(v_{t,t'}(z,x_t)\right) \right] } = \frac{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(\bar v_{t,t'}(z,x_t)\right)A(z,x_t) \right] }{ \mathbb E_{z \sim \mathcal N(0,I)} \left[ \exp\left(\bar v_{t,t'}(z,x_t)\right) \right] }.
Given i.i.d. samples z_1,\dots,z_N \sim \mathcal N(0,I), one therefore uses \nabla \widehat{\log h_t}(x_t) = \sum_{i=1}^N w_i \nabla_{x_t}v_{t,t'}(z_i,x_t) - \nabla \log p_t(x_t), \qquad w_i = \frac{\exp(\bar v_{t,t'}(z_i,x_t))}{\sum_{j=1}^N \exp(\bar v_{t,t'}(z_j,x_t))}.
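In code, the self-normalized estimator is a softmax-weighted average. The following minimal sketch uses a hypothetical function name and random placeholder arrays standing in for \bar v_{t,t'}(z_i,x_t), \nabla_{x_t}v_{t,t'}(z_i,x_t) and \nabla \log p_t(x_t), which in practice come from the flow map, its Jacobian-vector products and the score model. Subtracting the maximum before exponentiating is a standard stabilization, and it also shows directly that any additive constant in \bar v drops out.

```python
import numpy as np

def snis_grad_log_h(v_bar, grad_v, score_t):
    """Return (sum_i w_i * grad_v[i] - score_t, w) with w = softmax(v_bar)."""
    v_bar = np.asarray(v_bar, dtype=float)
    shifted = v_bar - np.max(v_bar)  # log-sum-exp shift for numerical stability
    w = np.exp(shifted)
    w /= np.sum(w)
    estimate = w @ np.asarray(grad_v, dtype=float) - np.asarray(score_t, dtype=float)
    return estimate, w

rng = np.random.default_rng(0)
N, d = 8, 3
v_bar = rng.normal(size=N)        # placeholder for \bar v_{t,t'}(z_i, x_t)
grad_v = rng.normal(size=(N, d))  # placeholder for grad_{x_t} v_{t,t'}(z_i, x_t)
score_t = rng.normal(size=d)      # placeholder for grad log p_t(x_t)

est, w = snis_grad_log_h(v_bar, grad_v, score_t)
# Adding a constant to every v_bar leaves the estimate unchanged.
est_shifted, _ = snis_grad_log_h(v_bar + 123.4, grad_v, score_t)
print(w.sum(), np.max(np.abs(est - est_shifted)))
```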
Thus, for gradient estimation, it is enough to evaluate the relative correction \gamma_{t,t'}(x_t,x_{t'}) from x_t to x_{t'}. Estimating the value function itself requires an additional offset term, which is also discussed in [2].
For t' close to t, one can also use the identity \log p_{t'}(x_{t'}) = \log p_t(x_t) + \log q_{t' \mid t}(x_{t'} \mid x_t) - \log p_{t \mid t'}(x_t \mid x_{t'}), in which the term \log p_t(x_t) is independent of the Monte Carlo sample and therefore again disappears in the normalization. In that regime, p_{t \mid t'}(x_t \mid x_{t'}) may be approximated by a Gaussian Euler step of the reverse SDE, whose drift depends on the score at time t', and therefore on the denoiser. This provides a local approximation of the weights.
References
[1] Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky T. Q. Chen, Yaron Lipman, and Brian Karrer. GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models. arXiv:2509.25170, 2025.
[2] Peter Holderrieth, Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong He, Zeynep Akata, Tommi Jaakkola, Nicholas Matthew Boffi, and Max Simchowitz. Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps. arXiv:2602.05993, 2026.
[3] Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S. Albergo, and Yee Whye Teh. Meta Flow Maps enable scalable reward alignment. arXiv:2601.14430, 2026.
[4] Romeo Passaro, Zander W. Blasingame, Michael M. Bronstein, and Alexander Tong. Stochastic Few-step Models. ICLR 2026 DeLTa Workshop, 2026.