# Optical tomography with discretized path integral

**Bingzhi Yuan**,

**Toru Tamaki**,

**Takahiro Kushida**,

**Bisser Raytchev**,

**Kazufumi Kaneda**

Hiroshima University, Department of Information Science, Graduate School of Engineering, 1-4-1 Kagamiyama, Higashi-Hiroshima 739-8527, Japan

**Yasuhiro Mukaigawa**,

**Hiroyuki Kubo**

Graduate School of Information Science, NAIST, 8916-5, Takayama, Ikoma, Nara 630-0192, Japan

*J. Med. Imag*. 2(3), 033501 (Aug 13, 2015). doi:10.1117/1.JMI.2.3.033501

#### Open Access

**Abstract.**
We present a framework for optical tomography based on a path integral. Instead of directly solving the radiative transport equations, which have been widely used in optical tomography, we use a path integral that has been developed for rendering participating media based on the volume rendering equation in computer graphics. For a discretized two-dimensional layered grid, we develop an algorithm to estimate the extinction coefficients of each voxel with an interior point method. Numerical simulation results are shown to demonstrate that the proposed method works well.

Optical tomography^{1}^{–}^{8} is known as a safer alternative to x-ray tomography. Usually, tomography consists of a light source generating penetrative light and a detector capturing the light, which allows one to estimate the interior of the object through which the light passes. The most important application is x-ray computed tomography (CT), where x rays are used due to their penetrative property. The balance between the radiation exposure of the human body and the quality of the obtained results has been debated since the early days when x-ray CT was invented. Therefore, there is an urgent demand for a safer medical tomography, such as optical tomography.

Modeling the behavior of light plays an important role in optical tomography, and at the mesoscale, in which the wavelength of light is close to the scale of tissue, the radiative transport equation (RTE) is used for describing the behavior of light scattering.^{5}^{,}^{9} At the macroscale,^{6} the time-independent or time-dependent RTE is often approximated by a diffusion equation.

Similarly, the computer graphics community has used the time-independent RTE and, in contrast to the (surface) rendering equation,^{10}^{,}^{11} often calls it the volume rendering equation (VRE).^{10}^{,}^{12}^{–}^{14} The path integral, which can be considered as a discrete version of the continuous Feynman path integral,^{15}^{,}^{16} has recently been employed to solve the VRE in an efficient way with Monte Carlo integration, such as Metropolis light transport^{17}^{,}^{18} or bidirectional path tracing.^{19}

In this paper, we propose an optical tomography method that uses a path integral as the forward model and solves a nonlinear inverse problem minimizing the discrepancy between measurements and model predictions in a least-squares sense. To the best of our knowledge, the discretized path integral has not previously been used in optical tomography. In our work, we simplify the path integral with some assumptions. The path integral, as the name suggests, gathers (or integrates) the contributions of all possible paths of light.^{17}^{,}^{18}^{,}^{20}^{–}^{23} We approximate the integral over an infinite number of paths with a sum over a finite number of paths, discretize a continuous medium into the voxels of a regular grid, and discretize continuous light paths into discrete ones (i.e., polylines). We deal with anisotropic scattering having a peak in the forward direction, which differs from other discretization methods using discrete ordinates or spherical harmonics.^{13}^{,}^{24}^{,}^{25} In this work, we focus on estimating the spatially varying extinction coefficient $\sigma_t(x)$ at each discretized voxel location of the medium while fixing the scattering properties (e.g., the scattering coefficient $\sigma_s$ and phase function $f_p$). By separating the scattering properties from our problem, we formulate optical tomography as an optimization problem with inequality constraints, solved by an interior point method.

An interior point method^{26} is an iterative method for solving an optimization problem with inequality constraints that describe a feasible region in which the optimal solution must reside. To this end, a series of unconstrained optimization problems is constructed by combining the constraints with the original objective function; these are solved by an ordinary gradient-based (Quasi-Newton) method.

To summarize our contribution, we reformulate the problem of optical tomography by combining a path integral with several simplifying assumptions to model the light transport in the participating media. This paper is an extension of our previous conference version^{27}^{,}^{28} with additional theoretical background and additional experiments and discussions, and is structured as follows. In Sec. 2, we briefly review previous work related to path integrals and optical tomography. In Sec. 3, we describe how to model the light transport in participating media and turn optical tomography into an optimization problem. In Sec. 4, we show how to solve the optimization problems. Section 5 reports some simulation results, and Sec. 6 concludes the paper.

In this section, we briefly review related work on optical tomography and path integrals in computer graphics.

Optical tomography^{4}^{,}^{5} (or inverse transport,^{6}^{,}^{7} inverse scattering,^{29} scattering tomography^{30}^{,}^{31}) is a problem in medical imaging that uses light sources to reconstruct the optical properties of tissue from measurements (time-dependent or stationary, angular-dependent or independent) at the surface boundary. Analytically solving the RTE [Eq. (1)] with boundary conditions is difficult, however, and approximations, such as discrete ordinates and $N$th-order spherical harmonics ($P_N$ approximation), are often used and solved numerically by, for example, finite element methods (FEM) or finite difference methods. The famous diffusion approximation^{5}^{,}^{6} (DA) is a $P_1$ (thus, first-order) approximation under the assumption that the phase function is isotropic. The DA is an approximation to the RTE at a macroscopic scale when scattering is large while absorption is low and scattering is not highly peaked. Diffuse optical tomography (DOT) is based on the DA and today represents the frontier of optical tomography^{32}^{,}^{33} with many clinical applications.^{34} The DA, however, often does not hold in realistic participating (scattering) media: absorption may not be small compared to scattering, and the shapes of the phase functions can be highly peaked in the forward direction, which is often modeled by Henyey-Greenstein,^{35} Schlick,^{36} or Mie and Rayleigh phase functions.^{10}^{,}^{12}^{,}^{37}^{,}^{38} Experimental evidence^{39} also suggests a highly peaked shape of the phase functions in biological media. DOT works, but is still limited; therefore, other methods have also been studied for cases when the DA does not hold.

Statistical Monte Carlo methods are used for media in which the assumptions do not hold; however, they are computationally intensive and inefficient for solving the forward problem,^{4}^{–}^{7}^{,}^{34} i.e., solving the RTE with given parameters. Therefore, Monte Carlo based approaches have been used for estimating spatially constant (not varying) parameters in homogeneous media, such as paper,^{40}^{,}^{41} clouds,^{42} liquids,^{43} plastics,^{44} or uniform material samples.^{45} Another difficulty of Monte Carlo based inverse methods is that an analytical forward model prediction is hard to obtain when we want to minimize the difference between the prediction and measurements, except for very special structures.^{46}^{,}^{47} A gradient-based least-squares approach has been proposed, but only for spatially constant parameter estimation,^{40}^{,}^{41}^{,}^{48} while model-free approaches have relied on genetic algorithms,^{42}^{,}^{44} numerical perturbation,^{49}^{,}^{50} voting,^{51} or even simple backprojection.^{52} One of the contributions of the current paper is to enable a gradient-based optimization approach for estimating spatially varying parameters, which is extensible by using many optimization methods.

Similar to optical tomography, modeling light transport plays a very important role in computer graphics. Our own work on optical tomography is inspired by Monte Carlo based statistical methods. In the last two decades, methods based on path integrals^{17}^{–}^{19}^{,}^{53}^{–}^{55} have provided models of light transport for efficient volume rendering. For solving RTE, a path integral has been used for a forward problem solver,^{16}^{,}^{56}^{,}^{57} and has also been applied to optical tomography, but under the diffusion assumption.^{58}^{,}^{59} Our proposed method is based on a path integral to explicitly express the forward model prediction, which is very suitable for solving the inverse problem with gradient based methods. This is an advantage of our method over existing methods because the paths used in the forward model can be generated by either a deterministic or statistical (Monte Carlo) method. To achieve an efficient forward model, we introduce a simplified layered scattering model that uses a limited number of deterministic paths instead of Monte Carlo simulated ones.

We deal with the following optical tomography problem [this is a conceptual formulation and the actual problem is shown in Eq. (29)].

In the forward problem, as we mentioned before, we use a path integral to build a mathematical model for the light transport. Here, we follow the notation developed in the computer graphics literature^{17}^{,}^{23}^{,}^{53}^{,}^{60} to introduce the path integral. Sections 3.2 to 3.6 will show the simplified model we propose.

Given a space $\mathbb{R}^3$, a light source is located at $x_0 \in \mathbb{R}^3$ and a detector at $x_{M+1} \in \mathbb{R}^3$, and between them is the participating medium $V \subset \mathbb{R}^3$ with boundary $\partial V$ and interior volume $V^0 := V \setminus \partial V$. A light path $\tilde{x}$ connecting $x_0$ and $x_{M+1}$ of length $M+2$ consists of $M+2$ vertices $x_m \in \mathbb{R}^3$ for $m = 0, 1, \ldots, M+1$, denoted by $\tilde{x} = x_0, x_1, \ldots, x_M, x_{M+1}$. Thus, absorption, scattering, or reflection events happen at $x_1, \ldots, x_M$. The set of all paths of length $M$ is denoted by $\Omega_M$. The path space $\Omega$ is the countable union of the sets $\Omega_M$ over all finite lengths.

A direction is denoted by $\omega \in S^2$, where $S^2$ is the unit sphere in $\mathbb{R}^3$. A unit vector $\omega_{x_m, x_{m+1}}$ is the direction from vertex $x_m$ to vertex $x_{m+1}$ in a path $\tilde{x}$.

Veach^{20} introduced a framework representing the rendering equation in the form of a path integral for scenes without participating media (i.e., no scattering), and later, Pauly et al.^{17} extended it to the volume rendering equation with scattering. The amount of light $I$ observed by the detector is given by the path integral

Here, the bidirectional scattering distribution function $f_s(x_{m-1}, x_m, x_{m+1})$ is used for locations on the surface of objects, and the scattering coefficient $\sigma_s(x_m)$ at $x_m$ and the phase function $f_p(x_{m-1}, x_m, x_{m+1})$ are used for those inside the medium. $G(x_m, x_{m+1})$ is a generalized geometric term.

Putting it all together, we have a path integral in the form of the following infinite sum of all possible path contributions.

Note that all vertices $\{x_m\}$ depend on a path $k$; different paths have different sets of vertices. In the equation above, however, we omit the path index $k$ for simplicity. Later, we will again use $k$ as the path index.

As our target is optical tomography, we restrict the model to deal with the inside of the participating medium. To do so, we assume that the light source $x_0$ and detector $x_{M+1}$ are located on the surface, and the other vertices $x_1, x_2, \ldots, x_M$ are inside the medium, that is, $x_0, x_{M+1} \in \partial V$ and $x_1, \ldots, x_M \in V^0$. Then the transmittance is simplified as

Furthermore, we assume that the observations are ideal and the camera response function is the identity, $W_e(x_M, x_{M+1}) = 1$.

Apart from the assumptions above, we rewrite the geometric term and the differential measure. The definitions above use area measures $dA(x_m)$ and volume measures $dV(x_m)$ along with the squared-distance geometric term;^{17}^{,}^{23}^{,}^{53} however, steradian measures $d\omega(x_m)$ and the identity geometric term are equivalent and also widely used.^{10}^{,}^{12}^{,}^{60}

Therefore, we employ the steradian measures and rewrite it as follows:

Now, Eq. (12) is written as

For numerical computation, we first discretize the medium into voxels of a regular grid, where each voxel has its own extinction coefficient $\sigma_t[b]$ ($b$ is the index of the voxel), as shown in Fig. 1.

**F1 :**

Illustration of a discretization example. (a) Voxelization of the medium into a regular grid of size $5 \times 5$. Voxels are indexed in raster-scan order in this example, from left to right and top to bottom. Each voxel $b$ has extinction coefficient $\sigma_t[b]$. (b) A path segment between vertices $x_1$ and $x_2$. Voxels involved in the segment are shaded. (c) Lengths $d_{12}[b]$ of the involved voxels $b = 2, 3, 8, 9$. Here we write $d_{12}[b]$ instead of $d_{x_1,x_2}[b]$ for simplicity.

With this voxelization, the paths of light are also divided into segments, as explained below. First, we explain the integral [Eq. (11)] along a single segment $(x_m, x_{m+1})$ of a path $\tilde{x}$. It describes the attenuation of light along the segment due to the extinction coefficients of the voxels involved. Because of the discretization of the medium, Eq. (11) can be written as a sum of voxel-wise multiplications.

For the second equality, $b$ is the index into the set $B_{x_m,x_{m+1}}$ of all voxels intersected by segment $(x_m, x_{m+1})$, and $d_{x_m,x_{m+1}}[b]$ is the length of the part of the segment $x_m x_{m+1}$ passing through voxel $b$. This is illustrated in Fig. 1(c). The extinction coefficient $\sigma_t$ is now a piecewise constant function because of the voxelization; the integral then turns into a sum (the idea that this integral can be turned into a sum has been discussed before,^{61} however, not in the context of tomography).

This simplifies the computation; however, the sum over the set $B_{x_m,x_{m+1}}$ is not convenient for implementation and optimization. We propose here to use a vector representation of both extinction coefficients and segment lengths, which gives the third equality of the above equation. The first vector, $\sigma_t$, stores the values of the extinction coefficients $\sigma_t[b]$ of all voxels. This vector can be generated by serializing the voxels on the grid in a certain order. The second vector, $d_{x_m,x_{m+1}}$, contains the values of the lengths $d_{x_m,x_{m+1}}[b]$ for all voxels. We should note that this vector is very sparse: most of the voxels have no intersection with the segment $(x_m, x_{m+1})$. Hence, only a few elements of $d_{x_m,x_{m+1}}$ have nonzero values; the other elements are zero because those voxels $b$ have no intersection and $d_{x_m,x_{m+1}}[b] = 0$.
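As an illustrative sketch (the array layout, raster-scan serialization, and all names here are our own assumptions, not the paper's code), the voxel-wise attenuation sum reduces to a dot product between the serialized extinction vector and the sparse segment-length vector:

```python
import numpy as np

def segment_transmittance(sigma_t, lengths):
    """Attenuation exp(-sum_b sigma_t[b] * d[b]) along one path segment.

    sigma_t -- per-voxel extinction coefficients, serialized grid (1-D array)
    lengths -- same size; lengths[b] is the portion of the segment inside
               voxel b (zero for the many voxels the segment never enters)
    """
    return np.exp(-np.dot(sigma_t, lengths))

# 5x5 grid in raster-scan order; a segment crossing voxels 2, 3, 8, 9
sigma_t = np.full(25, 0.5)
d = np.zeros(25)
d[[2, 3, 8, 9]] = [0.3, 0.2, 0.4, 0.1]   # total length 1.0
T = segment_transmittance(sigma_t, d)
```

Because only a handful of entries of the length vector are nonzero, a sparse vector type could replace the dense array without changing the result.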

This sparsity of the vector facilitates the construction of a whole path $\tilde{x}$ because path segments can be added as follows:

Using this notation to rewrite Eq. (17), we have

As a first attempt, we design a two-dimensional (2-D) layered grid instead of a three-dimensional (3-D) one. Since we voxelize the medium into a regular grid, the 2-D medium consists of parallel layers. Hereafter, a 3-D direction $\omega$ between vertices is written as a 2-D direction $\theta$, and a steradian measure $d\omega$ as an angular measure $d\theta$.

As shown in Fig. 2, we assume a particular layered scattering with the following properties. First, the vertices $x_1, \ldots, x_M$ of path $\tilde{x}$ are located at the centers of voxels. The light source $x_0$ is located on the boundary of the top surface of the voxels in the top layer. Similarly, the detector $x_{M+1}$ is located on the boundary of the bottom surface of the voxels in the bottom layer. Second, the directions $\theta_{x_0,x_1}$ and $\theta_{x_M,x_{M+1}}$ at the beginning and end of a path are perpendicular to the boundary. This means that scattering begins at $x_1$ and ends at $x_M$. Third, forward scattering happens layer by layer. More specifically, light is scattered at the center of a voxel in a layer and then travels to the center of a voxel in the next (lower) layer. Scattering is assumed to happen every time the light traverses a voxel center; even if the next voxel is directly below the current one and the path segment is straight, it is regarded as scattering. Fourth, the scattering coefficient is uniform, $\sigma_s(x) = \sigma_s$.

**F2 :**

Proposed two-dimensional layered model of scattering. This example shows a path $\tilde{x}$ consisting of vertices $x_1, \ldots, x_M$ located at the centers of voxels in a grid with $M$ parallel layers. $x_0$ is a light source located on the top surface, and $x_{M+1}$ is a detector at the bottom. At each vertex, the light scatters to voxels in the next layer; possible scattering directions are indicated by arrows.

By ignoring paths exiting from the sides of the grid, the number of all possible paths is $N^M$, where $M$ is the number of layers and $N$ is the number of voxels in one layer.

We use a Gaussian model $f_p(\theta, \sigma^2)$ as an approximation of the phase function

First, existing phase function models^{10}^{,}^{12}^{,}^{35}^{–}^{38} are designed for 3-D scattering, not 2-D. This means that those functions are normalized by integrals over the unit sphere $S^2$: $\int_{S^2} f_p(\omega)\, d\omega = 1$. Most of the phase functions assume isotropy (rotational symmetry), and hence the function takes the angle $\theta$ as its argument; however, $\int_{-\pi}^{\pi} f_p(\theta)\, d\theta \neq 1$. These functions, therefore, are not adequate for our case.

Second, our assumption of layer-wise forward scattering does not allow scattering to happen backwards or sideways, and the Gaussian model is suitable for this. As shown in Fig. 3, the Gaussian model produces forward-only scattering (no backwards or sideways) over a reasonable range of $\sigma^2$, and it is almost normalized: $\int_{-\pi/2}^{\pi/2} f_p(\theta, \sigma^2)\, d\theta \approx 1$. Other 2-D phase functions exist that are not forward-only. For example, Heino et al.^{62} introduced a 2-D analog of the Henyey-Greenstein phase function,^{35} shown in Fig. 3. Although the parameters are different, the two functions in Fig. 3 have similar shapes. The most important difference is that Heino's function has backward scattering, while our Gaussian model does not. Modeling scattering more realistically than the layer-wise forward scattering introduced here would require Heino's or the Henyey-Greenstein phase function.

**F3 :**

Comparison of two-dimensional phase functions. The upward vertical direction is $\theta = 0$, and the horizontal directions are $\theta = \pm\pi/2$. (a) Gaussian approximated phase functions with $\sigma^2 = 0.1, 0.2, \ldots, 1.0$. The tallest and narrowest shapes correspond to $\sigma^2 = 0.1$, and the shape becomes shorter and rounder for larger values of $\sigma^2$. (b) Heino's two-dimensional analogs^{62} of the Henyey-Greenstein phase function with parameter $g = 0.1, 0.2, \ldots, 1.0$. The tallest and narrowest shapes correspond to $g = 1.0$, and the shape becomes shorter and closer to a hemisphere for smaller values of $g$.

We should note one further simplification in our layer-wise forward scattering model. The angle $\theta_m$ in the phase function is usually defined between $\theta_{x_{m-1},x_m}$ and $\theta_{x_m,x_{m+1}}$, that is, the change of direction caused by the scattering event. Instead of dealing with this exact difference of directions, we use the angle between $\theta_{x_m,x_{m+1}}$ and the vertical (downward) direction for efficiency of computation. This assumption allows us to discretize the Gaussian phase function much more easily. Since $f_p(\theta)$ integrates to (approximately) one, this normalization can be discretized with a sum as follows:

The above equation can be considered as the energy distribution from a voxel in one layer to the voxels in the next layer. For a voxel $b$ at direction $\theta_b$, the value of $f_p(\theta_b, \sigma^2)\Delta\theta_b$ describes what fraction of the energy is scattered to this voxel. Figure 5 shows plots of the values corresponding to two phase functions with different parameters. We can see that, due to forward scattering, most of the energy is concentrated in the voxel just below, and a small part goes to the adjacent voxels.

**F5 :**

(a) The phase functions with parameter $\sigma^2 = 0.2$ (dashed line) and $\sigma^2 = 0.4$ (solid line). Plots of the value $f_p(\theta_b, \sigma^2)\Delta\theta_b$ for each voxel $b$ for (b) $\sigma^2 = 0.2$ and (c) $\sigma^2 = 0.4$. Note that index $b$ is relative to the voxel in the next layer just below the voxel in consideration: the voxel just below is $b = 0$, the voxel on its right is $b = 1$, and that on its left is $b = -1$.
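To make the discretization concrete, here is a small sketch (voxel size, layer spacing, and the angular binning are our own assumptions) that computes $f_p(\theta_b, \sigma^2)\Delta\theta_b$ for the voxels of the next layer:

```python
import math

def discretized_phase(sigma2, offsets, voxel_size=1.0, layer_gap=1.0):
    """Approximate fraction f_p(theta_b, sigma2) * dtheta_b of the energy
    scattered from a voxel center to each voxel b in the next layer.

    offsets -- horizontal voxel offsets b (0 = directly below, +-1 = neighbors)
    """
    weights = []
    for b in offsets:
        theta = math.atan2(b * voxel_size, layer_gap)        # angle from vertical
        # angular width of the bin subtended by voxel b
        t_lo = math.atan2((b - 0.5) * voxel_size, layer_gap)
        t_hi = math.atan2((b + 0.5) * voxel_size, layer_gap)
        fp = math.exp(-theta**2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        weights.append(fp * (t_hi - t_lo))
    return weights

w = discretized_phase(0.4, range(-3, 4))
# most of the energy lands in the voxel just below (offset b = 0)
```

The weights are symmetric about $b = 0$ and sum to slightly less than one, consistent with the near-normalization of the truncated Gaussian discussed above.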

The contribution $H_k$ in Eq. (22) now needs to be rewritten so that it deals with the Gaussian phase function and the discretized energy distribution discussed above. First, we reorder the measure

Note that the factor $dA(x_0)\,\Delta\theta_{x_0,x_1}\,\sigma_s^M$ is common to all paths because we assumed a uniform grid so that $dA(x_0)$ is constant, the direction $\theta_{x_0,x_1}$ (or $\omega_{x_0,x_1}$) is perpendicular to the top surface, and $\sigma_s$ is constant.

Suppose the 2-D layered medium is an $M \times N$ grid; it has $M$ layers, each of which is made of $N$ voxels. We now construct an observation model of the light transport between a light source and a detector: emitting light into each of the voxels of the top layer, and capturing light from each voxel of the bottom layer. More specifically, let $i \in B_1$ and $j \in B_M$ be the voxel indices of the light source and detector locations, respectively. By restricting the light paths to only those connecting $i$ and $j$, the observed light $I_{ij}$ is written as follows:

In the above equation, $k$ indexes the light paths, which share the same $i$ and $j$. Due to the layered scattering model in the $M \times N$ grid, the number of different paths between $i$ and $j$ is $N_{ij} = N^{M-2}$. This is, however, too large even for small $N$ and $M$, e.g., $N = M = 10$. Therefore, we exclude paths having small contributions from the computation. This is done by simple thresholding while computing $H_{ijk}$, as shown in Algorithm 1. This results in generating fewer paths, $N_{ij} \le N^{M-2}$. For example, there are $N_{ij} = 742$ paths for $N = M = 20$ with $\sigma^2 = 0.4$ when $\mathrm{th} = 0.001$, which enables us to reduce the computation cost.

Input: Threshold $\mathrm{th}$, path $\tilde{x} = x_0, \ldots, x_{M+1}$.
Output: Contribution $H_{ijk}$.

1 $H_{ijk} = 1$;
2 for $m = 1$ to $M$ do
3 &nbsp;&nbsp;$H_{ijk} = H_{ijk}\, f_p(\theta_{x_m,x_{m+1}}, \sigma^2)\, \Delta\theta_{x_m,x_{m+1}}$;
4 &nbsp;&nbsp;if $H_{ijk} \le \mathrm{th}$ then
5 &nbsp;&nbsp;&nbsp;&nbsp;stop;
6 &nbsp;&nbsp;&nbsp;&nbsp;omit this path;
7 accept this path;
8 return $H_{ijk}$;
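In code, Algorithm 1 amounts to an early-exit product; this sketch (names and the example weights are ours) assumes the per-segment weights $f_p(\theta)\Delta\theta$ have already been computed:

```python
def path_contribution(segment_weights, th=0.001):
    """Algorithm 1: accumulate a path's contribution H_ijk as the product of
    per-segment phase-function weights f_p(theta)*dtheta, pruning the path
    as soon as the running product falls below the threshold th.

    Returns the contribution, or None if the path is omitted.
    """
    H = 1.0
    for w in segment_weights:
        H *= w
        if H <= th:
            return None       # stop and omit this path
    return H                  # accept this path

H = path_contribution([0.6, 0.5, 0.9])    # 0.27, above the threshold
pruned = path_contribution([0.6, 0.001])  # drops below th -> None
```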

Next, we propose a method for the inverse problem of the forward model [Eq. (28)] to estimate the extinction coefficients of the 2-D layered model. As we mentioned before, we fix the light paths and assume that the scattering coefficients and parameters of the Gaussian phase function are uniform and known in advance.

In the $M \times N$ 2-D layered medium described in Sec. 3.6, we had assumed a configuration of a light source and detector similar to the left-most one shown in Fig. 6; the light source is located above the medium and the detector is below, and the observed light is $I_{ij}$, where $i, j$ are the voxel indices of the light source and detector locations. By sliding the light source and the detector, we can obtain $N^2$ observations, resulting in the following least-squares equation:

Furthermore, as shown in Fig. 6, we have four configurations of light sources and detectors by changing their positions. This gives us four different sets of observations $I_{ij}$ and paths. These four different sets lead to four objective functions ($f_{\mathrm{T2B}}$, $f_{\mathrm{L2R}}$, $f_{\mathrm{B2T}}$, $f_{\mathrm{R2L}}$), as shown in Fig. 6. Since the four objective functions share the same variables $\sigma_t$, we can use all of them at the same time by adding them to form a new single function $f_0$, at the expense of a fourfold computation cost.
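A sketch of how one such least-squares objective (for a single source-detector configuration) could be assembled; the dictionary layout and the per-path term $H_{ijk}\exp(-d_{ijk}\cdot\sigma_t)$, following the forward model described above, are our own illustrative choices:

```python
import numpy as np

def objective(sigma_t, observations, paths):
    """Least-squares data term: sum over (i, j) source/detector pairs of
    (I_ij - sum_k H_ijk * exp(-d_ijk . sigma_t))^2, where d_ijk is the
    sparse per-voxel length vector of path k between source i and detector j."""
    f0 = 0.0
    for (i, j), I_obs in observations.items():
        I_pred = sum(H * np.exp(-d @ sigma_t) for H, d in paths[(i, j)])
        f0 += (I_obs - I_pred) ** 2
    return f0

# one source/detector pair, one path through two voxels
sigma_t = np.array([0.5, 0.5])
d = np.array([1.0, 1.0])                 # per-voxel segment lengths
observations = {(0, 0): np.exp(-1.0)}    # synthetic measurement
paths = {(0, 0): [(1.0, d)]}             # list of (H_ijk, d_ijk) per pair
f0 = objective(sigma_t, observations, paths)  # 0: model matches the data
```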

Since the inverse problem [Eq. (31)] is nonlinear, we employ an interior point method,^{26} an iterative optimization algorithm for problems with constraints. Here, we first review several key points in optimization; then we will develop an algorithm to solve Eq. (31) along with the required first- and second-order derivatives of the cost function.

First, we review optimization without constraints, which is used inside the interior point method. The general form of unconstrained optimization is

To solve it, an iterative procedure begins with an initial guess $\sigma_t^0$ and generates a sequence $\{\sigma_t^k\}_{k=0}^{\infty}$. It stops when the change between successive solutions is small enough. Information about the function $f$ at $\sigma_t^k$, or even the previous estimates $\sigma_t^0, \sigma_t^1, \ldots, \sigma_t^{k-1}$, is used to calculate a direction $p_k$ to move in with a step size $\alpha_k$. A line search is often used to determine the step size by searching along the direction starting from $\sigma_t^k$ for the $\sigma_t^{k+1}$ with the least value of the objective function

Once we find the step size, the estimate is updated as $\sigma_t^{k+1} \leftarrow \sigma_t^k + \alpha_k p_k$. For Newton's method, the direction is $p_k = -B_k \nabla f(\sigma_t^k)$, where $B_k = [\nabla^2 f(\sigma_t^k)]^{-1}$ is the inverse of the Hessian.

Newton's method is well known for its second-order convergence and accuracy. However, when the dimension of the problem is large, calculating the Hessian and its inverse is computationally expensive. Therefore, Quasi-Newton methods are often used, where the inverse Hessian is updated by incremental approximations in order to reduce the computation cost. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update rules are well known.^{63}

When the conditions $y^T s > 0$ and $B_0 \succ 0$ (where $\succ 0$ means positive definite) are satisfied, the BFGS update guarantees the positive definiteness of $B_k$. Algorithm 2 shows the Quasi-Newton method.

Input: A feasible initial solution $\sigma_t^0$, and $B_0 \succ 0$.
Result: An estimate $\sigma_t^\star$.

1 repeat
2 &nbsp;&nbsp;Compute the Quasi-Newton direction: $p_k = -B_k \nabla f(\sigma_t^k)$.
3 &nbsp;&nbsp;Find step length $\alpha_k$ with line search.
4 &nbsp;&nbsp;Update estimate $\sigma_t^{k+1} \leftarrow \sigma_t^k + \alpha_k p_k$.
5 &nbsp;&nbsp;Update $B_k$ with BFGS.
6 until convergence;
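A compact sketch of Algorithm 2 on a toy quadratic objective (the test function, the backtracking rule, and the tolerances are our own stand-ins, not the paper's settings):

```python
import numpy as np

def quasi_newton(f, grad, x0, tol=1e-8, max_iter=100):
    """Algorithm 2 sketch: quasi-Newton minimization with the BFGS update
    of the inverse Hessian approximation B_k and a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    B = np.eye(n)                           # B_0 > 0 (identity)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -B @ g                          # quasi-Newton direction
        alpha = 1.0
        while f(x + alpha * p) > f(x):      # backtracking line search
            alpha *= 0.5
        x_new = x + alpha * p
        s, y = x_new - x, grad(x_new) - g
        if y @ s > 0:                       # BFGS update keeps B pos. definite
            rho = 1.0 / (y @ s)
            V = np.eye(n) - rho * np.outer(s, y)
            B = V @ B @ V.T + rho * np.outer(s, s)
        x = x_new
    return x

# toy quadratic: f(x) = (x0 - 1)^2 + 2 (x1 + 2)^2, minimum at (1, -2)
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])
x_star = quasi_newton(f, grad, [0.0, 0.0])
```

Because $B_k$ stays positive definite, $p_k$ is always a descent direction, so the simple halving line search terminates.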

Here we introduce a constrained optimization problem with inequality constraints of the form

The idea is to approximate it as an unconstrained problem. Using Lagrange multipliers, we can first rewrite Eq. (37) as

The barrier method^{26} is an interior point method that introduces a logarithmic barrier function to approximate the indicator function $I$ as follows:

The barrier method solves Eq. (42) iteratively by increasing the parameter $t$. In the limit $t \to \infty$, the above problem coincides with the original problem [Eq. (38)].

Algorithm 3 shows our algorithm, which uses a barrier method with Quasi-Newton for solving the inverse problem. We should mention the following parts where we have modified the original algorithm.^{26}

Data: Parameters $\mu > 1$, $\epsilon > 0$, and $t = t_{\mathrm{init}} > 0$.
Input: A feasible initial estimate $\sigma_t^0$, and $B \succ 0$.
Result: An estimate $\sigma_t^\star$.

1 while $(2MN/t) \ge \epsilon$ do // outer loop: barrier method
2 &nbsp;&nbsp;$t \leftarrow \mu t$.
3 &nbsp;&nbsp;Set a log-barriered cost function $f(t) = t f_0 - \sum_b \{\log(\sigma_t[b]) + \log(u - \sigma_t[b])\}$.
4 &nbsp;&nbsp;$k \leftarrow 0$, $B_k \leftarrow B$, $\sigma_t^k \leftarrow \sigma_t$.
5 &nbsp;&nbsp;repeat // inner loop: Quasi-Newton
6 &nbsp;&nbsp;&nbsp;&nbsp;Compute the Quasi-Newton direction: $p_k = -B_k \nabla f(\sigma_t^k)$.
7 &nbsp;&nbsp;&nbsp;&nbsp;Find step length $\alpha_k$ with line search.
8 &nbsp;&nbsp;&nbsp;&nbsp;while $\sigma_t^k + \alpha_k p_k$ is not feasible do
9 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Halve the step size: $\alpha_k \leftarrow \alpha_k / 2$.
10 &nbsp;&nbsp;&nbsp;&nbsp;Update estimate $\sigma_t^{k+1} \leftarrow \sigma_t^k + \alpha_k p_k$.
11 &nbsp;&nbsp;&nbsp;&nbsp;$s = \sigma_t^{k+1} - \sigma_t^k$.
12 &nbsp;&nbsp;&nbsp;&nbsp;$y = \nabla f(\sigma_t^{k+1}) - \nabla f(\sigma_t^k)$.
13 &nbsp;&nbsp;&nbsp;&nbsp;if $y^T s > 0$ then
14 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Update $B_{k+1}$ with BFGS [Eq. (36)].
15 &nbsp;&nbsp;&nbsp;&nbsp;else
16 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Reset $B_{k+1} \leftarrow (y^T s / y^T y) I$.
17 &nbsp;&nbsp;&nbsp;&nbsp;$k \leftarrow k + 1$.
18 &nbsp;&nbsp;until $\frac{1}{2}\nabla f(\sigma_t^{k+1})^T B_{k+1} \nabla f(\sigma_t^{k+1}) \le \epsilon$;
19 &nbsp;&nbsp;$B \leftarrow B_{k+1}$, $\sigma_t \leftarrow \sigma_t^k$.

Warm start: For each inner loop, the Quasi-Newton method needs an initial guess of the inverse Hessian $B_0$. Instead of fixing $B_0$ for every inner loop, we reuse the $B_k$ from the last inner loop to accelerate convergence (lines 4 and 19 in Algorithm 3).

Checking feasibility: Since the Quasi-Newton method and line search operate without constraints, the next estimate $\sigma_t^{k+1}$ may violate the constraints; in our case, each element $\sigma_t^{k+1}[b]$ of $\sigma_t^{k+1}$ must lie inside $[0, u]$ after the step size has been determined. Therefore, in line 8, we check the feasibility of the estimate $\sigma_t^{k+1}$ for the current step size $\alpha_k$. If it exceeds the boundary of the feasible region, we pull the estimate back by halving the step size; if it is still outside the feasible region, the step size is halved again. Why do we not simply set the step size so that $\sigma_t^{k+1}$ lies exactly on the boundary? The reason is the log barrier: if $\sigma_t^{k+1}$ is on the boundary, in other words, $\sigma_t^{k+1}[b]$ is either 0 or $u$, then $\log(\sigma_t[b])$ or $\log(u - \sigma_t[b])$ diverges, which results in numerical instability. Therefore, the procedure described above is needed.

Checking for positive definiteness: The BFGS update rules guarantee that $B_k$ remains positive definite if $y^T s > 0$ and $B \succ 0$ are satisfied. While the latter is ensured by an appropriate initial guess, the former depends on the updates at each iteration. If it is not satisfied, the BFGS update is no longer valid, and we reset the inverse Hessian $B_k$ to a scaled identity^{63} at line 16.
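The outer/inner structure and the feasibility halving of Algorithm 3 can be sketched on a one-dimensional toy problem; the objective, the plain gradient-descent inner loop (standing in for the Quasi-Newton step), and all parameter values below are our own assumptions:

```python
import numpy as np

def barrier_method(f0, grad_f0, x0, u, mu=10.0, t_init=1.0, eps=1e-6):
    """Log-barrier interior point method: minimize f0(x) subject to
    0 < x[b] < u, via a sequence of unconstrained problems
    F_t(x) = t*f0(x) - sum_b [log(x[b]) + log(u - x[b])] with increasing t."""
    x = np.asarray(x0, dtype=float)
    m = 2 * len(x)                          # number of inequality constraints
    t = t_init
    while m / t >= eps:                     # outer loop: barrier method
        t *= mu
        F = lambda z: t * f0(z) - np.sum(np.log(z) + np.log(u - z))
        g = lambda z: t * grad_f0(z) - 1.0 / z + 1.0 / (u - z)
        for _ in range(500):                # inner loop (gradient descent
            p = -g(x)                       #  stands in for Quasi-Newton)
            alpha = 1.0
            x_new = x + alpha * p
            while np.any(x_new <= 0) or np.any(x_new >= u):
                alpha /= 2                  # pull the iterate back into the
                x_new = x + alpha * p       # feasible region (line 9)
            while F(x_new) > F(x):          # backtracking line search
                alpha /= 2
                x_new = x + alpha * p
            if np.max(np.abs(x_new - x)) < 1e-12:
                x = x_new
                break
            x = x_new
    return x

# toy problem: minimize (x - 2)^2 subject to 0 <= x <= 1; the unconstrained
# minimum x = 2 is infeasible, so the estimate approaches the boundary x = 1
f0 = lambda x: np.sum((x - 2.0) ** 2)
grad_f0 = lambda x: 2.0 * (x - 2.0)
x_star = barrier_method(f0, grad_f0, np.array([0.5]), u=1.0)
```

Halving the step size, rather than snapping onto the boundary, keeps the log terms finite, mirroring the feasibility check discussed above.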

Here, we present the Jacobian of the objective function $f_0$ in Eq. (29). Note that the objective function $f_0$ in Eq. (31) can be derived in the same manner.

We first rewrite the objective function $f_0$ as follows:

To simplify the equation, we use the following notation:

Now, $f_0$ and the gradient can be represented as
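Since each path contribution has the form $H_{ijk}\exp(-d_{ijk}\cdot\sigma_t)$, its derivative with respect to $\sigma_t[b]$ is simply $-d_{ijk}[b]$ times the contribution, so the gradient of the data term is a weighted sum of path-length vectors. A hedged sketch (the data layout and names are our own):

```python
import numpy as np

def objective_and_gradient(sigma_t, observations, paths):
    """Evaluate f0 and its gradient. Each path contributes
    H * exp(-d . sigma_t); d/dsigma_t[b] of that term is -d[b] times the
    contribution, so the gradient accumulates residual-weighted length vectors."""
    f0 = 0.0
    g = np.zeros_like(sigma_t)
    for (i, j), I_obs in observations.items():
        contribs = [(H * np.exp(-d @ sigma_t), d) for H, d in paths[(i, j)]]
        I_pred = sum(c for c, _ in contribs)
        r = I_pred - I_obs                          # residual
        f0 += r ** 2
        g += 2 * r * sum(-c * d for c, d in contribs)
    return f0, g

# one source/detector pair, one path through two voxels
sigma_t = np.array([0.5, 0.5])
d = np.array([1.0, 1.0])
obs = {(0, 0): 0.0}
paths = {(0, 0): [(1.0, d)]}
f0, g = objective_and_gradient(sigma_t, obs, paths)
```

An analytic gradient of this kind is what makes the forward model suitable for the Quasi-Newton solver described in Sec. 4.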