$$f(x,y) = u_c(x,y) + \eta(x,y)$$

where $\eta(x,y)$ is a random variable drawn from a Gaussian distribution with mean zero and standard deviation $\sigma$. Thus, the image denoising problem is the inverse problem of recovering the clean image $u_c$ given the noisy image $f$ and some statistics on the noise.
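As a quick illustration, the degradation model above can be simulated in a few lines of NumPy; the patch size, constant "image," and noise level $\sigma = 0.1$ below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(u_c, sigma):
    """Simulate the degradation model f = u_c + eta, with eta ~ N(0, sigma^2)."""
    return u_c + rng.normal(0.0, sigma, size=u_c.shape)

# Toy "clean image": a constant 64x64 patch.
u_c = np.full((64, 64), 0.5)
f = add_gaussian_noise(u_c, sigma=0.1)
```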

One of the most celebrated and widely used image denoising models is the total variation (TV) denoising model of Rudin, Osher, and Fatemi (ROF). The model takes the form of a functional minimization:

$$\min_{u} \left\{ J[u] = \frac{1}{2}\int_{\Omega} (f-u)^2\,dx + \lambda \int_{\Omega} |\nabla u| \right\}$$

where $\Omega$ is the image domain (a rectangle) and the total variation semi-norm $\int |\nabla u|$ is defined in the distributional sense:

$$TV(u) = \int_{\Omega}|\nabla u| = \sup \left\{\int_{\Omega} u(x)\, \nabla \cdot \xi(x) \, dx \ \Big| \ \xi \in C_c^1(\Omega, \mathbf{R}^n), \ \|\xi\|_{\infty} \leq 1\right\}.$$

The model is a balance between data fitting and regularity where the parameter $\lambda$ controls this tradeoff.

There are numerous ways to minimize the TV model above, but one of the simplest is gradient descent:

$$u_t = -\nabla J[u] = (f-u) + \lambda\, \nabla \cdot \frac{\nabla u}{|\nabla u|}$$

where $\nabla J[u]$ denotes the functional gradient (more on this in later blog posts!). Intermediate results for increasing values of $t$ from the gradient descent to minimize the TV model are shown below. Note how the noise is removed as the iterates approach a minimum of the functional $J[u]$.
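A minimal NumPy sketch of this gradient descent follows. The small constant `eps` regularizing $|\nabla u|$ (to avoid division by zero in flat regions), the finite-difference discretization, and the parameter defaults are my own assumptions for the sketch, not part of the original ROF formulation:

```python
import numpy as np

def tv_gradient_descent(f, lam=0.2, dt=0.1, eps=1e-6, n_iter=200):
    """Explicit gradient descent on the TV model:
    u_t = (f - u) + lam * div(grad u / |grad u|),
    with |grad u| smoothed by eps (an assumption of this sketch)."""
    u = f.copy()
    for _ in range(n_iter):
        # Forward differences for the gradient (Neumann boundary via edge padding).
        ux = np.diff(np.pad(u, ((0, 0), (0, 1)), mode='edge'), axis=1)
        uy = np.diff(np.pad(u, ((0, 1), (0, 0)), mode='edge'), axis=0)
        mag = np.sqrt(ux**2 + uy**2 + eps)
        # Backward differences for the divergence (adjoint of the forward gradient).
        div = (np.diff(np.pad(ux / mag, ((0, 0), (1, 0))), axis=1)
               + np.diff(np.pad(uy / mag, ((1, 0), (0, 0))), axis=0))
        u = u + dt * ((f - u) + lam * div)
    return u
```

Forward differences for the gradient paired with backward differences for the divergence make the discrete divergence (up to sign) the adjoint of the discrete gradient, a common choice for this scheme.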

Lots of fun conversations and stories at dinner as well. Looking forward to next year!

Here's a photo from last year's induction ceremony and talk (2015). Our invited speaker was Alice Silverberg from UCI.

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

Newton's method applied to the function $f(x) = x^3 + 5$ is seen below using the starting point $x_0 = -5$. Within 7 iterations, $|f(x_7)|<10^{-15}$.
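The iteration above is easy to sketch in a few lines of Python; the function and starting point match the example, and the loop count is fixed at 7 as described:

```python
def newton(f, fprime, x0, n_iter=7):
    """Newton's method: x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / fprime(x)
    return x

# f(x) = x^3 + 5 with starting point x0 = -5, as in the example above.
root = newton(lambda x: x**3 + 5, lambda x: 3 * x**2, -5.0)
```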

Let $F(\mathbf{x})$ be a function defined and differentiable on some domain $\Omega \subseteq \mathbb{R}^n$. Then, gradient descent is the following iterative scheme:

$$\mathbf{x}^{n+1} = \mathbf{x}^n - \delta \nabla F(\mathbf{x}^n) \qquad (1)$$

where $\delta$ is a step-size parameter and $\nabla F(\mathbf{x})$ denotes the gradient of $F$. From multivariable calculus, we know that for a function of two variables $z=F(x,y)$, with graph in $\mathbb{R}^3$, the gradient vector points in the direction of steepest ascent of $F$. Thus, $-\nabla F$ points in the steepest descent direction; this is the key premise of gradient descent. The method is local in nature: for a sufficiently well-behaved function ($\nabla F$ Lipschitz continuous) and a small enough step size $\delta$, each iteration decreases $F$, and the iterates converge to a stationary point. Notable caveats include slow convergence near minima (the convergence rate is linear at best) and poor performance when the Hessian matrix is ill-conditioned. An example of slow convergence due to a zig-zagging phenomenon is observed below.
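To see scheme (1) and the zig-zagging in code, here is a minimal sketch on a hypothetical ill-conditioned quadratic $F(x,y) = x^2 + 10y^2$; the function, starting point, and parameter values are my own choices for illustration:

```python
import numpy as np

def gradient_descent(grad_F, x0, delta=0.09, n_iter=200):
    """Scheme (1): x^{n+1} = x^n - delta * grad F(x^n); returns the final
    iterate and the full path of iterates."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_iter):
        x = x - delta * grad_F(x)
        path.append(x.copy())
    return x, np.array(path)

# Ill-conditioned quadratic F(x, y) = x^2 + 10*y^2: the iterates overshoot
# back and forth across the narrow y-direction while creeping along x.
grad_F = lambda v: np.array([2 * v[0], 20 * v[1]])
x_min, path = gradient_descent(grad_F, [5.0, 1.0])
```

With $\delta = 0.09$ the $y$-coordinate flips sign every step (the zig-zag) while decaying slowly, because $\delta$ sits just below the stability limit $2/20$ of the stiff direction.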