Over the past several years, machine learning and optimization have been hot topics. It appears to me that they also have a number of connections with communication theory and statistical inference. In fact, machine learning, statistical inference, and communication theory all address a situation in which a source is mixed with additional degrees of freedom in a process and is recovered at the destination. They can all be formulated as optimization problems.
Consider the generic equation, possibly over vectors:
\(g\left[h\left[z,\, x\right]\right] \;=\; \left\langle w,\, y\right\rangle\)
In statistical inference, \(x\) is usually a boolean value, namely \(0\) or \(1\), standing for the null hypothesis and the alternative hypothesis, and \(y\) is the estimated hypothesis. Moreover, \(z\) is the event space, \(h\) the sample, and \(g\) the estimator. A large-sample property guarantees the success of estimation.
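As a minimal sketch of this setting (a toy example of my own, with Gaussian noise and a threshold estimator as illustrative assumptions): \(h\) draws noisy samples whose mean is the true hypothesis \(x \in \{0, 1\}\), and \(g\) thresholds the sample mean.

```python
import random

def sample(x, n, rng):
    # h[z, x]: n noisy observations with mean x (x is 0 or 1)
    return [x + rng.gauss(0.0, 1.0) for _ in range(n)]

def estimate(obs):
    # g: threshold the sample mean at 1/2 to pick a hypothesis y
    return 1 if sum(obs) / len(obs) > 0.5 else 0

rng = random.Random(0)
# with a large sample, the estimate recovers the true hypothesis
print(estimate(sample(1, 10_000, rng)))  # 1 with high probability
```

With \(n = 10{,}000\) samples the standard error of the mean is about \(0.01\), so the threshold test essentially never fails: a large-sample property at work.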
In machine learning, \(x\) is the collection of parameters of the hypothesis function, and \(y\) the estimated collection of parameters. Moreover, \(z\) is the event space, \(h\) the data, and \(g\) the learning method. A large-sample property guarantees the success of estimation. This is more general than the case of statistical inference.
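A corresponding sketch (again my own toy example, with a linear hypothesis function and ordinary least squares as assumed choices): \(x\) is a pair of true coefficients, \(h\) generates noisy data from them, and \(g\) is the least-squares fit that outputs the estimate \(y\).

```python
import random

def data(x, n, rng):
    # h[z, x]: pairs (z_i, x0 + x1*z_i + noise) for random inputs z_i
    zs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(z, x[0] + x[1] * z + rng.gauss(0.0, 0.1)) for z in zs]

def fit(pairs):
    # g: ordinary least squares for the model y = a + b*z
    n = len(pairs)
    sz = sum(z for z, _ in pairs)
    sy = sum(y for _, y in pairs)
    szz = sum(z * z for z, _ in pairs)
    szy = sum(z * y for z, y in pairs)
    b = (n * szy - sz * sy) / (n * szz - sz * sz)
    a = (sy - b * sz) / n
    return a, b

rng = random.Random(0)
a, b = fit(data((2.0, -3.0), 5_000, rng))
print(round(a, 1), round(b, 1))  # recovers roughly 2.0 and -3.0
```

Here the estimated parameters converge to the true ones as the sample grows, which is exactly the sense in which this generalizes the binary case above.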
Meanwhile, in communication theory, \(x\) is the transmitted signal, \(y\) the received signal, and \(z\) the noise. Here the situation is more complicated: in addition to \(z\), the large-sample properties of \(x\) and \(y\) are exploited too. In fact, \(x\) is not only taken to be identically distributed per channel use but is further encoded before being fed to the channel; similarly, \(y\) is further decoded after being output from the channel.
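The encode-channel-decode pipeline can be sketched with the simplest possible code (my own illustrative choice, not a claim about practical systems): a repetition code over a binary symmetric channel, with majority-vote decoding.

```python
import random

def encode(bit, n):
    # the source bit x is repeated n times before entering the channel
    return [bit] * n

def channel(bits, rng, flip=0.1):
    # z: each transmitted bit flips independently with probability `flip`
    return [b ^ (rng.random() < flip) for b in bits]

def decode(received):
    # y: majority vote over the n noisy copies
    return 1 if sum(received) > len(received) / 2 else 0

rng = random.Random(0)
print(decode(channel(encode(1, 101), rng)))  # 1 with overwhelming probability
```

The large-sample property here lives in the blocklength: as the number of repetitions grows, the probability that the majority vote errs vanishes, at the cost of rate.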
These settings can all be formulated as optimization problems. Indeed, let \(x\) be the collection of parameters and \(z\) the sample points. Then \(h\) is the target function over \(x\), and \(g\) the optimization algorithm producing \(y\). A large-sample property guarantees the success of optimization, regardless of the chosen \(z\).
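To make the optimization view concrete (a deliberately tiny example; the quadratic loss and plain gradient descent are my assumed choices): \(h\) is an empirical loss over the samples \(z\), and \(g\) is an iterative algorithm that drives the parameter toward the minimizer.

```python
def target(x, samples):
    # h: empirical squared loss of the scalar parameter x over sample points z
    return sum((x - z) ** 2 for z in samples) / len(samples)

def optimize(samples, steps=200, lr=0.1):
    # g: plain gradient descent; the minimizer here is the sample mean
    x = 0.0
    for _ in range(steps):
        grad = sum(2 * (x - z) for z in samples) / len(samples)
        x -= lr * grad
    return x

print(round(optimize([1.0, 2.0, 3.0]), 3))  # → 2.0
```

The estimator, the learner, and the decoder of the previous paragraphs can each be read as one such \(g\), differing only in which loss \(h\) they minimize.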
❧ September 20, 2021