seanhunter 5 hours ago

Hmm. I'm not an expert, but some of this definitely seems inaccurate. Some of the "Bullshit" turns out, perhaps, to be quite important.

Take the statement:

> Markov Chain is essentially a fancy name for a random walk on a graph

Is that really true? I definitely don't think so. To my understanding, a Markov process is a stochastic process that has the additional (aka "Markov") property that it is "memoryless". That is, to estimate the next state you only need to know the state now, not any of the history. It becomes a Markov chain if it is a discrete, rather than continuous process.

There are lots of random walks on graphs that satisfy this definition. Like say you have a graph and you specify the endpoints and ask "walk 5 nodes at random from this starting node; what is the probability that you end up at a specific end node?". This could be a Markov process: at any point, to estimate the next state you only need to know the state now.

But lots of random walks on graphs do not have the Markov property. For example, say I do the exact same thing as in the previous example, so I have a graph and a start and target node, and I ask "Walk n nodes at random from the starting node. What's the probability that at some point you visit the target node?". Now I have introduced a dependency on the history and my process is no longer memoryless. It is a discrete stochastic process and it is a random walk on a graph, but it is not a Markov chain.
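
A quick simulation sketch of the difference (the toy graph and numbers are mine, not from the article): the first question is a function only of the final state, the second is a function of the whole path.

    import random

    # Toy graph as an adjacency list (illustrative).
    G = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}

    def walk(start, steps):
        # Uniform random walk; returns the full path.
        path = [start]
        for _ in range(steps):
            path.append(random.choice(G[path[-1]]))
        return path

    N = 100_000
    # "End at node 4 after 5 steps": depends only on the final state.
    p_end = sum(walk(0, 5)[-1] == 4 for _ in range(N)) / N
    # "Visit node 4 at some point": depends on the whole history.
    p_visit = sum(4 in walk(0, 5) for _ in range(N)) / N
    print(p_end, p_visit)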

An example of Markov and non-Markov processes in real life: if I have a European option on a stock, I only care about what the stock price is at the expiry date. But if I have a barrier option, or my option has some knock-in/knock-out/autocallable features, then it has a path dependence, because I care about whether the price hit the barrier level at any point in its trajectory, not just the price at the end. So the pricing process for the barrier option is non-Markov.

  • low_tech_love 4 hours ago

    I won’t pretend to know the technical details (as the other replies do), but I want to make a point for the “pedagogical” effect here, on which I agree with the author. The way I interpret the article, it’s not supposed to be a deep theoretical treatise on the subject; it’s more of an introductory, “intuitive” take. This works for those who need either to learn the concept to begin with, or to refresh their memories if they don’t work with it every day. I think it’s a given that any intuitive take on a mathematical concept will always oversimplify things, with the underlying assumption that, if you actually need to know more, you’re going to have to dive deeper somewhere else. The most important thing, I think, is to help the reader build a rough conceptual understanding of the concept such that they can reason about it, instead of simply memorizing the terms.

    • majoe 2 hours ago

      I see where you're coming from, but in this particular case the "intuitive" explanation (a walk on a graph) is far less intuitive for me than the proper explanation, that a Markov process is memoryless. That said, I used MCMC in the past to do physics simulations, where the Markov property also applies to the underlying physical process. So maybe it's just me.

  • NotAnOtter 5 hours ago

    Your overall point might be correct, but your example does not prove it.

    A Markov chain is just the path taken through the course of a Markov process. The terms 'chain' and 'process' are sometimes conflated in this context, but this is the most common distinction I've seen. As such, you can run a Markov process for some number of steps N times, and then ask how many of the generated chains contain the property you are interested in. The process is memoryless, but the chain is the result of the process and therefore contains memory.

    I agree 'random walk' is a superset of 'Markov process', but IMO when someone says random walk, they normally make assumptions that qualify it as a Markov chain. Therefore it's useful as a teaching device to just call it a random walk.

    • seanhunter 5 hours ago

      Interesting. I'd not heard the term Markov Chain used to describe a path so I checked my copy of "All of Statistics" by Larry Wasserman and he (slightly) disagrees with both of us and says

          "A Markov chain is a stochastic process for which the distribution of X_t depends only on X_{t-1}."
      
      So he doesn't need it to be a discrete process but he also doesn't think it's a path. I guess the terminology is not 100% standardized. But anyhow thanks for making me think about this. Always interesting.

      • JohnKemeny 3 hours ago

        But a random walk is precisely a stochastic process for which the _next state_ depends only on the _current state_. In terms of graphs (where _random walk_ comes from), the next node is decided by randomly selecting a neighbor of the current node.

        • seanhunter 2 hours ago

          I don't think that's true for random walks in general. A random walk is a process derived from taking random steps in some mathematical space. It can include jumps and it can include memory.

          Take a self-avoiding random walk. At time t the distribution of the next step depends on which of the current node's neighbors the walk has already visited. That's not the current state, that's memory.
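
          A rough sketch of that step rule (G is an illustrative adjacency-list dict):

              import random

              def self_avoiding_walk(G, start, max_steps):
                  # The set of allowed next steps depends on the
                  # whole history, not just the current node.
                  path = [start]
                  for _ in range(max_steps):
                      choices = [v for v in G[path[-1]]
                                 if v not in path]
                      if not choices:   # walker is trapped
                          break
                      path.append(random.choice(choices))
                  return path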

          • vladimirralev 3 minutes ago

            But also "the memory" of the random walk can be encoded in the state itself. In your example you can just keep accumulating the visited nodes, so your state space becomes the space of tuples of nodes from your initial space.
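
            For instance, a sketch of the self-avoiding step on the enlarged state space (G again an illustrative adjacency-list dict):

                import random

                def step(G, state):
                    # State = (current node, frozenset of visited
                    # nodes). The next state depends only on this
                    # pair, so the augmented process is Markov.
                    node, visited = state
                    choices = [v for v in G[node]
                               if v not in visited]
                    if not choices:
                        return state   # trapped: absorbing state
                    nxt = random.choice(choices)
                    return (nxt, visited | {nxt})

                # start from (v0, frozenset({v0})) and iterate step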

          • JohnKemeny 21 minutes ago

            A random walk on a graph is a stochastic process that starts at a vertex and, at each step, moves to a randomly chosen neighboring vertex. Formally:

            Given a graph, a random walk is a sequence of vertices [v_1, v_2, ..., v_k] such that each v_{i+1} is selected uniformly at random from the neighbors of v_i.

            In weighted graphs, the next vertex is chosen with probability proportional to edge weights.
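
            A minimal sketch of both rules (G an adjacency list, W a dict of neighbor-to-weight dicts; both illustrative):

                import random

                def next_vertex(G, v):
                    # Unweighted: uniform over the neighbors of v.
                    return random.choice(G[v])

                def next_vertex_weighted(W, v):
                    # Weighted: probability proportional to the
                    # edge weight.
                    nbrs, wts = zip(*W[v].items())
                    return random.choices(nbrs, weights=wts)[0]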

      • jmaker 3 hours ago

        A Markov chain is commonly understood to be a time-discrete Markov process. Intuitively, it’s a “chain” because you can “single out” its states at points in time rather than over intervals. That’s also Markov’s original definition. Instead of Wasserman one can look up the notion in Wikipedia. A path is a notion relevant to the state space of a process: it’s a realization of states at every single point in time for the time span of interest.

  • bjornsing 4 hours ago

    Not all Markov processes have stationary distributions, and of those that do, not all have a stationary distribution corresponding to a given non-normalized probability function.

    It therefore has some merit to think about MCMC as a random walk on a graph rather than as a Markov process, because the “graph” needs to have some properties in order for the Markov process to be useful for MCMC. For example, every “node” in the “graph” needs to be reachable from every other “node” (irreducibility, which together with aperiodicity gives ergodicity).

    • seanhunter 4 hours ago

      Could you explain further please? I agree with what you're saying, but I don't understand how it applies to what I said, so there's definitely something I could learn here.

      Edit: Thanks.

      • bjornsing 3 hours ago

        Sure. As I see it, MCMC is basically a mathematical trick that lets you sample from probability distributions even if you only know the relative probability of different samples. It’s based on the observation that some Markov processes have a stationary distribution that is identical to a distribution you may want to sample from. But you need to carefully construct the Markov process for that observation to hold, and the properties you need to ensure are most easily understood as properties of a random walk graph. So the interesting subset of Markov processes is most easily understood as such random walks on a graph.

        • hazymemory 37 minutes ago

          I think the core of the subject is that you only need to know relative probabilities, p_j/p_k, to be able to sample from a distribution.
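
          In code, a minimal Metropolis sketch of that idea (this assumes a symmetric proposal, e.g. uniform neighbor choice on a regular graph; otherwise the Metropolis-Hastings correction for the proposal ratio is needed):

              import random

              def metropolis(p, neighbors, x0, n):
                  # p(x) may be unnormalized: only the ratios
                  # p(y) / p(x) are ever used below.
                  x, samples = x0, []
                  for _ in range(n):
                      y = random.choice(neighbors(x))
                      if random.random() < min(1.0, p(y) / p(x)):
                          x = y              # accept the move
                      samples.append(x)      # else stay at x
                  return samples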

  • graycat 3 hours ago

    Stochastic process with the Markov property: Past and future are conditionally independent given the present. The general version of conditionally independent is from probability theory based on measure theory and the Radon-Nikodym theorem (with von Neumann's novel proof in Rudin, Real and Complex Analysis), but an easier introduction is in Erhan Çınlar, Introduction to Stochastic Processes.

    In a Poisson process the time until the next event has the exponential distribution (the count of events in a fixed interval is what is Poisson-distributed) and thus, from a simple calculus manipulation, the process is Markov.

    E.g., the time from now until a molecule of hydrogen peroxide H2O2 decomposes to water and oxygen is independent of when it was made from water and oxygen. That is the basis of half-life: the distribution of the remaining time until decomposition is the same starting now, no matter when the chemical or particle was created.
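
    A quick simulation check of that memorylessness with exponential waiting times (the rate and thresholds are arbitrary):

        import random

        # P(T > s + t | T > s) should equal P(T > t).
        lam, s, t, N = 1.0, 2.0, 1.0, 100_000
        draws = [random.expovariate(lam) for _ in range(N)]
        alive = [x for x in draws if x > s]
        p_cond = sum(x > s + t for x in alive) / len(alive)
        p_uncond = sum(x > t for x in draws) / N
        print(p_cond, p_uncond)  # both near exp(-1) ~ 0.37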

    In WWII, searching at sea, e.g., for enemy submarines, was important, and then came Bernard O. Koopman's Search and Screening (1946), with an argument that the time to an encounter between two ships had an exponential distribution, i.e., that encounters formed a Poisson process and hence a Markov process.

    In grad school, there was a question about how long US submarines would last in a war at sea. Well, take n Red ships and m Blue ships with, for each ship, a position, speed, and detection radius and, for each Red-Blue pair, given a detection, probabilities that Red dies, Blue dies, both die, or neither dies (right, these four have to be non-negative and add to 1). Now you have specified a Markov process that you can evaluate with a relatively simple Monte Carlo simulation.
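
    A drastically simplified sketch of that kind of simulation (the drift model and all the numbers are illustrative, not from the original study):

        import math, random

        def run(n_red=5, n_blue=3, steps=100, radius=0.1,
                p_red=0.3, p_blue=0.3, p_both=0.1):
            # Ships drift on the unit square; a Red-Blue pair
            # within `radius` is a detection, and the outcome is
            # drawn from the four probabilities (the remainder
            # is "neither dies").
            red = [[random.random(), random.random()]
                   for _ in range(n_red)]
            blue = [[random.random(), random.random()]
                    for _ in range(n_blue)]
            for _ in range(steps):
                for ship in red + blue:
                    ship[0] += random.gauss(0, 0.02)
                    ship[1] += random.gauss(0, 0.02)
                for r in red[:]:
                    for b in blue[:]:
                        if r not in red or b not in blue:
                            continue   # ship already sunk
                        if math.dist(r, b) < radius:
                            u = random.random()
                            if u < p_red:
                                red.remove(r)
                            elif u < p_red + p_blue:
                                blue.remove(b)
                            elif u < p_red + p_blue + p_both:
                                red.remove(r)
                                blue.remove(b)
            return len(blue)

        # Law of large numbers: average over many runs.
        print(sum(run() for _ in range(1000)) / 1000)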

    I had written a random number generator in assembler using an Oak Ridge formula, typed it in quickly, and ran the simulation. The work had a review by a famous probability prof and passed when I explained how the law of large numbers applied. So, some pure and applied math and computing worked, but some politics didn't!

    • nivertech 2 hours ago

      This Red/Blue submarine problem seems to be a better fit for an agent-based modeling (ABM) simulation, rather than Monte Carlo based on Markov processes.

      IRL this will be path-dependent, since both sides will learn from past actions and the probabilities will change, i.e. the memoryless Markov property will not hold.

      In ABM the ships (agents) can move in 2D space, which makes modeling detection easier.

      Also, obviously there are lots of externalities, like supplies of weapons, food, and sailors, ceasefires, surrenders, politics, etc.

      All of the above is easier to simulate using ABM, rather than Monte Carlo.

    • seanhunter 3 hours ago

      I once failed an interview with a hedge fund because they asked me a variant of that red ships/blue ships problem, and at the time I knew nothing about probability. They also hadn’t given me lunch, which didn’t help.

artofpongfu an hour ago

So he sets up a toy problem (drawing from a baby-names distribution), then never explains how to solve it.

The intuition is: you set up a graph where the vertices are names and the edges are based on name similarity; two names are neighbors if, e.g., their edit distance is within some limit. You start at a random name; then, for a randomly chosen neighbor, you flip a biased coin weighted by the ratio of P(neighbor) to P(current); if heads, you move to the neighbor.

I'm sure this is wrong in many and subtle ways, but when I read an article like this I expect some intuition like this to be imparted.
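
Roughly, as a sketch (the edit-distance cutoff of 2 is arbitrary, and uniform choice among neighbors is only exactly right if every name has the same number of neighbors; otherwise the Metropolis-Hastings correction applies):

    import random

    def edit_distance(a, b):
        # Classic dynamic-programming Levenshtein distance.
        d = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, d[0] = d[0], i
            for j, cb in enumerate(b, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                       prev + (ca != cb))
        return d[-1]

    def sample_names(p, names, start, n_steps):
        # p maps a name to its (possibly unnormalized)
        # probability; only ratios of p are used.
        x, out = start, []
        for _ in range(n_steps):
            nbrs = [y for y in names
                    if y != x and edit_distance(x, y) <= 2]
            if not nbrs:          # isolated name: stay put
                out.append(x)
                continue
            y = random.choice(nbrs)
            if random.random() < min(1.0, p(y) / p(x)):
                x = y             # accept the proposed name
            out.append(x)
        return out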

zero_k 2 hours ago

Science communication is so important. I write scientific papers and I always write a blog post about each paper afterwards, because nobody understands the scientific paper -- not even the scientists. The "scientific style" has become so obtuse and useless that even the professionals regularly read my blog instead. True insanity.

or_am_i 2 hours ago

> The “bullshit” here is the implicit claim of an author that such jargon is needed. Maybe it is to explain advanced applications (like attempts to do “inference in Bayesian networks”), but it is certainly not needed to define or analyze the basic ideas.

"The bullshit here is the implicit claim of an author that German language is needed. Maybe it is for advanced applications (like Goethe's poetry), but it is certainly not needed to describe basic ideas."

(proceeds to explain the same basic concepts 10x more verbosely than any math textbook on the subject)

Math/statistics nomenclature is certainly not perfect (think of it as a general utilities library that has been in active development for 200+ years), but it is widely used for a reason: once you learn the language it becomes second nature (very much like knowing all the details of the standard library API in your language of choice), allowing you to communicate arbitrarily complex abstract ideas with speed and precision.

low_tech_love 4 hours ago

Interesting! I also keep extensive notes on mathematical and computational concepts (and “buzzwords”) with a “no bullshit” title, and it works great for quick refreshers once in a while.

nivertech 2 hours ago

If I ever write about "Markov chains without the Math/Jargon/BS", I'll use the clip of the "Ten Second Tom" scene from 50 First Dates[1] and a host of other sci-fi movies about time loops[2,3] to illustrate the memorylessness[4] Markov property[5].

---

1. https://www.youtube.com/watch?v=iN_BDcKhtWk

2. https://en.wikipedia.org/wiki/Time_loop

3. https://en.wikipedia.org/wiki/List_of_films_featuring_time_l...

4. https://en.wikipedia.org/wiki/Memorylessness

5. https://en.wikipedia.org/wiki/Markov_property

emmelaich 7 hours ago

Unfortunate that the equation rendering doesn't work in the body text.

  • pixelpoet 7 hours ago

    Works fine here, Firefox on Android

  • johnthesecure 7 hours ago

    I experienced this issue, but it works on another device, and after a refresh it worked on the original device. Good luck...

jokoon 5 hours ago

Feels too long

I only want the python code

  • fedeb95 3 hours ago

    It's better to understand the theory before shipping faulty production code.