Stochastic entropy production in diffusive systems

Computing the stochastic entropy production associated with the evolution of a stochastic dynamical system is a well-established problem. In a small number of cases such as the Ornstein-Uhlenbeck process, of which we give a complete exposition, the distribution of entropy production can be obtained analytically, but in general it is much harder. A recent development in solving the Fokker-Planck equation, in which the solution is written as a product of positive functions, enables the distribution to be obtained approximately, with the assistance of simple numerical techniques. Using examples in one and higher dimension, we demonstrate how such a framework is very convenient for the computation of stochastic entropy production in diffusion processes.


Introduction
The notion of the production of entropy as physical systems evolve dates from the time of Boltzmann and Gibbs, and underpins basic ideas of thermodynamics and statistical mechanics, including the celebrated second law of thermodynamics describing the irreversibility of events on the macroscale [14]. The modern understanding of the production of entropy, based on effective stochastic dynamical models of system evolution [38,39], has to a great extent clarified some of the puzzles raised by the development of statistical mechanics [5,23], and has broadened the meaning of the second law, especially at the nanoscale. While the entropy of the world is expected to increase with time, this is subject to fluctuations, and for certain realisations of the dynamics of a system, entropy may decrease.
The stochastic entropy production associated with a thermodynamic process is fundamentally a measure of the reversibility of the dynamics. It is defined for a specific path, or trajectory, taken by a system, and represents the 'arrow of time' by reflecting the difference in likelihood between the trajectory in question and its time-reversed counterpart in suitable circumstaances [15]. It is very naturally defined for a system that is subject to environmental influence by way of energy or particle exchanges, and may be divided conveniently into contributions associated with the flow of these quantities between the system and environment together with an internal production of entropy within the system. It may also be defined for an isolated system if some procedure of coarse graining is adopted.
The degree to which the traditional second law is 'broken' in the modern framework is quantified, at least to an extent, by fluctuation relations, symmetry requirements satisfied by the probability distribution of entropy production [6,18,40]. Beyond this, a more detailed quantification of the fluctuations, necessary if we are to demonstrate how entropy behaves on different temporal scales and for systems of different size, typically requires extensive numerical investigation. The central problem is obtaining the solution to a Fokker-Planck equation that describes the evolving probability density function (pdf) of a system over its phase space, from which the probability distribution of entropy generated in realisations of the stochastic process may be derived.
The computation of entropy production relies on assigning the correct probabilities to trajectories between points in system phase space followed over arbitrary time intervals. For simple models such as the Ornstein-Uhlenbeck model (OU, [43]) the transition probability between arbitrary points over arbitrary time intervals is known in closed form and, as we show in §3.1, the distribution of entropy distribution can be calculated analytically as well. In cases where the transition probability density is not known, the problem is much harder. The estimation of a transition density by Monte Carlo simulation requires a very large number of simulations, and if one instead uses a partial differential equation (PDE) solver then solutions starting for each initial point will be needed to obtain the transition density to all possible final points, which is a considerable task.
Analytical approximations appear to confer a computational advantage, but they may suffer from another problem, which stems from the fact that we are after the logarithm of the probability of a path, and this is particularly difficult when the probability is small. Any kind of approximation that assigns even the slightest negative (or zero) probability to some part of the transition density is destined to fail. In fact, most analytical approximations for PDEs are based on orthogonal sum expansions, and as the functions in question are oscillatory, this kind of error is hard to eradicate. A variety of analytical and numerical methods have been employed in studies of the distribution of work performed or heat delivered by a system undergoing a stochastic process, which are associated with the production of entropy [3,7,9,11,12,16,19,20,24,30,32,33,35,36,41].
In a recent paper [27], a new method of dealing with Fokker-Planck PDEs was developed, in which the general thesis is that one does better to write the solution as a product of terms, all of which are positive. This naturally models the logarithm of the phase space density, so it obviates the difficulties described above, and is potentially very suitable for dealing with problems that pertain to entropy production. As the initial condition is a delta-function, it is very hard to make a sum of terms add up to zero at all points except at the starting-point where an enormous spike is required: truncation of such a series necessarily produces oscillatory artefacts. However, a product does not suffer from this disadvantage. One of the terms can, for t → 0, be zero except at the location of the delta-function-a Gaussian of variance ∝ time captures this effect perfectly, and indeed is an immediate consequence of the Fokker-Planck equation-and the initial condition will be correctly captured regardless of the other terms. Another term can capture the long-time asymptote, and further terms 'patch up the middle' without corrupting the short-and long-time behaviour. Indeed, a simple approximation containing only two terms (i.e. without intermediate-time corrections) is often very satisfactory, which is (34) later, and this is indeed a product formula. The method also extends to higher-dimensional cases via (50) which is again clearly a product formula. Intuitively, the method consists in expanding around an OU model, in the sense of finding the characteristics of a mean-reverting solution as exemplified by the OU case and capturing these characteristics for the general case, while reproducing the OU case exactly. It thereby provides a stepping-stone from the tractable OU model to the intractable general case. Potentially, it also gives insight into Fokker-Planck diffusions in other contexts, and a comment in some very recent work on the related area of mutual information struck us as highly pertinent: "Indeed, there is an interesting dichotomy in which we understand the intricate properties of the Ornstein-Uhlenbeck process since we have an explicit solution, whereas we know very little about the general Fokker-Planck process." [45, §V]. This paper is, then, the first application in physics of the new method for approximately solving Fokker-Planck equations, and we apply it to the problem of stochastic entropy production in a range of different diffusive systems, all corresponding to the spreading of probability density from an earlier point source, but with different force fields and hence different stationary states (see Figure 3). It is organised as follows. After a brief introduction to the nomenclature and methods, Section 3 deals with the one-dimensional theory. It begins with a complete exposition of the OU model, and provides new insights into that case and also into the arithmetic Brownian motion (drift-diffusion) which is a special case of it. After this it develops the theory for a general potential in §3.2, giving a brief account of the leading-order approximation shown in [27] before showing a variety of examples in §3.3. Section 4 describes the multidimensional theory, sgain starting with the multivariate OU model which is dealt with in some depth, with comparisons drawn to the one-dimensional case. Then the theory for a general potential is developed, along the same lines as in [27] and §3. Our final section ( §5) gives our conclusions and suggests opportunities for further research.

Definitions
For a stochastic process X t , the dimensionless stochastic entropy production, associated with a transition from an initial to a final system coordinate during the interval between times t 1 > 0 and t 2 > t 1 , is defined as where f X (t, X) is the pdf over X at time t and T (x 1 → x 2 , ∆t) is the transition probability density from x 1 to x 2 in time interval ∆t. The argument of the logarithm is the probability of transitioning from X t 1 at time t 1 to X t 2 at time t 2 , divided by the probability of a subsequent reversal, i.e. starting from X t 2 at time t 2 and ending up at X t 1 at time t 2 − t 1 later. If the system were in a stationary statistical state (thermal equilibrium) then this ratio would be equal to unity and the stochastic entropy production would vanish for all transitions. In general, ∆s is nonzero and can take either sign, though it satisfies a second law in possessing a nonnegative expectation when averaged over all possible paths in [t 1 , t 2 ]. A condition of detailed balance holds in thermal equilibrium, defined by such that we can write with and where we explicitly note that we are considering a situation corresponding to evolution from a definite initial coordinate X 0 at t = 0. More complicated scenarios can be constructed from this base case. Our task is to obtain the probability distribution of ∆s for the interval between t 1 and t 2 . Given these definitions, it makes sense to focus on the evolution of g X as a means of computing the entropy production. Indeed, when we come to approximate the density for analytically intractable models, it is g X that we choose to approximate.
The entropy production is transformation-invariant in the following sense. If Y t = ψ(X t ) where ψ has an inverse ψ −1 and both ψ, ψ −1 are appropriately smooth, then whenever y = ψ(x) and and so over a particular time period the entropy production of X and Y is the same. Accordingly, we can take the general form of a time-independent diffusion, with W t a standard Wiener process, and by applying the transformation ψ given by 1 By Itō's lemma 2 we have κA(y) = ψ ′ (x)µ X (x) + 1 2 ψ ′′ (x)σ 2 X (x) and we call A the force field. This transformation simplifies the model by making the volatility constant and does not affect the entropy production, so henceforth we work with (5); the parameter κ has units time −1 and is understood as a rate constant.
The density f Y obeys the forward or Fokker-Planck equation, and the function g Y obeys its adjoint, the backward or Feynman-Kac equation: with f Y (0, y) = δ(y −Y 0 ) and dimensionless time τ = κt. When we later present graphs showing the distribution of entropy production over a given time period, we are talking about κt or, equivalently, t if we standardise on κ = 1 throughout. The invariant density (stationary state) is related to A by A(y) = d dy ln f Y (∞, y). Note also the reciprocity condition (Proof. Viewed as a function of y (and t), the LHS obeys the adjoint forward equation, whereas the RHS obeys the backward equation. However, those are the same PDE, with the same initial condition. Alternatively, we can invoke the Kolmogorov criterion, which pertains to the transition probability around any closed loop being independent of the direction of travel: see e.g. [22, §1.5].) One device that we can use for studying the distribution of ∆s is its moment-generating function (mgf), where the expectation is over all transitions during the interval t 1 ≤ t ≤ t 2 . It is easily seen that M ∆s (−1) = 1, known as the integral fluctuation relation [38]. The mgf relates to the density p by 1 The so-called Lamperti transformation. This works provided σX is bounded away from zero. 2 It can also be derived purely algebraically by substituting gY t, ψ(x) = gX(t, x) into the backward equation.

The K-, or Variance-Gamma, distribution
When the dynamics are Gaussian, as in the OU process, the entropy production is a quadratic function and in the particular case where the starting-point is the equilibrium level (where the force field vanishes) its density can be found via the mgf using the following result, known variously as the K-distribution (in the physics literature) and the Variance-Gamma distribution (in the quantitative finance literature). We shall use it more than once: with K ν denoting the modified Bessel function of the second kind [2], corresponds to the mgf and from this the mean and variance are Proof. Immediate from the Fourier representation of K ν , e.g. [17, §8.432.5].
When the order ν is a half-integer the function K ν admits a representation using elementary functions, and when ν = 1 2 the distribution is simply a double-expeonential.

Inversion integrals
When the mgf of the entropy production is known in closed form-the OU model is a case in point-it is possible to obtain the pdf of the entropy production by inverting (10) via a Fourier integral: where the contour C runs up the imaginary axis. The most straightforward approach is to replace the integral with a finite sum, i.e. use the discrete Fourier transform, implemented via the Fast Fourier Transform (FFT) algorithm. There are a couple of points to be made about this. In going from an inversion integral to a finite, discrete approximation two undesirable effects are introduced. The first, aliasing, causes the inverse transform to be periodised, with a period equal to 2π/δω, where δω is the spacing in ω-space. If dω is too large, features will appear in the wrong place. The second, leakage, occurs as a result of truncating the range of integration (summation) to [−Ω/2, Ω/2] say, and features are smeared out, losing resolution. It can be attenuated by multiplying M (iω) by a window function, that is zero or nearly zero when ω is close to the ends of the interval [−Ω/2, Ω/2], and unity at the middle. A simple choice is the Hann window, given by cos 2 (πω/Ω); further details are in [34, §13].
As an example of this in action: suppose that in the problem at hand we think that the entropy production can safely be ignored outside the range [−5, 5], and that we want a resolution of around 0.01 in entropy space. Then we shall need a 1024-point FFT and the spacing of the samples in ω-space will need to be δω = 2π/10. As an example of what aliasing means, suppose we were wrong about the entropy production being essentially oonfined to the interval [ −5, 5], and that there were a pronounced spike at ∆s = 6.3, for the sake of argument: then this feature would incorrectly appear at ∆s = −3.7. Other techniques that numerically invert the Fourier integral are orthogonal expansion using Laguerre functions [44], provided that this is extended to the two-sided case, and numerical integration techniques discussed by Abate & Whitt [1].
Another, more specialised, approach requires analyticity of M , and deforms C so that it lies along a more convenient path. In many cases, including here, the mgf has branch points at λ = λ ± say and is analytic in the complex plane with (−∞, λ − ] and [λ + , ∞) deleted. If the integrand is integrably singular at λ ± , we can collapse the integral around either branch cut, depending on the sign of ∆s, and do a real integral (see e.g. [28, §3.1]). However, this cannot be done if the integrand also possesses an essential singularity at λ = λ ± . In fact both cases can be seen in the OU model, depending on whether Y 0 is zero or not (see (20) later). This brings us to our final point, which is that it may be useful to deform C so that it lies along the path of steepest descent [4], or at least fairly close to it. That way, the integrand is not too oscillatory, and is easily approximated by analytical methods or by numerical integration. This is what gives rise to saddlepoint methods [10].
Interestingly, the problem of finding the pdf of a quadratically-transformed multivariate Normal variable-which is fundamental to stochastic entropy generation in the OU model-finds application in unrelated areas, notably mathematical finance. In [13], the authors use the moment-generating function as we do, and show by matrix algebra that the variable in question can be written as a weighted sum of independent Gamma-distributed variables. The authors discuss the use of saddlepoint methods as a way of approximating the pdf, though do not indicate why it is effective. In fact the Gamma distribution is very well approximated by its saddlepoint approximation, and this is why the technique works well [25].

Entropy production from transition density by stratified sampling
Much attention has been given in the literature to the derivation of the pdf of entropy production through efficient (semi-)analytical approaches, but in fact it is in principle straightforward to proceed numerically if the transition density is known exactly or approximately. For given Y 0 , t 1 , t 2 we proceed as follows.
Bound the phase space by V so that we disregard the possibility that the process goes outside this space. Select a set of points {y j } N j=1 (in our calculations we have used N = 10000) that are distributed reasonably uniformly over V. We enlarge on this point presently. Next, divide entropy space into bins-throughout, we take the interval [−10, 10] and partition it into 2000 bins each of width 0.01-and write 0 in each bin. For each pair (j, k), calculate the entropy production using g Y , find which entropy bin this corresponds to, and add to that bin the quantity Note that an alternative notation for T (y j → y k , It is understood that the use the exact transition density and g Y if these functions are known, or otherwise approximations to them. Once this has been done for all pairs (j, k), the probability of the entropy production lying in some interval J is approximated by The denominator is just the sum over all bins, and this construction obviously ensures that the total probability mass for the entropy production is exactly unity, i.e. M ∆s (0) = 1.
As promised we give a brief exposition of sampling techniques. For a one-dimensional phase space this is easy as we dissect V into N equal intervals. However, if N is not known upfront, in the sense that one might want to run a certain number of samples, and then some more, and then some more after that, the use of pseudo-or quasi-random procedures is preferred; this is also the case when the dimension is > 1. We take these in turn. Pseudo-random sequences mimic the generation of truly random ones, and are typically obtained from whatever inbuilt linear congruential generator the user has to hand; for a discussion of these see e.g. [34] and [21, §7]. However, the random nature will in any one finite realisation of the random number generation give rise to nonuniformities, with some areas receiving more samples than others; the standard error of a Monte Carlo integral decays as O(N −1/2 ) as N → ∞. Quasi-random or low-discrepancy sequences, which are not at all random, are designed to cover the space more uniformly by 'filling in the gaps as they go'; the error from these decays as S(N )/N , where S(N )/N ε → 0 for all ε > 0 (often S(N ) is roughly a power of ln N , but a deeper discussion of this would take us too far off track). Two particular classes of low-discrepancy sequence are worth mentioning. The first are called Sobol sequences [34, §7.7], [21, §8.3]. The second uses ideas from Diophantine approximation theory [31]: the construction is where {·} denotes the fractional part and α 1 , . . . , α d are appropriately-chosen irrational numbers. Our preference is for this method on account of its simplicity.
We now return to the matter of entropy generation. A final, optional, step is to ensure that the integral fluctuation relation holds exactly in spite of the various approximations (truncation of the phase space to V, use of approximated transition density, use of finite sumber of samples). Defining ε = ln j,k p jk e −∆s jk , which can be computed while the above calculation is being done, and shifting all the entropy bins by ε, will achieve this.
If we only need the expected entropy production then we do not need to evaluate a double (nested) integral: instead we just need to compute a pair of single integrals as The second law s(t 2 ) − s(t 1 ) ≥ 0 follows immediately from the Fokker-Planck equation or from the integral fluctuation relation 3 .

Note on density functions
As will become apparent, the pdf of the entropy production is often singular at one or points, or more informally it may contain a large spike somewhere. More important than the pdf is the probability mass function or cumulative distribution function (cdf): one wants to know where the probability mass lies. It is hard to visually estimate the probability mass under a narrow but infinitely high spike: such a feature may be visually distracting but yet represent only a very small probability mass, and of course if the spike is caused by a delta-function of strength c then it is impossible to assess the value of c from a density plot. Furthermore two distributions can have very different pdf but quite similar cdf 4 . Indeed, for comparing two distributions, common methods such as Kolmogorov-Smirnov use cdf and not the pdf. Finally, in higher-dimensional cases the granularity of the method described in §2.4 gives rise to a 'hairiness' in the pdf which is difficult to eradicate, as seen for example in Figures 10 and 11. We have therefore chosen to additionally plot the logit function, commonly encountered in statistics and defined as where F is the cdf. Obviously this is one order of differentiability smoother than the pdf. The median is immediately apparent as L −1 (0), the interquartile range is roughly [L −1 (−1), L −1 (1)], and a two-sided 99.5% confidence interval is roughly [L −1 (−6), L −1 (6)]. This method of plotting is also convenient for examining the shapes of the tails.

Ornstein-Uhlenbeck
The OU model with A(y) = −θy is analytically tractable, and we discuss it in detail as a foundational step. We have with q = e −2θτ , τ = κt, and we drop the | Y 0 notation for simplicity. The entropy production ∆s is simply a quadratic in the triplet with q i = e −2θτ i . (Throughout this paper † denotes the transpose.) From this we can see that the mean entropy production is 5 Analytical results can be derived by appeal to the moment generating function M ∆s : we are to find the double integral where Q is the quadratic form Write the matrix in the above expression as and define the following two determinants: The double-integral of e −Q/2 above evaluates to and so the mgf of the entropy production is withδ Q = δ Q /θ 2 . This is understood as follows. By the shifting theorem (of the Laplace transform) the term on the front represents a shift to the right by an amount We now examine the rest of the expression, starting with Y 0 = 0. (20) is unity, and we are left with the denominator, which is the square root of a quadratic in λ. By evaluating δ Q we establish as the quadratic must have real roots λ ± of opposite sign. Indeed, observing that the quadratic is positive (and in fact equal to (1 − q 1 )/(1 − q 2 )) at the two values λ = −1 and q −1 2 , we deduce something stronger: Further, as λ + + λ − = q −1 2 − 1 > 0, the right-hand tail decays more quickly than the left. The case t 2 → ∞ is special, as the positive root goes to ∞ and the negative one to −q −1 1 , and the entropy production is given by i.e. Ξ is distributed as χ 2 with one degree of freedom. Consequently in this limit the production of entropy is bounded above but not below. This behaviour is suggested by the right-hand plot in Figure 1.
Comparison of M ∆s with (12) yields: Proposition 1 For the OU model with Y 0 = 0, the distribution of the entropy production ∆s for the interval [t 1 , t 2 ] is given by: with λ ± given by (22) and and and the variance is .
The density has a logarithmic singularity (spike) at ∆s ⋆ the origin; the probability of being to the right (resp. left) of this is and the mean conditional on being positive (resp. negative) is By analysis of the singularities of the mgf the asymptotes of the density are 6 The distribution arises as the weighted difference of two independent central χ 2 1 random variables (with positive weights). Figure 1 shows some examples of this analytical computation (with θ = 1). 6 Recall λ− < 0 < λ+.

Case
The effect of starting at Y 0 = 0 is found by convolving the previous result (23) with the function whose mgf is exp(−Y 2 0 ∆ Q /2δ Q ), so to make further deductions we must examine this expression in detail. Now ∆ Q is a cubic in λ that vanishes at λ = 0, −1, because M ∆s (0) = M ∆s (−1) = 1 for all Y 0 ; also, the coefficient of λ 3 vanishes. (Write the entropy production as a quadratic form in (Y 0 , Y 1 , Y 2 ): as it can be written as a sum of only two squares, its determinant vanishes.) So ∆ Q must be of the form λ(λ + 1) multiplied by a constant, and in fact This can be seen directly by from elementary row and column operations on the matrix: Returning to (20), we can now conclude that the term in the exponential is simply the ratio of two quadratics in λ: To find what pdf corresponds to the mgf that is the exponential of the ratio of two quadratics, we recall the compound Poisson distribution. Let P follow a Poisson distribution of mean µ, and let (Z j ) be iid (independent and identically distributed) random variables, independent also of P , with mgf M Z . Form the random variable 7 P • Z = P j=1 Z j and note that its mgf can be obtained by conditioning on P and then integrating out: Consider now the distribution formed of two exponentials as follows: with probability π + it is exponential with mean λ + , and with probability π − it is −1× an exponential variable of mean and its mgf is This will do what we want, because the ratio of two quadratics can be written as a constant plus an expression of the above form, by partial fractions. Accordingly, we have synthesised a function whose mgf is the exponential of the ratio of two quadratics, and have proven: Proposition 2 For the OU model, if Y 0 = 0 the distribution of the entropy production is obtained by convolving the Y 0 = 0 result (23) with the compound Poisson distribution P • Z, where P has a Poisson distribution of mean µ and Z has a double-exponential distribution (29), parametrised thus: and λ ± as earlier.
(Recall that λ − < −1, so π ± are both positive.) While this result is unhelpful in writing down a closed-form expression for the density, it provides very clear intuition about what it looks like, as follows. If we start the OU process near its equilibrium point and/or observe entropy production over a short time, then µ is small, so the main contribution to P • Z is a delta-function at the origin of strength e −µ , with exponential wings on either side. The effect of convolving with such a density is to move probability mass to the right without displacing the spike. On the other hand if we start a long way from the origin and/or observe the entropy production over a longer time, then µ is larger, and P is more likely to be high: so P • Z is approximated as a multiple convolution of iid double-exponential distributions, which by the Central Limit Theorem must be somewhat Gaussian in shape. So the distribution becomes more bulbous in the middle.
The extra mean entropy production that arises from starting at Y 0 = 0 is (q 1 − q 2 )θY 2 0 /2, which is positive. The form of this is unsurprising, because it increases with (t 2 − t 1 ) but only if one has not waited a long time since inception, as otherwise reversion will have occurred and the startingplace become irrelevant: hence the form (q 1 − q 2 ). It is a simple matter to verify this expression, as it pertains to the O(λ) term in the Maclaurin expansion of the mgf.  A minor but nonetheless interesting point is that the effect of starting away from equilibrium is neither to simply shift the whole distribution to the right (because the spike in the pdf does not move 8 ); nor does it alter the exponential decay-rates in the tails which, referring to (23), remain as λ ± .
The distribution of entropy production can be calculated in two ways. The first is to invert the mgf (Fourier integral) numerically using the FFT as discussed in §2.3. The second is to employ the stratified sampling method discussed earlier ( §2.4), which needs only the transition density and not the mgf. As a check, both methods were used. Results are shown in Figure 2 for a few cases.

Limit of zero mean reversion
When θ ≪ 1 we have To interpret this result, let us write σ = √ 2κ for the volatility and also shift the process so that it starts from zero and has reversion level y ∞ . Then for which the mgf of the entropy production is Now set θy ∞ = 2µ/σ 2 , with µ fixed (and representing the drift), and let θ → 0 with y ∞ → ±∞ according as µ is positive or negative. Then we end up with the familiar arithmetic Brownian motion, for which the mgf of the entropy production is (cf. [30, §3.1]) The mean entropy production is The first term represents, as usual, the broadening-out of the pdf. In the second, 2µ 2 /σ 2 is identified as the rate of accretion of entropy resulting from the drift. This is analogous to the dissipation of work as heat of a particle being moved through a viscous medium by the influence of a conservative field: for example, a charged particle, in an oil bath, by an electric field.
In the driftless case we have the simple result The distribution is symmetrical about ∆s ⋆ , the mean, and the variance is 1 − t 1 /t 2 . As expected, for zero drift the result depends only on t 1 /t 2 . All of the above can be derived directly from (1,32) using essentially the same techniques as used here. Indeed, writing Z t = X t − µt we have and the joint density of (Z t 1 , Z t 2 ), conditionally on starting from the origin at time zero (which we may assume without loss of generality), is The mgf of the entropy production is then obtained by completing the square and doing a bivariate Gaussian integral. It is worth noting, though, that the question of entropy production for the arithmetic Brownian motion is not materially simpler than for the OU model 9 .

General potential
When the transition density is not known-which is the general case-it has to be approximated, and the approach in [27], developed initially in [26], provides a framework for this. In the interest of stating the main result upfront, the approximated transition density from Y 0 at time zero to y at time t is given by and also where as before q = e −2θτ and θ > 0 is now a constant that controls the average strength of mean reversion, as will be explained presently. The first part of the expression (prefactor and Gaussian term) deal with the short-time behaviour, and the rest ensures that the long-time asymptote is correct. Furthermore, the result is exact for the OU model A(y) = θ(y ∞ − y) regardless of the reversion level y ∞ . We now give some further details. The reader who wishes only to use (34,35) may do so without reading the rest of this section, except for noting the definition of θ in (39).
As we are concerned with a product representation of the transition pdf, and as entropy relates directly to the logarithm of the probability density, it makes sense to deal not with f Y directly but instead with its logarithmic derivative. Recall that τ = κt. Then, writing Now this equation appears to be harder than (6) because it is nonlinear and also has a singular initial condition, in the sense that h Y ∼ (y − Y 0 )/2τ as τ → 0, which can be seen by dominant balance in (36). However, it turns out that h Y is easier to approximate than f Y . Consider the class of OU models A(y) = θ(y ∞ − y), with the proportionality constant θ controlling the strength of mean reversion and the constant y ∞ denoting the long-term mean-alternatively the stationary state is Normal with mean y ∞ and variance 1/θ. For this, we have exactly as is easily verified by substituting it into (36), or writing it down directly from the known OU solution. Clearly h Y is just a linear function of y. The first term is singular as τ → 0, and captures the initial Gaussian behaviour as the process spreads out from its point source. The second term refers to mean reversion, for it vanishes when the process is at its equilibrium level (y = y ∞ ). Note also that h Y → 0 as τ → ∞, as it must, because we require g Y → 1. This inspires the approximation for the general case: where now θ is understood as an arbitrary parameter. However, we must now explain what θ corresponds to in this general case. When (38) is inserted into (36), and a Laurent expansion performed around τ = 0, the LHS and RHS agree at O(τ −2 ) and O(τ −1 ), explaining why we are writing the error term in (38) as o(1) in the short-time limit. In so doing, we find and we observe that θ is absent from both the first two terms. So all θ's are equally good from this point of view and we cannot say anything about θ simply by looking at the first two terms in the short-time expansion. As θ does not affect the long-time asymptote, it must therefore control the intermediate-time behaviour. It turns out that the next-order term (i.e. expanding the o(1) term in (38) in powers of 1 − q) is a rather complicated expression involving d dy A(y) + θy , which is unsurprising as if that quantity vanished identically then we would be back to the OU model, for which (38) is exact. Given that we are going to truncate the series before this term, it makes sense to minimise it, and so we want to choose θ so as to make A ′ (y) + θ as close as possible to zero 'on average'. This motivates the choice where · ∞ denotes an average over the stationary distribution f Y (∞, ·); this relates to the force field via A(y) = d dy ln f Y (∞, y). As is apparent from the RHS of this expression (which follows from integration by parts), this choice of θ is positive, which is necessary for (34,35) to be valid 10 . In [27] it is pointed out that (39) relates directly to the Fisher information for the estimation of the long-term mean of a mean-reverting process 11 . Now that we have approximated h Y , we simply integrate w.r.t. y and identify the implied constant of integration from the fact that the initial condition is a delta-function of unit strength. The derivation can be simplified by recalling the reciprocity condition, namely that g Y must be symmetric in y and Y 0 . This gives (35) and thence (34); full details can be found in [27].

Examples
In this section we present a variety of results for models differing from the OU process: see Figure 3. These models were considered in [27] and the approximate transition density was found to correspond well with the exact (as calculated numerically where necessary). For the reader's convenience, the diagrams are assembled at the end of this section.

Dry-friction
In the dry-friction model, Conveniently the transition density can be obtained in closed form [42], e.g. by the usual route of Laplace transforming the Fokker-Planck equation: with Φ denoting the cdf of the standard Normal distribution. The entropy distribution still has to be obtained numerically, as per §2.4, but we can compare the results using the exact transition density with those using the approximated transition density (34). These are shown in Figures 4,5, 10 Though as we have already seen we can take A(y) = θ(y∞ − y) and allow θ → 0 with θy∞ = µ fixed to obtain the arithmetic Brownian motion. The invariant density is formally A(y) ∝ e µy which is non-normalisable, but (34) nevertheless gives the correct transition density. One cannot, however, permit θ < 0. 11 Although in principle we could average −A ′ over some other distribution-perhaps varying over time and/or space-this can present its own difficulties, as it is essential that θ be positive, and it also needs to be symmetric in y and Y0 so as to preserve the reciprocity condition (8).  for different time periods and starting-points. The agreement is particularly good when starting near the equilibrium point; it is less so when starting away from it, but the difference is smaller when viewed on the logit scale ( Figure 5(e,f)) than when viewed as a pdf (Figure 4(e,f)).

Stationary state sech-power
One way of moving away from the linear force field (quadratic potential well) of the OU model is to make the force field grow less rapidly away from equilibrium by setting (Incidentally this can be obtained from the local volatility model in which volatility increases away from equilibrium, and changing variable by γX = sinhγY .) The stationary state is a sech-power: more precisely, with B denoting the Beta function. In the limitγ → 0 we recover the OU model, soγ measures the deviation from OU-ness. Also in the limitδ =γ → ∞ we arrive at the dry-friction case considered above. We have usedγ = 1,δ = 2, which gives θ = 4 3 , making the stationary distribution 1 2 sech 2 y. See Figure 6.
Qualitatively the results are similar to those of the OU model. However, there is a minor difference: whereas in the OU model, starting further from equilibrium does not affect the the position of the spike in the density, in this model it is shifted to the right.

Stationary state Student t
We can also write down a model that has Student t ν as its steady state: This model gives rise to fatter tails than the sech-power example, because the force field decays to zero as |y| → ∞. We have used 12 ν = 3. See Figure 7.

Double-well potential
A useful general form for the stationary state for a double-well potential is from which the force field is A(y) = −y + 2y 12 The definition of ν was different in [26] and [27, Fig.3], but this should not cause confusion.
The parameters act as follows: γ → 0 makes the two wells disjoint; α 1,2 control the location; letting β 1,2 → 0 makes them deeper. In principle the implied coefficient of normalisation K, and the quantity −A ′ ∞ , can be calculated directly using Dawson's integral, but the computational effort does not seem worthwhile, as a simple numerical calculation of the integrals is sufficient. We consider, as in [27], the case α 1 = α 2 = 2, β 1 = β 2 = 1, γ = 1 √ 2 , for which −A ′ ∞ ≈ 1.557. Unsurprisingly, starting from Y 0 = 0, the point of unstable equilibrium, generates more entropy than starting in either of the wells. Referring to Figure 8, more entropy is generated in case (a) than in case (c), for any particular time period.

Remarks
We shall reserve our general conclusions for the end of the paper, but it is noticeable that the probability densities of entropy production, over various time intervals, have a number of features in common across different models. Most obvious is that there is a shift in probability mass to the right as time advances, which is expected from the integral fluctuation theorem. The cases where the particle motion involves relaxation towards a unimodal pdf over position have reasonably similar pdfs over entropy production, as might be expected.

Theory in higher dimension
We start with the multidimensional OU model.

Ornstein-Uhlenbeck
This is similar in terms of tractability to the one-dimensional theory. Let the process be written as where italic bold letters are square matrices and W t is a d-dimensional standard Brownian motion, i.e. different coordinates are independent and so E[dW t dW † t ] = I dt. The normal form (40) can be obtained from the more general form and in the same notation as before This allows the mgf of the entropy production to be written as where d[Y ] denotes a volume element in Y -space and, as previously, Q is a quadratic form. We confine ourselves to the case in which a is symmetric 13 , corresponding to the gradient of a quadratic potential. In that case and I the d-dimensional identity matrix 14 . There are two ways of proceeding. The first is to redo the analysis of Section 3.1 in a multidimensional setting, which we do next; the second idea will become apparent presently.
Analogously to before, and if we write the matrix entries as Q 00 etc as before (only now these are d × d matrices rather than scalars) then this can also be written in which the top left-hand entry is a scalar but the other elements on the leading diagonal are d × d matrices, and so on. Defining (much as before) the determinants we have that the double-integral of e −Q/2 above evaluates to and so the mgf of the entropy production is We have obtained, as desired, a multidimensional version of the work in §3.1. At this juncture it is convenient to mention an alternative approach. Had we diagonalised a at the outset, thereby rotating the coordinate axes so they they aligned with the eigenvectors of the covariance ellipsoid σ(∞), and effectively decoupling the dynamics into d independent one-dimensional systems, we could have expressed the d-dimensional result as the convolution of d (in general not identically distributed) one-dimensional results, i.e. pdfs of the form given in Prop. 2. This is because the transition density is simply a product of d component densities in the principal directions. As the entropy production relates to the logarithm of the density, it is given by the sum of the entropies produced in each of the principal directions, which are independent, and so the pdf of the entropy production is the convolution of one-dimensional pdfs. Alternatively, the mgf of the entropy production will be a product of one-dimensional mgfs of the form (20). With this in mind, it seems reasonable to complete the analysis by performing the necessary algebraic manipulations on the determinants in (42) to make this factorisation apparent. We define the eigenvalues of a to be (ǫ r ) d r=1 , and write q (r) i = exp(−2ǫ r τ i ), i = 1, 2. Analysing as before, the term on the front of (42) represents a shift by an amount , which is clearly the sum of shifts in the principal directions. Next consider Y 0 = 0. In that case the only thing to analyse is the determinant δ Q . Under an orthogonal change of basis a is diagonalised and then all of the matrices Q 11 , Q 12 , Q 21 , Q 22 are brought to diagonal form. It is convenient to define so that |Q ♯ | = δ Q . Now permute the rows and columns of Q ♯ by taking them in the order 1, m + 1, 2, m + 2, . . .. Then the matrix consists of 2 × 2 blocks along the leading diagonal and zeros elsewhere, and the determinant, which is unaffected by this operation, is given by which is the product of d results of the form (22).
When Y 0 = 0 we need to analyse ∆ Q and note that, by elementary row operations, By the block-determinant lemma [8], where adj denotes the adjugate (transpose of the matrix of cofactors; [8]). By applying the same permutation trick to write Q ♯ as an array of 2 × 2 blocks, we can calculate the adjugate directly and then multiply the matrices out to obtain which is a sum of d results of the form (28), as anticipated.
In summary: Proposition 3 In a d-dimensional OU model the pdf of the entropy generation is a d-fold convolution of one-dimensional models along the principal axes. If Y 0 = 0, then in the isotropic case a = θI, we have In general the effect of increasing the dimension is to make the pdf of the entropy production 'less singular', i.e. smoother and closer to being Normally distributed. This is seen in Figure 9, which shows some representative cases.

Non-conservative OU model
When a is not symmetric, so that the force field is not the gradient of a potential, solution of the Fokker-Planck equation is more difficult. Some discussion of this is given in [27, §4.3], which is summarised now.
First, although the steady-state density is still multivariate Gaussian with zero mean, its covariance matrix σ ∞ is not a −1 : instead it is given by the Lyapunov equation, which can be solved for σ ∞ by writing it as a set of linear equations in its elements. Secondly, recalling the general objective of [27], it is possible to find two OU models with the same shortand long-term behaviour, even if the medium-term behaviour is different. Indeed where A is the space of skew-symmetric matrices, give rise to the same behaviour in both limits. Formally, define two generators a to be equivalent (notation ≏) if they give rise to the same longtime covariance matrix. Then any generator is equivalent under ≏ to a unique symmetric one; we obtain σ ∞ from the Lyapunov equation and then invert it to obtain this symmetric generator. After this, the equivalent symmetric generator can be used as a first approximation for entropy calculations.
As an example: when a = 1 a 0 1 we have and so the equivalence class of a under ≏ is which contains the following elements in particular: 1 a 0 1 ; 1 1 + a 2 /4 1 a/2 a/2 1 + a 2 /2 .
It is easy to see that all elements of [a] ≏ have the same trace, which in this example is 2.

General potential
The multivariate analogue of (5) is where, in d dimensions, W t , Y t ∈ R d and A : R d → R d is the force field. The corresponding Fokker-Planck equation is (with τ = κt as before) If A is conservative, i.e. the gradient of a potential, then and g Y (t, y) = f Y (t, y)/f Y (∞, y) obeys the backward equation but neither of these last two equations is true if A is non-conservative (the statement of the backward equation is correct, it is just that g Y does not satisfy it). We concentrate only on the conservative case from now on. The analogue of θ in (39) is now a symmetric matrix θ given by which shows it to be positive-definite, and as before we write q = exp(−2θτ ).
Again in the interest of stating our results upfront, we have analogously to (34,35), and These are exact for the (conservative) multivariate OU model and generalise the one-dimensional work in a reasonably natural way. The function ρ is defined by The function Ω is defined by where µ ∞ = Y ∞ is the mean of the stationary distribution and the path of integration is a straight line; regardless of the dimension of the problem (d), this is still a one-dimensional integral, and in the examples considered it can be calculated in closed form. The remainder of this section is devoted to details relating to the above results and the reader may omit it.
As before we define H = −g −1 Y ∇g Y , which satisfies the vector equation Analogously to the univariate case, and also following from the multivariate OU model, for which the following is exact, we adopt the ansatz The first term integrates to give a Gaussian, which is immediately visible in (50,51) and to be expected. The second term is more difficult and the function Ω arises from integrating it. The main analytical features of it are (i) it tends to A(y)/2 as t → 0, which is seen from dominant balance in (54), and (ii) it vanishes as t → ∞. But despite its links to the OU model and to the one-dimensional theory given earlier, this second term is not in general a conservative field. To see why this is, consider its gradient: defining the symmetric matrix φ = √ q/(I + √ q), the gradient is γ given by γ ij = φ jk ∂ i A k (summing over the repeated suffix k in the usual way). Now A is conservative so ∂ i A k = ∂ k A i , and so the above expression is the product of two symmetric matrices. Such a product is a symmetric matrix iff the two matrices commute, which in turn holds iff their principal axes (eigenvectors) are in alignment. We point out that in certain cases this condition will already be met as the necessary commutation already holds. One is the OU model, for which φ and ∇A are both in R[a] (though we know that there cannot be a problem as the expression for H is exact). Another is any spherical model, i.e. one in which f Y (∞, y) is a function of (y − µ) † (y − µ) for some constant vector µ: in that case θ, and hence φ, are scalar multiples of the identity matrix.
In general, though, some alteration of the second term in (55) is in principle necessary. By rotating φ by a 'small' amount-more specifically replacing φ with w † φw, where w is orthogonal-will align its principal axes with those of ∇A to make γ symmetric, and hence make (55) a conservative field. While this is a solution, there are potential difficulties with it: we must find w, which depends on y, and is also not as yet well-defined; and then we must integrate it, which would probably have to be done numerically.
Importantly, the antisymmetric part (or curl) of γ, i.e. γ ij − γ ji , which measures how nonconservative is the second term in (55), is O(τ ) as τ → 0. This error is commensurate with, or possibly smaller than, the error incurred by ignoring the term marked √ q o(1) in (55). Put differently, fixing the 'non-conservativeness' of the second term in (55) may well not give a significantly more accurate result. Also the curl vanishes as τ → ∞, and it vanishes on average, i.e. if y is integrated over the stationary distribution f Y (∞, ·). Besides, the main objective of this work is to provide an approximation that is reasonably simple to calculate. This is why we persist with (55) and its consequences, even though the expression is not theoretically ideal.

Student t
The multivariate form of the one-dimensional model we considered earlier is for which a is related to the steady-state covariance matrix by σ ∞ = ν ν−2 a −1 , provided ν > 2. Also , η(τ, y) = y † a √ q I + √ q y y † ay so (50) is explicit. The density starts off circularly-symmetric, and ends up ellipsoidal (for a numerical demonstration see the examples in [27]). Like the univariate Student t it has power-law tails. Keeping ν = 3 as before we take two cases:   The results are shown in Figure 10. The results are shown in Figure 11.
The results are qualitatively similar to those in Figure 9, which is unsurprising in view of the qualitative similarity of the invariant density to the multivariate Gaussian.

Conclusions and Final Remarks
We have considered the problem of entropy production in diffusive systems evolving away from a given starting-point at time zero. Entropy production is understood to be a measure of the extent to which time reversal symmetry is broken within a context of stochastic dynamics: it expresses the sense that certain patterns of evolution are more likely than the exact reverse behaviour. More precisely, it assesses the likelihood of observing reverse behaviour over a period subsequent to that in which the forward behaviour took place. In finding the distribution of entropy production over a certain time interval, we characterise the reversibility of the evolution of a system exposed to illdetermined environmental forces. Whereas macroscopic systems admit no reversibility of behaviour in these circumstances, and obey a firm requirement that the entropy production be non-negative, small systems can undergo fluctuations that allow them to retrace their steps, and such events give rise to negative entropy production. The modern formulation of the second law of thermodynamics can accommodate such behaviour.
In the OU model, if the starting-point coincides with the equilibrium level (i.e. where the force field vanishes) then the distribution of entropy production is the K-distribution, expressible in terms of the K ν Bessel function. Otherwise the best route to obtaining the pdf is to use the inverse Fourier integral, as the moment-generating function is known in closed form. This is true regardless of dimension. The convenience of these results as well as the centrality of the OU model justifies the effort devoted to understanding it here. The result for a drifted Brownian motion (limit of zero mean reversion) is not materially simpler and is most easily found by the algebra of the OU derivation, suggesting that the OU process is the right way to approach this special case.
For nonlinear force fields (non-quadratic potentials), numerical simulation methods are required, together with a new and powerful analytical approximation to the transition density; the necessity for this machinery is particularly clear in problems of dimension > 1. While potentials that are qualitatively similar to the OU model-in effect, unimodal potentials-produce qualitatively similar distributions of entropy production, investigation of the finer details requires numerical techniques. Further, when the divergence from the OU case is substantial, as for example in the double-well potential considered earlier, one has no choice but to go down the numerical route. This justifies the effort devoted in the paper to numerico-analytical work.
An obvious conclusion from any of our graphical results is that some realisations of the dynamics violate the classical thermodynamic behaviour, in the sense that there is always positive probability of negative entropy production. Another general result is that as time advances there is a shift to the right in probability mass of the entropy production, which is to be expected from the integral fluctuation theorem. We also note that the pdf of entropy production often contains singularities, and that these are sometimes softened as time progresses; they are also less pronounced in higherdimensional systems. The explicit pattern of entropy production can be decidedly complex, and certainly not symmetrical about the origin. Such are the statistics of the second law at the level of a small system evolving under the influence of its complex environment.
Further work in this field might consider more complex dynamics: an obvious idea is to incorporate jumps in both directions, thereby considering so-called Lévy processes. Typically these are considerably more difficult to analyse than simple diffusions, as the forward equation is no longer a parabolic PDE but instead an integro-differential equation. A general introduction to such processes is provided by [37], and [29] shows how to use calculate certain functionals of Lévy processes, as a way of generalising the Brownian motion. There is, therefore, a broad scope for further work in this field, and we hope that the ideas demonstrated herein will provide fresh insight into stochastic thermodynamics, and allow concrete results to be obtained on difficult and analytically intractable models of the world.