Local versus non-local information in quantum information theory: formalism and phenomena

In spite of many results in quantum information theory, the complex nature of compound systems is far from clear. In general, the information is a mixture of local and non-local ("quantum") information. To make this point clearer, we develop and investigate the quantum information processing paradigm in which parties sharing a multipartite state distill local information. The amount of information which is lost because the parties must use a classical communication channel is the deficit. This scheme can be viewed as complementary to the notion of distilling entanglement. After reviewing the paradigm, we show that the upper bound for the deficit is given by the relative entropy distance to so-called pseudo-classically correlated states; the lower bound is the relative entropy of entanglement. This implies, in particular, that any entangled state is informationally nonlocal, i.e. has nonzero deficit. We also apply the paradigm to defining the thermodynamical cost of erasing entanglement. We show the cost is bounded from below by the relative entropy of entanglement. We demonstrate the existence of several other non-local phenomena. For example, we prove the existence of a form of non-locality without entanglement and with distinguishability. We analyze the deficit for several classes of multipartite pure states and obtain that, in contrast to the GHZ state, the Aharonov state is extremely nonlocal (and in fact can be thought of as quasi-nonlocalisable). We also show that there do not exist states for which the deficit is strictly equal to the whole informational content (bound local information). We then discuss complementary features of information in distributed quantum systems. Finally, we discuss the physical and theoretical meaning of the results and pose many open questions.

"Quantum information" is emerging as a primitive notion in physics following an essential extension of classical Shannon information theory [1] into the quantum domain. Quantum information cannot be defined precisely, but it is necessary to understand the role of this mysterious and "unspeakable" information [2] in newly discovered quantum phenomena such as teleportation [3] or cryptography [4,5]. These phenomena strongly suggest that quantum states represent quantum information: a reality we process in the laboratory, but which cannot be described as a sequence of classical symbols on a Turing tape [6,7]. Recently the no-deleting and no-cloning theorems have been connected with the principle of conservation of quantum information [8]. Like physical quantities such as energy, quantum information has different forms, and one of them is entanglement, an exotic resource extraordinarily sensitive to the environment. One finds a loss of entanglement in the transition from a pure entangled state to a noisy entangled state, yet remarkably this process can be partially reversed within the distant labs paradigm. Namely, from a large number of noisy bipartite states shared between two distant parties one can distill a number of e-bits at the optimal conversion rate using local operations and classical communication (LOCC) [9].
Despite a plethora of measures which can be used to quantify entanglement, we are still far from properly understanding it. Part of the difficulty is that measures of a quantity are not enough to understand the quantity: one needs to understand entanglement in relation to something else. One cannot understand entanglement in relation to entanglement. In the above context, basic questions arise: (i) Does entanglement exhaust all aspects of quantum information? (ii) Are there resources other than entanglement in the distant labs paradigm? (iii) Does quantum information involve a nonlocality which goes beyond Bell's theorem?
The above questions have recently been considered [10,11,12,13,14,15,16,17,18,19,20]. In particular, a new quantum information processing paradigm has been introduced, in which we proposed the idea of attributing cost to local resources such as pure local qubits [14,15]. Instead of asking how much entanglement can be distilled from a state shared between two parties, one can ask how many local pure qubits $I_l$ can be drawn from it. This gives a quantity (called localisable information) which can then be used to get insight into the double nature of quantum information. Namely, it was shown that local information can be thought of as being complementary to entanglement [16], thereby allowing one, in particular, to understand entanglement in relation to $I_l$.
At first glance, the idea of considering local pure states to be a resource may seem curious. In traditional entanglement theory, one thinks of local pure states as being a free resource. Each party can use as many pure state ancillas as desired. Furthermore, one can obtain pure states from a mixed state, simply by performing a measurement on the state. Note however, that the second law of thermodynamics tells us that purity is indeed a resource. One can never decrease the entropy of a closed system; entropy only increases. The reason a measurement appears to produce pure states is that we ignore the fact that the measuring apparatus must have initially been set in some pure state, and after the measurement, the apparatus will be in a mixture of all the different measurement outcomes. In other words, in a closed system which includes the state, the measurement apparatus and the observer, the total number of pure qubits can never increase. We must therefore be careful how we define the allowable class of operations in order to account for all pure states which might be introduced by various parties from the outside. We will discuss such a useful class, called Closed Operations which can properly be used to account for pure states.
By considering pure states as a resource, one is immediately connecting quantum information theory with thermodynamics. In fact, it was the early foundational work on reversible computation [21] where the entropic cost of computation was considered [22]. This became one of the cornerstones which led to the possibility of quantum computation. The relationship between information and physical tasks such as performing work also has a long history, beginning with Szilard [23]. In fact, as shown in [15,24], the information function is exactly equal to the number of pure qubits one can extract from a state when many copies of the state are available. We will therefore talk of extracting information $I$ from a state. One can think of this as extracting pure states from more mixed states. From the work of Szilard, we also know that the information $I$ is closely related to the amount of work one can extract from a single heat bath (see [25] for a rigorous derivation). These connections will be discussed in Section II, where we review the basic concepts.
The rough essence of the approach is that if separated individuals extract local pure states (i.e. information) from a shared state, using only local operations and classical communication, then they will in general be able to extract less information than if they were together. If the amount of information they can extract from a state $\varrho$ when they are together is $I(\varrho)$, and the optimal [26] amount they can extract when separated is $I_l(\varrho)$, then the difference (called the deficit) $\Delta(\varrho) \equiv I(\varrho) - I_l(\varrho)$ is sensitive to some non-classical correlations in the state $\varrho$. Note that the quantity $\Delta$ is not an entanglement measure, at least in the regime of finite copies of a state $\varrho$. It is sensitive not only to entanglement, but also to so-called non-locality without entanglement [10]. We say that it quantifies the quantumness of correlations rather than entanglement (first attempts to formally quantify such features for quantum states are due to [11], and for ensembles to [10]). A state which has nonzero deficit we will call "informationally nonlocal". The term nonlocality means here that distant parties can do worse than parties that are together, despite the fact that they can communicate classically [10]. Thus it is a different notion from nonlocality understood as a violation of local realism (we have discussed the relations in [27]).
In this work we review some of the results of [14,15,16,17,24] and provide more detail. We then give a number of new, essential results within the paradigm of distillation of local information. In particular we provide a lower bound for the deficit: it is bounded from below by the relative entropy of entanglement [28,29]. We also find that the CLOCC paradigm allows one to define the thermodynamical cost of erasure of entanglement. The cost is also bounded from below by the relative entropy of entanglement. We also analyze the deficit for multi-party pure states such as the Aharonov state, the GHZ state and the W state. We obtain that, according to the deficit, the Aharonov state exhibits the greatest quantum correlations, while the GHZ state exhibits the least. In fact, the Aharonov state can be said to be quasi-non-localisable, in the sense that in large dimension the fraction of information which can be localised goes to zero. We show that in the finite regime (i.e. where Alice and Bob deal with a single copy of a state), any entangled state is informationally nonlocal, i.e. it has nonzero deficit. Moreover, we provide states which exhibit informational nonlocality even though they are separable and have an eigenbasis of distinguishable states; we call this non-locality without entanglement but with distinguishability (on the level of ensembles, it has its counterpart in [10]). We also provide many other interesting results, including the impossibility of catalysis with local pure states, and the non-existence of states whose entire informational content is non-localisable.
The paper is organized as follows. After the Introduction, in section II an operational meaning of information is briefly recalled in terms of transition rates and basic laws of thermodynamics. In section III, the idea of information as a resource in the distant labs paradigm is presented. Here the central notion of the present formalism, i.e. the quantum information deficit, is defined. In section IV the various aspects of the information deficit and its dual notion, localisable information, are discussed, and an interpretation of the deficit in the context of quantum nonlocality is provided. Section V presents the deficit as the entropy production needed to reach the set of pseudo-classically correlated states. The concept is then generalized to an arbitrary set, including the set of separable states, and the cost of erasure of entanglement is defined. Section VI provides upper and lower bounds for the deficit in terms of relative entropy distance and an upper bound for the entanglement erasure cost.
We next turn to exploring new phenomena which can be discovered using our methods. In section VII, the main implications of the results of the previous section are provided, including the key conclusion that any entangled state is informationally nonlocal in a well defined, natural sense. We also prove the existence of separable states which have a locally distinguishable eigenbasis, yet contain nonlocalisable information. Section VIII is devoted to the generalization to the multipartite case. Some of these results were briefly noted in [14]. Here the information deficit is calculated and the asymptotic behavior is analyzed for special examples of pure multipartite states: the GHZ state, the W state, and the Aharonov state. We find that the Aharonov state can be considered to be the most non-local. Section IX contains an exhaustive analysis of Bell states. In section X we prove that (as opposed to pure nondistillable entanglement, i.e. the bound entanglement phenomenon) pure unlocalisable information does not exist. Section XI includes an analysis of the proportions of quantum and classical correlations in quantum states, addressing the question: can the first component exceed the second? In section XII the zero-way and one-way subclasses of the informational deficit are presented. It is shown that in the asymptotic version the one-way deficit is nonzero for separable (disentangled) states, stressing that quantum correlations are more than quantum entanglement. Section XIII discusses the relation of our measure to other measures of quantumness of correlations, i.e. one-way and two-way quantum discord. Section XIV contains a discussion of the results in the context of classical correlation measures introduced by other authors, including the Henderson-Vedral measure. A discussion of complementarity between information quantities in distributed quantum systems is provided in section XV. The paper closes with a general discussion of the results and a list of open questions in section XVI.

II. INFORMATION: AN OPERATIONAL MEANING
Before turning to the case of parties who are in distant labs, it will prove worthwhile to discuss the notion of information from a more general perspective. Although we often talk about information as an abstract concept, here we use it as a term of art which refers to a specific function
$$I(\varrho) = \log d - S(\varrho), \qquad (1)$$
where $S(\varrho) = -\mathrm{tr}\,\varrho \log \varrho$ is the von Neumann entropy of $\varrho$ acting on a Hilbert space of dimension $d$. We will usually work with qubits, in which case $\log d = N$ is an integer. As we will see in the next section, the information function so defined has an operational meaning: it is the number of pure qubits one can draw from many copies of the state. Let us now briefly discuss the information function (1) in the context of the more common Shannon picture. In the latter approach a source produces a large amount of information if it has large entropy. Thus information can be associated with entropy. This is because the receiver is being informed only if she is "surprised". In such an approach the information has a subjective meaning: something which is known by the sender, but is not known by the receiver. The receiver treats the message as information if she did not know it. However, one can also consider an objective picture: a system represents information if it is in a pure state (zero entropy). We know what state it is in. The state is itself the information.
We obtain a dual picture, where two kinds of information are dual. Shannon's entropy represents the information one can get to know about the system, while the information of (1) represents the information one knows about the system. Together they add up to a constant, $I + S = \log d$, which characterizes the system only (not its particular state). Note that the "objective" picture is more natural in the context of thermodynamics. There, a heat bath is highly entropic, and we are ignorant of exactly what state it is in. On the other hand, it is known that using pure states, one can draw work from a single heat bath using a Szilard heat engine [23]. The pure state represents the information needed to order the energy of the heat bath. Knowing which side of a box the molecules of gas are in allows one to draw work by having the molecules push out a piston. High entropy of the gas implies ignorance of the molecule positions, and an inability to draw work from the system. In general, from a single heat bath of temperature $T$, by use of a system in state $\varrho$, one can draw an amount of work proportional to the information (cf. [30]),
$$W = kT\, I(\varrho),$$
where $k$ is the Boltzmann constant (a factor $\ln 2$ appears when $I$ is counted in bits). The process does not violate the second law because the information is depleted as entropy from the heat bath accumulates in the engine, and one cannot run a perpetuum mobile. Thus a quantum system in a nonmaximally mixed state can be thought of as a type of fuel or resource. In fact, originally, our motivation for considering the function (1) in [14] was to understand entanglement in a thermodynamical context.
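As a rough numerical illustration of the information function and of the Szilard-type work it fuels, the sketch below evaluates $I = \log_2 d - S$ from a list of eigenvalues. The function names, the use of bits (base-2 logarithms) and the resulting factor $\ln 2$ in the work formula are assumptions of this example, not notation taken from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (SI value)

def entropy_bits(spectrum):
    """Von Neumann entropy in bits, from the eigenvalues of a density matrix."""
    return sum(-p * math.log2(p) for p in spectrum if p > 0.0)

def information_bits(spectrum):
    """I = log2(d) - S, where d = len(spectrum) is the Hilbert space dimension."""
    return math.log2(len(spectrum)) - entropy_bits(spectrum)

def szilard_work_joules(spectrum, temperature_kelvin):
    """Work k*T*ln(2)*I drawable from a single heat bath (I counted in bits)."""
    return (K_BOLTZMANN * temperature_kelvin * math.log(2)
            * information_bits(spectrum))

pure_qubit = [1.0, 0.0]        # fully known state: one bit of information
maximally_mixed = [0.5, 0.5]   # no information, hence no work
biased_qubit = [0.9, 0.1]      # partial information

for name, spec in [("pure", pure_qubit),
                   ("mixed", maximally_mixed),
                   ("biased", biased_qubit)]:
    print(name, information_bits(spec), szilard_work_joules(spec, 300.0))
```

The maximally mixed qubit yields no work, while the pure qubit yields the familiar $kT \ln 2$ per bit.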

A. Information and transition rates
In [15,24] it was shown that the function $I$ has operational meaning in the asymptotic regime of many identical copies. It gives the number of pure states that one can obtain from a state $\varrho$ under a certain class of operations we call Noisy Operations (NO): operations that consist of (i) unitary transformations, (ii) partial trace and (iii) adding ancillas in the maximally mixed state. First, one can show that it is the unique function (up to constants) that is not increasing under the class NO. One then shows that $I$ determines the optimal rate of transitions between states under NO. Let us now discuss two special cases.
First, given $n$ copies of a state $\varrho$ one can obtain $nI(\varrho)$ qubits in a pure state. This is done essentially by quantum data compression [31] (cf. [32,33]). In data compression, one keeps the signal, and discards the qubits which are in the pure state. Here we do the opposite. We discard the "signal", treating it as noise, and keep instead the redundancies (which are in a pure state). Thus we obtain pure states. This is essentially like cooling [34]. The protocol does not require using noisy ancillas (e.g. maximally mixed states).
A second protocol of interest takes $n(N - S(\varrho))$ pure qubits and produces $n$ copies of $\varrho$. The protocol, described in [24], takes pure states and dilutes them using ancillas in the maximally mixed state (noise). The existence of such dual protocols is similar to entanglement concentration and dilution [35]. Similarly as in [36,37], this can be used to prove that there is a unique function that does not increase under the NO class of operations.
Note that for $K$ pure qubits, the information is equal to $K$. For the maximally mixed state $I = 0$. As mentioned, $I$ is nonincreasing under partial trace and under adding an ancilla in the maximally mixed state. It is of course constant under unitary operations. The property that makes it a unique measure of information in the asymptotic regime is asymptotic continuity (see [37,38,39]), which means that if two states are close to each other, then so are their informations per qubit. It is important to remember that $I$ is not expansible, i.e. if we embed the state into a larger Hilbert space, then it changes (because the number of qubits increases). The reason is obvious even within the classical framework: if there are two possible states of the system, knowledge of the state represents less information than knowledge of the state in the case of, say, three possible configurations. This is in contrast with entanglement theory, where a pure state of Schmidt rank two always means the same thing, independently of how large the system is. Also the entropy of the state depends only on nonzero eigenvalues: e.g. the entropy of a pure state is zero, independently of how large the system is. However, in the present case, the Hilbert space and its dimension are an important element of our considerations.
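The properties listed above can be checked numerically. The sketch below assumes that a state is specified by its nonzero eigenvalues together with the number of qubits $N$ of the space it lives on; the helper names and the example spectra are illustrative only.

```python
import math

def entropy_bits(spectrum):
    """Entropy in bits; depends only on the nonzero eigenvalues."""
    return sum(-p * math.log2(p) for p in spectrum if p > 0.0)

def information_bits(spectrum, n_qubits):
    """I = N - S for a state living on n_qubits qubits."""
    return n_qubits - entropy_bits(spectrum)

def pure_qubits_from_copies(spectrum, n_qubits, copies):
    """Asymptotic yield n * I of pure qubits from n copies (a rate statement)."""
    return copies * information_bits(spectrum, n_qubits)

# K pure qubits carry I = K; the maximally mixed state carries I = 0.
k_pure = information_bits([1.0], n_qubits=3)
mixed = information_bits([0.25] * 4, n_qubits=2)

# Not expansible: the same nonzero eigenvalues embedded in a larger space
# give a different I, although the entropy is unchanged.
spectrum = [0.75, 0.25]
i_one_qubit = information_bits(spectrum, n_qubits=1)
i_two_qubits = information_bits(spectrum, n_qubits=2)

print(k_pure, mixed, i_one_qubit, i_two_qubits)
print(pure_qubits_from_copies(spectrum, 1, copies=1000))
```

Embedding the same spectrum in one extra qubit raises $I$ by exactly one, which is the non-expansibility discussed above.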

B. Information in context of "closed operations"
In the previous section we argued that the information function gives transition rates from mixed states to pure ones and back, and that it is the unique measure of information in the context of NO. For the rest of this paper we will not treat additional mixed states as a free resource. Thus let us now discuss the meaning of information in the context of a class that is compatible with the class of operations which we will use for distributed systems later in this paper. Namely, we can consider closed operations (CO). They are arbitrary compositions of the following two basic operations: (i) unitary transformations, (ii) dephasing $\varrho \to \sum_i P_i \varrho P_i$, where $\sum_i P_i = \mathbb{1}$ and the $P_i$ are projectors, not necessarily of rank one.
We call the class closed, though it is not actually fully closed. Information cannot go in, but it can go out (via dephasing). The name closed is motivated by the fact that the number of qubits stays the same, and qubits cannot be exchanged between the system of interest and the environment. The only allowed contact with the environment is decoherence caused by operation (ii). In the next section we will introduce a "closed" paradigm for the distant labs scenario, by use of which we will define the quantum deficit. Now, let us ask about drawing pure qubits out of a given state by the present class of operations. The operations do not change the size of the system, so that when we start e.g. with many copies of a state $\varrho$ we cannot end up with a smaller system in an almost pure state. However, this is not a big problem. Imagine for a while that in addition we can apply partial trace (which is not allowed in CO). Then the process of drawing pure qubits can be divided into two stages: 1) some CO operations aiming to concentrate the pure part into some number of qubits, and 2) partial trace of the remaining qubits.
Since we do not allow for partial trace, one can simply stop before tracing out. The obtained state will have the form of an (approximate) product of qubits in a pure state and the rest of the system, which is garbage. Thus we can treat the process of dividing the system into a pure part and garbage as extraction of pure qubits. Now, let us ask how many pure qubits can be drawn from a state by closed operations in the above sense. Actually, the process of drawing qubits by NO did not use maximally mixed states. It was just a unitary operation plus partial trace. Thus we can apply this unitary operation (unitaries are allowed in CO) and again get $I$ qubits per input state. Thus, also within the "closed picture", information has the same interpretation: the maximal number of pure qubits that can be obtained from a state per input copy by closed operations.
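A minimal sketch of the two elementary closed operations on a single qubit may help here. It assumes real symmetric density matrices stored as nested lists and dephasing in the computational basis with rank-one projectors (the definition above also allows other bases and higher-rank projectors); all names are illustrative. A unitary leaves $I$ unchanged, whereas dephasing can only let information leak out.

```python
import math

def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return [mean + radius, mean - radius]

def information_bits(rho):
    """I = 1 - S for a single qubit."""
    entropy = sum(-p * math.log2(p) for p in eigenvalues_2x2(rho) if p > 1e-12)
    return 1.0 - entropy

def dephase(rho):
    """Computational-basis dephasing: keeps the diagonal, kills off-diagonals."""
    return [[rho[0][0], 0.0], [0.0, rho[1][1]]]

def hadamard(rho):
    """Unitary operation rho -> H rho H for the (real) Hadamard matrix H."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    return [[(a + d) / 2.0 + b, (a - d) / 2.0],
            [(a - d) / 2.0, (a + d) / 2.0 - b]]

plus = [[0.5, 0.5], [0.5, 0.5]]  # pure state (|0) + |1))/sqrt(2)

i_initial = information_bits(plus)                       # pure: one bit
i_after_unitary = information_bits(hadamard(plus))       # unchanged
i_after_dephasing = information_bits(dephase(plus))      # information lost
i_rotate_then_dephase = information_bits(dephase(hadamard(plus)))  # preserved

print(i_initial, i_after_unitary, i_after_dephasing, i_rotate_then_dephase)
```

Dephasing the state directly destroys its one bit of information, while rotating it into the dephasing basis first preserves it; this is the kind of freedom a protocol can exploit.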

III. RESTRICTING THE CLASS OF OPERATIONS IN DISTANT LABS PARADIGM: CLOCC AND THE INFORMATION DEFICIT
In the preceding section, we discussed the notion of information from the perspective of being able to reversibly distill pure states from a given state $\varrho$. Now, one can ask how things change when the allowable class of operations one can perform is somehow restricted. This is a rather general question, but since here we are interested in understanding entanglement and non-locality, we will examine the restricted class of operations which occurs when various parties hold some joint state, but are in distant labs. One then imagines that Alice and Bob wish to distill as many locally pure states as possible, i.e. product pure states such as $|0\rangle_A^{\otimes m_A} \otimes |0\rangle_B^{\otimes m_B}$. The amount of local information which is distillable we call $I_l$.
In the ordinary approach to the distant labs paradigm, one imagines that two parties (Alice and Bob) are in distant labs and can only perform local operations and classical communication (LOCC). However, as we noted, this class of operations is not suitable for dealing with the questions of concentration of information to local form. That is because under LOCC, one does not count the information that gets added to the systems through ancillas, measuring devices etc. We thus have to state the paradigm more precisely. Since we are interested in local information, we must treat it as a resource, assuming it cannot be created, but only manipulated. Once we have a compound state, the task is to localize the information by using a classical channel between Alice and Bob. The new paradigm was introduced in Ref. [14], where one essentially looks at a closed system as one does in thermodynamics when calculating changes in entropy. One imagines that Alice and Bob are in some closed box, and one does not allow them to import additional quantum states, except for ones which we specifically keep track of and account for.
In defining a class of operations, the crucial point is that here, unlike in usual LOCC (local operations and classical communication) schemes, one must explicitly account for all entropy transferred to measuring devices or ancillas. So in defining the class of allowable operations one must ensure that no information loss is being hidden when operations are being carried out. Moreover the operations should be general enough to represent faithfully the ultimate possibilities of Alice and Bob to concentrate information. In other words we wouldn't like to introduce any limitation apart from two basic ones: (i) there is classical channel between Alice and Bob (ii) local information is a resource (cannot be increased).
We consider a state $\varrho_{AB}$ acting on the Hilbert space $\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$. Let us first define the elementary allowable elements of Closed LOCC operations (CLOCC).
Definition 1. By CLOCC operations on a bipartite system of $n_{AB}$ qubits we mean all operations that can be composed out of (i) local unitary transformations, (ii) sending subsystems down a completely decohering (dephasing) channel.
The latter channel is of the form
$$\varrho_{in} \to \varrho_{out} = \sum_i P_i \varrho_{in} P_i,$$
where the $P_i$ are one-dimensional projectors. For a qubit system, it acts as
$$\begin{pmatrix} \varrho_{11} & \varrho_{12} \\ \varrho_{21} & \varrho_{22} \end{pmatrix} \to \begin{pmatrix} \varrho_{11} & 0 \\ 0 & \varrho_{22} \end{pmatrix}.$$
It is understood that $\varrho_{in}$ is at the sender's site, while $\varrho_{out}$ is at the receiver's site. The operation (ii) accounts for both local measurements and sending the results down a classical channel. It can be disassembled into two parts: a) local dephasing (at, say, the sender's site) and b) sending a qubit intact (through a noiseless quantum channel) to the receiver. Thus suppose that Alice and Bob share a state $\varrho_{AB} \equiv \varrho_{A'A''B}$, and Alice decides to send subsystem $A''$ to Bob down the dephasing channel. The following action will have the same effect. Alice locally dephases the subsystem $A''$. The state is now of the form
$$\varrho^{out}_{A'A''B} = \sum_i p_i\, \varrho^i_{A'B} \otimes |i\rangle_{A''}\langle i|.$$
Thus the part $A''$ is classically correlated with the rest of the system (this is a stronger statement than saying that the state is separable with respect to the cut $A'' : A'B$). Now Alice sends the system $A''$ to Bob through an ideal channel. Thus the final state differs from the state $\varrho^{out}_{A'A''B}$ only in that the system $A''$ is at Bob's site. It follows that the operation (ii) can be replaced by the following two operations: (iia) local dephasing, (iib) sending a completely dephased subsystem.
Note that operations (i) and (iib) are reversible. Only the operation (iia) can, in general, be irreversible. Actually, it is irreversible whenever it changes the state, i.e. in all nontrivial cases. Note also that the operations do not change the dimension of the total Hilbert space or, equivalently, the number of qubits of the total system, even though particular qubits can be reallocated; for example, at the end all qubits can be at Alice's site.
Let us finally note that it may happen that after the protocol one of the parties will be left without any particles at all, as all have been sent to the other party. It is only the total number of particles which is conserved.
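The local dephasing step (iia) can be illustrated on two qubits. The sketch below assumes real density matrices stored as nested lists, the basis ordering index = 2a + b (a labels Alice's qubit, b Bob's) and dephasing in the computational basis; the function names are illustrative. It shows that the dephased state has the classically correlated block form, and that Bob's reduced state is untouched.

```python
def dephase_alice(rho):
    """Apply sum_i (P_i x 1) rho (P_i x 1), P_i projecting Alice's qubit on i."""
    return [[rho[r][c] if r // 2 == c // 2 else 0.0 for c in range(4)]
            for r in range(4)]

def reduced_bob(rho):
    """Partial trace over Alice's qubit."""
    return [[rho[b][bp] + rho[2 + b][2 + bp] for bp in range(2)]
            for b in range(2)]

def conditional_blocks(rho):
    """For each value i of Alice's qubit: (p_i, unnormalised Bob block)."""
    blocks = []
    for i in range(2):
        block = [[rho[2 * i + b][2 * i + bp] for bp in range(2)]
                 for b in range(2)]
        blocks.append((block[0][0] + block[1][1], block))
    return blocks

# Maximally entangled state (|00) + |11))/sqrt(2).
bell = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.5]]

after = dephase_alice(bell)
for row in after:
    print(row)
print(conditional_blocks(after))
print(reduced_bob(bell), reduced_bob(after))
```

After dephasing, Alice's qubit only carries a classical label for Bob's conditional state, so step (iib), moving it to Bob, involves no further loss.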

A. Comparison with other class of operations
For the purpose of the present paper, we will use solely CLOCC operations. Yet, since in some of our papers a different class of operations was employed, we will describe the other class and compare it with CLOCC.
Let us first present the other (likely equivalent) class of operations, called Noisy LOCC (NLOCC). The relation between NLOCC and CLOCC will be similar to the relations between NO and CO: the elementary operations will be the same as in CLOCC, plus tracing out local systems and adding maximally mixed ancillas.
Definition 2. By NLOCC operations on a bipartite system of $n_{AB}$ qubits we mean all operations that can be composed out of (i) local unitary transformations, (ii) sending a subsystem down a completely decohering (dephasing) channel, (iii) adding an ancilla in the maximally mixed state, (iv) discarding a local subsystem.
As in CLOCC, we can decompose (ii) into (iia) and (iib). The CLOCC class is more basic than NLOCC. Namely, the latter can be treated as CLOCC with an additional resource: an unlimited supply of maximally mixed states (which have zero informational content). Indeed, similarly as in section II B, one can argue that the operation of local partial trace is not essential.

IV. LOCALISABLE INFORMATION AND INFORMATION DEFICIT
In this section we define the central quantity, the information deficit. To this end we will first introduce the notion of localisable information. We will first deal with the single copy case, and define the basic quantities on this level. Then we will discuss the asymptotic regime, which will require regularization of the quantities.
Definition 3. The localisable information $I_l(\varrho_{AB})$ of a state $\varrho_{AB}$ on the Hilbert space $C^{d_A} \otimes C^{d_B}$ is the maximal amount of local information that can be obtained by CLOCC operations. More formally:
$$I_l(\varrho_{AB}) = \sup_{\Lambda \in CLOCC} \left[ I(\varrho'_A) + I(\varrho'_B) \right],$$
where $\varrho'_{AB} = \Lambda(\varrho_{AB})$, $\varrho'_A$ and $\varrho'_B$ are its reductions, $I(\varrho'_X) = N'_X - S(\varrho'_X)$, and $N'_A = \log d'_A$, $N'_B = \log d'_B$ are the numbers of qubits of the subsystems of the output state. When one of the numbers of qubits is zero (null subsystem) we apply the convention that its information is zero.
Alternatively, we have the formula
$$I_l(\varrho_{AB}) = N - \inf_{\Lambda \in CLOCC} \left[ S(\varrho'_A) + S(\varrho'_B) \right],$$
where $N$ is the total number of qubits. Again, if it happens that all particles are with one party (i.e. the output dimension of the other party is equal to one), so that the subsystem of the other party is null, then we apply the convention that the entropy of such a subsystem is zero. In what follows, states of a system with one subsystem null will be called null-subsystem states. It is important here that "to obtain local information" does not mean, as usual, getting some outcomes of local measurements. Rather it means applying an operation after which the information, as a function of the states of the subsystems, is maximal. Thus, we only deal with state changes, and calculate a function (the information function) on states.
Actually, it is not localisable information which will be the most important quantity. Rather, the central quantity is a closely connected one, which we call the quantum information deficit (in short, quantum deficit). It is defined as the difference between the total information of the state and the information that can be localized by means of CLOCC operations.
Definition 4. The quantum deficit $\Delta(\varrho_{AB})$ of a state $\varrho_{AB}$ on the Hilbert space $C^{d_A} \otimes C^{d_B}$ is given by
$$\Delta(\varrho_{AB}) = I(\varrho_{AB}) - I_l(\varrho_{AB}).$$
Using the definition of localisable information $I_l$, we get an alternative formula for the quantum deficit,
$$\Delta(\varrho_{AB}) = \inf_{\Lambda \in CLOCC} \left[ S(\varrho'_A) + S(\varrho'_B) \right] - S(\varrho_{AB}), \qquad (11)$$
where $\varrho'_{AB} = \Lambda(\varrho_{AB})$. It is important to notice that both quantities are functions not only of the state but also of the dimension of the Hilbert space. This is because CLOCC operations are defined for a fixed Hilbert space. That $I_l$ depends on the dimension of the Hilbert space is even more obvious, because the latter is explicitly written in the formula. However, in the formula for the deficit as written in (11), the dimension does not appear explicitly, so it could happen that there is no dependence on dimension. Actually, it is rather important that $\Delta$ should not depend on dimension, i.e. when one locally enlarges the Hilbert space, e.g. by adding a qubit in a pure state, $\Delta$ should not change. This is because, as we will see later, the deficit will be interpreted as a measure of quantumness of correlations, which should not change upon adding a local ancilla. We will discuss this issue later in more detail. In particular, in section X we will show that the regularization of the deficit does not change upon adding a local ancilla in a pure state.
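To make the definitions concrete, the following sketch evaluates one particular CLOCC protocol on the two-qubit maximally entangled state: Alice dephases her qubit, sends it to Bob, and Bob applies a CNOT. Any single protocol yields only a lower bound on $I_l$ and hence an upper bound on $\Delta$. The representation (real matrices as nested lists, basis index = 2a + b), the choice of protocol and all names are assumptions of this example; the entropy helper handles diagonal states only, which suffices here.

```python
import math

def dephase_first_qubit(rho):
    """Computational-basis dephasing of the qubit labelled a."""
    return [[rho[r][c] if r // 2 == c // 2 else 0.0 for c in range(4)]
            for r in range(4)]

def cnot_first_controls_second(rho):
    """Permutation unitary |a, b) -> |a, a xor b), applied as U rho U^T."""
    perm = [0, 1, 3, 2]
    out = [[0.0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            out[perm[r]][perm[c]] = rho[r][c]
    return out

def purity(rho):
    """tr(rho^2); equals 1 exactly for pure states, which have S = 0."""
    return sum(rho[r][c] * rho[c][r] for r in range(4) for c in range(4))

def entropy_of_diagonal_state(rho):
    for r in range(4):
        for c in range(4):
            if r != c and abs(rho[r][c]) > 1e-12:
                raise ValueError("state is not diagonal")
    return sum(-rho[i][i] * math.log2(rho[i][i])
               for i in range(4) if rho[i][i] > 0.0)

N_QUBITS = 2
bell = [[0.5, 0.0, 0.0, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.5]]
assert abs(purity(bell) - 1.0) < 1e-12   # pure input, so S(bell) = 0
total_information = N_QUBITS - 0.0

final = cnot_first_controls_second(dephase_first_qubit(bell))
# Bob now holds both qubits and Alice's subsystem is null, so by the stated
# convention the local information is N - S(final).
local_information = N_QUBITS - entropy_of_diagonal_state(final)
deficit_upper_bound = total_information - local_information

print(local_information, deficit_upper_bound)
```

The protocol leaves Bob with one pure qubit and one maximally mixed qubit, so at most one of the two bits is lost. This is consistent with the lower bound in terms of the relative entropy of entanglement quoted in the abstract, which is one bit for this state.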
A. Interpretation of quantum deficit: measure of "informational nonlocality"
Nonzero deficit means that Alice and Bob are not able to localize all the information contained within the state. This, however, means that part of the information is necessarily destroyed in the process of localizing by use of classical communication. This part of the information cannot survive traveling through classical channels. It implies that it must be somehow quantum. In addition, this part of the information must come from correlations, since information that is not in correlations is already local, and need not be localized. We could say that the quantum deficit quantifies quantum correlations. However, we will see that the quantum deficit can be (and often is) nonzero for separable states, which can be generated by local quantum actions and solely classical communication. It is not clear then whether we can talk here about quantum correlations: can quantum correlations be created by only classical communication between the parties? However, the quantum deficit being nonzero indicates that there is something quantum in the correlations of the state. One can say that these are classical correlations of quantum properties. We therefore propose to interpret the quantum deficit as the amount of "quantumness of correlations".
Let us now discuss this issue in the context of the notion of nonlocality considered in [10]. The authors exhibited ensembles of product states which are fully distinguishable if globally accessed, but cannot be perfectly distinguished by distant parties that can communicate only via a classical channel. They called this effect nonlocality without entanglement. The reason for using the term "nonlocality" was the following: one can do better if the system is accessible as a whole, rather than when it is accessible by local operations and classical communication.
In our case, the situation is similar: Alice and Bob can do better in distilling local information if they have two subsystems at the same place rather than shared in distant labs. Thus, we have similar kind of nonlocality, and quantum deficit is a measure of such nonlocality, which we can call "informational", as it concerns difference in access to informational contents. Thus, any state with nonzero deficit will be called informationally nonlocal (or nonlocal, when the context is obvious).

B. Classical information deficit of quantum states
It is important to investigate not only the "quantumness" of compound quantum states, but also the relationships between their "classical" and "quantum" parts. To this end consider the quantity $I_{LO}$, the information that is local from the very beginning, i.e.
$$I_{LO}(\varrho_{AB}) = I(\varrho_A) + I(\varrho_B).$$
We will call it local information. We can now define an analogous quantity to quantum deficit, on a "lower" level.

Definition 5.
The classical deficit of a quantum state is the difference between the information that can be obtained by CLOCC (i.e. the localisable information) and the local information:
$$\Delta_c(\rho_{AB}) = I_l(\rho_{AB}) - I_{LO}(\rho_{AB}).$$
This tells us how much more information can be obtained from the state by exploiting additional correlations in the state $\rho_{AB}$. We will refer to $\Delta_c$ as the classical deficit, because the channel is classical. Also, as we will see later, the quantity can be used in the context of quantifying classical correlations (though this is not immediate, see [40]).

C. Restricting resources: zero-way and one-way subclasses
Additional measures of quantumness of correlations arise when one restricts the communication between Alice and Bob.
One can define the one-way (Alice to Bob) deficit ($\Delta^{\rightarrow}$) and the one-way (Bob to Alice) deficit ($\Delta^{\leftarrow}$) by restricting the classical communication to one direction only. Furthermore, one also has a zero-way deficit ($\Delta^{\emptyset}$). The name zero-way is perhaps confusing. It refers to the situation where no communication is allowed between Alice and Bob until after they have completely dephased (or performed measurements on) their systems. After they have done this, they may communicate in order to exploit the (by then purely classical) correlations and so localize the information. These restricted deficits correspond to the locally accessible informations $I_l^{\rightarrow}$, $I_l^{\leftarrow}$ and $I_l^{\emptyset}$.

D. Asymptotic regime: Distillation of local information as dual picture to entanglement distillation
In this section we will argue that the idea of localization of information, though at a first glance exotic, can be recast in terms typical for quantum information theory, where of central importance are manipulations over resources. Even more, our present formulation will be analogous to the scheme which is a basis for entanglement theory: entanglement distillation. We will use the interpretation of the information function as the amount of pure qubits one can draw from a state in limit of many copies.
Instead of singlets our precious resource will be pure local qubit. The aim of Alice and Bob is: given many copies of state ̺ AB to distill the maximal amount of local pure qubits by means of CLOCC operations. (in entanglement theory, we had LOCC operations, however here we need CLOCC, otherwise one could add for free states, and the maximal distillable amount of pure local qubits would be infinite). One way of doing that is the following: Alice and Bob take state ̺ AB , apply the CLOCC protocol that optimizes formula for localisable information i.e. they obtain state ̺ ′ AB which has maximal local informations I ′ A and I ′ B . They apply such protocol to every copy of state they share. As a result they obtain many copies of state ̺ ′ AB . Now, Alice in her lab, can apply protocol of drawing pure qubits out of her state ̺ ′ ⊗n A , obtaining I ′ A pure qubits. The same does Bob. Finally, they possess I ′ A + I ′ B pure local qubits which is equal just to localisable information, and actually it is the best they can do, when act first on single copies using communication, and only locally perform collective actions on many copies.
Alice and Bob could do better if they act collectively from the very beginning. In this way we get that the optimal amount of local pure qubits that can be distilled by CLOCC is equal to the regularization of localisable information:
$$I_l^{\infty}(\varrho) = \lim_{n\to\infty} \frac{1}{n} I_l(\varrho^{\otimes n}).$$
Similarly we can define the regularized quantum and classical deficits,
$$\Delta^{\infty}(\varrho) = \lim_{n\to\infty} \frac{1}{n} \Delta(\varrho^{\otimes n}), \qquad \Delta_c^{\infty}(\varrho) = \lim_{n\to\infty} \frac{1}{n} \Delta_c(\varrho^{\otimes n}).$$
Thus we conclude that the regularizations of our quantities have an operational meaning connected with the amount of pure local qubits which can be distilled out of a large number of copies of the input state by means of different resources (global operations, CLOCC, local operations). Let us emphasize here that when Alice and Bob are given a single copy of the state, they usually cannot distill pure qubits. When they are given many copies, the ultimate amount of distillable pure qubits is described by the regularized $I_l$. Thus the non-regularized quantity does not represent the amount of pure qubits that can be drawn either from a single copy or from many copies. However, since the definition of $I_l$ involves the information function, which has an operational asymptotic meaning, $I_l$ also has an asymptotic interpretation: it represents the amount of pure local qubits that can be drawn when, at the stage of communication, Alice and Bob operate on single copies, and only after that stage operate collectively. In entanglement theory there is a similar situation with entanglement of formation and entanglement cost. The first is not the ultimate cost of producing a state out of singlets, though it already contains "some asymptotics" in its definition, namely the von Neumann entropy, which is the asymptotic cost of producing pure states out of singlets. The ultimate cost of producing states out of singlets is the regularization of entanglement of formation.
There is however some difference. Namely, even $I_l$ itself, without regularization, has an operational meaning. Indeed, it is proportional to the amount of work one can draw from a single copy of the state in the presence of local heat baths. To draw the optimal number of pure qubits, one needs many copies. In the one-copy case, drawing pure qubits is highly non-optimal. However, to draw the optimal amount of work by use of the state and a single heat bath, one copy is enough. Roughly speaking, in the former case the law of large numbers (in other words, ergodicity) comes from many copies, while in the latter ergodicity is "supplied" by the heat bath.
Finally, one can also consider the amount of local information that can be distilled by means of one-way classical communication. It is equal to the regularized one-way quantum deficit $\Delta^{\rightarrow}$. In a similar vein we can consider regularizations of other quantities based on restricted resources, such as $\Delta_c^{\rightarrow}$, $\Delta^{\emptyset}$, $\Delta_{cl}^{\emptyset}$ etc. Again, all those regularizations have an operational meaning.

E. Additional local resources
One of basic features of the paradigm is that adding local ancillas is not for free. The reason is that otherwise, all the quantities would become trivial. However there are two kinds of local resources that still can be taken into account.
First of all, we can allow adding for free local ancillas in the maximally mixed state. Thus, given a state $\varrho_{AB}$, we can ask about the quantities of interest for the state $\varrho_{AB} \otimes \frac{I_{A'}}{d}$. Note that if these quantities are unchanged, then $I_l^{\infty}$ does not change if we use the NLOCC class instead of CLOCC. Indeed, as we have already mentioned, the only difference between the two classes for the problem of distillation of local information may appear when adding local maximal noise could help. In general, upon adding such local noise, localisable information could only go up. However, it is more likely that it will not change. In fact, Devetak has shown [20] that the one-way deficit does not change upon adding noise. We were not able to show the same in the case of two-way communication, though we believe it is also the case.
Second possibility is borrowing local pure qubits. This would be most welcome, as it would mean that the deficit does not depend on dimension of the Hilbert space as discussed in the introduction of section IV. We actually show that it is the case for regularized deficit in Section X. For one-way case it is shown also in asymptotic regime in [20].
There is more general possibility: borrowing local ancilla in any mixed state. However in asymptotic limit, this is actually equivalent to borrowing noise and pure qubits, as in that regime any state can be reversibly composed out of noise and pure qubits [15,24].

F. An example: pure states
As we have mentioned, in our definition of quantum correlations we do not speak about entanglement at all. We do not work in the established paradigm of the optimal rate of transformation to or from maximally entangled states [41]. We consider distillation of pure product states. Thus, it was perhaps surprising to find [14] that for pure states this definition of quantumness of correlations is just equal to the unique asymptotic entanglement measure for pure states [35,41].
We shall now see this by taking as an example the singlet
$$|\psi^-\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle - |10\rangle\right).$$
It is a 2 qubit state of zero entropy, so its informational content, as given by Eq. (1), is $I = 2$. We will now see that $I_l = 1$. Clearly, without communicating, neither party can draw any information from the state, since locally the state is maximally mixed. It turns out that the best protocol is for Alice to send her qubit down the dephasing channel. After she has done this, Bob will hold the classically correlated state
$$\rho_{CC} = \frac{1}{2}\left(|01\rangle\langle 01| + |10\rangle\langle 10|\right),$$
from which one can extract 1 bit of information by performing a cnot gate to extract one pure state $|0\rangle$. We thus have that $\Delta = 1$. One can actually view this process in terms of measurements and classical communication, as long as we keep track of the measuring device. Alice performs a measurement on the state to find out if she has a $|0\rangle$ or a $|1\rangle$. She then tells Bob the result. Bob now holds a known state, without having to perform any measurement. Alice, on the other hand, had to perform a measurement to learn her state. The informational cost of the measurement is 1 bit, since a measuring apparatus is initially in a pure state and must have two possible outcomes. After the measurement, the measuring device needs to be reset. The classically correlated state $\rho_{CC}$, if held between two parties, has $\Delta = 0$. That the process is optimal for the singlet state is obvious, as this is actually the only thing which Alice and Bob can do given a single copy. However, it is highly nontrivial to show that the regularization of $I_l$ is still the same. The optimality of this protocol in the many-copy case was shown in [15]. It also follows from the general theorem we give in this paper, which connects the deficit with the relative entropy distance from some set of states.
In general, it is not hard to see that for an arbitrary pure state the same protocol can be used, with Alice first performing local compression on her state. For any pure state $|\psi\rangle_{AB}$, the two-way $\Delta$ is given by [14,15]
$$\Delta(|\psi\rangle) = S\left(\mathrm{tr}_A(|\psi\rangle\langle\psi|)\right).$$
Thus for pure bipartite states, quantum deficit is equal to entanglement. It is quite interesting that we have obtained entanglement by destroying entanglement.
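The singlet example can be checked numerically. The following is a minimal sketch, assuming the singlet is written as $(|01\rangle - |10\rangle)/\sqrt{2}$ and that the protocol is the one described above (Alice dephases her qubit in the computational basis and sends it to Bob, so that the whole system ends up with one party). The helper names `entropy`, `reduced_B` and `dephase_A` are illustrative and not part of the paper.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def reduced_B(rho):
    """Bob's reduced state of a two-qubit density matrix (trace out Alice)."""
    return np.einsum('ijik->jk', rho.reshape(2, 2, 2, 2))

def dephase_A(rho):
    """Dephase Alice's qubit in the computational basis."""
    P0 = np.kron(np.diag([1.0, 0.0]), np.eye(2))
    P1 = np.kron(np.diag([0.0, 1.0]), np.eye(2))
    return P0 @ rho @ P0 + P1 @ rho @ P1

# Singlet (|01> - |10>)/sqrt(2); basis order |00>, |01>, |10>, |11>.
psi = np.array([0, 1, -1, 0]) / np.sqrt(2)
rho = np.outer(psi, psi)

n_qubits = 2
total_info = n_qubits - entropy(rho)         # I = N - S(rho)
rho_cc = dephase_A(rho)                      # state after the dephasing channel
localised_info = n_qubits - entropy(rho_cc)  # everything now sits with Bob
deficit = total_info - localised_info

print("I =", total_info, " I_l =", localised_info, " Delta =", deficit)
print("S(rho_B) =", entropy(reduced_B(rho)))
```

The entropy produced by the dephasing coincides with the entropy of Bob's reduced state, in line with the pure-state formula above. The sketch only evaluates this one protocol; it does not search over protocols.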

V. DEFICIT AS PRODUCTION OF ENTROPY NECESSARY TO REACH PSEUDO-CLASSICALLY CORRELATED STATES
In this section we will show that the quantum deficit can be interpreted as the amount of entropy one has to produce in the process of transforming a given state into a so-called pseudo-classically correlated state [14]. This expression of the deficit makes it possible to define the entropy production connected with a given subset of states. For example, we can then speak about the entropy production needed to reach the set of separable states. In this way our paradigm provides a consistent definition of the thermodynamical cost of erasure of entanglement, while the original deficit can be called the thermodynamical cost of erasing quantum correlations.

A. Important classes of states
Let us first define sets of states which are important for our analysis. Notice that in place of a simple dichotomy between separable and entangled states [42], one can have a whole hierarchy of levels of quantumness [43]. Already Werner recognized [42], that within entangled states there might be ones that do not violate Bell's inequalities (cf. [44,45]). One may also go in converse direction, and within separable states find a subclass which is most classical, and wider classes which are still somehow classical, though in some sense to a lesser degree (cf. [11]).
First, let us consider a set of states which we choose to call properly classically correlated or, shortly, classically correlated. These are states of the form
$$\rho_{AB} = \sum_{ij} p_{ij}\, |i\rangle\langle i| \otimes |j\rangle\langle j|,$$
where $\{|i\rangle\}$ and $\{|j\rangle\}$ are local bases. Thus any such state is a classical joint probability distribution naturally embedded into a quantum state. Note that the set of classically correlated states is invariant under local unitary operations. The states are diagonal in a special product basis, which can be called a biproduct basis. Now let us define the set of states of our central interest. We will call them pseudo-classically correlated states and denote the set by PC. These are the states that can be reversibly transformed into classically correlated ones by CLOCC. "Reversibly" means that no entropy is produced during the protocol. This implies that no dephasing is needed in the transformations: Alice and Bob use only unitaries and send only subsystems for which dephasing does not change the total state. Thus they can send only such subsystems X that are in the following state with the rest R:
$$\rho_{XR} = \sum_i p_i\, |i\rangle_X\langle i| \otimes \rho^i_R.$$
The states that can be transformed in this way into classical ones can also be described as the set of states which Alice and Bob can create under the allowed class of operations (CLOCC) out of classical states. The eigenbasis of these states was called an Implementable Product Basis, or IPB, in [15], since it is the eigenbasis that Alice and Bob are able to dephase in.
Let us note that one can have an intermediate class, the one-way classically correlated states, which are of the form
$$\rho_{AB} = \sum_{i,k} p_{ik}\, |i\rangle\langle i| \otimes |\psi^i_k\rangle\langle\psi^i_k|.$$
These are states which can be produced out of classically correlated states by one-way reversible CLOCC. They are diagonal in a basis of the form $\{|i\rangle|\psi^i_k\rangle\}$, where $\{|i\rangle\}$ and, for each $i$, $\{|\psi^i_k\rangle\}_k$ are bases themselves. The above sets are proper subsets of the separable states, and all the inclusions between them are proper too.
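To make the difference between these classes concrete, here is a small numerical sketch. It compares the entropy $H(\rho, B)$ of the diagonal of a state in a basis $B$ with its von Neumann entropy $S(\rho)$; the two coincide exactly when dephasing in $B$ produces no entropy. The two example states and the helper `diag_entropy` are illustrative choices made here, not taken from the paper.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def diag_entropy(rho, basis):
    """H(rho, B): Shannon entropy of the diagonal of rho in basis B (columns)."""
    p = np.real(np.einsum('ik,ij,jk->k', basis.conj(), rho, basis))
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

k0, k1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (k0 + k1) / np.sqrt(2), (k0 - k1) / np.sqrt(2)

def proj(a, b):
    """Projector onto the product vector |a>|b>."""
    v = np.kron(a, b)
    return np.outer(v, v)

# Classically correlated: diagonal in the biproduct (computational) basis.
rho_cc = 0.5 * proj(k0, k0) + 0.5 * proj(k1, k1)
# One-way classically correlated (illustrative): Bob's basis depends on Alice's index.
rho_ow = 0.5 * proj(k0, k0) + 0.5 * proj(k1, plus)

comp = np.eye(4)                                   # biproduct basis
one_way = np.column_stack([np.kron(k0, k0), np.kron(k0, k1),
                           np.kron(k1, plus), np.kron(k1, minus)])

print("cc, biproduct basis:", diag_entropy(rho_cc, comp), entropy(rho_cc))
print("ow, one-way basis:  ", diag_entropy(rho_ow, one_way), entropy(rho_ow))
print("ow, biproduct basis:", diag_entropy(rho_ow, comp), entropy(rho_ow))
```

For the second state, dephasing in the one-way basis costs no entropy, while dephasing in the computational biproduct basis does. The sketch checks only this one biproduct basis and is not a proof that no other biproduct basis works.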

B. Formula for quantum deficit in terms of pseudo-classically correlated states
Any protocol for attaining the information deficit looks as follows: Alice chooses a subsystem of her system, dephases it, then sends it to Bob. Bob then chooses a subsystem from his system (which now includes his original system and the system sent by Alice). He dephases his chosen part and sends it to Alice. They can send the states using an ideal channel, as the sent subsystems are already dephased. Thus sending is here only reallocating subsystems, nothing more. Alice and Bob continue such a process as long as they wish. When they decide to stop, the final state is $\rho'$ and the obtained local information is equal to $N - S(\rho'_A) - S(\rho'_B)$, while the initial total information was $I = N - S(\rho_{AB})$. Thus the deficit obtained in a particular protocol $P$ is $\Delta_P = S(\rho'_A) + S(\rho'_B) - S(\rho_{AB})$. Alice and Bob wish this quantity to be minimal. Suppose then that they performed an optimal protocol, for which this value is indeed minimal.
There are two cases: (i) one of subsystems is null (all particles with the other party) or (ii) both parties have subsystems that are not null. Note that in the second case the system must be in product state. Suppose it is not. Then, Alice and Bob can dephase state in eigenbasis of states of local subsystems. This will not change local entropies, but will transform state into classically correlated one. Then Alice can send her part to Bob, so that the information contents of the total state will be unchanged. However, if only the state was non-product, the total information was greater than sum of local informations. This means that the protocol was not optimal, so that we have contradiction.
Thus we conclude, that the optimal protocol ends up with either product state or state of a system, which one of subsystems is null (all particles either with Bob or with Alice). Even more, when a state is product, one of subsystems can be sent to other party, so that the whole system is with one party. This is compatible with the philosophy of "localizing" of information.
However it turns out that we can divide the total process of localizing of information into two stages: • Irreversible stage: transforming input state ̺ into some pseudo-classically correlated one ̺ ′ • Reversible stage: localizing information of the state ̺ ′ .
In the first stage Alice and Bob try to produce the least entropy. The amount of information that they are able to localize is determined by this stage. In the second stage no entropy is produced, and the information is constant. We have the following proposition.
Proposition 1. The quantum deficit is of the form
$$\Delta(\rho) = \inf_{P} \left[ S(\rho') - S(\rho) \right],$$
where the infimum is taken over all CLOCC protocols $P$ that transform the initial state $\rho$ into a pseudo-classically correlated state $\rho'$.
FIG. 1 (caption): The goal is to make the total entropy increase $\Delta S = \Delta S_1 + \Delta S_2 + \ldots$ minimal. Then the deficit is given by $\Delta = \Delta S$, because once the state is pseudo-classically correlated, its full information content can be localized.
Proof. The proof actually reduces to noting, that pseudo-classically correlated states can be reversibly created from states with one null system. Simply, by definition pseudo-classically correlated states can be reversibly produced out of classically correlated states. The latter, in turn, can be reversibly produced out of one-subsystem states. Thus, consider optimal protocol for drawing local information. As we have argued, it can end up with a one-subsystem state. Out of the state we can reversibly create classically correlated state which is a special case of pseudo-classically correlated states. Conversely, suppose that we have a protocol, that ends up with a pseudo-classically correlated state. Then one can reversibly transform it into one-subsystem state.
Thus quantum deficit is equal to minimum entropy production during making state to be pseudo-classically correlated by CLOCC operations. In other words, to draw optimal amount of local information from a given state, one should try to make it pseudo-classically correlated state in the most gentle way, i.e. producing the least possible amount of entropy. Once the state is pseudo-classically correlated, the further process of localization of entropy is trivial. The first stage is illustrated in figure 1.

C. Defining cost of erasing entanglement
The above formulation of the deficit allows one to generalize the idea of thermodynamical cost to other situations. Namely, instead of the set of pseudo-classically correlated states one can take any other set and ask the same question: how much entropy must be produced while reaching this set by use of CLOCC? Thus our concept of localizing information allows us to ascribe thermodynamical costs to tasks other than localizing information. With any chosen set we can associate a suitable deficit $\Delta_{Set}$. An important application of this concept is to take the set of separable states. Then the associated deficit $\Delta_{sep}$ has the interpretation of the thermodynamical cost of erasing entanglement. As such it is a good candidate for an entanglement measure. In this paper we will show that it is bounded from below by the relative entropy of entanglement. Since the set of separable states is a superset of the pseudo-classically correlated states, we have
$$\Delta_{sep} \le \Delta,$$
so that the cost of erasing all quantum correlations is no smaller than the cost of erasing entanglement. For the sake of further proofs, let us put here the formal definition of $\Delta_{sep}$. Definition 6. The thermodynamical cost of erasing entanglement $\Delta_{sep}$ is given by
$$\Delta_{sep}(\rho) = \inf_{P} \left[ S(\rho') - S(\rho) \right],$$
where the infimum runs over all CLOCC protocols $P$ which transform the initial state $\rho$ into a separable output state $\rho'$.

VI. RELATIONS BETWEEN DEFICIT AND RELATIVE ENTROPY DISTANCE
In this section we will present the proof of the theorem, obtained in [15], relating the deficit to the relative entropy distance. Theorem 1. The information deficit is bounded from above by the relative entropy distance from the set of pseudo-classically correlated states:
$$\Delta(\rho) \le E_r^{PC}(\rho) \equiv \inf_{\sigma \in PC} S(\rho|\sigma).$$
Let us first prove the following proposition. Proposition 2. Localisable information and deficit satisfy the following bounds:
$$I_l(\rho) \ge N - \inf_{B} H(\rho, B), \qquad \Delta(\rho) \le \inf_{B} H(\rho, B) - S(\rho),$$
where the infimum runs over implementable product bases $B$ and $H(\rho, B)$ denotes the entropy of the diagonal entries of the state $\rho$ in basis $B$. Proof. We will exhibit a simple protocol to achieve a reasonable amount of local information. Namely, Alice and Bob choose some implementable basis $B$ and dephase the state in this basis. They can do this since, by definition, an IPB is a basis in which Alice and Bob can dephase by use of CLOCC. The final state has entropy $H(\rho, B)$. Alice and Bob can now choose the basis that produces the smallest possible entropy $H(\rho, B)$. In this way we obtain the following bound for $\Delta$:
$$\Delta(\rho) \le \inf_{B} H(\rho, B) - S(\rho).$$
This ends the proof of the proposition.
Let us now express this bound in terms of relative entropy distance. This is done by the following lemma.
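Lemma 1. For any state $\varrho$ and any basis $B$,
$$\inf_{\sigma \in S_B} S(\varrho|\sigma) = H(\varrho, B) - S(\varrho),$$
where $H(\varrho, B)$ is the Shannon entropy of the probability distribution of the outcomes when $\varrho$ is measured in the given basis $B$, and $S_B$ is the set of all states with eigenbasis $B$.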
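Proof. We have
$$\inf_{\sigma \in S_B} S(\varrho|\sigma) = \inf_{\sigma \in S_B} \left[ -S(\varrho) - \mathrm{tr}(\varrho \log_2 \sigma) \right] = \inf_{\sigma \in S_B} \left[ -S(\varrho) - \mathrm{tr}(\varrho_B \log_2 \sigma) \right] = S(\varrho_B) - S(\varrho) + \inf_{\sigma \in S_B} S(\varrho_B|\sigma) = H(\varrho, B) - S(\varrho).$$
Here $\varrho_B$ is the state $\varrho$ dephased in the basis $B$. In the second equality, we have used the fact that $\mathrm{tr}(\varrho \log_2 \sigma) = \mathrm{tr}(\varrho_B \log_2 \sigma)$, because $\sigma$ is diagonal in basis $B$. In the fourth equality, we have used that $\varrho_B$ belongs to the set $S_B$, so that $\inf_{\sigma \in S_B} S(\varrho_B|\sigma) = 0$, and also that $S(\varrho_B) \equiv H(\varrho, B)$. This ends the proof of the lemma. Now, combining the lemma with the proposition, we obtain the above theorem. We have not been able to prove equality, and in subsection VI C we discuss the origin of the difficulties.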
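The relation between $H(\varrho, B) - S(\varrho)$ and the relative entropy distance to states diagonal in $B$ can be probed numerically. The sketch below is only a sanity check under illustrative assumptions: a random full-rank state, a random orthonormal basis (not necessarily an implementable product basis), and random diagonal states $\sigma$ sampled from a Dirichlet distribution. None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def random_state(d):
    """Random full-rank density matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def random_basis(d):
    """Random orthonormal basis; columns of a unitary from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q

def diagonal(rho, basis):
    """Probabilities <b_k|rho|b_k> for the basis vectors b_k (columns)."""
    return np.real(np.einsum('ik,ij,jk->k', basis.conj(), rho, basis))

def rel_entropy_to_diag(rho, basis, q):
    """S(rho|sigma) for sigma = sum_k q_k |b_k><b_k| with all q_k > 0."""
    p = diagonal(rho, basis)
    return -entropy(rho) - float(np.sum(p * np.log2(q)))

d = 4
rho = random_state(d)
B = random_basis(d)
p = diagonal(rho, B)

gap = float(-np.sum(p * np.log2(p))) - entropy(rho)   # H(rho, B) - S(rho)
at_dephased = rel_entropy_to_diag(rho, B, p)          # sigma = rho dephased in B
others = [rel_entropy_to_diag(rho, B, rng.dirichlet(np.ones(d)))
          for _ in range(200)]

print("H - S              :", gap)
print("S(rho | rho_B)     :", at_dephased)
print("min over random sig:", min(others))
```

The dephased state $\varrho_B$ attains the value $H - S$, and none of the sampled $\sigma$ does better, which is what the lemma asserts. Sampling cannot replace the proof; it only illustrates it.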

A. Deficit, cost of erasure of entanglement and relative entropy of entanglement
In the previous section we have reproduced the result of [15], which provided an upper bound for the deficit in terms of the relative entropy distance from pseudo-classically correlated states. In this section we will prove a new result, providing a lower bound for the deficit in terms of an entanglement measure, the relative entropy of entanglement.
Theorem 2. For any bipartite state $\rho$ the quantum deficit is bounded from below by the relative entropy of entanglement:
$$\Delta(\rho) \ge E_r(\rho).$$
To prove the above theorem it is enough to show that $\Delta_{sep}$, the cost of erasing entanglement, is lower bounded by $E_r$, which is the content of the next theorem. Indeed, by the definition of $\Delta_{sep}$ and by Proposition 1, the deficit is no smaller than $\Delta_{sep}$. Theorem 3. For any bipartite state $\rho$ the thermodynamical cost of erasing entanglement is bounded from below by the relative entropy of entanglement:
$$\Delta_{sep}(\rho) \ge E_r(\rho).$$
To prove this theorem we will need the following lemma.
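Lemma 2. Consider any subset $S$ of states, invariant under product unitary transformations. Then the relative entropy distance from this set, $E_r^S$, given by
$$E_r^S(\rho) = \inf_{\sigma \in S} S(\rho|\sigma),$$
decreases no more than the entropy increases under local dephasing, that is
$$E_r^S(\rho) - E_r^S(\Lambda(\rho)) \le S(\Lambda(\rho)) - S(\rho),$$
where $\Lambda$ is local dephasing.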
Proof. Note first that local dephasing can be represented as a mixture of local unitaries:
$$\Lambda(\rho) = \sum_i p_i\, U_i \rho\, U_i^{\dagger}.$$
Indeed, consider any set of projectors $\{P_j\}_{j=1}^{k}$. The suitable unitaries are given by
$$U = \sum_{j=1}^{k} s_j P_j,$$
where $s_j = \pm 1$ are chosen at random. Thus the $p_i$'s are equal, but this is irrelevant for our purpose. Now, let us rewrite the inequality of the lemma as follows:
$$E_r^S(\Lambda(\rho)) + S(\Lambda(\rho)) \ge E_r^S(\rho) + S(\rho).$$
Thus we have to prove that the function $f(\rho) = E_r^S(\rho) + S(\rho)$ is nondecreasing under dephasing. This is in some sense parallel to the result of [46], where it was proven that the above function does not decrease under (global) mixing. The proof is directly inspired by [47].
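We have
$$f(\Lambda(\rho)) = \inf_{\sigma \in S} \left[ -\mathrm{tr}\, \Lambda(\rho) \log_2 \sigma \right] = \inf_{\sigma \in S} \sum_i p_i \left[ -\mathrm{tr}\, U_i \rho\, U_i^{\dagger} \log_2 \sigma \right] \ge \sum_i p_i \inf_{\sigma \in S} \left[ -\mathrm{tr}\, U_i \rho\, U_i^{\dagger} \log_2 \sigma \right] = \sum_i p_i \inf_{\sigma \in S} \left[ -\mathrm{tr}\, \rho \log_2 (U_i^{\dagger} \sigma U_i) \right] = \sum_i p_i \inf_{\sigma \in S} \left[ -\mathrm{tr}\, \rho \log_2 \sigma \right] = f(\rho).$$
The inequality comes from properties of the infimum; the last but one equality comes from the fact that the set $S$ is invariant under product unitary operations. This ends the proof of the lemma.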
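The first step of the proof, writing a local dephasing as an equal mixture of the unitaries $U = \sum_j s_j P_j$ with signs $s_j = \pm 1$, is easy to verify numerically. The following is a small sketch for a dephasing of Alice's qubit in the computational basis; the random test state and the choice of projectors are assumptions made purely for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Random two-qubit density matrix used only as a test input.
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a @ a.conj().T
rho /= np.trace(rho).real

# Local projectors P_j (x) 1 acting on Alice's qubit.
projs = [np.kron(np.diag(v), np.eye(2)) for v in ([1.0, 0.0], [0.0, 1.0])]

# Dephasing written directly with projectors.
dephased = sum(P @ rho @ P for P in projs)

# The same map as an equal mixture over all sign patterns s_j = +-1.
signs = list(itertools.product([1, -1], repeat=len(projs)))
mixture = np.zeros_like(rho)
for s in signs:
    U = sum(sj * P for sj, P in zip(s, projs))   # U = sum_j s_j P_j
    mixture += U @ rho @ U.conj().T / len(signs)

print("max deviation:", np.max(np.abs(dephased - mixture)))
```

Averaging over the signs removes every cross term $P_j \rho P_l$ with $j \neq l$, which is why the two expressions agree. Each $U$ acts nontrivially only on Alice's side, so the mixture is one of local unitaries.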
Proof of Theorem 3. The basic ingredient of the proof is the monotonicity of the function $f(\rho) = E_r(\rho) + S(\rho)$ under CLOCC. (In entanglement theory the important functions are the ones that cannot increase under a suitable class of operations, while here we need a function that does not decrease under our class of operations. This once more shows that our approach is in a sense dual to the usual entanglement theory.) As we have already discussed, any CLOCC operation can be decomposed into basic ones: (i) local unitary transformation, (ii) local dephasing and (iii) noiseless sending of dephased qubits. Of course a local unitary operation changes neither the entropy nor $E_r$, so that the function $f$ remains constant. The lemma we have just proved tells us that local dephasing can only increase the function $f$. Consider now the last component, sending dephased qubits. Clearly the entropy again does not change during such an operation. It remains to show that $E_r$ does not change under sending dephased qubits. Consider the state $\rho_{ABB'}$ with one dephased qubit $B'$ on Bob's site, and let $\sigma_{ABB'}$ be the closest separable state to it. Since relative entropy is monotone under dephasing, and the set of separable states is closed under local dephasing, we can choose $\sigma_{ABB'}$ to have the qubit $B'$ dephased too. Consider then the state $\rho_{AA'B}$, where the qubit $A'$ is the qubit $B'$ after being sent by Bob. We now apply the procedure of sending the qubit $B'$ to the state $\sigma_{ABB'}$ and obtain a new separable state $\sigma_{AA'B}$. By construction we have $S(\rho_{ABB'}|\sigma_{ABB'}) = S(\rho_{AA'B}|\sigma_{AA'B})$. Thus $E_r$ could only go down. However, we can repeat the reasoning with the qubit sent in the converse direction, and conclude that $E_r$ does not change.
In this way we have shown that the function $f$ cannot decrease under CLOCC operations. This means that for any protocol that brings the initial state $\rho$ to a final separable state $\rho'$ we have
$$E_r(\rho') + S(\rho') \ge E_r(\rho) + S(\rho).$$
However, the target state is separable, hence it has $E_r = 0$. We obtain
$$S(\rho') - S(\rho) \ge E_r(\rho),$$
which tells us that in any protocol that ends up with a separable state, the increase of entropy is no smaller than the relative entropy of entanglement. This ends the proof.

B. Connection with bounds obtained via semidefinite programming
In [19] semidefinite programming techniques were used to obtain lower bounds on the regularized deficit. A general bound, Eq. (41), was obtained there; it is expressed through $\lambda_{max}$, the greatest eigenvalue, and $\Gamma$, the partial transposition of the matrix. The value of the bound has been calculated for Werner states and isotropic states. It turned out that for those states it is exactly equal to the regularized relative entropy of entanglement. This is compatible with Theorem 3. An interesting open question is the general relation between the bound (41) and the regularized $E_r$.

C. Discussion of the problem of "noncommuting choice"
We have proved that the deficit satisfies the following inequality:
$$E_r(\rho) \le \Delta(\rho) \le E_r^{PC}(\rho).$$
Yet we have not been able to prove that $\Delta = E_r^{PC}$. Let us discuss the main obstacles which we encountered. The question is actually as follows: can there be a better protocol than dephasing in the optimal IPB? The latter protocol has a fundamental feature. Namely, in the series of subsequent local dephasings, each dephasing is compatible with the previous one in the sense that they commute with each other. In other words, each dephasing is in some sense ultimate: it divides the total Hilbert space into blocks, so that all subsequent dephasings are performed within blocks, in a basis that is compatible with the blocks. Another way of viewing it is to say that what was sent from Alice to Bob or vice versa will remain classical, that is, diagonal in a fixed distinguished basis. The main open question is now the following: is it enough for Alice and Bob to follow this restriction, or should they violate this rule to draw more information?
We can formulate this fundamental problem in a more tractable way if we look through the proof of Theorem 3 and find where the proof fails if, instead of separable states, one takes pseudo-classically correlated states. Almost the entire proof can be carried forward without alteration, apart from one small item: the invariance of $E_r^{PC}$ under sending dephased qubits. $E_r$ was invariant mainly because we could choose the closest separable state to be also dephased on that qubit. This is because the set of separable states is closed under local dephasings. However, the set of pseudo-classically correlated states is not. This does not rule out the possibility that the closest pseudo-classically correlated state indeed has the qubit dephased. However, we were not able to prove or disprove it. We formulate the problem here in a formal way. Problem. Consider a bipartite state that can be written in the following form:
$$\rho_{AB} = p\, \rho^1_{AB} + (1 - p)\, \rho^2_{AB},$$
where $\rho^1_{AB}$ and $\rho^2_{AB}$ are orthogonal on subsystem $A$, i.e. the reduced states $\rho^i_A$ have disjoint supports. Can the closest pseudo-classically correlated state in relative entropy distance be written in this form?

D. Deficit and relative entropy distance for one-way and zero-way scenarios
Finally, let us note that the needed results can be obtained easily for the one-way and zero-way scenarios. The problem with the two-way case is that Alice and Bob could draw more information than they obtain by measuring in the optimal IPB. The source of difficulty was that in a many-round protocol Alice and Bob could make dephasings that would not commute with the dephasings they made in a previous step. In the one-way case there is no such danger, as there is only one round. The zero-way situation is the simplest. The only thing Alice and Bob can do is to dephase the subsystems in some bases, and the only problem is to find the optimal bases (so that they produce the smallest amount of entropy). The versions of Lemma 1 for the one-way and zero-way cases can be proven in the same way. Thus in those cases the deficits are equal to the relative entropy distance to the corresponding set of states: one-way classically correlated states (19) in the one-way case and classically correlated states in the zero-way case.

E. Multipartite states
We can define set of pseudo-classically correlated states also in the case of multipartite states. Then one can formulate version of Theorem 1 in the latter case. Since the arguments we have used did not depend on number of parties, Theorem is then true also in multipartite case. Similarly theorems 2 and 3 hold in the multipartite case.

VII. BASIC IMPLICATIONS OF THE THEOREM (INFORMATIONAL NONLOCALITY)
The theorems obtained in the previous section allow us to obtain the following results for both bipartite as well as multipartite states.
• ∆ is no smaller than the distillable entanglement E D .
Indeed, the latter is bounded from above by relative entropy of entanglement [48].
• Moreover, Theorem 3 implies that the quantum deficit is no smaller than the coherent information:
$$\Delta \ge S_X - S,$$
where $X = A, B, C, \ldots$, or equivalently
$$\Delta \ge \max_X S_X - S.$$
This is because it was proven that in the bipartite case [49] the relative entropy of entanglement is bounded from below by the coherent information $S_X - S$. For multipartite states, one gets it by noting that the multipartite relative entropy of entanglement is no smaller than the one versus some bipartite cut. Then one applies the mentioned bipartite result.
• Any entangled state is informationally nonlocal, i.e. it has nonzero deficit.
This follows from the fact that when a state is entangled, it has nonzero relative entropy of entanglement.
Note however that there also exist separable states which are informationally nonlocal. We will now discuss an example of such a state and relate it to so-called "nonlocality without entanglement".
• Theorem 3 allows for an easy proof that for pure bipartite states the deficit is equal to entanglement. Indeed, from the theorem we have that the deficit is no smaller than the relative entropy of entanglement. On the other hand, a simple protocol of dephasing Alice's part in the eigenbasis of the state of her subsystem, and sending it to Bob, gives the amount of information $2 \log d - S(\rho_A)$. Thus the deficit is also no greater than the entropy of the subsystem. However, the latter is equal to the relative entropy of entanglement (this is a reflection of the fact that in the asymptotic regime there is only one measure of entanglement for pure states). For multipartite pure states there does not exist a unique entanglement measure. We have the following open question: for multipartite pure states, is the deficit equal to the relative entropy of entanglement?
If so, deficit would be an entanglement measure for all pure states. And since deficit is an operational quantity, we would have operational interpretation for relative entropy of entanglement for pure states.
Note here that in general the deficit is not monotone under LOCC, and even under CLOCC. In contrast, I l is monotone under CLOCC.
• From the above reasoning and Theorem 3 it follows that the thermodynamical cost of erasure of entanglement of pure states is equal to their entanglement (cf. [14,15]).
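The dephasing protocol used in the pure-state bullet above can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly drawn two-qutrit pure state; the helper names `entropy`, `reduced_A` and `dephase_A` are illustrative and do not come from the paper. It verifies that dephasing Alice's part in the eigenbasis of her reduced state leaves a state of entropy S(ρ_A), so that this particular protocol localizes 2 log d − S(ρ_A) bits and bounds the deficit from above by S(ρ_A).

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def reduced_A(rho, dA, dB):
    """Partial trace over Bob's subsystem."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def dephase_A(rho, basis, dB):
    """Dephase Alice's part in the orthonormal basis stored in the columns of `basis`."""
    out = np.zeros_like(rho)
    for k in range(basis.shape[1]):
        v = basis[:, k]
        P = np.kron(np.outer(v, v.conj()), np.eye(dB))
        out = out + P @ rho @ P
    return out

d = 3                                   # local dimension (assumption for this example)
rng = np.random.default_rng(0)
psi = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())         # random pure bipartite state

rho_A = reduced_A(rho, d, d)
_, U = np.linalg.eigh(rho_A)            # eigenbasis of Alice's reduced state
rho_deph = dephase_A(rho, U, d)

S_A = entropy(rho_A)
I_total = 2 * np.log2(d) - entropy(rho)            # information content I
I_protocol = 2 * np.log2(d) - entropy(rho_deph)    # information localized by this protocol
deficit_bound = I_total - I_protocol               # upper bound on the deficit
print(S_A, deficit_bound)
```

The two printed numbers coincide, which is the statement that this protocol loses exactly S(ρ_A) bits. The sketch does not establish optimality; that part of the argument relies on Theorem 3.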

A. Non-locality without entanglement and with distinguishability
One form of non-locality we are familiar with is entanglement. Another form of non-locality was introduced in [10]: so-called non-locality without entanglement. There, it was shown that there are ensembles of states which, although product, cannot be distinguished from each other under LOCC with certainty. Thus ensembles of product states can have a form of non-locality. Other ensembles were exhibited which were distinguishable, but for which distinguishing was thermodynamically irreversible. This can be thought of as non-locality without entanglement but with distinguishability. All those results were obtained for ensembles.
Here we report similar kinds of nonlocality for states. Namely, we will exhibit states which are separable, and which can be created out of ensembles of distinguishable states, but which contain unlocalizable information, so that ∆ ≠ 0 (at least for single copies). In fact, one can find such states which have an eigenbasis where each eigenket is perfectly distinguishable.
An example is the state given by
$$\varrho = \tfrac{1}{4}\left(|00\rangle\langle 00| + |11\rangle\langle 11|\right) + \tfrac{1}{2}\,|\psi^-\rangle\langle\psi^-|. \qquad (50)$$
It is a separable state, which can be seen either by construction, or because it has positive partial transpose, which is a sufficient condition for separability in dimension 2 ⊗ 2. Its eigenkets |00⟩, |11⟩, |ψ−⟩ are clearly perfectly distinguishable under LOCC, since Alice and Bob just need to measure in the computational basis and compare results to know which of the three basis states they have. Nonetheless, it clearly has non-localisable information. To localize all the information, one would need to dephase it in the basis |00⟩, |11⟩, |ψ−⟩, but this cannot be done under CLOCC, since one cannot dephase using a projector on |ψ−⟩. The proof follows from Theorem 2: we know that the optimal protocol is for Alice to dephase her side in some basis, and then send the state to Bob. Indeed, for two qubits, all implementable product bases are one-way implementable, i.e. they are of the form {|i⟩|ψ^i_k⟩}, where {|i⟩} and, for each i, {|ψ^i_k⟩} are bases themselves. Thus for the one-copy case, which we consider here, the optimal protocol is a one-way protocol. Since the state is symmetric, it does not matter which way (from Alice to Bob or vice versa).
A direct calculation shows that the optimal basis is |0⟩ ± |1⟩ at one of the sites. This yields I_l = (3/4) log 3 − 1 ≈ 0.1887, while I = 1/2, giving a value of ∆ = I − I_l ≈ 0.3113. There are thus separable states which exhibit non-locality in that all the information cannot be localized even though all the basis elements of the state are perfectly distinguishable.
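The numbers quoted for this example can be reproduced with a few lines of NumPy. The sketch below assumes that the state is (1/4)(|00⟩⟨00| + |11⟩⟨11|) + (1/2)|ψ−⟩⟨ψ−|, which is the form consistent with the eigenkets and with I = 1/2 stated above; the helper names are illustrative. It checks positivity of the partial transpose and evaluates the information left after Alice dephases in the σ_z basis and in the |0⟩ ± |1⟩ basis. With these values, I − I_l evaluates to about 0.311.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

e = np.eye(4)
k00, k01, k10, k11 = e[0], e[1], e[2], e[3]
psi_minus = (k01 - k10) / np.sqrt(2)

# Assumed form of the example state (see lead-in).
rho = (0.25 * np.outer(k00, k00) + 0.25 * np.outer(k11, k11)
       + 0.5 * np.outer(psi_minus, psi_minus))

def partial_transpose_B(rho):
    """Transpose Bob's indices of a two-qubit density matrix."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

def dephase_A(rho, U):
    """Dephase Alice's qubit in the basis given by the columns of U."""
    out = np.zeros_like(rho)
    for k in range(2):
        v = U[:, k]
        P = np.kron(np.outer(v, v.conj()), np.eye(2))
        out = out + P @ rho @ P
    return out

Z_basis = np.eye(2)
X_basis = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # |0> +- |1>

min_pt_eig = np.linalg.eigvalsh(partial_transpose_B(rho)).min()
I_total = 2 - entropy(rho)
I_z = 2 - entropy(dephase_A(rho, Z_basis))
I_x = 2 - entropy(dephase_A(rho, X_basis))
deficit = I_total - I_x
print(min_pt_eig, I_total, I_z, I_x, deficit)
```

Dephasing in the computational basis localizes nothing here, while the |0⟩ ± |1⟩ basis gives (3/4) log 3 − 1. The sketch only compares two bases; that the second one is optimal is the claim of the text, not something the code proves.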

VIII. INFORMATIONAL NONLOCALITY OF MULTIPARTITE STATES
The approach considered here turns out to be quite valuable in the case of multipartite states. One of the reasons for this is that one can not only quantify the quantumness of correlations along various splittings, as is commonly done, but one can also look at the total amount of localizable information that a given state possesses if all parties cooperate. In other words, in addition to the various vector measures defined for a particular splitting of the state, e.g. AB|CD, one also has a scalar measure which is defined for the state as a whole. One can calculate ∆ for various bipartite splittings by grouping parties together, or one can calculate ∆ for the entire state. In fact, one can consider all possible groupings, such as AB|CD|EF etc. This allows one to explore multipartite correlations in more detail, and also allows one to ascribe a single quantity to a particular state in order to rank various states in terms of their total quantum correlations.
By considering a family of states for a number of parties N, one can calculate the information deficit per party ∆(ρ_N)/N. We find that it goes to zero for the generalized GHZ state, and to infinity for the Aharonov state, as N goes to infinity. Of the states we consider, we shall thus find that the Greenberger-Horne-Zeilinger (GHZ) state is the least informationally nonlocal, while the so-called Aharonov state is the most informationally nonlocal.

A. Schmidt decomposable states
We first consider the N-party GHZ state
$$|\psi^N_{GHZ}\rangle = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} |i\rangle^{\otimes N},$$
where we depart slightly from convention by taking the dimension of each party's state to also scale like N. This state is thus more entangled than if one were to give each party a qubit, and we do so in order to fairly compare our results with other entangled states. The deficit for the GHZ state was calculated in [14], where it was found to be ∆(ψ^N_GHZ) = log N. Essentially, once one party makes a measurement, all the other parties can learn which state they have without performing a measurement, thus I_l = (N − 1) log N, while the total state is of dimension N^N, hence I = N log N. Therefore
$$\Delta(\psi^N_{GHZ}) = I - I_l = \log N.$$
This is in keeping with the notion that the GHZ state is rather fragile, since if only one of the subsystems becomes dephased, the entire state becomes classical. One can generalize this to any multipartite state which can be written in a Schmidt basis, i.e.
$$|\psi^N_S\rangle = \sum_i a_i\, |i\rangle_1 |i\rangle_2 \cdots |i\rangle_N .$$
In that case, one finds ∆(ψ^N_S) = S(ρ_A), where S(ρ_A) is the entropy of any single-party subsystem (they are all equal). This follows directly from inequality (46), and it holds in the asymptotic regime of many copies.
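The counting for the GHZ state can be illustrated with a small numerical sketch. It assumes NumPy, takes N = 3 parties each holding a three-level system, and uses illustrative helper names (`ghz_state`, `dephase_first_party`). One party dephases in the computational basis; the resulting state is diagonal in a product basis, so everything that remains can be localized classically, which gives a lower bound (N − 1) log N on I_l for this protocol.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def ghz_state(N):
    """N parties, each with an N-level system: sum_i |i...i> / sqrt(N)."""
    psi = np.zeros((N,) * N)
    for i in range(N):
        psi[(i,) * N] = 1 / np.sqrt(N)
    return psi.reshape(-1)

def dephase_first_party(rho, N):
    """Dephase the first party in the computational basis."""
    rest = N ** (N - 1)
    out = np.zeros_like(rho)
    for i in range(N):
        P = np.zeros((N, N))
        P[i, i] = 1.0
        P_full = np.kron(P, np.eye(rest))
        out = out + P_full @ rho @ P_full
    return out

N = 3
psi = ghz_state(N)
rho = np.outer(psi, psi)
rho_deph = dephase_first_party(rho, N)

I_total = N * np.log2(N) - entropy(rho)            # N log N for a pure state
I_protocol = N * np.log2(N) - entropy(rho_deph)    # information kept by this protocol
is_classical = np.allclose(rho_deph, np.diag(np.diag(rho_deph)))
deficit_bound = I_total - I_protocol
print(I_protocol, deficit_bound, is_classical)
```

For N = 3 the protocol keeps 2 log 3 bits and loses log 3 bits, in agreement with the values quoted from [14]. The code checks one protocol only and therefore bounds the deficit from above.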
B. An example of a non-Schmidt decomposable state: The W state in three qubits
A more complicated example is the "W state" [50]
$$|\psi_W\rangle_{ABC} = \frac{1}{\sqrt{3}}\left(|100\rangle + |010\rangle + |001\rangle\right),$$
and we ask how much localizable information I_l can be extracted under one-way CLOCC by using it as a shared state. Since each party only has one qubit, we can use Theorem 2 to calculate it. This is because if each party only holds a single qubit, the optimal protocol needs only one-way communication, and is equivalent to having one party measure and then tell her result to the other parties, who will then hold a pure state between them. Let Alice measure her part of the state in the basis {|e_i⟩} and send the result to Bob and Charlie. After the measurement, Alice holds the ensemble {p_i, |e_i⟩}, while Bob and Charlie hold the ensemble {p_i, ̺^i_BC}. The states ̺^i_BC are of course pure. Bob and Charlie know which of the states {̺^i_BC} they have, because they have received the result of Alice's measurement. Therefore, the total amount of information that can be extracted from |ψ_W⟩_ABC locally by such a protocol is given by
$$I_l = \left[1 - S(\varrho_A)\right] + \sum_i p_i\, I_l(\varrho^i_{BC}),$$
where $\varrho_A = \sum_i p_i |e_i\rangle\langle e_i|$, so that
$$S(\varrho_A) = H(\{p_i\}),$$
and where (since the ̺^i_BC are pure)
$$I_l(\varrho^i_{BC}) = 2 - S(\varrho^i_B),$$
with ̺^i_B being the reduced density matrix of ̺^i_BC. So for an arbitrary von Neumann measurement, we have that for the W state, I_l is given by
$$I_l = 3 - H(\{p_i\}) - \sum_i p_i\, S(\varrho^i_B),$$
where the measurement is performed in the basis {|e_i⟩} given by
$$|e_1\rangle = x|0\rangle + \sqrt{1-x^2}\,|1\rangle, \qquad |e_2\rangle = \sqrt{1-x^2}\,|0\rangle - x|1\rangle.$$
One can check that for von Neumann measurements, the largest amount of local information extractable is 1.45026. It is achieved for measurement in the basis {|e_i⟩} with either x² = 1/3 or x² = 2/3 (see Figure VIII D). Contrary to naive expectations, dephasing in the computational basis is the worst choice. Also the basis |±⟩ (x² = 1/2) is not optimal. It is interesting that the optimal bases are not incidental.
Rather, these are the bases for which the probabilities of transition into the |0⟩, |1⟩ states are the same as the probabilities of Alice obtaining those states when measuring the W state in the basis |0⟩, |1⟩. In the regime of single copies, this protocol is optimal by Theorem 2, therefore for the W state, I_l = 1.45026. This is less than the amount of localisable information for the corresponding GHZ state |ψ_GHZ⟩ = (1/√2)(|000⟩ + |111⟩), thus we would argue that the W state exhibits more non-local correlations.
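The value 1.45026 can be reproduced with a short calculation. The sketch below is written under two assumptions that are not spelled out in the surviving text: Alice's real measurement basis is parametrized as |e_1⟩ = x|0⟩ + √(1−x²)|1⟩, |e_2⟩ = √(1−x²)|0⟩ − x|1⟩, and the localized information of the protocol is 3 − H({p_i}) − Σ_i p_i S(̺^i_B). The function name `w_local_information` is illustrative.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def w_local_information(x):
    """Information localized from the W state when Alice measures in the
    real basis |e1> = x|0> + y|1>, |e2> = y|0> - x|1>, y = sqrt(1 - x^2)."""
    y = np.sqrt(1.0 - x ** 2)
    W = np.zeros((2, 2, 2))                      # amplitudes indexed as W[a, b, c]
    W[1, 0, 0] = W[0, 1, 0] = W[0, 0, 1] = 1 / np.sqrt(3)
    info = 3.0
    for e in (np.array([x, y]), np.array([y, -x])):
        amp = np.tensordot(e, W, axes=(0, 0))    # <e|_A applied to W (e is real)
        p = float(np.sum(amp ** 2))              # probability of this outcome
        if p < 1e-12:
            continue
        amp = amp / np.sqrt(p)                   # conditional pure state of B and C
        rho_B = amp @ amp.T                      # Bob's reduced state
        info += p * np.log2(p)                   # subtracts Alice's share of H({p_i})
        info -= p * entropy(rho_B)               # cost of localizing the BC pure state
    return info

best = w_local_information(np.sqrt(1 / 3))
other = w_local_information(np.sqrt(2 / 3))
plus_minus = w_local_information(np.sqrt(1 / 2))
computational = w_local_information(1.0)
print(best, other, plus_minus, computational)
```

The four printed values are ordered as the text describes: x² = 1/3 and x² = 2/3 give the same, largest value; the |±⟩ basis is slightly worse; the computational basis is the worst of the three.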

C. The Aharonov state and quasi-unlocalisable information
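We next consider the so-called Aharonov "diamond" state. It is essentially given by anti-symmetrizing N N-dimensional states. For three parties, the unnormalized state is
$$|\psi^3_A\rangle = |012\rangle + |120\rangle + |201\rangle - |021\rangle - |102\rangle - |210\rangle,$$
and in general it is
$$|\psi^N_A\rangle = \sum_{a_1,\ldots,a_N} \epsilon_{a_1 \ldots a_N}\, |a_1 \ldots a_N\rangle,$$
where ε_{a_1...a_N} is the permutation symbol (Levi-Civita density).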
It has the property that if one party measures their state in any basis, and tells their result to the rest of the parties, they will then still hold another Aharonov state of dimension N − 1. Since this is a pure state of dimension N N , the total amount of information is I = N log N . On the other hand, under the protocol where the parties take turns measuring, it is easy to see that after each measurement, the other parties will still be left with a locally maximally mixed state. Finally, however, there will be two parties left, and they will share a singlet, which can be converted into 1 bit of localized information.
The amount of localisable information is therefore I_l = 1 regardless of how large N is. This is optimal by Theorem 2 for single copies. We thus have ∆(ψ^N_A)/N = log N − 1/N, which grows logarithmically to infinity with N. The amount of localisable information per dimension goes very fast to zero, as N^{−N}. The Aharonov state can then be thought of as a form of unlocalizable information. One might wonder if one can make the localisable information strictly zero, as is the case for entanglement with bound entangled states. We will soon show that this is not the case.
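The structural properties of the Aharonov state used in this argument can be verified directly for N = 3. The following sketch assumes NumPy; the helper names and the random measurement vector are illustrative choices. It checks that each party is locally maximally mixed, and that after one party obtains an outcome along an arbitrary (here real) unit vector, the remaining two parties hold an antisymmetric two-party state carrying one bit of entanglement. It does not attempt to compute I_l itself.

```python
import numpy as np
from itertools import permutations

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def permutation_sign(perm):
    """Sign of a permutation, computed from its inversion count."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1.0 if inversions % 2 else 1.0

def aharonov_state(N):
    """Normalized totally antisymmetric state of N parties with N levels each."""
    psi = np.zeros((N,) * N)
    for perm in permutations(range(N)):
        psi[perm] = permutation_sign(perm)
    return psi / np.linalg.norm(psi)

N = 3
psi = aharonov_state(N)

# Reduced state of the first party.
M = psi.reshape(N, -1)
rho_first = M @ M.T

# The first party obtains the outcome |v> for a random real unit vector v.
rng = np.random.default_rng(0)
v = rng.normal(size=N)
v /= np.linalg.norm(v)
cond = np.tensordot(v, psi, axes=(0, 0))     # unnormalized state of the other two
p_outcome = float(np.sum(cond ** 2))
cond = cond / np.sqrt(p_outcome)
rho_second = cond @ cond.T                   # reduced state of one remaining party

print(p_outcome, entropy(rho_second))
```

The outcome probability is 1/N and the remaining pair is a singlet on a two-dimensional subspace, as the text states for the last step of the protocol.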

D. General pure three qubit states
In subsection VIII A, we considered the localisable information of Schmidt decomposable states. And in subsection VIII B, we considered the W state, an example of a non-Schmidt decomposable state.
Let us here consider the general three-qubit pure state, which can be written in the form [51,52] where only a need be complex, while the rest of the coefficients are real. Of course we have |a|² + b² + c² + d² + e² = 1. We can again use Theorem 2 to obtain the amount of localisable information. Let us suppose that Alice (A) measures in the basis and sends the measurement outcome to Bob (B) and Charlie (C). Depending on the measurement outcome, Bob and Charlie share the state corresponding to the outcome |e_1⟩ or |e_2⟩ at Alice, where is the probability that |e_1⟩ is obtained by Alice. For such a protocol, the localisable information amounts to where we maximize over x and y to obtain the highest localisable information. This is an optimal protocol, and thus we obtain I_l. Let us denote the quantity in square brackets as I^{xy}_l. Let us find the value of the localisable information for the case of the W state |ψ_W⟩, using Eq. (62). Without loss of generality, we may write x = r > 0 and y = exp(iφ)√(1 − r²). We now plot, in Fig. VIII D, the expression I^{xy}_l on the (r, φ)-plane. The supremum can then be read off from the figure. This supremum will then correspond to the localisable information for the W state. Interestingly, the supremum is attained on two parallel lines on the (r, φ)-plane.
Let us now choose an exemplary one-parameter subclass from the class in Eq. (61).
Fig. VIII D: I^x_l for the W state. The optimal basis for maximizing I^x_l is for Alice to dephase (or measure) with x² = 1/3 or 2/3. The basis |±⟩ (x² = 1/2) is not optimal.

IX. BELL MIXTURES
The state of eq. (50) is a particular example of a mixture of Bell states
$$\varrho_{Bm} = \sum_{i=1}^{4} p_i\, |\psi_i\rangle\langle\psi_i|, \qquad (63)$$
where |ψ_i⟩ denote the four Bell states. Here, for completeness, we calculate ∆ for all states of this form, the so-called Bell-diagonal states. Up to local unitaries, this includes all 2 ⊗ 2 states with local density matrices that are maximally mixed. Due to Theorem 2, we only need to consider optimizing over projection measurements (without adding any ancilla locally) at one of the parties, say Alice. Consider therefore the mixture of the four Bell states in 2 ⊗ 2.
After an arbitrary projection valued (PV) measurement on Alice's side, projecting in the basis let the global state be projected respectively to At this stage, the whole state is essentially on Bob's side. This is because we allow dephasing as one of our allowed operations. Consequently, the locally extractable information after this set of operations is the von Neumann entropy of where p is the probability of Alice obtaining the state 0 . The optimization yields the value where p 1 and p 2 are the two highest coefficients of the Bell mixture ̺ Bm . If we consider only von Neumann measurements (without addition of ancilla) and if Alice and Bob are not allowed to make any communication before they perform their measurements, then the zero-way information deficit ∆ ∅ for the Bell mixtures (63) is given by . Note however that in this case, we are unable to show whether one can do better by POVMs or whether more copies are useful. Consider however the isotropic d ⊗ d state The one-way information deficit ∆ → (as well as ∆ ∅ ) is given by For the isotropic state, it is possible to prove, on the same lines as for Bell mixtures, that POVMs as well as more than one copy cannot help.
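The optimization over Alice's projective measurement can be carried out numerically for any Bell mixture. The sketch below assumes NumPy, uses an arbitrary example distribution `p`, and scans Alice's dephasing basis over a grid on the Bloch sphere. The closed form 1 − H(p_1 + p_2), with p_1 and p_2 the two largest weights, is used only as a comparison value; it is our own reading of the optimization described above, not a formula quoted from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def h2(q):
    """Binary entropy in bits."""
    return float(-q * np.log2(q) - (1 - q) * np.log2(1 - q))

S2 = 1 / np.sqrt(2)
BELL = [np.array([S2, 0, 0, S2]), np.array([S2, 0, 0, -S2]),    # phi+, phi-
        np.array([0, S2, S2, 0]), np.array([0, S2, -S2, 0])]    # psi+, psi-

def bell_mixture(p):
    return sum(pi * np.outer(b, b) for pi, b in zip(p, BELL)).astype(complex)

def dephase_A(rho, theta, phi):
    """Dephase Alice's qubit along the Bloch direction (theta, phi)."""
    e1 = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    e2 = np.array([-np.exp(-1j * phi) * np.sin(theta / 2), np.cos(theta / 2)])
    out = np.zeros((4, 4), dtype=complex)
    for e in (e1, e2):
        P = np.kron(np.outer(e, e.conj()), np.eye(2))
        out += P @ rho @ P
    return out

def one_way_local_information(rho):
    """Grid search over Alice's dephasing basis (single copy, no ancillas)."""
    best = -np.inf
    for theta in np.linspace(0, np.pi, 25):
        for phi in np.linspace(0, 2 * np.pi, 49):
            best = max(best, 2 - entropy(dephase_A(rho, theta, phi)))
    return best

p = [0.4, 0.1, 0.3, 0.2]            # example weights (assumption)
rho = bell_mixture(p)
I_total = 2 - entropy(rho)
I_l_scan = one_way_local_information(rho)
I_l_formula = 1 - h2(sum(sorted(p)[-2:]))
deficit_scan = I_total - I_l_scan
print(I_l_scan, I_l_formula, deficit_scan)
```

For this example the scan and the comparison value agree, and the deficit is positive although the state is separable (its largest weight is below 1/2). A grid search is of course only a numerical check, not a proof of optimality.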

A. Asymptotic regime
For two qubits we easily evaluated the deficit because the one-way and two-way deficits are equal in this case: Alice's first measurement leaves no room for other measurements. So the only thing she should do is communicate the results to Bob, and communication from Bob is not needed. In other words, the set of pseudo-classically correlated states is equal to the set of one-way classically correlated states of the form (19). Thus it was enough to evaluate only the one-way deficit. However, if we turn to regularization, this equivalence is no longer valid. This is because, to calculate the regularization, one needs to evaluate the deficit for many copies. The dimension of the system is then high, and there is room for many rounds. We are not able to regularize the two-way deficit.
Concerning the one-way deficit, one can argue that it is additive for Bell diagonal states. Moreover, borrowing pure local qubits does not help (it has been independently shown that in general, in the one-way case, borrowing pure local qubits does not help [20]). We will provide the argumentation in section XIV B.

X. PURELY UNLOCALISABLE INFORMATION DOESN'T EXIST
One important aspect of entanglement theory is the existence of bound entangled states. These are states which are entangled, in that they require entanglement to create, yet no entanglement can be drawn from them. In Section VIII C we saw that in the multipartite case, there were states for which the amount of localisable information per party went to zero as the number of parties increased. One can ask whether there is a strict analogy to bound entanglement: are there states which have positive I, but for which I_l = 0? It turns out that the answer is no; the only state which has I_l = 0 is the maximally mixed state. Here we prove this in the following lemma for the case of two parties. The generalization to many parties is straightforward.
Lemma 3. From any state other than the maximally mixed state we can draw local information.
Proof. Consider a state ̺ on C^d ⊗ C^d such that ̺ ≠ ̺_mmix = I/d². Then there exists an observable whose mean value in the state ̺ differs from its mean value in ̺_mmix. Every nonlocal observable can be decomposed into local operators, so we can always find such an observable of the form A ⊗ B, for which
$$\mathrm{Tr}\left[(A \otimes B)\,\varrho\right] \neq \mathrm{Tr}\left[(A \otimes B)\,\varrho_{mmix}\right].$$
Then, writing $A = \sum_i a_i |i\rangle\langle i|$ and $B = \sum_j b_j |j\rangle\langle j|$ in their eigenbases,
$$\sum_{ij} a_i b_j\, p_{ij} \neq \frac{1}{d^2} \sum_{ij} a_i b_j, \qquad p_{ij} = \langle ij|\varrho|ij\rangle. \qquad (71)$$
Notice that the probability distribution p_ij for ̺ in (71) is classical, and that it is not uniform. We know that we can obtain a nonzero amount of local information from any classical state besides the maximally mixed one. We can thus find a local operation (dephasing in the eigenbases of A and B) that transforms every state satisfying the assumptions of Lemma 3 into a state from which we can draw local information.
There is an open question, whether there exist states, for which localisable information is entirely equal to local information content, but which nevertheless are not product. In such case, one wouldn't be able to draw information from correlations at all. The classical deficit ∆ c would be zero, even though state would be non-product. It is rather unlikely that such states exist, yet we have not been able to solve this question.
We now prove a related theorem which follows from the above lemma, and which will be useful in the following section. Namely, we show that using pure states as a resource cannot help when distilling local information. One can think of such a process as catalysis, where one uses pure states to produce more pure states from some shared state.
Proof. Assume that catalysis can help in drawing local information. Consider a state ̺ which is not the maximally mixed state, and the optimal protocol P_1 for distilling local information which does not use ancillas. Consider also another protocol P_2, in which we first distill information from some of the copies of ̺ using P_1, and then use the distilled pure states to perform catalytic distillation on the rest of the copies. Notice that we can do this because we know from Lemma 3 that we can distill local information, and thus also pure states, from ̺. If catalysis is helpful, then using P_2 we are able to obtain more local information than with P_1. But protocol P_2 does not use ancillas and is better than P_1, which is optimal. This leads to the required contradiction.
We have shown that catalysis is useless for states with nonzero distillable information. It could help only in the case of states with purely unlocalisable information, but we know from Lemma 3 that such states do not exist. This ends the proof.
Remark. To perform catalytic distillation we need pure ancillas. One may notice that the states which we want to use in protocol P_2 for catalysis are not exactly pure. But these states come from distillation, so in the limit of many copies they are equal to |0⟩^{⊗rn} (r is the rate of distillation of local information and n is the number of copies). This assures us that in the asymptotic regime of many copies we are able to perform catalysis.

XI. CAN CORRELATIONS BE MORE QUANTUM THAN CLASSICAL?
The total amount of correlations contained in a bipartite state is given by the mutual information
$$I_M(\varrho_{AB}) = S(\varrho_A) + S(\varrho_B) - S(\varrho_{AB}).$$
One can easily see that our quantities for dividing correlations into ones which behave quantumly (∆) and classically (∆_c) satisfy
$$\Delta + \Delta_c = I_M.$$
In other words, the total amount of correlations (given by I_M) can be divided into classical and quantum components. Now one can ask whether the total correlations I_M can be divided arbitrarily. Certainly for pure states, this is not the case. For pure states, correlations which behave quantumly cannot exceed I_M/2. For pure states ψ, we showed that ∆ = S(ρ_A), and thus it is always the case that ∆(ψ) = I_M/2. For pure states, the quantumness of correlations can never exceed the classicalness of correlations. Now one can ask: can it be that one has states for which
$$\Delta > \frac{I_M}{2}\,? \qquad (74)$$
If so, one could think of these states as having super-saturated quantum correlations, in that for a given amount of mutual information I_M they have a greater proportion of correlations which behave quantumly. In this sense, one can think of such states as being more non-local than maximally entangled states. One way of approaching the above problem is to work with the relative entropy of entanglement. We know that both the relative entropy of entanglement (E_r), with distance taken from separable states, and the von Neumann entropy (S_AB) are not greater than log_2 d for d ⊗ d states. Consequently, one has E_r + S_AB ≤ 2 log_2 d. Can we have the following stronger inequality:
$$E_r + \frac{1}{2} S_{AB} \le \log_2 d\,? \qquad (75)$$
This is tight for maximally entangled states. Because the deficit is no smaller than the relative entropy of entanglement, it follows that if the inequality is violated, then for some states inequality (74) is true, and we would have the curious phenomenon. On the other hand, if the inequality is satisfied for all states, we would obtain a nice trade-off between entanglement and noise. In a recent work, Wei et al. [53] calculated (for two-qubit states) the maximal possible relative entropy of entanglement E_r (as well as other entanglement measures) for a given amount of mixedness (quantified by the von Neumann entropy). Note that the inequality (75) would generically hold for two-qubit states if it is satisfied by these optimal values. Indeed, examining the curves of the above paper, one finds that for any two-qubit state the inequality is satisfied.
One can also find that for Werner states and maximally correlated states the inequality is satisfied too, for the regularized relative entropy of entanglement. To see this, note that the asymptotic relative entropy of entanglement E^∞_{r(PPT)} (with distance taken from states with positive partial transpose (PPT)) is known for Werner states (mixtures of projectors on the symmetric and antisymmetric subspaces) in d ⊗ d [54]. One may check that the relation
$$E^{\infty}_{r(PPT)} + \frac{1}{2} S_{AB} \le \log_2 d \qquad (76)$$
is satisfied for all Werner states in arbitrary dimensions. However, note here that the relative entropy of entanglement (from PPT states) is not additive for Werner states. For the maximally correlated states, the relative entropy of entanglement (from PPT states) is known to be additive. Its value is also explicitly known for all such states in d ⊗ d. Via additivity, this is exactly equal to its asymptotic relative entropy of entanglement (from PPT states). Precisely, for any state of the form
$$\varrho_{mc} = \sum_{ij} a_{ij}\, |ii\rangle\langle jj|,$$
one has E_r(̺_mc) = S(̺_A) − S(̺_AB). It is easy to check that the relation (76) is satisfied by any ̺_mc in d ⊗ d.
Thus we have not found states for which the inequality would be violated for the regularized relative entropy of entanglement. It remains an open question whether the trade-off between noise and entanglement represented by the inequality is universally true, or whether there exist states for which there are more quantum than classical correlations.

XII. ZERO-WAY AND ONE-WAY SUBCLASSES
We now turn to additional measures of quantumness of correlations which arise when one restricts the communication between Alice and Bob. In sections VIII and IX such restrictions were useful for evaluating the perhaps more basic two-way quantities. However, they are more than just a calculational convenience: we shall also see that the restricted measures allow one to explore other aspects of non-locality. Additionally, there appear to be strong connections between the deficit and distillation of randomness from shared states. For example, it has just been shown in [20] that the one-way deficit is equal to the mutual information minus the one-way distillable randomness [55].
As before, the optimal protocols by which the corresponding local informations are obtained amount to producing "classical-like" states of least entropy by the respective operations. As mentioned in section V B, the theorems proven there apply equally well in these restricted scenarios with suitable modification.
In any protocol of concentrating information to local form, the parties can stop at states of the form
$$\varrho = \sum_{ij} p_{ij}\, |i\rangle\langle i| \otimes |j\rangle\langle j|. \qquad (77)$$
However, for the two-way scenario, we have argued that one can stop already at pseudo-classically correlated states. When one-way protocols are allowed, it is sufficient for the parties to stop at states of the form
$$\varrho = \sum_i p_i\, |i\rangle\langle i| \otimes \varrho^i_B. \qquad (78)$$
Finally, for zero-way protocols, one has to achieve classical states (77). Consider for example the zero-way protocol for a state ̺_AB by which I^∅_l is attained. Without any classical communication (just by dephasing via an environment), Alice and Bob change the state ̺_AB into a classical-like state ̺′_AB (of the form given in eq. (77)), so that S(̺′_A) + S(̺′_B) is minimized, where ̺′_A and ̺′_B are the local density matrices of ̺′_AB. Note that the parties must still concentrate the information using classical communication, but only after they have performed all their dephasings. The situation is therefore like in a Bell-type experiment.
Let us now show that ∆^∅ is an independently useful candidate for quantum correlations and can capture interesting aspects of non-locality. The states that contain no quantum correlations would then be the ones with ∆^∅ = 0. Consider for example the states with eigenbasis (without normalization)
$$|0\rangle|0\rangle, \quad |0\rangle|1\rangle, \quad |1\rangle\left(|0\rangle + |1\rangle\right), \quad |1\rangle\left(|0\rangle - |1\rangle\right), \qquad (79)$$
where |0⟩ and |1⟩ are the eigenvectors of the Pauli matrix σ_z. Such states are the ones used in the BB84 quantum cryptography protocol [4]. This set of orthogonal states is distinguishable locally. But the states are not distinguishable with zero-way communication. Bob must wait for Alice's measurement result (in the σ_z basis) to decide whether to perform a measurement in the σ_z basis or in the σ_x basis. Therefore a mixture of the states in eq. (79), where the mixing probabilities are all different from each other (so that the spectrum of the resulting state is non-degenerate), would have nonvanishing ∆^∅. This is because an arbitrary dephasing by Bob on such a mixture, before obtaining Alice's result, would result in no information being extracted from the state (by Bob). Consequently there would be an information deficit when trying to extract information locally, because globally of course all the information is extractable from such a state. All the information is also extractable by one-way or two-way communication. This is in contrast to states which have the eigenbasis |0⟩|0⟩, |0⟩|1⟩, |1⟩|0⟩, |1⟩|1⟩, for which all the information is extractable from the state locally, by measurement by both parties in the σ_z basis, without any communication.
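The contrast between zero-way and one-way protocols for such mixtures can be made concrete with a small numerical experiment. The sketch below assumes NumPy, takes the eigenbasis to be |0⟩|0⟩, |0⟩|1⟩, |1⟩|+⟩, |1⟩|−⟩, and uses arbitrary distinct example weights. It compares the entropy increase caused by Alice's σ_z dephasing alone (after which Bob can finish the job once he knows her result) with the smallest entropy increase found when both parties must choose fixed dephasing bases in advance. The zero-way scan runs over a coarse grid of bases, so it is an illustration rather than a proof.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def basis(theta, phi):
    """Orthonormal qubit basis along the Bloch direction (theta, phi)."""
    e1 = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    e2 = np.array([-np.exp(-1j * phi) * np.sin(theta / 2), np.cos(theta / 2)])
    return e1, e2

def dephase(rho, vectors, side):
    """Dephase one side ('A' or 'B') of a two-qubit state in the given basis."""
    out = np.zeros((4, 4), dtype=complex)
    for v in vectors:
        proj = np.outer(v, v.conj())
        P = np.kron(proj, np.eye(2)) if side == "A" else np.kron(np.eye(2), proj)
        out += P @ rho @ P
    return out

zero, one = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (zero + one) / np.sqrt(2), (zero - one) / np.sqrt(2)
eigenkets = [np.kron(zero, zero), np.kron(zero, one),
             np.kron(one, plus), np.kron(one, minus)]
weights = [0.05, 0.45, 0.10, 0.40]           # distinct example weights (assumption)
rho = sum(w * np.outer(k, k) for w, k in zip(weights, eigenkets)).astype(complex)
S0 = entropy(rho)

# One-way: Alice dephases in the sigma_z basis, which leaves the state unchanged.
one_way_increase = entropy(dephase(rho, (zero, one), "A")) - S0

# Zero-way: both parties dephase in fixed bases chosen in advance (coarse scan).
grid = [basis(t, f) for t in np.linspace(0, np.pi, 9)
        for f in np.linspace(0, 2 * np.pi, 8, endpoint=False)]
zero_way_increase = min(
    entropy(dephase(rho_a, b, "B")) - S0
    for rho_a in (dephase(rho, a, "A") for a in grid)
    for b in grid
)
print(one_way_increase, zero_way_increase)
```

The one-way entropy increase vanishes, while every pair of fixed bases in the scan raises the entropy by a finite amount, which is the numerical signature of a nonvanishing zero-way deficit for this mixture.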
We therefore see that the quantum behavior of correlations could result from the distinctly quantum but "local" property of nonorthogonality. Here we call nonorthogonality a local property, as it does not a priori require a tensor product structure to manifest itself. It is this nonorthogonality that manifests itself in a more complex form in the examples of LOCC-indistinguishable orthogonal product bases [10,56,57]. More generally, it may be the reason for any case of LOCC-indistinguishability of orthogonal states [58,59,60,61,62].
An interesting issue is the relation between ∆^∅ and mutual information. In section XI we asked whether there exist states for which ∆ would be more than half of the mutual information. The same question can be asked in the case of one-way and zero-way deficits. Lukasz Pankowski has performed numerical simulations to evaluate ∆^∅ versus mutual information. The results are presented in Fig. 4. Surprisingly, there are states for which the deficit is almost equal to the mutual information. Thus the measurement destroys almost all correlations! The quantum correlations do not imply classical correlations (see [63] in this context).
With respect to the pure states considered in Section IV F, it is easy to see that ∆ is also equal to ∆^→. This is also true for single copies of states in which each party holds a single qubit, due to Theorem 2.

A. Expression for one-way ∆
In this subsection we consider the expression for the one-way deficit ∆^→. In the case when only one-way communication is allowed between the parties, the only thing that Alice and Bob can do is that Alice dephases her part in some basis, and then sends her part to Bob. Dephasing transforms the state as
$$\varrho_{AB} \to \varrho'_{AB} = \sum_i (P_i \otimes I)\, \varrho_{AB}\, (P_i \otimes I),$$
where {P_i = |i⟩⟨i|} forms a set of orthogonal one-dimensional projectors on the Hilbert space of Alice's part of ̺_AB, p_i are the probabilities of the corresponding outcomes which Alice would obtain if she performed measurements with the same P_i's rather than dephasing, and ̺^i_B is the state that Bob would obtain conditionally on the measurement outcome |i⟩. Thus
$$\varrho'_{AB} = \sum_i p_i\, |i\rangle\langle i| \otimes \varrho^i_B.$$
The process of sending does not change the form of the state, so that the entropy of the final state at Bob is
$$S(\varrho'_{AB}) = S(\varrho'_A) + \sum_i p_i\, S(\varrho^i_B),$$
where $\varrho'_A = \sum_i p_i |i\rangle\langle i|$ is the reduced density matrix of the A-part of ̺′_AB. So finally I^→_l takes the form
$$I^{\to}_l = \log d_A d_B - \inf_{\{P_i\}} \Big[ S(\varrho'_A) + \sum_i p_i\, S(\varrho^i_B) \Big],$$
and correspondingly
$$\Delta^{\to} = \inf_{\{P_i\}} \Big[ S(\varrho'_A) + \sum_i p_i\, S(\varrho^i_B) \Big] - S(\varrho_{AB}). \qquad (81)$$
Just as we showed that ∆ was equal to the relative entropy distance to pseudo-classically correlated states, one can also write ∆^→ and ∆^∅ as the minimum relative entropy distance to the sets of states S^→ and S^∅ which can be created reversibly under the one-way and zero-way classes of operations.
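The entropy bookkeeping behind the one-way expression can be checked on a random example. The sketch below assumes NumPy, draws a random two-qubit mixed state and a random orthonormal basis for Alice (both illustrative choices), dephases Alice's part, and compares the entropy of the dephased state with the entropy of Alice's outcome distribution plus the average entropy of Bob's conditional states.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rng = np.random.default_rng(1)

# Random full-rank two-qubit state.
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho = rho / np.trace(rho).real

# Random orthonormal basis for Alice (columns of a unitary from a QR decomposition).
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

T = rho.reshape(2, 2, 2, 2)                 # indices (a, b, a', b')
rho_prime = np.zeros((4, 4), dtype=complex)
probs, cond_entropy = [], 0.0
for i in range(2):
    v = Q[:, i]
    P = np.kron(np.outer(v, v.conj()), np.eye(2))
    rho_prime += P @ rho @ P                # dephased state, built term by term
    block = np.einsum('a,abcd,c->bd', v.conj(), T, v)   # p_i * (Bob's conditional state)
    p = np.trace(block).real
    probs.append(p)
    cond_entropy += p * entropy(block / p)

H_A = -sum(p * np.log2(p) for p in probs)   # entropy of Alice's dephased marginal
lhs = entropy(rho_prime)
rhs = H_A + cond_entropy
deficit_this_basis = lhs - entropy(rho)     # entropy produced by this particular basis
print(lhs, rhs, deficit_this_basis)
```

The first two printed numbers agree. The third is the entropy produced by this one basis; the one-way deficit is the smallest such value over all bases, so a single basis only gives an upper bound.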

XIII. RELATIONSHIP WITH OTHER MEASURES OF QUANTUMNESS OF CORRELATIONS
Let us now compare the deficit with other measures of quantumness of correlations, in particular the quantum discord [11,12]. The latter is defined formally as the difference of two classically equivalent expressions for the mutual information, applied to quantum systems (taken to be a measuring apparatus and a system). It was defined with respect to a measurement A_M (either a projective one or a POVM, i.e. a positive operator valued measure) performed on the apparatus A. One then defines the discord δ(A_M|B) with respect to this measurement that results with probabilities The relationship between δ(A_M|B) and ∆^→ (defined on single copies) was recently shown in [64], where it was also shown that the discord has the interpretation of extraction of work by a demon, if one minimizes δ(A_M|B) over all possible measurements A_M. Care must however be taken, since in the definition of discord there is no cost associated with the pure states which are used in a POVM. Therefore, we note here that the relationship between the discord and ∆^→ only applies if one optimizes the discord over von Neumann measurements, and disallows POVMs.
Finally, let us provide two explicit examples of cases where two-way communication is more powerful than one-way communication, i.e. one has the strict inequality ∆^↔ < ∆^→ = inf_{A_M ∈ PV meas} δ(A_M|B). To this aim consider the basis related to the sausage states of [10] and which has been analyzed in [16]: Consider now any bipartite 3 ⊗ 3 state ̺_two−way that is diagonal in the above basis, but has nondegenerate spectrum. It is relatively easy to provide a two-way protocol that distinguishes the vectors (83) without destroying them (see [16]). Hence ∆^↔ vanishes. Evidently ̺_two−way is not of the form Σ_{i=1}^{3} |φ_i⟩⟨φ_i| ⊗ ̺_i with orthogonal φ_i, since there are no three eigenvectors among (83) that have the same component on Alice's side. So both ∆^→ and the discord are strictly positive for this state. Thus Maxwell's demons who communicate in both directions are more powerful than demons who can only communicate in one direction.
Another simple example is to take states which have zero optimized discord or one-way deficit but in different directions of communication. Then take them each to be on orthogonal Hilbert spaces, and mix. Such a state will have ∆ ↔ = 0 since both parties can just project on the two orthogonal Hilbert spaces to determine whether they hold ρ → or ρ ← and then the appropriate party can send her state down the channel. On the other hand, one-way communication will be sufficient to completely localize one of the states but not always both.

XIV. RELATION WITH MEASURES OF CLASSICAL CORRELATION
In this section we shall analyze the relation of the classical deficit [16] to already known measures of classical correlations. It happens that both zero-way and one-way deficit have their "counterparts" in such measures. There is no known analog, however, for two-way deficit.
Let us recall that the quantum deficit was defined as
$$\Delta = I - I_l.$$
One can think of it as describing how much better Alice and Bob can do under closed operations (CO) if they are given a quantum channel instead of the classical channel. Because it feels the difference between the quantum and classical channel, it tells us about the quantumness of correlations. Likewise, the classical deficit is given by
$$\Delta_{cl} = I_l - I_{LO},$$
where I_LO = log d_A d_B − S(̺_A) − S(̺_B) is the information that can be localized by closed local operations alone. It tells us how much better two parties can do at localizing information if, instead of having no access to a channel, i.e. closed local operations, they have access to a classical channel. Because the added resource is a classical channel, it shows how much better the parties can do by exploiting a classical channel. One can verify that ∆_cl and ∆ add up to the quantum mutual information I_M(̺_AB) = S(̺_A) + S(̺_B) − S(̺_AB). Thus ∆_cl = I_M − ∆. More explicitly we have (cf. eq. (11))
$$\Delta_{cl}(\varrho_{AB}) = S(\varrho_A) + S(\varrho_B) - \inf_{CLOCC} \left[ S(\varrho'_A) + S(\varrho'_B) \right],$$
i.e. ∆_cl is the optimal decrease of local entropies by means of CLOCC.

A. One-way measures
Corresponding to the measure of quantumness of correlation under one-way classical communication (from Alice to Bob) (∆ → ), given by eq. (81), we could have the following formula for classical correlation: Note that the supremum is taken over all local dephasings on Alice's side. Although we optimize over projection measurements, one can effectively include POVM's by including all the required ancillas from the start. Remarkably, it has been shown [20] that POVM's need not be considered when one goes to to the limit of many copies.
In eq. (86) we have distinguished two terms. The second term is the decrease of Bob's entropy after Alice's measurement. The first one, δS(A) = S(̺ A ) − S(̺ ′ A ), is the cost of this process on Alice's side and is non-positive, because dephasing cannot decrease entropy. It is zero only if Alice measures in the eigenbasis of her local density matrix ̺ A .
The expression for $\Delta_{cl}^{\rightarrow}$ is very similar to the measure of classical correlation introduced by Henderson and Vedral [13]:

$$C_{HV}(\varrho_{AB}) = \sup_{P} \Big[ S(\varrho_B) - \sum_i p_i S(\varrho_B^i) \Big],$$

where $p_i$ is the probability of Alice's outcome $i$ and $\varrho_B^i$ is Bob's conditional state. Originally the supremum was taken over POVMs, but as mentioned, unless explicitly stated otherwise, we take the state to act already on a suitably larger Hilbert space. The difference between the Henderson-Vedral classical correlation measure and the one given in eq. (86) is that the former does not include Alice's entropic cost δS(A) of performing the dephasing. Hence in general $\Delta_{cl}^{\rightarrow} \leq C_{HV}$. In the asymptotic limit of many copies, one has equality [20]. In fact, in [20] it was shown that the regularized one-way classical deficit is equal to another operational measure of classical correlations: the distillable common randomness introduced in [55]. The latter is in turn equal to the regularized Henderson-Vedral measure. It is interesting that $\Delta_{cl}^{\rightarrow}$ without regularization, although it seems to be an important characteristic of classical correlations, does not meet a basic requirement for being a measure of classical correlations: it is not monotonic under local operations [40]. Thus regularization here plays the role of monotonization. An interesting question is what happens to the two-way classical deficit after regularization.
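The trade-off between Alice's entropic cost and Bob's entropy decrease can be made concrete for a fixed measurement. The following is a minimal sketch, not taken from the paper: it assumes a two-qubit state with real matrix elements, restricts Alice to projective measurements in a real basis at angle `theta`, and uses a hypothetical example state p|0⟩⟨0| ⊗ |0⟩⟨0| + (1 − p)|1⟩⟨1| ⊗ |+⟩⟨+|. The function names are illustrative only. It returns the two terms separately, so that their sum is the bracket of the one-way classical deficit for that measurement, while the second term alone is the corresponding Henderson-Vedral expression.

```python
import math


def entropy_2x2(m):
    """Von Neumann entropy (bits) of a real symmetric 2x2 matrix,
    after normalising it to unit trace."""
    t = m[0][0] + m[1][1]
    if t <= 0.0:
        return 0.0
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(t * t - 4.0 * det, 0.0))
    eigs = ((t + disc) / (2.0 * t), (t - disc) / (2.0 * t))
    return -sum(x * math.log2(x) for x in eigs if x > 1e-15)


def one_way_terms(rho, theta):
    """Return (delta_S_A, bob_gain) for Alice measuring in the real
    orthonormal basis {(cos, sin), (-sin, cos)} at angle theta.

    rho is a real symmetric 4x4 density matrix; basis state |a>|b>
    has index 2*a + b (a: Alice, b: Bob).
    """
    c, s = math.cos(theta), math.sin(theta)
    # Reduced states of Alice and Bob (partial traces).
    rho_a = [[sum(rho[2 * a + b][2 * a2 + b] for b in range(2))
              for a2 in range(2)] for a in range(2)]
    rho_b = [[sum(rho[2 * a + b][2 * a + b2] for a in range(2))
              for b2 in range(2)] for b in range(2)]
    probs, cond_entropy = [], 0.0
    for v in ((c, s), (-s, c)):
        # Unnormalised state of Bob given Alice's outcome |v><v|.
        m = [[sum(v[a] * v[a2] * rho[2 * a + b][2 * a2 + b2]
                  for a in range(2) for a2 in range(2))
              for b2 in range(2)] for b in range(2)]
        p = m[0][0] + m[1][1]
        probs.append(p)
        cond_entropy += p * entropy_2x2(m)
    # After dephasing, Alice's state is diagonal with the outcome probabilities.
    s_a_after = -sum(p * math.log2(p) for p in probs if p > 1e-15)
    delta_s_a = entropy_2x2(rho_a) - s_a_after      # non-positive cost
    bob_gain = entropy_2x2(rho_b) - cond_entropy    # Henderson-Vedral term
    return delta_s_a, bob_gain


def example_state(p=0.8):
    """Hypothetical test state p|00><00| + (1-p)|1+><1+|."""
    q = (1.0 - p) / 2.0
    return [[p, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, q, q],
            [0.0, 0.0, q, q]]
```

For this example state, measuring in the eigenbasis of Alice's reduced state (`theta = 0`) gives zero cost on her side, whereas at `theta = math.pi / 4` Bob gains nothing and Alice pays h(0.8) − 1 ≈ −0.28 bits, so the sum of the two terms is negative for that measurement.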

B. Additivity of one-way quantum and classical deficits for Bell diagonal states
Here we will prove the fact mentioned in section IX A, that the one-way deficits are additive, and that borrowing pure qubits does not help for Bell diagonal states. First of all, in [65] it was shown that the measure of classical correlations $C_{HV}$ is additive for Bell diagonal states. Let us recall the argument, as it will be useful for making the connection with the classical deficit. For a Bell diagonal state $\varrho$, consider the related channel $\Lambda$, i.e. the channel such that $(I \otimes \Lambda)(|\phi^+\rangle\langle\phi^+|) = \varrho$. The maximum output Holevo function over all input ensembles, denoted by $\chi^*(\Lambda)$, is, for general channels, no smaller than $C_{HV}$. They are equal if the density matrix of the ensemble attaining $\chi^*$ is equal to $\varrho_A$. In the case of Bell diagonal states we have $\varrho_A = I/2$, and it turns out that the optimal ensemble for the corresponding channels consists of two orthogonal states, hence gives rise to the same matrix. King [66] has shown that $\chi^*$ is additive for channels coming from Bell diagonal states. From this and from the fact that, in general, $\chi^* \geq C_{HV}$, one gets that for Bell diagonal states $C_{HV}$ for many copies is also equal to $\chi^*$ for many copies of the corresponding channels. This proves that $C_{HV}$ must be additive. Now, let us make the connection with the classical deficit. As discussed in [40], if $\chi^*$ is attained on an ensemble whose density matrix is equal to $\varrho_A$, then by looking at the ensemble maximizing $\chi$ one can tell something about the measurements that attain $C_{HV}$. Namely, when the ensemble is orthogonal, one attains $C_{HV}$ by a measurement in the eigenbasis of $\varrho_A$. It is clear from eq. (86) and the discussion thereafter that in the latter case $C_{HV}$ is actually equal to the classical deficit, as they differ from one another only by the entropy production during Alice's measurement, which vanishes if the measurement is done in the eigenbasis.
Since $C_{HV}$ is additive, for many copies it is again attained by a measurement in an orthogonal basis that is an eigenbasis of Alice's subsystem. Thus the classical deficit for many copies is also not less than $C_{HV}$, and by eq. (86) it cannot be greater.
Thus for Bell diagonal states the classical deficit is equal to C HV and it is additive. Moreover, since the measurement was a von Neumann one, the deficit is attained without using POVMs. This means that additional pure ancillas do not help.
So far we have talked about the classical deficit. Now, since the quantum and classical deficits add up to the mutual information, which is additive, it follows that the quantum deficit is additive too. Also, since borrowing local qubits does not increase the classical deficit, it cannot decrease the quantum deficit.
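As a single-copy illustration of the quantities discussed in this subsection, the sketch below evaluates the one-way classical and quantum deficits of a Bell diagonal state from its four Bell-state weights. It rests on assumptions that go beyond the text: Alice is restricted to measurements along the three Pauli axes, the correlation coefficients are taken as t_k = Tr[ρ σ_k ⊗ σ_k], and the function name is hypothetical. Because Alice's reduced state is I/2, every basis is an eigenbasis and her entropic cost vanishes, so the classical value coincides with the Henderson-Vedral expression for the same measurement.

```python
import math


def shannon(probs):
    """Shannon entropy in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def bell_diagonal_one_way(p_phi_plus, p_phi_minus, p_psi_plus, p_psi_minus):
    """Return (classical, quantum) one-way deficits for one copy of a
    Bell diagonal state, with Alice restricted to Pauli-axis measurements
    (an assumption of this sketch)."""
    p = (p_phi_plus, p_phi_minus, p_psi_plus, p_psi_minus)
    # Correlation coefficients t_k = Tr[rho sigma_k (x) sigma_k].
    t_x = p[0] - p[1] + p[2] - p[3]
    t_y = -p[0] + p[1] + p[2] - p[3]
    t_z = p[0] + p[1] - p[2] - p[3]
    t = max(abs(t_x), abs(t_y), abs(t_z))
    # Measuring along axis k leaves Bob with Bloch vectors of length |t_k|,
    # each outcome having probability 1/2, while S(rho_B) = 1.
    classical = 1.0 - shannon(((1.0 + t) / 2.0, (1.0 - t) / 2.0))
    # The eigenvalues of the global state are the Bell weights themselves.
    mutual_information = 2.0 - shannon(p)
    return classical, mutual_information - classical
```

For the singlet, `bell_diagonal_one_way(0, 0, 0, 1)` splits the two bits of mutual information into one classical and one quantum bit, while the equal mixture of |φ+⟩ and |φ−⟩, which is classically correlated, has one classical bit and zero quantum deficit.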

C. Zero-way measures
Let us now consider measures of classical correlations under no classical communication, $\Delta_{cl}^{\emptyset}$. Again, this is taken to mean that the parties are not allowed to communicate before making measurements, but can do so afterward in order to concentrate the classical records. The information deficit under no classical communication, $\Delta^{\emptyset}$, is given by

$$\Delta^{\emptyset}(\varrho_{AB}) = S(\varrho'_{AB}) - S(\varrho_{AB}),$$

where $S(\varrho'_{AB})$ is the von Neumann entropy of the optimal final state $\varrho'_{AB}$ (which is classical-like), obtained by local complete measurements without classical communication. We then obtain

$$\Delta_{cl}^{\emptyset}(\varrho_{AB}) = I_M(\varrho_{AB}) - \Delta^{\emptyset}(\varrho_{AB}) = \delta S(A) + \delta S(B) + I_M(\varrho'_{AB}).$$

We have three terms here: the last one is the classical mutual information of the final state, while the first two, $\delta S(A) = S(\varrho_A) - S(\varrho'_A)$ and $\delta S(B) = S(\varrho_B) - S(\varrho'_B)$, denote the local entropic costs of the process at the respective sides. We therefore have a trade-off similar to that in the one-way case. And again, a classical correlation measure has been defined [65] which consists only of the last term of our quantity:

$$C^{\emptyset}(\varrho_{AB}) = \sup I_M(\varrho'_{AB}),$$

where $\varrho'_{AB}$ is obtained from $\varrho_{AB}$ by local complete measurements and the supremum runs over such measurements. Again, the original definition of $C^{\emptyset}$ involved POVMs, but as we have suitably increased our Hilbert space from the very beginning, we need not do so.
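The three-term decomposition can be checked directly, and it is instructive to evaluate it once. The following worked example is a sketch in the notation of this section; the choice of the singlet and of σ_z measurements on both sides is ours, and the values hold for this particular measurement choice.

```latex
% General bookkeeping: the local entropies of the final state cancel.
\begin{align*}
\delta S(A) + \delta S(B) + I_M(\varrho'_{AB})
  &= S(\varrho_A) - S(\varrho'_A) + S(\varrho_B) - S(\varrho'_B)
     + S(\varrho'_A) + S(\varrho'_B) - S(\varrho'_{AB}) \\
  &= S(\varrho_A) + S(\varrho_B) - S(\varrho'_{AB}) \\
  &= I_M(\varrho_{AB}) - \big[ S(\varrho'_{AB}) - S(\varrho_{AB}) \big].
\end{align*}

% Singlet, both parties measuring \sigma_z:
% \varrho'_{AB} = \tfrac12 ( |01\rangle\langle 01| + |10\rangle\langle 10| ).
\begin{align*}
\delta S(A) &= 1 - 1 = 0, \qquad \delta S(B) = 1 - 1 = 0, \\
I_M(\varrho'_{AB}) &= 1 + 1 - 1 = 1, \\
S(\varrho'_{AB}) - S(\varrho_{AB}) &= 1 - 0 = 1, \qquad
I_M(\varrho_{AB}) = 1 + 1 - 0 = 2 .
\end{align*}
```

For this measurement the two bits of mutual information of the singlet therefore split into one bit of entropy production and one bit of classical mutual information in the final state, with no local entropic cost on either side.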

XV. COMPLEMENTARITY FEATURES OF INFORMATION IN DISTRIBUTED QUANTUM SYSTEMS
Bohr was the first to recognize a fundamental feature of the quantum formalism: complementarity between incompatible observables. Complementarity was not explicitly related to entanglement, now regarded as an important quantum information resource. Namely, Bohr's complementarity concerned mutually exclusive quantum phenomena associated with a single system and observed under different experimental arrangements.
Let us comment on complementarity in the case of composite systems and on Bohr complementarity. Roughly speaking, the latter says that one cannot access, in a single measurement, all the properties of a system that are necessary to describe it. The rule is formulated for single quantum systems and is a consequence of noncommutativity.
On the other hand, we know that one can also divide the properties of a system into local and nonlocal ones, and that these are complementary to each other too [16]. For example, one can perform a measurement in the Bell basis or in the standard product basis. However, one cannot perform those measurements simultaneously. In other words, one cannot access the global and the local properties of the system at the same time (see also [67] in this context).
The latter phenomenon is not merely a consequence of Bohr's complementarity. Indeed, if the only allowable states of composite systems were the classically correlated states,

$$\varrho_{AB} = \sum_{ij} p_{ij}\, |i\rangle\langle i| \otimes |j\rangle\langle j| ,$$

with $\{|i\rangle\}$ and $\{|j\rangle\}$ local orthonormal bases, then maximal information about the total system would be available through measurements on subsystems. Global measurements would not give access to any further knowledge about the properties of the system. On the other hand, Bohr complementarity would still hold, in the sense that one cannot access all properties of the system in one measurement.
Thus we see that the local-nonlocal complementarity [16] is a consequence of two distinct phenomena: noncommutativity and existence of entanglement (or quantum correlations). So not only is there noncommutativity, but there is too much of it, so that it affects also relations between local and nonlocal informational contents.
In distributed systems one usually imposes constraints by allowing only operations that can be implemented by local operations and classical communication. It turns out that in such a situation an interesting complementarity also arises. Namely, in [16] we considered two tasks performed simultaneously: localizing information (which we have presented in this paper) and sending quantum information (e.g. teleportation). It was shown that for a fixed protocol P, the rates of those two tasks obey the relation $I_l(P, \rho) + Q(P, \rho) \leq I_l(\rho)$, where $I_l(P, \rho)$ is the amount of information localised by the protocol P and $Q(P, \rho)$ is the number of qubits transmitted by the protocol. For example, for the singlet state, the total informational content is equal to the total correlation content and amounts to two bits, while the right hand side of the inequality is equal to 1. In the light of the above complementarity, we can interpret this number 2 as follows: it is equal not to 1 plus 1, but to 1 or 1. One can either draw one bit of local information (classical correlations) or teleport one qubit (quantum correlations), but one cannot access both bits.
One can see that this phenomenon is connected with the above-mentioned Bohr complementarity for distributed systems: for the task of teleportation, Alice makes a Bell measurement on her part of the singlet and the unknown state to be sent, while to localize information she measures only her half of the singlet. Interestingly, as far as those two exclusive measurements are concerned, the "local versus nonlocal" complementarity occurs within Alice's laboratory, while it results in complementarity between tasks that refer to local-nonlocal properties of systems belonging to Alice and Bob.
The above inequality suggests an interesting problem: to find the trade-off curves for the performance of teleportation and of localizing information for a given state. In particular, an interesting question is whether there exist states for which, if we teleport a number of qubits equal to the distillable entanglement, one not only would not localize any information, but would need to spend some additional pure states (see [17] in this context).

XVI. DISCUSSION AND OPEN QUESTIONS
In conclusion, we have developed the quantum information processing paradigm which involves local information as a natural resource in the context of the class of CLOCC operations. We have presented a proof that the central quantity of the paradigm, the quantum information deficit, is bounded from above by the relative entropy distance from the set of pseudo-classically correlated states. We showed how the paradigm allows one to define the thermodynamical cost of erasure of entanglement: the entropy production necessary to make a state separable by CLOCC operations. We proved that the cost is no smaller than the relative entropy of entanglement. Since the cost is no greater than the deficit, we have obtained that the deficit is no smaller than the relative entropy of entanglement. This in turn implies that every entangled state exhibits informational nonlocality.
We have also found that the paradigm offers a new method of analysis of correlations of multipartite states. The most nonlocal state from this point of view (we call it informationally nonlocal) would be the one for which one has to produce the largest entropy while converting it into classical states. It turned out that according to such a criterion, the Aharonov state is much more nonlocal than the GHZ one. The nonlocality that can be probed by our methods is one that is not captured by Bell's inequalities, since we have found that separable states can also exhibit nonzero deficit. Rather, it has much in common with nonlocality without entanglement, which was found for ensembles of states [10]. Thus our nonlocality is not identical with entanglement. As a matter of fact, it is a wider notion.
The information deficit has then some peculiar properties. Since it is not an entanglement measure, it can increase under local operations. It is not unreasonable: Local operations may destroy a local property, and make it impossible to carry out some action by separated parties, while when the parties meet, the action may still be achievable. This curious behavior of quantum states may be attributed to the fact that even for separable states, when they are mixtures of nonorthogonal states, we cannot ascribe to the subsystems local properties (this may have some connection with the Kochen-Specker theorem).
The paradigm developed in this paper opens many important questions. Here are some of them.
• Are "noncommuting-choice protocols" better in localizing information? This is the major problem in the paradigm of localizing information by CLOCC operations.
• Is the quantum deficit equal to relative entropy distance to pseudo-classically correlated states? This question would be answered positively, if the noncommuting-choice protocols do not help.
• Is regularized deficit still nonzero for all entangled states? For regularized deficit we have lower bound given by regularized relative entropy distance. However we do not know if for any entangled state the latter is nonzero.
• Is the deficit for multiparty pure states equal to the relative entropy of entanglement? For bipartite states it was proven that the deficit is equal to entanglement. For the multiparty case it is also true for Schmidt decomposable states. It is an open problem whether it is true in general. The same question can be asked for the regularized deficit. Is it equal to the regularized E r for multipartite pure states?
• Is two-way classical deficit a legitimate measure of classical correlations? The classical deficit is definitely an important quantity describing some aspects of classical correlations. However, there is a question whether it can be used to quantify them. To this end, it should not increase under local operations [13]. For the one-way case, the classical deficit is not monotonic under local operations, as shown in [40]. Yet it turns out that after regularization the monotonicity is regained [20], because the regularized one-way classical deficit is equal to the one-way distillable common randomness of [55]. Can the two-way classical deficit also be monotonic after regularization? This is connected with the next question.
• Is classical two-way deficit equal to two-way distillable common randomness [55]?
• Is relative entropy of entanglement the thermodynamical cost of erasure of entanglement? We have shown that the cost is bounded from below by relative entropy of entanglement. If there is equality, relative entropy of entanglement would acquire operational status: it would be interpreted as thermodynamical cost of erasure of entanglement.
• What is the relation between deficit and mutual information? We have shown that if the trade-off inequality for E r (75) were violated, then the quantum deficit would exceed the classical deficit for some states. We have also touched on this question by analysis of zero-way deficit versus mutual information. Preliminary results suggest that there is a very interesting phenomenon when going from quantum to classical states via local measurements: for some states there are large correlations, quantified by mutual information, before the measurement, while after the measurement the remaining amount of information is equal almost exclusively to the initial local information. This means that for some states even the optimal measurement may destroy most of the information contained in correlations. The question can be recast in the following way: how small can the classical deficit be compared with the mutual information?
In [63] the measure of classical correlations (88), closely related to the zero-way deficit, was compared with mutual information. The authors showed that when this measure is smaller than ε, then the mutual information is smaller than ε poly(d), where d is the dimension of the Hilbert space. They were, however, unable to improve the factor to be of order log d. This means that most probably there is room for a dramatic divergence between the two measures of correlations. Since the deficit can only be smaller than the measure of (88), the effect can be even stronger. All that suggests that there may be a large gap between the classical and the quantum.
• A fundamental open problem, or rather program, is to analyze the complementarity between drawing local information and distilling singlets, initiated in [16]. In the latter paper the two tasks, drawing local information and teleporting qubits, were treated as complementary ones. One obtains trade-offs if one wants to perform those tasks simultaneously. An open question is whether distilling singlets can lead to a negative amount of local information gained, i.e. whether in the process of distillation we have to use up local pure qubits rather than gain them [17]. Moreover, one can define the following quantity: the maximal amount of pure qubits one can draw by CLOCC from a given state [19]. Note that here we do not speak about local qubits. Thus, for example, the singlet is already pure and needs no action. Due to reversibility in entanglement transformations for pure bipartite states [35], the question in fact reduces to the problem of simultaneously drawing singlets and local pure qubits.
• An interesting question arises in the context of [68]. There the authors probe correlations by applying random local unitaries to transform the state to product or separable form, using the smallest number of unitaries. This method allows one to define not only quantum correlations but also total correlations in terms of entropy production while reaching some set of states. It differs from our approach in that the authors do not use classical communication in an essential way (it cannot help). Therefore a natural application of their method is to probe total correlations. This allows them to give a fresh, operational meaning to the quantum mutual information: it is the entropy production needed to bring a quantum state into product form. Our method could be applied in a similar way: one tries to bring a state into product form using CLOCC but without the classical communication (i.e. CLO). Then one finds that the entropy production (i.e. the deficit to product states ∆ CLO prod ) is equal to I(ρ AB ). This can be seen simply from the fact that the optimal protocol is for one party to locally compress her state and then to dephase in the eigenbasis of the compressed state. She then dephases in a basis complementary to the eigenbasis. The latter measurement completely destroys all correlations between A and B. Since the initial entropy was S(ρ AB ) and the final entropy is S(A) + S(B), the deficit, and hence the entropy production, is I(ρ AB ). Just as the relative entropy distance to some set of states (pseudo-classically correlated and separable states) played a crucial role in the case of ∆ and ∆ sep , here the relative entropy distance to product states plays the crucial role, and is equal to the quantum mutual information.
It is rather amusing that this gives the same answer as the method used in [68], since in our case Alice performs her measurement without any knowledge of the density matrix of Bob, while in [68] she must use this information. Furthermore, the number of unitaries which would be needed to perform the dephasing in our case is S(A) 2 , far greater than the optimal number found in [68]. Understanding in greater detail why these two methods give the same answer might be an interesting avenue of further research. It is also interesting to compare how one divides the total correlations into quantum and classical ones. For example, in the case of the singlet, [68] interpret the two bits of mutual information as requiring one bit of noise to destroy the entanglement, and one bit of noise to destroy the secret correlations. In [16] we interpreted the two bits in terms of one use of a quantum channel, or one bit of local information.
In the case of destroying correlations due to entanglement, our method uses classical communication in an essential way; therefore, on the surface, it appears to naturally encode the notion of entanglement, whose definition relies on the class of LOCC. For pure states the authors of [68] also obtain entanglement, as in this case communication is not needed to reach the set of separable states. It is interesting then to compare what both approaches would produce as far as the entropic cost of erasing entanglement is concerned. One could expect that our method will show less cost in the case of erasing entanglement.
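The statement used above, that the relative entropy distance to product states equals the quantum mutual information, follows from a short standard computation. The sketch below uses our own notation: σ_A ⊗ σ_B denotes an arbitrary product state and S(ρ‖σ) = Tr ρ log ρ − Tr ρ log σ is the relative entropy.

```latex
\begin{align*}
S(\rho_{AB} \,\|\, \sigma_A \otimes \sigma_B)
  &= -S(\rho_{AB}) - \mathrm{Tr}\,\rho_{AB} \log (\sigma_A \otimes \sigma_B) \\
  &= -S(\rho_{AB}) - \mathrm{Tr}\,\rho_A \log \sigma_A
     - \mathrm{Tr}\,\rho_B \log \sigma_B \\
  &= \big[ S(\rho_A) + S(\rho_B) - S(\rho_{AB}) \big]
     + S(\rho_A \,\|\, \sigma_A) + S(\rho_B \,\|\, \sigma_B) \\
  &\geq I(\rho_{AB}),
\end{align*}
% with equality if and only if \sigma_A = \rho_A and \sigma_B = \rho_B.
```

The minimum over product states is thus attained at the product of the reduced states, which is also the state reached by the two-step dephasing protocol described above when the final entropy is S(A) + S(B).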
Finally we strongly believe that the present, novel paradigm analyzed and developed here will be helpful as a new rigorous tool in searching for a border or rather a way of coexistence between quantumness and classicality in physical states. It may also enrich our understanding of quantum information processing and its relation to other branches of physics like thermodynamics and statistics.