Anscombe’s Quartet and the Case for Visualizations

In his classic 1973 paper, Graphs in Statistical Analysis, F. J. Anscombe presents what we now know as Anscombe’s quartet – a set of 4 datasets with nearly identical summary statistics that look very different when plotted. Anscombe’s quartet highlights the fallacy of relying purely on summary statistics, and emphasizes the importance of visualization. Anscombe’s paper begins with one of the shortest abstracts I have encountered in a technical paper.

Graphs are essential to good statistical analysis. Ordinary scatterplots and “triple” scatterplots are discussed in relation to regression analysis

That’s it! All of 2 sentences and under 20 words. Anscombe then goes on to discuss regression. It is easy to perform a regression, says Anscombe, but that does not mean that the straight-line fit of the regression is appropriate to the data:

In practice, we do not know that the theoretical description is correct, we should generally suspect that it is not, and we cannot therefore heave a sigh of relief when the regression calculation has been made, knowing that statistical justice has been done.

He recommends plotting the residuals (the differences between the observed values of the dependent variable and the fitted values) against the independent variable and separately against the fitted values, as well as checking whether the distribution of the residuals approximates a normal distribution:

Hopefully the fitted values follow the observations closely and have a greater variability than the residuals

Certain observed inconsistencies in the residuals can be resolved by either log-transforming the dependent variable (y), or by including higher order terms of the independent variable (x) in the regression. Visualizing the data helps us easily observe outliers, and test our hypothesis of the relationship between the dependent and independent variables without relying on statistical tests. It is here that he presents the quartet of datasets and their plots as a case in point.
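To see the point numerically, here is a minimal Python sketch that computes the summary statistics for the four datasets. The values are the quartet as it is commonly reproduced from Anscombe’s paper (they are not listed in this post), and the helper name summarize is my own.

```python
from statistics import mean

# Anscombe's quartet as commonly reproduced from the 1973 paper.
# Datasets I-III share the same x values; dataset IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return (mean x, mean y, least-squares slope, intercept, Pearson r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return mx, my, slope, intercept, r


for name, (x, y) in QUARTET.items():
    mx, my, slope, intercept, r = summarize(x, y)
    print(f"{name:>3}: mean x={mx:.2f}  mean y={my:.2f}  "
          f"fit: y = {intercept:.2f} + {slope:.3f}x  r={r:.3f}")
```

The four rows come out almost identical, which is exactly why the numbers alone cannot tell the datasets apart; it takes a scatterplot of each one to see the curve in the second, and the single outlying points in the third and fourth.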

The final part of Anscombe’s paper discusses scatterplots. He recommends plotting a scatterplot of the dependent variable against the independent variable. He further recommends a triple scatterplot (TSCP): plotting one variable against another, and coding the value of a third variable as symbols of varying size and blackness. Two-way tables, along with their row means, column means and residuals, are an additional way of looking at the data.

Anscombe saves the best for last – his concluding paragraph:

Unfortunately, most persons who have recourse to a computer for statistical analysis of data are not much interested either in computer programming or in statistical method, being primarily concerned with their own proper business. Hence the common use of library programs and various statistical packages. Most of these originated in the pre-visual era. The user is not showered with graphical displays. He can get them only with trouble, cunning and a fighting spirit. It’s time that was changed.

We have come a long way since 1973 with regard to visualizing data. Graphing data may not take as much trouble or cunning as it did when Anscombe wrote these words, though I think it still takes a bit of that fighting spirit to go the extra length for a better visualization.

Nassim Taleb and Daniel Kahneman on the economic crisis

A video titled Nassim Taleb and Daniel Kahneman: Reflection on a Crisis, from the Digital, Life, Design (DLD) conference in January of this year, has interesting insights on what led to the current crisis, and Taleb’s recommendations to avoid this going forward, foremost of which involves nationalizing banks (to shield them from forecast errors). The discussion hits on some interesting points, noted below:

  • Certain events, which Taleb calls rare events or black swans, cannot be predicted with any level of precision because there is not enough historical evidence for such events. Why is it then, he asks, that people (who act on behalf of financial firms) take large risks without any insurance against such events? He offers three reasons: (1) they are ignorant of the odds; (2) they would rather make a series of small profits than one big profit, but would rather take one big loss than a series of small losses, which he refers to as the convexity of losses in a negative domain, a topic that falls within the area of Prospect Theory; and (3) non-neutrality of representation: loosely, the tendency of people to use irrelevant quantitative information as an anchor when making predictions.
  • Kahneman talks about the mismatch of time scales between firms and the agents that act on their behalf. Firms have a long horizon, while the agents work to a different, shorter time scale when it comes to making profits.
  • Kahneman brings up the two systems for human decision making as put forth by psychologists – system 1 which is the intuitive system, and system 2 which is the reasoning system. Often in making decisions, people tend to give precedence to system 1 (intuition) over system 2 (reason).
  • In a discussion about the inadequacy of financial models, Kahneman asserts that the models used in finance are useless and dangerous: dangerous because they give people confidence even when they are wrong.
  • In the last few minutes, Taleb talks about how financial firms used the bailout money – 1) to pay themselves fat bonuses and 2) to take on higher levels of risk. He had, earlier in the talk, mentioned backward capitalism where profits are theirs (financial institutions) to keep, and losses are ours (the tax-payers) to bear. It is in this context that he calls for the nationalization of banks.

Interesting perspectives from two renowned personalities. I don’t know why Taleb is researching the history of medicine, which has little to do with all the financial stuff that he has been talking about (portending may be a more apt word, and fortunately for him and unfortunately for many of us, he’s done it with a good level of success). Perhaps we’ll find out in a book to come.

Allais Paradox and Rational Decisions

Slate has a simple test of rationality, which is really the Allais Paradox. To understand the significance of the Allais Paradox, one has to go back to the Expected Utility Theory proposed by John von Neumann and Oskar Morgenstern in 1944. According to this theory, the expected utility of a gamble or a lottery with multiple outcomes and corresponding probabilities is simply the expectation over the utility associated with each outcome. In deciding between gambles or lotteries, the one that maximizes the expected utility is chosen. In constructing utility functions associated with the gambles or lotteries, a certain set of assumptions is made. One of these assumptions is the substitution axiom. It states that if a decision maker prefers outcome ‘x’ over outcome ‘y’, and then has to choose between two lotteries that offer x and y respectively with the same probability ‘p’ and are otherwise identical, then he will still choose the one that offers ‘x’ with probability ‘p’ over the one that offers ‘y’ with probability ‘p’. A clearer description of this utility theory can be found here. The Allais paradox violates the substitution axiom, as demonstrated in the simple test on Slate mentioned at the beginning of this post.
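The conflict with expected utility can be checked with a little arithmetic. The sketch below uses the classic textbook version of Allais’s gambles, which is an assumption on my part and may not match the numbers in the Slate test; the lottery names and the three utility functions are likewise only illustrative.

```python
import math

# Each lottery is a list of (probability, payoff in dollars) pairs.
# Classic textbook Allais gambles (assumed; not taken from the Slate test).
GAMBLE_1A = [(1.00, 1_000_000)]
GAMBLE_1B = [(0.89, 1_000_000), (0.10, 5_000_000), (0.01, 0)]
GAMBLE_2A = [(0.11, 1_000_000), (0.89, 0)]
GAMBLE_2B = [(0.10, 5_000_000), (0.90, 0)]

# A few illustrative utility functions: risk-neutral and two risk-averse ones.
UTILITIES = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log1p,
}


def expected_utility(lottery, utility):
    """Probability-weighted sum of the utility of each payoff."""
    return sum(p * utility(x) for p, x in lottery)


for name, u in UTILITIES.items():
    gap_1 = expected_utility(GAMBLE_1A, u) - expected_utility(GAMBLE_1B, u)
    gap_2 = expected_utility(GAMBLE_2A, u) - expected_utility(GAMBLE_2B, u)
    print(f"{name:>6}: EU(1A)-EU(1B) = {gap_1:.6g}   EU(2A)-EU(2B) = {gap_2:.6g}")
```

For every utility function the two gaps are the same, because both reduce to 0.11·u($1M) − 0.10·u($5M) − 0.01·u($0). An expected-utility maximizer who picks 1A over 1B must therefore also pick 2A over 2B; the paradox is that many people pick 1A in the first pair and 2B in the second.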

It was in explaining the Allais paradox that Kahneman and Tversky came up with their oft-cited Prospect Theory (1979). One of the observations under Prospect Theory is that people tend to be risk-averse in prospects concerning gains, and risk-loving in prospects concerning losses. The Allais paradox and subsequent research on the role of emotions in financial decision-making establish that we are not rational when it comes to making financial decisions, much though we’d like to believe we are.

Fooled By Randomness – Nassim Taleb

The gist of Taleb’s book can be explained, at the risk of oversimplification, by using a few examples rooted in basic probability of the kind that relies on a simple coin toss experiment. Suppose one tosses a fair coin 5 times, with a reward associated with getting 5 consecutive heads. The probability associated with this outcome is 1/32. Now suppose that there are 32 people, each similarly tossing a coin 5 times. The probability that at least one of them obtains the desired 5 heads, and gets the reward, is 1 − (31/32)^32, or roughly 64%. The other participants, assuming they are ignorant of the underlying probability, may attribute this victory to a certain talent possessed by the winner. Similarly, the winner may be led to believe that his winnings are due to his own talents. This is what Taleb refers to when he talks about being fooled by randomness.
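The probabilities in this example are easy to verify. A minimal sketch, assuming the 32 players toss independently (the variable names are mine):

```python
from fractions import Fraction

TOSSES = 5
PLAYERS = 32

# Probability that one player gets heads on every toss of a fair coin.
p_five_heads = Fraction(1, 2) ** TOSSES

# Probability that at least one of the independent players manages it:
# one minus the probability that every single player fails.
p_at_least_one_winner = 1 - (1 - p_five_heads) ** PLAYERS

print(p_five_heads)                    # 1/32
print(float(p_at_least_one_winner))    # a little under 0.64
```

So a "winner" turns up in roughly two out of three such contests, with no talent involved at all.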

Another interesting concept in the book is that of alternate histories. In the earlier example, instead of 32 people, consider just one person tossing the coin 5 times on each of 32 different occasions. Each of these 32 occasions can be considered an alternate history. In obtaining a certain outcome of the 5 tosses, one has essentially chosen one of 32 alternate histories, of which one may lead to success and the others end in failure.

Taleb also talks about the black swan – the occurrence of a rare event. The significance of the rare event is best explained using the coin toss example above. Suppose that instead of a reward associated with getting 5 heads, there is now a large penalty associated with getting 5 heads (say $95), and a small reward associated with each of the other outcomes (say $1). In this case, the expected payoff is –$2, and hence it is not worth accepting such a proposition. However, the tendency to focus on the frequency of outcomes while ignoring their magnitude often leads people to take up such risky propositions with potentially damaging outcomes.
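The expected payoff of this proposition works out as follows. This is a small sketch using the numbers from the example; the variable names are mine.

```python
from fractions import Fraction

p_five_heads = Fraction(1, 32)   # the rare event
PENALTY = -95                    # dollars lost when the rare event occurs
REWARD = 1                       # dollars gained on any other outcome

# Probability of walking away with the small reward.
p_gain = 1 - p_five_heads

# Expected payoff: each outcome weighted by its probability.
expected_payoff = p_five_heads * PENALTY + p_gain * REWARD

print(p_gain)            # 31/32
print(expected_payoff)   # -2
```

The player wins 31 times out of 32, which is what makes the bet look attractive, yet loses $2 per play on average.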

Some of the other concepts invoked in this book that I found interesting include the biases: survivorship bias (ignorance of failed outcomes), hindsight bias (overestimation of what one knew at the time of a past event due to subsequent information) and attribution bias (attributing personal success to talent and failure to randomness).

The above is just a sampling of the many ideas that Taleb draws on in his book. The examples he uses are far more interesting than the simple one that I have used above. Taleb begins by explaining the rational basis for what is often random but perceived otherwise, and then goes on to discuss the behavioral aspects that lead to such mistaken perceptions. At the end of the book, he brings up the key question of whether it is possible to be completely rational, and the repercussions, both positive and negative, of such an outlook.

An eminent financial mathematician in his own right, Taleb is evidently well read, drawing from a range of subjects that runs from philosophy and mythology to economics, behavioral finance, and mathematics. My only criticism of this book is that one often finds more of the author writing about himself than about the ideas he expounds. That apart, the book remains an enlightening and engrossing read.

MGI annual report on Global Capital Markets ending 2006

Notes and numbers from the McKinsey Global Institute’s (MGI) fourth annual report on Mapping Global Capital Markets:

  • The world’s financial assets (calculated by adding together the market value of publicly traded equities, bank deposits, and outstanding face value of government and private debt securities for 100 countries in the MGI database) rose by $25 trillion in 2006, to $167 trillion. Compared to the average annual growth rate of 8% from 1995-2005, the growth rate from 2005-6 is 17.5%. 
  • The US ($56.1 trillion), eurozone ($37.6 trillion), Japan ($19.5 trillion), and UK ($10 trillion) account for 75% of global financial assets.
  • Global financial depth (financial depth being defined as the ratio of a country’s financial assets to its GDP) has increased: the world’s financial assets were roughly equal to world GDP in 1980, double world GDP in 1993, and 3.5 times world GDP in 2006.
  • China’s financial assets grew by $2.5 trillion during 2006 – a 44% increase over 2005 and the second highest growth rate in the world after Russia’s 60%. China’s equity market accounted for 75% of its growth in financial assets.
  • Japan’s financial assets remained almost flat in 2006, with a marginal growth of $140 billion from 2005.
  • Emerging markets account for only 14% of the global financial assets. 
  • Cross-border capital flows (the net result of buying and selling of financial assets through the year) increased to $8.2 trillion in 2006, 8 times the amount in 1990.
  • In 2003, the euro surpassed the dollar as the most popular currency for international bond issues. The euro has become the second most popular reserve currency, with a 25% share in global reserves in 2003, up from 18% in 1999. 
  • The US attracted $1.9 trillion in foreign investment in 2006 and is the largest recipient of capital inflows in the world. It purchased $1.1 trillion in foreign assets in the same year. With foreign assets in high-yielding equity investments, and foreign liabilities in low-yielding debt securities, the US receives higher returns on its foreign assets than it pays on its foreign liabilities.
  • Emerging markets have become net providers of capital, having invested $332 billion more abroad than they received in 2006.
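A quick sanity check on a few of these numbers, using only the rounded figures quoted above (the variable names are mine, and the implied world GDP is my own back-of-the-envelope derivation rather than a figure from the report):

```python
# Figures in trillions of US dollars, as quoted in the notes above.
TOTAL_2006 = 167.0
INCREASE_2006 = 25.0
BIG_FOUR = {"US": 56.1, "eurozone": 37.6, "Japan": 19.5, "UK": 10.0}
FINANCIAL_DEPTH_2006 = 3.5   # world financial assets / world GDP

# Growth rate from 2005 to 2006, relative to the implied 2005 total.
total_2005 = TOTAL_2006 - INCREASE_2006
growth_rate = INCREASE_2006 / total_2005

# Combined share of the four largest markets.
big_four_share = sum(BIG_FOUR.values()) / TOTAL_2006

# World GDP implied by the 2006 financial depth ratio.
implied_world_gdp = TOTAL_2006 / FINANCIAL_DEPTH_2006

print(f"growth 2005-06: {growth_rate:.1%}")
print(f"big four share: {big_four_share:.1%}")
print(f"implied world GDP: ${implied_world_gdp:.1f} trillion")
```

The rounded inputs give a growth rate of about 17.6% and a combined share of about 74%, close to the 17.5% and 75% quoted; the small gaps are presumably down to rounding in the dollar amounts.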

The report also has an interesting commentary on the low level of investment in India relative to China:

One reason India has less money to invest is that its national saving rate is only half of China’s 40%. India’s households save as much as Chinese ones do, but many Indian families put their money in informal savings vehicles rather than banks, so the funds are not available for investment.

I am aware of a cultural proclivity in India towards holding savings in the form of gold, which may possibly be one of the ‘informal savings vehicles’ referred to in the report. 

A Short History of Financial Euphoria – John Kenneth Galbraith

In this book, Galbraith rounds up instances of financial booms and busts, from the Tulipomania of the mid-1630s to the stock market crash of 1987. He cites two primary factors as the cause of this financial euphoria: the brevity of financial memory and the association of wealth with intelligence. About the former, Galbraith remarks:

There can be few fields of human endeavor in which history counts for so little as in the world of finance. Past experience, to the extent that it is a part of memory at all, is dismissed as the primitive refuge of those who do not have the insight to appreciate the incredible wonders of the present. 

Another common factor is the thought of a new and rewarding opportunity that the participant hopes to exploit.

In analyzing the past instances of financial euphoria, Galbraith attempts to demonstrate the recurrence of these common features, and hopes that the vulnerable reader will be warned. The list includes the Tulip Mania in Holland (1637), the Mississippi Company and Banque Royale floated by John Law in France (1720), the South Sea Company in England (1720), a series of bubbles in the US (1819, 1837, 1857 and 1873), the crash of 1929 that preceded the Great Depression, and Black Monday (1987).

It is often easy to argue in hindsight about the factors responsible for such bubbles. Nevertheless, this book offers a remarkable insight into the various bubbles that have plagued the financial world over the centuries. Through this book, Galbraith appeals to investors who are easily swayed by novel, little-understood financial instruments to err on the side of caution.

Sovereign Wealth Funds (SWFs)

The International Monetary Fund (IMF) estimated in September 2007 that sovereign wealth funds, or SWFs, control as much as $3 trillion, and that this tally could jump to $12 trillion by 2012. … As of early 2008, the total assets of SWFs, estimated at nearly $3 trillion, surpass the $1.5 trillion managed by hedge funds worldwide—but are dwarfed by the $53 trillion managed by institutional investors like pension funds and endowments.

The above is from the backgrounder on sovereign wealth funds published by the Council on Foreign Relations (CFR). SWFs are public investment agencies that manage part of the foreign assets held by nation states. Examples of large SWFs are the Abu Dhabi Investment Council, which has an estimated USD 400-800 BN in assets, the Norway Government Pension Fund with an estimated USD 373 BN in assets, and the China Investment Corporation with ~USD 200 BN in assets [Ref]. SWFs have been credited with improving market liquidity in the US financial sector, having pumped in USD 69 BN over the past 7 months. At the same time, the large stakes acquired by these government-owned funds in corporations and the lack of transparency of SWFs in disclosing their holdings have been a cause of concern for governments in recipient countries. Some of the SWFs appear to be taking steps towards greater transparency. Other concerns remain. As the CFR backgrounder puts it:

…what if Middle Eastern or East Asian SWFs banded together to oust the CEO of a U.S. corporation? In corporate governance terms, this would be seen as positive shareholder activism, but when governments are involved, experts are left to guess at whether such clout would be used for financial gain or for political purposes.