# To the Limit...One More Time

There's an interesting article in this month's Mathematics Teacher about the effects of the particular language elements we use to communicate mathematical ideas.  The main thread revolves around limit concepts, primarily because they're both philosophically and practically confusing for many beginning calculus students, and because, it turns out, a teacher's particular choices regarding words and metaphors have an important impact on student (mis)understanding.

Limits comprise a special relationship between mathematical process and mathematical object.  We speak of them in terms of variables "approaching" or "tending toward" particular values, but we subsequently manipulate them as static entities.  I can, for instance, talk about the limiting value of the expression 1/x as x grows without bound (a dynamic concept), but that limiting value is ultimately just a single real (static) number: zero.  There's an uncomfortable tension in that duality.

Even the notation is ambiguous.  Here's the fact I mentioned in the preceding paragraph, symbolically:

$$\lim_{x \to \infty} \frac{1}{x} = 0$$

The arrow implies motion, but the equals sign implies assignment.  There are elements of both process and object.

I've touched on this duality before, which has sparked some great conversations.  A few months ago, I had a supremely interesting email chat with Christopher Danielson after he pointed me toward the writings of Anna Sfard.  He has graciously agreed to allow me to reproduce that conversation here in its original form; I've only redacted some of the more boring pleasantries and collapsed some strings of shorter messages into longer ones.  Enjoy.

Chris Lusto
To: Christopher Danielson

Seriously, thanks for the Sfard tip.  I've read a few of the articles she has on her website (which, by the way, why are college professors' websites like the most aesthetically displeasing things on the internet?  Just use a white background and stop being weird.), and you were right: I dig her.  I read the article on duality [PDF] and had one major bone of contention.

I really like the idea of duality versus dichotomy, and she makes, I think, a compelling argument in general.  I just worry it might be a little ambitious.  She hedges a little bit, saying things like "more often than not" mathematical objects can be conceived both operationally and structurally, but I still think this idea of duality runs into serious problems when infinite things come into play--and that's not exactly a trivial subset of "mathematical objects."

If we allow that operational conception is (a) just as valid/important as structural and (b) often, in fact, precedes structural conception, what are we to make of processes that never end, that never produce anything because they're always in production?  Sfard even says, "...interpreting a notion as a process implies regarding it as a potential rather than actual entity, which comes into existence upon request in a sequence of actions."  But what if we can't ever fulfill the request, because we're always on hold, waiting in vain for the end of an unending sequence?  And what about this business of "potential?"  That just smacks of the "potential infinities" of the ancient Greeks that held back western mathematics for a couple millennia.  It seems like we have to admit either (a) an infinite process can terminate in finite time in order to produce a structural object, or (b) these objects aren't really at all structural, because they live in the world of potentiality.  I don't find either of those particularly satisfying.  I think, in the case of infinite notions, the operational conception leads to a fundamental misconception, a la my student D.

Your thoughts?  Whenever you have a moment, of course.

Chris

Christopher Danielson
To: Chris Lusto

"Ambitious" describes Anna Sfard's intellectual habits very well, I think. She was in a half-time appointment at Michigan State (and half time at Haifa) for part of my grad school time, and she was on my dissertation committee. The woman is crazy smart. And it seems to be a characteristic of Israeli intellectuals to commit very strongly to one's ideas. Not a maybe or a perhaps to be found in her oeuvre, I don't think.

I have no explanation for the poor poor quality of academics' websites, except to say that it is representative of tech use in higher ed more generally. See also @EDTECHHULK on Twitter and Dan Meyer's comments here (esp. a couple screens down the page, at "Real Talk about Grad School"):

http://blog.mrmeyer.com/?p=12592

I'm still formulating thoughts on processes that never terminate. But I'm not sure I fully understand your objection. Your classroom scenarios seem to suggest that indeed process and object are both fundamentally important ways of thinking about infinity. And consider the language of limits..."as x goes to infinity" or even "as x grows without bound". Those are both process-based ways of talking, right?

csd

Chris Lusto
To: Christopher Danielson

I think Sfard's right that, in general, process and object are both important methods of mathematical conception.  And yeah, multiple representations are not only admissible, but probably desirable (thinking here, specifically, of HS algebra and the Lesh Model), but isn't operational understanding misleading when you're talking about infinity?

Thinking of f(x) = 2x as a process that doubles inputs is valuable, and so is a picture of the resulting object/graph.  And, in a case like this one, I don't think you lose or gain all that much with either vantage.  Sometimes it's helpful to think of the process, and other times the object.

But thinking of asymptotic behavior procedurally, for example, is very, very different from the object we call a "limit."  It's nice if students can understand that, as x gets larger, 1/x gets arbitrarily close to 0.  I mean, certainly if we hold a numerator constant and increase the denominator, this process yields subsequently smaller and smaller values.  But I think that's still like a mile away from understanding that lim_x-->∞ {1/x} = 0.  Like, is equal to.  Is identical to as an object.  Is just another name for.  Like, 23 + lim_x-->∞ {1/x} = 23.

If procedure (process) is linked to product (object)--like, say, "4 divided by 7" is linked to "4/7"--then how are we to reconcile a never-ending process with a finite, tangible product that can be manipulated like any other mathematical object?  Doesn't it force us to accept that 1/x eventually "gets to" 0 (which it doesn't), or that the limit is some kind of potential result (which it isn't) that can't really ever be called a proper object because the process is, by definition, never-ending?

I'm going to stop typing words, because I feel like as my words -->∞, my clarity --> 0.

C

Christopher Danielson
To: Chris Lusto

I see...so to boil it down to a debatable question...

Is the object necessarily the product of the process?

Do I have it right?

btw...if I got that question right, then I say 'no'.

I can think about 1,352,417 and treat it as an object, even though I can assure you that I have never participated in any sort of process that yielded that number.

To say nothing of googolplex.

csd

Chris Lusto
To: Christopher Danielson

I think that's about right, but with one important qualification.

Is the object necessarily the product of the process?  Then I agree, no.  But you at least have the option of defining it either way.  Even if you've never constructed 1,352,417 widgets, there's nothing philosophically problematic with the process that did/could.  You're right, there isn't even a measly googol of anything, but that doesn't stop it from being the eventual result of (1+1+...+1).

So...

Is the object the result of the process?  Not necessarily, but that's not a huge problem for me.

Could the object be the result of the process?  If the answer is no (which my gut believes it to be in the infinite case), then how can we reasonably talk about it as both a process and an object?  Does the duality break down?

C

Christopher Danielson
To: Chris Lusto

See I don't see a huge difference philosophically between "a product that could be created by a known process, but not in my lifetime" (counting to googol) and "a product that could never be created" (infinity).

In both cases, for me, the process is (1) incomplete, and (2) hypothetical.
Why does it matter at the core whether the result is theoretically achievable or not? Either way, I've imagined it.

And I think imagination is key. I don't recall whether Sfard writes about that or not (probably not, since she's all language, no imagery). But I do think the transition from process to object is at least in part one involving imagination. I have to imagine the object into being in mathematics precisely because mathematical objects are abstract.

And when I'm struggling to understand a new object (say a limit), it is often helpful to imagine the process that produced it. But I don't have to see the process through to the end.

csd

Chris Lusto
To: Christopher Danielson

Think about our Hz conversation.  Even with arbitrarily huge numbers of wave combinations, we get sinusoidal waves.  I can get as close to a square wave as I want, but in order to actually obtain the square wave object, the process that got me arbitrarily close to my goal breaks down and fails.  The process is insufficient to the object.  The difference between the square wave and the sinusoidal wave that's arbitrarily close to square is ultimately qualitative, not just quantitative--and there's the rub.  Wasn't that precisely what you and Frank [Noschese] convinced me of?

C

Christopher Danielson
To: Chris Lusto

But the square wave is the limit. There's the object. The limit (process? object?) produces the square wave.

I have no idea what I convinced you of. But I know that the argument I was making was that polynomials, by definition, have finitely many terms. And e^x can be written as infinitely many terms, each one a polynomial. Is e^x a polynomial? By the letter of the law, no. But in spirit? Yes. And that's beautiful.

I got in trouble doing a CMP demonstration lesson once. I talked with students about a cylinder being a circular prism. The algebra teacher observing got upset with me because a prism has polygonal faces. Ergo, "circular prism" is nonsense.

I had occasion to follow up a year or so later with my former complex analysis professor from MSU grad school. He had absolutely no problem calling a cylinder a circular prism.  No problem at all.

What to learn? Unclear.

csd

Chris Lusto
To: Christopher Danielson

I see a huge distinction between "unachievable due to resource constraints" and "unachievable by definition."  Why is the possibility that CERN moved some particles faster than light a big deal?  We've already moved all kinds of stuff 99.999999% that fast in the lab.  The extra .000001% is practically trivial, but philosophically enormous.  It's not that faster-than-light travel seemed to be practically impossible, but literally, probability exactly 0 impossible.

The difference between almost 0 and 0, no matter how small, is mathematically gigantic.

This is seriously all kinds of fun, but I have to go do some domestic things.  To be continued...in finite time.

C

Christopher Danielson
To: Chris Lusto

That's the beautiful thing about email. It is at heart an asynchronous medium.

By the way, some would say that you have pointed to an important difference between mathematics and the sciences with your example.

csd

Thanks so much to Dr. Danielson for (a) having this discussion, and (b) letting me publish all the gory details.  Oh, and (c) making me smarter in the process.

# Label Maker

If you've perused this blog, you know that I love probability.  I was fortunate enough to see Al Cuoco and Alicia Chiasson give a really cool presentation at this year's NCTM conference about exploring the probabilities of dice sums geometrically and algebraically.  Wheelhouse.  After we got done looking at some student work and pictures of distributions, Al nonchalantly threw out the following question:

Is it possible to change the integer labels on two dice [from the standard 1,2,3,4,5,6] such that the distribution of sums remains unchanged?

Of course he was much cooler than that.  I've significantly nerded up the language for the sake of brevity and clarity.  Still, good question, right?  And of course since our teacher has posed this tantalizing challenge, we know that the answer is yes, and now it's up to us to fill in the details.  Thusly:

First let's make use of the Cuoco/Chiasson observation that we can represent the throw of a standard die with the polynomial

$$P(x) = x + x^2 + x^3 + x^4 + x^5 + x^6.$$

When we do it this way, the exponents represent the label values for each face, and the coefficients represent frequencies of each label landing face up (relative to the total sample space).  This is neither surprising, nor super helpful.  Each "sum" occurs once out of the six possible.  We knew this already.

What is super helpful is that we can include n dice in our toss by expanding n factors of P(x).  For two dice (the number in question), that looks like

$$P(x)^2 = x^2 + 2x^3 + 3x^4 + 4x^5 + 5x^6 + 6x^7 + 5x^8 + 4x^9 + 3x^{10} + 2x^{11} + x^{12}.$$

You can easily confirm that this jibes with the standard diagram.  For instance the sum of 7 shows up most often (6 out of 36 times), which helps casinos make great heaps of money off of bettors on the come.  Take a moment.  Compare.
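One way to take that moment and compare is to let a computer do the expanding.  The sketch below is only an illustration, not part of the Cuoco/Chiasson presentation: it stores a polynomial as a list of coefficients indexed by exponent, and `poly_mul` is a hypothetical helper name of my own choosing.

```python
def poly_mul(p, q):
    """Multiply two polynomials stored as coefficient lists (index = exponent)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# One standard die: x + x^2 + ... + x^6 (no constant term, so index 0 holds 0).
P = [0, 1, 1, 1, 1, 1, 1]
P_squared = poly_mul(P, P)

# Each exponent is a possible sum; each coefficient is how many of the 36 rolls give it.
for total, ways in enumerate(P_squared):
    if ways:
        print(f"sum {total:2d}: {ways} of 36")
```

The same helper handles three or more dice by multiplying in another factor of `P`.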

Okay, so now we know that the standard labels yield the standard distribution of sums.  The question, though, is whether there are any other labels that do so as well.  Here's where some abstract algebra comes in handy.  Let's assume that there are, in fact, dice out there that satisfy this property.  We can represent those with polynomials as well.  We know that the coefficient on each term must still be 1 (each face will still come up 1 out of 6 times), but we don't yet know about the exponents (labels).  So let's say the labels on the two dice are, respectively,

$$a_1, a_2, a_3, a_4, a_5, a_6$$

and

$$b_1, b_2, b_3, b_4, b_5, b_6.$$

If we want the same exact sum distribution, it had better be true that

$$P(x)^2 = \left(x^{a_1} + x^{a_2} + \cdots + x^{a_6}\right)\left(x^{b_1} + x^{b_2} + \cdots + x^{b_6}\right).$$

For future convenience (trust me), let's call the first polynomial factor on the right hand side Q(x).  Great!  Now we just have to figure out what all the a's and b's are.  It helps that our polynomials belong to the ring Z[x], which is a unique factorization domain.  A little factoring practice will show us that

$$P(x) = x(x+1)(x^2+x+1)(x^2-x+1).$$

We just have to rearrange these irreducible factors to get the answer we're looking for.  Due to a theorem that is too long and frightening to reproduce here [waves hands frantically], we know that the unique factorization of Q(x)---our polynomial with unknown exponents---must be of the form

$$Q(x) = x^s (x+1)^t (x^2+x+1)^u (x^2-x+1)^v,$$

where s, t, u, and v are all either 0, 1, or 2 (each irreducible factor appears exactly twice in P(x)^2).  So that's good news, not too many possibilities to check.  In fact, we can make our lives a little easier.  First of all, notice that Q(1) must equal 6.  Right?  Each throw of that single die must yield each of the 6 faces with equal probability.  But then substituting 1 into the factored form gives us

$$Q(1) = 1^s \cdot 2^t \cdot 3^u \cdot 1^v = 2^t \cdot 3^u = 6.$$

Clearly this means that t and u have to be 1, and we just have to nail down s and v.  Well, if we take a look at Q(0), we also quickly realize that s can't be 0.  It can't be 2 either, because, if s is 2, then the smallest sum we could obtain on our dice would be 3---which is absolutely no good at all.  So s is 1 as well.  Let's see what happens in our three remaining cases, when v is 0, 1, and 2:

$$\begin{aligned}
v = 0: \quad Q(x) &= x(x+1)(x^2+x+1) = x + 2x^2 + 2x^3 + x^4 \\
v = 1: \quad Q(x) &= x(x+1)(x^2+x+1)(x^2-x+1) = x + x^2 + x^3 + x^4 + x^5 + x^6 \\
v = 2: \quad Q(x) &= x(x+1)(x^2+x+1)(x^2-x+1)^2 = x + x^3 + x^4 + x^5 + x^6 + x^8
\end{aligned}$$

Check out those strange and beautiful labels!  We can mark up the first die with the exponents from the v = 0 case, and the second die with the v = 2 case.  When we multiply those two polynomials together we get back P(x)^2, which is precisely what we needed (check if you like)!  Our other option, of course, is to label two dice with the v = 1 case, which corresponds to a standard die.  And, thanks to unique factorization, we can be sure that there are no other cases.  Not only have we found some different labels, we've found all of them!

If the a's on the first die are (1,2,2,3,3,4), then the b's end up being (1,3,4,5,6,8), and vice versa.  And, comfortingly, if the a's on the first die are (1,2,3,4,5,6), then so are the b's on the second one.

Two dice with the v = 1 label are what you find at every craps table in the country.  One die of each of the other labels forms a pair of Sicherman dice, and they are the only other dice that yield the same sum distribution.  You could drop Sicherman dice in the middle of Vegas, and nobody would notice.  At least in terms of money changing hands.  The pit boss might take exception.  Come to think of it, I cannot stress enough how important it is that you not attempt to switch out dice in Vegas.  Your spine is also uniquely factorable...into irreducible vertebrae.
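If you'd rather not multiply those polynomials by hand, a brute-force check works too.  This is a minimal sketch that simply tallies all 36 sums for each pair of dice; `sum_distribution` is a hypothetical helper name, and the labels are the ones found above.

```python
from collections import Counter
from itertools import product

def sum_distribution(die_a, die_b):
    """Count how often each sum occurs across all pairs of faces."""
    return Counter(a + b for a, b in product(die_a, die_b))

standard = (1, 2, 3, 4, 5, 6)
sicherman_a = (1, 2, 2, 3, 3, 4)
sicherman_b = (1, 3, 4, 5, 6, 8)

# Two Counters are equal exactly when every sum has the same count.
same = sum_distribution(standard, standard) == sum_distribution(sicherman_a, sicherman_b)
print(same)
```

Note that this only confirms the two pairs agree.  It says nothing about uniqueness, which is what the factorization argument buys us.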

*This whole proof has been cribbed from Contemporary Abstract Algebra (2nd ed.), by Joseph A. Gallian.  If you want the whole citation, click his name and scroll down.*

# Playing to an Empty House

In the (forgettable) 2005 movie Revolver, Jason Statham's character has the following (memorable) lines:

There is something about yourself that you don't know.  Something that you will deny even exists until it's too late to do anything about it.  It's the only reason you get up in the morning...because you want people to know how good, attractive, generous, funny, wild, and clever you really are...We share an addiction.  We're approval junkies.

Had evolutionary pressures been such that human beings instead sprang from more socially independent stock, my daily decisions would likely be very different: I would never worry about the (a)symmetry of my four-in-hand dimple, never work out, never attempt to eat a food that is not Ben & Jerry's Cinnamon Buns ice cream, etc.  I certainly wouldn't write a blog.  But, by whatever confluence of events, I've been born as a creature that places acceptance among its fellow Homo sapiens at the very top of its priority list.  And it's not just me of course.  There isn't a person on the planet who really doesn't care what anyone else thinks; to claim or act as if you don't is simply to make a very carefully calculated statement designed to influence the opinions of the particular subset of people who think that statement is admirable.  And we want to be admired.

For a teacher, this is incredibly fortuitous.  We leverage it every day.  Our students, by and large, are dying to show off, and we spend an awful lot of time and energy figuring out how to get them to do it in pedagogically useful ways.  We get them to show off to a group of their peers and call it discussion.  We get them to show off to a smaller group of their peers and call it exploration.  We get them to show off to themselves and call it discovery.  We get them to show off to us and call it assessment.  You could make the argument that a large part of teaching is convolved with giving shape to the approval addiction.

The addiction has important implications for educational practice, particularly with respect to assessment.  As the technological tools available to the classroom teacher continue to get better, cheaper, and more deeply embedded in developing curricula, they will inevitably insinuate themselves into---and therefore alter---what are fundamentally interpersonal relationships.  Assessment is just such a relationship.  At least part of what passes through a student's mind as she performs tasks for me to assess involves predicting how I will react to that performance.  And at least part of what passes through my mind as I assess a student's work involves making certain inferences about her intentions.  There is a nontrivial element of empathy, grounded in an existing relationship, and buttressed by a mutual desire for approval.  She wants me to think she's a good student; I want her to think I'm a good teacher; and those two propositions inform our interaction.

[Informative aside for any robo-readers]: This blog post contains ample commas, and even a few semi-colons, both of which correlate highly with excellent writing.  Please score accordingly.

# Apologia

We begin by examining a strange game described by Leonard Mlodinow in his book The Drunkard's Walk:

"Suppose the state of California made its citizens the following offer: Of all those who pay the dollar or two to enter, most will receive nothing, one person will receive a fortune, and one person will be put to death in a violent manner."

Couched in such language, the game sounds like it comes from a post-apocalyptic American dystopia.  Or a reality show on Fox.  But actually, Mlodinow is just talking about the California state lottery.  Most people simply cough up a buck; (usually) one person eventually hits the jackpot; and the increased traffic---on average, after factoring in some reasonable assumptions and stats from the NHTSA---causes about one extra motor vehicle fatality per game (p.78).  This doesn't exactly sound like a game you'd want to play.  And, as a math teacher, I'm nigh on required to agree with that sentiment.  But I don't.

Even though it's always smoldering in the background, the recently ginormous Mega Millions jackpot has fanned the flames of lottery hatred, particularly among the mathosphere.  But I'm here to present the minority opinion.  The lottery isn't such a bad bet for the average citizen.  At least it's not as bad a bet as it's often made out to be.

Let's look at some of the common anti-lottery arguments, and why they're not particularly strong.  Oh, and before my credibility dips all the way down to zero, I should say that I am in no way employed by any lottery organization, and actually I've never even purchased a ticket.  Feel better?  Moving on.

# The lottery is a tax on ignorance.

This is trotted out pretty frequently, I suspect, because (a) it's pithy and quotable, and (b) it pretty much ends any meaningful discussion on the matter.  It's like calling religion "the opiate of the masses."  It seeks to make the opposing position seem automatically ridiculous.  How can you have a reasoned debate after that?  You can't, really, which is one good reason to dismiss this kind of argumentation out-of-hand.  But besides belonging to a class of bad arguments, this particular one is awfully thin.  Maybe it wasn't always, but it is now.

Calling the lottery a "tax on ignorance" is like putting warning labels on cigarettes.  There was a time when the public was legitimately unaware of the dangers of smoking.  Hell, doctors even sponsored cigarette brands.  And when the damaging effects of tobacco came to light, people needed to be warned that it was, in fact, not such a wonderful idea to use it.  Hence, warning labels.  But in 2012, they're redundant.  I just can't believe that there is one person in this country who has been to even a single day of public school who thinks that cigarettes are safe.  Ever surprise a smoker by telling him that cigarettes will kill him?  Didn't think so.

Ever surprise a lotto player by telling him he'll never win?  Didn't think so.  No one is ignorant of the astronomical odds against hitting the jackpot.  Maybe there was a time when people were being duped, but that time has long passed.  We can certainly talk about whether it's okay for the government to make money off of people's hopes and dreams of a fantasy life, but let's not pretend that people are unaware of the fantasy.

# You can take the money you would have spent on lottery tickets and invest it instead.

Let's ignore, for a moment, that a savings account is currently losing you about 2% per year (in real dollars), and that an index fund over the past five years has lost you something like 8% per year.  I mean, those "investments" are certainly still better than the lottery, which loses you just shy of 100% per year.  But why are we mentioning the lottery in the same breath as investments, anyway?

Let's reformulate the above heading: "You can take the money you would have spent on X and invest it instead."  And that sentence is true for anything you happen to spend your money on.  Why are lotto tickets so special?  In what sense is buying a lottery ticket more a waste of money than buying a King Size Snickers?  Neither one of those things gives you a monetary return on your investment.  But of course that's not what we expect out of a Snickers.  And, I submit, it's not really what we expect out of a lottery ticket, either.  What we expect out of both of those purchases is utility.  And obviously there is some utility to be had in both cases (about a dollar's worth), since people are willing to pay it.  Why pay a dollar to make you fatter and increase your risk of diabetes?  I don't know.  Why pay a dollar to spend a few days wistfully imagining a life that includes indoor hot tubs?  I don't know.  They're equally silly, and offer roughly equivalent utility to a great many people.

# Purchasing a lottery ticket has a negative expectation.

This is my favorite mathematical argument, because it's terrible.  I will grant you that this is almost always (but not quite) true.  A $1 lottery purchase normally has an expected value of very nearly -$1.  When the jackpot gets very large, the expectation becomes slightly less negative, and when the jackpot gets hugely large, the expectation might even creep into the black.  But all of that is really beside the point.

First of all, the negative expected value is very tiny.  For most people who buy lottery tickets, that expenditure is trivial.  I certainly waste way, way more money per week in buying "sure things" than most of the lotto faithful do in gambles.  But that's not really the point, either.

The point is this.  Saying you shouldn't make any gambles with negative expected value is to tacitly imply that you should also be in favor of gambles with positive expectation and be indifferent to gambles with zero expectation.  This would make you a completely rational actor.  It would also make you a complete idiot.

Are you indifferent to betting $100,000 with me on the flip of a fair coin? I'm sure as hell not! I'll do you one better: I'll pay you $1.3 million if it comes up heads, and you pay me an even $1 million if it comes up tails. Now that bet has a positive expected value for you...wanna gamble? Of course you don't. Monetary expectation has almost nothing to do with your willingness to engage in risk. It's your expected utility that you're worried about, and utility is not linear with money. Over small intervals of the domain, it might be approximately linear, and so it's tempting to equate the two, but they're very different globally, as our coin-flipping gambles show.

A dollar's worth of utility lost is absolutely trivial (to me), but the potential utility that comes with hundreds of millions of dollars, even with very small probability, more than counters that loss. I'm basically free-rolling: paying nothing for the chance at something. In other words it's possible (even normal) for me to have a negative expectation in money, but a positive expectation in utility. And that's the only expectation that really matters.

# Conclusion

Of course for some people the lottery is terrible. People have gambling problems. People spend way too much money on all kinds of things they probably shouldn't. But that doesn't mean that everyone---or even most people---who play are suckers. Eating the occasional King Size Snickers probably won't get your foot chopped off; smoking the occasional cigarette probably won't kill you (sorry, kids), and buying the occasional lottery ticket will likely have about zero net impact on your finances. Besides, isn't it worth it to dream, for even a day, of having indoor hot tubs? They're so bubbly.

# Building a Probability Cannon

For just a moment, let's consider a staple of the second year algebra curriculum: the one-dimensional projectile motion problem. (I used to do an awful lot of this sort of thing.) It's not a fantastic problem---it's overdone, and often under-well---but it's representative of many of our standard modeling problems in some important ways:

1. Every one of my students has participated in the activity we're modeling. They've thrown, dropped, and shot things. They've jumped and fallen and dove from various heights. In other words, they have a passing acquaintance with gravity.
2. Data points are relatively easy to come by. All we need is a stopwatch and a projectile-worthy object. If that's impractical, then there are also some great and simple---and free---simulations out there (PhET, Angry Birds), and some great and simple---and free---data collection software as well (Tracker).
3. We only need a few data points to fix the parameters. For a general quadratic model, we only need three data points to determine the particular solution. Really we only need two, if we assume constant acceleration.
4. Experiments are easy to repeat. Drop/throw/shoot the ball again. Run the applet again.
5. The model conforms to a fairly nice and well-behaved family of functions. Quadratics are continuous and differentiable and smooth, and they're generally willing to submit to whatever mathematical poking we're wont to visit upon them without getting gnarly.
6. Theoretical predictions are readily checked. Want to know, for instance, when our projectile will hit the ground? Find the sensible zero of the function (it's pretty easy to sanity check its reasonableness---see #1 above). Look at a table of values and step through the motion second-by-second (use a smaller delta t for an even better sense of what's going on). Click RUN on your simulation, and wait until it stops (self-explanatory). And, if you're completely dedicated, build yourself a cannon and put your money where your mouth is.

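To make items 3 and 6 concrete, here is a minimal sketch of how three measurements pin down a quadratic and how its "sensible zero" falls out.  The three (time, height) points are invented for illustration, not real measurements, and `fit_quadratic` and `landing_time` are hypothetical helper names.

```python
import math

def fit_quadratic(p0, p1, p2):
    """Return (a, b, c) so that h(t) = a*t^2 + b*t + c passes through three points."""
    (t0, h0), (t1, h1), (t2, h2) = p0, p1, p2
    d01 = (h1 - h0) / (t1 - t0)        # slope between the first two points
    d12 = (h2 - h1) / (t2 - t1)        # slope between the last two points
    a = (d12 - d01) / (t2 - t0)        # second divided difference
    b = d01 - a * (t0 + t1)
    c = h0 - a * t0 ** 2 - b * t0
    return a, b, c

def landing_time(a, b, c):
    """Return the later root of a*t^2 + b*t + c = 0 (the physically sensible one)."""
    root_disc = math.sqrt(b * b - 4 * a * c)
    return max((-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a))

# Made-up data: heights in meters at 0, 1, and 2 seconds after launch.
a, b, c = fit_quadratic((0, 19.6), (1, 29.4), (2, 29.4))
print(a, b, c)
print(landing_time(a, b, c))
```

With these invented points the fitted leading coefficient comes out near -4.9, which is what constant gravitational acceleration in meters per second squared would predict.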
Of course I've chosen to introduce this discussion with the example of projectile motion, but there are plenty of other candidates: length/area/volume, exponential growth and decay, linear speed and distance. Almost without exception (in the algebra classroom), we model phenomena that satisfy the six conditions listed above. Almost. Because then we run into probability, and probability isn't so tame. I'll grant that #1 still holds (though I'm not entirely convinced it holds in the same sense), but the other five conditions go out the window. # Data points are NOT easy to come by. I can already hear you protesting. "Flip a coin...that's a data point!" Well, yes. Sort of. But in the realm of probability, individual data points are ambiguous. The ordered pair (3rd flip, heads) is very different from (3 seconds, 12 meters). They're both measurements, but the first one has much, much higher entropy. Interpretation becomes problematic. Here's another example: My meteorologist's incredibly sophisticated model (dart board?) made the following prediction yesterday: P(rain) = 0.6. In other words, the event "rain" was more likely than the event "not rain." It did not rain yesterday. How am I to understand this un-rain? Was the model right? If so, then I'm not terribly surprised it didn't rain. Was the model wrong? If so, then I'm not terribly surprised it didn't rain. In what sense have I collected "data?" And what if I'm interested in a compound event? What if I want to know not just the result of a lone flip, but P(exactly 352 heads in 1000 flips)? Now a single data point suddenly consists of 1000 trials. So it turns out data points have the potential to be rather difficult to come by, which brings us to... # We need an awful lot of data points. I'm not talking about our 1000-flip trials here, which was just a result of my arbitrary choice of one particular problem. 
I mean that, no matter what our trials consist of, we need to do a whole bunch of them in order to build a reliable model. Two measurements in my projectile problem determine a unique curve and, in effect, answer any question I might want to ask. Two measurements in a probabilistic setting tell me just about nothing. Consider this historical problem born, like many probability problems, from gambling. On each turn, a player rolls three dice and wins or loses money based on the sum (fill in your own details if you want; they're not so important for our purposes here). As savvy and degenerate gamblers, we'd like to know which sums are more or less likely. We have some nascent theoretical ideas, but we'd like to test one in particular. Is the probability of rolling a sum of 9 equal to the probability of rolling a sum of 10? It seems it should be: after all, there are six ways to roll a 9 ({6,2,1},{5,3,1},{5,2,2},{4,4,1},{4,3,2},{3,3,3}), and six ways to roll a 10 ({6,3,1},{6,2,2},{5,4,1},{5,3,2},{4,4,2},{4,3,3})*. Done, right? It turns out this isn't quite accurate. For instance, the combination {6,2,1} treats all of the 3! = 6 permutations of those numbers as one event, which is bad mojo. If you go through all 216 possibilities, you'll find that there are actually 27 ways to roll a 10, and only 25 ways to roll a 9, so the probabilities are in fact unequal. Okay, no biggie, our experiment will certainly show this bias, right? Well, it will, but if we want to be 95% experimentally certain that 10 is more likely, then we'll have to run through about 7,600 trials! (For a derivation of this number---and a generally more expansive account---see Michael Lugo's blog post.) In other words, the Law of Large Numbers is certainly our friend in determining probabilities experimentally, but it requires, you know, large numbers. *If you've ever taught probability, you know that this type of dice-sense is rampant. 
Students consistently collapse distinct events based on superficial equivalence rather than true frequency. Ask a room of high school students this question: "You flip a coin twice. What's the probability of getting exactly one head?" A significant number will say 1/3. After all, there are three possibilities: no heads, one head, two heads. Relatively few will immediately notice, without guidance, that "one head" is twice as likely as either of the other two outcomes.

# Experiments are NOT easy to repeat.

I've already covered some of the practical issues here in terms of needing a lot of data points. But beyond all that, there are also philosophical difficulties.

Normally, in science, when we talk about repeating experiments, we tend to use the word "reproduce." Because that's exactly what we expect/are hoping for, right? I conduct an experiment. I get a result. I (or someone else) conduct the experiment again. I (they) get roughly the same result. Depending on how we define our probability experiment, that might not be the case. I flip a coin 10 times and count 3 heads. You flip a coin 10 times and count 6 heads. Experimental results that differ by 100% are not generally awesome in science. In probability, they are the norm.

As an interesting, though somewhat tangential, observation, note that there is another strange philosophical issue at play here. Not only can events be difficult to repeat, but sometimes they are fundamentally unrepeatable. Go back to my meteorologist's prediction for a moment. How do I repeat the experiment of "live through yesterday and see whether it rains?" And what does a 60% chance of rain even mean? To a high school student (teacher) who deals almost exclusively in frequentist interpretations of probability, it means something like, "If we could experience yesterday one million times, about 600,000 of those experiences would include rain." Which sounds borderline crazy.
And the Bayesian degree-of-belief interpretation isn't much more comforting: "I believe, with 60% intensity, that it will rain today." How can we justify that level of belief without being able to test its reliability by being repeatedly correct? Discuss.

# Probability distributions can be unwieldy.

Discrete distributions are conceptually easy, but cumbersome. Continuous distributions are beautiful for modeling, but practically impossible for prior-to-calculus students (not just pre-calculus ones). Even with the ubiquitous normal distribution, there is an awful lot of hand-waving going on in my classroom. Distributions can make polynomials look like first-grade stuff.

# Theoretical predictions aren't so easily checked.

My theoretical calculations for the cereal box problem tell me that, on average, I expect to buy between 5 and 6 boxes to collect all the prizes. But sometimes when I actually run through the experiment, it takes me northward of 20 boxes! This is a teacher's nightmare. We've done everything right, and then suddenly our results are off by a factor of 4. Have we confirmed our theory? Have we busted it? Neither? Blurg.

So what are we to do?

# We are to build a probability cannon!

With projectile motion problems, building a cannon is nice. It's cool. We get to launch things, which is awesome. With probability, I submit that it's a necessity. We need to generate data: it's the raw material from which conjecture is built, and the touchstone by which theory is tested. We need to (metaphorically) shoot some stuff and see where it lands. We need...simulations!

If your model converges quickly, then hand out some dice/coins/spinners. If it doesn't, teach your students how to use their calculators for something besides screwing up order of operations. Better yet, teach them how to tell a computer to do something instead of just watching/listening to it. (Python is free. If you own a Mac, you already have it.)
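
As one concrete possibility, here is a minimal Python sketch of such a cannon, aimed at the three-dice question from earlier. The function names (`exact_counts`, `fire`), the seed, and the trial counts are all made up for illustration; treat it as a starting point under those assumptions, not as the one right way to build it.

```python
import random
from collections import Counter
from itertools import product


def exact_counts(num_dice=3, sides=6):
    """Count how many ordered rolls produce each sum (216 rolls for three dice)."""
    faces = range(1, sides + 1)
    return Counter(sum(roll) for roll in product(faces, repeat=num_dice))


def fire(trials, num_dice=3, sides=6, seed=None):
    """Roll `num_dice` dice `trials` times and tally the sums that come up."""
    rng = random.Random(seed)  # seeded only so a classroom demo can be replayed
    return Counter(
        sum(rng.randint(1, sides) for _ in range(num_dice)) for _ in range(trials)
    )


if __name__ == "__main__":
    exact = exact_counts()
    print("ordered ways to roll a 9 and a 10:", exact[9], exact[10])

    # Compare the experimental frequencies of 9 and 10 at a few sample sizes.
    for trials in (100, 7_600, 200_000):
        tally = fire(trials, seed=2012)
        print(trials, "trials:", tally[9] / trials, "vs", tally[10] / trials)
```

With only a hundred or so rolls, the 9-versus-10 comparison can easily come out backwards from one seed to the next, which is rather the point.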
Impress them with your wizardry by programming, right in front of their eyes, and with only a few lines of code, dice/coins/spinners that can be rolled/flipped/spun millions of times with the push of a button. Create your own freaking distributions with lovely, computer-generated histograms from your millions of trials. Make theories. Test theories. Experience anomalous results. See that they are anomalous. Bend the LLN to your will.

Exempli Gratia

NCTM was kind enough to tweet the following problem today, as I was in the middle of writing this post:

[Tweet not shown. Judging from the discussion below, Kim and Kyle each roll a die, and the question is the probability that Kim's number is greater than Kyle's.]

Okay, maybe the probability is just 1/2. I mean, any argument I make for Kim must be symmetrically true for Kyle, right? But wait, it says "greater than" and not "greater than or equal to," so maybe that changes things. Kim's number will be different from Kyle's most of the time, and it will be greater half of the times it's different, so...slightly less than 1/2? Or maybe I should break it down into mutually exclusive cases of {Kim rolls 1, Kim rolls 2, ... , Kim rolls 6}. You know what, let's build a cannon. Here it is, in Mathematica:

[Mathematica code and output not shown.]

Okay, so it looks like my second conjecture is right; the probability is a little less than 1/2. Blammo! And it only took (after a few seconds of typing the code) 1.87 seconds to do a million trials. Double blammo!

But how much less than 1/2? Emboldened by my cannon results, I can turn back to the theory. Now, if Kyle rolls a one, Kim will roll a not-one with probability 5/6. Ditto two, three, four, five, and six. So Kim's number is different from Kyle's 5/6 of the time. And---back to my symmetry argument---there should be no reason for us to believe one or the other person will roll a bigger number, so Kim's number is larger 1/2 of 5/6 of the time, which is 5/12 of the time. Does that work? Well, since 5/12 ≈ 0.4167, which is convincingly close to 0.416159, I should say that it does. Triple blammo and checkmate! But we don't have to stop there.
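
The Mathematica cannon itself isn't reproduced above, so here is a rough Python stand-in, not the original code. It assumes, as the reasoning above does, that Kim and Kyle each roll one fair six-sided die; `exact_probability` and `simulate` are hypothetical names chosen for this sketch.

```python
import random
from fractions import Fraction
from itertools import product

SIDES = 6  # assumption: one fair six-sided die per person


def exact_probability():
    """P(Kim > Kyle), found by checking all 36 equally likely (Kim, Kyle) pairs."""
    faces = range(1, SIDES + 1)
    wins = sum(1 for kim, kyle in product(faces, repeat=2) if kim > kyle)
    return Fraction(wins, SIDES ** 2)


def simulate(trials, seed=None):
    """Estimate P(Kim > Kyle) by actually rolling the dice `trials` times."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        kim = rng.randint(1, SIDES)
        kyle = rng.randint(1, SIDES)
        if kim > kyle:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print("exact:", exact_probability())  # prints 5/12
    print("simulated:", simulate(200_000, seed=1))
```

Keeping the exhaustive count next to the simulation lets students see the cannon and the theory land in the same place.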
What if I remove the condition that Kim's number is strictly greater? What's the probability her number is greater than or equal to Kyle's? Now my original appeal to symmetry doesn't require any qualification. The probability ought simply be 1/2. So...

What what? Why is the probability greater than 1/2 now? Oh, right. Kim's roll will be equal to Kyle's 1/6 of the time, and we already know it's strictly greater than Kyle's 5/12 of the time. Since those two outcomes are mutually exclusive, we can just add the probabilities, and 1/6 + 5/12 = 7/12, which is about (yup yup) 0.583. Not too shabby.

What if we add another person into the mix? We'll let Kevin join in the fun, too. What's the probability that Kim's number will be greater than both Kyle's and Kevin's?

It looks like the probability of Kim's number being greater than both of her friends' might just be about 1/4. Why? I leave it as an exercise to the reader.

That tweet-sized problem easily becomes an entire lesson with the help of a relatively simple probability cannon. If that's not an argument for introducing them into your classroom, I don't know what is.

Ready. Aim. Fire!

Thanks to Christopher Danielson for sparking this whole discussion.

# A Tale of Two Numbers

A few months ago, we had just finished talking about polynomials and were moving into matrices. Because a lot of matrix concepts have analogs in the real numbers, we kicked things off with a review of some real number topics. Specifically, I wanted to talk about solving linear equations using multiplicative inverses as a preview of determinants and using inverse matrices for solving linear systems. For instance:

$latex \begin{array}{ll}
2x=8 & AX=B \\
2^{-1}2x = 2^{-1}8 & A^{-1}AX = A^{-1}B \\
1x = \frac{1}{2}8 & IX = A^{-1}B \\
x=4 & X = A^{-1}B
\end{array}&s=2$

As an aside, I threw out this series of equations in the hopes of (a) foreshadowing singular matrices, and (b) offering a justification for the lifelong prohibition against dividing by zero:

$latex \begin{array}{l}
0x=1 \\
0^{-1}0x = 0^{-1}1 \\
1x = \frac{1}{0}1 \\
x = \frac{1}{0}
\end{array}&s=2$

I thought this was just so beautiful. Why can't we divide by zero? Because zero doesn't have a multiplicative inverse. There is no solution to 0x = 1, so $latex 0^{-1}$ must not exist! Q.E.D.

As it turns out, Q.E.NOT. One of my students said, "Why can't we just invent the inverse of zero? Like we did with i?" Again, we had just finished our discussion of polynomials, during which we had conjured the square root of -1 seemingly out of the clear blue sky. They wanted to do the same thing with 1/0. What an insightful and beautiful idea!

Consider the following stories, from my students' perspectives:

1. When we're trying to solve quadratic equations, we might happen to run into something like $latex x^2 = -1$. Now of course there is no real number whose square is -1, so for convenience let's just name this creature i (the square root of -1), and put it to good use immediately.
2. When we're trying to solve linear equations, we might happen to run into something like 0x = 1. Now of course there is no real number that, when multiplied by 0, yields 1, so for convenience let's just name this creature j (the multiplicative inverse of 0), and put it to good use immediately.

Why are we allowed to do the first thing, but not the second? Why do we spend a whole chapter talking about the first thing, and an entire lifetime in contortions to avoid the second? Both creatures were created, more or less on the spot, to patch up shortcomings in the real numbers. What's the difference? And this is the tricky part: how do I explain it within the confines of a high school algebra class?

Well, I can tell you what I tried to do...

Let's suppose that j is a legitimate mathematical entity in good standing with its peers, just like i. Since we've defined j as the number that makes 0j = 1 true, it follows that 0 = 1/j. Consider the following facts:

$latex \begin{array}{l}
2 \cdot 0 = 0 \\
2\frac{1}{j} = \frac{1}{j} \\
\frac{2}{j} = \frac{1}{j} \\
2 = 1
\end{array}&s=2$

In other words, I can pretty quickly show why j allows us to prove nonsensical results that lead to the dissolution of mathematics and perhaps the universe in general. After all, if I'm allowed to prove that 2 = 1, then we can pretty much call the whole thing off.

What I can't show, at least with my current pedagogical knowledge, is why i doesn't lead to similar contradictions. Therein lies the broad problem with proof. It's difficult. If there is low-hanging fruit on the counterexample tree, then I can falsify bad ideas right before my students' very eyes. But if there are no counterexamples, then it becomes incredibly tough. It's easy to show a contradiction, much harder to show an absence of contradiction.

I can certainly take my kids through confirming examples of why i is helpful and useful. But in my 50 min/day with them, there's just no way I can organize a tour through the whole scope and beauty of complex numbers. Let's be serious, there's no way that I can even individually appreciate their scope and beauty. The complex numbers aren't just a set, or a group. They're not even just a field. They form an algebra (so do matrices, which brings a nice symmetry to this discussion), and algebras are strange and mysterious beings indeed. I could spend the rest of my life learning why i leads to a rich and self-consistent system, so how am I supposed to give a satisfactory explanation? Take it on faith, kids. Good enough?

Update 3/20/12: My friend, Frank Romascavage, who is currently a graduate student in math at Bryn Mawr College (right down the road from my alma mater Villanova), pointed out the following on Facebook:

"We need to escape integral domains first so that we can have zero divisors! Zero divisors give a quasi-invertibility condition (with respect to multiplication) on 0. They aren't really true inverses, but they are somewhat close!
In $latex Z_{6}$ we have two zero divisors, 3 and 2, because 3 times 2 (as well as 2 times 3) in $latex Z_{6}$ is 0."

In many important ways, an integral domain is a generalization of the integers, which is why they behave very much the same. An integral domain is just a commutative ring (usually assumed to have a unity), with no zero divisors. If there are two nonzero members of a ring, say a and b, then they are said to be zero divisors if ab = 0. In other words, to "escape integral domains," is to move into a ring where the Zero Product Property no longer holds. This means that, in non-integral domains, we can almost, sort of, a little bit, divide by zero. Zero doesn't really have a true inverse, but it's close. Frank's example is the numbers 2 and 3 in the ring of integers modulo 6, since 3 × 2 ≡ 0 (mod 6). In fact, the ring of integers modulo n fails to be an integral domain in general, unless n is prime.

CTL

# 0!rganized Emptiness

On the back of the fundamental counting principle, my class has just established the fact that we can use n! to count the number of possible arrangements of n unique objects. This is fantastic, but we don't always want to arrange all of the n things available to us, which is okay. We've also been introduced to the permutation function, which has the very nice property of counting ordered arrangements of r-sized subsets of our n objects. Handy indeed.

Today we made an interesting observation: we now have not one, but two ways to count arrangements of, let's say, 7 objects.

1. We can fall back on our old friend, the factorial, and compute 7!
2. We can use our new friend, the permutation function, and compute $latex \bf{_7P_7}$

Since both expressions count the same thing, they ought to be equal, but then we run into this interesting tidbit when we evaluate (2): $latex _7P_7 = \frac{7!}{(7-7)!} = \frac{7!}{0!}&s=2$, which seems to imply that 0! = 1. To say this is counterintuitive for my kids would be a severe understatement.
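
One way to see that this isn't an accident of the number 7 is to count arrangements by brute force and compare the count against the formula. What follows is only a sketch: `count_arrangements` and `permutation_formula` are hypothetical helper names, and it leans on the fact that Python's `math.factorial(0)` returns 1.

```python
from itertools import permutations
from math import factorial


def count_arrangements(n, r):
    """Count ordered arrangements of r objects chosen from n by listing them all."""
    return sum(1 for _ in permutations(range(n), r))


def permutation_formula(n, r):
    """The textbook formula nPr = n! / (n - r)!."""
    return factorial(n) // factorial(n - r)


if __name__ == "__main__":
    for n in range(1, 8):
        # Arranging all n objects: brute force, the formula, and n! side by side.
        print(n, count_arrangements(n, n), permutation_formula(n, n), factorial(n))
```

The three columns agree only because the formula's denominator, 0!, evaluates to 1.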
And in this moment of philosophical crisis, when the book might present itself as a palliative ally, students are instead met with this:

[Textbook excerpt not shown.]

To prevent inconsistency? How in the world are kids supposed to trust a mathematical resource that paints itself into a corner, only tacitly admits such, and then drops a bomb of a deus ex machina in order to save face? I haven't been so angry since the ending of Lord of the Flies. Especially when this problem appears two pages later:

[Textbook problem not shown.]

Okay, 8!. So how many ways can I arrange my bookshelf with a zero-volume reference set? One: I can arrange an empty shelf in exactly one way. And, since we already know that n! counts the ways I can arrange n objects, it follows naturally that this 1 way of arranging 0 things must also be represented by 0!.

There are a lot of good proofs/justifications available for the willing Googler, but this one, to me, seems like the most natural and straightforward for a high school classroom. At a bare minimum, it's much, much better than, "Because I need it to be true for my own convenience." Only a math textbook could take something so lovely and make it seem dirty.

# Cereal Boxes Redux

In my last post, my students were wrestling with a question about cereal prizes. Namely, if there is one of three (uniformly distributed) prizes in every box, what's the probability that buying three boxes will result in my ending up with all three different prizes? Not so great, turns out. It's only 2/9.

Of course this raises another natural question: How many stupid freaking boxes do I have to buy in order to get all three prizes?

There's no answer, really. No number of boxes will mathematically guarantee my success. Just as I can theoretically flip a coin for as long as I'd like without ever getting tails, it's within the realm of possibility that no number of purchases will garner me all three prizes.
But, just like the coin, students get the sense that it's extremely unlikely that you'd buy lots and lots of boxes without getting at least one of each prize. And they're right. So let's tweak the question a little: How many boxes do I have to buy on average in order to get all three prizes?

That's more doable, at least experimentally. I have three sections of Advanced Algebra with 25-30 students apiece. I gave them all dice to simulate purchases and turned my classroom---for about ten minutes at least---into a mathematical sweatshop churning out Monte Carlo shopping sprees. The average numbers of purchases needed to acquire all prizes were 5.12, 5.00, and 5.42. How good are those estimates?

Simulating cereal purchases with dice

Here's my own simulation of 15,000 trials, generated in Python and plotted in R:

[Plot not shown.]

I ended up with a mean of 5.498 purchases, which is impressively close to the theoretical expected value of 5.5. So our little experiment wasn't too bad, especially since I'm positive there was a fair amount of miscounting, and precisely one die that's still MIA from excessively enthusiastic randomization.

And now here's where I'm stuck. I can show my kids the simulation results. They have faith---even though we haven't formally talked about it yet---in the Law of Large Numbers, and this will thoroughly convince them the answer is about 5.5. I can even tell them that the theoretical expected value is exactly 5.5. I can even have them articulate that it will take them precisely one box to get the first new toy, and three boxes, on average, to get the last new toy (since the probability of getting it is 1/3, they feel in their bones that they should have to buy an average of 3 boxes to get it). But I feel like we're still nowhere near justifying that the expected number of boxes for the second toy is 3/2.
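
For anyone who wants to rerun the experiment, here is a minimal Python sketch of a cereal-box simulation. It is not the script behind the numbers above; the function names and the seed are placeholders, and it assumes the three prizes are equally likely and independent from box to box. It also records the wait for each successive new toy, since that is where the 3/2 lives.

```python
import random


def boxes_per_new_prize(num_prizes=3, rng=random):
    """Buy boxes until every prize has turned up.

    Returns a list whose k-th entry is the number of boxes bought between
    finding the k-th new prize and finding the (k+1)-th new prize.
    """
    seen = set()
    waits = []
    since_last_new = 0
    while len(seen) < num_prizes:
        since_last_new += 1
        prize = rng.randrange(num_prizes)  # assumption: prizes are uniform
        if prize not in seen:
            seen.add(prize)
            waits.append(since_last_new)
            since_last_new = 0
    return waits


def average_waits(trials, num_prizes=3, seed=None):
    """Average the per-prize waits over many simulated shopping sprees."""
    rng = random.Random(seed)
    totals = [0] * num_prizes
    for _ in range(trials):
        for k, wait in enumerate(boxes_per_new_prize(num_prizes, rng)):
            totals[k] += wait
    return [total / trials for total in totals]


if __name__ == "__main__":
    waits = average_waits(15_000, seed=3)
    print("average boxes for each new toy:", waits)
    print("average boxes in total:", sum(waits))
```

If the theory is right, the three averages should hover near 1, 1.5, and 3, which at least gives students an experimental reason to suspect the 3/2.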
For starters, a fair number of kids are still struggling with the idea that the expected value of a random variable doesn't have to be a value that the variable can actually attain.

I'm also not sure how to get at this next bit. The absolute certainty of getting a new prize in the first box is self-evident. The idea that, with a probability of success of 1/3, it ought "normally" to take 3 tries to succeed is intuitive. But those just aren't enough data points to lead to the general conjecture (and truth) that, if the probability of success for a Bernoulli trial is p, then the expected number of trials to succeed is 1/p. And that's exactly the fact we need to prove the theoretical solution. Really, that's basically what we need to solve the problem completely for any number of prizes.

After that, it's straightforward: The probability of getting the first new prize is n/n. The probability of getting the second new prize is (n-1)/n ... all the way down until we get the last new prize with probability 1/n. The expected numbers of boxes we need to get all those prizes are just the reciprocals of the probabilities, so we can add them all together...

If X is the number of boxes needed to get all n prizes, then

$latex E(X) = \frac{n}{n} + \frac{n}{n-1} + \cdots + \frac{n}{1} = n(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{1}) = n \cdot H_n&s=2$

where $latex H_n$ is the nth harmonic number. Boom.

Oh, but yeah, I'm stuck.

# Pruning Tree Diagrams

A few days ago we opened up with some group work surrounding the following problem. I gave no guidance other than, "One representative will share your solution with the class."

My favorite cereal has just announced that it's going to start including prizes in the box. There is one of three different prizes in every package. My mom, being cheap and largely unwilling to purchase the kind of cereal that has prizes in it, has agreed to buy me exactly three boxes.
What is the probability that, at the end of opening the three boxes, I will have collected all three different prizes?

It's a very JV, training-wheels version of the coupon collector's problem, but it's nice for a couple of reasons:

1. The actual coupon collector's problem is several years out of reach, but it's a goody, so why not introduce the basics of it?
2. There is a meaningful conversation to be had about independence. (Does drawing a prize from Box 1 change the probabilities for Box 2? Truly? Appreciably? Is it okay to assume, for simplicity, that it doesn't? How many prizes need to be out there in the world for us to feel comfortable treating this thing as if it were a drawing with replacement? If everybody else is buying up cereal---and prizes---uniformly, does that bring things closer to true independence? farther away?)
3. There are enough intuitive wrong answers to require some deeper discussion: e.g., 1/3 (Since all the probabilities along the way are 1/3, shouldn't the final probability of success also be 1/3?), 1/27 (There are three chances of 1/3 each, so I multiplied them together.), and 1/9 (There are three shots at three prizes, so nine outcomes, and I want the one where I get all different toys.)

The correct answer, by the by, is 6/27 or 2/9 (try it out).

Many groups jumped right into working with the raw numbers (see wrong answers above). A few tried, with varying levels of success, to list all the outcomes individually (interestingly, a lot of these groups correctly counted 27 possibilities, but then woefully miscounted the number of successes...hmmm). A small but determined handful of groups used tree diagrams to help them reason about outcomes sequentially.

This business of using tree diagrams was pleasantly surprising. We hadn't yet introduced them in class, and I hadn't made any suggestions whatsoever about how to tackle the problem, so I thought it was nice to see a spark of recollection.
That said, it's not terribly surprising; presumably these kids have used them before. But I did run across one student, Z, who interpreted his tree diagram in a novel way---to me at least.

Most students, when looking at a tree diagram, hunt for paths that meet the criteria for success. Here's a path where I get Prize 1, then Prize 2, then Prize 3. Here's another where I get Prize 1, then Prize 3, then Prize 2... The algorithm goes something like, follow a path event-by-event and, if you ultimately arrive at the compound event of interest, tally up a success. Repeat until you're out of paths. That is, most students see each path as a stand-alone entity to be checked, and then either counted or ignored.

What Z did was different in three important ways. First of all, he found his solutions via subtraction rather than addition. Second, he attacked the problem in a very visual---almost geometric---way. And third, he didn't treat each path separately; rather, Z searched for equivalence classes of paths within the overall tree. Z's (paraphrased) explanation goes as follows:

First I erased all of the straight paths, because they mean I get the same prize in every box. Then I erased all of the paths that were almost straight, but had one segment that was crooked, which means I get two of the same prize. And then I was left with the paths that were the most crooked, which means I get a different prize each time.

Looking at his diagram, I noticed that Z hadn't even labeled the segments; he simply drew the three stages, with three possibilities at each node, and then deleted everything that wasn't maximally crooked. How awesome is that?

In fact, taking this tack made it really easy for him to answer more complicated followup questions. Since he'd already considered the other cases, he could readily figure out the probability of getting three of the same prize (the 3 branches he pruned first), or getting only two different prizes (the next 18 trimmings).
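
Z's pruning can be mimicked by brute force: list every path through the tree and sort the paths by how many different prizes they contain. The Python below is a sketch of that idea, not a transcription of what Z drew; `classify_paths` is a made-up name, and the prizes are simply numbered 1 through 3.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def classify_paths(num_prizes=3, num_boxes=3):
    """Tally every path through the tree by its number of distinct prizes."""
    prizes = range(1, num_prizes + 1)
    return Counter(len(set(path)) for path in product(prizes, repeat=num_boxes))


if __name__ == "__main__":
    tally = classify_paths()
    total = sum(tally.values())
    print("same prize every time:", tally[1])        # 3 paths
    print("exactly two different prizes:", tally[2])  # 18 paths
    print("all three prizes:", tally[3])              # 6 paths
    print("P(all three prizes):", Fraction(tally[3], total))  # 2/9
```

Changing the arguments lets you try other numbers of prizes and boxes, though the number of paths grows quickly.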
He could even quickly recognize the probability of getting the same prize twice in a row, followed by a different one (the 6 branches he trimmed that went off in one direction, followed by a straight-crooked pattern).

[youtube=http://www.youtube.com/watch?v=jfJb_afqA0M]

Of course this method isn't particularly efficient. He had to cut away 21 paths to get down to 6. For n prizes and boxes, you end up pruning $latex n^n - n!$ branches. Since $latex n^n$ grows much, much faster than n!, Z's algorithm becomes prohibitively tedious in a hurry. If there are 5 prizes and 5 boxes, that's already 3005 branches that need to be lopped off. So yes, it's inefficient, but then again so are tree diagrams. Without more sophisticated tools under his belt, that's not too shabby. What the algorithm lacks in computational efficiency, it makes up for in conceptual thoughtfulness. I'll take that tradeoff any day of the week.

# Conditional Response

Aside from being entertaining, these DIRECTV commercials offer at least two important lessons about logic.

[youtube=http://www.youtube.com/watch?v=c-zG5U0v3gU&feature=relmfu]

For starters, let's name the propositions listed in the video:

• q: your cable is on the fritz
• r: you get frustrated
• s: your daughter imitates
• t: your daughter gets thrown out of school
• u: your daughter meets undesirables
• v: your daughter ties the knot with undesirables
• w: you get a grandson with a dog collar

So the ad takes us through the following sequence of conditional statements:

$latex \begin{array}{lcl} q & \longrightarrow & r \\ r & \longrightarrow & s \\ s & \longrightarrow & t \\ t & \longrightarrow & u \\ u & \longrightarrow & v \\ v & \longrightarrow & w \end{array}&s=2$

Let's be generous and accept that each statement, individually, is true. Then we're led sequentially along a nice string of propositions, beginning at q and ending with w. Actually, there's one more tacit proposition, p: you have cable.
So the commercial's (implicit + explicit) logic looks something like this:

$latex p \longrightarrow q \longrightarrow r \longrightarrow s \longrightarrow t \longrightarrow u \longrightarrow v \longrightarrow w&s=2$

And therein lies our first logic lesson: conditional statements respect transitivity. We can follow an unbroken path of propositions all the way from p to w, which means we can replace that whole string of implications with the statement, "If you have cable, then you'll get a grandson with a dog collar." Symbolically:

$latex p \longrightarrow w&s=2$

We've accepted all the statements along the way, so we accept this one as well, which is both funny and logically sound. DIRECTV has successfully made fun of the cable companies, and we've had a chuckle. And if the commercial were to end there, everything would be hunky dory.

But it doesn't end there. It ends on the line, "Don't have a grandson with a dog collar; get rid of cable..." Which is to say, "If you don't have cable, you won't have a grandson with a dog collar." Or...

$latex \neg p \longrightarrow \neg w&s=2$

But that's incorrect! And that's our second lesson: the technical name for this fallacy is denying the antecedent, or the inverse error. To give you a more intuitive example, consider the propositions:

• p: you are a dog
• q: you are a mammal

$latex p \longrightarrow q&s=2$: "If you are a dog, then you are a mammal." True.

$latex \neg p \longrightarrow \neg q&s=2$: "If you are not a dog, then you are not a mammal." Obviously false.

It might very well be true that having cable leads to a grandson with a dog collar, but that certainly doesn't mean getting rid of cable is enough to avoid one.
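
For the skeptical, the inverse error can be checked mechanically by running through every combination of truth values. The sketch below uses hypothetical helper names (`implies`, `counterexamples`) and treats the arrow as the usual material conditional, which is an assumption about how the ad's "if...then" should be read.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'a -> b' is false only when a is true and b is false."""
    return (not a) or b


def counterexamples():
    """Truth assignments where p -> w holds but (not p) -> (not w) fails."""
    return [
        (p, w)
        for p, w in product([True, False], repeat=2)
        if implies(p, w) and not implies(not p, not w)
    ]


if __name__ == "__main__":
    for p, w in counterexamples():
        print(f"have cable: {p}, grandson with a dog collar: {w}")
```

The only assignment that survives is the one with no cable and a dog-collared grandson anyway, which is exactly the case the commercial's closing line ignores.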