Last week we examined how a series of transformations affects the equation of a function, in order to write the equation from a graph, or vice versa. We touched on why it works the way it does, but this is something you need to look at from multiple perspectives in order to really grasp it fully. Luckily, some people don’t stop asking until they get it!

A month after the question we looked at last week, Mario wrote again, asking about the post from 2019 that I’d referred him to:

I am reading this article from the math doctors: Combining Function Transformations: Order Matters

I have a question on something that was said there. Namely:

You can perform transformations in any order you want, in general. But in this case, you are asking in which order to do them in order to transform f(x) into a specific goal, f(ax+b). The order makes a difference in how you get there. What I do is to explicitly write the steps, one at a time. Suppose we first do the horizontal shrink f(x) -> f(ax). If we then apply a horizontal shift (translation) b units to the left, we would be REPLACING x in f(ax) with x+b, and we'd get f(a(x+b)). That is NOT what we are looking for; it's equal to f(ax+ab). So this order of doing those particular transformations is wrong.

The part that confuses me is if you do the horizontal shrink and then you do the horizontal shift, you end up with f(a(x + b)) and not f(ax + b). The reason it confuses me is because it doesn’t concur with how I have come to think about graph transformations.

Let’s say I have a function f(x), and I graph it. I shrink it horizontally by a factor of 1/3. The function now looks like f(3x), and the graph has gotten thinner. Now, let’s say I shift the graph 4 units to the right.

I would think the graph would look like f(3x – 4). The reason I think it looks like that is because of the order I transformed the graph. Since I shrunk it first and then shifted, I would add -4 not to the set of inputs x, but to the set of inputs multiplied by 3, thus f(3x – 4). So my question is, what’s my mistake?

This is a very common mistake, and although in principle what I said last time should explain it, it takes several exposures, from different perspectives, to fully understand it, because it is counterintuitive, and your wrong intuitions have to be replaced by correct ways of thinking. (I’m reminded of my early post When Math Doesn’t Make Sense.) Students who pay more attention to memorized procedures than to reasons may just accept what they’re taught, but those who want to understand, like Mario (and me) need more! So I was happy to try again.

I responded:

Hi, Mario.

It happens that I am currently working on turning our previous discussion on this subject into a post. If this answer works well, I’ll try to add it on, because it’s a good follow-up, and I can say some things I’d wished I said.

I did a very similar example to yours late in that post, in the section headed “Looking at the graph”, which you should read carefully.

There we get f(3x – 9) by first shifting (9 units right) and then shrinking (by 1/3). We do that by first replacing x with x – 9 to accomplish the shift, giving f(x – 9), and then replacing x with 3x to accomplish the shrink, resulting in f(3x – 9). I subsequently show that we can get the same thing in your order, shrinking and then shifting by only 3 units, and get f(3(x – 3)).

This differs from your example only in using 3 rather than 4.

There, and last week, I mostly just **asserted** that the transformed function is constructed by **replacing** *x* with something. We want to move beyond mere assertion to answer the deeper “**why**”.

First, here are the graphs from that post. When we shift first, resulting in the purple graph, and then shrink to make \(f(3x-9)\), the blue graph, it looks like this:

On the other hand, if we shrink first, to make the purple graph, then shift to make \(f(3(x-3))\) in blue, we have:

That shows that the result is right (especially if you take the time to verify points on each graph). But this time, I don’t want to depend on graphs to see merely *what* is right; our goal is to understand the underlying algebra, to see *why* it is right. We need to have more to say than, “If you don’t believe me, trust the graph.”

To answer the present question, I referred back to an earlier post that was just about individual transformations, which laid the foundation for combinations of transformations.

Back to your transformations: If you first shrink by 1/3 and then shift right 4 units, you change f(x) first by replacing x with 3x, making it f(3x), and then replace x with x – 4, making f(3(x – 4)). What you said, giving f(3x – 4), is wrong, because you are thinking “forward” rather than “backward”, and we must do everything about horizontal transformations “backward”!

I discussed this “backward” idea in a one-transformation-at-a-time setting in the post before the one you’re reading,

Shifting and Stretching Graphs

There, too, I emphasized the idea of replacement, and explained why we must think of it this way using several different approaches.

In the last post, I used replacement of \(x\) with \(cx\) or \(x-c\) as the foundation, but failed to explain **why** that was correct. To fill that gap, we can follow a specific point through the successive transformations:

Let’s look more closely at your example, and see how it works. As I did in the “Looking at the graph” section, I’ll take the original function to be \(f(x) = x^2\), and use g for the transformed function.

Now, let’s consider a particular point on the original graph, say (3, 9). Applying your transformations to this point in order, we first shrink horizontally, which divides x by 3 and brings us to the point (1, 9). Now we shift 4 units to the right, adding 4 to x and taking it to (5, 9). So our transformed function g should yield g(5) = 9.

First, we can test both claimants to the title of transformed function. First, yours, f(3x – 4):

\(g(5) = f(3(5) - 4) = f(11) = 11^2 = 121\)

That didn’t work!

Now, mine, f(3(x – 4)):

\(g(5) = f(3(5 - 4)) = f(3) = 3^2 = 9\)

That did work. So we’ve demonstrated that your work was wrong.

We’ve now seen, in several ways, that his intuition is *wrong*; we need to *retrain* that intuition by seeing clearly what really happens.
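For readers who like to check this sort of thing by machine, here is a minimal Python sketch of the same point test. The names `candidate_wrong` and `candidate_right` are just labels chosen for this illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def f(x):
    """The original function from the example, f(x) = x^2."""
    return x ** 2

def candidate_wrong(x):
    """Mario's proposed formula, f(3x - 4)."""
    return f(3 * x - 4)

def candidate_right(x):
    """The replacement-based formula, f(3(x - 4))."""
    return f(3 * (x - 4))

# Follow the point (3, 9): shrink by 1/3 (x becomes 1), then shift right 4 (x becomes 5).
x_new = Fraction(3) / 3 + 4

print(candidate_wrong(x_new))  # 121, not the 9 we need
print(candidate_right(x_new))  # 9, matching the moved point (5, 9)
```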

Now, why did mine work? Look at what happened: We needed to transform the new x, 5, to the original x, 3, so that f(3) would give us the correct value of y.

We obtained 5 from 3 by applying the transformations in order: starting with 3, dividing by 3 to get 1, and adding 4 to get 5. That is, the new input to the function is x’ = (1/3)*3 + 4 = 5. In general, a point (x, y) transforms to (x’, y) = ((1/3)x + 4, y).

Since what I called x’ is the input to the new function, g, corresponding to the input of x to the original function, f, we want

g(x’) = f(x),

that is, we want

g((1/3)x + 4) = f(x).

But we want an expression for g(x), so we have to solve for x in terms of x’:

x’ = (1/3)x + 4

x’ – 4 = (1/3)x

3(x’ – 4) = x

So our new function is

g(x’) = g((1/3)x + 4) = f(x) = f(3(x’ – 4))

Here, x’ is a dummy variable that we just used as a temporary name; replacing it with the usual x, we have

g(x) = f(3(x – 4))

That, of course, is what I got by first replacing x with 3x, and then replacing x in that with (x – 4).

And this is why everything involving horizontal transformations is backward: We’re really solving for the original x, which means undoing the operations, and doing that in reverse order.

In vertical transformations, like \(h(x) = 3f(x) + 4\), we are just *evaluating* an expression, and follow the order of operations: Multiply *y* by 3, then add 4; that is, stretch by a factor of 3, then shift up 4. But in the horizontal transformation, we are not evaluating but *solving*, which reverses everything.

So, to get back from our new x, 5, to the original x = 3, we first undo the shift right by 4, moving left by 4 to x = 1; and then we undo the shrink by 1/3, stretching by 3 from x = 1 to x = 3. That’s what puts the (x – 4) inside parentheses.

How does that work for you?
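The "solve for the original x" idea can be sketched in a few lines of Python. This is only an illustration under the example's assumptions (f(x) = x², shrink by 1/3, shift right 4); the helper names `forward` and `backward` are invented here.

```python
from fractions import Fraction

def f(x):
    """Original function, f(x) = x^2."""
    return x ** 2

def forward(x):
    """Where a point's x-coordinate lands: shrink by 1/3, then shift right 4."""
    return Fraction(1, 3) * x + 4

def backward(x_new):
    """Recover the original x: undo the shift first, then undo the shrink."""
    return 3 * (x_new - 4)

def g(x_new):
    """Transformed function: look up f at the original x."""
    return f(backward(x_new))

# Every moved point keeps its height: g(x') equals f(x).
for x in [-2, 0, 1, 3, 6]:
    x_new = forward(x)
    assert backward(x_new) == x
    assert g(x_new) == f(x)
```

Notice that `backward` is exactly the expression 3(x – 4) that ends up inside f.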

Mario replied, restating my ideas in order to understand them better:

Interesting, I let this ruminate for a while, and I found what may be an alternate way of thinking about it.

You start with a function \(f(x_f)\), and you want to find how to alter it in order to get to \(g(x_g)\), using only horizontal transformations of course, and assuming \(g(x_g)\) is only a horizontal transformation of \(f(x_f)\).

However, we must do this under the constraint that the inputs of both of the functions are the same. In summary, we have to map \(f(x_f) \to g(x_g)\) under the constraint of them having the same input. Currently, \(x_f \ne x_g\) because the same input results in different outputs for f and g.

The way we would solve this is by considering the information that we have been given. We know that \(x_g = \frac{1}{3}x_f + 4\). This equation is extracted from “horizontal shrink by ⅓” and “shift 4 spaces to the right”. If we solve for \(x_f\), we get \(x_f = 3(x_g - 4)\). We plug that into f, and we get \(f(3(x_g - 4)) = g(x_g)\). Notice that they both have \(x_g\) as an input, satisfying our constraint.

What do you think of this way of thinking about it?

I answered:

I think that’s basically right, but needs a little clarification.

First, we can perhaps more clearly describe your “horizontal transformations” by saying that \(x_g\) is a linear function of \(x_f\).

Second, where you say,

However, we must do this under the constraint that the inputs of both of the functions are the same. In summary, we have to map \(f(x_f) \to g(x_g)\) under the constraint of them having the same input. Currently, \(x_f \ne x_g\) because the same input results in different outputs for f and g.

I don’t think you literally mean that the inputs (\(x_f\) and \(x_g\)) are the same; and I’m not sure how to say what you mean by a map of a function. I would express the idea by saying that \(f(x_f) = g(x_g) = g(T(x_f))\), where T is a linear transformation of the input variable, i.e. T(x) = ax + b. This makes it explicit how the functions are related.

The idea that we are transforming the variable, rather than the function, is a good perspective. You can imagine sliding and stretching the coordinate system itself while keeping the graph the same. In higher math this can be a valuable way to see coordinate transformations.

Then you said,

The way we would solve this is by considering the information that we have been given. We know that \(x_g = \frac{1}{3}x_f + 4\). This equation is extracted from “horizontal shrink by ⅓” and “shift 4 spaces to the right”. If we solve for \(x_f\), we get \(x_f = 3(x_g - 4)\). We plug that into f, and we get \(f(3(x_g - 4)) = g(x_g)\). Notice that they both have \(x_g\) as an input, satisfying our constraint.

This nicely fits into what I just said. The example has \(T(x) = \frac{1}{3}x + 4\), so that \(x_g = T(x_f) = \frac{1}{3}x_f + 4\). In solving, you are finding the inverse function, \(x_f = T^{-1}(x_g) = 3(x_g - 4)\).

Therefore, the requirement that \(f(x_f) = g(T(x_f))\) implies that \(g(x_g) = f(T^{-1}(x_g)) = f(3(x_g - 4))\).

If you’ve done enough with inverses in general, you may recognize that if T is a stretch/shrink A followed by a shift B, so that \(T(x) = B\circ A(x) = B(A(x)) = \frac{1}{3}x + 4\), then the inverse is \(T^{-1}(x) = A^{-1}\circ B^{-1}(x) = A^{-1}(B^{-1}(x)) = 3(x - 4)\), which fully explains the reversal of order. To undo a shrink followed by a shift, we have to undo the shift and then the shrink.

Does this express what you had in mind?

When you find the inverse of a composite function, it is equivalent to the composition of the individual inverses, in the reverse order. More generally, \((f\circ g)^{-1} = g^{-1}\circ f^{-1}\). (That is, if \(y=f(g(x))\), then \(x = g^{-1}(f^{-1}(y))\); so if \(h(x)=f(g(x))\), then \(h^{-1}(x) = g^{-1}(f^{-1}(x))\).) And this, in turn, is the same idea as when we solve equations, which I discussed in an early post, Why We Care About “Why”.
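Here is a small Python sketch of that reversal, using the shrink A and shift B from the example. The function names are chosen for this illustration; `wrong_inverse` is included only to show what happens if the inverses are composed in the original order.

```python
from fractions import Fraction

def A(x):
    """Shrink by 1/3."""
    return Fraction(1, 3) * x

def B(x):
    """Shift right 4."""
    return x + 4

def A_inv(x):
    """Undo the shrink: stretch by 3."""
    return 3 * x

def B_inv(x):
    """Undo the shift: move left 4."""
    return x - 4

def T(x):
    """Shrink, then shift: T = B o A."""
    return B(A(x))

def T_inv(x):
    """Inverses in reverse order: undo the shift, then the shrink, giving 3(x - 4)."""
    return A_inv(B_inv(x))

def wrong_inverse(x):
    """Inverses in the original order: 3x - 4, which does not undo T."""
    return B_inv(A_inv(x))

samples = [Fraction(n, 2) for n in range(-6, 7)]
assert all(T_inv(T(x)) == x for x in samples)
assert all(wrong_inverse(T(x)) != x for x in samples)
```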

Mario agreed:

Yes, that is a better way of putting what I wanted to say. I agree, “\(f(x_f) = g(x_g) = g(T(x_f))\), where T is a linear transformation of the input variable” is a much better way of describing my idea rather than “map \(f(x_f) \to g(x_g)\).”

With that, I was able to tell Mario,

I think I’ve said what I had in mind. Thanks for giving me the chance to say it!

Transformations of functions, which we covered in January 2019 with a series of posts, is a frequent topic, which can be explained in a number of different ways. A recent discussion brought out some approaches that nicely supplement what we have said before. Here, the focus will be on examples and alternate approaches; next week, the underlying reasons.

The question came from Mario in early September, working through how to determine the appropriate transformations to graph a given function:

What order do we apply function transformations?

By transformations, I mean stuff like horizontal/vertical stretching/shrinking and translations.

I’ve been thinking about the order in which to apply the transformations to a graph when transforming its parent graph. Specifically. I am given the graph on an xy-plane, and I am given the new function. I have to then manipulate the parent function to get the graph of the new function.

A. Say I have \(f(x) = (x+1)^2 + 3\).

The parent function is \(p(x) = x^2\), so I start with a graph of that. When it comes to how I would apply the transformations, I think about how I would do the operation if I were to plug in an x in the transformed function, via the order of operations. First, I would add 1, which corresponds to a horizontal shift one unit to the left because it’s inside the parenthesis. After squaring, I would add 3 which corresponds to a shift 3 units up. Apply that to the whole graph, and I have my transformed function.

B. Now, let’s do the same thing to g(x) = √(-(x – 1)).

The parent function is h(x) = √(x). If I were to plug in an x, I would first subtract a 1, that corresponds to a shift 1 unit to the right. Then, I would multiply that result by a negative. Since that negative is inside the radical, it results in a reflection about the y-axis. If I do it that way, I get the wrong answer. To get the right answer, I would have to apply the reflection first and then shift 1 unit to the right even though the part that causes the shift is inside the parenthesis. So my rule of applying transformations like the order of operations falls apart.

From here it occurred to me that I should apply reflections first. Reason being that because I shifted to the right one and then reflected across the y-axis, I was actually reflecting not the x values, but the x – 1 values. A reflection across the y-axis applies a negative value to the x values only.

Then, I came to the conclusion that I could apply not just reflections but vertical/horizontal stretching at the start as well. They both involve multiplying the same parts of the function, and by the commutative property of multiplication, it doesn’t matter the order in which I do the multiplication. In summary, this is the order I’ve come up with at this point for applying transformations.

- Horizontal transformations/reflections
- Vertical transformations/reflections
- Horizontal translations
- Vertical translations
I discern the order of the translations the same way. Horizontal translations apply operations to the x-values, so they go first. Vertical translations apply to the parent function as a whole, so they go last.

These rules seem to hold so far.

C. Consider the function \(y(x) = -2(-4x - 2)^2 + 1\).

First I applied the reflections, the horizontal shrink by a factor of 1/4, and the vertical stretch by a factor of 2. Then, I applied the translations. This gave me the right graph.

Is the rules I got, and the logic to how I got them mathematically solid?

There is a lot of good thinking here, but a few corrections are needed, and a good explanation of *why* things work the way they do.

I answered:

Hi, Mario.

I have previously dealt with this topic here:

Combining Function Transformations: Order Matters

There are several ways to express these ideas. Let’s see how your ideas agree with mine!

A. Looking first at your work for \(f(x) = (x+1)^2 + 3\), what you say (that x + 1 corresponds to a horizontal shift one unit to the left because it’s inside the parenthesis) is how I describe it, too. I say that “inside” transformations are horizontal transformations (which affect x); and they work backward.

B. In your second example, g(x) = √(-(x – 1)), your corrected approach (doing the reflection first) is correct. The way I describe this is that “inside” transformations are carried out “from the outside in”, reversing the order of operations. The last operation before the square root, when you evaluate the function, is the negation; so the first operation when you transform the function is the reflection.

Then you say,

From here it occurred to me that I should apply reflections first. Reason being that because I shifted to the right one and then reflected across the y-axis, I was actually reflecting not the x values, but the x – 1 values. A reflection across the y-axis applies a negative value to the x values only.

This is a valid alternative way to think about it. One way I like to explain this is that the reflection is accomplished by replacing x with -x, which would result in √(-x). If you then shift, you are replacing x with x – 1, which results in √(-(x – 1)), just as you need. If you shift first, you get √(x – 1) from the shift, then √(-x – 1) from the reflection, which is a different function.

But this is only true of your example because of the parentheses. To get √(-x + 1), without the inner parentheses, you would first shift left, and then reflect. So you only reflect first when the notation (parentheses) demands it.

You correctly include stretching with reflections, for the right reason, and summarize:

In summary, this is the order I’ve come up with at this point for applying transformations.

- Horizontal transformations/reflections
- Vertical transformations/reflections
- Horizontal translations
- Vertical translations
This is mostly true, though I prefer to think of the horizontal and vertical transformations separately, as they affect different parts of the function. Stretches, being a multiplication, can be done together with reflections. But, as I said, horizontal reflections and stretches are usually (in the absence of parentheses) done after translations.

Then you say,

Horizontal translations apply operations to the x-values, so they go first. Vertical translations apply to the parent function as a whole, so they go last.

Actually, it doesn’t matter in what order horizontal and vertical translations (or any transformations) are done, because they don’t interact.

C. As for your third example, \(y(x) = -2(-4x - 2)^2 + 1\), you say

First I applied the reflections, the horizontal shrink by a factor of 1/4, and the vertical stretch by a factor of 2. Then, I applied the translations. This gave me the right graph.

It is possible that you didn’t write what you meant, or didn’t actually do to the graph what you say, which can be difficult to do. What you describe here is incorrect.

The post I referred to includes (in the last section, “Looking at the graph”) a discussion of the difficulty of graphing this combination of transformations correctly. What follows uses the method I talked about there:

Here is what I would do:

Horizontal transformations first (only because they can be harder!):

Shift right 2 first, because the subtraction is last in the order of operations.

Reflect and shrink by 1/4 second, because multiplication is first in the order of operations.

Vertical transformations last (because they are easier):

Reflect and stretch by 2 first, because multiplication is first in the order of operations.

Shift up 1 second, because the addition is last in the order of operations.

As I said earlier, the order between horizontal and vertical doesn’t really matter; in fact they could be interleaved. But separating them in some way makes it easier to be careful with the parts that matter.

That is how I would handle the function written in that form. But since reflecting and shrinking something that has already been shifted is easy to get wrong, I prefer to instead change the form to what he had in example B:

Now, it is easier to actually draw a reflection first, so I would commonly start by factoring:

\(y(x) = -2(-4x - 2)^2 + 1 = -2(-4(x + 1/2))^2 + 1\)

Now the horizontal transformations are different:

Reflect and shrink by 1/4 first, because that’s on the outside of the parentheses.

Shift left 1/2 second, because the addition is on the inside of the parentheses.

This different pair of transformations has the same effect as the other, because the right shift by 2 in the first method is then reflected (now going to the left) and shrunk (from a distance of 2 to 1/4 of 2, which is 1/2).

Looking at the graph below, it definitely **looks** like it’s been shifted **left 1/2**, not **right 2**.

One thing I always do after making a graph like this, is to check a couple points on my graph by putting them into the equation. In this case, I get this graph:

One point I plotted was (1, 1) on the parent graph, as shown: I shifted that right 2, to (3, 1), then reflected and shrunk by 1/4 to (-3/4, 1), then reflected and stretched by 2 to (-3/4, -2), and finally shifted up 1 to (-3/4, -1) as shown. Now I check this by setting x = -3/4 and evaluating \(y(-3/4) = -2(-4(-3/4) - 2)^2 + 1 = -2(1)^2 + 1 = -1\). There is enough complication in this work that I don’t trust myself until I check!

It is easier to see in the final result that the graph is shrunk horizontally, then shifted left 1/2, than the other way.
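The same kind of check is easy to automate. The sketch below (my own illustration, with made-up helper names `unfactored_route` and `factored_route`) moves several points of the parent parabola along both sequences of transformations and confirms that each one lands on the graph of y(x).

```python
from fractions import Fraction

def y(x):
    """The target function, y(x) = -2(-4x - 2)^2 + 1."""
    return -2 * (-4 * x - 2) ** 2 + 1

def unfactored_route(px, py):
    """Shift right 2, then reflect and shrink by 1/4; then the vertical steps."""
    px = px + 2
    px = px * Fraction(-1, 4)
    py = py * -2          # reflect and stretch by 2
    py = py + 1           # shift up 1
    return px, py

def factored_route(px, py):
    """Reflect and shrink by 1/4, then shift left 1/2; then the vertical steps."""
    px = px * Fraction(-1, 4)
    px = px - Fraction(1, 2)
    py = py * -2
    py = py + 1
    return px, py

for x0 in [-2, -1, 0, 1, 2]:
    parent_point = (Fraction(x0), Fraction(x0) ** 2)   # a point on y = x^2
    p1 = unfactored_route(*parent_point)
    p2 = factored_route(*parent_point)
    assert p1 == p2                 # both routes move the point to the same place
    assert y(p1[0]) == p1[1]        # and that place is on the new graph

print(unfactored_route(Fraction(1), Fraction(1)))  # (Fraction(-3, 4), Fraction(-1, 1))
```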

Mario replied:

I see. It makes sense that you could swap the order in which one does the vertical and horizontal movements. They seem very vector-like in their movements. Whatever you move in the x-direction first or the y-direction first, you still end up at the same endpoint because moving in just the x-direction doesn’t change your distance with respect to the y-axis and vice versa. And yeah, I see why doing the horizontal transformations first makes things easier.

There’s still some things I don’t get though. Why the order of operations?

When you did \(y(x) = -2(-4x - 2)^2 + 1\) in its unfactored form, you did the shifting two units to the right first because subtraction is last in the order of operations and then you did the horizontal reflecting and shrinking last because they are first in the order of operations. Why did you do it in reverse order from the order of operations?

When you did the vertical transformations on the same problem, you did the stretching and reflecting first because multiplication goes first in the order of operations and then you did the shifting up by 1 because addition goes last in the order of operations. You followed the order of operations. Why did you follow the order of operations here, but when you did the horizontal transformations, you went in the reverse order of operations?

He has a good perspective (more advanced than most students being introduced to these ideas), that just as we add vectors by separately adding their *x* and *y* components, we can separate the transformations that affect the *x*- and *y*-coordinates of points on the graph.

The “why” question is an important one; in what I’ve said so far I’ve been mostly just asserting it. This was a good opportunity to dig in a little deeper than I had in previous posts.

I answered, first by showing how the transformations individually affect the equation, starting with the **vertical**:

Your questions are at least partly answered in the post I referred to. Have you read it?

Let’s take two simpler examples, so we can look at only one direction at a time.

First, suppose we have f(x), and we first stretch it vertically by A, and then shift vertically by B.

To stretch it, we just have to multiply the value of y by A, so the new function is Af(x).

To shift it, we have to add B to the value of y, so we have Af(x) + B.

Since we are simply operating on a number, y, the first transformation we did is the first operation in the order of operations.

Vertical transformations act directly on the *y*-coordinate, just as when we **evaluate** an expression: we multiply, then add.

**Horizontal** transformations, in contrast, are indirect:

Now, suppose we want to first stretch f horizontally by A, and then shift horizontally by B.

To stretch it, we have to replace x with x/A (since then x will be A times as large to get any given input to f); so our new function is f(x/A).

To shift it, we have to replace x with x – B (since then x will be B greater than it was, to get any given input to f); so our new function is f((x – B)/A).

This has the form f(a(x + b)), where a = 1/A and b = -B. The first transformation we did is outside of the parentheses, making it the last operation in the order of operations.

Observe that each transformation we do modifies the x, the “inside” of the argument of the function, so we are working from the outside in.

Next week, we’ll be looking even deeper at how these individual transformations work, particularly at *why* the replacement idea is right.

On the other hand, suppose we want to first shift by A, and then stretch by B.

To shift it, we have to replace x with x – A, so we get f(x – A).

To stretch it, we have to replace x with x/B, so we get f(x/B – A).

This has the form f(bx + a), where a = -A and b = 1/B. This time we didn’t need parentheses; but again, the first transformation we did corresponds to the last operation in the order of operations.

Does that help at all? If not, there are other ways it can be explained.
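The replacement idea translates almost literally into code: a horizontal transformation wraps a function so that its input is modified before the original function sees it. The following Python sketch uses hypothetical helpers `h_stretch` and `h_shift` and an arbitrary sample function; it is meant only to illustrate the two orders described above.

```python
import math

def h_stretch(func, factor):
    """Stretch horizontally by `factor`: replace x with x / factor."""
    return lambda x: func(x / factor)

def h_shift(func, amount):
    """Shift right by `amount`: replace x with x - amount."""
    return lambda x: func(x - amount)

def f(x):
    """An arbitrary sample function, chosen only for this demonstration."""
    return x ** 3 + 1

A, B = 2.0, 5.0

# Stretch by A first, then shift by B: the result is f((x - B)/A).
stretch_then_shift = h_shift(h_stretch(f, A), B)

# Shift by A first, then stretch by B: the result is f(x/B - A).
shift_then_stretch = h_stretch(h_shift(f, A), B)

for x in [-3.0, 0.0, 1.5, 7.0]:
    assert math.isclose(stretch_then_shift(x), f((x - B) / A))
    assert math.isclose(shift_then_stretch(x), f(x / B - A))
```

In both cases the transformation applied first ends up innermost in the wrapping but outermost in the formula, which is the "outside in" pattern.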

We’ll get to some of those other ways next time, because Mario wrote back to ask about them.

Mario asked for a little more:

I see. This makes sense now. I found the answer in the article mentioned a little off-topic from my question, but this really provided me some insights on transformations that really gets to the why of it.

Could you perhaps do a specific example that involves all transformations and uses this exact way of explaining transformations? I really feel like that’s the final push I need.

Let’s do it! I replied, first giving an example in which we are told to do a certain sequence of transformations, rather than given the equation:

Let’s take f(x) = √x and transform it this way:

- Shift left 3 units.
- Shrink horizontally by a factor of 2.
- Reflect over the y axis.
- Stretch vertically by a factor of 2 and reflect over the x axis.
- Shift up 2 units.
The successive equations are

- y = √(x + 3) … replaced x with x+3 [ – – – ]
- y = √(2x + 3) … replaced x with 2x [ . . . ]
- y = √(-2x + 3) … replaced x with -x [ – – – ]
- y = -2√(-2x + 3) … multiplied function by -2 [ . . . ]
- y = -2√(-2x + 3) + 2 … added 2 to function
Here are graphs of each step, shown in the colors I used above:

Working through each curve on the graph is good exercise in thinking about the effect of each transformation. We start with the solid black line, and end with the solid purple line.

What if we are given the equation, as in his examples, and have to find the transformations (and then graph it)?

Now, if we were given y = -2√(-2x + 3) + 2, we could see that vertically we first multiply by -2 (reflect and stretch), and then add 2 (shift up); and horizontally, we first replaced x with x+3 (shift left), and then replaced x with -2x (reflect and shrink).

Here I am just reading the order of transformations from the equation, not planning ahead what order to do them in. (It happens that this time I chose to do the vertical transformations first.) The graphing is done just as above.

That equation involved a somewhat awkward shrink. Let’s repeat, using my recommendation:

Or, the way I prefer, we could rewrite it as y = -2√(-2(x – 3/2)) + 2, so that horizontally we first replaced x with -2x (reflect and shrink), and then replaced x with x – 3/2 (shift right by a smaller amount). Here is the sequence when we think that way. The successive equations are

- y = √(2x)… replaced x with 2x
- y = √(-2x)… replaced x with -x
- y = √(-2(x – 3/2)) … replaced x with x – 3/2
- y = -2√(-2(x – 3/2))… multiplied function by -2
- y = -2√(-2(x – 3/2)) + 2… added 2 to function
The transformations in this order are:

- Shrink horizontally by a factor of 2. [ – – – ]
- Reflect over the y axis. [ . . . ]
- Shift right 3/2 units. [ – – – ]
- Stretch vertically by a factor of 2 and reflect over the x axis. [ . . . ]
- Shift up 2 units.
Here are graphs seen this way:

The final result is the same.

Does that help?
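As a cross-check on the two sequences, here is a short Python sketch that builds the function one replacement at a time, in both orders, and compares each result with the given equation. The helper names `replace_x` and `finish_vertically` are invented for this illustration.

```python
import math

def replace_x(func, substitution):
    """Return the function obtained by replacing x with substitution(x)."""
    return lambda x: func(substitution(x))

def finish_vertically(func):
    """Multiply the function by -2, then add 2."""
    return lambda x: -2 * func(x) + 2

def parent(x):
    return math.sqrt(x)

# First sequence: shift left 3, shrink by a factor of 2, reflect.
step = replace_x(parent, lambda x: x + 3)     # sqrt(x + 3)
step = replace_x(step, lambda x: 2 * x)       # sqrt(2x + 3)
step = replace_x(step, lambda x: -x)          # sqrt(-2x + 3)
first = finish_vertically(step)

# Second sequence: shrink by a factor of 2, reflect, shift right 3/2.
step = replace_x(parent, lambda x: 2 * x)     # sqrt(2x)
step = replace_x(step, lambda x: -x)          # sqrt(-2x)
step = replace_x(step, lambda x: x - 1.5)     # sqrt(-2(x - 3/2))
second = finish_vertically(step)

def target(x):
    """The given equation, y = -2 sqrt(-2x + 3) + 2."""
    return -2 * math.sqrt(-2 * x + 3) + 2

# Sample inputs in the domain x <= 3/2.
for x in [-3.0, -0.5, 0.0, 1.0, 1.5]:
    assert math.isclose(first(x), target(x), abs_tol=1e-12)
    assert math.isclose(second(x), target(x), abs_tol=1e-12)
```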

Mario replied:

Yes! It helps tremendously! Thank you! This question has been bothering me ever since I first learned function transformations in middle school. I have asked a variety of teachers and professors since, but it is only until now that I have gotten a good answer.

This is why I chose to post this answer as a supplement to what I’ve written before.

But a month later, he had further questions about **why** both the order of transformations and their individual operations are “backward” from what we expect. We’ll look at that next time.

The question came from Stoycho in early September:

Maybe the question is pretty stupid but I can’t figure out where is the problem?

To be more easy I attached image. Please help.

So based on Law of Sines,

136 / sin(C) = 76.8 / sin(8.63°)

or

sin(C) = 136 / (76.8 / sin(8.63°)) = 0,2657189;

Angle C = 15,40967°

Obviously this is wrong, but where is the problem?

At this point, we don’t know the context of the problem; we’ll find out later. Given what we know, we can only assume this is an exercise in a trigonometry course. But we don’t have a full statement of the problem.

On the other hand, we can easily see what he is concerned about: Angle C looks obtuse in the picture; and although the picture does not have accurate angles, we can see that adjusting them would not change this fact. How did he get an acute angle for his answer?

Doctor Fenton was the first to answer:

The problem lies with taking the inverse sine of 136*sin(8.63)/76.8. The range of the principal branch of the inverse sine function (usually written as \(\operatorname{Sin}^{-1}(x)\)) is the interval [-π/2, π/2], or angles in the fourth and first quadrant. When you use a calculator (or tables) to evaluate \(\theta = \operatorname{Sin}^{-1}(x)\) when x > 0, the result will always be a first quadrant angle. You also need to consider the possibility that the angle you want is in the second quadrant, in which case the first quadrant solution \(\theta_1\) will be the reference angle of the desired solution, or \(180^\circ - \theta_1\). Then the desired solution is \(\theta = 180^\circ - \theta_1\).

If you have any questions, please write back and I will try to explain further.

Here is a demonstration of two such angles with the same sine, one in the first quadrant, the other in the second:

On the unit circle, as shown, the sine of an angle is the *y*-coordinate of the point on the circle, and points B and B’ both have the same *y*-coordinate, so the sines of \(\theta\) and \(180^\circ - \theta\) are equal. So although \(\sin^{-1}(y)\) is the first-quadrant angle \(\theta\), the supplement of that angle is also a possible solution.
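A calculator or programming language behaves the same way. This minimal Python sketch, using Stoycho's numbers, shows that `math.asin` returns only the first-quadrant angle, and that its supplement has the same sine; the variable names are my own.

```python
import math

A_deg = 8.63   # given angle A, in degrees
a = 76.8       # side opposite A
c = 136.0      # side opposite C

# Law of Sines: sin(C) / c = sin(A) / a
sin_C = c * math.sin(math.radians(A_deg)) / a

C_acute = math.degrees(math.asin(sin_C))   # what the inverse sine returns
C_obtuse = 180.0 - C_acute                 # the other angle with the same sine

print(f"sin C = {sin_C:.5f}")
print(f"C = {C_acute:.2f} degrees or {C_obtuse:.2f} degrees")

# Both candidates really do have the same sine.
assert math.isclose(math.sin(math.radians(C_obtuse)), sin_C)
```

The first-quadrant value agrees with the roughly 15.41° that Stoycho computed; the obtuse candidate is the one the picture suggests.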

Doctor Rick joined in, offering the bigger picture:

Hi, Stoycho. I would like to add that The Math Doctors has a pair of blogs on the broader topic raised here:

Solving an Oblique Triangle, Part I

Solving an Oblique Triangle, Part II

The first introduces the issue you’re confronting, in the section “The trouble with inverse sines”. The second goes into much more depth on the issue, “the ambiguous case”. That might be of help.

Also, let me point out that the solution you got is only “obviously wrong” based on the figure, in which angle C is “clearly” obtuse. But figures can’t be trusted; often on exams, there will be a statement to the effect that “figure is not to scale” – a reminder not to assume anything from the figure that isn’t clearly stated. Based solely on the given information (A = 8.63°, a = 76.8, c = 136), your solution is perfectly valid – but it is not the only solution. If the problem stated that C is obtuse, that would be a different story.

If the question was an exercise, it was improperly stated, as the picture implied that C was obtuse, but neither stated that nor asked for all solutions. As we’ll see, it is not an exercise – which reminds us that when we are solving a problem as part of a larger problem, we need to be careful to state the “subproblem” carefully, so as not to mislead ourselves.

Then I joined the party:

I’ll add yet a third reply!

Here is an accurate picture showing both your triangle ABC, and the other valid solution, ABC’, which looks more like the given image (but shows that it is not to scale).

Have you learned about the SSA case? If not, perhaps this exercise is intended to introduce the idea!

In other words, this is a “Side-Side-Angle problem” which typically has two solutions that look like this individually:

The triangle from Stoycho’s calculations …

… and what he really wanted, based on the picture:

Stoycho replied,

Thank you all for your help. I understand that depending on the type of triangle the formula considers the corresponding angle. My decision was correct but it just calculated the other angle.

But there was more to say, so I wrote back asking for context:

You haven’t shown us the entire problem, which is important.

What did it ask you to do? Did it say anything more about the triangle?

I would expect it to have asked for all solutions; if it just said to solve the triangle, then that is implied (and includes finding both unknown angles and the unknown side).

And if so, then your answer is not complete. Finding one possibility for the angle does not finish the work. This is what all three of us were saying.

I had shown what the two solutions looked like, but not the actual solutions. Did he really need only the obtuse value of C, or something else?

Stoycho now showed us what he was really doing:

Thank you All. I attached image for what I ask

This is a nice application of trigonometry. In computer simulation of a moving ball, we can just calculate its position at a sequence of times based on its velocity; but if something changes between frames (in this case, hitting the player’s head at point C, between successive calculated locations A and D), a calculation is needed to determine exactly when it hit, and at what angle, so that position D can be replaced with the appropriate location after the bounce. A game has to do a lot of work to produce the appearance of reality!

Here he isn’t looking for perfection; modeling how the ball would bounce off the irregular shape of a real head would be far harder than what he is doing, pretending the head is a perfect sphere! Presumably this looks real enough for a game. Distance BC is the sum of the radius of the imagined spherical head and the radius of the ball, which is the closest the center of the ball can get to the center of the head.

Doctor Rick responded:

Hi, Stoycho.

Now we know that the problem was not assigned to you; it was of your own creation … so it was your job to amend the problem so as to indicate which of the two solutions to the original problem was the correct one in context. I assume you realized, after our discussions, that the additional condition is that angle ACB must be obtuse. The acute solution corresponds to a second time the ball intersects with the circle of radius BC, which is unphysical as the ball would have had to pass through the head to get there.

Until the possibility of two solutions was pointed out, he didn’t see the need to explicitly mention a requirement that angle C be obtuse (or, equivalently, that C is the **nearer** of the two possible points to A, the **first** place the ball’s center intersects the circle). This is common in problem-solving: As you try things, you discover new constraints.

If I had seen the actual problem at the start, I would have suggested an entirely different approach. Since you are evidently working with x and y coordinates, and you should know the coordinates of points A, B, and D, I would do the following: Write the equation of the line AD, and the equation of the circle of given radius centered at B. Then solve the system of these two equations.

You will again get two solutions (the line is a secant of the circle, intersecting it in two points). You might choose the solution whose x coordinate is closer to A’s x coordinate. Or you could compare y coordinates, or the squares of the distances of the two solutions from A. (This latter method avoids the need for a special case where AD is either vertical or horizontal.)

Let’s try this. I’ve put the picture on a grid (in GeoGebra) and found coordinates for the points:

The ball was at A\((2.6,3.88)\), and without the collision would now be at D\((1.62,1.24)\). We want to find where it intersects the circle centered at B\((0.88,1.24)\) with radius 1.18 (the sum of the radii of the “head” and the ball). We want to find the coordinates of C, which I will pretend I don’t know.

We find that the line AD has equation $$y=2.694x-3.122$$ and the circle has equation $$(x-0.88)^2+(y-1.24)^2=1.38$$

We can eliminate \(y\) by substituting the line equation in the circle equation, obtaining $$(x-0.88)^2+(2.694x-4.362)^2=1.38$$

Expanding, this becomes $$8.257x^2-25.265x+18.425=0$$

By the quadratic formula, we have $$x=\frac{25.265\pm\sqrt{(-25.265)^2-4(8.257)(18.425)}}{2(8.257)}\\ = \frac{25.265\pm 5.451}{16.514} = 1.20\text{ or }1.86$$

The larger of these is closer to A, and is in fact the *x*-coordinate of C. The *y*-coordinate is $$y=2.694(1.86)-3.122=1.89$$
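In a program, the same intersection might be computed as in the following sketch. Note that it swaps in a parametric form of the line, \(A + t(D - A)\), in place of \(y = mx + b\); that is my own choice, made so that a vertical path needs no special case, and the function name `first_hit` is hypothetical. It assumes A and D are distinct points.

```python
import math

def first_hit(a, d, center, radius):
    """Return the point where the path from A toward D first meets the circle.

    Returns None if the line through A and D misses the circle.
    """
    ax, ay = a
    dx, dy = d[0] - ax, d[1] - ay              # direction vector AD
    fx, fy = ax - center[0], ay - center[1]    # vector from circle center to A
    # |A + t*AD - center|^2 = radius^2 is a quadratic in t
    qa = dx * dx + dy * dy
    qb = 2 * (fx * dx + fy * dy)
    qc = fx * fx + fy * fy - radius * radius
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return None
    # The smaller root is the intersection nearer to A (the first contact)
    t = (-qb - math.sqrt(disc)) / (2 * qa)
    return (ax + t * dx, ay + t * dy)

# Coordinates read from the grid above
c_point = first_hit((2.6, 3.88), (1.62, 1.24), (0.88, 1.24), 1.18)
print(c_point)
```

Taking the smaller value of the parameter plays the role of "choose the solution closer to A"; a value between 0 and 1 also tells us the contact happened between the two frames.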

Let’s do the triangle calculations for this same figure:

The values in red are given: \(AB = 3.15\), \(BC = 1.18\), and \(\angle A = 12.72^\circ\). By the Law of Sines, $$\frac{3.15}{\sin(C)}=\frac{1.18}{\sin(12.72^\circ)}$$ Solving for angle C, $$\sin(C)=\frac{3.15 \sin(12.72^\circ)}{1.18}=0.5878\\ C=\sin^{-1}(0.5878) = 36^\circ\text{ or }180-36=144^\circ$$ The picture shows a slightly different angle due to rounding. To locate point C, we can first find its distance from A: $$B=180-(12.72+144) = 23.28^\circ$$ $$\frac{b}{\sin(23.28^\circ)}=\frac{1.18}{\sin(12.72^\circ)}$$ $$b = \frac{1.18 \sin(23.28^\circ)}{\sin(12.72^\circ)} = 2.118$$ which is right.

Now we need to go that distance along line AD. To do that, we find vector AD to be $$\vec{AD} = \langle 1.62-2.6, 1.24-3.88\rangle = \langle -0.98, -2.64\rangle,$$ whose length is $$|\vec{AD}| = \sqrt{0.98^2+2.64^2}=2.816;$$ we need vector AC to be the appropriate scalar multiple of vector AD: $$\vec{AC} = \frac{2.118}{2.816}\vec{AD} = 0.752\vec{AD}$$ $$= \langle 0.752\cdot(-0.98), 0.752\cdot(-2.64)\rangle = \langle -0.737, -1.986\rangle.$$ Adding this to point A, we get $$(2.6-0.737, 3.88-1.986) = (1.863, 1.894)$$ which again agrees with the graph and the other method.
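The trigonometric route can be sketched the same way. The helper names `solve_obtuse_case` and `point_along` are hypothetical, and the sketch assumes the SSA data really do admit a solution (otherwise `math.asin` raises a `ValueError`).

```python
import math

def solve_obtuse_case(ab, bc, angle_a_deg):
    """Given AB, BC and angle A, return (angle C in degrees, distance AC),
    taking the obtuse choice for C."""
    angle_a = math.radians(angle_a_deg)
    sin_c = ab * math.sin(angle_a) / bc           # Law of Sines
    c_deg = 180.0 - math.degrees(math.asin(sin_c))  # supplement of the calculator's answer
    b_deg = 180.0 - angle_a_deg - c_deg           # angles sum to 180 degrees
    ac = bc * math.sin(math.radians(b_deg)) / math.sin(angle_a)
    return c_deg, ac

def point_along(a, d, distance):
    """Return the point at the given distance from A in the direction of D."""
    dx, dy = d[0] - a[0], d[1] - a[1]
    scale = distance / math.hypot(dx, dy)
    return (a[0] + scale * dx, a[1] + scale * dy)

angle_c, ac = solve_obtuse_case(3.15, 1.18, 12.72)
c_point = point_along((2.6, 3.88), (1.62, 1.24), ac)
print(angle_c, ac, c_point)
```

The obtuse choice is built into the first function, which is exactly the extra condition the context supplies.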

Which is better? That depends on the form in which the data are available. If points are stored as coordinates, then the coordinate method seems good. But if angles and distances are available, trig works well.

I had a long discussion recently about the Cartesian product of sets, answering questions like, “How is it Cartesian?” and “How is it a product?” I like discussions about the relationships between different concepts, and people who ask these little-but-big questions. We’ll be looking at about a quarter of this conversation, including the origin of the concept and what it really means (and doesn’t mean).

The question came in late August from Shaurya, whose question about sets we examined three weeks ago:

Respected maths doctors

Today I have started my new chapter in mathematics, that is relations and functions, and the first subtopic of this chapter is the Cartesian product of sets.

When I read this name only, the Cartesian product of sets, I got too much surprised because here we are all talking about sets and relations and functions, then why the multiplication of the two sets A and B that is A×B is named Cartesian products of sets? Does it have something to do with co-ordinate geometry?

Please explain me why this multiplication of two sets A and B is named Cartesian product of sets. I have tried too much to relate it with co-ordinate geometry but cannot come to the final result that is it named the Cartesian product of sets due to some convention or named arbitrarily, or because it shares some properties with the co-ordinate geometry?

Thank you

Shaurya has previously learned about the **Cartesian plane**, also called the **rectangular coordinate plane**, in which each point is associated with an ordered pair \((x,y)\). He is now learning about the **Cartesian product of two sets**, which is defined as \(A\times B = \{(a, b) : a\in A, b\in B\}\). That is, it is the set of all ordered pairs (*a*, *b*) where *a* is an element of set A, and *b* is an element of set B. So his first question is about the name, Cartesian. Are the two concepts related?

I answered, starting with the name itself:

Hi, Shaurya.

Yes, this name is closely related to the Cartesian coordinate system.

According to Wikipedia,

The Cartesian product is named after René Descartes, whose formulation of analytic geometry gave rise to the concept, which is further generalized in terms of direct product.

The path from Descartes’ own work to Cartesian coordinates, and then to the Cartesian product of sets, was long; he never actually used the \((x, y)\) coordinates that are named for him:

But the relationship is not quite direct, as Descartes himself did not see things in nearly the modern way. According to Earliest Known Uses of Some of the Words of Mathematics,

CARTESIAN, from Cartesius the Latin name for the mathematician and philosopher René Descartes (1596-1650), appears in several expressions. The mathematical ones usually relate to La Géométrie (1637).

The terms can be misleading, for as Boyer remarks:

Cartesian geometry now is synonymous with analytic geometry, but the fundamental purpose of Descartes was far removed from that of modern textbooks. The theme is set by the opening sentence: “Any problem in geometry can easily be reduced to such terms that a knowledge of the lengths of certain lines is sufficient for its construction.” As this statement indicates, the goal is generally a geometric construction, and not necessarily the reduction of geometry to algebra. The work of Descartes far too often is described simply as the application of algebra to geometry, whereas actually it could be characterized equally well as the translation of the algebraic operations into the language of geometry.

This quotation is taken from the 1968 edition of A History of Mathematics, pages 370-371.

…

Cartesian product. This set theoretic term entered circulation in the 1930s. Previously product (Produkt) was the established term: see, e.g. Felix Hausdorff Grundzüge der Mengenlehre (1914, p. 37). Kuratowski wrote produit for intersection and produit cartésien for the former product (Topologie I (1934, p. 7)). Hausdorff had used Durchschnitt for intersection, so there was no danger of confusion. …

Boyer (p. 346) considers the term “Cartesian product” an anachronism because Descartes did not think of his coordinates as number pairs.

The first entry here emphasizes the distance between Descartes’ work and both concepts we now call “Cartesian”; more on that below.

The second entry shows that the concept of Cartesian product came about 300 years after Descartes, and was initially just called a “product”. His name was added to distinguish this from the intersection of sets (which can be thought of as a product; in fact the related logical “and” is written as multiplication in Boolean algebra, and the probability of the intersection of events is found by multiplying).

I continued:

The Cartesian coordinate system as we know it today, in which any point in the plane is identified with an ordered pair (x, y), developed from Descartes’ geometrical ideas, and in turn led to the generalized idea of forming the “product” of any two sets as the set of ordered pairs from the sets.

In particular, the Cartesian product R×R = R^{2} of the real number line with itself is the Cartesian plane.

Descartes’ idea led to identifying points as ordered pairs of real numbers, so that what we call the Cartesian plane is in fact the Cartesian product of two sets of real numbers. The latter is a generalization of the former.

Later in the conversation, we came back to the question of what Descartes actually did. Here is what I said in answer to a question about the Boyer quote above:

Descartes was not doing what we are doing when we use x and y to describe something. That’s what Boyer is saying. Let’s look more deeply into what he actually did.

First, the book in which he introduced these ideas is La Géométrie. The Wikipedia article about that says,

Descartes is often credited with inventing the coordinate plane because he had the relevant concepts in his book; however, nowhere in La Géométrie does the modern rectangular coordinate system appear. This and other improvements were added by mathematicians who took it upon themselves to clarify and explain Descartes’ work.

He did use *x* and *y*; but did not use the axes, and did not measure them perpendicularly.

The Wikipedia article on the Cartesian coordinate system says,

The adjective Cartesian refers to the French mathematician and philosopher René Descartes, who published this idea in 1637. It was independently discovered by Pierre de Fermat, who also worked in three dimensions, although Fermat did not publish the discovery. The French cleric Nicole Oresme used constructions similar to Cartesian coordinates well before the time of Descartes and Fermat.

Both Descartes and Fermat used a single axis in their treatments and have a variable length measured in reference to this axis. The concept of using a pair of axes was introduced later, after Descartes’ La Géométrie was translated into Latin in 1649 by Frans van Schooten and his students. These commentators introduced several concepts while trying to clarify the ideas contained in Descartes’ work.

I found this article about what he did in this book; I haven’t read much of it, but just skimming through it, you can see that the way in which he used algebra to solve geometric problems involved no ordered pairs or axes, and the “coordinates” that he does use involve no right angles.

In general, his work looks more like what we do today when we label a geometrical figure with variables, than like our analytic geometry.

Here is another article, perhaps simpler. You don’t find any axes or ordered pairs here. What you do find are the x and y shown here:

That is as close as he comes to our (x, y).

He planted the seed out of which the ideas of analytic geometry, the Cartesian plane, and later the Cartesian product, grew; but a seed looks very different from a fully-grown plant!

Of course, the important fact for Shaurya’s original question is simply that both concepts grew from that Cartesian seed.

Going back to my initial response, Shaurya was not yet clear on the relationship of the two concepts:

I am not able to deduce the final result; please help me. What is meant by saying that Cartesian product R×R = R^{2} of the real number line with itself is the Cartesian plane.

I replied:

If you are studying the Cartesian product, then you either have seen this, or soon will. This should not surprise you, though it is quite possible that you are being introduced to the concept with only finite sets as examples.

As I think you know, if we have two sets A and B, their Cartesian product is defined as A×B = {(x, y) : x ∈ A, y ∈ B}.

So R×R is the Cartesian product {(x, y) : x, y ∈ R}. We also call this R^{2}, because that is how we write the product of something with itself, though you may not have seen this notation yet. And the concept of the Cartesian (coordinate) plane is precisely that every point of the plane corresponds to an ordered pair (x, y) of real numbers.

I think you just need to be patient and study the chapter you are in, which will probably answer your questions. They introduce the Cartesian product before defining relations, because a relation is defined as a subset of a Cartesian product; it is a set of ordered pairs.

Shaurya answered, struggling to make sense of the word “product”, working from a wrong meaning of the Cartesian product to an attempted explanation for the name:

Let me try to explain my problem.

In your previous reply you have mentioned that R×R = R^{2}. It might be possible that I am interpreting it wrong but I am interpreting R×R = {(R, R)}. So please tell me is R^{2} = {(R, R)} and if so how.

When we talk about Cartesian product, we are taking two sets, say A = {2}, B = {4}, and for multiplying these two sets we are putting a Saint Andrew cross (×) between these two sets in a particular order, say A×B = {2}×{4} so what I also think is that the results should be equal to {8}. But it is not happening as such.

Why is the thing that is happening here is that the result is coming as 2 and 4 and they are in curved brackets and there is “,” between 2 and 4 that would provide a final result as (2, 4).

And also the multiplication of the sets is not following the commutative property as followed by the normal multiplication of the two sets. So why is this procedure considered a product when it is not same as the product of two simple numbers?

But yes one pattern that I found was that it the two sets are multiplied with each other by following the distributive property.

For this let us consider two sets A = {1, 2}, B = {3, 4, 5} so I deduce its multiplication as follows A×B = 1×{3, 4, 5}, 2×{3, 4, 5} = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)} this have something similar as deducing the product of real numbers by using distributive property example in (5 + √7)(2 + √5) = 5(2 + √5) + √7(2 + √5) = 10 + 5√5 + 2√7 + √35 show procedure of the curve taking out cartesian product is considered as the product of two sets because it is following the distributive property in the similar way as followed by the multiplication of the real numbers or the clue behind calling them as the product is hidden behind that we are not multiplying here the numbers directly but the two sets for example in sets {2}×{4} we are not multiplying simply like 2×4 that is twoness × fourness but the two sets only.

Shaurya sees that we aren’t multiplying numbers (like 2 and 4), but is trying to find a relationship between what a Cartesian product actually is (each element of A paired with each element of B) and multiplication of numbers. He is being very creative!

In particular, he sees that \(\{2\}\times\{4\}=\{(2,4)\}\), while \(\{4\}\times\{2\}=\{(4,2)\}\), so this operation is not commutative. On the other hand, he sees a connection with the distributive property, and proposes this as a reason for the name.

I answered:

Now I realize that your question about the name “Cartesian product” was not just about the name “Cartesian”, as I had thought, but about the use of the word “product”. You appear to have some incorrect expectations, but have found some interesting ways to correct them.

This is genuinely commendable, even though his specific ideas are wrong, as we’ll see. Too many students, when they find a new idea not fitting into their existing framework, just accept it blindly, often because they don’t expect math to make sense! Shaurya does expect sense, and tries to manufacture it.

So what *is* the relationship to multiplication of numbers?

First, you need to understand that this operation is not an extension of multiplication of real numbers, but is called multiplication (and denoted by a multiplication symbol) by analogy. It should not be expected to follow all the properties of multiplication. You may find it interesting to read this:

What is Multiplication … Really?

This doesn’t mention the Cartesian product, but some similar things can be said about it. One thing it mentions, though, is that operations we call multiplication are not always commutative.

This would have been an excellent example for that post!

I think the main reason this is called a product is that it multiplies the cardinalities (sizes) of the two sets. Taking your example, {1, 2} × {3, 4, 5} = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}, and the cardinalities are respectively 2, 3 and 6: 2 × 3 = 6.

This corresponds to the idea of making an array of 2 rows of 3 objects, which is the elementary model of what multiplication does (for whole numbers). It’s also interesting that if we interchange the factors and multiply {3, 4, 5} × {1, 2}, although the new product is not the same as the first, it does have the same cardinality. (This is the elementary justification for the commutative property of multiplication of whole numbers.)
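For readers who like to experiment, here is a small Python sketch of these two observations, using the standard library's `itertools.product` to form the pairs. The wrapper name `cartesian` is my own.

```python
from itertools import product

def cartesian(a, b):
    """Return the Cartesian product of two finite sets as a set of ordered pairs."""
    return set(product(a, b))

A = {1, 2}
B = {3, 4, 5}

ab = cartesian(A, B)
ba = cartesian(B, A)

print(sorted(ab))          # all pairs (a, b) with a in A and b in B
print(len(ab), len(ba))    # the cardinalities multiply either way
print(ab == ba)            # but the products themselves differ
```

Python tuples are ordered, so (1, 3) and (3, 1) are different elements, which is why the two products are different sets of the same size.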

Here is a common way to represent \(\{1, 2\} \times \{3, 4, 5\}\), showing that \(2\times 3=6\); each dot represents an ordered pair:

But there are some significant errors in what you say here.

First, and simplest, it is not true that R × R = {(R, R)}. The latter is a single ordered pair of two sets, not the set of all ordered pairs taken from two sets, which is the Cartesian product.

You have, I think, understood that the Cartesian product does not involve products of elements (as in your example, where {2} × {4} = {(2, 4)}, not {8}). That was simply a wrong expectation on your part. We define this operation in its own way, and you must follow the definition, not your own preconceptions. And your expectation that any product would be commutative is wrong; again, the properties of this operation are determined from its definition.

This is explained in the post I referred to.

But you are wrong in thinking that the definition arises from the distributive property. It’s an interesting idea, which I don’t think I’d ever considered, to observe that

{1, 2} × {3, 4, 5} = {{1} × {3, 4, 5}, {2} × {3, 4, 5}} = {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}.

But technically, this is not really true, as the second expression is actually a set containing two sets of 3 elements, namely

{{(1, 3), (1, 4), (1, 5)}, {(2, 3), (2, 4), (2, 5)}}

This is not the same as the third expression, which contains 6 ordered pairs.
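The difference between the two expressions can be made concrete in a short Python sketch. Here `frozenset` stands in for "a set that is itself an element of a set", since ordinary Python sets cannot contain other sets; the variable names are mine.

```python
from itertools import product

A = {1, 2}
B = {3, 4, 5}

# The actual Cartesian product: one flat set of 6 ordered pairs
flat = set(product(A, B))

# The "distributed" expression: a set whose 2 elements are themselves sets of 3 pairs
nested = {frozenset(product({x}, B)) for x in A}

print(len(flat), len(nested))   # 6 elements versus 2 elements
print(nested == flat)           # so they are different sets

# Taking the union of the pieces recovers the product
print(set().union(*nested) == flat)
```

The last line suggests what is really going on: the product is the union of the pieces {1} × B and {2} × B, not the set containing them.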

Also, this is not really distribution, as there is no addition involved. So while this is very similar to distribution (combining each element of one set with each element of the other, as distribution multiplies each addend in one expression by each addend in another), it is not what distribution would really mean.

So the Cartesian product is **similar** to distribution, but not **identical** to it. But this does raise a further question:

Now, since, in my post I referred to above, I suggest that the distributive property is almost a defining property of multiplication, I have to ask myself, does the Cartesian product have such a property? If so, what would we use for “addition”, since you can’t add sets?

It turns out that, in fact, the Cartesian product distributes over several set operations, as mentioned in Wikipedia:

Here are some rules demonstrating distributivity with other operators:

\(A\times(B\cap C)=(A\times B)\cap (A\times C)\)

\(A\times(B\cup C)=(A\times B)\cup (A\times C)\)

\(A\times(B\setminus C)=(A\times B)\setminus (A\times C)\)

…

Commonly, we think of the union as the equivalent of addition in set theory (and more so in probability and logic), so that is what I would use. But even this is quite different from your idea of distribution. An example would be, if A = {1, 2}, B = {2, 3, 4}, and C = {3, 4, 5}, that

A × (B ∪ C) = {1, 2} × {2, 3, 4, 5}

= {(1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5)}

(A × B) ∪ (A × C)

= {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4)} ∪ {(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)}

= {(1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5)}

which are the same.
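The three quoted rules can be spot-checked on this example with a few lines of Python. This is only a check on one choice of sets, not a proof, and the helper name `times` is hypothetical.

```python
from itertools import product

def times(x, y):
    """Cartesian product of two finite sets, as a set of ordered pairs."""
    return set(product(x, y))

A = {1, 2}
B = {2, 3, 4}
C = {3, 4, 5}

# Python's set operators: | is union, & is intersection, - is difference
over_union = times(A, B | C) == times(A, B) | times(A, C)
over_intersection = times(A, B & C) == times(A, B) & times(A, C)
over_difference = times(A, B - C) == times(A, B) - times(A, C)

print(over_union, over_intersection, over_difference)
```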

So the Cartesian product does have the property that I have said distinguishes multiplication from other operations.

So, yes, the Cartesian product does have a distributive property; but that has nothing to do with your idea of using distribution. Yet your idea is a nice way to think of this product.

I’ve probably said far too much, but I hope some of this helps!

In an ellipse, \(\frac{x^2}{a^2}+\frac{y^2}{b^2}=1\) with focal distance *c*, parameters *a*, *b*, and *c* all make natural sense, and it is easy enough to see why \(a^2 = b^2 + c^2\). But in the hyperbola, \(\frac{x^2}{a^2}-\frac{y^2}{b^2}=1\), the equivalent relationship, \(a^2 + b^2 = c^2\), is not nearly as natural, nor is the meaning of *b* itself; and many derivations of the formula skip over such details. We’ll dig in deeper here, using a mix of old *Ask Dr. Math* answers and new ideas.

I’ll start with the new question, from Ethan in mid-August:

Hey Math Doctors!

I was reviewing Conics when I noticed a strange step in a derivation of the equation of the hyperbola. Once you do a lot of algebra, you get c^2 x^2 − a^2 x^2 − a^2 y^2 = a^2 c^2 − a^4. You factor, and then get x^2 (c^2 − a^2) − a^2 y^2 = a^2 (c^2 − a^2). This is where I get confused. You need to set b^2 = c^2 – a^2. There is a similar relation for the ellipse, b^2 = a^2 – c^2, which is very intuitive. However, the hyperbola relation is bugging me.

The proofs I’ve read through either rely on a picture or seem circular in the reasoning. Is there an intuitive proof for this relation without relying on the equation of hyperbola or relying on a picture “proof”? Also why does this relation look like the Pythagorean identity?

Here are links to the proofs I have read.

https://math.stackexchange.com/questions/3152904/can-this-equation-b2-c2-a2-be-derived-intuitively

I love your website!

Sincerely, Ethan

I looked through the references and saw mostly familiar material, similar to what we have said in the past; but none said what I had in my mind. I couldn’t recall whether I’d ever put my thoughts into written form. But this could be a chance to do so!

I answered:

Hi, Ethan.

This is a fun question, and I won’t be surprised if another of us has ideas to add. But I want to take a shot at it.

First, we need to clarify what you are hoping for. You ask for “an intuitive proof for this relation without relying on the equation of hyperbola or relying on a picture ‘proof’.”

I think we can provide an intuitive understanding, but it will not be an actual proof; and at some point we have to bring in the equation, because that’s where the idea of b comes from! And to me, nothing is intuitive if it doesn’t involve a picture!

As he’d mentioned, *b* commonly arises not from our initial geometric conception of the hyperbola, but “pops out of” the equation as we derive it, and is only then given meaning. But most derivations don’t actually relate the formula to its geometrical meaning. What we want to do is to make that connection between the equation and the picture, without merely declaring it to be so, which I think is what Ethan was seeing.

I looked through your links to see if any have the picture I have in mind, and the third one does; but it uses it within what you are perhaps finding too complicated a pile of algebra to be intuitive. I’ll probably be saying some of the same things, but hopefully in a context that will feel more natural.

I found (in my list of potential blog topics from the Ask Dr. Math archive) a typical derivation of the equation of the hyperbola here:

Deriving the Hyperbola Formula

(To read it, you’ll have to sign up for a free account with NCTM; they’ve hidden the site behind what I call a “freewall”.)

This is a warning I have to give these days, and is why links on our site have not yet been updated to give the new locations; without the explanation, people have assumed the link was bad. And when I say the site is “hidden”, I mean it: Though they changed from a “paywall” requiring membership, it is still impossible to search for archive pages, so the only way I can find them is if I have previously done so and put them on my list of future topics.

To make it easier for readers, I’ll insert that 1998 page in its entirety here:

Deriving the Hyperbola Formula

When speaking of hyperbolas, why does: C squared = A squared + B squared? My entire math class, including my teacher, has tried to figure this out, but no one can come up with a logical reason. My teacher said that if we could come up with an explanation of this, we would get an A on our next test. Can you please help?

Doctor Jerry answered by deriving the equation of the hyperbola:

Hi Gauteaux,

I wouldn't want to say that there is one reason everyone would accept, but here's a reason many would accept.

A hyperbola can be defined as follows (ellipses have a similar definition): Two points, called foci, are given; they are 2c units apart. A hyperbola is the locus of all points for which the difference of the distances to the foci is a constant 2a, where 2a < 2c.

To take a special case, suppose the two foci are at (-c,0) and (c,0). Then if (x,y) is on the hyperbola, we must have:

  sqrt((x-c)^2 + y^2) - sqrt((x+c)^2 + y^2) = 2a

This is the equation of one "branch." To keep things simple, I'll stick with this case. The other case is similar.

This equation can be greatly simplified. First, write the equation as:

  sqrt((x-c)^2 + y^2) = sqrt((x+c)^2 + y^2) + 2a

Square both sides and simplify. You will get:

  -c*x - a^2 = a*sqrt((x+c)^2 + y^2)

Again, square both sides and simplify. You will get:

  (c^2 - a^2)x^2 - a^2*y^2 = a^2(c^2 - a^2)

It is common to set c^2 - a^2 = b^2, to simplify the equation.

  b^2*x^2 - a^2*y^2 = a^2*b^2

  x^2/a^2 - y^2/b^2 = 1.

Fortunately, the constant b is interesting. If you draw about the origin a rectangle that is 2a by 2b and then draw its diagonals, the diagonals are the asymptotes of the hyperbola.

Please check my algebra. I can make mistakes.
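Since Doctor Jerry invites us to check his algebra, here is one way to do so numerically, as a sketch. The sample values a = 3 and c = 5, and the function name `focal_difference`, are my own choices. The points are generated from the final equation using hyperbolic functions, and then tested against the original distance definition.

```python
import math

def focal_difference(x, y, c):
    """Distance from (x, y) to the focus (c, 0) minus distance to the focus (-c, 0)."""
    return math.hypot(x - c, y) - math.hypot(x + c, y)

a, c = 3.0, 5.0
b = math.sqrt(c * c - a * a)    # the substitution b^2 = c^2 - a^2

# Points satisfying x^2/a^2 - y^2/b^2 = 1, since cosh^2(t) - sinh^2(t) = 1.
# The minus sign selects the left branch, which is the one the first equation describes.
points = [(-a * math.cosh(t), b * math.sinh(t)) for t in (-2.0, -0.5, 0.0, 1.0, 3.0)]

differences = [focal_difference(x, y, c) for x, y in points]
print(differences)    # each should agree with 2a up to rounding
```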

You may recognize parts of this from Ethan’s description. I quoted this to give us a common starting point.

Other versions of the derivation, by Doctor Rob, can be found at

Focus of a Hyperbola First Principle Hyperbolas

Continuing my answer to Ethan,

As you’ve seen in other proofs, Dr. Jerry starts with the definition of the hyperbola:

Two points, called foci, are given; they are 2c units apart. A hyperbola is the locus of all points for which the difference of the distances to the foci is a constant 2a, where 2a < 2c.

So the hyperbola is defined in terms of a and c only. From this definition, we can also see (in the picture below) that the distance between the vertices will be 2a. The distance between vertices A and A’ is the difference between AF and AF’, since AF = A’F’.

Here’s a version of the picture containing only what we know to start with:

The hyperbola is the set of all points P such that \(|PF-PF'|=2a\). This includes the vertex, A, since \(|AA'|=|AF'-A'F'|=|AF'-AF|=2a\).

In the course of the work, he finds, as you said, that one expression occurs more than once, and says,

It is common to set c^2 – a^2 = b^2, to simplify the equation.

This is where many people feel it starts to feel circular! But we are just defining something called b here for convenience. At this point, it has no meaning in itself.

There is as yet no “*b*” in the picture! But whatever it is, that Pythagorean formula applies to it. And if we draw in such a triangle, we get this:

After finishing, he adds,

Fortunately, the constant b is interesting. If you draw about the origin a rectangle that is 2a by 2b and then draw its diagonals, the diagonals are the asymptotes of the hyperbola.

It can be shown easily from the equation that the diagonals are the asymptotes; when x and y are large, the equation becomes very close to x^2/a^2 – y^2/b^2 = 0, which is the equation of the asymptotes! The slopes of those lines are ±b/a.

Doctor Jerry didn’t prove his claim about the asymptotes; my claim is not quite a proof, but is intuitive. The larger *x* and *y* (and therefore both fractions on the left-hand side) get, the less the 1 on the right of the hyperbola’s equation matters, and it might as well be zero. Solve the resulting equation for *y*, and you get the equations of the two lines, \(y=\pm\frac{b}{a}x\). We’ll see more on this later.
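A quick numerical experiment makes this plausible. The following sketch (with sample values a = 3, b = 4 and a hypothetical helper `upper_branch_y`) measures the vertical gap between the line y = (b/a)x and the hyperbola as x grows.

```python
import math

def upper_branch_y(x, a, b):
    """Solve x^2/a^2 - y^2/b^2 = 1 for the non-negative y (requires |x| >= a)."""
    return b * math.sqrt(x * x / (a * a) - 1.0)

a, b = 3.0, 4.0
xs = [10.0, 100.0, 1000.0, 10000.0]

# Vertical gap between the line y = (b/a) x and the curve at each x
gaps = [(b / a) * x - upper_branch_y(x, a, b) for x in xs]
for x, gap in zip(xs, gaps):
    print(x, gap)
```

The gap stays positive, so the curve never reaches the line, but it shrinks steadily as x increases, which is what being an asymptote means.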

There is as yet no reason to think that OB should have length *c*, apart from the algebra. Does it make sense?

That’s the proof; my interest here is to think about why this result should be intuitively reasonable. So I’ll do a little hand-waving. (But there’s nothing up my sleeve!)

This is where I draw a picture:

The foci are F and F’; the vertices are A and A’; the hyperbola is in red, and its asymptotes are dotted red.

Values in parentheses are not initially known.

We aren’t going to assume what we found in the proof; in particular, we’ll define b not by the expression we stumbled across, but by placing point B on the asymptote, so that its slope is b/a.

The green triangle OAB has sides a, b, c; but what we wonder is, why would the hypotenuse be c? So I’m not declaring yet that it is; our goal is to show that. On the other hand, a and b are where they are by definition: a is defined as half the distance between the vertices, and b by the fact that b/a is the slope of the asymptote OB.

We’ll be seeing fuller proofs of the asymptote below. For now, we’re staying intuitive.

Now, suppose a point P is very far out on the hyperbola, approaching the asymptote.

Segments FP and F’P will be nearly parallel to the asymptote, so they will approach the parallel lines I’ve drawn through F and F’. The definition of the hyperbola says that the difference between the distances FP and F’P is 2a; I’ve marked that as the length of F’C on the yellow triangle.

Can you see that the green and yellow triangles are similar (since they are right triangles with a common angle)?

Hypotenuse F’F has length 2c by definition; the similarity implies that OB/F’F = OA/F’C; that is, OB/(2c) = a/(2a). From that, it is obvious that OB = c. And that implies that c^2 = a^2 + b^2.

Does that help?

So the yellow triangle is exactly twice the green one. And we can now fill in the “*c*” on the latter.

Incidentally, my picture also shows that *b* is the distance from a focus to an asymptote, which I’d never noticed until now!
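That last observation is easy to check numerically. What follows is only a sketch: the helper name `focus_to_asymptote_distance` and the sample values of a and b are illustrative assumptions. It takes c² = a² + b² as given and applies the standard point-to-line distance formula to the focus (c, 0) and the asymptote written as bx - ay = 0.

```python
import math

def focus_to_asymptote_distance(a: float, b: float) -> float:
    """Distance from the focus (c, 0) to the asymptote y = (b/a) x."""
    c = math.hypot(a, b)  # c^2 = a^2 + b^2
    # Asymptote in the form b*x - a*y = 0; distance from the point (c, 0).
    return abs(b * c - a * 0.0) / math.hypot(b, a)

# Sample (a, b) pairs, chosen only for illustration.
for a, b in [(3.0, 4.0), (1.0, 1.0), (5.0, 2.0)]:
    d = focus_to_asymptote_distance(a, b)
    print(a, b, d, math.isclose(d, b))
```

In each case the computed distance should agree with b up to rounding, which is what the picture suggests: the numerator is bc and the denominator is c.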

Ethan replied, showing he’d put in the effort to fully understand:

This definitely helps! I had trouble understanding where F’C came from until I visualized sliding FP over and onto F’P; then, it was clear why F’C is 2a. Your picture looks similar to the picture in the link with the limit argument, but what helped was connecting b to the asymptote and showing the two similar right triangles.

Thank you for your help. I really appreciate it.

I responded:

To be honest, I was a little concerned that I hadn’t explicitly stated how I decided that F’C = 2a; that was essentially because some intuitions are easier seen than explained. I’m glad you got it; and perhaps giving you that little bit to work out for yourself helped the explanation work for you!

I’m not sure whether I’ve ever written up these ideas before, so I’m glad I had a chance to!

In preparing for this post, I discovered a 2019 answer containing links that were not in the list where I found Doctor Jerry’s explanation. Let’s look at those.

First, we have this question from 2007, nearly identical to Ethan’s except that it doesn’t ask for an intuitive explanation:

Meaning of Value of b in Hyperbola Equation

I'm teaching conic sections, and I have been unable to find a justification for why in a hyperbola does a^2 + b^2 = c^2. You can easily justify a^2 = b^2 + c^2 in an ellipse by looking at special points. But I have yet to find a comparable explanation for hyperbolas. Textbooks just give you the formula and never explain where it comes from.

Doctor Fenton answered:

Hi Don,

Thanks for writing to Dr. Math.

The relationship is true because that is the DEFINITION of b. b doesn't correspond to any geometric feature in the specification of the hyperbola, if you are using the description as the points whose distances to the two foci differ by a given amount. The foci are determined by the number c, and the given difference determines the coordinates of the vertices a, and with these two numbers, you can derive the equation

$$\frac{x^2}{a^2}-\frac{y^2}{c^2-a^2}=1$$

(for a hyperbola centered at the origin, with foci (+/-c,0) and vertices (+/-a,0)).

Since c^2 > a^2, c^2 - a^2 > 0, there is a positive number b such that b^2 = c^2 - a^2, and using this clearly simplifies the denominator of y^2 in the formula above.

The point is that a and c are enough information to completely determine the hyperbola. No value of b is needed, and it is simply introduced to simplify the notation.

The details, of course, are in Doctor Jerry’s derivation.
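As a sanity check on that equation, we can pick points on the curve and confirm that their distances to the two foci differ by 2a. This is a rough sketch, not part of either Doctor's answer; the function name `distance_difference` and the sample values a = 3, c = 5 are assumptions made for illustration.

```python
import math

def distance_difference(a: float, c: float, x: float) -> float:
    """|PF - PF'| for the point P on x^2/a^2 - y^2/(c^2 - a^2) = 1 with the given x >= a."""
    y = math.sqrt((c**2 - a**2) * (x**2 / a**2 - 1.0))  # point on the upper half
    pf = math.hypot(x - c, y)   # distance to F = (c, 0)
    pf2 = math.hypot(x + c, y)  # distance to F' = (-c, 0)
    return abs(pf - pf2)

a, c = 3.0, 5.0
for x in [3.0, 4.0, 10.0, 100.0]:
    print(x, distance_difference(a, c, x))  # each value should be close to 2a = 6
```

Notice that b never appears in the code: a and c alone are enough to generate the curve, which is Doctor Fenton's point.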

Actually, the ellipse is similar: the foci and vertices (or the sum of the distances to the foci, which determines the vertices) are all that is needed to define the ellipse. It turns out that if you introduce the semi-minor axis b, you simplify the equation, and the quantity has a geometric meaning, but this was not part of the original specification.

The difference, of course, is that the semi-minor axis is visible in the graph of the ellipse; and Don’s “special points” form the right triangle we need:

If we define point B as the co-vertex (an end of the *minor* axis, as the vertex A is an end of the *major* axis), then since it is a point on the ellipse, the sum \(BF+BF’ = 2a\), so \(BF=a\). This makes it clear that \(a^2 = b^2+c^2\). So although *b* is not used **to derive the equation**, its geometrical **meaning** is clear.

The hyperbola also has asymptotes

$$\frac{y}{b}-\frac{x}{a}=0\quad\text{and}\quad\frac{y}{b}+\frac{x}{a}=0$$

and these can be found by drawing a box with corners (+/-a,+/-b), but that box is not part of the original specification of the hyperbola. It is something you find after you have found the hyperbola.

In both cases (ellipse and hyperbola), the definition essentially specifies a and c, and b is introduced for convenience. However, once you deduce the geometric significance of b, it offers an alternative way of specifying the conic. Any two of a, b, and c can be given and the third quantity determined.

Does that help?

The equations for the asymptotes as given here are easy to derive from the form I previously gave, by factoring: $$\frac{x^2}{a^2}-\frac{y^2}{b^2}=0\;\Rightarrow\;\left(\frac{x}{a}-\frac{y}{b}\right)\left(\frac{x}{a}+\frac{y}{b}\right)=0$$

Finally, we have this 2011 question about asymptotes:

Approaching Asymptotes of Hyperbolas

How do you derive the equations for the asymptotes of the standard hyperbolas?

y = +/-(b/a)x

y = +/-(a/b)x

Solving for y, I got it down to:

y = +/-(b/a) sqrt(x^2 - a^2)

Then, letting x go to infinity, the a^2 is rendered insignificant, so

y = +/-(b/a) sqrt((x^2))

This gives

y = +/-(b/a)x

But, how does this prove that y = +/- (b/a)x is an asymptote(s)? I need clarity here. I have been to site after site, and looked in books, and I still can't find an explanation. They all just state it and how to use it, but offer no proof. Could you help me with this?

(His second equation appears to be for a hyperbola with its major axis vertical, which he doesn’t otherwise mention.)

This time, we want a real proof. The trouble is that an asymptote like this is not quite the same as a limit (which would be a number, not a slanted line).

I answered:

Hi, Donald.

What you've done is a good informal demonstration of the idea; we can make it a little more convincing by saying it this way:

y = +/-b/a sqrt(x^2 - a^2) = +/-b/a sqrt(x^2(1 - a^2/x^2))

When x is much larger than a, a^2/x^2 is much less than 1, so this will be very close to

y = +/-b/a sqrt(x^2) = +/-(b/a)x

This is much like what I did above.

For a real proof, we have to start with the definition of "asymptote"; without that, no proof is possible, since we wouldn't know what we were trying to prove!

An asymptote is a line that is approached more and more nearly by the curve as x increases. That is, if we have a curve y = f(x) and a line y = mx + b, the latter is an asymptote of the former if

$$\lim_{x\to\infty}\bigl(f(x)-(mx+b)\bigr)=0$$

To show that y = bx/a is an asymptote of y = b/a sqrt(x^2 - a^2), we want to show that

$$\lim_{x\to\infty}\left(\frac{b}{a}\sqrt{x^2-a^2}-\frac{bx}{a}\right)=0$$

Is it?

$$\begin{aligned}
\lim_{x\to\infty}\left(\frac{b}{a}\sqrt{x^2-a^2}-\frac{bx}{a}\right)
&=\lim_{x\to\infty}\frac{b}{a}\left(\sqrt{x^2-a^2}-x\right)\\
&=\frac{b}{a}\lim_{x\to\infty}\left(\sqrt{x^2-a^2}-x\right)\\
&=\frac{b}{a}\lim_{x\to\infty}\frac{\left(\sqrt{x^2-a^2}-x\right)\left(\sqrt{x^2-a^2}+x\right)}{\sqrt{x^2-a^2}+x}\\
&=\frac{b}{a}\lim_{x\to\infty}\frac{(x^2-a^2)-x^2}{\sqrt{x^2-a^2}+x}\\
&=\frac{b}{a}\lim_{x\to\infty}\frac{-a^2}{x\sqrt{1-a^2/x^2}+x}\\
&=\frac{b}{a}\lim_{x\to\infty}\frac{-a^2}{x\left(\sqrt{1-a^2/x^2}+1\right)}\\
&=\frac{b}{a}\lim_{x\to\infty}\frac{-a^2\cdot\frac{1}{x}}{\sqrt{1-a^2/x^2}+1}\\
&=\frac{b}{a}\cdot\frac{-a^2(0)}{\sqrt{1}+1}\\
&=\frac{b}{a}\cdot\frac{0}{2}\\
&=0
\end{aligned}$$

So the curve does in fact approach the line.

I’ve corrected some serious typos here.
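A limit proof like this can be made more concrete by tabulating the gap between the curve and the line for growing x. The sketch below is only a numerical illustration, not a substitute for the proof; the function name `gap` and the values a = 3, b = 4 are assumptions chosen for the example. It also prints the rationalized form of the same difference, which avoids subtracting two nearly equal numbers.

```python
import math

def gap(a: float, b: float, x: float) -> float:
    """Vertical gap between y = (b/a)*sqrt(x^2 - a^2) and the line y = (b/a)*x."""
    return (b / a) * math.sqrt(x**2 - a**2) - (b / a) * x

a, b = 3.0, 4.0
for x in [10.0, 100.0, 1000.0, 10000.0]:
    # Rationalized form of the same difference: (b/a) * (-a^2) / (sqrt(x^2 - a^2) + x)
    rationalized = (b / a) * (-(a**2)) / (math.sqrt(x**2 - a**2) + x)
    print(x, gap(a, b, x), rationalized)
```

The gap stays negative (the curve lies below the line in the first quadrant) and shrinks roughly like 1/x, in line with the factor of 1/x that appears in the numerator of the proof.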

The form used here for the asymptotes, \(y=\pm\frac{b}{a}x\), is the slope-intercept form. Putting everything on one side yields the form Doctor Fenton used above: $$y=\pm\frac{b}{a}x\\ \\ \frac{b}{a}x\pm y=0\\ \\ \frac{1}{b}\cdot \frac{b}{a}x\pm \frac{1}{b}\cdot y=\frac{1}{b}\cdot 0\\ \\ \frac{x}{a}\pm \frac{y}{b}=0$$

While I was looking through recent questions to choose one to post, I ran across one that deals with an error we see very commonly – in fact, a student I had worked with that very afternoon in face-to-face tutoring had done the same sort of thing. The context here deals with trigonometric identities, but it could just as well occur in working with the Pythagorean Theorem in geometry, or solving an equation in algebra, or even in calculus. We’ll also see a number of other pitfalls for beginning algebra students.

This is the question, sent to us by Simran in mid-August:

Hello math doctors,

When we write cos A as sin A, that is,

cos A = 1/√1 – sin²A

why is it sin² A and not sin A? Shouldn’t it be sin A in the root as we have taken the square root?

Sorry, if the doubt is too dumb.

Thanking you,

Regards,

Simran

Our first task, besides reassuring Simran, is to determine which of several issues is the central doubt.

Doctor Rick answered with a variation on our common statement that “the only stupid question is the one you don’t ask”:

A doubt unexpressed is dumb (literally – “dumb” means silent). It is smart to express your doubts so that they can be cleared up.

Simran is doing just what needs to be done. I often discuss with tutees the fact that asking questions is the only way to truly learn. If you feel you can’t ask a question in class, ask a tutor, or write to a site like ours.

The word “dumb” originally meant “mute” (unable to speak), but came increasingly to mean “stupid” (having nothing to say), and its use to refer to mute people became offensive. But either way, if you ask questions, you are not dumb!

What you wrote has several problems in terms of order of operations, and some other problems as well, so I can’t be exactly sure what you are thinking. You wrote:

When we write cos A as sin A, that is,

cos A = 1/√1 – sin²A

why is it sin² A and not sin A? Shouldn’t it be sin A in the root as we have taken the square root?

I assume that you meant “when we write cos A in terms of sin A”.

This is a phrase many students need to be introduced to! In fact, again, just today I helped a student understand what she was being asked to do, when the question said to express one function *in terms of* another. It means to use the latter to state the former.

Now, what does “cos A = 1/√1 – sin²A” mean?

What you wrote on the second line is not correct. Here is how the correct equation is derived from the Pythagorean identity:

sin²A + cos²A = 1

cos²A = 1 – sin²A

cos A = ± √(1 – sin²A)

The square root is not in the denominator as you put it. Also, notice the plus-or-minus sign. That is needed to make the equation true for all angles; for instance, if A = 120°, then sin A = √3/2 and cos A = –1/2.
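The need for the plus-or-minus sign can be seen by evaluating both sides at the angle mentioned, A = 120°. This is a small illustrative sketch; the variable names are arbitrary.

```python
import math

A = math.radians(120)
root = math.sqrt(1 - math.sin(A) ** 2)   # the non-negative square root
print(math.cos(A), root)                  # the cosine is negative here; the root is not
print(math.isclose(math.cos(A), -root))   # the minus branch is the one that matches
```

For an angle in the second quadrant, only the negative choice of sign reproduces the cosine.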

The **parentheses** are also important! This is a common problem in typing radicals, because we can’t draw the bar (called a vinculum) over the radicand, to show what we are taking a root of. In this setting, we need to use parentheses in place of the untypable vinculum. As Simran had typed it, “1/√1 – sin²A” would just mean \(\frac{1}{\sqrt 1}-\sin^2(A)\) rather than what is presumably intended, \(\frac{1}{\sqrt{1-\sin^2(A)}}\), or what it should be, \(\sqrt{1-\sin^2(A)}\). But perhaps he did mean what he wrote, as we’ll be seeing.
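A programming language forces us to type those parentheses, which makes the three readings easy to compare. The following is just a sketch with a sample angle of 30° and variable names invented for illustration.

```python
import math

A = math.radians(30)  # sample angle; sin A = 1/2
s = math.sin(A)

as_typed = 1 / math.sqrt(1) - s**2   # "1/√1 – sin²A" read by order of operations
presumed = 1 / math.sqrt(1 - s**2)   # radical over all of 1 – sin²A, in a denominator
correct = math.sqrt(1 - s**2)        # the identity for cos A (non-negative branch)

print(as_typed, presumed, correct, math.cos(A))
```

Only the last expression agrees with cos 30°; the other two give different numbers from it and from each other.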

But these things aren’t what you’re asking about; you are focused only on whether sin A should be squared. Could you say more about why you think sin A should not be squared? What if we put trig aside, and just solve the equation

x² + y² = 1

for y in terms of x?

y² = 1 – x²

y = ± √(1 – x²)

Perhaps you’re thinking that, to take the square root of 1 – x², we can just take the square root of 1 and the square root of x², getting

√(1 – x²) = √1 – √(x²) = 1 – x

If that’s what you’re thinking, we can talk it through thoroughly; but if it’s something else, I want to know, so I don’t waste your time. So let me know what you’re thinking, and we’ll talk about it.

The suggestion here is that Simran might be thinking that the square root “distributes over the addition”, so that \(\sqrt{a+b}=\sqrt{a}+\sqrt{b}\). As we’ll see, this is a common mistake.

Simran replied,

Hi Doctor Rick,

Actually I thought that when we will square it the square will go away. I don’t understand the square still remaining in the expression. Since cos A is written without square so shouldn’t 1 – sin A be written without square too? I don’t understand how after writing square root of a number we can write the square along with it. If possible can you explain me in simple terms.

I hope it’s understandable because I am myself confused how to express the doubt.

And thankyou I often think my doubts are too dumb to be answered.

Regards,

Simran

It appears that he did mean \(\cos(A) = 1 - \sin(A)\), and the distribution idea is probably part of the problem. We’re getting closer.

And he is saying “square it” probably because of the awkwardness of expressing a square root in English; I find many students say “square root it”, which is not a standard verb, rather than the proper but wordy “extract the square root” or “take the square root”. It’s also easy to omit the word “root”, as he has done here.

Doctor Rick wrote back, pointing out the wrong word usage (which, again, I find to be quite common even in native speakers of English):

I cannot understand what you are saying. “When we square it…” What did we square? Did you mean “when we take the square root”? If we’re going to straighten out your confusion, we have to write carefully. I know that’s hard when you’re confused!

Perhaps you are thinking that, for example, √(x²) = x. You may know that this isn’t quite correct – if x can be negative, then we should write √(x²) = |x|. But I don’t want to cause more confusion, so I’ll keep it simple: if x is positive, then squaring it and then taking the square root of the result gets you back to x.

Note, though, that when we do this, both the square-root sign and the exponent 2 go away. If we still need the radical (square-root sign) then we also still need the square.

That is, we can’t say that \(\sqrt{x^2}=\sqrt{x}\). But more important, in the expression we’re discussing, we aren’t taking the square root of the square itself at all:

And if we do something else to the square before we take the square root, then we can’t just “cancel” the two symbols. We have to first write the expression just as it is, and look to see if there is any valid property or fact that we can use to simplify it. If I have

y = √(1 – x²)

there is no property of exponents or radicals that I can use to simplify this!

I very often find that students need to consciously ask themselves whether there is more that can be done, forcing themselves to stop and think, rather than let their “simplifying momentum” carry them beyond what is legal.

When you don’t know whether a step is valid, one way to check is to try it with specific numbers.

Let’s plug in some numbers. If x is greater than 1, then 1 – x² will be less than zero, and we can’t take the square root of a negative number. (Well, we can, but we’d need to talk about imaginary numbers then, and that’s irrelevant to the trig context.) So I’ll choose a number between 0 and 1 – let’s say x = 0.75. Then

y = √(1 – (0.75)²)

= √(1 – 0.5625)

= √0.4375

= 0.6614

I think (though you did not confirm this) that you are thinking √(1 – x²) is the same as √1 – √(x²), which is the same as 1 – x. But if x = 0.75, then 1 – x = 0.25, not 0.6614.

So, $$\sqrt{1-0.75^2} = 0.6614$$ but $$1-0.75 = 0.25$$

They are not equal. So when you come to an expression like the former (but with a variable so you can’t just evaluate it), you must put on your brakes and stop. Don’t keep simplifying when the next step, though it feels natural, is illegal!

If the values had been equal, it would not prove the general statement to be true; but if it fails for one number, then it *can’t* be generally true!
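The same spot check can be automated. In this sketch the function names `actual` and `distributed` are made up for illustration, and the test values are arbitrary numbers between 0 and 1.

```python
import math

def actual(x: float) -> float:
    """The expression as written: sqrt(1 - x^2)."""
    return math.sqrt(1 - x**2)

def distributed(x: float) -> float:
    """The tempting but invalid 'simplification': sqrt(1) - sqrt(x^2)."""
    return math.sqrt(1) - math.sqrt(x**2)

for x in [0.0, 0.25, 0.75, 1.0]:
    print(x, round(actual(x), 4), round(distributed(x), 4))
```

The two columns happen to agree at x = 0 and x = 1, which shows why a single matching value proves nothing; the mismatch at x = 0.75 (or 0.25) is enough to rule out the general rule.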

Again, I have to ask: Is this what you’re thinking? If the problem lies here, then we can solve your confusion without bringing trigonometry into it. If not, then keep trying to explain your thinking as clearly as you can. For instance, don’t use words like “it” without saying exactly what “it” is.

Simran replied, confirming the interpretation of his issue:

Hi Doctor Rick,

Thank you!! Yes I was assuming we took the root of x² that’s why I was confused and by square I meant that 1 – x² is present in square form so when we take the root we should be writing as √1 – x but now I get it.

Thanks a lot!!

Regards,

Simran

Doctor Rick could say more now:

Good! Since now I know what your misconception was, let me say a bit more about it.

It is not unusual for students to think that √(a² + b²) = a + b. If they think about it, though, this would turn the Pythagorean theorem, c² = a² + b² where a, b, and c are sides of a right triangle, into c = a + b. That’s not right! So it’s mostly when a student isn’t really thinking about it that they fall into this trap.

For example, taking the familiar 3-4-5 right triangle, the hypotenuse can be found as $$\sqrt{3^2+4^2}=\sqrt{9+16}=\sqrt{25}=5$$ but if we could distribute the root, we’d have $$\sqrt{3^2+4^2}=\sqrt{9+16}=\sqrt{9}+\sqrt{16}=3+4=7$$ which is wrong.

If we compare problem solving to a race, it is easy to coast at the end, when the goal is just ahead; but we must keep thinking carefully to the end. Never stop thinking!

In the same way, if a student is going too quickly through a problem, he might write (a + b)² as a² + b². It’s really the same error. If we expand the square properly, we get (a + b)² = a² + 2ab + b². And if we take the square root of both sides of this equation, we find that a + b = √(a² + 2ab + b²), not √(a² + b²).

If it were true that \(\sqrt{a^2+b^2}=a+b\), then it would be true that \(a^2+b^2=(a+b)^2\); but that is missing the middle term.
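To see the missing middle term directly, we can compute the difference between the two sides for a few pairs of numbers. This is an illustrative sketch only; `gap_between_squares` is a hypothetical helper name and the sample pairs are arbitrary.

```python
import math

def gap_between_squares(a: float, b: float) -> float:
    """(a + b)^2 minus (a^2 + b^2); algebraically this is the middle term 2ab."""
    return (a + b) ** 2 - (a**2 + b**2)

for a, b in [(3, 4), (1, 1), (5, 0)]:
    print(a, b, gap_between_squares(a, b), 2 * a * b,
          math.sqrt(a**2 + b**2), a + b)
```

The gap always equals 2ab, so the two expressions agree only when a or b is zero, as in the last pair.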

Simran closed:

Hello doctor Rick,

Thank you, the example is really helpful and as I always solve questions in hurry I would be cautious of this.

Regards,

Simran

There’s another lesson learned! And this is what we’re here for.


The question came to us at the end of July from Shaurya:

Respected maths doctor,

In my school mathematics textbook set is defined as follows:

a set is a well defined collection of objects.

Then what I understood was like that if I talk about set of planets in our solar system then anyone would collect planets in solar system from their actual position and place it in front of me so that I could say now I have collection of all planets in my solar system but that would not make any sense.

Then I studied further and found an extract in Wikipedia. I think that it could solve my doubt but I am not able to understand what it means to say and it is as following:

a set is a gathering together into a whole of definite, distinct objects of our perception or thought which are called elements of the set.

Please explain it in simplified form and tell me that whether there is any flaw in definition of set in my textbook.

Thank you

Can I define sets as follows?

a set is the gathering of definite quantity of words or symbols that triggers our abstraction for particular object.

The first definition is very common, and very basic. The second, from Wikipedia, tries to say more, and therefore raises more questions! And Shaurya, in trying to put this in his own words, helps us to see a little of where he is struggling. Paraphrasing can be a good way to demonstrate your understanding (as I recall being told to do in studying poetry in English class long ago). He has chosen to use “gathering” rather than “collection”, and “words or symbols” to extend “objects”.

Though I have had most of our discussions with Shaurya, Doctor Rick answered this one, starting from another of our standard sources for definitions:

Hi, Shaurya.

Here’s the definition given in Wolfram MathWorld:

A set is a finite or infinite collection of objects in which order has no significance, and multiplicity is generally also ignored (unlike a list or multiset).

Is it the word “object” that bothers you? MathWorld’s definition, like your textbook’s definition, uses that word.

This definition lacks the term “well-defined”, but introduces two additional issues, order and multiplicity.

For now, we focus on the word “object”:

Doctor Peterson has answered your questions about the history of set theory, and one thing he mentioned was “abstraction”. From other questions, we see that you have learned something about group theory, which is all about abstraction – that is, ignoring a lot of details (about addition of whole numbers, for instance) and focusing only on a few key points, then seeing what you can do with those key points alone.

Sets are sort of an extreme abstraction. We don’t care about what objects are in the set – they could be numbers, they could be words, they could be colors, they could be chemical elements … they could be planets! It is not senseless to talk about the set of planets. The word “object” in definitions of “set” is deliberately vague, even undefined, because it makes no difference what kind of thing is in a set – we essentially ignore everything about it, except that it is different from the other members of the set.

The idea here is that we use the word “object” in the broadest possible way. It does not convey any idea of solidity or reality; I sometimes use the word “thing” in the same way, mostly to emphasize that I don’t care what sort of “thing” it is. The second definition Shaurya quoted, which we’ll get back to soon, seems to be trying to say the same thing in more words rather than fewer: These “objects” can be *anything*.

Perhaps you are concerned about the word “collection”, which again is in both MathWorld’s and your textbook’s definitions. From your proposed definition, it appears that you prefer the term “gathering together”. The word “collection” means either the act of gathering things together, or the result of that gathering, and the latter of these is what we have in mind here. However, “gathering” does not avoid the problem you seemed to have with a “set of planets”: namely that we can’t actually bring the planets together in one place.

So you propose that the things “gathered” are not the objects themselves, but words or symbols representing the objects. That makes some sense … but I think it’s still too concrete. A set doesn’t have to be written down in any form. Note that the MathWorld definition says a set can be infinite! It is not possible to collect an infinite number of objects in any physical sense, no matter what they are.

I am wondering if Shaurya’s suggestion of “words or symbols” might be motivated by one of the ways to define a set, called “roster notation”, in which we simply list the elements – using symbols like \(\{1,2,3,\dots\}\), or using words like \(\{\text{Mercury},\text{Venus},\text{Earth},\dots\}\). Here we are “gathering together” not the “objects” themselves, but representations of them. The problem is that this is not the set itself, but a *representation* of the set! The set does not consist of the words or symbols, but of the “things” they refer to.

My answer to your objections (as I’m interpreting them) is simply to take the word “collection” in a purely conceptual sense. If I can imagine all even numbers on my left and all odd numbers on my right, then I have created two sets, the set of even numbers and the set of odd numbers (both infinite, by the way).

The word “collection”, as I think of it, is better than “gathering” simply because the latter is concrete, while the former is commonly used both ways. For example, here is a definition of the word from Merriam-Webster:

Definition 2a is a literal, physical gathering; 2b is what we have in mind for sets; the example is not physical “objects”, but concepts that are thought of as a single group.

Another way to define a set, called “set-builder notation”, may better represent this idea. Using Doctor Rick’s example, we can define the set \(\{2,4,6,8,\dots\}\) by describing the elements: \(\{x\in\mathbb{N} : x\text{ is an even number}\}\). Here, the statement following the colon is a way to test a potential element to see if it belongs; we don’t need to actually “collect” it, but just know how to identify it.
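Set-builder notation has a close analogue in programming, where a membership test stands in for the condition after the colon. The sketch below is only an analogy: a program can hold finite sets, so `evens_up_to_20` is a finite piece of the set, and both names are invented for illustration.

```python
def is_even(x: int) -> bool:
    """Membership test: the condition written after the colon in set-builder notation."""
    return x % 2 == 0

# We cannot list an infinite set, but we can test any candidate for membership.
print(is_even(10), is_even(7))

# A comprehension mirrors the notation {x in N : x is even}, restricted here to 1..20.
evens_up_to_20 = {x for x in range(1, 21) if is_even(x)}
print(sorted(evens_up_to_20))
```

Nothing is physically "collected" by the test itself; knowing how to decide membership is what pins down the set.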

We must not focus so much on questionable details that we overlook what parts of the definition are important:

Let me point out the phrases in MathWorld’s definition that really matter. They are not the words “collection” and “objects”, but the descriptions “order has no significance” and “multiplicity is (generally) ignored”. The second of these corresponds to the word “distinct” in the extract you found on Wikipedia: you can’t have two objects that are the same; all objects in the set can be distinguished one from another. The first phrase means that we ignore any sort of relation among the objects; even if there is a “first object” or “biggest object” or anything like that, we don’t care.

So we don’t distinguish between the sets \(\{1, 2\}\) and \(\{2, 1\}\); we can’t think of one element as first. We also can’t distinguish between the sets \(\{1\}\) and \(\{1, 1\}\); an element is either in the set or not, never “in twice”.
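Many programming languages build these two properties into their set type, which gives a quick way to see them in action. The following uses Python's built-in `set` purely as an illustration (its sets are finite, and the variable names are arbitrary), with lists shown for contrast.

```python
# Order has no significance: these are the same set.
same_order = {1, 2} == {2, 1}

# Multiplicity is ignored: an element is either in the set or not.
same_multiplicity = {1} == {1, 1}

print(same_order, same_multiplicity)

# Lists, by contrast, keep both order and repetition.
print([1, 2] == [2, 1], [1] == [1, 1])
print(len({1, 1}), len([1, 1]))
```

The set comparisons succeed while the list comparisons fail, matching the distinction MathWorld draws between a set and a list.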

On the other hand, your textbook’s definition and the words from Wikipedia have something in common that MathWorld didn’t emphasize: “well-defined” or “definite”. You put “definite quantity” in your proposed definition. The idea is that there must be no question whether any particular object is in the set or not! We can’t have a “set of all tall people” unless we have a very precise definition of “tall”. (As there are “multisets” in which there can be more than one of the same object in the set, there are also “fuzzy sets” in which a given object has some probability, between 0 and 1, of being in the set. These are different things from sets!)

Those are my thoughts. The important thing here isn’t coming up with exactly the correct way to define a set, but developing a clear sense of what is a set and what is not a set.

This is important: It is easy to try too hard to make a perfect definition, and miss the whole point. The concept of “set” is hard to pin down, yet extremely simple.

I had noticed something in Shaurya’s question, so I jumped in:

I’d like to add a couple little thoughts to what Doctor Rick said.

First, I observe that the definition you quoted from Wikipedia,

a set is a gathering together into a whole of definite, distinct objects of our perception or thought which are called elements of the set

is actually a translation into English of what the originator of the concept, Georg Cantor, said (in German) in introducing his idea. They also show, as an image, another translation of the same German:

When we translate from one language into another, there are frequently multiple words we could choose, which may have slightly different connotations (or sometimes entirely different meanings). Here, the English word “collection” comes from a Latin word meaning “gathering together” (*con-*, “together”, plus *legere*, “select”); the German word “Zusammenfassung” (which today is translated as “summarizing” or “compilation”) comes from *Zusammen*, “together”, plus *fassen*, “grasp”. Both historically could be thought of as literally bringing things together, but both can be used figuratively, especially the German. “Gathering together” is too literal a translation.

Note that they use different English words here for “set” (“aggregate”), “gathering together” (“collection”), and “distinct” (“separate”). This suggests that the precise words are not the important thing, and also that you should be careful not to read too much into the words. I have wondered if part of your difficulty could be the language.

“Collection” commonly means a group of things actually “collected” (gathered) together in one place, but that is not what is meant here. For that reason, I think “gathering together” is a poor translation. He is talking about a sort of mental “gathering”, simply thinking of a group of things as a single entity – what he calls a “whole”. When we think of all the integers as a set Z, that is what we are doing: in our minds bring them together, thinking of all of them together as forming a single thing.

This will hopefully deal with the idea of having to literally bring the planets together (which was a wonderful example to use!).

So Cantor’s definition is simply saying that we can consider any bunch of things we think about (“objects of our intuition/perception or thought”) and think about them together (“collection”). That’s all it is. And it’s very vague because it is meant to allow these “objects” to be anything at all, tangible or not.

I deliberately use the phrase “bunch of things” in talking about sets, to convey the informality of the definition. And I think Cantor’s explanation of the “objects” he was referring to had this same goal: to take attention away from anything concrete, and focus on the fact that they can be anything we can imagine.

Shaurya responded:

Respected Math doctor,

So does it mean that a set is a sort of mental gathering of the abstract description of object that we can perceive please help me to conclude by providing confirmation.

Please tell me whether the definition of set in my textbook is correct or not.

And please elaborate the meaning of the phrase object of our intuitions in sir Cantor’s Definition.

Thank you

On his first sentence, I replied:

The trouble with this description is that it sounds more psychological than mathematical!

As Doctor Rick said, it is not words or symbols that constitute a set, nor is it “descriptions”. And a set is not thought of as “mental” (existing in one’s mind). In some sense they exist in some imagined “world” of mathematical objects.

The point of all these extra words is merely to emphasize that the idea of sets is entirely abstract. Nothing here should be taken as concrete. The elements of a set can be any “things” we can think about.

So the contents of a set are not “descriptions”, but the “things” themselves. But the “things” (objects) may be “perceived” merely by imagination. The important thing is that we don’t care what they are, so we shouldn’t put too much effort into being specific about it!

As to his book’s definition,

That definition was

a set is a well defined collection of objects.

That is an appropriate definition, to the extent that it can really be defined at all. You just have to keep in mind that an “object” can be anything you can imagine, and that “collection” doesn’t mean you are literally gathering them together in one place, but merely that you are mentally tagging them as belonging to one set.

The MathWorld definition Doctor Rick quoted at the top is identical apart from adding qualifiers and dropping the term “well-defined”, which as he said is useful:

A set is a finite or infinite collection of objects in which order has no significance, and multiplicity is generally also ignored (unlike a list or multiset).

Nothing is wrong with your book’s definition.

These are just slight variations on the same definition, emphasizing different details. But …

I should explain my comment, “to the extent that it can really be defined at all”. In formal mathematics, we treat “set” as an undefined term; the “definition” we are discussing is really a description of how to think of a set, informally. As it says here, for example,

“… this is a basic, undefined word in mathematics. Other things are defined in terms of it, but it is not defined in terms of other mathematical words. Maybe imagine any collection of objects, which can be physical objects, numbers, or other sets. I sometimes think of sets as paper grocery bags, with the paper bag denoted by { and }.”

Similarly, as explained here in far more detail,

“The notion of set is so simple that it is usually introduced informally, and regarded as self-evident. In set theory, however, as is usual in mathematics, sets are given axiomatically, so their existence and basic properties are postulated by the appropriate formal axioms.”

So your questions about words like “collection” and “object” and “intuition” are not really about the mathematical definition of a set, but just about how we think of sets. It’s precisely because we can’t clearly define what a set actually is, that we ultimately “define” a set only in terms of axioms. So don’t worry that the definition is unclear to you. It is meant to be!

I often use the bag analogy myself. But in using such an image, we are not defining, just waving our hands and saying “A set is sort of like this.” This informal approach is what is called Naïve Set Theory, which is sufficient for most applications. As I indicated, mathematicians, in Axiomatic Set Theory, recognize that it is impossible to make a fully clear definition – and since definitions have to refer back to some previous, more basic concepts, something must ultimately be undefined, as discussed in the post Why Does Geometry Start With Unproved Assumptions?

As for the phrase “object of our intuitions”:

This, again, merely emphasizes that an “object” can be anything we can think about. I’m not sure of the German word that was translated either as “perception” or as “intuition”, but it would appear to mean something we “perceive” either by literally sensing it (e.g. seeing) or by imagining it. I can’t see a “3”, but I can perceive the “threeness” of a group of three ducks, so that the idea of “3” is in my mind, and I can think of it as existing in some ideal world.

But that is largely philosophy or psychology. In mathematics, we don’t worry about what a 3 actually is! We just know what we can do with it (axioms).

I did later find the exact quote in German, here:

“Unter einer ‘Menge’ verstehen wir jede **Zusammenfassung** M von bestimmten wohlunterschiedenen Objekten unserer **Anschauung** oder unseres **Denkens** (welche die ‘Elemente’ von M genannt werden) zu einem Ganzen.” – Georg Cantor

Google translates this as

“By a ‘set’ we understand every **combination** M of certain well-differentiated objects of our **intuition** or our **thinking** (which are called the ‘elements’ of M) into a whole.” – Georg Cantor

The word here translated as “intuition” is also used to mean “view” or “opinion”; he is simply referring to “things we can think about”.

One final thought from me:

All we are doing with a set is thinking of some things as a single entity.

If I talk about the set of all mammals, I am not collecting anything together, or even trying to imagine all mammals in the world being in one place. I am merely describing how to identify members of the set. If someone were to point to a mountain goat and ask if it is in my set, I would say yes, because mountain goats are mammals. That is all the set means.

I don’t need to gather all mammals, or to see them, or to list them, or to name them, but only to think of them as one kind of thing. They are a set.

We’ll look at a very complicated logarithmic equation, which leads to quartic equations and some very interesting graphs. We won’t find a fully satisfying solution method, but we’ll have some fun trying – and reveal the fallibility of at least one Math Doctor!

The problem came from Zawad at the end of July:

I can’t find how to start.

That’s about as ugly a log equation as I’ve ever seen!

Incidentally, there is nothing here to indicate what base is intended for the logs; at the high school level, “log” commonly means the “common log”, using base ten, but at higher levels it can mean the natural log, with base *e*. We’ll find out in Zawad’s responses that base ten is at least what he expects it to mean!

I didn’t have a complete solution, but I could at least offer a way to start:

Hi, Zawad.

I would start by changing variables, defining u = log(x) and v = log(y). Then write the equations in terms of u and v.

It will still take work, but it will look a lot more familiar!

This is a standard way to make the “ugly” a little less so, when certain expressions appear repeatedly. Here, *x* and *y* appear only within logarithms, and we know we can expand each logarithmic expression in terms of the two basic logs.

Zawad tried out my suggestion:

Let log x = u, and log y = v.

u + {(u+v⁸)/(u²+v²)} = 2

v + {(u⁸-v)/(u²+v²)} = 0

So, u = 1, v = -1.

Now, log x = 1, so x = 10.

And log y = -1, so y = 1/10.

The basic ideas are right, but he’s made a simple error at the start, one that is very easy to make when you do too much at once. The first steps need to be written out in detail to be sure they are right.

I had several comments to make on the work:

At first, I thought you had done well with my little hint. But there are two problems with this solution. Make that three.

First, unfortunately, log(xy⁸) = log(x) + 8 log(y) = u + 8v, not log(xy⁸) = log(x) + log(y)⁸ = u + v⁸. So your new equations are wrong.

This was just a little slip, but totally changed the problem! The good thing was, this allowed me to talk about subsequent errors without giving away the answer.

Second, you don’t show how you found the solution, but I suspect you found it by inspection, simply seeing that (1, -1) works. This doesn’t prove that that is the only solution.

The solution of the (incorrect) system $$\left\{\begin{matrix}u + \frac{u+v^8}{u^2+v^2} = 2\\ v + \frac{u^8-v}{u^2+v^2} = 0\end{matrix}\right. $$ is not obvious! But if we just guessed small whole numbers, we could find that in fact, if \(u = 1, v = -1\), then $$\left\{\begin{matrix}u + \frac{u+v^8}{u^2+v^2} = 1 + \frac{1+(-1)^8}{1^2+(-1)^2} = 1 + \frac{2}{2} = 2\\ v + \frac{u^8-v}{u^2+v^2} = (-1) + \frac{1^8-(-1)}{1^2+(-1)^2} = -1 + \frac{2}{2} = 0\end{matrix}\right. $$

But we want **all** solutions, and finding one doesn’t mean we’ve found them all:

In fact, if I graph your (wrong) system of equations (using Desmos), I find that there are four solutions:

Of course, that’s not the right graph!

I wouldn’t have shown this graph if it had been for the right equation; but it was a nice way to show (spectacularly) that while \((1,-1)\) is a solution, there are others, one even being the “trivial” solution \((0,0)\). Well, not really … try it out in the equations and see what you get!
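If you would rather let a computer "try it out", here is a minimal Python sketch of that check. The function name `wrong_system` is made up for illustration; it evaluates the left-hand sides of the mistaken system using exact fractions, so the results can be compared directly with the targets 2 and 0.

```python
from fractions import Fraction

def wrong_system(u, v):
    """Left-hand sides of the mistaken system (the one with v^8 and u^8)."""
    u, v = Fraction(u), Fraction(v)
    d = u**2 + v**2          # the shared denominator
    return u + (u + v**8) / d, v + (u**8 - v) / d

# The guessed solution: both left-hand sides should match the targets (2, 0).
print(wrong_system(1, -1))

# The "trivial" point: the denominator is zero, so nothing can be evaluated.
try:
    wrong_system(0, 0)
except ZeroDivisionError:
    print("(0, 0) is not a solution: the denominator u^2 + v^2 is zero")
```

A grapher that clears denominators before plotting would not notice this, which is one plausible way a hole can end up looking like an intersection.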

(Incidentally, while Desmos is amazing in graphing this sort of equation, in this case it doesn’t show intersections when I select one of the equations, as it does in simpler cases. I had to plot the four apparent solutions myself. I’m not sure whether this is intentional; but the result is that I can’t blame Desmos for my claiming an extraneous root as real. I simply failed to check … after pointing out the importance of doing so!)

Third, you didn’t check your solution in the original equations. If we put (10, 0.1) into the second equation, for example, we get 3.5, not 0.

This is a common error in solving an equation by substitution. Even when we remember to check our solution of the new equation, and back-substitute to get the solution of the original, it will be wrong if we had made a mistake in the substitution itself – as we did here. The very ugliness of the original equation tempts us to skip that check; but let’s do it, plugging in \(x=10,y=0.1\):

$$\left\{\begin{matrix}\log(x)+\frac{\log\left(xy^8\right)}{(\log(x))^2+(\log(y))^2}=2\\ \log(y)+\frac{\log\left(\frac{x^8}{y}\right)}{(\log(x))^2+(\log(y))^2}=0\end{matrix}\right. $$

$$\left\{\begin{matrix}\log(10)+\frac{\log\left(10\cdot 0.1^8\right)}{(\log(10))^2+(\log(0.1))^2} =1+\frac{\log\left(10^{-7}\right)}{1^2+(-1)^2}=1+\frac{-7}{2}=-2.5\ne 2\\ \log(0.1)+\frac{\log\left(\frac{10^8}{0.1}\right)}{(\log(10))^2+(\log(0.1))^2}=-1+\frac{\log\left(10^9\right)}{1^2+(-1)^2}=-1+\frac{9}{2}=3.5\ne 0\end{matrix}\right. $$
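The same check is easy to automate, which takes away the temptation to skip it. This is only a sketch: `original_system` is a hypothetical helper name, and it assumes base-ten logs, as Zawad does.

```python
import math

def original_system(x, y):
    """Left-hand sides of the original equations, using common (base-10) logs."""
    lx, ly = math.log10(x), math.log10(y)
    d = lx**2 + ly**2
    first = lx + math.log10(x * y**8) / d
    second = ly + math.log10(x**8 / y) / d
    return first, second

first, second = original_system(10, 0.1)
# Rounded to hide floating-point noise; the targets are 2 and 0.
print(round(first, 6), round(second, 6))
```

Any proposed pair \((x, y)\) can be passed to the same function and compared with the right-hand sides 2 and 0.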

Try again. The correct system in (u, v) is not much easier; it’s possible to find each solution by guess-and-check (unlike your system), but you still wouldn’t be sure you didn’t miss anything.

I will admit that I have not yet pursued the problem far enough to see a nice algebraic way to find all solutions. I cheated by making the graphs (only to check your work).

May I ask where this problem came from? Does the source imply that it should be quickly solvable, or that it should be very challenging (or both)? It’s definitely interesting!

My main goal here, besides getting Zawad on the right track, is to avoid implying that the rest will be easy!

Zawad wrote back:

At first, I’m sorry for my mistake.

Let log x = u. and log y = v

So,

u + {(u+8v)/(u²+v²)} = 2……(1)

v + {(8u-v)/(u²+v²)} = 0……..(2)

(u, v) = (-1, 2). or, (u, v) = (3, -2).

[I haven’t understood how to solve the two equations. So, I used Wolfram Alpha]

So,

log x = -1. x = ⅒

log y = 2. y = 100.

Again,

log x = 3. x = 1000.

log y = -2. [It’s not possible]

(x, y) = (⅒, 100).

Now, I’m humbly requesting you to tell me the process of solving the two equations (1 and 2). One of my friends has found the problem. But none has solved it yet which is very sad for us. Thanks.

I suspect the mistaken rejection of the second solution may be due to confusing the impossibility of taking the log of a negative number, with the log itself being negative.

Zawad has “cheated” even more than I had, using not just a grapher but a solver. Here is what Wolfram Alpha reveals:

So there are four solutions, two real (which perhaps are all we are required to find?); and there’s an interesting feature you may notice in the graph.

I had some corrections and comments (which, unfortunately, I made without checking what Wolfram Alpha did, but just using the graph I’d made):

There are three solutions; your second solution does work (why do you say not possible?), and there is another (trivial) solution.

But so far, I have only found them either by “guess and check” (setting one variable to integers and seeing if both equations have a common solution for the other variable), or by graphing the equation in u and v on Desmos. I am with you in not yet having a proper algebraic solution.

Again, where did the problem come from? Not all equations can be solved algebraically, so I would want some evidence that it is intended to be solved that way. Presumably this is some sort of contest problem, and there may be some special trick I haven’t seen yet. Perhaps another of us has an idea. But don’t be “very sad”!

I was wrong about the third solution, because I depended too much on my Desmos graph! We’ll see that momentarily. Wolfram Alpha, too, can make mistakes, but here it clearly shows only two (real) solutions, and it turns out to be right.

Zawad only had questions at this point:

Yes, the second solution is possible. But what is the third trivial solution? How to solve that?

May be it’s a problem of a contest.

I now compounded my error:

Here is the graph I made, showing all three solutions to the equation in u and v:

You can see what I mean by “trivial”. It isn’t obvious when you just look at the equations, but it is the first thing one should try.

The trouble is, Desmos, as we already saw in the wrong graph, fails to show or even recognize “holes” in graphs. That trivial solution at \((0,0)\) is **extraneous**, because it requires division by zero.

Here is the graph of the original equations, on which only the trivial solution is visible:

The other two, of course, require zooming in on the appropriate places. One might not even think to look at all.

I am continuing to think about the problem as I have time.

Again, the trivial solution is not really a solution at all, so this graph lies in two ways: It shows a “solution” that is not, and it hides the solutions that are! Notice that the curves look as if they were becoming parallel as they extend in both directions, but they are not. Here are stretched versions of the graph to show the real solutions:

\((0.1,100)\)

\((1000,0.01)\)

You’d never guess these behaviors!

A day later, with no further responses, I reported my status:

I’ve found an approach that at least comes close to giving an algebraic solution.

Seeing that u² + v² appears in both equations, it occurred to me to express u and v in polar coordinates, letting u = r cos(θ) and v = r sin(θ). This led to a somewhat simpler system of equations, which I could turn into a fourth degree equation that can be factored (though that’s not easy). It leads to the two non-trivial solutions; the trivial solution had been lost by having assumed r ≠ 0, so it would have to be checked separately.

There is probably a better way; I am omitting details in the hope that you will follow a different path from my hint and perhaps improve on my work.

This was the end of the conversation; but here is what I did:

Our correct system of equations was $$\left\{\begin{matrix}u + \frac{u+8v}{u^2+v^2} = 2\\ v + \frac{8u-v}{u^2+v^2} = 0\end{matrix}\right. $$

The substitution $$u = r\cos(\theta), v = r\sin(\theta)$$ yields $$\left\{\begin{matrix}r\cos(\theta) + \frac{r\cos(\theta)+8r\sin(\theta)}{r^2\cos^2(\theta)+r^2\sin^2(\theta)} = 2\\ r\sin(\theta) + \frac{8r\cos(\theta)-r\sin(\theta)}{r^2\cos^2(\theta)+r^2\sin^2(\theta)} = 0\end{matrix}\right. $$ which simplifies, after applying an identity to the denominators and multiplying both sides of each by \(r\), to $$\left\{\begin{matrix}r^2\cos(\theta) + \cos(\theta)+8\sin(\theta) = 2r\\ r^2\sin(\theta) + 8\cos(\theta)-\sin(\theta) = 0\end{matrix}\right. $$

(Observe that multiplying by \(r\) assumes it is non-zero, eliminating the (non-existent) “trivial” solution!)

Dividing everything by \(\cos(\theta)\), we get $$\left\{\begin{matrix}r^2 + 1+8\tan(\theta) = 2r\sec(\theta)\\ r^2\tan(\theta) + 8-\tan(\theta) = 0\end{matrix}\right. $$

Solving the second equation for \(\tan(\theta)\), we get $$\tan(\theta) = \frac{8}{1-r^2}$$

Replacing this in the first equation, and using the fact that \(\sec^2(\theta) = 1 + \tan^2(\theta)\), we get $$r^2 + 1+\frac{64}{1-r^2} = \pm 2r\sqrt{\left(1+\frac{64}{(1-r^2)^2}\right)}$$

Multiplying both sides by \(1-r^2\), we get $$(r^2+1)(1-r^2) + 64 = \pm 2r\sqrt{\left((1-r^2)^2+64\right)}$$

Expanding, squaring, and expanding again, we get the polynomial equation $$r^8-4r^6-122r^4-260r^2+4225=0$$
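One way to fill in those steps (a sketch of the algebra, not necessarily the exact route I took at the time): the left side expands to \(65-r^4\) and the quantity under the radical to \(r^4-2r^2+65\), so $$65-r^4 = \pm 2r\sqrt{r^4-2r^2+65}$$ $$\left(65-r^4\right)^2 = 4r^2\left(r^4-2r^2+65\right)$$ $$r^8-130r^4+4225 = 4r^6-8r^4+260r^2$$ and collecting everything on one side gives the eighth-degree polynomial. Keep in mind that squaring can introduce extraneous solutions, so whatever comes out has to be checked in the original system.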

That’s really a quartic equation in \(r^2\), and (with effort) it factors as $$(r^2-5)(r^2-13)(r^4+14r^2+65)=0$$
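The factoring takes effort, but checking it does not. Here is a small Python sketch that multiplies the three factors back together, treating them as polynomials in \(s=r^2\); the helper `poly_mul` is written here just for this check and is not from any library.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# The factors as polynomials in s = r^2, lowest degree first:
# (s - 5), (s - 13), (s^2 + 14s + 65)
factors = [[-5, 1], [-13, 1], [65, 14, 1]]

product = [1]
for f in factors:
    product = poly_mul(product, f)

# Coefficients of 1, s, s^2, s^3, s^4; compare with 4225, -260, -122, -4, 1.
print(product)
```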

This leads to four (apparent) real solutions, $$r=\pm\sqrt{5},\theta=\cot^{-1}\left(-\frac{1}{2}\right)\\ r=\pm\sqrt{13},\theta=\cot^{-1}\left(-\frac{3}{2}\right)$$ and four imaginary solutions, which I will ignore because they will involve taking the inverse tangent of a complex number, which takes us into even stranger territory.

These solutions yield $$u = r\cos(\theta) = \pm\sqrt{5}\cdot\frac{-1}{\sqrt{5}}=\mp 1\\ v = r\sin(\theta) = \pm\sqrt{5}\cdot\frac{2}{\sqrt{5}}=\pm 2$$ and $$u = r\cos(\theta) = \pm\sqrt{13}\cdot\frac{-3}{\sqrt{13}}=\mp 3\\ v = r\sin(\theta) = \pm\sqrt{13}\cdot\frac{2}{\sqrt{13}}=\pm 2$$

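Pairing these off appropriately (and checking each pair in the system, since squaring introduced sign combinations that fail the first equation), we get our solutions in *u* and *v*, \((-1,2), (3,-2)\). And from that, the solutions to the original problem in *x* and *y* are \((0.1,100), (1000,0.01)\).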
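To see which sign pairings survive, we can test all four candidates in the correct \((u, v)\) system. This is a sketch with exact fractions; `correct_system` is an illustrative name, and the last lines assume base-ten logs when converting back to *x* and *y*.

```python
from fractions import Fraction

def correct_system(u, v):
    """Left-hand sides of the corrected system in u and v."""
    u, v = Fraction(u), Fraction(v)
    d = u**2 + v**2
    return u + (u + 8 * v) / d, v + (8 * u - v) / d

# One candidate for each sign of r, for r^2 = 5 and r^2 = 13.
candidates = [(-1, 2), (1, -2), (-3, 2), (3, -2)]

# Keep only the pairs whose left-hand sides equal the targets (2, 0).
solutions = [c for c in candidates if correct_system(*c) == (2, 0)]
print(solutions)

# Back-substitute x = 10^u, y = 10^v.
for u, v in solutions:
    print("x =", Fraction(10) ** u, " y =", Fraction(10) ** v)
```

The two rejected pairs satisfy the second equation but give \(-2\) rather than 2 in the first, which is exactly the kind of extraneous solution squaring produces.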

This wasn’t pretty. Feel free to comment if you see nicer ways to solve the problem!

When you are given a problem about a triangle, there can be many ways to approach it: pure geometry, trigonometry, and analytic geometry come to mind. When the context doesn’t dictate a method (as turns out to be true here), you just have to try what feels right to you. This interesting problem will illustrate the difficulty we can have when we are not sure what a student has learned, but also the great flexibility we have in solving problems by multiple methods.

Kaloyan asked this in late July, stating the problem and showing initial work, just as we like to see:

A triangle \(A B C\) is given in which \(\alpha=3 \beta\). The angle bisector of \(\angle B C A\) divides the area of the triangle in ratio \(2: 1\). Find the angles of the triangle.

Let \(C L\) be the angle bisector of \(\angle B C A(L \in A B), C L=x\) and \(\angle A C L=\angle B C L=\gamma\). Now

$$\begin{aligned}&S_{\triangle A C L}=\frac{1}{2} b \cdot x \cdot \sin \gamma \\&S_{\triangle B C L}=\frac{1}{2} a \cdot x \cdot \sin \gamma\end{aligned}$$

Since \(\angle B A C=3 \beta>\angle A B C=\beta\), then \(B C>A C\), or \(a>b\). This means that the area of triangle \(\triangle B C L\) is greater than the area of triangle \(\triangle A C L\). So

$$\frac{S_{\triangle B C L}}{S_{\triangle A C L}}=\frac{2}{1} \Longleftrightarrow \frac{\frac{1}{2} a \cdot x \cdot \sin \gamma}{\frac{1}{2} b \cdot x \cdot \sin \gamma}=\frac{2}{1} \Longleftrightarrow \frac{a}{b}=\frac{2}{1} \Longleftrightarrow a=2 b$$

I don’t know if this can be helpful later on. I was just trying to see which of the triangles \(A C L\) and \(B C L\) has a greater area and I found that relationship naturally. Something else we can notice: if we manage to express \(\angle \gamma\) with \(\angle \beta\) the problem is solved, because the sum of the three interior angles in a triangle is always \(180^{\circ}\) and we’ll have an equation in terms of \(\beta\). Thank you in advance!

Here is a picture of the problem, labeled to show Kaloyan’s initial work:

We know only the ratio of two angles, and the ratio of the two areas. We are to find all the angles.

The problem was submitted in the category of Geometry, but his work uses Trigonometry heavily, implying we are to think of it in the latter category; his work, however, could have been done entirely using geometrical theorems. For example, the ratio of the areas of the triangles, when viewed as having common base *x*, is equal to the ratio of their altitudes to that base, which in turn is equal to the ratio of sides *a* : *b* (and also of segments BL : LA). This is familiar as one proof of the Angle Bisector Theorem.

An interesting feature of Kaloyan’s work is that he noticed that the problem didn’t identify the order in which the two triangles have the ratio 2:1, so he took the time to determine that, rather than just draw the picture and assume. This is a mark of a true mathematician!

Doctor Rick answered, suggesting a way to continue the work using trigonometry:

Hi, Kaloyan.

You have done well to determine that a = 2b. Now you can focus on triangle ABC; and you will not need to consider γ any further, either. Just use the Law of Sines and see what happens.

A trig identity will probably be useful.

This seems a useful way to go, since we have ratios of angles and of sides! We’re looking only at the basics here:

Kaloyan replied, adeptly using the Law of Sines, but then hitting a roadblock:

Hi, Dr. Rick

The law of sines gives us

$$\frac{2 b}{\sin 3 \beta}=\frac{b}{\sin \beta} \Longleftrightarrow \frac{2}{\sin 3 \beta}=\frac{1}{\sin \beta} \Longleftrightarrow 2 \sin \beta=\sin 3 \beta$$

I don’t think we have studied the trig identity that you meant because nothing comes to my mind by looking at the equality.

The triple-angle identity that we need is not always explicitly taught, but can be derived if you just know the double-angle identities.

Doctor Rick responded, assuming that Kaloyan knows at least the double-angle identity:

Good work so far:

2 sin β = sin(3β)

You may not have seen the trig identity I’m talking about; it’s not one I ever memorized, but I know that it exists. I suppose it’s clear what identity you need, isn’t it? You’d like to be able to write sin(3β) in terms of sin β (cos β may be involved too).

Do you know an identity for sin(2β)? And do you know how to derive that identity from the angle-sum identity?

I suggest you start with sin(3β) = sin(2β + β) and use that to derive an identity for sin(3β). Then you’ll have what you need to solve the problem.

Here is the derivation of the triple-angle identity he has in mind:

We commonly memorize these **double-angle** identities: $$\sin(2\theta)=2\sin(\theta)\cos(\theta)$$ $$\cos(2\theta)=\cos^2(\theta)-\sin^2(\theta)=1-2\sin^2(\theta)$$ and **angle-sum** identities such as this: $$\sin(\alpha+\beta)=\sin(\alpha)\cos(\beta)+\cos(\alpha)\sin(\beta)$$

Using these, $$\sin(3\theta) = \sin(2\theta+\theta) =\\ \sin(2\theta)\cos(\theta)+\cos(2\theta)\sin(\theta) = 2\sin(\theta)\cos^2(\theta)+(1-2\sin^2(\theta))\sin(\theta) =\\ 2\sin(\theta)(1-\sin^2(\theta))+\sin(\theta)-2\sin^3(\theta) =\\ 2\sin(\theta)-2\sin^3(\theta)+\sin(\theta)-2\sin^3(\theta) =\\ 3\sin(\theta)-4\sin^3(\theta)$$
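A derived identity is worth a quick numerical spot check before relying on it. The following Python sketch compares both sides at every whole degree from 0° to 360°; it is a sanity check under floating-point arithmetic, not a proof, and the function name is only illustrative.

```python
import math

def triple_angle_rhs(theta):
    """Right-hand side of the derived identity: 3 sin(theta) - 4 sin^3(theta)."""
    s = math.sin(theta)
    return 3 * s - 4 * s**3

# Largest disagreement between sin(3*theta) and the formula over 0..360 degrees.
max_error = max(
    abs(math.sin(3 * math.radians(k)) - triple_angle_rhs(math.radians(k)))
    for k in range(0, 361)
)
print(max_error)  # should be at the level of rounding error
```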

Kaloyan answered, having found this identity and used it:

Hi!

I found that the angles of the triangle are 30, 60 and 90 degrees. So we’re actually dealing with a right triangle. Am I right?

Using the identity \(\sin 3 \alpha=3 \sin \alpha-4 \sin ^{3} \alpha\) we get

$$2 \sin \beta=3 \sin \beta-4 \sin ^{3} \beta \Longleftrightarrow 4 \sin ^{3} \beta-\sin \beta=0 \Longleftrightarrow \sin \beta=0 ; \pm \frac{1}{2}$$

\(\angle \beta\) must be an acute angle because \(\angle \alpha=3 \beta\) so we make the conclusion \(\sin \beta=\frac{1}{2} \Rightarrow \beta=30^{\circ}\).

To be honest we haven’t studied none of the identities you mentioned (angle-sum identity, double-angle identities). These are the identities we have worked with in school:

The solution here is very nicely stated, applying the triple-angle identity to the previously determined fact that \(2\sin(\beta)=\sin(3\beta)\), and then solving the resulting equation for the sine by factoring as \(\sin(\beta)(4 \sin ^2(\beta)-1)=0\), and then eliminating the solutions \(\sin(\beta)=0\) and \(\sin(\beta)=-\frac{1}{2}\) and the obtuse solution of \(\sin(\beta)=\frac{1}{2}\) because the angle must be acute.
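As a rough cross-check of that conclusion, we can scan the possible whole-degree values of \(\beta\) directly. Since \(\alpha+\beta=4\beta\) must be less than 180°, only \(0<\beta<45^\circ\) is allowed. This sketch tests only integer degrees, so it confirms the answer found algebraically rather than replacing the algebra.

```python
import math

# Whole-degree values of beta with 0 < beta < 45 that satisfy 2 sin(beta) = sin(3 beta).
matches = [
    beta for beta in range(1, 45)
    if math.isclose(2 * math.sin(math.radians(beta)),
                    math.sin(math.radians(3 * beta)),
                    rel_tol=1e-9)
]
print(matches)
```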

So our triangle in reality looks like this:

It turns out that \(\beta=\gamma\) and the three triangles shown are all congruent 30-60-90 triangles, confirming that one part has twice the area of the other. And in fact, \(90 = 3\cdot30\).
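We can also confirm the whole configuration with coordinates. The sketch below places the right angle at A with AC = 1 (a choice of scale and position made only for this check), locates L on AB using the Angle Bisector Theorem, and then verifies both that CL really bisects the angle at C and that the two areas are in the ratio 2 : 1.

```python
import math

# 30-60-90 triangle with the right angle at A (alpha = 90, beta = 30).
A = (0.0, 0.0)
C = (1.0, 0.0)               # AC = b = 1
B = (0.0, math.sqrt(3.0))    # AB = sqrt(3), so BC = a = 2

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def area(p, q, r):
    """Area of triangle pqr from the cross product of two edge vectors."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2

def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(cos_angle))

a, b = dist(B, C), dist(A, C)

# Angle Bisector Theorem: L divides AB so that AL : LB = b : a.
t = b / (a + b)
L = (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

ratio = area(B, C, L) / area(A, C, L)
print("a/b =", a / b)
print("angle ACL =", angle_at(C, A, L), " angle BCL =", angle_at(C, B, L))
print("area ratio =", ratio)
```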

But is this what Kaloyan was expected to do, or is there another way?

Doctor Rick responded:

I gather that you looked up the identity somewhere and used it to solve the problem. Your work is correct; the triangle does turn out to be a familiar one, the 30-60-90 right triangle. But we could not have known this without first solving the problem, as far as I can see. Of course, we could have made a wild guess that β = 30°, and checked it, discovering that we had guessed correctly.

Does the problem come from a textbook or other source directly related to the course you are taking, in which you have learned only the identities you showed? I am trying to figure out whether you are expected to solve the problem using only what you have learned, or whether the problem assumes more knowledge than you have at present – namely, the identities I mentioned.

If the problem is independent of what Kaloyan is learning, then it would make sense for it to use facts he hasn’t learned; but if it is for his class, then we need to look for an appropriate method.

Kaloyan answered those questions:

> I gather that you looked up the identity somewhere and used it to solve the problem.

You’re right.

> But we could not have known this without first solving the problem, as far as I can see.

I’m thinking in this direction at the moment. Let M be the midpoint of BC. If we manage to show that AM=MC=MB=b, then we can say α = 90° which is enough. Do you have any ideas how can we prove that? I think the best shot is to try to prove that triangle ABM is isosceles. In other words, we need ∢BAM = β. This is all I have noticed so far.

> Does the problem come from a textbook or other source directly related to the course you are taking, in which you have learned only the identities you showed?

The problem comes from my homework for the summer vacation. To be honest I don’t think my teacher has solved all the problems, so it could be her fault.

Kaloyan is now speculating about possible geometrical methods, something I often do when working on a proof, looking for a way to get to the known result. He sees several special facts about the triangle that we can see after solving it:

But could we determine any of these facts with only the data we were given?

Using his ideas (including the fact that \(a=2b\)), I see now that we could draw this, motivated not so much by a guess as by the ratio:

And from that, I can see a way to the goal, using his idea of showing that \(\delta=\beta\). (Can you see it?)

Doctor Rick had a different idea, still expecting to use trig:

The problem I see with your proposal is that we would never try to prove that α = 90° if we did not have good reason to suppose that it’s true. And if we do suspect this, we can confirm it just by working out the ratios of segments based on the familiar ratios for a 30-60-90 triangle. (If AC = 1 then AB = √3 from triangle ABC, and AL = √3 / 3 from triangle ALC where angle ACL = 60°/2 = 30°. Therefore AL:LB = 1:2 as required.)

However, you give me some ideas. Let’s work with your point M from the other direction.

Forgetting that M will turn out to be the midpoint of BC, we can instead construct segment AM such that M is on BC and angle BAM = β. This is one of the trisectors of angle BAC. (For completeness we can construct the other trisector of angle BAC, AN with N on BC, but I don’t think we’ll be using it.) What this does for us is to avoid needing to talk about an angle 3β.

I wonder if we can do something with this. I find myself getting confused easily because I know what I want to find, and it’s easy to forget what is established so far in this approach, as opposed to what I know from the previous work.

We are now forgetting what we know of the solution, and basing our work only on the given angle relationship:

Six hours later, he had more to say:

I want to add that the problem turned out to be surprisingly easy using the approach I suggested (once I got my head on straight). In fact, far from requiring knowledge of the triple-angle identity, it can be solved with no trigonometry at all!

Was the homework you mentioned for a trigonometry course, or does it make sense that it could be a basic geometry problem?

Kaloyan took that idea and ran with it:

Hi!

You’re right if I got it correctly. Here are my notes. A really nice idea (to construct one of the trisectors of angle BAC).

> Was the homework you mentioned for a trigonometry course, or does it make sense that it could be a basic geometry problem?

I really can’t answer you unambiguously. As I already told you, this problem is from my homework for the summer vacation. It is supposed to cover all of the topics we have studied in school this year (trigonometry is one of them).

He was thoughtful to add English translations of key words in his work (which I believe is in Bulgarian). Let’s write it all out here, in case you have trouble reading it:

We define M as a point on BC such that \(\angle BAM = \beta = \angle ABM\). Therefore \(\triangle AMB\) is isosceles, and \(AM=BM\). Also, \(\angle AMC = \angle BAM + \angle ABM = 2\beta\) because it is an exterior angle of \(\triangle AMB\).

On the other hand, \(\angle CAM = \angle BAC-\angle BAM = 3\beta-\beta = 2\beta\); so the base angles of \(\triangle AMC\) are equal, so that it is isosceles, with \(AC=CM=b\).

But then \(BM=BC-CM=2b-b=b\), so that also \(AM=b\). That makes \(\triangle AMC\) equilateral, so that \(2\beta=60^\circ\), \(\beta=30^\circ\), and \(3\beta=90^\circ\), giving us a 30-60-90 triangle. Once again, we have the actual triangle looking like

Doctor Rick approved, and gave his own version of the solution, reminding Kaloyan that even the first part of his work (done originally with trigonometry) did not need trigonometry, so that this is an entirely geometrical solution:

I think we’ve got it!

Do you see how your conclusion that BC = 2(AC) can be reached without trig? My figure is attached.

I worked out that AC = MC and AM = MB first, without yet using the given information that the angle bisector of angle ACB divides the area of triangle ABC in the ratio 2:1.

Then, adding one more segment, LM, I observe:

(1) triangle LAC is congruent to triangle LMC by SAS.

(2) If the area of triangle BLC is twice the area of triangle ALC, then triangles CML and BML must have the same area.

(3) These triangles have a common altitude from L, so their bases (MC and MB) must be equal.

Now we know (as you saw) that AC = MC = AM, so triangle AMC is equilateral and 2β = 60°. The rest follows. No trig, just basic geometry!

The last part of the work proved that \(a=2b\), without having previously used that fact.

We got several complete proofs for the price of one!

Here we have a different kind of question than usual: A conjecture about distances between points, with a request for confirmation. Normally we like to just give hints to help a student figure something out; this was a request for a theorem that ought to exist, and trying to help led to providing the core of the desired proof. Since I haven’t found the theorem anywhere, let’s publish it here! In the process, we’ll see some useful ideas for problem-solving.

Here is the question, from mid-July:

Hello experts,

Thank you for having this wonderful site.

Given two sets of points (set A, set B) in 3D Euclidean space, two types of definitions of the distance between the two sets of points are considered.

The first definition is to use the average distance of all pairs of points (one point in set A, another point in set B).

The second definition is to compute a centroid of each set. Centroid is defined as the average coordinates of all points in the set. Then we use the distance between the two centroids.

I explored some examples and found the centroid distance may always be smaller or equal to the average distance. I tried to derive and prove it mathematically but I couldn’t. I assume there should be some theoretical results in math that can be used. But I’m not a math expert and not sure how to find it. My question: is it true that centroid distance is always smaller than or equal to the average distance? Is there any known theorem that can be used to prove it? Thank you so much!

Sincerely.

Jack

Here is an illustration of the idea:

We have a set of three points in red, and another set of three points in blue. Points \(A_c\) and \(B_c\) are the centroids of these two sets. In principle, you should think of this as a three-dimensional picture; but in fact I drew it on a plane, and calculated the centroid of the set \(A=\{(1,3), (2,1), (2,4)\}\) as \(A_c = \left(\frac{1+2+2}{3},\frac{3+1+4}{3}\right) = \left(1\frac{2}{3},2\frac{2}{3}\right)\).

The question is: Is the distance between \(A_c\) and \(B_c\) (the green line) never greater than the average of all the distances between individual points (dotted purple lines)? It’s true in my drawing, as it was in Jack’s examples. But can we prove it?
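Jack's kind of experiment is easy to reproduce. In the Python sketch below, set `A` is the red set from the text, while set `B` is a hypothetical blue set invented for this example (the actual blue coordinates are not given above). It computes both definitions of distance so they can be compared; the same code works for 3D points.

```python
import math
from itertools import product

A = [(1, 3), (2, 1), (2, 4)]   # the red set described in the text
B = [(6, 2), (7, 5), (8, 3)]   # hypothetical blue set, made up for illustration

def centroid(points):
    """Average of the coordinates, taken one axis at a time."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

# Definition 2: distance between the two centroids.
centroid_distance = math.dist(centroid(A), centroid(B))

# Definition 1: average distance over all pairs (one point from each set).
average_distance = sum(math.dist(p, q) for p, q in product(A, B)) / (len(A) * len(B))

print("centroid of A:", centroid(A))
print("centroid distance:", centroid_distance)
print("average distance: ", average_distance)
```

Trying many examples like this can suggest that the conjecture is true, but of course it cannot prove it.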

After pondering the question for more than a day, trying to search for a known theorem about this, I answered:

Hi, Jack.

I feel like there must be a familiar theorem for this, in some field, but I’m not finding it by searching related ideas (either in my mind or online). I’ve been waiting, hoping someone else will think of it!

But I can say that, in playing with it, I’m convinced that it is true, and that it can be proved based on nothing more than the triangle inequality, ||x+y|| ≤ ||x|| + ||y||. That is, the length of a sum is no more than the sum of the lengths.

Basically, since the centroid of a set of points is in effect the mean of their position vectors, your claim amounts to this:

Distance between means ≤ mean of distances

At this point, all I had in my mind was an analogy. A mean is closely related to a sum, and a distance is a length, so maybe this is just an extension of the same idea.

As indicated in the Wikipedia article I linked to, the triangle inequality can be expressed in several different ways. Geometrically, it involves the sides of a triangle:

In any **triangle**, the sum of two sides is greater than the third side; if B were on AC, making a **degenerate triangle**, we would have equality. So we say in general, that for any three points, $$x + y \ge z$$ This is an instance of the fact that “the shortest distance between two points is a straight line”, since we can see \(x+y\) as a longer path from A to C than \(z\).

We are going to use this property as expressed in terms of **vectors**:

Now **x** and **y** are vectors, whose sum, **z**, is along the third side. So the magnitude of the sum of any two vectors is no more than the sum of their magnitudes: $$\left\|\mathbf{x}\right\|+\left\|\mathbf{y}\right\|\ge \left\|\mathbf{x}+\mathbf{y}\right\|$$

We can extend this inequality to more than two vectors. The extension can be proved algebraically, but it is also geometrically obvious, since a chain of vectors is again a longer path than the straight line:

$$\left\|\mathbf{x}\right\|+\left\|\mathbf{y}\right\|+\left\|\mathbf{z}\right\|\ge \left\|\mathbf{x}+\mathbf{y}+\mathbf{z}\right\|$$
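One way to see the algebraic version, sketched here under the assumption that \(\mathbf{x}\), \(\mathbf{y}\), and \(\mathbf{z}\) are any three vectors, is to group the first two terms and apply the two-vector inequality twice:

$$\left\|\mathbf{x}+\mathbf{y}+\mathbf{z}\right\| = \left\|(\mathbf{x}+\mathbf{y})+\mathbf{z}\right\| \le \left\|\mathbf{x}+\mathbf{y}\right\|+\left\|\mathbf{z}\right\| \le \left\|\mathbf{x}\right\|+\left\|\mathbf{y}\right\|+\left\|\mathbf{z}\right\|$$

Repeating the same grouping step handles any finite number of vectors, which is all that the argument below needs.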

Now I just played with the idea I had, taking the simplest possible case, with only two points in each set:

My drawing here is in a plane, but the calculations don’t assume that; that’s one advantage of using a vector formulation. The centroid of A, for example, is the average of the position vectors for the points \(A_1=(1,3)\) and \(A_2=(2,1)\), namely \(\mathbf{A}_c=\frac{\mathbf{A}_1+\mathbf{A}_2}{2}=\left(\frac{1+2}{2},\frac{3+1}{2}\right)=(1.5,2)\). The distance between the two centroids is \(\left\|\mathbf{A}_c-\mathbf{B}_c\right\| = \left\|(-4,-2)\right\| = \sqrt{20} \approx 4.47\).

For the distances measured here, we find that, as expected, \(\frac{5.83+6.4+4.12+3.16}{4}\approx 4.88\ge 4.47\).
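This arithmetic can be checked with a short script. Note one assumption: the text gives only the coordinates of \(A_1\) and \(A_2\), so the points \(B_1=(6,6)\) and \(B_2=(5,2)\) used below are a reconstruction, chosen because they reproduce the four stated distances and the centroid difference \((-4,-2)\). They may not be the points in the original drawing.

```python
import math

A = [(1, 3), (2, 1)]
# Assumed coordinates: not given in the text, picked to match the stated distances.
B = [(6, 6), (5, 2)]

def centroid(points):
    """Average of the coordinates of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# All four point-to-point distances, one point from each set.
distances = [math.dist(p, q) for p in A for q in B]
mean_distance = sum(distances) / len(distances)
centroid_gap = math.dist(centroid(A), centroid(B))

print([round(d, 2) for d in distances])               # [5.83, 4.12, 6.4, 3.16]
print(round(mean_distance, 2), round(centroid_gap, 2))  # 4.88 4.47
```

Working with unrounded distances, the mean comes to about 4.88, still comfortably above the centroid distance of about 4.47.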

So I tried proving the claim for this case, with two sets of two points:

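Without trying to do a general proof, let’s consider sets \(A = \{\mathbf{A}_1, \mathbf{A}_2\}\), \(B = \{\mathbf{B}_1, \mathbf{B}_2\}\). (Treat the points as position vectors.) Then the centroids of sets A and B will be \((\mathbf{A}_1+\mathbf{A}_2)/2\) and \((\mathbf{B}_1+\mathbf{B}_2)/2\), respectively. The distance between centroids is

$$\left\|\frac{\mathbf{A}_1+\mathbf{A}_2}{2} - \frac{\mathbf{B}_1+\mathbf{B}_2}{2}\right\|.$$

There are four distances between points in A and points in B, namely

$$\left\|\mathbf{A}_1-\mathbf{B}_1\right\|,\ \left\|\mathbf{A}_1-\mathbf{B}_2\right\|,\ \left\|\mathbf{A}_2-\mathbf{B}_1\right\|,\ \left\|\mathbf{A}_2-\mathbf{B}_2\right\|.$$

The mean of these distances is

$$\frac{\left\|\mathbf{A}_1-\mathbf{B}_1\right\| + \left\|\mathbf{A}_1-\mathbf{B}_2\right\| + \left\|\mathbf{A}_2-\mathbf{B}_1\right\| + \left\|\mathbf{A}_2-\mathbf{B}_2\right\|}{4}.$$

By the triangle inequality,

$$\begin{aligned}\frac{\left\|\mathbf{A}_1-\mathbf{B}_1\right\| + \left\|\mathbf{A}_1-\mathbf{B}_2\right\| + \left\|\mathbf{A}_2-\mathbf{B}_1\right\| + \left\|\mathbf{A}_2-\mathbf{B}_2\right\|}{4} &\ge \frac{\left\|(\mathbf{A}_1-\mathbf{B}_1) + (\mathbf{A}_1-\mathbf{B}_2) + (\mathbf{A}_2-\mathbf{B}_1) + (\mathbf{A}_2-\mathbf{B}_2)\right\|}{4}\\ &= \frac{\left\|2\mathbf{A}_1+2\mathbf{A}_2-2\mathbf{B}_1-2\mathbf{B}_2\right\|}{4}\\ &= \left\|\frac{\mathbf{A}_1+\mathbf{A}_2}{2} - \frac{\mathbf{B}_1+\mathbf{B}_2}{2}\right\|,\end{aligned}$$

which is what we want to prove.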

In itself, this was just one example; but it worked, which is encouraging. I didn’t want to write out a complete proof; but can we convince ourselves that it can be generalized? I thought about what would be different with more points:

So, as I thought, this is a simple extension of the triangle inequality.

If we had, say, 3 points in A and 4 in B, we’d have 12 distances, using each point in A 4 times in our sum, and each point in B 3 times; after dividing by 12, we’d end up dividing by 3 and by 4 respectively, and the same thing would happen. So what I’ve written can be turned into a general proof without too much difficulty.

And since it’s so easy to prove, it must be well-known, if I could just think what to search for …

“Easy” is a relative term! But no special knowledge seems to be required; if indeed this has never been stated before, it would have to be only because no one has asked the question! (Readers, feel free to comment if you find it stated somewhere!)
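For readers who would like something to compare their own write-up against, here is one way the generalization might look in summation notation. This is a sketch, not the polished proof; the symbols \(m\) and \(n\) (the numbers of points in A and B) are introduced here for illustration and do not appear in the original exchange. With \(A=\{\mathbf{A}_1,\dots,\mathbf{A}_m\}\) and \(B=\{\mathbf{B}_1,\dots,\mathbf{B}_n\}\), the extended triangle inequality gives

$$\begin{aligned}\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left\|\mathbf{A}_i-\mathbf{B}_j\right\| &\ge \frac{1}{mn}\left\|\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\mathbf{A}_i-\mathbf{B}_j\right)\right\|\\ &= \frac{1}{mn}\left\|n\sum_{i=1}^{m}\mathbf{A}_i - m\sum_{j=1}^{n}\mathbf{B}_j\right\|\\ &= \left\|\frac{1}{m}\sum_{i=1}^{m}\mathbf{A}_i - \frac{1}{n}\sum_{j=1}^{n}\mathbf{B}_j\right\| = \left\|\mathbf{A}_c-\mathbf{B}_c\right\|.\end{aligned}$$

The middle step is the counting observation above: each \(\mathbf{A}_i\) appears once for every point of B (\(n\) times), and each \(\mathbf{B}_j\) once for every point of A (\(m\) times). With \(m=3\) and \(n=4\), that is the "4 times" and "3 times" of the 12-distance example.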

I intentionally left plenty for Jack to do, while just stepping quickly through the main points of my thinking in the expectation that he would ask about anything he didn’t understand. That’s a way of showing him respect.

Jack replied, verifying my hope that I hadn’t gone over his head:

Hello Dr. Peterson,

Wow, your proving is fantastic. Thank you so much!

I’m now wondering why I couldn’t figure this out by myself. I guess that maybe it’s too simple to have a theorem for this. Anyway, thank you again for this elegant answer.

Sincerely,

Jack

I responded,

I can’t say it was obvious to me! I spent quite some time trying to find a formulation I could search for, even a theorem in statistics for the one-dimensional case involving the mean difference between two data sets.

For me, the key to solving it was just what I showed: not trying to prove it in general, but just “playing” with a very simple case to get a feel for it, and to convince myself it was plausible. In fact, the generalization that turned the example into a proof was being figured out as I wrote that; all I intended to show you was the example, with the suggestion that it could be turned into a general proof. But before I hit Send, I had to convince myself it wasn’t true only for the number 2, or when the sets have the same size!

For a final proof, I would want to write everything more generally, likely using summation notation, though that would make it a little harder to follow.

I often tell students that the way to solve a hard problem is just to **start doing something**. After wasting time (in some respects) trying to find a known theorem that covers this conjecture, I simply cut the problem down as far as I could (at the risk of possibly taking too special a case and making it too easy), tried it out, and accidentally made a proof! Too often students are afraid to do anything until they know what to do; they need to learn that problem-solving doesn’t require knowing the end, but only the willingness to start! (It’s also good to have enough experience to be able to recognize whether any progress has been made. But the way to get that experience is to try things!)

As I said, what I’ve given is only the outline of a proof; the actual proof would be considerably harder to understand. But the hard part is seeing that a proof exists. The rest, as they say, is left as an exercise for the reader.

To which Jack replied,

Thank you for this detailed explanation. When I worked on it, I was strangely focused on how to get rid of the magnitude operator by squaring it and I got lost in that plan to conquer it. I also played with some simple cases, but sadly I somehow forgot to use the famous “triangle inequality”. I feel like that I was trapped by myself.

To convert your example using summation notation and make it more general is nothing difficult after realizing what we want to use. I absolutely love the way you show the proof. You are great at teaching. I really appreciate your help!

If I were a mathematician, writing for mathematicians, I would not feel finished until I had written up a complete, watertight proof covering all cases. (Well, actually I’d trust them to fill in a lot of details.) But Jack’s mention of teaching is an important reminder: Our goal here is to teach, and in teaching, we need to present solutions in the way they are discovered, rather than in a final form; we need to show the way by example, rather than presenting intimidatingly perfect proofs. So I’m glad this came out the way it did.

As I was finishing up this post, Doctor Rick said the following in discussing a textbook proof with another student:

Regarding math “coming out of nowhere”, … the issue, I believe, is that many textbooks – particularly older textbooks (like 100 years ago), but some modern texts as well – tend to present math in terms of proofs and solutions in polished form, and this does not help students learn to develop proofs and solutions on their own!

In reality, the process of solving a difficult problem is frequently very messy, with false starts and floundering before one finally gets an idea that will lead somewhere promising. It requires “playing around” with a problem – trying an example before tackling a general proof, making up a similar problem with smaller numbers, etc. This sort of activity is what we at The Math Doctors, and at Ask Dr. Math before us, try to model. We will work with a student on a problem, encouraging him or her to experiment, make guesses and try them out, learn to recognize when an approach is leading nowhere and when to keep trying. It’s actually a positive thing when we don’t know the answer ourselves, so we have to flounder along with the student!

In the present case, I wasn’t really working with the “student”, but I was definitely trying to model how we think, as well as leaving room for him to polish it if he wishes. And I definitely didn’t know the answer when I started.

]]>