Running head: QUESTIONING THE END EFFECT
Word count: 11,417
Questioning the End Effect:
Endings Do Not Inherently Have a Disproportionate Impact on Evaluations of Experiences
Stephanie Tully
New York University
Tom Meyvis
New York University
Author Note
Stephanie Tully is a doctoral candidate in marketing at the Stern School of Business, New York
University. Tom Meyvis is Professor of Marketing and Peter Drucker Faculty Fellow at the Stern School
of Business, New York University. Correspondence concerning this article should be addressed to
Stephanie Tully, Stern School of Business, 40 W. 4th Street, Ste 822, New York University, New York,
NY 10012. E-mail: stully@stern.nyu.edu.
Electronic copy available at: http://ssrn.com/abstract=2498663
Abstract
The present research re-examines one of the most basic findings regarding the evaluation
of hedonic experiences: the end effect. The end effect suggests that people’s retrospective
evaluations of an experience are disproportionately influenced by the final moments of the
experience. The findings in this paper indicate that endings are not inherently over-weighted in
retrospective evaluations. That is, episodes do not disproportionately affect the evaluation of an
experience simply because they occur at the end. We replicate prior demonstrations of the end
effect, but provide additional evidence implicating other processes as driving factors of those
findings.
Keywords: retrospective evaluations, end effect, experiences
Questioning the End Effect:
Endings Do Not Inherently Have a Disproportionate Impact on Evaluations of Experiences
“Did you like the concert?” “How much did you enjoy that restaurant?” “How painful
was this medical procedure?” To answer common questions such as these, people need to
evaluate the experiences they live through. Since these evaluations in turn influence people’s
willingness to recommend or repeat an experience (e.g., Wirtz et al., 2003), it is essential to
understand how people form such retrospective evaluations of their past experiences. The current
research re-examines one of the most basic findings in this area: the end effect. The end effect
refers to the fact that people’s retrospective evaluations are disproportionately influenced by the
final moments of the experience (e.g., Kahneman et al., 1993; Fredrickson & Kahneman, 1993).
While there are many prior demonstrations of the end effect, previous research has also
documented several notable boundary conditions. In the current work, we do not explore such
boundary conditions, but instead revisit the basic effect, focusing on a simple, continuous
experience to test if the end of an experience does indeed inherently receive disproportionately
more weight. While we acknowledge that endings can have a disproportionate impact on
retrospective evaluations, our findings suggest that this is not due to an inherent over-emphasis
of the final moments of an experience, but rather because of specific additional properties of the
end in certain settings.
Prior research has proposed that, when retrospectively evaluating an experience, people
do not add or integrate their reactions across the experience, but rather recall the most
representative moments of the experience and then evaluate the experience based on these
selected moments (e.g., Ariely & Carmon, 2000; Kahneman, 2000a; Kahneman, 2000b; Varey &
Kahneman 1992). Furthermore, the most representative moments of an experience tend to consist
of the most extreme moment (the peak) and the final moment (the end). Thus, according to this
evaluation-by-moments principle, the peak and the end of the experience will disproportionately
affect the global evaluation of the experience (Kahneman, 2000a). The over-weighting of the end
of the experience, in particular, has received substantial attention and has led to a variety of
recommendations to restructure experiences to take advantage of this effect, including to
optimize customer experiences (Cusick, 2012; Shaw, Dibeehi, & Walden, 2010), to understand
American’s sentiment about the economy (Surowiecki, 2002), and to improve personal
happiness and well-being (Conniff, 2006).
The end effect has empirically been demonstrated across a variety of domains and using a
variety of procedures (e.g., Kahneman et al., 1993; Fredrickson & Kahneman, 1993; Redelmeier
& Kahneman, 1996; Ariely, 1998). Much of the empirical support for the end effect is based on
the analysis of online (moment-to-moment) ratings of affective experiences. More specifically,
previous research has demonstrated that the online rating of the end of the experience is often a
disproportionately effective predictor of the retrospective evaluation of the entire experience.
This has been shown for a wide range of stimuli including medical procedures (Redelmeier &
Kahneman, 1996), painful pressure from a vise (Ariely, 1998), annoying noises (Ariely &
Zauberman, 2000; Schreiber & Kahneman, 2000), advertisements (Baumgartner, Sujan &
Padgett, 1997), and television shows (Hui, Meyvis, & Assael, 2014).
Additional evidence for the end effect comes from studies documenting the effect of
“adding a better end.” Participants in those studies show an irrational preference for negative
experiences with an additional period of reduced discomfort over the same experience without
the “better” (i.e., less aversive) end. For instance, Schreiber & Kahneman (2000) asked
participants to listen to a series of annoying noises and observed that participants preferred a
longer sound profile with a less intense ending to a shorter sound profile that was identical but
lacked the additional, less aversive ending. As an example, participants preferred a sound profile
that consisted of 8 seconds of noise at 78 decibels followed by 16 seconds of noise at 66 decibels
over a sound profile that consisted only of 8 seconds of noise at 78 decibels. This beneficial
effect of adding a better (less aversive) ending has also been observed with other experiences,
such as submerging one’s hands in ice water (Kahneman et al., 1993), undergoing a colonoscopy
(Redelmeier, Katz, & Kahneman, 2003), and judgments of hypothetical pain profiles (Varey &
Kahneman, 1992).
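The arithmetic behind these "better end" designs can be made concrete with a toy peak-end calculation. The sketch below rests on our own assumptions (stylized discomfort profiles and a simple average of peak and end); it is an illustration of the logic, not the published model:

```python
import numpy as np

def peak_end(profile):
    """Peak-end summary: average of the most intense moment and the final moment."""
    peak = profile[np.argmax(np.abs(profile))]
    return (peak + profile[-1]) / 2.0

# Stylized per-second discomfort ratings (higher = worse); hypothetical values.
short = np.array([2, 5, 8])               # ends at its most intense moment
extended = np.array([2, 5, 8, 4, 3])      # same profile plus a milder tail

# The extended profile contains strictly more total discomfort (22 vs. 15),
# yet its peak-end summary is better (5.5 vs. 8.0), matching the observed
# preference for aversive experiences with an added, less aversive ending.
print(peak_end(short), peak_end(extended))  # 8.0 5.5
```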
Finally, the end effect has also received support from studies that systematically
manipulated the order in which different components of the experience were presented, and
generally found that participants reacted most favorably to experiences in which the best part
was positioned at the end. For instance, Ariely and Zauberman (2000) observed that participants
rated an annoying sound profile as more aversive when the most intense sound was positioned at
the end of the profile rather than at the beginning or the middle.
However, in spite of these many demonstrations of the end effect, prior research has also
documented several important boundary conditions. First, the end of an experience does not have
a disproportionate impact when that experience is expected to continue in the future (i.e., when
the end is seen as temporary). For instance, when participants in a social interaction expected to
further interact with the other person in the future, the most recent interaction did not receive
additional weight in the global evaluation of the personal relationship (Fredrickson, 1991).
Similarly, when people evaluated a series of aversive pictures that they had viewed and
anticipated seeing again in the near future, the peak, but not the end, dominated the evaluation of
that experience (Branigan et al., 1997). Second, the presence of the end effect also depends on
the type and structure of the experience. Breaking up simple experiences into segments
attenuates the end effect, whereas complex experiences consisting of qualitatively distinct
components often fail to show any end effect at all. For instance, segmenting aversive sounds
into discrete parts reduced the end effect (Ariely & Zauberman, 2000) and no end effect was
observed in evaluations of activities over the course of a day (Miron-Shatz, 2009), evaluations of
vacations (Kemp, Burt, & Furneaux, 2008), or evaluations of meals (Rode, Rozin, & Durlach,
2007). Finally, the end effect does not appear to be a basic evolutionary trait shared with other
animals as it does not extend to food sequence preferences of rhesus macaque monkeys (Xu,
Knight, & Kralik, 2011).
In sum, prior research includes both ample demonstrations of the end effect as well as
many studies documenting boundary conditions. What does this imply for the status of the end
effect? One possibility is that the end does inherently have a disproportionate influence, but
specific conditions can activate other processes that interfere with (or compensate for) the effect.
However, another possibility is that the end does not inherently have a disproportionate impact.
In that case, the prior demonstrations of the end effect may be driven by other mechanisms, with
the boundary conditions merely reflecting the absence of those mechanisms. Closer inspection of
the prior demonstrations of the end effect provides some initial support for this second
possibility. First, consider the prior demonstrations that adding a better end to an aversive
experience improves the overall evaluation of that experience. Although adding a better end does
indeed manipulate the end of that experience, it also reduces the average intensity of the
experience. Therefore, the improvement in the overall evaluation of the experience could be
driven by the change in average intensity, rather than the over-weighting of the end. As such,
these findings are more accurately classified as demonstrations of duration neglect, rather than
demonstrations of an end effect. Second, consider the finding that the online (i.e., moment-to-moment) rating of the final moments of an experience is a particularly good predictor of the
overall evaluation (relative to ratings of other parts of the experience). While this finding is
consistent with the overweighting of the end, it would also occur in the absence of an end effect
if participants’ online ratings incorporate information from past as well as current moments of
the experience. This would result in the final ratings being more informed than the initial ratings,
and therefore correlating more strongly with the overall evaluation. Moreover, providing explicit
online ratings may artificially enhance the salience of the final rating, leading participants to use
it as an anchor in their global evaluation. Finally, the finding that experiences are evaluated more
favorably when the best part is positioned at the end (rather than elsewhere in the experience) is
based on studies that systematically varied the position of the different components of the
experience within-subjects. Asking participants to evaluate multiple experiences that were
identical in all respects except for the order of the components likely encouraged participants to
rely on that order in their evaluations, as it was the only aspect that varied. As such, this
procedure may have led participants to rely on their lay beliefs about how experiences should
be optimally structured (e.g., “This was identical to the previous experience, with the exception that
the best part now came at the end. Ending on a high note is good, so I like that experience
more.”).
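This alternative account—final online ratings predict better simply because they integrate more of the experience—is easy to demonstrate in simulation. In the sketch below, every moment receives equal weight by construction, yet the last online rating still correlates far more strongly with the overall evaluation than the first one does. The setup and data are entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n_experiences, n_moments = 2000, 10

# Momentary utilities: i.i.d. draws, so no moment is privileged by construction.
u = rng.normal(0.0, 1.0, (n_experiences, n_moments))

# Online ratings that incorporate all moments experienced so far (running mean),
# and an overall evaluation that weights every moment equally.
online = np.cumsum(u, axis=1) / np.arange(1, n_moments + 1)
overall = u.mean(axis=1)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

r_first = corr(online[:, 0], overall)   # first rating: weak predictor
r_last = corr(online[:, -1], overall)   # last rating: near-perfect predictor
```

Here the final rating equals the equal-weighted overall evaluation, so it predicts it almost perfectly even though no end effect exists in the data-generating process.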
It should be noted that, even if endings do not inherently have a disproportionate impact,
this does not preclude that, under specific circumstances, they can in fact have a greater impact
than other parts of the experience. This would for instance be the case if the structure of the
experience is made salient and consumers rely on their lay beliefs about the desirability of
favorable endings (as may be the case in the within-subject designs mentioned earlier). Similarly,
past research has observed a strong impact of the end of an experience when that end is
particularly meaningful, as is the case with goal-directed experiences (Carmon & Kahneman,
1996), where endings determine whether a goal is met, and television shows (Hui et al., 2014),
where endings serve as meaningful conclusions of a storyline. However, in those cases, the end
does not have a disproportionate impact merely because it is the end, but rather because it has
additional properties that increase its significance relative to the rest of the experience.
Overview of the Current Research
In this paper, we focus on the inherent impact of the end, and test whether merely being
the end of an experience is sufficient for disproportionately affecting overall evaluations. To do
so, we do not examine novel situations, nor the complex experiences that have been previously
identified as boundaries of the effect. Instead, we examine the type of basic experience that has
traditionally been used in studies that have provided support for the effect: listening to short
fragments of simple auditory stimuli. In a first study, we observe that aversive sounds with either
a better beginning or a better ending are not rated differently, even when participants clearly
recall the ending as better or worse than the rest of the experience. The remaining studies
reconcile this lack of a discernable end effect with previous demonstrations of the effect. Studies
2 and 3 demonstrate that changing the ending does affect evaluations when the end changes the
experience’s overall average, but not when the average is unaffected. Next, in studies 4 and 5,
which use a repeated measures design, we observe that, while endings are not over-weighted in
evaluations of the first experience, they are over-weighted in evaluations of a subsequent
experience. That is, moving a distinct part of the experience to the end versus the beginning of an
experience only affects people’s evaluations when they can readily observe this shift (being the
only difference between both experiences). Finally, in a field study (Study 6), we examine the
relationship between the overall evaluation of an experience and ratings of distinct components
of the experience—and fail to observe any increased impact of the final rating.
Study 1: A Better Beginning versus a Better End
In this first study, we re-examine the end effect using a simple stimulus (an aversive
noise), which is unlike the complex stimuli from the boundary condition studies, but similar to
the stimuli used in the classic demonstrations of the effect (e.g., Ariely & Zauberman, 2000;
Schreiber & Kahneman, 2000). Our study did, however, differ from those demonstrations in that
it systematically manipulated the structure of the experience, both between participants and
without changing the average intensity of the experience. To achieve this, we presented
participants with one of two sound profiles, which were the inverse of each other, so that they
were identical in total volume but one sound clip began loudly and ended quietly, whereas the
other began quietly and ended loudly. Since the sound was an aversive noise, this implies that
some participants experienced a better (less aversive) ending, whereas others experienced a
worse ending. If the end of the experience has a greater impact on global evaluations than other
parts of the experience, then the sound clip with the better ending should be rated as less aversive
than the sound clip with the worse ending.
Method
Three hundred and three Mechanical Turk participants completed the study online in
exchange for monetary compensation. Participants were told they would be listening to a few
short irritating sounds. They were asked to listen to the sounds using their headphones, and told
that they would need to identify the sounds they heard later in the experiment (to ensure that
participants indeed listened to the sounds). Participants first listened to a sound clip of a dot
matrix printer and were asked to use this sound to calibrate the volume of their headphones.
Next, all participants listened to a short drill sound, and indicated how annoying the sound was
on a 9-point scale (1 = not annoying at all, 9 = very annoying). This measure was included to be
used as a covariate in the analyses and thus reduce error variance due to differences across
participants in headphone volume or in their general aversion to annoying sounds.
Participants then listened to one of two sound clips, depending on condition. Both clips
consisted of 24 seconds of vacuum cleaner sound. One clip (Better End condition) started at a
high volume which was sustained for 6 seconds, after which it gradually reduced in volume for
the remaining 18 seconds, resulting in a relatively quiet (i.e., less aversive) ending. The other clip
(Worse End condition) was identical, but reversed in time. That is, it started quietly and
increased in volume for the next 18 seconds, ending with 6 seconds of sustained high volume
noise. See Figure 1 for a visual depiction of the sound profiles.
Figure 1. Visual depiction of sound profiles used in Studies 1 and 4. The height of the waveform
represents the volume of the sound. Time is represented on the horizontal axis in seconds.
[Waveform panels for the Better End and Worse End profiles not reproduced.]
After listening to the sound clip, participants rated how annoying, unpleasant, and
irritating it was to listen to the clip (all on 9-point scales anchored by: 1 = not at all, 9 = very).
Further, to ensure that participants in the different conditions indeed noticed the difference in the
volume of the ending, participants were asked to indicate how the end of the sound clip
compared to the rest of the sound clip (9-point scale: -4 = end was much worse, 4 = end was
much better). Next, to verify that participants indeed listened to the sound clip, they were asked
to select the sound they listened to from three options (an ambulance, a car alarm, and a
vacuum). We then asked participants whether they had adjusted the volume of their headphones
at any point while listening to the sound clips. Finally, we collected demographic information.
Results
Six people failed to correctly recognize the sound clip and are thus excluded from the
analysis, leaving a sample of 297 participants (MAge = 32.3, SD = 11.12; 63.3% male).
Manipulation check. To verify that the manipulation of the ending was successful, we
first analyzed participants’ perception of how the end compared to the rest of the clip. As
intended, participants in the Better End condition rated the end of the clip as relatively better (M
= 1.93, SD = 1.64) than did participants in the Worse End condition (M = -2.47, SD = 1.94), F(1,
295) = 273.67, p < .001, ηp2 = 0.481.
Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were
standardized and combined to form an aversiveness index (α = .94). To test the end effect, we
then analyzed this index—while adjusting for the covariate (i.e., the aversiveness of the drill
sound) to increase the power of this test. If the final moments of an experience indeed have an
inherently disproportionate impact, participants in the Better End condition should rate their
listening experience as less aversive than those in the Worse End condition. However, the two
conditions did not substantially differ in the perceived aversiveness of the experience (M Better End
= 6.69, SD = 1.71; M Worse End = 6.96, SD = 1.81), F < 1, ηp2 = .002.[1]
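The analysis reported here—a standardized multi-item index compared across conditions while adjusting for a covariate—can be sketched as follows. The data below are simulated, and the variable names and data-generating process are our own assumptions, not the study's materials:

```python
import numpy as np

# Simulated stand-in for the Study 1 measures (hypothetical data).
rng = np.random.default_rng(0)
n = 297
cond = rng.integers(0, 2, n).astype(float)    # 0 = Better End, 1 = Worse End
drill = rng.integers(1, 10, n).astype(float)  # covariate: drill-sound rating (1-9)
base = 0.4 * drill + rng.normal(0.0, 1.0, n)
items = np.column_stack([base + rng.normal(0.0, 0.5, n) for _ in range(3)])

def cronbach_alpha(x):
    """Cronbach's alpha for an items matrix (rows = respondents)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

# Standardize each item, then average into a single aversiveness index.
z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
index = z.mean(axis=1)

def ancova_f(y, group, covariate):
    """F statistic for the group effect, adjusting for a covariate (one-way ANCOVA)."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid
    ones = np.ones_like(y)
    full = np.column_stack([ones, group, covariate])
    reduced = np.column_stack([ones, covariate])
    df_error = len(y) - full.shape[1]
    return (rss(reduced) - rss(full)) / (rss(full) / df_error)

alpha = cronbach_alpha(items)
F = ancova_f(index, cond, drill)
```

Since the simulated condition assignment is unrelated to the ratings, the resulting F is small, mirroring the null result reported above.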
Discussion
Study 1 tested the end effect using a simple stimulus that was unlike the stimuli in the
boundary condition studies (e.g., complex experiences, experiences that are expected to
continue), but similar to the stimuli used in prior demonstrations of the end effect. Yet, in spite of
this, we did not observe an end effect: placing the better (less aversive) part of the sound at the
beginning versus the end did not substantially affect participants’ evaluation of the experience.
This null effect is quite informative given the large sample size and given that participants
readily recalled the ending as better or worse (consistent with the manipulation). In the following
studies, we will provide additional tests of the end effect, as well as attempt to reconcile previous
demonstrations with the absence of an effect in this study.
Study 2: A Better Average versus a Better End
In the second study, we revisit previous demonstrations that extending an aversive
experience with a less aversive (but still negative) ending tends to improve the overall evaluation
of the experience. Although this finding is consistent with an over-weighting of the end of the
experience, it could also be due to a decrease in the average intensity of the experience. To
distinguish between these two accounts, we exposed participants to one of three sound clips of an
irritating noise: (1) a clip with a softer (and thus better) middle section (Better Middle), (2) a clip
with a softer ending (Better End), or (3) a clip with a softer middle section and an additional
softer ending (Added End). The Better Middle and Better End clips had an identical average
volume and only differed in the timing of the softer section. The Added End clip consisted of the
Better Middle clip with an additional, softer extension of the noise.
[1] The means adjusted for the covariate: M Better End = 6.70, M Worse End = 6.97.
Thus, the Better Middle and Better End clips differed in the aversiveness of the ending,
but not in the average intensity of the experience, whereas the Added End clip differed from both
other clips in the average intensity of the experience. If the end effect holds and endings are
inherently over-weighted, then the Better End and Added End experiences should both be
perceived as less aversive than the Better Middle experience. However, if adding a less aversive
ending improves evaluations because it reduces the average intensity of the experience (and not
because endings are over-weighted), then the Added End experience should be perceived as less
aversive than both the Better Middle and Better End experiences, and there should be no
difference between the perceived aversiveness in the Better Middle and Better End conditions.
Method
Two hundred and sixty undergraduate students participated in the study for either partial
course credit or monetary compensation.
Participants were seated at a desktop computer and asked to wear headphones, the
volume of which was fixed and approximately equal across computers. All participants first
listened to a short drill sound, and rated their irritation with the sound on a 101-point sliding
scale (0 = not at all irritating, 100 = very irritating). As in Study 1, this measure was included to
be used as a covariate in the analyses and thus reduce error variance due to individual differences
in aversion to annoying sounds. Next, participants completed a short filler task before continuing
with the main study.
Participants were then asked to listen to the sound of a vacuum cleaner. They listened to
one of three sound profiles, depending on condition. All three sound profiles consisted of a
vacuum noise that fluctuated in volume between low and moderately high. In the Better Middle
condition, the clip contained a 30-second low-volume segment in the middle of the clip. Both of
the other conditions were based on the Better Middle condition, but in the Better End condition,
the low-volume segment was moved to the end of the clip (instead of the middle), and in the
Added End condition, an additional 30-second low-volume segment was added to the end of the
experience (together with a 5-second transition, resulting in a total clip time of 170 seconds).
Thus, the sound clips in the Better Middle and Better End conditions differed in ending, but not
in average volume, whereas the sound clip in the Added End condition differed in average
volume from the clips in both other conditions. See Figure 2 for a visual depiction of the sound
profiles.
Figure 2. Visual depiction of sound profiles used in Study 2. The height of the waveform
represents the volume of the sound. Time is represented on the horizontal axis in seconds.
[Waveform panels for the Better End, Better Middle, and Added End profiles not reproduced.]
After participants listened to the clip, they rated the extent to which they found the
experience of listening to the sound annoying (9-point scale: 1 = mildly annoying, 9 = extremely
annoying), unpleasant (9-point scale: 1 = mildly unpleasant, 9 = very unpleasant), or irritating
(measured on the same scale as the covariate: a 101-point slider scale anchored by: 0 = mildly
irritating, 100 = extremely irritating).
After the primary dependent measures were collected, participants were asked to again
listen to the drill sound that they listened to at the start of the study, and then indicated whether
this experience was more or less irritating than listening to the vacuum sound (9-point scale, 1 =
much less irritating, 9 = much more irritating). Participants then rated the volume of the vacuum
sound (1 = very quiet, 9 = very loud). Next, participants indicated how much money, out of $10,
they would give back to avoid repeating the experience, and how long (in seconds) they believed
the experience lasted. These four additional measures were included to test whether, if the end
effect would again not obtain in scale measures of the subjective experience, it might instead
manifest in alternative measures: a relative preference measure (which avoids scaling effects), an
evaluation of the objective experience (volume), valuation, or a downstream effect (on time
perception).
To verify that participants had noted the volume at the end of the clip, they were asked to
indicate how the end of the experience compared to the rest of the experience (by selecting one
of three options: the end was quieter, the end was about the same, the end was louder).
Finally, participants provided demographic information and completed an Instructional
Manipulation Check (Oppenheimer, Meyvis, & Davidenko, 2009), which consisted of a
paragraph of text explaining the importance of reading instructions and asking participants to
choose “none of the above” from a marital status dropdown list.
Results
Thirty-five people failed the Instructional Manipulation Check, leaving a sample of 224
participants (MAge = 20.2, SD = 2.17; 44.2% male).
Manipulation check. Participants were more likely to indicate that the end was quieter
than the rest of the sound clip in the Better End condition (P = 60.1%) than in the Better Middle
condition (P = 31.5%), χ2 (1) = 12.82, p < .001, indicating that the manipulation of the ending
was successful. Participants in the Added End condition were also more likely to indicate that the
end was quieter (P = 45.3%) than were participants in the Better Middle condition, but this effect
was only marginally significant, χ2 (1) = 2.95, p = .086 (possibly because this clip was longer and
therefore the perception of the end extended beyond the final low-volume segment).
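The χ² comparison above can be reproduced in outline. The per-condition cell counts are not reported in this excerpt, so the counts below are hypothetical values chosen only to roughly match the reported proportions:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 counts (~60% vs. ~32% answering "the end was quieter");
# 75 respondents per condition is an assumption, not a reported figure.
table = [[45, 30],   # Better End:    quieter, not quieter
         [24, 51]]   # Better Middle: quieter, not quieter

# correction=False yields the uncorrected Pearson chi-square statistic.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
```

With these assumed counts, the statistic comes out near the reported value, and the association is significant at p < .001.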
Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were
standardized and combined to form an aversiveness index (α = .93). As in Study 1, we analyzed
this index while controlling for the aversiveness covariate (the rating of the drill sound at the
start of the study) to increase the power of the tests. First, we tested the end effect by comparing
the Better Middle and Better End conditions, which differed in ending, but not in average
intensity. A planned contrast showed that the experience was not perceived as less aversive in the
Better Middle condition (M = 0.00, SD = 0.92) than in the Better End condition (M = 0.11, SD =
0.96), F < 1, ηp2 < 0.001. Thus, as in Study 1, we again did not observe an end effect. Next, we
tested whether adding a better end (rather than moving the better part to the end) changes the
perceived aversiveness of the experience, by comparing the Added End condition to the other
two conditions, both of which had a higher average intensity. A planned contrast confirmed that
the experience was perceived as less aversive in the Added End condition (M = -0.10, SD = 0.94)
than in the other two conditions, F(1, 220) = 4.43, p = .036, ηp2 = 0.020 (means adjusted for the
covariate: M Added End = -0.16, M Better End = 0.06, and M Better Middle = 0.10). Thus, while we again
did not replicate the end effect, we did replicate the prior finding that adding a less aversive
ending to a negative experience reduces the overall aversiveness of the experience (in spite of
adding negative utility).
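A planned contrast of the kind used here—one condition against the average of the others, tested against the pooled ANOVA error term—can be sketched as follows. The three small groups are deterministic toy data, not the study's observations:

```python
import numpy as np
from scipy import stats

def planned_contrast(samples, weights):
    """t test of a linear contrast on independent group means, using pooled MSE."""
    means = np.array([s.mean() for s in samples])
    ns = np.array([len(s) for s in samples])
    sse = sum(((s - s.mean()) ** 2).sum() for s in samples)
    df = ns.sum() - len(samples)          # error degrees of freedom
    mse = sse / df                        # pooled error variance
    estimate = (weights * means).sum()
    se = np.sqrt(mse * (weights ** 2 / ns).sum())
    t = estimate / se
    p = 2 * stats.t.sf(abs(t), df)
    return t, p

# Toy groups standing in for Better Middle, Better End, and Added End.
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([0.0, 1.0, 2.0])]

# Contrast: Added End vs. the average of the other two conditions.
w = np.array([-0.5, -0.5, 1.0])
t, p = planned_contrast(groups, w)
```

The contrast weights sum to zero, so the test isolates the Added End condition's deviation from the mean of the other two, which is how the reported F(1, 220) comparisons are constructed (an F with one numerator df equals the squared t).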
Other measures. Similar to the aversiveness index, the additional measures did not show
any difference between the Better Middle and Better End conditions. The position of the low-volume segment did not affect the relative preference over the drill sound (F < 1), perceived
volume (F < 1), willingness to pay to avoid the experience (F < 1), or the perceived duration of
the clip (F(2, 220) = 1.63, NS).
The contrast comparing the Added End condition to the other two conditions also did not
show any reliable difference for relative preference over the drill sound (F(1, 220) = 1.18, NS) or
perceived volume (F(1, 220) = 1.88, NS). However, participants in the Added End condition
were willing to pay less to avoid the experience (M = $0.78, SD = 1.92) than were participants in
the Better End condition (M = $1.56, SD = 2.52) or Better Middle condition (M = $1.60, SD =
3.37), F(1, 220) = 5.25, p = .023, ηp2 = 0.023, consistent with the earlier finding that adding a
better end reduced the perceived aversiveness of the experience. Finally, participants in the
Added End condition also provided higher estimates of clip duration (M = 155 secs, SD = 90.71)
than those in the Better End condition (M = 116 secs, SD = 76.30) or the Better Middle condition
(M = 101.09 secs, SD = 50.56), F(1, 220) = 18.56, p < .001, ηp2 = 0.078, which was consistent with
the actual longer duration of the clip in that condition.
Discussion
Moving the less aversive part of an irritating noise to the end versus the middle did not
affect the perceived aversiveness of the experience, casting further doubt on the existence of an
inherent end effect. However, extending the irritating noise with an additional, less aversive part
did lead participants to perceive the overall experience as less aversive. Thus, Study 2 replicates
prior findings of the beneficial effects of “adding a better end,” but also indicates that this effect
is driven by a lowering of the average rather than a disproportionate impact of the end. In the
next study, we conceptually replicate this finding in the positive domain.
Study 3: Adding a Worse Middle versus a Worse End
In Study 2, we examined the effect of adding a less aversive (i.e., better) segment to an
aversive experience. In this next study, we examine the effect of adding a less enjoyable (i.e.,
worse) segment to an enjoyable experience: listening to pleasant music clips. Furthermore,
unlike in Study 2, we now vary whether the segment is added to the middle of the experience or
to the end of the experience. In other words, we compare the effect of adding a worse middle to
the effect of adding a worse end. If evaluations of the experience are disproportionately based on
the end of the experience, then adding a worse end should have a greater (negative) effect than
adding a worse middle. However, if evaluations are based on the average intensity of the
experience rather than the end of the experience, then adding a worse segment should have a
similar (negative) effect, regardless of whether it is added to the end or to the middle.
Method
Pretest. For the main study, we constructed three different music clips: one music clip
consisting of four enjoyable pieces of instrumental music (for the control condition) and two
music clips consisting of those same four enjoyable pieces of music and one additional, less
enjoyable piece of instrumental music (either inserted in the middle or at the end). To select
these music fragments, we first pretested a wide range of instrumental music fragments using a
sample of 121 participants drawn from the same population as used for the main study
(Mechanical Turk). Each participant listened to a selection of ten 30-second clips of instrumental music (out of a total set of 30 clips) and rated each clip on a 9-point scale. Based on this pretest,
we selected four clips that were enjoyed by most participants, namely 30-second fragments from
“Herd Reunion” (from the Ice Age: Continental Drift Soundtrack, M = 6.84, SD = 1.91), “Heart
Song” (performed by Gosha Mataradze, M = 6.29, SD = 2.09), Bach’s “Goldberg Variations” (M = 6.38, SD = 1.55), and Mozart’s “Rondo Alla Turca” (M = 6.05, SD = 2.03). We also selected one
sound clip that was significantly less enjoyable than each of the four other clips: “Reanimator”
(performed by Amon Tobin, M = 4.71, SD = 2.12), all t’s(79) > 2.92, p’s < .002. To further
ensure that this last clip was clearly less enjoyable than the others, we increased its repetitiveness by extending it to 45 seconds and applied a minor pitch shift.
Main study. Nine hundred and twelve Mechanical Turk participants completed the study
online in exchange for monetary compensation.
Analogous to the previous studies, we first obtained a measure of participants’ propensity
to like instrumental music, to be used as a covariate in the analysis (and thus increase the power
of our tests). Specifically, participants first listened to a 10-second instrumental music clip (a
segment from “On the Right Track,” performed by Zhanna Hamilton) and indicated how much
they enjoyed listening to the clip on a 9-point scale (1 = not at all, 9 = very much). Participants
were then told that they would next listen to a music compilation. They were reminded that the
study was on the enjoyment of music and to simply sit back, relax, and listen to the music
compilation which would be the length of a short song. Participants then heard one of three
music clips, depending on condition. In the Control condition, the music clip consisted of the
four enjoyable 30-second fragments identified in the pretest. The fragments were combined into
one 148-second clip by gradually phasing out of one fragment and into the next (thus preserving
the unity of the experience). In the two experimental conditions, the clips consisted of the
Control condition clip with the addition of the less enjoyable fragment identified in the pretest.
This fragment was either inserted in the middle of the clip (Worse Middle condition) or at the
end of the clip (Worse End condition). The order of the four enjoyable fragments was
counterbalanced. Next, participants indicated how much they enjoyed listening to the clip on the
same 9-point scale as used for the covariate measure.
As manipulation checks, participants were asked to indicate how the middle compared to
the rest of the clip (-4 = middle was much worse, 4 = middle was much better) and how the end
compared to the rest of the clip (-4 = end was much worse, 4 = end was much better).
Participants then listened to a 10-second version of the less enjoyable fragment and were asked
to categorize this fragment as either pleasant, neither pleasant nor unpleasant, or unpleasant.
Finally, to verify that participants had indeed listened to the music compilation, we asked them
to listen to three short music fragments and to identify which one of these fragments had been
played as part of the music compilation.
Results
Twenty-eight people failed to recognize the fragment used in the compilation and are
thus excluded from all analyses, leaving 884 participants (MAge = 29.4, SD = 9.64; 65.3% male).
Manipulation checks. The majority of participants rated the less enjoyable fragment as
either unpleasant (35.7%) or neither pleasant nor unpleasant (35.4%), confirming that this
fragment was not particularly enjoyable, as intended. More important, participants in the
experimental conditions reported that the middle section (or the end section, depending on where
this less enjoyable fragment was placed) was indeed relatively less enjoyable, indicating that the
manipulation was successful. Specifically, participants in the Worse End condition rated the end
as worse than the rest of the compilation (M = -1.27, SD = 2.20), compared to participants in the
Worse Middle condition (M = 1.22, SD = 2.06), F(1, 880) = 197.11, p < .001, and those in the
Control condition (M = 0.79, SD = 1.99), F(1, 880) = 125.13, p < .001. In addition, participants
in the Worse Middle condition rated the middle as worse than the rest of the compilation (M = -0.43, SD = 2.09), compared to participants in the Worse End condition (M = 1.02, SD = 1.96),
F(1, 880) = 79.78, p < .001, and those in the Control condition (M = 0.50, SD = 1.99), F(1, 880)
= 30.54, p < .001.
Enjoyment of the experience. Similar to the previous studies, the analysis of the
enjoyment measure was adjusted for the covariate (i.e., the enjoyment of the clip at the start of
the study) to increase the power of the test. However, once again, we did not obtain any evidence
of an end effect. Participants did not enjoy the clip less when the worse fragment was placed at
the end of the clip (M = 6.50, SD = 1.65) rather than in the middle of the clip (M = 6.58, SD =
1.69), F < 1, ηp2 = 0.001. However, adding the worse fragment (regardless of its position) did
decrease the enjoyment of the clip relative to the Control condition (M = 6.76, SD = 1.56), F(1,
880) = 4.37, p = .037, ηp2 = 0.005. This conceptually replicates the results of Study 2 and
suggests that participants’ enjoyment relied on the average of the experience rather than the
ending.
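The covariate-adjusted between-groups comparison used throughout these studies amounts to an ANCOVA; a minimal sketch on simulated data follows (all variable names, sample sizes, and parameter values below are illustrative assumptions, not the actual data or analysis code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): baseline liking of instrumental music
# (the covariate) and enjoyment of the compilation, for three conditions.
n = 300
group = rng.integers(0, 3, n)            # 0=Control, 1=Worse Middle, 2=Worse End
covariate = rng.normal(6.5, 1.5, n)
effect = np.array([0.0, -0.2, -0.2])     # worse fragment lowers the average
enjoyment = 2.0 + 0.4 * covariate + effect[group] + rng.normal(0, 1.2, n)

def ols_rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Reduced model: intercept + covariate. Full model adds condition dummies.
intercept = np.ones(n)
d1 = (group == 1).astype(float)
d2 = (group == 2).astype(float)
X_reduced = np.column_stack([intercept, covariate])
X_full = np.column_stack([intercept, covariate, d1, d2])

rss_r, rss_f = ols_rss(X_reduced, enjoyment), ols_rss(X_full, enjoyment)
df1, df2 = 2, n - X_full.shape[1]        # 2 extra parameters; error df
F = ((rss_r - rss_f) / df1) / (rss_f / df2)
eta_p2 = (rss_r - rss_f) / rss_r         # partial eta squared for condition
print(f"F({df1}, {df2}) = {F:.2f}, partial eta^2 = {eta_p2:.3f}")
```

Including a highly correlated covariate shrinks the error term, which is why it increases the power of the condition test without changing the condition means.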
Discussion
The results of Study 3 replicate those of Study 2 in the positive domain, and provide
additional evidence that the effect of adding a less intense ending on the overall evaluation is
driven by changes in average intensity, rather than over-weighting of the end. Adding a less
enjoyable music fragment reduced overall enjoyment of the music compilation, but it did not
matter whether this fragment was inserted at the end or in the middle of the experience. The fact
that the positioning of the less enjoyable fragment did not affect overall evaluations provided
particularly compelling evidence against the existence of a substantial, inherent end effect, given
3 The means adjusted for the covariate: M Control = 6.76, M Worse End = 6.51, M Worse Middle = 6.58.
that (1) the manipulation check showed that participants in the respective conditions could
clearly identify the end (or middle) as less enjoyable than the rest of the experience, (2) adding
the less enjoyable fragment did affect overall evaluations, and (3) the test of the end effect in this
study was particularly powerful given the large sample size and the use of a highly correlated
covariate (r = .43). In fact, the procedure of this study allowed for the detection of a small effect
(Cohen’s f2 = 0.01) with a probability of 90.8%.
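A sensitivity analysis of this kind can be sketched by simulation. The sketch below assumes alpha = .05 and the common noncentrality convention lambda = f² × N; these are our assumptions, so the estimate illustrates the computation but need not reproduce the exact 90.8% figure reported above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo power estimate for an F(1, 880) test of a small effect.
f2 = 0.01                    # Cohen's f^2 for a "small" effect
N = 884                      # analyzed sample size in Study 3
df1, df2 = 1, 880            # degrees of freedom of the reported test
alpha = 0.05
lam = f2 * N                 # assumed noncentrality parameter, lambda = f^2 * N

draws = 200_000
# Critical value of the central F distribution, also estimated by simulation
# to keep the sketch free of dependencies beyond NumPy.
f_central = (rng.chisquare(df1, draws) / df1) / (rng.chisquare(df2, draws) / df2)
crit = np.quantile(f_central, 1 - alpha)

# Under the alternative, the numerator is noncentral chi-square; for df1 = 1
# this is (Z + sqrt(lambda))^2.
numerator = (rng.normal(0.0, 1.0, draws) + np.sqrt(lam)) ** 2
f_noncentral = (numerator / df1) / (rng.chisquare(df2, draws) / df2)
power = float((f_noncentral > crit).mean())
print(f"critical F = {crit:.2f}, approximate power = {power:.2f}")
```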
In other words, studies 2 and 3 suggest that the often documented “adding a better end”
effect should be interpreted solely as a demonstration of duration neglect, rather than evidence
for the over-weighting of the end. However, the end effect has also received support from studies
using other paradigms. In particular, studies that have systematically manipulated the structure of
experiences have provided more direct evidence of the end effect (e.g., Ariely, 1998; Ariely &
Zauberman, 2000). These studies have commonly found that experiences with a better (or less
aversive) ending are evaluated more favorably (or less negatively) than experiences with a better
beginning or a better middle, even when the average intensity is held constant. However, as
mentioned earlier, these studies tend to employ within-subject designs, which expose each
participant to anywhere between 8 and 64 different experiences. Since these experiences are
identical except for their structure (e.g., a loud noise that ends softly versus the same noise that
starts softly), this structure would be particularly salient for participants, who may infer that they
need to use this structure in their evaluations. As such, end effects observed in these studies may
reflect people’s lay beliefs that it is better to end on a high note (or preferences for improving
sequences, Loewenstein & Prelec, 1993), rather than a spontaneous reaction to an experience that
ends well versus poorly. Even if participants are not relying on lay beliefs, the increased salience
of experience structure in within-subject designs could still be a requirement for end effects to
manifest (suggesting that the end is not inherently over-weighted). The last two studies were
designed to test whether endings are indeed over-weighted in the context of repeated
experiences, but not when experiences are judged in isolation.
Study 4: Single versus Repeated Negative Experiences
To examine whether exposure to repeated experiences (with variations in structure) leads
people to increase their evaluations of experiences that end well, we used a repeated measures
design. Specifically, participants were asked to listen to two aversive sounds that were identical,
but reversed in sequence, such that one sound ended well and one sound ended poorly. The order
of the sounds was manipulated between conditions. Consistent with the lack of an end effect in
the first studies, we expected that the difference in structure would not affect participants’ rating
of the first sound they heard: participants will rate the experience as equally aversive, regardless
of whether they were assigned to a noise that ends well or to a noise that ends poorly. However,
consistent with prior demonstrations of end effects in within-subject designs, we expected that
the difference in structure would affect the rating of the second sound: after listening to a noise
that ends well, participants will rate a noise that ends poorly as more aversive (and vice versa).
Method
Two hundred and four Mechanical Turk participants completed the study online in
exchange for monetary compensation.
The procedure was similar to that of Study 1. All participants first listened to the printer
sound clip (to calibrate the volume) and then rated their irritation with the drill sound clip (to be
used as covariate). Participants then listened to one of two versions of the main stimulus: 24
seconds of vacuum cleaner noise. As in Study 1, the two sound clips were identical but reversed,
so that one clip started with 6 seconds of high volume noise, followed by 18 seconds that
gradually tapered off in volume (Better End), whereas the other clip started quietly and ended at
a high volume (Worse End). See Figure 1for a visual depiction of the sound profiles.
a high volume (Worse End). See Figure 1 for a visual depiction of the sound profiles. Immediately after listening to the sound clip, participants rated how annoying, unpleasant, and
irritating it was to listen to the clip (on 9-point scales: 1 = not at all, 9 = very).
Unlike in Study 1, participants next listened to the other clip (i.e., those who listened to
the Better End clip then listened to the Worse End clip and vice versa), and rated that clip as
well. After rating the second sound clip, participants were asked to indicate which of the two
sound clips they would choose if they had to listen to one of the clips again (9-point scale: -4 = I
would definitely choose the first clip, 0 = No preference, 4 = I would definitely choose the
second clip). As a manipulation check, we next asked participants, for each clip, how the end of
the experience compared to the rest of the experience (9-point scale: -4 = End was much worse, 4
= End was much better). To verify that participants had indeed listened to the clips, we then
asked them to select the sound they listened to from three options: an ambulance, a car alarm,
and a vacuum. Finally, we asked participants whether they had adjusted the volume of their
headphones at any point, before collecting demographic information.
Results
Three people failed to recognize the sound clip used in the study and are thus excluded
from the analysis, leaving a sample of 201 participants (MAge = 32.8, SD = 11.12; 63.7% male).
Manipulation checks. The end of the first clip was rated as significantly better by
participants who listened to the Better End clip first (M = 6.73, SD = 1.92) than by participants
who listened to the Worse End clip first (M = 3.15, SD = 2.15), F(1, 199) = 155.06, p < .001.
Similarly, the end of the second clip was rated as significantly better by participants who listened
to the Better End clip second (M = 7.16, SD = 2.02) than by participants who listened to the
Worse End clip second (M = 3.17, SD = 2.06), F(1, 199) = 191.86, p < .001. These results
confirm that the manipulation of the structure of the experience was successful.
Perceived aversiveness. The measures of annoyance, unpleasantness, and irritation were
standardized and combined to form an aversiveness index for each sound clip (α clip 1 = .95, α clip 2
= .96). As in the previous studies, the between-subjects analysis of this index was adjusted for
the covariate (the irritation with the drill sound) to increase the power of those tests. Consistent
with prior demonstrations of the end effect, the within-subject analysis showed that participants
perceived their Better End experience as less aversive than their Worse End experience, F(1,
199) = 73.45, p < .001, ηp2 = 0.270. Although this effect is quite sizeable, the between-subjects
analysis (i.e., the comparison of the two order conditions) adds an important nuance to the
interpretation of this effect. Indeed, consistent with our previous studies, the perceived
aversiveness of the first clip did not differ between participants who listened to the Better End
clip first (M = 6.56, SD = 1.90) and those who listened to the Worse End clip first (M = 6.69, SD
= 1.82), F < 1, ηp2 < .001. It was only for the second clip that participants who listened to the
Better End clip reported less aversiveness (M = 5.88, SD = 1.98) than those who listened to the
Worse End clip (M = 7.42, SD = 1.60), F(1, 198) = 53.44, p < .001, ηp2 = .212. These results are
graphically depicted in Figure 3. Thus, although the within-subject analysis replicated previous
demonstrations of the end effect, the effect once again failed to obtain when participants
evaluated a single experience.
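Forming the standardized aversiveness index and checking its reliability, as described above, can be sketched as follows (simulated ratings, not the actual data; Cronbach's alpha implemented from its definition):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative ratings of one clip on the three 9-point items
# (annoying, unpleasant, irritating); simulated around a shared latent
# aversiveness so the items are highly correlated, as in the studies.
n = 200
latent = rng.normal(6.5, 1.5, n)
items = np.column_stack([latent + rng.normal(0, 0.5, n) for _ in range(3)])

def cronbach_alpha(X):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

alpha = cronbach_alpha(items)

# Standardize each item, then average into a single aversiveness index.
z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
index = z.mean(axis=1)

print(f"alpha = {alpha:.2f}")
```

With items this strongly correlated, alpha lands in the mid-.90s, comparable to the reliabilities reported for the aversiveness index.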
Figure 3. Perceived Aversiveness of Sound Clips by Condition (Study 4)
[Bar chart: perceived aversiveness (9-point scale) of the first and second sound clips in the Worse End and Better End conditions. Error bars denote standard errors.]
4 The means adjusted for the covariate, Clip 1: M Worse End = 6.59, M Better End = 6.66; Clip 2: M Worse End = 7.45, M Better End = 5.85.
Preference. Whether participants preferred to repeat the first or the second clip depended
on which clip they listened to first, F(1, 199) = 140.26, p < .001. Participants who listened to the Better End clip followed by the Worse End clip preferred listening to the first clip again (M = -0.63, SD = 2.59), t(98) = -2.44, p = .016, whereas participants who listened to the clips in the opposite order preferred listening to the second clip again (M = 3.36, SD = 2.59), t(101) = 14.08,
p < .001. These results indicate that participants consciously preferred a noise that ended well
over an equivalent noise that started well.
Discussion
At first glance, the results of this study provide strong evidence for an end effect,
consistent with prior research. When participants listened to a noise that started loudly but ended
better (Better End) and one that started better but ended loudly (Worse End), they rated the noise
that ended better as less aversive, and strongly preferred that noise to the one with a worse
ending. Yet, the between-subjects analysis reveals that this advantage for the better ending
experience only emerges after participants have been exposed to multiple experiences (that are
identical to each other except for their structure). Indeed, for the first sound clip, the end effect is
as conspicuously absent in this study as it was in the previous studies: participants rated the first
clip as equally aversive, regardless of whether it ended well or poorly. The end effect only
emerged for the second sound clip, which was identical to the first sound clip except for its
structure. Thus, in this study, the end effect only emerged when the repetition of experiences
made the structure of the experience salient, indicating that although people do not
spontaneously over-weight the end of an experience, they may do so when they are encouraged
to base their evaluations on differences in structure. Note that it is not sufficient that participants
note that the end of the experience is clearly better or worse than the rest of the experience (as
our manipulation checks indicate this is generally the case in our studies), but rather that
differences in structure are made salient as a criterion for evaluation.
Study 5: Single versus Repeated Positive Experiences
Study 5 aimed to conceptually replicate Study 4 with positive stimuli. We used pleasant
music compilations that varied in the position of a less enjoyable segment (as in Study 3) and
presented all participants with both versions, in counterbalanced order (as in Study 4). Similar to
the results of Study 4, we expected that the position of the less enjoyable segment would not
affect participants’ rating of the first music compilation, but would affect the rating of the second
compilation: after listening to a music compilation with a mediocre middle (ending), participants
will rate a clip with a mediocre ending (middle) as less (more) enjoyable.
Method
Five hundred and two Mechanical Turk participants completed the study online in
exchange for monetary compensation.
As in Study 3, participants first listened to a 10-second instrumental music clip (“On the
Right Track”) and rated their enjoyment on a 9-point scale (1 = not at all, 9 = very much), to be
used as a covariate in the analysis. Next, participants read that they would listen to two music
compilations. Both music compilations were composed of three of the five fragments used in
Study 3: two of the very enjoyable fragments (“Herd Reunion” and “Heart Song”) as well as the
less enjoyable fragment (“Reanimator”). The fragments lasted thirty seconds each and were
tapered and integrated to create a more continuous experience, resulting in a music compilation
of 80 seconds. The two compilations differed only in the position of the less enjoyable fragment:
it was either positioned in the middle (Worse Middle) or at the end (Worse End). The order in
which participants heard each compilation was counterbalanced: half of participants heard the
Worse Middle clip first, while the other half heard the Worse End clip first.
After each compilation, participants were asked to indicate how enjoyable and pleasant it
was to listen to the music, both on 9-point scales (1 = not at all, 9 = very much). Similar to Study
2, we also added a relative preference measure after the primary measures: participants were
asked to indicate how much they enjoyed listening to the experience relative to listening to music
on the radio (9-point scale: -4 = much less than listening to the radio, 4 = much more than
listening to the radio). After participants completed these measures for the second music
compilation, they were asked to indicate which of the two music experiences they enjoyed more
(9-point scale: -4 = definitely the first experience, 4 = definitely the second experience).
As a manipulation check, we next asked participants to indicate, for each music
compilation, how the middle compared to the rest of the compilation (9-point scale: -4 = middle
was much worse, 4 = middle was much better) and how the end compared to the rest of the
compilation (9-point scale: -4 = end was much worse, 4 = end was much better). Participants
then listened to a 10-second version of the less enjoyable fragment and were asked to categorize
this fragment as either pleasant, neither pleasant nor unpleasant, or unpleasant. Finally, to verify
that participants had indeed listened to the music compilation, we asked them to listen to three
short music fragments and to identify the fragment that was part of the music compilation.
Results
Twelve people failed to recognize the song used in the compilation and are thus excluded
from all analyses, leaving 490 participants (MAge = 21.8, SD = 10.3; 57.8% male).
Manipulation checks. The majority of participants rated the less enjoyable fragment as
either unpleasant (66.3%) or neither pleasant nor unpleasant (22.2%), indicating that it was not
particularly enjoyable. More important, the manipulation of the placement of this fragment
within the music clip had the intended effect on participants’ perceptions. This was true for
ratings of the first music clip: participants who listened to the Worse Middle clip first rated the
middle of the clip as worse and the end of the clip as better (MMiddle = -1.37, SD = 2.36; MEnd =
1.39, SD = 2.09) than did participants who listened to the Worse End clip first (MMiddle = 0.81,
SD = 2.32; MEnd = -1.00, SD = 2.53), FMiddle(1, 488) = 106.89, p < .001; FEnd(1, 488) = 129.57, p
< .001. This was also true for ratings of the second music clip: participants who listened to the
Worse Middle clip second rated the middle of the clip as worse and the end of the clip as better
(MMiddle = -1.20, SD = 2.44; MEnd = 1.87, SD = 1.93) than did participants who listened to the
Worse End clip second (MMiddle = 1.41, SD = 1.97; MEnd = -1.41, SD = 2.49), FMiddle(1, 488) =
170.02, p < .001; FEnd(1, 488) = 265.86, p < .001. In short, the manipulation was successful:
participants perceived the middle of the Worse Middle clip and the end of the Worse End clip as
relatively less enjoyable.
Enjoyment. The measures of enjoyment and pleasantness were averaged to form an
enjoyment index (α clip 1 = .95, α clip 2 = .94). The between-subjects analysis of this index was
again adjusted for the covariate (the enjoyment of the clip at the start of the study) to increase the
power of those tests. As in Study 4, the within-subject analysis of this index is consistent with
prior demonstrations of the end effect: participants rated their Worse End experience as less
enjoyable than their Worse Middle experience, F(1, 488) = 15.52, p < .001, ηp2 = 0.031.
However, the between-subjects analysis again adds an important nuance to the interpretation of
this result. Consistent with the absence of an end effect in our prior studies, the enjoyment of the
first music clip did not differ between participants who listened to the Worse End clip (M = 6.28,
SD = 1.61) and those who listened to the Worse Middle clip (M = 6.20, SD = 1.64), F < 1, ηp2 <
0.001. Mirroring the results of Study 4, it was only for the second music clip that participants
who listened to the Worse End clip rated their experience as less enjoyable (M = 5.97, SD =
1.63) than participants who listened to the Worse Middle clip (M = 6.27, SD = 1.61), F(1, 487) =
5.10, p = .024, ηp2 = 0.010. These results are graphically depicted in Figure 4.
5 The means adjusted for the covariate, Clip 1: M Worse End = 6.21, M Worse Middle = 6.27; Clip 2: M Worse End = 5.96, M Worse Middle = 6.27.
Figure 4. Enjoyment of Sound Clips by Condition (Study 5)
[Bar chart: enjoyment (9-point scale) of the first and second music clips in the Worse End and Worse Middle conditions. Error bars denote standard errors.]
Other measures. We next analyzed participants’ relative preference between listening to
the clip and listening to a song on the radio. Consistent with the enjoyment index (and with prior
demonstrations of the end effect), a within-subjects analysis of this measure indicated that participants showed a greater preference for listening to the music clip (rather than the radio) when rating the Worse Middle clip (M = 5.48, SD = 2.17) than when rating the Worse End clip (M = 5.35, SD = 2.16), F(1, 488) = 8.17, p = .004, ηp2 = 0.016. However, the between-subjects
analysis of these relative preference ratings did not show any reliable difference between people
who listened to the Worse Middle clip and those who listened to the Worse End clip, neither for
the first clip, F(1, 486) = 2.18, NS, nor for the second clip, F < 1. Thus, similar to the analysis of
the enjoyment index, we did not observe an end effect for the first clip, but unlike for the
enjoyment index, we also did not observe an end effect for the second clip, suggesting that this
particular measure may not be sufficiently sensitive to provide a strong test of the end effect.
Finally, participants’ stated preference between sound clips 1 and 2 showed that they
were more likely to prefer the second clip over the first one when that second clip was the Worse
Middle clip (M = 1.14, SD = 2.69) rather than the Worse End clip (M = 0.11, SD = 2.70), F(1,
488) = 17.64, p < .001. Thus, participants showed a conscious preference for music with a poor
middle over music that ends poorly.
Discussion
Study 5 conceptually replicated the effect of Study 4 with positive experiences.
Consistent with prior research, participants reported enjoying the same music compilation less
when the less enjoyable segment appeared at the end, rather than in the middle. However, this
finding only held when participants were asked to directly compare the two arrangements, either
implicitly (when they were asked to evaluate the second clip after evaluating a clip that was
identical except for the position of the less enjoyable segment), or explicitly (when asked which
of the two clips they preferred). When participants simply listened to and rated the first music
compilation, their enjoyment was completely unaffected by the position of the less enjoyable
segment—even though participants could clearly tell that the middle (or end) of the clip was
worse than the rest of the clip, as revealed by the manipulation check measures for the first clip.
This is consistent with the absence of an end effect observed in the previous four studies, and suggests that people do not spontaneously overweight the end of an experience. Instead, the structure of the experience has to be made salient as a possibly relevant evaluation criterion (e.g., by exposure to repeated experiences that differ only in structure, holding the average constant).
Study 6: Relating Overall Evaluations to Ratings of the End of the Experience
So far, we have addressed two sources of support for the end effect in prior research:
demonstrations of the positive effect of “adding a better end” and the more favorable evaluations
of experiences that end well in within-subject designs. However, as we mentioned in the
introduction, there is a third type of prior support for the end effect. Specifically, several studies
have demonstrated that, when overall evaluations are regressed on moment-to-moment ratings of
the experience, the rating of the final moments of the experience is a particularly effective
predictor of the overall evaluation. Yet, as discussed earlier, this does not necessarily imply that
the final moments are being over-weighted. If moment-to-moment ratings do not only reflect
people’s isolated reaction to the current moment, but are also influenced by past moments, then
final ratings would be more effective predictors because they incorporate more information.
In Study 6, we aimed to examine this issue by focusing on an experience with distinct
components that can be evaluated separately, thus reducing any possible confusion or
contamination by prior impressions (as may be the case with a continuous noise). Specifically,
we used field study data from participants in an obstacle course fun run. After the run,
participants were asked to rate their satisfaction with the race in addition to rating each
individual obstacle, as well as providing an overall rating of the obstacles. If participants’
impression of the experience was disproportionately affected by the end, then one would expect
that the rating of the final obstacle would be a better predictor of participants’ satisfaction than
the ratings of the other obstacles. Further, one would expect that, when controlling for the overall
rating of the obstacles, the rating of the final obstacle would improve the prediction of
participants’ satisfaction with the race.
Method
Seven hundred and fifty participants in an obstacle course fun run completed the study
online in the days following the completion of the race.
Participants completed a fun run consisting of 12 large obstacles. The night following the
race, participants received an email from the race company which included a link to a race
evaluation survey. Participants first indicated their satisfaction with the race on a 10-point scale
(1 = not satisfied, 10 = very satisfied). Later in the survey, participants provided their overall
evaluation of all the obstacles in the run on a 10-point scale (1 = lame, 10 = awesome). Next,
they rated each individual obstacle they completed on a five-point scale (1 = lame, 5 =
awesome). Other items measured in this survey, not relevant to the current research, are available
upon request.
Results and Discussion
We first regressed participants’ satisfaction with the race on the ratings of each of the
twelve obstacles. Although the final obstacle was a significant predictor (β = 0.24, t(737) = 6.61,
p < .001), out of the eleven other obstacles, nine were better predictors of participants’
satisfaction than the rating of the final obstacle (see Table 1).
Table 1. Results of Separate Regressions of Satisfaction with the Race on the Rating of each Obstacle (in Chronological Order).

Obstacle Order    β        Obstacle Order    β        Obstacle Order    β
      1         0.294            5         0.278            9         0.257
      2         0.292            6         0.181           10         0.241
      3         0.293            7         0.289           11         0.252
      4         0.209            8         0.238           12         0.235

Note: All betas are reliably different from 0 (all t’s(737) > 5.03, p’s < .001).
As an alternative test of the special status of the final event, we also regressed
participants’ satisfaction with the race on both the overall rating of the obstacles and the
individual rating of the final obstacle. The overall rating of the obstacles significantly predicted
satisfaction with the race, β = 0.55, t(747) = 17.01, p < .001. However, once the overall rating of
the obstacles was taken into account, the rating of the final obstacle did not contribute
significantly to the prediction of overall satisfaction with the race, β = 0.05, t(747) = 1.54, NS.
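The incremental test reported here, i.e., whether the final obstacle's rating improves prediction beyond the overall rating, can be sketched as a nested-model F-test on simulated data (all values below are illustrative assumptions, not the field data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data in the spirit of Study 6: the overall obstacle rating
# drives satisfaction, and the final obstacle's rating is correlated with
# it but adds nothing beyond it.
n = 750
overall = rng.normal(8.0, 1.2, n)                       # 10-point overall rating
final_obstacle = 0.6 * overall + rng.normal(0, 1.0, n)  # correlated final-obstacle rating
satisfaction = 1.0 + 0.9 * overall + rng.normal(0, 1.5, n)

def incremental_F(y, X_reduced, X_full):
    """F-test for the extra predictors in the full (nested) model."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r
    rss_r, rss_f = rss(X_reduced), rss(X_full)
    df1 = X_full.shape[1] - X_reduced.shape[1]
    df2 = len(y) - X_full.shape[1]
    return ((rss_r - rss_f) / df1) / (rss_f / df2), df1, df2

ones = np.ones(n)
X_r = np.column_stack([ones, overall])                   # overall rating only
X_f = np.column_stack([ones, overall, final_obstacle])   # + final obstacle
F, df1, df2 = incremental_F(satisfaction, X_r, X_f)
print(f"incremental F({df1}, {df2}) = {F:.2f}")
```

Under an end effect, the final obstacle should carry incremental predictive weight in such a test; the null result in the field data is what speaks against over-weighting of the end.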
These results suggest that, when people can cleanly separate the different components of
an experience, then the rating of the final component is not a privileged determinant of the
overall evaluation. Of course, since this study was a field experiment, it suffered from several
limitations, the most important of which is that the order of the obstacles was not
counterbalanced. We therefore cannot rule out that an idiosyncratic property of the final obstacle may have reduced its relationship with satisfaction and thus counteracted the end effect.
General Discussion
Prior research has argued that evaluations of experiences are disproportionately
influenced by the final moments of the experience, since endings have a privileged status as a
prototypical moment of the experience. Although past research has documented several notable
boundary conditions in which this effect does not obtain, there exists an impressive body of
evidence supporting the existence of an end effect for simple, continuous experiences. In this
paper, we did not set out to identify additional boundary conditions, but rather re-examined the
basic end effect, starting with the type of experience that was used in the initial demonstrations
of the effect: a simple, short, meaningless, continuous, aversive sensation (listening to an
irritating noise). Yet, in spite of meeting those conditions, this experience did not produce an end
effect in our studies: the noise was not rated as more aversive when the loudest part was placed
at the end rather than in the beginning (Studies 1 and 4) and was not rated as less aversive when
a softer section was placed at the end rather than in the middle (Study 2). Other studies with a
positive experience also failed to document an end effect: listening to a short music compilation
was not rated as less enjoyable when a weaker music segment was placed at the end of the
compilation rather than in the middle (Studies 3 and 5). These null effects obtained even though
participants could readily recall whether the end or middle of the experience was particularly
good or bad, and even though the studies were properly powered to detect a small effect with a
reasonable probability. Finally, results from a correlational study further corroborate the pattern
observed in the experimental studies. In a large-scale field study with obstacle race participants,
we failed to observe a privileged relationship between the rating of the last obstacle of the race
and the overall satisfaction with the race (Study 6). As such, these results question the
assumption that the final moments of an experience have an inherent, substantial advantage in
determining the overall evaluation of the experience.
Although each of our studies documented a failure to obtain the end effect, our results are
not inconsistent with past demonstrations. Specifically, consistent with prior research, we found
that extending an experience with a less intense ending results in less extreme global evaluations
of that experience. However, adding the less intense segment in the middle rather than at the end
produced the same results. Thus, our findings indicate that the effect of adding a less intense
ending is driven by a reduction in the average intensity of the experience rather than a
disproportionate impact of the ending. Similarly, consistent with prior research, we observed that
when each participant evaluated multiple experiences that only differed in the ordering of its
components, participants did prefer experiences that ended well over experiences that ended
poorly. Yet, the structure of the experience did not affect the evaluation of the first experience
participants encountered. Therefore, our results suggest that people do not spontaneously assign
greater weight to the ending, but instead rely on the structure of the experience when it is a
salient basis for evaluation because it is the only aspect that differs between experiences.
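The average-intensity account lends itself to simple arithmetic. The per-segment aversiveness values below are hypothetical, chosen only to illustrate why appending a milder segment at the end and inserting it in the middle produce the same drop in mean intensity:

```python
# Hypothetical per-segment aversiveness ratings (illustrative values only).
original = [8, 8, 8]            # the unextended experience
extended_end = [8, 8, 8, 4]     # milder segment appended at the end
extended_middle = [8, 8, 4, 8]  # same milder segment inserted in the middle

def mean(xs):
    return sum(xs) / len(xs)

# Both extensions lower the average intensity by the same amount, so an
# evaluator averaging over segments rates them equally less aversive.
assert mean(original) == 8.0
assert mean(extended_end) == mean(extended_middle) == 7.0
```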
Our experimental studies thus empirically addressed two types of prior support for the
end effect: the fact that extending an experience with a less intense ending weakens the overall
evaluation, and the fact that people prefer experiences that end well over equivalent experiences
that end poorly. Additionally, in Study 6, we addressed a third type of empirical support for the
end effect: the fact that the moment-to-moment rating of the end of an experience can be a
particularly good predictor of the global evaluation of that experience. We have proposed that
this privileged relationship may be explained by mechanisms other than the over-weighting of
the end. For instance, if moment-to-moment ratings are also influenced by past moments in the
experience, then final ratings would be more effective predictors because they incorporate more
information. Alternatively, explicit final moment-to-moment ratings may simply serve as salient
anchors for the immediately subsequent overall evaluation of the experience. In Study 6, we
examined an experience that consisted of easily identifiable parts (thus reducing confusion and
contamination of the ratings) and we asked participants to rate those parts after the overall
evaluation (thus avoiding anchoring of the overall evaluation on the final rating). Under these
circumstances, we did not observe any privileged relationship between the rating of the final part
of the experience and participants’ overall satisfaction with the experience—which is consistent
with our alternative interpretations of those prior findings.
Although we propose, based on our results, that endings are not inherently over-weighted
in retrospective evaluations of experiences, this certainly does not imply that endings cannot
have a disproportionate impact when additional conditions are fulfilled. As Studies 4 and 5
already indicate, when differences in structure are highly salient, people may rely on their lay
beliefs about the desirability of good endings and prefer experiences with better endings over
other, equivalent experiences. Moreover, we can identify at least two other circumstances under
which the final moments of experiences are likely over-weighted.
First, when the last part of an experience is particularly meaningful, and colors the
perception of everything that preceded it, we would naturally expect it to disproportionately
impact the overall evaluation. For instance, evaluations of goal-directed experiences may be
particularly affected by the end of the experience (Carmon & Kahneman, 1996) since the end
often determines whether the goal has been met (and thus whether the preceding effort was in
vain or not). Similar to goal-directed experiences, the end may also be particularly meaningful
(and influential) for narrative experiences, such as watching television shows (Hui et al., 2014),
since the end of an episode often provides some type of resolution. The evaluation of a murder
mystery strongly depends on how the mystery is being resolved, just as the evaluation of a
romantic comedy depends on whether the couple ends up together, and the evaluation of a
baseball game depends on which team wins.
Aside from being particularly meaningful, endings can also have a disproportionate
impact through a second mechanism: a recency effect. Specifically, for experiences that are long
and varied (e.g., a year-long trip around the world), people may simply be unable to remember
many parts of the experience due to memory constraints. In that case, the overall evaluation may
be disproportionately influenced by the beginning and end of the experience since research on
list memorization finds that these components are recalled more easily than items in the middle
(Ebbinghaus, 1913). For instance, the observation of an end effect for hypothetical experiences
presented in list format has been attributed to such recency effects due to memory constraints
(Montgomery & Unnava, 2009).
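A memory-constrained evaluator of this kind can be sketched in a few lines; the moment-by-moment enjoyment values are hypothetical and serve only to show how serial-position-limited recall shifts the evaluation toward the beginning and end:

```python
# Hypothetical moment-by-moment enjoyment of a long, varied experience.
moments = [3, 7, 5, 2, 8, 6, 4, 9]

# A full-memory evaluator averages every moment.
full_memory = sum(moments) / len(moments)

# A memory-constrained evaluator recalls mainly the first and last parts
# (the serial-position pattern), so those moments dominate the evaluation.
recalled = [moments[0], moments[-1]]
memory_limited = sum(recalled) / len(recalled)
```

Here the memory-limited evaluation (6.0) exceeds the full average (5.5) only because this particular sequence happens to end on a high note, which is how a recency mechanism can mimic an end effect.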
It should be noted that, for many experiences, the end does not benefit from either
recency effects or being particularly meaningful—in which case we would not expect the end to
have a disproportionate impact on evaluations. Even TV shows or narratives do not always offer
meaningful endings that provide a resolution. For instance, in contrast to a romantic comedy, the
ending of a nature documentary may not be more meaningful than what preceded it. Similarly,
for the experiences studied in this paper, neither the final seconds of the noise nor the final
fragment of the music compilation were more meaningful than the rest of the experience. The
same holds for the situations identified as boundary conditions of the effect: the last part of a
meal (Rode, Rozin, & Durlach, 2007), the final activity over the course of a day (Miron-Shatz,
2009), and the last moments of a vacation (Kemp, Burt, & Furneaux, 2008) do not commonly
convey any special meaning.
In sum, the current research cautions against the common recommendation to restructure
experiences to end on a high note. Although this improved ending may disproportionately impact
evaluations in specific cases, our studies suggest that this would not occur merely because it is
the ending. That is, rather than positing the existence of an inherent end effect that is disabled in
specific circumstances (i.e., those identified by the boundary condition studies), we propose that
it is more accurate to state that there is no inherent end effect, but that endings can have a
disproportionate impact on evaluations through other processes under specific circumstances.
References
Ariely, D. (1998). Combining experiences over time: The effects of duration, intensity changes
and on-line measurements on retrospective pain evaluations. Journal of Behavioral
Decision Making, 11, 19-45.
Ariely, D., & Zauberman, G. (2000). On the Making of an Experience: The Effects of Breaking
and Combining Experiences on their Overall Evaluation. Journal of Behavioral Decision
Making, 13(2), 219-232.
Ariely, D., & Carmon, Z. (2000). Gestalt Characteristics of Experiences: The Defining Features
of Summarized Events. Journal of Behavioral Decision Making, 13, 191-201.
Baumgartner, H., Sujan, M., & Padgett, D. (1997). Patterns of Affective Reactions to
Advertisements: The Integration of Moment-to-Moment Responses. Journal of Marketing
Research, 34(2), 219-232.
Branigan, C., Moise, J., Fredrickson, B., & Kahneman, D. (1997). Peak (but not end) ANS
reactivity to aversive episodes predicts bracing for anticipated re-experience. Poster
presented at Society for Psychophysiological Research, Cape Cod, MA. Abstract
retrieved from https://www.sprweb.org/meeting/past_mtng/1997/97posters1.html.
Carmon, Z., & Kahneman, D. (1996). The Experienced Utility of Queuing: Experience Profiles
and Retrospective Evaluations of Simulated Queues. Retrieved from
http://faculty.insead.edu/carmon/pdffiles/The%20Experienced%20Utility%20of%20Queuing.pdf
Conniff, R. (2006). What Modern Science Can Teach You About Turning That Frown Upside
Down. Men's Health. January, 118-123.
Cusick, B. (2012). The Peak-End Rule: A way to improve every customer experience. Retail
Customer Experience. Newsletter, Networld Media Group, September 19, 2012.
Ebbinghaus, H. (1913). On memory: A contribution to experimental psychology. New York:
Teachers College.
Fredrickson, B. L. (1991). Anticipating endings: An explanation for selective social interaction
(Doctoral dissertation, Stanford University, 1990). Dissertation Abstracts.
Fredrickson, B. L., & Kahneman, D. (1993). Duration neglect in retrospective evaluations of
affective episodes. Journal of Personality and Social Psychology, 65, 45-55.
Hui, S. K., Meyvis, T., & Assael, H. (2014). Analyzing Moment-to-Moment Data Using a
Bayesian Functional Linear Model: Application to TV Show Pilot Testing. Marketing
Science, 33(2), 222-240.
Kahneman, D. (2000a). Evaluation by moments: Past and future. In D. Kahneman & A. Tversky
(Eds.), Choices, values, and frames (pp. 693-708). Cambridge: Cambridge University
Press.
Kahneman, D. (2000b). Experienced utility and objective happiness: A moment-based
approach. In D. Kahneman & A. Tversky (Eds.), Choices, values, and frames (pp. 673-692).
Cambridge: Cambridge University Press.
Kahneman, D., Fredrickson, B. L., Schreiber, C. A., & Redelmeier, D. A. (1993). When more
pain is preferred to less: Adding a better end. Psychological Science, 4, 401-405.
Kemp, S., Burt, C. D. B., & Furneaux L. (2008). A Test of the Peak-End Rule with Extended
Autobiographical Events. Memory & Cognition, 36, 132-138.
Loewenstein, G. F., & Prelec, D. (1993). Preferences for sequences of outcomes. Psychological
Review, 100(1), 91-108.
Miron-Shatz, T. (2009). Evaluating multi-episode events: boundary conditions for the peak-end
rule. Emotion, 9(2), 206-213.
Montgomery, N. V., & Unnava, H. R. (2009). Temporal sequence effects: A memory
framework. Journal of Consumer Research, 36(1), 83-92.
Oppenheimer, D. M., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks:
Detecting satisficing to increase statistical power. Journal of Experimental Social
Psychology, 45, 867–872.
Pine, J. B., & Gilmore, J. H. (1998). Welcome to the experience economy. Harvard Business
Review, 76, 97-105.
Redelmeier, D. A., & Kahneman, D. (1996). Patients' memories of painful medical treatments:
Real-time and retrospective evaluations of two minimally invasive procedures. Pain, 66,
3-8.
Redelmeier, D. A., Katz, J., & Kahneman, D. (2003). Memories of colonoscopy: a randomized
trial. Pain, 104(1), 187-194.
Rode, E., Rozin, P., & Durlach, P. (2007). Experienced and remembered pleasure for meals:
Duration neglect but minimal peak, end (recency) or primacy effects. Appetite, 49(1), 18–
29.
Schreiber, C. A., & Kahneman, D. (2000). Determinants of the remembered utility of aversive
sounds. Journal of Experimental Psychology: General, 129(1), 27-42.
Shaw, C., Dibeehi, Q., & Walden, S. (2010). Customer Experience: Future Trends and Insights.
Great Britain: Palgrave Macmillan. Retrieved August 29, 2014, from
http://books.google.com
Surowiecki, J. (2002, November 11). Boom and Gloom. The New Yorker. Retrieved August 29,
2014, from www.newyorker.com
Varey, C. A., & Kahneman, D. (1992). Experiences extended across time: Evaluation of moments
and episodes. Journal of Behavioral Decision Making, 5, 169-186.
Wirtz, D., Kruger, J., Scollon, C. N., & Diener, E. (2003). What to do on spring break? The role
of predicted, on-line and remembered experience in future choice. Psychological Science,
14, 520-524.
Xu, E. R., Knight, E. J., & Kralik, J. D. (2011). Rhesus monkeys lack a consistent peak-end
effect. The Quarterly Journal of Experimental Psychology, 64(12), 2301-2315.