How Relevant are Incidental Power Poses for HCI?

The concept of power pose originates from a Psychology study from 2010 which suggested that holding an expansive pose can change hormone levels and increase risk-taking behavior. Follow-up experiments suggested that expansive poses incidentally imposed by the design of an environment lead to more dishonest behaviors. While multiple replication attempts of the 2010 study failed, the follow-up experiments on incidental postures have so far not been replicated. As UI design in HCI can incidentally lead to expansive body postures, we attempted two conceptual replications: we first asked 44 participants to tap areas on a wall-sized display and measured their self-reported sense of power; we then asked 80 participants to play a game on a large touch-screen and measured risk-taking. Based on Bayesian analyses we find that incidental power poses had little to no effect on our measures but could cause physical discomfort. We conclude by discussing our findings in the context of theory-driven research in HCI.


INTRODUCTION
In 2010 Carney et al. asserted that "a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful [which] has real-world, actionable implications" [14] thereby coining the concept of power poses. Yap et al. later identified a set of behaviors such as increased risk-taking or cheating which they showed could be induced through incidental power poses, that is, expansive postures imposed by the design of the environment [65]. Interface and interaction design can also lead to expansive postures of users. Thus the range of scenarios in HCI which could potentially be influenced through explicit design of incidental postures is wide, ranging from decision making under risk such as control room interfaces in power plants, over education (e.g., Isbister and colleagues' work on game interfaces addressing math anxieties [37]), to engaging game design [7,4]. In a 2016 keynote, Bianchi-Berthouze [5] argued that the "affective body is underused in the design of interactive technology despite what it has to offer". Indeed, apart from some isolated studies, the potential relevance of incidental body postures as a design tool in HCI remains unclear.
However, incidental body postures may only be leveraged in HCI if they can be reliably elicited. In 2015, a large-scale replication project [20] re-opened the files on 100 published experiments and found that a considerable number of reported effects did not replicate, leading to the so-called "replication crisis" in Psychology. Neither the study by Carney et al. [14] nor the one by Yap et al. [65] was among the replicated studies, but multiple high-powered and pre-registered studies have since failed to establish a link between power poses and various behavioral measures [53,30,43,55,1,8,38,47,44]. While a Bayesian meta-analysis of six pre-registered studies [34] provides credible evidence for a small effect of power poses on self-reported felt power (d ≈ 0.2), the practical relevance of this small effect remains unclear [41].
It should be noted that all of the failed replications focused on explicitly elicited postures as studied by Carney et al. [14], that is, participants were explicitly instructed to take on a certain posture and afterwards were tested on various measures. Most relevant to HCI are, however, the experiments by Yap et al. [65] on incidental power poses which so far appear to have not been replicated or refuted. Thus it remains unclear whether these effects replicate in an HCI context, and we offer the following contributions with this article:
• We operationalize power poses as incidental body postures which can be brought about by interface and interaction design.
• We measure in a first experiment effects on self-reported felt power. Our results on their own are inconclusive as our data are consistent with a wide range of possible effect sizes, including zero.
• In a second experiment we measure behavioral effects on risk-taking behavior while playing a computer game. Results indicate that the manipulation of incidental body posture does not predict willingness to take risks.

BACKGROUND
In this section we clarify the terminology we use in this article, motivate our work through two scenarios, and summarize work on body posture in HCI as well as previous work in Psychology, including the recent controversies around power poses.

Postures versus Gestures
Our use of the terms posture and gesture is consistent with the definitions of the American Heritage Dictionary:
posture: position of a person's body or body parts
gesture: a motion of the limbs or body made to express or help express thought or to emphasize speech.
Accordingly, a gesture could be described as a dynamic succession of different hand, arm, or body postures. This article is mainly concerned with body postures as we are interested in features of postures "averaged" over the course of interaction, for example, the overall expansiveness of someone's posture during the use of a system.

Motivation
Within a classic desktop environment, that is, a desktop computer equipped with an external display, keyboard, and mouse, a user interface designer has little influence on a user's posture besides requiring or avoiding frequent changes between keyboard and mouse, or manipulating the mouse transfer function. As device form factors diversified, people now find themselves using computers in different environments such as small mobile phones or tablets while sitting, standing, walking, or lying down, or large touch sensitive surfaces while sitting or standing. Device form factors combined with interface design can thus impose postures on the user during interaction. For example, an interface requiring two-handed interaction on a small-screen device (phone, tablet, or laptop) requires that users bring together both hands and their gaze thereby leading to a constrictive incidental posture. On a large touchscreen interface, a UI designer can spread out elements which would require more reaching and lead to more expansive incidental postures (see Fig. 1B and D) or use techniques to bring elements closer together (e.g., [2]) which can make postures more constrictive (see Fig. 1A and C).
We now sketch two scenarios to illustrate how work on body posture from Psychology applies to HCI and why it is relevant for UI design guidelines to determine whether expansive and constrictive body postures during interface use can influence people's motivation, behavior, or emotions.

Education
Riskind and Gotay [54] reported that expansive postures led to a higher persistence when solving problems while people in constrictive postures gave up more easily. Within the area of interfaces for education purposes, say, in schools, it would be important to know whether learning environments designed for small tablet devices incur detrimental effects due to incidental constrictive postures during their use. Should this be the case then design guidelines for such use cases would need to be established recommending larger form factors combined with interfaces leading to more expansive postures.

Risky decision making
Yap et al. reported that driving in expansive car seats leads to riskier driving in a driving simulation [65]. Some professions require decision making under risk on a regular basis, such as air traffic controllers, power plant operators, or financial brokers. Should the interface designs used in such professions (e.g., see Fig. 2) have an influence on people's decisions, then it would be important to minimize such effects accordingly. However, currently we know neither whether such effects exist in these contexts nor how they would need to be counteracted, should they exist.

Body Posture in HCI
The role of the body in HCI has been receiving increased attention. Dourish [26] as well as Klemmer and colleagues [45] emphasize in a holistic manner the importance of the body for interaction design and also consider social factors. However, its role as a feedback channel to emotion and cognition has remained largely unstudied within HCI.
Body postures have received most attention in the context of affective game design. Bianchi-Berthouze and colleagues studied people's body movements during video-game play and how different gaming environments such as desktop or body-controlled games lead to different movement types [6,50]. Savva and colleagues studied how players' emotions can be automatically recognized from their body movements and be used as indicators of aesthetic experience [56], and Bianchi-Berthouze proposed a taxonomy of types of body movements to facilitate the design of engaging game experiences [4]. While this body of work also builds on posture work from Psychology, their interest is in understanding the link between players' body movement and affective experience, not on testing downstream effects of postures on behavior in an HCI context.
Isbister et al. [37] presented scoop!, a game using expansive body postures with the intention to overcome math anxiety in students. The focus of this work is on the system's motivation and description and does not include empirical data. Snibbe and Raffle [60] report on their use of body posture and gestures to imbue visitors of science museums with intended emotions. Only little empirical work on concrete effects directly related to the work in Psychology has been published so far. De Rooij and Jones [22] studied gesture pairs based on these ideas. Their work builds on the hypothesis that movements are related to approach and avoidance behaviors, and therefore inherently linked to emotion. They test the hypothesis through an application for creative tasks such as idea generation. In one variant of their application, users extend their arm to record ideas (avoidance gesture); in another variant, they move their arm towards their body (approach gesture). Results show that avoidance gestures lead to lower creativity and more negative emotion than approach gestures.
Two studies within Psychology made use of interactive devices to manipulate incidental postures: (1) Hurtienne and colleagues report in an abstract that sitting hunched or standing upright during the use of a touchscreen leads to different behaviors in a dictator game [36]. If participants were primed with "power concepts" they behaved in a more self-interested way in an upright posture; if they were primed with "moral concepts" the effect was reversed. (2) Bos and Cuddy published a tech report [9] on a study linking display size to willingness to wait. They asked participants to complete a series of tasks, and then let them wait in a room with the device they were using for the tasks (iPod, iPad, MacBook Pro, or iMac). The smaller the device, the longer participants waited and the less likely they were to go and look for the experimenter. As no details are given about participants' actions during the waiting time (such as playing around with the device or not) nor about the postures participants took on while using the devices, it is unclear whether this correlation can indeed be linked solely to the different display sizes. Further studies are required to determine causal effects due to differences in postures.

Effects of Body Posture on Thought and Behavior
In Psychology, body posture has been linked to a wide range of behavioral and affective effects [11,64,52,39,62]. We focus here only on those closely related to the expansive versus constrictive dyad of power poses.
In 1982, Riskind and Gotay [54] presented four experiments on the relation between physical posture and motivation to persist on tasks. They asked participants to take on either slumped or expansive, upright postures. The former group gave up much faster on a standardized test for learned helplessness than the latter group whereas both groups gave similar self-reports.
More recently, the popular self-help advice to take on a "power pose" before delivering a speech has been linked by multiple studies to increases in confidence, risk tolerance, and even testosterone levels [54,14,21]. Further, Yap and colleagues reported that expansiveness of postures can also affect people's honesty [65]. In contrast to Carney et al., the latter explicitly studied incidental postures, that is, postures imposed by the environment such as a small versus a large workspace or a narrow versus a spacious car seat. Their research suggests that expansive or constrictive postures which are only incidentally imposed by environments (thus allowing more variation between people's postures), can affect people's honesty: people interacting in workspaces that impose expansive postures are supposedly "more likely to steal money, cheat on a test, and commit traffic violations in a driving simulation" [65].

Recent Controversies
In 2015, Ranehill et al. [53] published a high-powered replication attempt of Carney et al. [14] contradicting the original paper. Carney and colleagues responded with an analysis of the differences between their study and the failed replication [15]. They aimed to identify potential moderators by comparing results from 33 studies, to provide alternative explanations for the failed replication. They indicate three variables which they believe most likely determine whether an experiment will detect the predicted effect: (i) whether participants were told a cover story (Carney) or the true purpose of the study (Ranehill), (ii) how long participants had to hold the postures, i.e., comfort, (Carney 2 x 1 min, Ranehill 2 x 3 min) and (iii) whether the study was placed in a social context, i.e., "either a social interaction with another person [...] during the posture manipulation or participants were engaging in a real or imagined social task" [15] (Carney yes, Ranehill no).
In 2016, Carney published a statement on her website where she makes "a number of methodological comments" regarding her 2010 article and expresses her updated belief that these effects are not real [13]. She went on to co-edit a special issue of a psychology journal containing seven pre-registered replication attempts of power pose experiments testing the above discussed moderators to provide "a 'final word' on the topic" [17]. All studies included the self-reported sense of power and one of the following behavioral measures: risk-taking (gambling), performance in mock job interviews, openness to persuasive messages, or self-concept content and size (number and quality of self descriptors). While none of the studies included in the special issue found evidence for behavioral effects, a Bayesian meta-analysis combining the individual results on felt power found a reliable small effect (d ≈ 0.2) [34].
Outside of Psychology the methodology of power pose studies was criticized by statisticians such as Andrew Gelman and Kaiser Fung who argued that most of the published findings on power poses stem from low-powered studies and were likely due to statistical noise [31]. Other statisticians analyzed the evidence base collected by Carney et al. [15] using a method called p-curve analysis [59] whose purpose is to analyze the strength of evidence for an effect while correcting for publication bias. Their analyses "conclusively reject the null hypothesis that the sample of existing studies examines a detectable effect" [58,57].

OBJECTIVES OF THIS ARTICLE
At this point it seems credible that at least some of the initially reported effects of power poses are nonexistent.
Claims related to hormone changes have been definitively refuted [53,55], and none of the recent replications was able to detect a reliable effect on the tested behavioral measures [41].
Nonetheless, a small effect on felt power seems credible [34]. It is however still unclear whether "this effect is a methodological artifact or meaningful" [17]: demand characteristics are an alternative explanation for the effect, that is, participants' responses could be due to the context of the experiment during which they are explicitly instructed to take on certain postures, which may suggest to participants that these postures must be a meaningful experimental manipulation. Such demand characteristics have previously been shown to be explanatory for an earlier finding claiming that people wearing a heavy backpack perceive hills as steeper (see Bhalla and Proffitt [3] for the original study and Durgin et al. [27] for an extended study showing that the effect can be attributed to demand characteristics).
As all recent replications focused on explicitly elicited postures, i.e., participants were explicitly instructed by experimenters to take on a certain posture, demand characteristics are indeed a plausible alternative explanation. This is, however, much less plausible for studies concerned with incidental postures. For the latter, participants are simply instructed to perform a task within an environment, as for a typical HCI experiment, without being aware that different types of environments are part of the experiment, thereby reducing demand characteristics.

Rationale
The experiments on incidental postures reported by Yap et al. [65] have to our knowledge so far not been replicated or refuted. Thus it is currently unclear whether the behavioral effects reported in these experiments can be reproduced and whether they are relevant for HCI.
We argue that the potential impact of such effects for HCI justifies more studies to determine whether the effect exists and, if so, under which conditions the effect can be reproduced. We investigate the potential effects of incidental power poses in two HCI scenarios: first when interacting with a touch-operated wall-sized display, then when interacting with a large tabletop display. We consider both self-reported sense of power (experiment 1) and risk-taking behavior (experiment 2) as potential outcomes, similar to the studies reported by Yap et al. Again, we only consider incidental postures, that is, postures that are the result of a combination of device form factor and interface layout. As we are only interested in studying whether these two factors alone can produce a reliable effect, we do not control for possible variations in posture which are beyond these two factors, such as whether people sit straight up or cross their legs, since controlling for these would make demand characteristics more likely. Instead, our experiment designs only manipulate factors which are in the control of a UI designer. In particular we identify the following differences to previous work in Psychology:
• The existing body of work on power poses comes from the Psychology community where postures were carefully controlled by experimenters. We only use device form factors and interface design to impose postures on participants.
• We do not separate a posture manipulation phase and a test phase (in experiment 2) but integrate the two, which is more relevant in an HCI context.
• Similar to the existing literature we measure felt power and risk-taking behavior. In contrast to previous studies which measured risk-taking behavior only through binary choices (one to three opportunities to take a gamble) we use a continuous measure of risk-taking.
• For exploratory analysis, we additionally collect a task-relevant potential covariate that has been ignored in previous work: people's baseline tendency to act impulsively (i.e., to take risks).

EXPERIMENT 1: WALL DISPLAY
In a first experiment, we tested for an effect of incidental posture while interacting with a touch-operated wall display. We asked 44 participants who had signed up for an unrelated pointing experiment whether they were interested to first participate in a short, unrelated "pilot study" which would only last about 3 min. All 44 participants agreed and gave informed consent. The experimenter, who was blind to the experimental hypothesis, instructed participants to stand in front of a 3 m x 1.2 m wall display, and started the experimental application. The experiment was between-subjects and participants were randomly assigned to receive either instructions for a constrictive interface or an expansive interface. Instructions were shown on the display, and participants were encouraged to confirm with the experimenter if something was unclear to them.
To make the interface independent of variances in height and arm span, participants were asked to adapt it to their reach. Participants in the expansive condition were instructed to "move these two circles such that they are located at the height of your head, then move them apart as far as possible so that you can still comfortably reach both of them" (as in Figure 1B). In the constrictive condition, participants were asked to "move these two circles such that you can comfortably tap them with your index fingers while keeping your elbows close to your body" (as in Figure 1A). Once participants had adjusted the position of the circles, they were instructed that the experiment would start once they tapped one of the targets, and that they should continue to alternately tap the two targets for 90 sec. In comparison, Carney et al. [14] used two poses for one minute each, Yap et al. [65] (study 1) one pose for one minute, and Ranehill et al. [53] two poses for three minutes each.
After participants finished the tapping task, the experimenter handed them a questionnaire inquiring about their level of physical discomfort (3 items). On a second page, participants were asked how powerful and in charge they felt at that moment (2 items), similar to Carney et al. [14] but on a 7-point scale instead of their 4-point scale.
An a priori power analysis using the G*Power tool [28] indicated that we would require 48 participants if we hypothesized a large effect size of d = 0.73 and aimed for statistical power of 0.8. Since the experiment only took 3 min to complete, and we relied on participants coming in for an unrelated experiment, we stopped after 44 participants, resulting in an a priori power of 0.76.
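The sample-size reasoning above can be illustrated with a minimal sketch. It is not the G*Power computation itself: it uses a normal approximation and assumes a one-sided test at α = 0.05 for two equally sized groups, a setting the text does not state but under which the approximation yields the reported total of 48. The function names are hypothetical. Because the approximation ignores the t distribution, the power it gives for 44 participants is close to, but not identical with, the reported 0.76.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a one-sided two-group comparison.

    Assumption: normal approximation, equal group sizes, one-sided alpha.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

def approx_power(d, n_group, alpha=0.05):
    """Approximate power for effect size d with n_group participants per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(d * sqrt(n_group / 2) - z_alpha)

total_n = 2 * n_per_group(0.73)    # total participants across both groups
power_44 = approx_power(0.73, 22)  # 44 participants, 22 per group
```

An exact computation based on the noncentral t distribution, as performed by dedicated power-analysis tools, gives somewhat lower power than this approximation for small samples.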

Results
Figures 3-5 summarize the results of experiment 1. The respective left charts show histograms of the responses (7-point Likert scale), the charts in the middle show estimates of means with 95% bootstrap confidence intervals, and the right charts show the respective differences between the means, also with 95% bootstrap confidence intervals.
Sense of feeling in charge. For the feeling in charge item we find no overall difference between the two postures. We should note here that this item caused confusion among participants as many asked the experimenter to explain what the question meant. The experimenter was instructed to advise participants to simply answer what they intuitively felt, which might have led to random responses. We nonetheless report our results in Figure 4.
Discomfort. The discomfort measure is derived from three items, inquiring about participants' impressions of how difficult, fatiguing, and painful they found the task. Ratings across the three items were generally similar, thus we computed one derived measure discomfort from an equal-weighted linear combination of the three items. Here, we find a very large effect between the postures, with expansive being rated as leading to much higher levels of discomfort (Fig. 5).
Bayesian Meta-Analysis on the Power Item. Gronau et al. [34] made the data and R scripts for their Bayesian meta-analysis of six pre-registered studies measuring felt power available (see osf.io/fxg32). This allowed us to rerun their analysis including our data. Figure 6 shows the results of the analysis for the original meta-analysis and for the extension including our results. The range of plausible effect sizes given our data is wider than for the previous, higher

Discussion
While inconclusive on their own, our results on felt power are consistent with a small effect size d ≈ 0.2 for expansive versus constrictive postures when using a touch interaction on a wall-sized display. More importantly though, we observed a much larger effect, d ≈ 1.5 for discomfort as participants in the expansive condition were asked to hold their arms stretched out for 90 sec to complete the task. Given the small expected effect size, we find the large increase in discomfort more important and do not recommend to attempt to affect users' sense of power through the use of expansive postures on touch-operated wall-sized displays.
These considerations played into the design of a second experiment. We identified as most important factors to control: maintaining equal levels of comfort for both postures, and using an objectively quantifiable and continuous behavioral measure instead of self-evaluation.
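To make the interval estimates and standardized effect sizes reported for experiment 1 more concrete, the following minimal sketch computes a bootstrap confidence interval for a difference between group means and Cohen's d. It assumes a percentile bootstrap and a pooled standard deviation; the text does not specify which variants were used. The function names and the ratings are hypothetical and are not the study's data.

```python
import random
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(b) - mean(a)) / pooled_var ** 0.5

def bootstrap_diff_ci(a, b, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap CI for mean(b) - mean(a).

    Each group is resampled with replacement, independently of the other.
    """
    rng = random.Random(seed)
    diffs = sorted(
        mean(rng.choices(b, k=len(b))) - mean(rng.choices(a, k=len(a)))
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Hypothetical 7-point ratings, for illustration only.
constrictive = [1, 2, 3, 2, 1]
expansive = [6, 7, 5, 6, 7]
ci_low, ci_high = bootstrap_diff_ci(constrictive, expansive)
effect = cohens_d(constrictive, expansive)
```

A fixed seed keeps the resampling reproducible; with real data, the number of bootstrap samples and the interval type would be chosen to match the analysis plan.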

EXPERIMENT 2: INCLINED TABLETOP
Our second experiment is inspired by experiment 2 from Yap et al. [65]. There, participants' incidental posture was imposed by arranging their tools around either a large (0.6 m²) or a small (0.15 m²) workspace. Yap et al.'s study investigated the effect of incidental postures imposed by the different workspaces on people's dishonesty, whereas we applied the paradigm to risk-taking behavior, which is a common behavioral measure in multiple studies on explicit power poses [14,18,53]. These previous studies all gave binary choices to participants, asking them whether they were willing to take a single gamble [14,18] or to make several risky choices in both gain and loss domain [53] using examples taken from Tversky and Kahneman [63]. There, participants' binary response was the measure for risk-taking. We opted for a more continuous measure for risk-taking as it results in higher resolution for responses, and used the balloon analogue risk task (BART), a behavioral measure for risk-taking propensity [49]. We again study one main factor: incidental posture with two levels, expansive and constrictive, implemented as two variations of the same graphical user interface (see Figure 7). To keep comfort constant across conditions, we used a slightly inclined 60" tabletop display instead of a wall display so that participants in both conditions could rest their arms while performing the task (see Figure 1C&D).

BART: The Balloon Analogue Risk Task
The BART is a standard test in Psychology to measure people's risk-taking behavior in the form of a game [49]. The basic task is to pump up 30 virtual balloons using on-screen buttons. In our implementation, two buttons were placed as indicated in Figure 7 and players were asked to place their hands near them. With each pump, the balloon grows a bit and the player gains a point. Points are commonly linked to monetary rewards and the more players pump up the balloons, the higher their payout. The maximum size of a balloon is reached after 128 pumps. The risk is introduced through a random point of explosion for each balloon with the average and median explosion point at 64 pumps. A balloon needs to be cashed in before it explodes to actually gain points for that balloon. Participants are only told that a balloon can explode (Fig. 7-D) at any point between the minimum size, i.e., after 1 pump, and the maximum, when it touches the line drawn underneath the pump (see Figure 7-B), and that they need to cash in a balloon before it explodes to gain points. The number of pumps required to achieve the maximum size and, most importantly, the number of pumps needed to optimize the payout is unknown to the participant.

Measures
The measure of the BART is the average number of pumps people make on balloons which did not explode, called adjusted number of pumps. It is used in Psychology as a measure of people's tendency to take risks: with each pump players have to weigh the risk of the balloon exploding against the possible gain of points [49,48,10,29]. The theoretically optimal behavior would be to perform 64 pumps on all balloons. It would maximize payout and also lead to 50% exploding balloons. Yet, previous studies found that participants stop on average much earlier [48].

Adjusted number of pumps
According to a meta-analysis of 22 studies using this measure [48], the average adjusted number of pumps is 35.6 (SE 0.28). However, the meta-analysis showed that means varied considerably between studies from 24.60 to 44.10 (with a weighted SD = 5.93). Thus, only analyzing the BART's main measure would probably not be sensitive enough to identify a difference between the studied postures. We account for this by also computing a normalized measure (percent change) and by capturing people's general tendency to take risks as a covariate.

Percent change of pumps
The game can be conceptually divided into 3 phases: during the first 10 balloons, players have no prior knowledge of when balloons will explode. This phase has been associated with decision making under uncertainty [10]. In the second phase, players mostly consolidate the impressions gained in the first phase, whereas the last phase indicates decision making under risk: players have developed intuitions and aim to maximize their payout. While the BART is widely used [48], little data is available for the individual phases. Most studies only report the main measure which is averaged over all phases. Still, we know from the original study that the average increase of pumps between the first and the last phase is about 33% [49].
Since we hypothesize that a possible effect of incidental posture should occur over the course of the experiment, we expect that it should not be present while pumping up the first balloons. By comparing data from this first phase with data from the last phase, we derive a normalized measure for how people's behavior changed over the course of the experiment (∼10 min). We define this measure, percent change, as follows:
$$\%\,\text{change} = \frac{\bar{X}(\text{adj. pumps in phase 3}) - \bar{X}(\text{adj. pumps in phase 1})}{\bar{X}(\text{adj. pumps in phase 1})}$$
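The BART measures described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the experimental software: the explosion point is assumed to be uniformly distributed between 1 and 128 pumps (the text only gives the range and an average near 64), and phases 1 and 3 are taken to be the first and last 10 of the 30 balloons. All function names and the example data are hypothetical.

```python
from statistics import mean

MAX_PUMPS = 128  # maximum balloon size stated in the text

def expected_points(target, max_pumps=MAX_PUMPS):
    """Expected points when always cashing in after `target` pumps.

    Assumption: the explosion point is uniform on 1..max_pumps, and a balloon
    survives `target` pumps only if its explosion point is larger than `target`.
    """
    return target * (max_pumps - target) / max_pumps

def adjusted_pumps(balloons):
    """Mean number of pumps over balloons that did not explode.

    `balloons` is a list of (pumps, exploded) pairs.
    """
    return mean(pumps for pumps, exploded in balloons if not exploded)

def percent_change(balloons, phase_len=10):
    """Relative change in adjusted pumps from the first to the last phase."""
    first = adjusted_pumps(balloons[:phase_len])
    last = adjusted_pumps(balloons[-phase_len:])
    return (last - first) / first

# Under the uniform assumption the payout-maximizing strategy is 64 pumps.
best_target = max(range(1, MAX_PUMPS + 1), key=expected_points)

# Hypothetical session: 30 balloons, one explosion in the first phase.
session = [(30, False)] * 9 + [(50, True)] + [(35, False)] * 10 + [(40, False)] * 10
change = percent_change(session)
```

Note that `adjusted_pumps` raises an error if every balloon in a phase exploded; a real analysis would need a rule for that case.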

Covariate: impulsiveness
We additionally tested participants on the BIS-11 Barratt impulsiveness scale [51,61] to capture their general tendencies to react impulsively. The scale is a 30-item questionnaire inquiring about various behaviors such as planning tasks, making decisions quickly, or buying things on impulse. We included it as a covariate as Lejuez et al. reported a correlation with the BART measure (r = 0.28 [49]).

Covariate: comfort
In light of our findings from experiment 1, we also included an extended questionnaire relating to both physical and mental comfort as well as fatigue (items 1-12 from the ISO 9241-9 device assessment questionnaire [25]).

Participants
We recruited a total of 80 participants (42 women, 38 men, mean age 26) in two batches. Similar to experiment 1, we initially recruited 40 participants. A Bayes factor analysis [24] at that point indicated that our data was not sensitive enough to draw any conclusions, and we decided to increase the total number of participants to 80. As is common in this type of experiment and as suggested by Carney et al. [15], we used a cover story to keep our research question hidden from participants. The study was advertised as a usability study for a touchscreen game. Participants were unaware of the different interface layouts since posture was manipulated between subjects, making it more difficult for them to guess the real purpose of the study.

Procedure
Similar to experiment 1, participants were alternately assigned in order of arrival to either the constrictive or the expansive condition. After signing an informed consent form for the "usability study", participants were introduced to the tabletop setup and asked to go through the on-screen instructions of the game. They were informed that the amount of their compensation would depend on the number of points they achieved in the game. They then pumped up 30 balloons. Once finished, they filled in a questionnaire on their level of comfort during the game (12 items), and the BIS-11 impulsivity test [51] (30 items). Finally, participants filled in a separate form to receive a cinema voucher for their participation. The value of the voucher was between 13 € and 20 €, depending on how many points they accumulated in the game following the original BART protocol [49]. The entire experiment lasted about 20 min.

BAYESIAN ANALYSIS
We analyze our data using Bayesian estimation following the analysis steps described by Kruschke [46] for the robust analysis of metric data in nominal groups, with weakly informed skeptical priors, which help to avoid inflated effect sizes [42]. We reuse R code supplied by Kruschke [46] combined with the tidybayes package for R (by Matthew Kay, github.com/mjskay/tidybayes) to plot posterior distributions.
Our analysis setup can be seen as a Bayesian analog to a standard ANOVA, yet without the prerequisites of an ANOVA (normality and equal variances), and with the possibility of accepting the null hypothesis if the posterior credibility for parameter ranges falls into a pre-defined region of practical equivalence (ROPE) [46]. For example, we could decide that we consider any difference between groups of less than ±5% as too small a difference to be of practical relevance. As we did not decide on a ROPE before data collection, we refrain from using this tool. Most importantly, the outcome of the analysis is a set of distributions for credible ranges of parameter estimates, which is more informative than dichotomous hypothesis testing [42]. Figure 8 shows that for adjusted number of pumps, the distributions from both groups are rather similar and mostly symmetric. For percent change, the data is positively skewed.

Priors
We choose weakly informed skeptical priors. Since previous work on the BART reports large variances between studies [48], we scale the prior for the intercept a_0 based on our data and not on estimates from previous work. For the deflection parameters a[x[i]], we choose a null hypothesis of no difference between groups expressed through a normally distributed prior centered at 0 with individual standard deviations per group. For the scale parameters σ_y and σ_a we assume a gamma distribution with shape and rate parameters chosen such that its mode is SD(y)/2 and its standard deviation is 2 * SD(y) [46, page 560f]. The regularizing prior for degrees of freedom ν is a heavy-tailed exponential.
Figure 9. Eye plots of the posterior distributions of parameters with 95% HDI (highest density interval). Left: parameter estimates for the standard BART measure; right: parameter estimates for the percent change measure.
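The gamma prior on the scale parameters described under Priors is specified through its mode and standard deviation, which have to be converted into shape and rate parameters. The following is a minimal sketch of that conversion in Python, given only for illustration (the analysis itself used R and JAGS). The function name and the example value for SD(y) are hypothetical; the formula follows from solving mode = (shape - 1) / rate and sd = sqrt(shape) / rate.

```python
import math

def gamma_shape_rate_from_mode_sd(mode, sd):
    """Return (shape, rate) of a gamma distribution with the given mode and SD."""
    if mode <= 0 or sd <= 0:
        raise ValueError("mode and sd must be positive")
    # Positive root of sd^2 * rate^2 - mode * rate - 1 = 0
    rate = (mode + math.sqrt(mode ** 2 + 4 * sd ** 2)) / (2 * sd ** 2)
    shape = 1 + mode * rate
    return shape, rate

# Hypothetical example: suppose the observed data have SD(y) = 10.
sd_y = 10.0
shape, rate = gamma_shape_rate_from_mode_sd(mode=sd_y / 2, sd=2 * sd_y)

# Sanity check: recover mode and SD from the resulting parameters.
print((shape - 1) / rate)       # mode, approximately SD(y)/2
print(math.sqrt(shape) / rate)  # standard deviation, approximately 2 * SD(y)
```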

Fitting the model
We fit the model using Markov chain Monte Carlo (MCMC) sampling in JAGS [46]. We ran three chains with 10,000 burn-in steps and a thinning of 10, for a final chain length of 50,000. Convergence of chains was assessed through visual inspection of diagnostic plots such as trace plots, density plots, and autocorrelation plots as well as by checking that all parameters passed the Gelman-Rubin diagnostic [32]. The results presented in the next section are computed from the respective first chains.
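To illustrate what the Gelman-Rubin diagnostic checks, the sketch below computes the basic potential scale reduction factor for one parameter from several chains of equal length. It is a simplified Python illustration, not the implementation used in the analysis; the function name is hypothetical. Values close to 1 suggest that the chains agree, and a threshold such as 1.1 is a common rule of thumb rather than a value stated in this paper.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5

# Simulated example: three chains sampling the same distribution ...
rng = random.Random(1)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(3)]
# ... versus three chains stuck around different values.
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 5, 10)]

print(gelman_rubin(mixed))  # close to 1
print(gelman_rubin(stuck))  # clearly above 1
```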

RESULTS
The outcome of our analysis is posterior distributions for the parameters in our model. These distributions indicate credible values for the parameters. One way of representing these is to plot the density of these distributions together with a 95% highest density interval (HDI) as so-called eyeplots [42] (as done in Figures 9 & 11). Any value within an HDI is more credible than all values outside an HDI. The width of an HDI is an indicator for the certainty of our beliefs: narrow intervals indicate high certainty in estimates whereas wide ones indicate uncertainty. Finally, not all values within an HDI are equally credible which is indicated through the density plot around the HDI: values in areas with higher density have a higher credibility than values in less dense areas.
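The HDI described above can be estimated directly from posterior samples as the narrowest interval that contains the requested share of them. The sketch below shows one simple way to do this in Python; it is an illustration under that assumption, with a hypothetical function name and made-up sample values, and it may differ in detail from the routine used in the actual analysis.

```python
import math

def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(mass * n)  # number of samples the interval must cover
    if k < 1:
        raise ValueError("not enough samples for the requested mass")
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

# Made-up, positively skewed sample: the interval hugs the dense region.
skewed = [0, 0, 0, 0, 1, 1, 1, 2, 3, 10]
print(hdi(skewed, mass=0.5))
```

For a skewed distribution such as this one, the HDI differs from an equal-tailed interval because it follows the region of highest density rather than cutting the same share from both tails.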
We now present our results by first analyzing the posterior parameter estimates of our Bayesian model for both the standard BART measure and our percent change measure (summarized in Figure 9), and then analyzing contrasts pertaining to our research question as to whether incidental posture had an influence on people's behavior.

Posterior Parameter Estimates
Posterior distributions for our parameter estimates are summarized in Figure 9. The intercept, a_0, indicates the estimate for the overall mean across both groups, whereas the groupwise estimates, a_0 + a[x_i], show distributions for estimates of the means split by expansive-constrictive posture. The difference plots in the middle indicate whether a group differs from the overall mean, and the third plot to the right indicates the difference between the two groups.

Standard BART Measure
The results for the standard BART measure are shown in Figure 9-left. For the adjusted number of pumps we find a shared intercept a_0 of 42.6 with a 95% highest density interval (HDI) of [39.1, 46.0]. This value is within the upper range of previous studies using the BART, which varied between 24.60 and 44.1 [48]. The estimates for the group-wise means for the two body postures are both close to the overall mean, which is confirmed by the HDIs for the credible differences to the intercept as well as the difference between postures: point estimates are all within the range of [-1,1] from the intercept with HDIs smaller than [-5,5].

Percent Change Measure
The results for the percent change measure are illustrated in Figure 9-right. For the percent change measure we find an overall intercept a_0 of 24.7% (95% HDI [15.0, 34.7]), which is below the average increase of 33% found by Lejuez et al. [49]. Similar to the standard BART measure we find very small differences for the two posture groups, which are within [-0.5, 0.5] for the point estimates with 95% HDIs smaller than [-9,9]. Not only is the credible range for the estimates considerably larger than for the BART measure, but also the posterior distribution for the difference between the two postures is rather uncertain with a wide HDI spanning [-17.3, 15.8].

Effects and Interactions with Covariates
We captured comfort, impulsiveness [51], and gender as covariates. Both comfort and gender showed only negligible variance both across postures and within groups. We therefore only report the analysis for impulsiveness in more detail.

Impulsiveness
To test for possible influence of the impulsiveness covariate, we split participants into either "high risk takers" (BIS-11 index >= 64) or "low risk takers" (BIS-11 index < 64, where 64 is the median value within our sample population). This split leads to different profiles between the resulting four groups as Figure 10 indicates. For the adjusted # of pumps measure, the split indicates rather similar profiles across groups. For the percent change measure, however, the split separates groups with seemingly different profiles.
Figure 11. Summary of our two-factor analysis for percent change indicating the highest density intervals for the different components of the extended linear model. The intercept for the two factors combined is 25.2% [15.2, 35.7]. Body posture accounts for some of the uncertainty but similarly for both conditions. For high impulsiveness indices positive values are slightly more credible than negative values and vice versa. It seems most credible that the interaction parameters crossing body posture and impulsiveness account for most of the observed differences.
To analyze the data for this measure taking the covariate into account, we extend our previous one-factor model with a second factor and an interaction term. Priors were chosen skeptically as detailed before.
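The extended model is not written out in this section. As a sketch only, assuming it follows the standard two-factor form described by Kruschke [46] and reusing the symbols of the one-factor model, it could be written as below. The subscripts 1 (posture) and 2 (impulsiveness) are labels introduced for this illustration.

```latex
\begin{align}
y_i   &\sim t(\mu_i, \sigma_y, \nu) \\
\mu_i &= a_0 + a_1[x_{1,i}] + a_2[x_{2,i}] + a_{1 \times 2}[x_{1,i}, x_{2,i}]
\end{align}
```

Under this assumption, x_{1,i} is the posture group and x_{2,i} the impulsiveness group of participant i, a_1 and a_2 are the deflections from the intercept a_0 for each factor, and a_{1×2} captures the interaction between the two.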
Results. The results are summarized visually in Figure 11. We find again almost completely overlapping credible intervals for the posture factor centered within [-0.5,0.5] with HDIs smaller than [-10,10]. The impulsiveness factor also played a rather negligible role. Surprisingly, we find an interaction between posture and impulsiveness: it appears that body posture affected low risk-takers as predicted by Yap et al. whereas the effect seems to have been reversed for high risk-takers. However, this part of the analysis was exploratory and a confirmatory study would be needed to verify this finding. Additionally, the two experimental groups were slightly unbalanced, that is, the BIS scores in the expansive group had a slightly lower mean than in the constrictive group (µ_exp = 63.2, µ_cons = 66.0, [-7.1, 1.5] 95% CI on difference).

DISCUSSION
We first summarize our findings and then discuss them in light of our research question and approach.

Summary of our findings
We ran two experiments designed to identify possible effects of incidental power poses on the sense of power (experiment 1) and on risk-taking behavior (experiment 2). While multiple replication attempts on explicitly elicited power poses had failed to show reliable behavioral effects and found only a small effect on felt power, it remained unclear whether the effects for incidental power poses reported by Yap et al. [65] would replicate and whether incidental power poses are important to consider when designing user interfaces.

Experiment 1
The first experiment found a considerably larger effect for discomfort (d ≈ 1.5 [0.8, 2.3]) than for felt power (d ≈ 0.4 [-0.2, 1.1]). On its own the first experiment thus failed to find the effect expected based on Yap et al. [65], and the optimism for incidental power poses generated from that study is not supported by our findings. Our results are however consistent with a much smaller effect of d ≈ 0.2 as was recently suggested by a meta-analysis [34]. Thus, we can at best conclude that a small effect might exist. In practice, the effect remains difficult to study as the small effect size requires large participant pools to detect it reliably. Such large participant pools are rather uncommon in HCI [12] with the exception of crowdsourced online experiments, where the reduced experimental control might negatively affect the signal-to-noise ratio of an already small effect. Besides such practical considerations, the very large effect on (dis)comfort severely limits the range of acceptable expansive interfaces.
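To give a rough sense of what "large participant pools" means for an effect of d ≈ 0.2, the sketch below applies the common normal-approximation formula for comparing two independent groups. The 80% power and the two-sided 5% significance level are assumptions made for this illustration and do not come from our experiments; the calculation is also frequentist, unlike the Bayesian analysis of experiment 2, so it should be read as an order-of-magnitude estimate only. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group needed to detect effect size d."""
    if d == 0:
        raise ValueError("effect size must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided criterion
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.4):
    print(d, n_per_group(d))
```

Under these assumptions, an effect of d = 0.2 calls for roughly 400 participants per group, about four times as many as an effect of d = 0.4.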

Experiment 2
The second experiment found that incidental body posture did not predict participants' behavior. As with experiment 1, this is consistent with the findings of the recent replications which elicited postures explicitly; none of those were able to detect an effect on behavior either. Again, a large effect as reported by Yap et al. [65] is highly unlikely in light of our results. We thus conclude that incidental power poses are unlikely to produce measurable differences in risk-taking behavior when tested across a diverse population. An exploratory analysis of interaction effects on the normalized measure suggests that an effect of body posture as predicted by Yap et al. could be observed within the group of participants showing low BIS-11 scores, while the effect was reversed for participants with high BIS-11 scores. Should this interaction replicate, then it would explain why overall no effect for the expansiveness of postures can be found. However, a confirmatory study verifying such an interaction is needed before one can draw definitive conclusions and possibly amend design guidelines.

Relevance of Power Poses for HCI
Overall we found an apparent null or at best negligible effect of body postures on behavior. For a user interface targeted at diverse populations, it thus seems futile to attempt to influence people's behavior through incidental postures. As a general take-away, we recommend avoiding both overly expansive and overly constrictive postures and focusing instead on factors such as general comfort or efficiency as appropriate to the purpose of an intended user interface.
In some previous work it was argued that a social interaction would be necessary to observe a power pose effect [18,15].
While our experiments did not investigate this claim, recent work by Cesario and Johnson [16] provides evidence against this claim. It thus seems equally unlikely that power poses would be of concern for social user interfaces. However, our research only concerned power poses and tested downstream effects, that is, whether posture manipulations led to changes in behavior. We cannot draw any conclusions about the other direction: for example, posture seems to be indicative of a user's engagement or affective state [56].

Need for Replication
Concerning the interaction observed in our second experiment, we want to again caution that this finding needs to be replicated to confirm such an interaction. The analysis that brought forward this finding was exploratory, and our experiment included only 80 participants, which is more than usual for in-person experiments in HCI [12] but fewer than in the failed replications of explicitly elicited power poses. We suggest that replications could focus on specific, promising or important application areas where effects in different directions might have either a desirable or a detrimental impact on people's lives, and participants should be screened for relevant personality traits, such as impulsiveness or the "big five" [33], to examine interaction effects with these covariates.
Replication is still not very common within HCI [35] despite various efforts to encourage more replications such as the repliCHI panel and workshops between 2011 and 2014 (see www.replichi.com for details and reports) as well as the "repliCHI badge" given to some CHI articles published at CHI'13/14. Original results are generally valued more highly than confirmations or refutations of existing knowledge. A possible approach to encourage more replications could be through special issues of HCI journals. For example, the (Psychology) journal that published the special issue on power poses took a progressive approach to encourage good research practices, such as preregistered studies [19] or replications, by moving the review process before the collection of data, thereby removing possible biases introduced by a study's outcomes [40]: only the introduction, background, study designs, and planned analyses are sent in for review and possibly revised upon reviewer feedback. Only once these are approved is the study actually executed, and it is then guaranteed to be published, irrespective of its findings. We believe such an approach could be equally applied in HCI to work towards a conclusive evidence base for research questions the community deems interesting and important.

Reflections on our Approach
Power poses are an example of a construct from Psychology that has received extensive scientific and public coverage; both soon after publication and once the results of the studies were challenged. Transferring this construct to HCI raised several challenges: (i) practical relevance: identifying which areas of HCI could be impacted by this construct, (ii) ecological validity: operationalizing the construct for HCI such that the resulting manipulations and tasks resemble "realistic" user interfaces which could be encountered outside the lab, and (iii) respecting the boundary conditions within which the construct can be evoked.
Concerning (i), the literature on incidental power poses provides a rich set of behaviors such as cheating and risk-taking.
We gave examples in the background section for areas relevant to HCI (education and risky decision-making) in which an effect of power poses would be pivotal to understand.
Concerning (ii) and (iii), the challenges were less easy to address. Carney et al. argued in their summary of past research on explicitly elicited postures [15] that replications might fail if the postures are not replicated closely enough. The experiments by Yap et al. [65] did not carefully control the postures but only modified the environment. So it was unclear whether we would need to consider a wide set of gestures and poses and how to find out which of those instantiated the construct well. We addressed these challenges by considering the relevance for HCI as the most important experiment design criterion: since an interface designer has very little influence on users' posture beyond the positioning of interface elements, we decided to consider power poses as irrelevant for HCI if they require very specific positioning of users.

CONCLUSION
We investigated whether incidental postures, in particular constrictive and expansive postures, influence how users behave in human-computer interaction. The literature raised the expectation that such postures might bring about cognitive and physiological reactions, most famously from findings by Carney et al. [14] as well as Yap et al. [65]. While the findings from Carney et al. on explicitly elicited power poses did not hold up to replications, the experiments by Yap et al. had so far not been replicated. We reported findings from two experiments which conceptually replicated experiments on incidental power poses in an HCI context. We observed an at best small effect for felt power and an at best negligible effect for a behavioral measure of risk-taking. Most surprisingly, an exploratory analysis suggested that an interaction with a personality trait, impulsiveness, might reverse the hypothesized effect for posture manipulations. However, replications controlling for this interaction are needed to determine if this interaction reliably replicates and thus poses a relevant design consideration for HCI. Overall we conclude that incidental power poses are unlikely to be relevant for the design of human-computer interfaces and that factors such as comfort play a much more important role.
To support an open research culture and the possibility to replicate our work or to reanalyze our data, we share all experimental data and software as well as all analysis scripts at github.com/yvonne-jansen/posture.