
‘ads’ directory

See Also

Gwern

“Everything Is Correlated”, Gwern 2014

Figure 1 (Smith et al 2007): Histogram of Statistically-Significant (at α = 1%) Age-Adjusted Pairwise Correlation Coefficients between 96 Nongenetic Characteristics in British Women Aged 60–79 Years. Demonstrates that pairwise correlations are common even among apparently unrelated traits.

Anthology of sociological, statistical, and psychological papers discussing the observation that all real-world variables have non-zero correlations, and the implications for statistical theory such as ‘null hypothesis testing’.

Statistical folklore asserts that “everything is correlated”: in any real-world dataset, most or all measured variables will have non-zero correlations, even between variables which appear to be completely independent of each other; these correlations are not merely sampling-error flukes, but will, in large enough datasets, reach any arbitrarily designated level of statistical-significance or posterior probability.

This raises serious questions for null-hypothesis statistical-significance testing, as it implies the null hypothesis of 0 will always be rejected with sufficient data; a failure to reject, then, only implies insufficient data, and provides no actual test or confirmation of a theory. Even a directional prediction is minimally confirmatory, since there is a 50% chance of picking the right direction at random.
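The mechanics of “the null is always rejected with enough data” can be sketched with the standard t-test for a Pearson correlation: for any fixed non-zero r, the t-statistic grows as √n, so even a trivially small correlation eventually crosses any significance threshold. (The r = 0.05 value below is an arbitrary illustration, not a figure from the papers in this anthology.)

```python
import math

def t_stat(r, n):
    """t-statistic for testing a Pearson correlation r against the null of 0:
    t = r * sqrt((n - 2) / (1 - r^2)), which grows without bound as n grows."""
    return r * math.sqrt((n - 2) / (1 - r * r))

r = 0.05  # a "trivial" correlation, far below practical importance
for n in [100, 1_000, 10_000, 100_000]:
    t = t_stat(r, n)
    print(f"n={n:>7,}  t={t:6.2f}  significant at 5%: {t > 1.96}")
```

With these numbers, r = 0.05 is nowhere near significance at n = 100 but is decisively “significant” by n = 10,000 — illustrating why a rejection of the point null carries little information once samples are large.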

It also has implications for conceptualizations of theories & causal models, interpretations of structural models, and other statistical principles such as the “sparsity principle”.

“A/B Testing Long-Form Readability on Gwern.net”, Gwern 2012

A log of experiments done on the site design, intended to render pages more readable, focusing on the challenge of testing a static site: page width, fonts, plugins, and effects of advertising.

To gain some statistical & web-development experience and to improve my readers’ experiences, I have been running a series of CSS A/B tests since June 2012. As expected, most do not show any meaningful difference.

“Banner Ads Considered Harmful”, Gwern 2017

Huang et al 2019, advertising harms for Pandora listeners: Figure 4: Mean Total Hours Listened by Treatment Group; Figure 5: Mean Weekly Unique Listeners by Treatment Group. Listeners randomly exposed to more ads gradually erode away compared to their low-ad counterparts, showing that ads cause unhappiness.

9 months of daily A/B testing of Google AdSense banner ads on Gwern.net indicates banner ads decrease total traffic substantially, possibly due to spillover effects in reader engagement and resharing.

One source of complexity & JavaScript use on Gwern.net is the use of Google AdSense advertising to insert banner ads. In considering design & usability improvements, removing the banner ads comes up every time as a possibility, as readers do not like ads; but such removal comes at a revenue loss, and it is unclear whether the benefit outweighs the cost, suggesting I run an A/B experiment. However, ads might be expected to have broader effects on traffic than individual page reading times/bounce rates, affecting total site traffic instead through long-term effects on, or spillover mechanisms between, readers (eg. social media behavior), rendering the usual A/B testing method of per-page-load/session randomization incorrect; instead, it would be better to analyze total traffic as a time-series experiment.

Design: A decision analysis of revenue vs readers yields a maximum acceptable total traffic loss of ~3%. Power analysis of historical Gwern.net traffic data demonstrates that the high autocorrelation yields low statistical power with standard tests & regressions, but acceptable power with ARIMA models. I design a long-term Bayesian ARIMA(4,0,1) time-series model in which an A/B test running January–October 2017 in randomized paired 2-day blocks of ads/no-ads uses client-local JS to determine whether to load & display ads, with total traffic data collected in Google Analytics & ad exposure data in Google AdSense. The A/B test ran from 2017-01-01 to 2017-10-15, affecting 288 days with collectively 380,140 pageviews in 251,164 sessions.
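The paired-block randomization described above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the actual client-side JS used on the site: each pair of consecutive 2-day blocks gets one “ads” and one “no-ads” condition in random order, guaranteeing exact balance while keeping conditions contiguous enough for a time-series analysis.

```python
import random

def block_schedule(n_days, block_days=2, seed=2017):
    """Randomized paired blocks: each consecutive pair of blocks contains
    one 'ads' and one 'no-ads' period, in random order within the pair."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) * block_days < n_days:
        pair = ["ads", "no-ads"]
        rng.shuffle(pair)       # randomize order within the pair
        schedule.extend(pair)
    # expand blocks into per-day assignments and trim to n_days
    days = [cond for cond in schedule for _ in range(block_days)]
    return days[:n_days]

days = block_schedule(288)      # 288 experimental days, as in the test
print(days[:8])
print(days.count("ads"), days.count("no-ads"))  # exactly balanced: 144 each
```

Pairing blocks (rather than flipping a coin per block) is what forces the exact 50/50 balance; with unconstrained per-block coin flips, a 288-day run could easily end up meaningfully unbalanced between conditions.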

Correcting for a flaw in the randomization, the final results yield a surprisingly large estimate of an expected traffic loss of −9.7% (driven by the subset of users without adblock), with an implied −14% traffic loss if all traffic were exposed to ads (95% credible interval: −13% to −16%), exceeding my decision threshold for disabling ads & strongly ruling out the possibility of acceptably small losses which might justify further experimentation.

Thus, banner ads on Gwern.net appear to be harmful, and AdSense has been removed. If these results generalize to other blogs and personal websites, an important implication is that many websites may be harmed by their use of banner advertising without realizing it.

“Candy Japan’s New Box A/B Test”, Gwern 2016

Bayesian decision-theoretic analysis of the effect of fancier packaging on subscription cancellations & optimal experiment design.

I analyze an A/B test from a mail-order company of two different kinds of box packaging from a Bayesian decision-theory perspective, balancing posterior probability of improvements & greater profit against the cost of packaging & risk of worse results, finding that, as the company’s analysis suggested, the new box is unlikely to be sufficiently better than the old. Calculating expected values of information shows that it is not worth experimenting further, and that such fixed-sample trials are unlikely to ever be cost-effective for packaging improvements. However, adaptive experiments may be worthwhile.
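The decision-theoretic framing can be sketched with a small Monte Carlo: draw cancellation rates for each box from Beta posteriors, and weigh the expected retention gain against the extra packaging cost. All numbers below are hypothetical placeholders for illustration, not Candy Japan’s actual data or costs.

```python
import random

rng = random.Random(0)

# Hypothetical placeholder numbers (NOT the actual experiment's data):
old_cancel, old_n = 20, 200       # cancellations / subscribers, old box
new_cancel, new_n = 15, 200       # cancellations / subscribers, new box
extra_cost_per_box = 0.30         # assumed added packaging cost, dollars
value_per_retained = 25.0         # assumed profit per retained subscriber

def posterior_sample(cancels, n):
    """Sample a cancellation rate from a Beta(1+cancels, 1+n-cancels)
    posterior (uniform prior on the rate)."""
    return rng.betavariate(1 + cancels, 1 + n - cancels)

gains = []
for _ in range(20_000):
    p_old = posterior_sample(old_cancel, old_n)
    p_new = posterior_sample(new_cancel, new_n)
    # net gain per subscriber from switching: retention value minus extra cost
    gains.append((p_old - p_new) * value_per_retained - extra_cost_per_box)

mean_gain = sum(gains) / len(gains)
p_improvement = sum(g > 0 for g in gains) / len(gains)
print(f"mean net gain per subscriber: ${mean_gain:.2f}")
print(f"P(new box is net-positive):   {p_improvement:.2f}")
```

The decision rule is then to adopt the new box only if the posterior expected net gain is positive, and to pay for further experimentation only if the expected value of that information exceeds its cost; the article’s analysis follows this logic with the real data.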

Miscellaneous