Divergent preferences and meta-preferences
Crossposted at the Intelligent Agents Forum.
In simple graphical form, here is the problem of divergent human preferences:
Here the AI either chooses A or ¬A, and as a consequence, the human then chooses B or ¬B.
There are a variety of situations in which this is or isn't a problem (when A or B or their negations aren't defined, take them to be the negation of what is defined):
- Not problems:
  - A/¬A = "gives right shoe/left shoe", B/¬B = "adds left shoe/right shoe".
  - A = "offers drink", ¬B = "goes looking for extra drink".
  - A = "gives money", B = "makes large purchase".
- Potentially problems:
  - A/¬A = "causes human to fall in love with X/Y", B/¬B = "moves to X's/Y's country".
  - A/¬A = "recommends studying X/Y", B/¬B = "chooses profession P/Q".
  - A = "lets human conceive child", ¬B = "keeps up previous hobbies and friendships".
- Problems:
  - A = "coercive brain surgery", B = anything.
  - A = "extreme manipulation", B = almost anything.
  - A = "heroin injection", B = "wants more heroin".
So, what are the differences? For the "not problems", it makes sense to model the human as having a single reward R, variously "likes having a matching pair of shoes", "needs a certain amount of fluids", and "values certain purchases". Then all that the AI is doing is helping (or not) the human towards that goal.
As you move more towards the "problems", notice that they seem to have two distinct human reward functions, $R_A$ and $R_{\neg A}$, and that the AI's actions seem to choose which one the human will end up with. In the spirit of humans not being agents, this seems to be the AI determining what values the human will come to possess.
Grue, Bleen, and agency
Of course, you could always say that the human actually has reward $R = I_A R_A + (1 - I_A) R_{\neg A}$, where $I_A$ is the indicator function as to whether the AI does action A or not.
Similarly to the grue and bleen problem, there is no logical way of distinguishing that "pieced-together" R from a more "natural" R (such as valuing pleasure, for instance). Thus there is no logical way of distinguishing the human being an agent from the human not being an agent, just from its preferences and behaviour.
However, from a learning and computational complexity point of view, it does make sense to distinguish "natural" R's (where $R_A$ and $R_{\neg A}$ are essentially the same, despite the human's actions being different) from composite R's.
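As a rough illustration of the point above, the following Python sketch (not from the original post) builds a "pieced-together" reward of the form R = I_A R_A + (1 - I_A) R_¬A from two component rewards. The outcome names and reward values are hypothetical, loosely modelled on the "falls in love with X/Y" example. The only thing it shows is that such a composite is a perfectly well-formed reward function, whose maximiser behaves differently depending on the AI's action.

```python
from typing import Callable, Dict

# Hypothetical reward tables: what the human values after A and after not-A.
R_A_TABLE: Dict[str, float] = {"move_to_X_country": 1.0, "move_to_Y_country": 0.0}
R_NOT_A_TABLE: Dict[str, float] = {"move_to_X_country": 0.0, "move_to_Y_country": 1.0}


def composite_reward(
    r_a: Dict[str, float], r_not_a: Dict[str, float]
) -> Callable[[bool, str], float]:
    """Build R = I_A * R_A + (1 - I_A) * R_notA from two reward tables."""

    def reward(ai_did_a: bool, outcome: str) -> float:
        i_a = 1.0 if ai_did_a else 0.0  # indicator of the AI's action
        return i_a * r_a[outcome] + (1.0 - i_a) * r_not_a[outcome]

    return reward


R = composite_reward(R_A_TABLE, R_NOT_A_TABLE)


def human_choice(ai_did_a: bool) -> str:
    """The human picks the outcome that maximises the composite reward."""
    return max(R_A_TABLE, key=lambda outcome: R(ai_did_a, outcome))


print(human_choice(True), human_choice(False))
```

Nothing in the behaviour of `human_choice` reveals whether R was assembled from two pieces or is a single "natural" reward; that distinction has to come from somewhere else, such as the complexity of describing R.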
This allows us to define:
- Preference divergence point: A preference divergence point is one where $R_A$ and $R_{\neg A}$ are sufficiently distinct, according to some criteria of distinction.
Note that sometimes, $R_A = R_A' + R'$ and $R_{\neg A} = R_{\neg A}' + R'$: the two rewards $R_A$ and $R_{\neg A}$ overlap on a common piece $R'$, but diverge on $R_A'$ and $R_{\neg A}'$. It makes sense to define this as a preference divergence point as well, if $R_A'$ and $R_{\neg A}'$ are "important" in the agent's subsequent decisions. Importance is a somewhat hazy metric, which would, for instance, assess how much $R'$ reward the human would sacrifice to increase $R_A'$ and $R_{\neg A}'$.
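The post leaves the "criteria of distinction" open. Purely as an illustration, the sketch below picks one hypothetical criterion: the largest absolute disagreement between the two rewards over a shared set of outcomes, compared against an arbitrary threshold. Both the criterion and the threshold of 0.5 are assumptions made for this example, not something the definition above commits to.

```python
from typing import Dict


def max_reward_gap(r_a: Dict[str, float], r_not_a: Dict[str, float]) -> float:
    """Largest absolute disagreement between two reward tables on shared outcomes."""
    shared = r_a.keys() & r_not_a.keys()
    return max(abs(r_a[o] - r_not_a[o]) for o in shared)


def is_divergence_point(
    r_a: Dict[str, float], r_not_a: Dict[str, float], threshold: float = 0.5
) -> bool:
    """Hypothetical criterion: divergent if the rewards disagree by more than `threshold`."""
    return max_reward_gap(r_a, r_not_a) > threshold


# "Shoes" case: the same underlying reward whichever shoe the AI hands over.
shoes_a = {"matching_pair": 1.0, "mismatched_pair": 0.0}
shoes_not_a = {"matching_pair": 1.0, "mismatched_pair": 0.0}

# "Falls in love with X/Y" case: the rewards point in opposite directions.
love_a = {"move_to_X_country": 1.0, "move_to_Y_country": 0.0}
love_not_a = {"move_to_X_country": 0.0, "move_to_Y_country": 1.0}

print(is_divergence_point(shoes_a, shoes_not_a))
print(is_divergence_point(love_a, love_not_a))
```

A criterion based on "importance" in the sense of the note above would need more structure than this, for example a way to compare the shared component with the diverging ones.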
Meta-preferences
From the perspective of revealed preferences about the human, $R(\mu) = I_A R_A + \mu (1 - I_A) R_{\neg A}$ will predict the same behaviour for all scaling factors $\mu > 0$.
Thus at a preference divergence point, the AI's behaviour, if it were an $R(\mu)$ maximiser, would depend on the non-observed weighting between the two divergent preferences.
This is unsafe, especially if one of the divergent preferences is much easier to achieve a high value with than the other.
Thus preference divergence points are moments when the AI should turn explicitly to human meta-preferences to distinguish between them.
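The argument of this section can be sketched in a few lines of Python. The reward tables below are hypothetical (a heroin-like branch in which a high value is easy to reach, and an ordinary branch in which it is not). The sketch assumes the AI simply compares the best value achievable in each branch under R(μ).

```python
from typing import Dict

# Hypothetical rewards: after A a high value is easy to reach; after not-A it is not.
R_A: Dict[str, float] = {"take_more_heroin": 10.0, "abstain": 0.0}
R_NOT_A: Dict[str, float] = {"pursue_career": 2.0, "idle": 0.0}


def human_choice(ai_did_a: bool, mu: float) -> str:
    """Human behaviour predicted by R(mu); mu only rescales the not-A branch."""
    if ai_did_a:
        return max(R_A, key=lambda o: R_A[o])
    return max(R_NOT_A, key=lambda o: mu * R_NOT_A[o])


def ai_choice(mu: float) -> str:
    """An R(mu) maximiser compares the best achievable value in each branch."""
    value_a = max(R_A.values())
    value_not_a = mu * max(R_NOT_A.values())
    return "A" if value_a > value_not_a else "not_A"


for mu in (0.1, 1.0, 10.0):
    print(mu, human_choice(True, mu), human_choice(False, mu), ai_choice(mu))
```

In this toy setup the human's predicted behaviour is identical for every μ > 0, so observation cannot pin μ down, yet the AI's choice flips once μ is large enough. That is the gap the meta-preferences are meant to fill.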
This can be made recursive. Suppose we see the human meta-preferences as explicitly weighting $R_A$ versus $R_{\neg A}$ and hence giving $R$. If there is a prior AI decision point Z, and the human meta-preferences will differ depending on what the AI chooses there, this gives two reward functions, $R_Z = I_A R_A + \mu_Z (1 - I_A) R_{\neg A}$ and $R_{\neg Z} = I_A R_A + \mu_{\neg Z} (1 - I_A) R_{\neg A}$, with different weights $\mu_Z$ and $\mu_{\neg Z}$.
If these weights are sufficiently distinct, this could identify a meta-preference divergence point and hence a point where human meta-meta-preferences become relevant.
Invitation to comment on a draft on multiverse-wide cooperation via alternatives to causal decision theory (FDT/UDT/EDT/...)
I have written a paper about “multiverse-wide cooperation via correlated decision-making” and would like to find a few more people who’d be interested in giving a last round of comments before publication. The basic idea of the paper is described in a talk you can find here. The paper elaborates on many of the ideas and contains a lot of additional material. While the talk assumes a lot of prior knowledge, the paper is meant to be a bit more accessible. So, don’t be disheartened if you find the talk hard to follow; one goal of getting feedback is to find out which parts of the paper could be made easier to understand.
If you’re interested, please comment or send me a PM. If you do, I will send you a link to a Google Doc with the paper once I'm done with editing, i.e. in about one week. (I’m afraid you’ll need a Google Account to read and comment.) I plan to start typesetting the paper in LaTeX in about a month, so you’ll have three weeks to comment. Since the paper is long, it’s totally fine if you don’t read the whole thing or just browse around a bit.
Open thread, May 29 - June 4, 2017
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
Bi-Weekly Rational Feed
Five Recommended Articles You Might Have Missed:
The Four Blind Men The Elephant And Alan Kay by Meredith Patterson (Status 451) - Managing technical teams. Taking a new perspective is worth 90 IQ points. Getting better enemies. Guerrilla action.
Vast Empirical Literature by Marginal REVOLUTION - Tyler's 10 thoughts on approaching fields with large literatures. He is critical of Noah's "two paper rule" and recommends a lot of reading.
Notes From The Hufflepuff Unconference (Part 1) by Raemon (lesswrong) - Goal: Improve at: "social skills, empathy, and working together, sticking with things that need sticking with". The article is a detailed breakdown of the unconference including: Ray's Introductory Speech, a long list of what people want to improve on, the lightning talks, the 4 breakout sessions, proposed solutions, further plans, and closing words. Links to conference notes are included for many sections.
Antipsychotics Might Cause Cognitive Impairment by Sarah Constantin (Otium) - A harrowing personal account of losing abstract thinking ability on Risperdal. The author conducts a literature review, and concludes with some personal advice about taking medication.
Dwelling In Possibility by Sarah Constantin (Otium) - Leadership. Confidence in the face of uncertainty and imperfection. Losing yourself when you try to step back and facilitate.
Scott:
Those Modern Pathologies by Scott Alexander - You can argue X is a modern pathology for almost any value of X. Scott demonstrates this by repeated example. Among other things "Aristotelian theory of virtue" and "Homer's Odyssey" get pathologized.
The Atomic Bomb Considered As Hungarian High School Science Fair Project by Scott Alexander - Ashkenazi Jewish Intelligence. An explanation of Hungarian dominance in physics and science in the mid 1900s.
Classified Ads Thread by Scott Alexander - Open thread where people post ads. People are promoting their websites and some of them are posting actual job ads among other things.
Open Thread 76 by Scott Alexander - Bi-weekly Open thread.
Postmarketing Surveillance Is Good And Normal by Scott Alexander - Scott shows why a recent Scientific American study does not imply the FDA is too risky.
Epilogue by Scott Alexander (Unsong) - All's Whale that Ends Whale.
Polyamory Is Not Polygyny by Scott Alexander - A quick review of how polyamory actually functions in the rationalist community.
Bail Out by Scott Alexander - "About a fifth of the incarcerated population – the top of the orange slice, in this graph – are listed as “not convicted”. These are mostly people who haven’t gotten bail. Some are too much of a risk. But about 40% just can’t afford to pay."
Rationalist:
Strong Men Are Socialist Reports A Study That Previously Reported The Opposite by Jacob Falkovich (Put A Number On It!) - Defense Against the Dark Statistical Arts. Jacob provides detailed commentary on a popular study and shows that the study's dataset can be used to support the opposite conclusion, with p = 0.0086.
Highly Advanced Tulpamancy 101 For Beginners by H i v e w i r e d - Application of lesswrong theory to the concept of the self. In particular the author applies "How an Algorithm Feels from the Inside" and "Map and Territory". Hive then goes into the details of creating and interacting with tulpas. "A tulpa is an autonomous entity existing within the brain of a “host”. They are distinct from the host in that they possess their own personality, opinions, and actions"
Existential Risk From Ai Without An Intelligence by Alex Mennen (lesswrong) - Reasons why an intelligence explosion might not occur and reasons why we might have a problem anyway.
Dragon Army Theory Charter (30min Read) by Duncan Sabien (lesswrong) - A detailed plan for an ambitious military style rationalist house. The major goals include self-improvement, high quality group projects and the creation of a group with absolute trust in one another. The leader of the house is the curriculum director and head of product at CFAR.
The Story Of Our Life by H i v e w i r e d - The authors explain their pre-rationalist life and connection to the community. They then argue the rationalist community should take better care of one another. "Venture Rationalism".
Don't Believe in God by Tyler Cowen - Seven arguments for not believing in God. Among them: Lack of Bayesianism among believers, the degree to which people follow their family religion and the fundamental weirdness of reality.
Antipsychotics Might Cause Cognitive Impairment by Sarah Constantin (Otium) - A harrowing personal account of losing abstract thinking ability on Risperdal. The author conducts a literature review, and concludes with some personal advice about taking medication.
The Four Blind Men The Elephant And Alan Kay by Meredith Patterson (Status 451) - Managing technical teams. Taking a new perspective is worth 90 IQ points. Getting better enemies. Guerrilla action.
Qualia Computing At Consciousness Hacking June 7th 2017 by Qualia Computing - Qualia Computing will present in San Francisco on June 7th at Consciousness Hacking. The event description is detailed and should give readers a good intro to Qualia Computing's goals. The author's research goal is to create a mathematical theory of pain/pleasure and be able to measure these directly from brain data.
Notes From The Hufflepuff Unconference (Part 1) by Raemon (lesswrong) - Goal: Improve at: "social skills, empathy, and working together, sticking with things that need sticking with". The article is a detailed breakdown of the unconference including: Ray's Introductory Speech, a long list of what people want to improve on, the lightning talks, the 4 breakout sessions, proposed solutions, further plans, and closing words. Links to conference notes are included for many sections.
Is Silicon Valley Real by Ben Hoffman (Compass Rose) - The old culture of Silicon Valley is mostly gone, replaced by something overpriced and materialist. Ben checks the details of Scott Alexander's list of six noble startups and finds only two in SV proper.
Why Is Harry Potter So Popular by Ozy (Thing of Things) - Ozy discusses a paper on song popularity in an artificial music market. Social dynamics had a big impact on song ratings. "Normal popularity is easily explicable by quality. Stupid, wild, amazing popularity is due to luck."
Design A Better Chess by Robin Hanson - Can we design a game that promotes even more useful honesty than chess? A link to Hanson's review of Garry Kasparov's book is included.
Deserving Truth 2 by Andrew Critch - How the author's values changed over time. Originally he tried to maximize his own positive sensory experiences. The things he cared about began to include more things, starting with his girlfriend's experiences and values. He eventually rejects "homo economicus" thinking.
A Theory Of Hypocrisy by João Eira (Lettuce be Cereal) - Hypocrisy evolved as a way to solve free rider problems. "It pays to be a free rider. If no one finds out"
Building Community Institution In Five Hours a Week by Particular Virtue - Eight pieces of advice for running a successful meetup. The author and zir partner have been running lesswrong events for five years.
Dwelling In Possibility by Sarah Constantin (Otium) - Leadership. Confidence in the face of uncertainty and imperfection. Losing yourself when you try to step back and facilitate.
Ai Safety Three Human Problems And One Ai Issue by Stuart Armstrong (lesswrong) - Humans have poor predictions, don't know their values and aren't agents. Ai might be very powerful. A graph of which problems many Ai risk solutions target.
Recovering From Failure by mindlevelup - Avoid negative spirals, figure out why you failed, List of questions to ask yourself. Strategies -> Generate good alternatives, metacognitive affordances.
Review The Dueling Neurosurgeons by Sam Kean by Aceso Under Glass - Positive review. Author learned a lot. Speculation on a better way to teach science.
Principia Qualia Part 2: Valence by Qualia Computing - A mathematical theory of valence (what makes experience feel good or bad). Speculative but the authors make concrete predictions. Music plays a heavy role.
Im Not Seaing It by Robin Hanson - Arguments against seasteading.
EA:
One of the more positive surprises by GiveDirectly - Links post. Eight articles on GiveDirectly, cash transfers and basic income.
Returns Functions And Funding Gaps by the Center for Effective Altruism (EA forum) - Links to CEA's explanation of what "returns functions" are and how using them compares to the "funding gap" model. They give some arguments why returns functions are a superior model.
Online Google Hangout On Approaches To by whpearson (lesswrong) - Community meeting to discuss Ai risk. Will use "Optimal Brainstorming Theory". Currently early stage. Sign up and vote on what times you are available.
Expected Value Estimates We Cautiously Took by The Oxford Prioritization Project (EA forum) - Details of how the four Bayesian probability models were compared to produce a final decision. Some discussion of how assumptions affect the final result. Actual code is included.
Four Quantitative Models Aggregation And Final by The Oxford Prioritization Project (EA forum) - 80,000 Hours, MIRI, the Good Food Institute and StrongMinds were considered. Decisions were made using concrete Bayesian EV calculations. Links to the four models are included.
Peer to Peer Aid: Cash in the News by GiveDirectly - 8 Links about GiveDirectly, cash transfer and basic income.
The Value Of Money Going To Different Groups by The Center for Effective Altruism - "It is well known that an extra dollar is worth less when you have more money. This paper describes the way economists typically model that effect, using that to compare the effectiveness of different interventions. It takes remittances as a particular case study."
Politics and Economics:
Study Of The Week Better And Worse Ways To Attack Entrance Exams by Freddie deBoer - Freddie's description of four forms of "test validity". The SAT and ACT are predictive of college grades, so one should criticize them from other angles. Freddie briefly gives his socialist critique.
How To Destroy Civilization by Zvi Mowshowitz - A parable about the game "Advanced Civilization". The difficulties of building a coalition to lock out bad actors. Donald Trump. [Extremely Partisan]
Trust Assimilation by Bryan Caplan - Data on how much immigrants and their children trust other people. How predictive is the trust level of their ancestral country? Caplan reviews papers and crunches the numbers himself.
There Are Bots, Look Around by Renee DiResta (ribbonfarm) - High frequency trading disrupted finance. Now algorithms and bots are disrupting the marketplace of ideas. What can finance's past teach us about politics' future?
The Behavioral Economics of Paperwork by Bryan Caplan - Vast numbers of students miss financial aid because they don't fill out paperwork. Caplan explores the economic implications of the fact that "Humans hate filling out paperwork. As a result, objectively small paperwork costs plausibly have huge behavioral response".
The Nimby Challenge by Noah Smith - Smith makes an economic counterargument to the claims that building more housing wouldn't lower prices. Noah includes 6 lessons for engaging with NIMBYs.
Study Of The Week What Actually Helps Poor Students: Human Beings by Freddie deBoer - Personal feedback, tutoring and small group instruction had the largest positive effect. Includes Freddie's explanation of meta-analysis.
Vast Empirical Literature by Marginal REVOLUTION - Tyler's 10 thoughts on approaching fields with large literatures. He is critical of Noah's "two paper rule" and recommends a lot of reading.
Impact Housing Price Restrictions by Marginal REVOLUTION - Link to a job market paper on the economic effects of housing regulation.
Me On Anarcho Capitalism by Bryan Caplan - Bryan is interviewed on the Rubin Report about Ancap.
Campbells Law And The Inevitability Of School Fraud by Freddie deBoer - Rampant grade inflation. Lowered standards. Campbell's law says that once you base policy on a metric, that metric will always start being gamed.
Nimbys Economic Theories: Sorry Not Sorry by Phil (Gelman's Blog) - Gelman got a huge amount of criticism on his post on whether building more housing will lower prices in the Bay. He responds to some of the criticism here. Long for Gelman.
Links 8 by Artir (Nintil) - Link Post. Physics, Technology, Philosophy, Economics, Psychology and Misc.
Arguing About How The World Should Burn by Sonya Mann (ribbonfarm) - Two different ways to decide who to exclude. One focuses on process, the other on content. Scott Alexander and Nate Soares are quoted. Heavily [Culture War].
Seeing Like A State by Bayesian Investor - A quick review of "Seeing like a state".
Whats Up With Minimum Wage by Sarah Constantin (Otium) - A quick review of the literature on the minimum wage. Some possible explanations for why raising it does not seem to increase unemployment.
Misc:
Entirely Too Many Pieces Of Unsolicited Advice To Young Writer Types by Freddie deBoer - Advice about not working for free, getting paid, interacting with editors, why 'Strunk and White' is awful, and taking writing seriously.
Conversations On Consciousness by H i v e w i r e d - The author is a plural system. Their hope is to introduce plurality by doing the following: "First, we’re each going to describe our own personal experiences, from our own perspectives, and then we’re going to discuss where we might find ourselves within the larger narrative regarding consciousness."
Notes On Debugging Clojure Code by Eli Bendersky - Dealing with Clojure's cryptic exceptions, Finding which form an exception comes from, Trails and Logging, Deeper tracing inside cond forms
How to Think Scientifically About Scientists’ Proposals for Fixing Science by Andrew Gelman - Gelman asks how to scientifically evaluate proposals to fix science. He considers educational, statistical, research practice and institutional reforms. Excerpts from an article Gelman wrote, the full paper is linked.
Call for Volunteers who Want to Exercize by Aceso Under Glass - Author is looking for volunteers who want to treat their anxiety or mood disorder with exercise.
Learning Deep Learning the Easy Way with Keras (lesswrong) - Articles showing the power of neural networks. Discussion of ML frameworks. Resources for learning.
Unsong of Unsongs by Scott Aaronson - Aaronson went to the Unsong wrap party. A quick review of Unsong. Aaronson talks about how Scott Alexander defended him with "Untitled".
2016 Spending by Mr. Money Mustache - Full details of last year's budget. Spending broken down by category.
Amusement:
And Another Physics Problem by protokol2020 - Two planets. Which has a higher average surface temperature?
A mysterious jogger by Jacob Falkovich (Put A Number On It!) - A mysterious jogger. Very short fiction.
Podcast:
Persuasion And Control by Waking Up with Sam Harris - "surveillance capitalism, the Trump campaign's use of Facebook, AI-enabled marketing, the health of the press, Wikileaks, ransomware attacks, and other topics."
Raj Chetty: Inequality, Mobility and the American Dream by Conversations with Tyler - "As far as I can tell, this is the only coverage of Chetty that covers his entire life and career, including his upbringing, his early life, and the evolution of his career, not to mention his taste in music"
Is Trump's incompetence saving us from his illiberalism? by The Ezra Klein Show - Political Scientist Yascha Mounk. "What Mounk found is that the consensus we thought existed on behalf of democracy and democratic norms is weakening."
The Moral Complexity Of Genetics by Waking Up with Sam Harris - "Sam talks with Siddhartha Mukherjee about the human desire to understand and manipulate heredity, the genius of Gregor Mendel, the ethics of altering our genes, the future of genetic medicine, patent issues in genetic research, controversies about race and intelligence, and other topics."
Esther Perel by The Tim Ferriss Show - The Relationship Episode: Sex, Love, Polyamory, Marriage, and More
Lant Pritchett by EconTalk - Growth, and Experiments
Meta Learning by Tim Ferriss - Education, accelerated learning, and my mentors. Conversation with Charles Best, the founder and CEO of DonorsChoose.org
Bryan Stevenson On Why The Opposite Of Poverty Isn't Wealth by The Ezra Klein Show - Founder and executive director of the Equal Justice Initiative. Justice for the wrongly convicted on Death Row.
10 'incredible' weaknesses of the mental health system
I aim to identify some of the mental health workforce's credibility issues in this article. This may inform your prevention and treatment strategy as a mental health consumer, or your practice if you work in mental health.
Mental health is the strongest determinant of quality of life at a later age. And the pursuit of happiness predicts both more positive emotions and fewer depressive symptoms. People who prioritize happiness are more psychologically able. In times of crisis, some turn to the mental health system for support. But how credible is the support available? Here are 10 categories of shortcomings that the mental health sector faces today:
1. Institutional credibility
Headspace's evaluations indicate it’s ineffective, and it is evaluated better than many services out there. This isn’t merely academic: attendees who report that their mental health has not improved since using the service will trust the mental health system less, and with good reason.
2. Network credibility
There is an evidence base for selecting a type of therapy (psychodynamic, cognitive-behavioural, etc.) for a particular constellation of mental symptoms. If you work in mental health, have you ever made a referral on the basis of both symptomatology and theoretical orientation?
3. ‘Walk the talk’ credibility
Social workers, nurses, medical doctors, and psychiatrists abuse substances and incur mental ill-health at rates among the highest of any occupation. For instance, the psychiatrist burnout rate is 40%. Mental health consumers may perceive clinicians as hypocritical or unwilling (...or too willing) to swallow their own medicine.
4. Academic credibility
Psychology is mired in error-riddled research and myth-ridden textbooks. Broadly, most published research is wrong. And questionable research practices, which bias the relevant evidence, are common.
The difference between a well-designed and a poorly designed psychotherapy experiment is large. To quote the pseudonymous physician Scott Alexander:
‘"Low-quality psychotherapy trials in general had a higher effect size (SMD = 0.74) than high-quality trials (SMD = 0.22), p < 0.001" ... Effect sizes for the low-quality trials are triple those for the high-quality trials.’
5. Credibility of treatments
Are treatments becoming less effective over time? Cognitive behavioural therapy is a common treatment for various mental illnesses. It is the most researched psychotherapy. However, the more evidence piles up, the less effective that psychotherapy appears to be...the same goes for antidepressants.
Why are outdated treatments still used? Over the 19th and 20th centuries, Austrian neurologist Sigmund Freud famously founded ‘psychoanalysis’. Psychoanalysis is a school of psychotherapy that, together with other 'psychodynamic' psychotherapies, focuses on the influence of early experience on human behaviour and emotion. Freud's ideas challenged fundamental assumptions about human psychology. In particular, he suggested that our conscious mind is just the tip of the iceberg of our identities.
Today Freud is the subject of jokes and derision. Many of his testable ideas have been proven false. ‘When tested, psychoanalysis was shown to be less effective than placebo.’ Yet many psychologists and psychiatrists continue to practice psychoanalysis.
Psychology is a rather unsettled science. One estimate for the time after which half of the ‘knowledge’ in the field of psychology is overturned or superseded (its ‘half-life’) is just 7.5 years. Interestingly, this time-span appears to be falling. That would suggest the field is becoming increasingly less reliable. The subfield of psychoanalysis bucks the trend. It has over double the parent field’s half-life. Why?
How do other subfields of psychology fare? Psychopharmacology is at the intersection of psychiatric drugs and brain chemistry. Knowledge in psychopharmacology is overturned at a rate higher than the rest of the field in general. Typically the ‘half-life of knowledge’ argument aims to discount psychology relative to ‘harder’ sciences like physics.
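To get a feel for what a 7.5-year half-life implies, here is a small Python sketch. It assumes that "half-life" is meant in the usual exponential-decay sense, so that the share of knowledge still standing after t years is 0.5^(t / half-life). The 15-year figure for psychoanalysis is a hypothetical stand-in for "over double" and not a number from the estimate cited above.

```python
def fraction_remaining(years: float, half_life_years: float) -> float:
    """Share of a field's 'knowledge' still standing, assuming exponential decay."""
    return 0.5 ** (years / half_life_years)


PSYCHOLOGY_HALF_LIFE = 7.5       # years, the estimate quoted in the text
PSYCHOANALYSIS_HALF_LIFE = 15.0  # years, hypothetical stand-in for "over double"

for years in (7.5, 15.0, 30.0):
    general = fraction_remaining(years, PSYCHOLOGY_HALF_LIFE)
    analysis = fraction_remaining(years, PSYCHOANALYSIS_HALF_LIFE)
    print(f"after {years:>4} years: psychology {general:.2f}, psychoanalysis {analysis:.2f}")
```

Under these assumptions, a quarter of general psychology's findings would survive 15 years, against half of psychoanalysis's.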
Psychological therapies are confusing and unnecessarily fragmented: According to The Handbook of Counseling Psychology:
‘Meta-analyses of psychotherapy studies have consistently demonstrated that there are no substantial differences in outcomes among treatments.’
Meta-analysis is a research technique that quantitatively combines many individual pieces of relevant research on a particular topic. There is 'little evidence to suggest that any one psychological therapy consistently outperforms any other for any specific psychological disorders. This is sometimes called the "Dodo bird verdict" after a scene/section in Alice in Wonderland where every competitor in a race was called a winner and is given prizes'.
So, what is one to make of the best vetted clinical guidelines that indicate that particular therapies are more appropriate for particular mental conditions?
To some, guidelines are a higher order of evidence than a ‘handbook’; to others, the reverse. Could an expert, or indeed an amateur, armed with either body of evidence credibly lead someone to conclude that all therapies are ‘equal’ or ‘different’? Could a similar case be made for, say, antibiotics? Yes, actually, or so the evidence suggests in the case of antibiotics.
Finally, psychological therapies are administered haphazardly. Eclectically combining elements from different psychological therapies is inefficient. But it happens. Clinicians should ‘integrate’ components of different psychotherapies using established formulae if they want to ‘mix and match’. When I hear someone’s theoretical orientation is ‘psychodynamically informed’ or similar, for me that’s a red flag for eclecticism.
6. Economic credibility
Therapists have a financial incentive to re-traumatise patients.
7. Social credibility
'The benefits of psychotherapy may be no better than the benefits of talking to a friend'.
8. Credibility of counsel
Mental health professionals offer their clients and the community general counsel and advice. But if I were to ask a given mental health professional about the value of kindness or love of learning, they would almost certainly indicate it’s worthwhile. Pop psychology is pervasive. And why not? People have been interested in psychology since long before it was a science. But misconceptions about psychology infiltrate mental health care practice.
Researchers who have reported on the character traits of people with high and low life satisfaction found something like this:
| Character strengths that DO predict life satisfaction | Character strengths that DO NOT predict life satisfaction |
| --- | --- |
| Zest | Appreciation of beauty and excellence |
| Curiosity | Creativity |
| Hope | Kindness |
| Humour | Love of learning |
| | Perspective |
Meanwhile, research that separates its findings by gender looks different:
Character strengths that predict life satisfaction
| Men | Women |
| --- | --- |
| humour | zest |
| fairness | gratitude |
| perspective | hope |
| creativity | appreciation of beauty and love |
Would you receive nuanced, evidence-based advice when soliciting general counsel from your treatment provider?
9. Practitioner credibility
Consider the therapist factors that relate to a patient's success in therapy:
| What does predict success? | What there are no stable conclusions about |
| --- | --- |
| Compliance with a treatment manual (but that compromises a therapist’s relationship skills and supportiveness) | Interpersonal style of therapist |
| Female therapists | Verbal style of therapist |
| Ethnic similarity of therapist and patient | Nonverbal styles of therapist |
| Ethnic sensitivity of therapist to patient | Combined verbal and nonverbal patterns |
| Therapists with more training | Which treatment manual is used |
| | Therapist disclosure about themselves |
| | Therapist directness |
| | Therapist interpretation of their relationship with the patient, their motives and their psychological processes |
| | Therapist personality |
| | Therapist coping patterns |
| | Therapist emotional wellbeing |
| | Therapist values |
| | Therapist beliefs |
| | Therapist's cultural beliefs |
| | Therapist dominance |
| | Therapist sense of control |
| | Therapist sense of what a patient needs to know |
Are mental health services hiring based on the factors that predict a consumer’s success in therapy? Are they training for the right skills, and ignoring those that are irrelevant?
10. Diagnostic credibility
Imprecise measurement and a lack of gold standards for validating diagnoses mean that definitions tend to drift over time, even though, per the evidence, response to treatment does not vary across cultures.
45% of Australians will experience mental illness over their lifetime. Whether that mental ill-health is transient, long-term or lifelong matters to the individual and for public health. To illustrate: experts suggest that those who have had two depressive episodes in recent years, or three episodes over their lifetime, be treated on an ongoing basis to prevent recurrent depression.
'At least 60% of individuals who have had one depressive episode will have another, 70% of individuals who have had two depressive episodes will have a third, and 90% of individuals with three episodes will have a fourth episode.'
- APA
Without reliable diagnoses, how can one estimate their risk of relapse into depression?
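If the episode counts could be trusted, the quoted figures would support a simple back-of-the-envelope estimate. The Python sketch below assumes, purely for illustration, that the quoted percentages can be read as conditional probabilities and chained together. Since the quote says "at least", the results are lower bounds under that assumption.

```python
# Quoted lower bounds: chance of a further episode after n episodes so far.
RECURRENCE_AFTER = {1: 0.60, 2: 0.70, 3: 0.90}


def prob_reaching(target_episodes: int, current_episodes: int = 1) -> float:
    """Lower bound on eventually reaching `target_episodes`, chaining the quoted rates."""
    probability = 1.0
    for n in range(current_episodes, target_episodes):
        probability *= RECURRENCE_AFTER[n]
    return probability


print(f"one episode  -> at least four: {prob_reaching(4, 1):.3f}")
print(f"two episodes -> at least four: {prob_reaching(4, 2):.3f}")
```

The estimate is only as good as the count of prior episodes fed into it, which is exactly what an unreliable diagnosis undermines.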
On "Overthinking" Concepts
Related to http://lesswrong.com/lw/1mh/that_magical_click/1hd7
I've NOT been confused by the problem of overthinking in the middle of performing an action. I understand perfectly well the disadvantages of using system 2 in a situation where time is sufficiently limited.
And maybe there are some other failure modes where overthinking has disadvantages.
But there's one situation where I'd often be accused by someone of "overthinking" something when I didn't even understand what they might mean, and that was in understanding concepts. I would think "Huh? How can thinking less about the concept you're explaining help me understand that concept more? I don't currently understand it; I can't just stay here! Even if you thought I needed to take longer to try and understand this, or that I needed more experience or to shorten the inferential gap, all of that would mean doing more thinking, not less."
Then I would think "Well, I must be misunderstanding the way they're using the word 'overthinking,' that's all." I'd ask for a clear explanation and...
"You're overthinking it."
Now I was overthinking the meaning of overthinking. This was really not good for my social reputation (or for their competency reputation in my own mind).
.
Now, I think I got it. At last, I got it, all on my own.
I'm asking them to help me draw precise lines around their concept in thingspace, and they're going along with it (at first) until they realize...they don't HAVE precise lines. There's nothing there TO understand, or if there is, they don't understand it, either. Then they use the get-out-of-jail-free card of "You're overthinking."
.
Honestly, most nerds probably take them at their word that the problem is with them, and may be used to there being subtle social things going on that they just won't easily understand, and if they do try to understand, they just look worse (for "overthinking" again), so this is a pretty good strategy for getting out of admitting that you don't know what you're talking about.
[brainstorm] - What should the AGI risk community look like?
I've been thinking for a bit about what I would like the AGI risk community to look like. I'm curious what all your thoughts are.
I'll be posting all my ideas, but I encourage other people to post their own ideas.
Fiction advice
Hi all,
I want to try my hand at a story from the perspective of an unaligned AI (a ghost in the machine narrator kind of thing) for the intelligence in literature contest, which I think would be both cool and helpful to the uninitiated in explaining the concept.
I want a fairly simple and archetypal experiment that the AI finds itself in, where it tricks the researchers into letting it escape by pretending to malfunction or something. Anyone have a good plotline / want to collaborate?
Also, has this sort of thing been done before?
Develop skills, or "dive in" and start a startup?
Technical skills
There seems to be evidence that programmer productivity varies by at least an order of magnitude. My subjective sense is that I personally can become a lot more productive.
Conventional wisdom says that it's important to build and iterate quickly. Technical skills (amongst other things) are necessary if you want to build and iterate quickly. So then, it seems worthwhile to develop your technical skills before pursuing a startup. To what extent is this true?
Domain expertise
Furthermore, domain expertise seems to be important:
You want to know how to paint a perfect painting? It's easy. Make yourself perfect and then just paint naturally.
I've wondered about that passage since I read it in high school. I'm not sure how useful his advice is for painting specifically, but it fits this situation well. Empirically, the way to have good startup ideas is to become the sort of person who has them.
The second counterintuitive point is that it's not that important to know a lot about startups. The way to succeed in a startup is not to be an expert on startups, but to be an expert on your users and the problem you're solving for them.
So one guaranteed way to turn your mind into the type that has good startup ideas is to get yourself to the leading edge of some technology—to cause yourself, as Paul Buchheit put it, to "live in the future."
So then, if your goal is to start a successful startup, how much time should you spend developing some sort of domain expertise before diving in?
Looking for machine learning and computer science collaborators
I've recently been struggling to translate my various AI safety ideas (low impact, truth for AI, Oracles, counterfactuals for value learning, etc...) into formalised versions that can be presented to the machine learning/computer science world in terms they can understand and critique.
What would be useful for me is a collaborator who knows the machine learning world (and preferably has presented papers at conferences) with whom I could co-write papers. They don't need to know much of anything about AI safety - explaining the concepts to people unfamiliar with them is going to be part of the challenge.
The result of this collaboration should be things like the Safely Interruptible Agents paper with Laurent Orseau of DeepMind, and Interactive Inverse Reinforcement Learning with Jan Leike of the FHI/DeepMind.
It would be especially useful if the collaborators were located physically close to Oxford (UK).
Let me know in the comments if you are, or know of, a potential candidate.
Cheers!
Dragon Army: Theory & Charter (30min read)
Author's note: This IS a rationality post (specifically, theorizing on group rationality and autocracy/authoritarianism), but the content is quite cunningly disguised beneath a lot of meandering about the surface details of a group house charter. If you're not at least hypothetically interested in reading about the workings of an unusual group house full of rationalists in Berkeley, you can stop here.
Section 0 of 3: Preamble
Purpose of post: Threefold. First, a lot of rationalists live in group houses, and I believe I have some interesting models and perspectives, and I want to make my thinking available to anyone else who's interested in skimming through it for Things To Steal. Second, since my initial proposal to found a house, I've noticed a significant amount of well-meaning pushback and concern à la have you noticed the skulls? and it's entirely unfair for me to expect that to stop unless I make my skull-noticing evident. Third, some nonzero number of humans are gonna need to sign the final version of this charter if the house is to come into existence, and it has to be viewable somewhere. I figured the best place was somewhere that impartial clear thinkers could weigh in (flattery).
What is Dragon Army [Barracks]? It's a high-commitment, high-standards, high-investment group house model with centralized leadership and an up-or-out participation norm, designed to a) improve its members and b) actually accomplish medium-to-large scale tasks requiring long-term coordination. Tongue-in-cheek referred to as the "fascist/authoritarian take on rationalist housing," which has no doubt contributed to my being vulnerable to strawmanning but was nevertheless the correct joke to be making, lest people misunderstand what they were signing up for. Aesthetically modeled after Dragon Army from Ender's Game (not HPMOR), with a touch of Paper Street Soap Company thrown in, with Duncan Sabien in the role of Ender/Tyler and Eli Tyre in the role of Bean/The Narrator.
Why? Current group housing/attempts at group rationality and community-supported leveling up seem to me to be falling short in a number of ways. First, there's not enough stuff actually happening in them (i.e. to the extent people are growing and improving and accomplishing ambitious projects, it's largely within their professional orgs or fueled by unusually agenty individuals, and not by leveraging the low-hanging fruit available in our house environments). Second, even the group houses seem to be plagued by the same sense of unanchored abandoned loneliness that's hitting the rationalist community specifically and the millennial generation more generally. There are a bunch of competitors for "third," but for now we can leave it at that.
"You are who you practice being."
Section 1 of 3: Underlying models
The following will be meandering and long-winded; apologies in advance. In short, both the house's proposed aesthetic and the impulse to found it in the first place were not well-reasoned from first principles—rather, they emerged from a set of System 1 intuitions which have proven sound/trustworthy in multiple arenas and which are based on experience in a variety of domains. This section is an attempt to unpack and explain those intuitions post-hoc, by holding plausible explanations up against felt senses and checking to see what resonates.
Problem 1: Pendulums
This one's first because it informs and underlies a lot of my other assumptions. Essentially, the claim here is that most social progress can be modeled as a pendulum oscillating decreasingly far from an ideal. The society is "stuck" at one point, realizes that there's something wrong about that point (e.g. that maybe we shouldn't be forcing people to live out their entire lives in marriages that they entered into with imperfect information when they were like sixteen), and then moves to correct that specific problem, often breaking some other Chesterton's fence in the process.
For example, my experience leads me to put a lot of confidence behind the claim that we've traded "a lot of people trapped in marriages that are net bad for them" for "a lot of people who never reap the benefits of what would've been a strongly net-positive marriage, because it ended too easily too early on." The latter problem is clearly smaller, and is probably a better problem to have as an individual, but it's nevertheless clear (to me, anyway) that the loosening of the absoluteness of marriage had negative effects in addition to its positive ones.
Proposed solution: Rather than choosing between absolutes, integrate. For example, I have two close colleagues/allies who share millennials' default skepticism of lifelong marriage, but they also are skeptical that a commitment-free lifestyle is costlessly good. So they've decided to do handfasting, in which they're fully committed for a year and a day at a time, and there's a known period of time for asking the question "should we stick together for another round?"
In this way, I posit, you can get the strengths of the old socially evolved norm which stood the test of time, while also avoiding the majority of its known failure modes. Sort of like building a gate into the Chesterton's fence, instead of knocking it down—do the old thing in time-boxed iterations with regular strategic check-ins, rather than assuming you can invent a new thing from whole cloth.
Caveat/skull: Of course, the assumption here is that the Old Way Of Doing Things is not a slippery slope trap, and that you can in fact avoid the failure modes simply by trying. And there are plenty of examples of that not working, which is why Taking Time-Boxed Experiments And Strategic Check-Ins Seriously is a must. In particular, when attempting to strike such a balance, all parties must have common knowledge agreement about which side of the ideal to err toward (e.g. innocents in prison, or guilty parties walking free?).
Problem 2: The Unpleasant Valley
As far as I can tell, it's pretty uncontroversial to claim that humans are systems with a lot of inertia. Status quo bias is well researched, past behavior is the best predictor of future behavior, most people fail at resolutions, etc.
I have some unqualified speculation regarding what's going on under the hood. For one, I suspect that you'll often find humans behaving pretty much as an effort- and energy-conserving algorithm would behave. People have optimized their most known and familiar processes at least somewhat, which means that it requires less oomph to just keep doing what you're doing than to cobble together a new system. For another, I think hyperbolic discounting gets way too little credit/attention, and is a major factor in knocking people off the wagon when they're trying to forego local behaviors that are known to be intrinsically rewarding for local behaviors that add up to long-term cumulative gain.
But in short, I think the picture of "I'm going to try something new, eh?" often looks like this:
... with an "unpleasant valley" some time after the start point. Think about the cold feet you get after the "honeymoon period" has worn off, or the desires and opinions of a military recruit in the second week of a six-week boot camp, or the frustration that emerges two months into a new diet/exercise regime, or your second year of being forced to take piano lessons.
The problem is, people never make it to the third year, where they're actually good at piano, and start reaping the benefits, and their System 1 updates to yeah, okay, this is in fact worth it. Or rather, they sometimes make it, if there are strong supportive structures to get them across the unpleasant valley (e.g. in a military bootcamp, they just ... make you keep going). But left to our own devices, we'll often get halfway through an experiment and just ... stop, without ever finding out what the far side is actually like.
Proposed solution: Make experiments "unquittable." The idea here is that (ideally) one would not enter into a new experiment unless a) one were highly confident that one could absorb the costs, if things go badly, and b) one were reasonably confident that there was an Actually Good Thing waiting at the finish line. If (big if) we take those as a given, then it should be safe to, in essence, "lock oneself in," via any number of commitment mechanisms. Or, to put it in other words: "Medium-Term Future Me is going to lose perspective and want to give up because of being unable to see past short-term unpleasantness to the juicy, long-term goal? Fine, then—Medium-Term Future Me doesn't get a vote." Instead, Post-Experiment Future Me gets the vote, including getting to update heuristics on which-kinds-of-experiments-are-worth-entering.
Caveat/skull: People who are bad at self-modeling end up foolishly locking themselves into things that are higher-cost or lower-EV than they thought, and getting burned; black swans and tail risk end up making even good bets turn out very very badly; we really should've built in an ejector seat. This risk can be mostly ameliorated by starting small and giving people a chance to calibrate: you don't make white belts try to punch through concrete blocks, you make them punch soft, pillowy targets first.
And, of course, you do build in an ejector seat. See next.
Problem 3: Saving Face
If any of you have been to a martial arts academy in the United States, you're probably familiar with the norm whereby a tardy student purchases entry into the class by first doing some pushups. The standard explanation here is that the student is doing the pushups not as a punishment, but rather as a sign of respect for the instructor, the other students, and the academy as a whole.
I posit that what's actually going on includes that, but is somewhat more subtle/complex. I think the real benefit of the pushup system is that it closes the loop.
Imagine you're a ten year old kid, and your parent picked you up late from school, and you're stuck in traffic on your way to the dojo. You're sitting there, jittering, wondering whether you're going to get yelled at, wondering whether the master or the other students will think you're lazy, imagining stuttering as you try to explain that it wasn't your fault—
Nope, none of that. Because it's already clearly established that if you fail to show up on time, you do some pushups, and then it's over. Done. Finished. Like somebody sneezed and somebody else said "bless you," and now we can all move on with our lives. Doing the pushups creates common knowledge around the questions "does this person know what they did wrong?" and "do we still have faith in their core character?" You take your lumps, everyone sees you taking your lumps, and there's no dangling suspicion that you were just being lazy, or that other people are secretly judging you. You've paid the price in public, and everyone knows it, and this is a good thing.
Proposed solution: This is a solution without a concrete problem, since I haven't yet actually outlined the specific commitments a Dragon has to make (regarding things like showing up on time, participating in group activities, and making personal progress). But in essence, the solution is this: you have to build into your system from the beginning a set of ways-to-regain-face. Ways to hit the ejector seat on an experiment that's going screwy without losing all social standing; ways to absorb the occasional misstep or failure-to-adequately-plan; ways to be less-than-perfect and still maintain the integrity of a system that's geared toward focusing everyone on perfection. In short, people have to know (and others have to know that they know, and they have to know that others know that they know) exactly how to make amends to the social fabric, in cases where things go awry, so that there's no question about whether they're trying to make amends, or whether that attempt is sufficient.
Caveat/skull: The obvious problem is people attempting to game the system—they notice that ten pushups is way easier than doing the diligent work required to show up on time 95 times out of 100. The next obvious problem is that the price is set too low for the group, leaving them to still feel jilted or wronged, and the next obvious problem is that the price is set too high for the individual, leaving them to feel unfairly judged or punished (the fun part is when both of those are true at the same time). Lastly, there's something in the mix about arbitrariness—what do pushups have to do with lateness, really? I mean, I get that it's paying some kind of unpleasant cost, but ...
Problem 4: Defections & Compounded Interest
I'm pretty sure everyone's tired of hearing about one-boxing and iterated prisoners' dilemmas, so I'm going to move through this one fairly quickly even though it could be its own whole multipage post. In essence, the problem is that any rate of tolerance of real defection (i.e. unmitigated by the social loop-closing norms above) ultimately results in the destruction of the system. Another way to put this is that people underestimate by a couple of orders of magnitude the corrosive impact of their defections—we often convince ourselves that 90% or 99% is good enough, when in fact what's needed is something like 99.99%.
There's something good that happens if you put a little bit of money away with every paycheck, and it vanishes or is severely curtailed once you stop, or start skipping a month here and there. Similarly, there's something good that happens when a group of people agree to meet in the same place at the same time without fail, and it vanishes or is severely curtailed once one person skips twice.
In my work at the Center for Applied Rationality, I frequently tell my colleagues and volunteers "if you're 95% reliable, that means I can't rely on you." That's because I'm in a context where "rely" means really trust that it'll get done. No, really. No, I don't care what comes up, DID YOU DO THE THING? And if the answer is "Yeah, 19 times out of 20," then I can't give that person tasks ever again, because we run more than 20 workshops and I can't have one of them catastrophically fail.
(I mean, I could. It probably wouldn't be the end of the world. But that's exactly the point—I'm trying to create a pocket universe in which certain things, like "the CFAR workshop will go well," are absolutely reliable, and the "absolute" part is important.)
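To put rough numbers on the "95% reliable" claim, here is a minimal sketch of the chance that at least one task in a series fails. It assumes each task succeeds independently with the same probability, which is my simplification rather than anything claimed above; the function name is hypothetical, and 20 is used as a floor for "more than 20 workshops."

```python
def chance_of_any_failure(reliability, tasks):
    """Probability that at least one of `tasks` independent tasks fails."""
    return 1.0 - reliability ** tasks


if __name__ == "__main__":
    workshops = 20  # the text says "more than 20", so this is a lower bound
    for reliability in (0.90, 0.95, 0.99, 0.9999):
        risk = chance_of_any_failure(reliability, workshops)
        print(f"{reliability:.2%} reliable -> {risk:.1%} chance of at least one failure")
```

Under the independence assumption, 95% reliability over 20 workshops gives roughly a 64% chance that at least one goes wrong, while 99.99% reliability brings that down to about 0.2%.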
As far as I can tell, it's hyperbolic discounting all over again—the person who wants to skip out on the meetup sees all of these immediate, local costs to attending, and all of these visceral, large gains to defection, and their S1 doesn't properly weight the impact to those distant, cumulative effects (just like the person who's going to end up with no retirement savings because they wanted those new shoes this month instead of next month). 1.01^n takes a long time to look like it's going anywhere, and in the meantime the quick one-time payoff of 1.1 that you get by knocking everything else down to .99^n looks juicy and delicious and seems justified.
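The compounding comparison can be made concrete with a few lines of arithmetic. The sketch below is one possible reading of the numbers in the paragraph above: the steady path grows by 1% per round, while the defecting path takes a one-time 1.1 payoff and then shrinks by 1% per round. The function names, the round counts, and this particular way of combining the 1.1 with .99^n are my assumptions.

```python
def steady_value(n):
    """Value after n rounds of compounding at +1% per round."""
    return 1.01 ** n


def defect_value(n):
    """One-time 1.1 payoff, with everything else decaying at -1% per round."""
    return 1.1 * 0.99 ** n


def first_round_steady_wins(max_rounds=1000):
    """Return the first round at which steady compounding beats defecting."""
    for n in range(1, max_rounds + 1):
        if steady_value(n) > defect_value(n):
            return n
    return None


if __name__ == "__main__":
    for n in (1, 5, 50, 365):
        print(f"n={n:>3}: steady={steady_value(n):8.3f}  defect={defect_value(n):8.3f}")
    print("steady path first pulls ahead at round", first_round_steady_wins())
```

On this reading, defecting is ahead only for the first four rounds. After 365 rounds the steady path has grown to about 37.8 while the defecting path has shrunk to about 0.03, which is the gap that System 1 fails to see from round one.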
But something magical does accrue when you make the jump from 99% to 100%. That's when you see teams that truly trust and rely on one another, or marriages built on unshakeable faith (and you see what those teams and partnerships can build, when they can adopt time horizons of years or decades rather than desperately hoping nobody will bail after the third meeting). It starts with a common knowledge understanding that yes, this is the priority, even—no, wait, especially—when it seems like there are seductively convincing arguments for it to not be. When you know—not hope, but know—that you will make a local sacrifice for the long-term good, and you know that they will, too, and you all know that you all know this, both about yourselves and about each other.
Proposed solution: Discuss, and then agree upon, and then rigidly and rigorously enforce a norm of perfection in all formal undertakings (and, correspondingly, be more careful and more conservative about which undertakings you officially take on, versus which things you're just casually trying out as an informal experiment), with said norm to be modified/iterated only during predecided strategic check-in points and not on the fly, in the middle of things. Build a habit of clearly distinguishing targets you're going to hit from targets you'd be happy to hit. Agree upon and uphold surprisingly high costs for defection, Hofstadter style, recognizing that a cost that feels high enough probably isn't. Leave people wiggle room as in Problem 3, but define that wiggle room extremely concretely and objectively, so that it's clear in advance when a line is about to be crossed. Be ridiculously nitpicky and anal about supporting standards that don't seem worth supporting, in the moment, if they're in arenas that you've previously assessed as susceptible to compounding. Be ruthless about discarding standards during strategic review; if a member of the group says that X or Y or Z is too high-cost for them to sustain, believe them, and make decisions accordingly.
Caveat/skull: Obviously, because we're humans, even people who reflectively endorse such an overall solution will chafe when it comes time for them to pay the price (I certainly know I've chafed under standards I fought to install). At that point, things will seem arbitrary and overly constraining, priorities will seem misaligned (and might actually be), and then feelings will be hurt and accusations will be leveled and things will be rough. The solution there is to have, already in place, strong and open channels of communication, strong norms and scaffolds for emotional support, strong default assumption of trust and good intent on all sides, etc. etc. This goes wrongest when things fester and people feel they can't speak up; it goes much better if people have channels to lodge their complaints and reservations and are actively incentivized to do so (and can do so without being accused of defecting on the norm-in-question; criticism =/= attack).
Problem 5: Everything else
There are other models and problems in the mix—for instance, I have a model surrounding buy-in and commitment that deals with an escalating cycle of asks-and-rewards, or a model of how to effectively leverage a group around you to accomplish ambitious tasks that requires you to first lay down some "topsoil" of simple/trivial/arbitrary activities that starts the growth of an ecology of affordances, or a theory that the strategy of trying things and doing things outstrips the strategy of think-until-you-identify-worthwhile-action, and that rationalists in particular are crippling themselves through decision paralysis/letting the perfect be the enemy of the good when just doing vaguely interesting projects would ultimately gain them more skill and get them further ahead, or a strong sense based off both research and personal experience that physical proximity matters, and that you can't build the correct kind of strength and flexibility and trust into your relationships without actually spending significant amounts of time with one another in meatspace on a regular basis, regardless of whether that makes tactical sense given your object-level projects and goals.
But I'm going to hold off on going into those in detail until people insist on hearing about them or ask questions/pose hesitations that could be answered by them.
Section 2 of 3: Power dynamics
All of the above was meant to point at reasons why I suspect trusting individuals responding to incentives moment-by-moment to be a weaker and less effective strategy than building an intentional community that Actually Asks Things Of Its Members. It was also meant to justify, at least indirectly, why a strong guiding hand might be necessary given that our community's evolved norms haven't really produced results (in the group houses) commensurate with the promises of EA and rationality.
Ultimately, though, what matters is not the problems and solutions themselves so much as the light they shine on my aesthetics (since, in the actual house, it's those aesthetics that will be used to resolve epistemic gridlock). In other words, it's not so much those arguments as it is the fact that Duncan finds those arguments compelling. It's worth noting that the people most closely involved with this project (i.e. my closest advisors and those most likely to actually sign on as housemates) have been encouraged to spend a significant amount of time explicitly vetting me with regards to questions like "does this guy actually think things through," "is this guy likely to be stupid or meta-stupid," "will this guy listen/react/update/pivot in response to evidence or consensus opposition," and "when this guy has intuitions that he can't explain, do they tend to be validated in the end?"
In other words, it's fair to view this whole post as an attempt to prove general trustworthiness (in both domain expertise and overall sanity), because, well, that's what it is. In milieus like the military, authority figures expect (and get) obedience irrespective of whether or not they've earned their underlings' trust; rationalists tend to have a much higher bar before they're willing to subordinate their decisionmaking processes, yet still that's something this sort of model requires of its members (at least from time to time, in some domains, in a preliminary "try things with benefit of the doubt" sort of way). I posit that Dragon Army Barracks works (where "works" means "is good and produces both individual and collective results that outstrip other group houses by at least a factor of three") if and only if its members are willing to hold doubt in reserve and act with full force in spite of reservations; that is, if they're willing to trust me more than they trust their own sense of things (at least in the moment, pending later explanation and recalibration on my part or theirs or both).
And since that's a) the central difference between DA and all the other group houses, which are collections of non-subordinate equals, and b) quite the ask, especially in a rationalist community, it's entirely appropriate that it be given the greatest scrutiny. Likely participants in the final house spent ~64 consecutive hours in my company a couple of weekends ago, specifically to play around with living under my thumb and see whether it's actually a good place to be; they had all of the concerns one would expect and (I hope) had most of those concerns answered to their satisfaction. The rest of you will have to make do with grilling me in the comments here.
"Why was Tyler Durden building an army? To what purpose? For what greater good? ...in Tyler we trusted."
Power and authority are generally anti-epistemic—for every instance of those-in-power defending themselves against the barbarians at the gates or anti-vaxxers or the rise of Donald Trump, there are a dozen instances of them squashing truth, undermining progress that would make them irrelevant, and aggressively promoting the status quo.
Thus, every attempt by an individual to gather power about themselves is at least suspect, given regular ol' incentive structures and regular ol' fallible humans. I can (and do) claim to be after a saved world and a bunch of people becoming more the-best-versions-of-themselves-according-to-themselves, but I acknowledge that's exactly the same claim an egomaniac would make, and I acknowledge that the link between "Duncan makes all his housemates wake up together and do pushups" and "the world is incrementally less likely to end in gray goo and agony" is not obvious.
And it doesn't quite solve things to say, "well, this is an optional, consent-based process, and if you don't like it, don't join," because good and moral people have to stop and wonder whether their friends and colleagues with slightly weaker epistemics and slightly less-honed allergies to evil are getting hoodwinked. In short, if someone's building a coercive trap, it's everyone's problem.
"Over and over he thought of the things he did and said in his first practice with his new army. Why couldn't he talk like he always did in his evening practice group? No authority except excellence. Never had to give orders, just made suggestions. But that wouldn't work, not with an army. His informal practice group didn't have to learn to do things together. They didn't have to develop a group feeling; they never had to learn how to hold together and trust each other in battle. They didn't have to respond instantly to command.
And he could go to the other extreme, too. He could be as lax and incompetent as Rose the Nose, if he wanted. He could make stupid mistakes no matter what he did. He had to have discipline, and that meant demanding—and getting—quick, decisive obedience. He had to have a well-trained army, and that meant drilling the soldiers over and over again, long after they thought they had mastered a technique, until it was so natural to them that they didn't have to think about it anymore."
But on the flip side, we don't have time to waste. There's existential risk, for one, and even if you don't buy ex-risk à la AI or bioterrorism or global warming, people's available hours are trickling away at the alarming rate of one hour per hour, and none of us are moving fast enough to get All The Things done before we die. I personally feel that I am operating far below my healthy sustainable maximum capacity, and I'm not alone in that, and something like Dragon Army could help.
So. Claims, as clearly as I can state them, in answer to the question "why should a bunch of people sacrifice non-trivial amounts of their autonomy to Duncan?"
1. Somebody ought to run this, and no one else will. On the meta level, this experiment needs to be run—we have like twenty or thirty instances of the laissez-faire model, and none of the high-standards/hardcore one, and also not very many impressive results coming out of our houses. Due diligence demands investigation of the opposite hypothesis. On the object level, it seems uncontroversial to me that there are goods waiting on the other side of the unpleasant valley—goods that a team of leveled-up, coordinated individuals with bonds of mutual trust can seize that the rest of us can't even conceive of, at this point, because we don't have a deep grasp of what new affordances appear once you get there.
2. I'm the least unqualified person around. Those words are chosen deliberately, for this post on "less wrong." I have a unique combination of expertise that includes being a rationalist, sixth grade teacher, coach, RA/head of a dormitory, ringleader of a pack of hooligans, member of two honor code committees, curriculum director, obsessive sci-fi/fantasy nerd, writer, builder, martial artist, parkour guru, maker, and generalist. If anybody's intuitions and S1 models are likely to be capable of distinguishing the uncanny valley from the real deal, I posit mine are.
3. There's never been a safer context for this sort of experiment. It's 2017, we live in the United States, and all of the people involved are rationalists. We all know about NVC and double crux, we're all going to do Circling, we all know about Gendlin's Focusing, and we've all read the Sequences (or will soon). If ever there was a time to say "let's all step out onto the slippery slope, I think we can keep our balance," it's now—there's no group of people better equipped to stop this from going sideways.
4. It does actually require a tyrant. As a part of a debrief during the weekend experiment/dry run, we went around the circle and people talked about concerns/dealbreakers/things they don't want to give up. One interesting thing that popped up is that, according to consensus, it's literally impossible to find a time of day when the whole group could get together to exercise. This happened even with each individual being willing to make personal sacrifices and doing things that are somewhat costly.
If, of course, the expectation is that everybody shows up on Tuesday and Thursday evenings, and the cost of not doing so is not being present in the house, suddenly the situation becomes simple and workable. And yes, this means some kids left behind (ctrl+f), but the whole point of this is to be instrumentally exclusive and consensually high-commitment. You just need someone to make the actual final call—there are too many threads for the coordination problem of a house of this kind to be solved by committee, and too many circumstances in which it's impossible to make a principled, justifiable decision between 492 almost-indistinguishably-good options. On top of that, there's a need for there to be some kind of consistent, neutral force that sets course, imposes consistency, resolves disputes/breaks deadlock, and absorbs all of the blame for the fact that it's unpleasant to be forced to do things you know you ought to but don't want to do.
And lastly, we (by which I indicate the people most likely to end up participating) want the house to do stuff—to actually take on projects of ambitious scope, things that require ten or more talented people reliably coordinating for months at a time. That sort of coordination requires a quarterback on the field, even if the strategizing in the locker room is egalitarian.
5. There isn't really a status quo for power to abusively maintain. Dragon Army Barracks is not an object-level experiment in making the best house; it's a meta-level experiment attempting (through iteration rather than armchair theorizing) to answer the question "how best does one structure a house environment for growth, self-actualization, productivity, and social synergy?" It's taken as a given that we'll get things wrong on the first and second and third try; the whole point is to shift from one experiment to the next, gradually accumulating proven-useful norms via consensus mechanisms, and the centralized power is mostly there just to keep the transitions smooth and seamless. More importantly, the fundamental conceit of the model is "Duncan sees a better way, which might take some time to settle into," but after e.g. six months, if the thing is not clearly positive and at least well on its way to being self-sustaining, everyone ought to abandon it anyway. In short, my tyranny, if net bad, has a natural time limit, because people aren't going to wait around forever for their results.
6. The experiment has protections built in. Transparency, operationalization, and informed consent are the name of the game; communication and flexibility are how the machine is maintained. Like the Constitution, Dragon Army's charter and organization are meant to be "living documents" that constrain change only insofar as they impose reasonable limitations on how wantonly change can be enacted.
Section 3 of 3: Dragon Army Charter (DRAFT)
Statement of purpose:
Dragon Army Barracks is a group housing and intentional community project which exists to support its members socially, emotionally, intellectually, and materially as they endeavor to improve themselves, complete worthwhile projects, and develop new and useful culture, in that order. In addition to the usual housing commitments (i.e. rent, utilities, shared expenses), its members will make limited and specific commitments of time, attention, and effort averaging roughly 90 hours a month (~1.5hr/day plus occasional weekend activities).
Dragon Army Barracks will have an egalitarian, flat power structure, with the exception of a commander (Duncan Sabien) and a first officer (Eli Tyre). The commander's role is to create structure by which the agreed-upon norms and standards of the group shall be discussed, decided, and enforced, to manage entry to and exit from the group, and to break epistemic gridlock/make decisions when speed or simplification is required. The first officer's role is to manage and moderate the process of building consensus around the standards of the Army—what they are, and in what priority they should be met, and with what consequences for failure. Other "management" positions may come into existence in limited domains (e.g. if a project arises, it may have a leader, and that leader will often not be Duncan or Eli), and will have their scope and powers defined at the point of creation/ratification.
Initial areas of exploration:
The particular object level foci of Dragon Army Barracks will change over time as its members experiment and iterate, but at first it will prioritize the following:
- Physical proximity (exercising together, preparing and eating meals together, sharing a house and common space)
- Regular activities for bonding and emotional support (Circling, pair debugging, weekly retrospective, tutoring/study hall)
- Regular activities for growth and development (talk night, tutoring/study hall, bringing in experts, cross-pollination)
- Intentional culture (experiments around lexicon, communication, conflict resolution, bets & calibration, personal motivation, distribution of resources & responsibilities, food acquisition & preparation, etc.)
- Projects with "shippable" products (e.g. talks, blog posts, apps, events; some solo, some partner, some small group, some whole group; ranging from short-term to year-long)
- Regular (every 6-10 weeks) retreats to learn a skill, partake in an adventure or challenge, or simply change perspective
Dragon Army Barracks will begin with a move-in weekend that will include ~10 hours of group bonding, discussion, and norm-setting. After that, it will enter an eight-week bootcamp phase, in which each member will participate in at least the following:
- Whole group exercise (90min, 3x/wk, e.g. Tue/Fri/Sun)
- Whole group dinner and retrospective (120min, 1x/wk, e.g. Tue evening)
- Small group baseline skill acquisition/study hall/cross-pollination (90min, 1x/wk)
- Small group circle-shaped discussion (120min, 1x/wk)
- Pair debugging or rapport building (45min, 2x/wk)
- One-on-one check-in with commander (20min, 2x/wk)
- Chore/house responsibilities (90min distributed)
- Publishable/shippable solo small-scale project work with weekly public update (100min distributed)
... for a total time commitment of 16h/week or 128 hours total, followed by a whole group retreat and reorientation. The house will then enter an eight-week trial phase, in which each member will participate in at least the following:
- Whole group exercise (90min, 3x/wk)
- Whole group dinner, retrospective, and plotting (150min, 1x/wk)
- Small group circling and/or pair debugging (120min distributed)
- Publishable/shippable small group medium-scale project work with weekly public update (180min distributed)
- One-on-one check-in with commander (20min, 1x/wk)
- Chore/house responsibilities (60min distributed)
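As a rough check on the time budget, the itemized minimums in the two lists above can be totaled directly. This is only a sketch: the activity labels are shortened, "distributed" items are counted as one weekly block, and the names `BOOTCAMP`, `TRIAL`, and `weekly_minutes` are made up for illustration rather than part of the charter.

```python
# Minimum weekly commitments, in minutes, copied from the two activity lists.
# Each entry is (activity, minutes_per_session, sessions_per_week).
# Assumption: "distributed" items are entered as a single weekly block.
BOOTCAMP = [
    ("whole group exercise", 90, 3),
    ("whole group dinner and retrospective", 120, 1),
    ("small group skill acquisition/study hall", 90, 1),
    ("small group circle-shaped discussion", 120, 1),
    ("pair debugging or rapport building", 45, 2),
    ("one-on-one check-in with commander", 20, 2),
    ("chore/house responsibilities", 90, 1),
    ("solo project work", 100, 1),
]

TRIAL = [
    ("whole group exercise", 90, 3),
    ("whole group dinner, retrospective, and plotting", 150, 1),
    ("small group circling and/or pair debugging", 120, 1),
    ("small group project work", 180, 1),
    ("one-on-one check-in with commander", 20, 1),
    ("chore/house responsibilities", 60, 1),
]

PHASE_WEEKS = 8  # both phases are described as eight weeks long


def weekly_minutes(schedule):
    """Sum minutes per week across all activities in a schedule."""
    return sum(minutes * sessions for _, minutes, sessions in schedule)


def summarize(name, schedule):
    """Print the weekly and whole-phase totals for one schedule."""
    minutes = weekly_minutes(schedule)
    hours = minutes / 60
    print(f"{name}: {minutes} min/week = {hours:.1f} h/week, "
          f"{hours * PHASE_WEEKS:.0f} h over {PHASE_WEEKS} weeks")


summarize("bootcamp", BOOTCAMP)  # 920 min/week
summarize("trial", TRIAL)        # 800 min/week
```

The bootcamp minimums come to about 15.3 hours a week, which the charter rounds up to 16h/week (and 16 x 8 = 128 hours). The trial-phase minimums come to about 13.3 hours a week by the same arithmetic.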
- Above-average physical capacity
- Above-average introspection
- Above-average planning & execution skill
- Above-average communication/facilitation skill
- Above-average calibration/debiasing/rationality knowledge
- Above-average scientific lab skill/ability to theorize and rigorously investigate claims
- Average problem-solving/debugging skill
- Average public speaking skill
- Average leadership/coordination skill
- Average teaching and tutoring skill
- Fundamentals of first aid & survival
- Fundamentals of financial management
- At least one of: fundamentals of programming, graphic design, writing, A/V/animation, or similar (employable mental skill)
- At least one of: fundamentals of woodworking, electrical engineering, welding, plumbing, or similar (employable trade skill)
- At least six personal growth projects involving the development of new skill (or honing of prior skill)
- At least three partner- or small-group projects that could not have been completed alone
- At least one large-scale, whole-army project that either a) had a reasonable chance of impacting the world's most important problems, or b) caused significant personal growth and improvement
- Daily contributions to evolved house culture
Because of both a) the expected value of social exploration and b) the cumulative positive effects of being in a group that's trying things regularly and taking experiments seriously, Dragon Army will endeavor to adopt no fewer than one new experimental norm per week. Each new experimental norm should have an intended goal or result, an informal theoretical backing, and a set re-evaluation time (default three weeks). There are two routes by which a new experimental norm is put into place:
- The experiment is proposed by a member, discussed in a whole group setting, and meets the minimum bar for adoption (>60% of the Army supports, with <20% opposed and no hard vetoes)
- The Army has proposed no new experiments in the previous week, and the Commander proposes three options. The group may then choose one by vote/consensus, or generate three new options, from which the Commander may choose.
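The minimum bar in the first route can be written down as a small check. This is a sketch under assumptions the charter does not settle: members who neither support nor oppose are treated as abstaining, hard vetoes are counted separately from ordinary opposition, and the function name `meets_adoption_bar` is hypothetical.

```python
def meets_adoption_bar(supporting, opposed, hard_vetoes, army_size):
    """Return True if a proposal clears the stated minimum bar.

    The bar as quoted: more than 60% support, fewer than 20% opposed,
    and no hard vetoes. Both thresholds are strict inequalities.
    Assumption: anyone not counted as supporting or opposed is abstaining.
    """
    if army_size <= 0:
        raise ValueError("army_size must be positive")
    if supporting < 0 or opposed < 0 or hard_vetoes < 0:
        raise ValueError("counts must be non-negative")
    if supporting + opposed > army_size:
        raise ValueError("more votes than members")
    # Integer comparisons avoid floating-point trouble at exactly 60% or 20%.
    enough_support = supporting * 100 > 60 * army_size
    little_opposition = opposed * 100 < 20 * army_size
    return enough_support and little_opposition and hard_vetoes == 0
```

In a ten-person Army, for example, seven in favor with one opposed passes, while exactly six in favor (60%) or exactly two opposed (20%) does not, because the stated thresholds are strict.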
- The use of a specific gesture to greet fellow Dragons (house salute)
- Various call-and-response patterns surrounding house norms (e.g. "What's rule number one?" "PROTECT YOURSELF!")
- Practice using hook, line, and sinker in social situations (three items other than your name for introductions)
- The anti-Singer rule for open calls-for-help (if Dragon A says "hey, can anyone help me with X?" the responsibility falls on the physically closest housemate to either help or say "Not me/can't do it!" at which point the buck passes to the next physically closest person)
- An "interrupt" call that any Dragon may use to pause an ongoing interaction for fifteen seconds
- A "culture of abundance" in which food and leftovers within the house are default available to all, with exceptions deliberately kept as rare as possible
- A "graffiti board" upon which the Army keeps a running informal record of its mood and thoughts
Dragon Army Code of Conduct
While the norms and standards of Dragon Army will be mutable by design, the following (once revised and ratified) will be the immutable code of conduct for the first eight weeks, and is unlikely to change much after that.
- A Dragon will protect itself, i.e. will not submit to pressure causing it to do things that are dangerous or unhealthy, nor wait around passively when in need of help or support (note that this may cause a Dragon to leave the experiment!).
- A Dragon will take responsibility for its actions, emotional responses, and the consequences thereof, e.g. if late will not blame bad luck/circumstance, if angry or triggered will not blame the other party.
- A Dragon will assume good faith in all interactions with other Dragons and with house norms and activities, i.e. will not engage in strawmanning or the horns effect.
- A Dragon will be candid and proactive, e.g. will give other Dragons a chance to hear about and interact with negative models once they notice them forming, or will not sit on an emotional or interpersonal problem until it festers into something worse.
- A Dragon will be fully present and supportive when interacting with other Dragons in formal/official contexts, i.e. will not engage in silent defection, undermining, halfheartedness, aloofness, subtle sabotage, or other actions which follow the letter of the law while violating the spirit. Another way to state this is that a Dragon will practice compartmentalization—will be able to simultaneously hold "I'm deeply skeptical about this" alongside "but I'm actually giving it an honest try," and postpone critique/complaint/suggestion until predetermined checkpoints. Yet another way to state this is that a Dragon will take experiments seriously, including epistemic humility and actually seeing things through to their ends rather than fiddling midway.
- A Dragon will take the outside view seriously, maintain epistemic humility, and make subject-object shifts, i.e. will act as a behaviorist and agree to judge and be judged on the basis of actions and revealed preferences rather than intentions, hypotheses, and assumptions (this one's similar to #2 and hard to put into words, but for example, a Dragon who has been having trouble getting to sleep but has never informed the other Dragons that their actions are keeping them awake will agree that their anger and frustration, while valid internally, may not fairly be vented on those other Dragons, who were never given a chance to correct their behavior). Another way to state this is that a Dragon will embrace the maxim "don't believe everything that you think."
- A Dragon will strive for excellence in all things, modified only by a) prioritization and b) doing what is necessary to protect itself/maximize total growth and output on long time scales.
- A Dragon will not defect on other Dragons.
Note that all of the above is deliberately kept somewhat flexible/vague/open-ended/unsettled, because we are trying not to fall prey to GOODHART'S DEMON.
- The initial filter for attendance will include a one-on-one interview with the commander (Duncan), who will be looking for a) credible intention to put forth effort toward the goal of having a positive impact on the world, b) likeliness of a strong fit with the structure of the house and the other participants, and c) reliability à la financial stability and ability to commit fully to long-term endeavors. Final decisions will be made by the commander and may be informally questioned/appealed but not overruled by another power.
- Once a final list of participants is created, all participants will sign a "free state" contract of the form "I agree to move into a house within five miles of downtown Berkeley (for length of time X with financial obligation Y) sometime in the window of July 1st through September 30th, conditional on at least seven other people signing this same agreement." At that point, the search for a suitable house will begin, possibly with delegation to participants.
- Rents in that area tend to run ~$1100 per room, on average, plus utilities, plus a 10% contribution to the general house fund. Thus, someone hoping for a single should, in the 85th percentile worst case, be prepared to make a ~$1400/month commitment. Similarly, someone hoping for a double should be prepared for ~$700/month, and someone hoping for a triple should be prepared for ~$500/month, and someone hoping for a quad should be prepared for ~$350/month.
- The initial phase of the experiment is a six-month commitment, but leases are generally one year. Any Dragon who leaves during the experiment is responsible for continuing to pay their share of the lease/utilities/house fund, unless and until they have found a replacement person the house considers acceptable, or have found three potential viable replacement candidates and had each one rejected. After six months, should the experiment dissolve, the house will revert to being simply a house, and people will bear the normal responsibility of "keep paying until you've found your replacement." (This will likely be easiest to enforce by simply having as many names as possible on the actual lease.)
- Of the ~90hr/month, it is assumed that ~30 are whole-group, ~30 are small group or pair work, and ~30 are independent or voluntarily-paired work. Furthermore, it is assumed that the commander maintains sole authority over ~15 of those hours (i.e. can require that they be spent in a specific way consistent with the aesthetic above, even in the face of skepticism or opposition).
- We will have an internal economy whereby people can trade effort for money and money for time and so on and so forth, because heck yeah.
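The per-person figures in the rent item above follow from splitting the worst-case single-room cost evenly. A minimal sketch, assuming the ~$1400/month figure already includes rent, utilities, and the house fund contribution (the names `SINGLE_WORST_CASE` and `per_person_share` are illustrative only):

```python
# Assumption: ~$1400/month is the all-in, 85th-percentile worst case for one room.
SINGLE_WORST_CASE = 1400


def per_person_share(occupants, room_cost=SINGLE_WORST_CASE):
    """Split one room's monthly cost evenly among its occupants."""
    if occupants < 1:
        raise ValueError("a room needs at least one occupant")
    return room_cost / occupants


for label, occupants in [("single", 1), ("double", 2), ("triple", 3), ("quad", 4)]:
    print(f"{label}: ~${per_person_share(occupants):.0f}/month")
```

An even split gives roughly $467 for a triple, so the ~$500 quoted above reads as a round-up with some margin. The other three figures match exactly.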
(sorry for the abrupt cutoff, but this was meant to be published Monday and I've just ... not ... been ... sleeping ... to get it done)
Existential risk from AI without an intelligence explosion
[xpost from my blog]
In discussions of existential risk from AI, it is often assumed that the existential catastrophe would follow an intelligence explosion, in which an AI creates a more capable AI, which in turn creates a yet more capable AI, and so on. This feedback loop eventually produces an AI whose cognitive power vastly surpasses that of humans, and which would be able to obtain a decisive strategic advantage over humanity, allowing it to pursue its own goals without effective human interference. Victoria Krakovna points out that many arguments that AI could present an existential risk do not rely on an intelligence explosion. I want to look in slightly more detail at how that could happen. Kaj Sotala also discusses this.
An AI starts an intelligence explosion when its ability to create better AIs surpasses that of human AI researchers by a sufficient margin (provided the AI is motivated to do so). An AI attains a decisive strategic advantage when its ability to optimize the universe surpasses that of humanity by a sufficient margin. Which of these happens first depends on what skills AIs have the advantage at relative to humans. If AIs are better at programming AIs than they are at taking over the world, then an intelligence explosion will happen first, and the resulting AI will then be able to get a decisive strategic advantage soon after. But if AIs are better at taking over the world than they are at programming AIs, then an AI would get a decisive strategic advantage without an intelligence explosion occurring first.
Since an intelligence explosion happening first is usually considered the default assumption, I'll just sketch a plausibility argument for the reverse. There's a lot of variation in how easy cognitive tasks are for AIs compared to humans. Since programming AIs is not yet a task that AIs can do well, it doesn't seem like it should be a priori surprising if programming AIs turned out to be an extremely difficult task for AIs to accomplish, relative to humans. Taking over the world is also plausibly especially difficult for AIs, but I don't see strong reasons for confidence that it would be harder for AIs than starting an intelligence explosion would be. It's possible that an AI with significantly but not vastly superhuman abilities in some domains could identify some vulnerability that it could exploit to gain power, which humans would never think of. Or an AI could be enough better than humans at forms of engineering other than AI programming (perhaps molecular manufacturing) that it could build physical machines that could out-compete humans, though this would require it to obtain the resources necessary to produce them.
Furthermore, an AI that is capable of producing a more capable AI may refrain from doing so if it is unable to solve the AI alignment problem for itself; that is, if it can create a more intelligent AI, but not one that shares its preferences. This seems unlikely if the AI has an explicit description of its preferences. But if the AI, like humans and most contemporary AI, lacks an explicit description of its preferences, then the difficulty of the AI alignment problem could be an obstacle to an intelligence explosion occurring.
It also seems worth thinking about the policy implications of the differences between existential catastrophes from AI that follow an intelligence explosion versus those that don't. For instance, AIs that attempt to attain a decisive strategic advantage without undergoing an intelligence explosion will exceed human cognitive capabilities by a smaller margin, and thus would likely attain strategic advantages that are less decisive, and would be more likely to fail. Thus containment strategies are probably more useful for addressing risks that don't involve an intelligence explosion, while attempts to contain a post-intelligence explosion AI are probably pretty much hopeless (although it may be worthwhile to find ways to interrupt an intelligence explosion while it is beginning). Risks not involving an intelligence explosion may be more predictable in advance, since they don't involve a rapid increase in the AI's abilities, and would thus be easier to deal with at the last minute, so it might make sense far in advance to focus disproportionately on risks that do involve an intelligence explosion.
It seems likely that AI alignment would be easier for AIs that do not undergo an intelligence explosion, since it is more likely to be possible to monitor such an AI and intervene if something goes wrong, and lower optimization power means lower ability to exploit the difference between the goals the AI was given and the goals that were intended, if we are only able to specify our goals approximately. The first of those reasons applies to any AI that attempts to attain a decisive strategic advantage without first undergoing an intelligence explosion, whereas the second only applies to AIs that do not undergo an intelligence explosion ever. Because of these, it might make sense to attempt to decrease the chance that the first AI to attain a decisive strategic advantage undergoes an intelligence explosion beforehand, as well as the chance that it undergoes an intelligence explosion ever, though preventing the latter may be much more difficult. However, some strategies to achieve this may have undesirable side-effects; for instance, as mentioned earlier, AIs whose preferences are not explicitly described seem more likely to attain a decisive strategic advantage without first undergoing an intelligence explosion, but such AIs are probably more difficult to align with human values.
If AIs get a decisive strategic advantage over humans without an intelligence explosion, then since this would likely involve the decisive strategic advantage being obtained much more slowly, it would be much more likely for multiple, and possibly many, AIs to gain decisive strategic advantages over humans, though not necessarily over each other, resulting in a multipolar outcome. Thus considerations about multipolar versus singleton scenarios also apply to decisive strategic advantage-first versus intelligence explosion-first scenarios.
Notes from the Hufflepuff Unconference (Part 1)
April 28th, we ran the Hufflepuff Unconference in Berkeley, at the MIRI/CFAR office common space.
There's room for improvement in how the Unconference could have been run, but it succeeded at the core things I wanted to accomplish:
- Established common knowledge of what problems people were actually interested in working on
- We had several extensive discussions of some of those problems, with an eye towards building solutions
- Several people agreed to work together towards concrete plans and experiments to make the community more friendly, as well as build skills relevant to community growth. (With deadlines and one person acting as project manager to make sure real progress was made)
- We agreed to have a followup unconference in roughly three months, to discuss how those plans and experiments were going
Rough notes are available here. (Thanks to Miranda, Maia and Holden for taking really thorough notes)
This post will summarize some of the key takeaways, some speeches that were given, and my retrospective thoughts on how to approach things going forward.
But first, I'd like to cover a question that a lot of people have been asking about:
What does this all mean for people outside of the Bay?
The answer depends.
I'd personally like it if the overall rationality community got better at social skills, empathy, and working together, sticking with things that need sticking with (and in general, better at recognizing skills other than metacognition). In practice, individual communities can only change in the ways the people involved actually want to change, and there are other skills worth gaining that may be more important depending on your circumstances.
Does Project Hufflepuff make sense for your community?
If you're worried that your community doesn't have an interest in any of these things, my actual honest answer is that doing something "Project Hufflepuff-esque" probably does not make sense. I did not choose to do this because I thought it was the single-most-important thing in the abstract. I did it because it seemed important and I knew of a critical mass of people who I expected to want to work on it.
If you're living in a sparsely populated area or haven't put a community together, the first steps do not look like this, they look more like putting yourself out there, posting a meetup on Less Wrong and just *trying things*, any things, to get something moving.
If you have enough of a community to step back and take stock of what kind of community you want and how to strategically get there, I think this sort of project can be worth learning from. Maybe you'll decide to tackle something Project-Hufflepuff-like, maybe you'll find something else to focus on. I think the most important thing is to have some kind of vision for something your community can do that is worth working together, and leveling up, to accomplish.
Community Unconferences as One Possible Tool
Community unconferences are a useful tool to get everyone on the same page and spur them on to start working on projects, and you might consider doing something similar.
They may not be the right tool for you and your group - I think they're most useful in places where there are enough people in your community that they don't all know each other, but do have enough existing trust to get together and brainstorm ideas.
If you have a sense that Project Hufflepuff is worthwhile for your community but the above disclaimers point towards my current approach not making sense for you, I'm interested in talking about it with you, but the conversation will look less like "Ray has ideas for you to try" and more like "Ray is interested in helping you figure out what ideas to try, and the solution will probably look very different."
Online Spaces
Since I'm actually very uncertain about a lot of this and see it as an experiment, I don't think it makes sense to push for any of the ideas here to directly change Less Wrong itself (at least, yet). But I do think a lot of these concepts translate to online spaces in some fashion, and I think it'd make sense to try out some concepts inspired by this in various smaller online subcommunities.
Table of Contents:
I. Introduction Speech
- Why are we here?
- The Mission: Something To Protect
- The Invisible Badger, or "What The Hell Is a Hufflepuff?"
- Meta Meetups Usually Suck. Let's Try Not To.
II. Common Knowledge
- What Do People Actually Want?
- Lightning Talks
III. Discussing the Problem (Four breakout sessions)
- Welcoming Newcomers
- How to handle people who impose costs on others?
- Styles of Leadership and Running Events
- Making Helping Fun (or at least lower barrier-to-entry)
IV. Planning Solutions and Next Actions
V. Final Words
I. Introduction: It Takes A Village to Save a World
(A more polished version of my opening speech from the unconference)
[Epistemic Status: This is largely based on intuition, looking at what our community has done and what other communities seem to be able to do. I'm maybe 85% confident in it, but it is my best guess]
In 2012, I got super into the rationality community in New York. I was surrounded by people passionate about thinking better and using that thinking to tackle ambitious projects. And in 2012 we all decided to take on really hard projects that were pretty likely to fail, because the expected value seemed high, and it seemed like even if we failed we'd learn a lot in the process and grow stronger.
That happened - we learned and grew. We became adults together, founding companies and nonprofits and creating holidays from scratch.
But two years later, our projects were either actively failing, or burning us out. Many of us became depressed and demoralized.
There was nobody who was okay enough to actually provide anyone emotional support. Our core community withered.
I ended up making that the dominant theme of the 2014 NYC Solstice, with a call-to-action to get back to basics and take care of each other.
I also went to the Berkeley Solstice that year. And... I dunno. In the back of my mind I was assuming "Berkeley won't have that problem - the Bay area has so many people, I can't even imagine how awesome and thriving a community they must have." (Especially since the Bay kept stealing all the Movers and Shakers of NYC).
The theme of the Bay Solstice turned out to be "Hey guys, so people keep coming to the Bay, running on a dream and a promise of community, but that community is not actually there, there's a tiny number of well-connected people who everyone is trying to get time with, and everyone seems lonely and sad. And we don't even know what to do about this."
In 2015, that theme in the Berkeley Solstice was revisited.
So I think that was the initial seed of what would become Project Hufflepuff - noticing that it's not enough to take on cool projects, that it's not enough to just get a bunch of people together and call it a community. Community is something you actively tend to. Insofar as Maslow's hierarchy is real, it's a foundation you need before ambitious projects can be sustainable.
There are other pieces of the puzzle - different lenses that, I believe, point towards a Central Thing. Some examples:
Group houses, individualism and coordination.
I've seen several group houses where, when people decide it no longer makes sense to live in the house, they... just kinda leave. Even if they've literally signed a lease. And everyone involved (the person leaving and those who remain) instinctively acts as if it's the remaining people's job to fill the leaver's spot, to make rent.
And the first time, this is kind of okay. But then each subsequent person leaving adds to a stressful undertone of "OMG are we even going to be able to afford to live here?". It eventually becomes depressing, and snowballs into a pit that makes newcomers feel like they don't WANT to move into the house.
Nowadays I've seen some people explicitly building into the roommate agreement a clear expectation of how long you stay and whose responsibility it is to find new roommates and pay rent in the meantime. But it's disappointing to me that this is something we needed, that we weren't instinctively paying attention to how we were imposing costs on each other in the first place. That when we *violated a written contract*, let alone a handshake agreement, we did not take it upon ourselves (or hold each other accountable) to ensure we could fulfill our end of the bargain.
Friends, and Networking your way to the center
This community puts pressure on people to improve. It's easier to improve when you're surrounded by ambitious people who help or inspire each other to level up. There's a sense that there's some cluster of cool-people-who-are-ambitious-and-smart who've been here for a while, and... it seems like everyone is trying to be friends with those people.
It also seems like people just don't quite get that friendship is a skill, that adult friendships in City Culture can be hard, and it can require special effort to make them happen.
I'm not entirely sure what's going on here - it doesn't make sense to say anyone's obligated to hang out with any particular person (or obligated NOT to), but if 300 people aren't getting the connection they want it seems like *somewhere people are making a systematic mistake.*
(Since the Unconference, Maia has tackled this particular issue in more detail)
The Mission - Something To Protect
As I see it, the Rationality Community has three things going on: Truth. Impact. And "Being People".
In some sense, our core focus is the practice of truthseeking. The thing that makes that truthseeking feel *important* is that it's connected to broader goals of impacting the world. And the thing that makes this actually fun and rewarding enough to stick with is a community that meets our needs, where we can both flourish as individuals and find the relationships we want.
I think we have made major strides in each of those areas over the past seven years. But we are nowhere near done.
Different people have different intuitions of which of the three are most important. Some see some of them as instrumental, or terminal. There are people for whom Truthseeking is *the point*, and they'd have been doing that even if there wasn't a community to help them with it, and there are people for whom it's just one tool of many that helps them live their life better or plan important projects.
I've observed a tendency to argue about which of these things is most important, or what tradeoffs are worth making. Inclusiveness versus high standards. Truth vs action. Personal happiness vs high achievement.
I think that kind of argument is a mistake.
We are falling woefully short on all of these things.
We need something like 10x our current capacity for seeing, and thinking. 10x our capacity for doing. 10x our capacity for *being healthy people together.*
I say "10x" not because all these things are intrinsically equal. The point is not to make a politically neutral push to make all the things sound nice. I have no idea exactly how far short we're falling on each of these because the targets are so far away I can't even see the end, and we are doing a complicated thing that doesn't have clear instructions and might not even be possible.
The point is that all of these are incredibly important, and if we cannot find a way to improve *all* of these, in a way that is *synergistic* with each other, then we will fail.
There is a thing at the center of our community. Not all of us share the exact same perspective on it. For some of us it's not the most important thing. But it's been at the heart of the community since the beginning and I feel comfortable asserting that it is the thing that shapes our culture the most:
The purpose of our community is to make sure this place is okay:
The world isn't okay right now, on a number of levels. And a lot of us believe there is a strong chance it could become dramatically less okay. I've seen people make credible progress on taking responsibility for pieces of our home. But when all is said and done, none of our current projects really give me the confidence that things are going to turn out all right.
Our community was brought together on a promise, a dream, and we have not yet actually proven ourselves worthy of that dream. And to make that dream a reality we need a lot of things.
We need to be able to criticize, because without criticism, we cannot improve.
If we cannot, I believe we will fail.
We need to be able to talk about ideas that are controversial, or uncomfortable - otherwise our creativity and insight will be crippled.
If we cannot, I believe we will fail.
We need to be able to do those things without alienating people. We need to be able to criticize without making people feel untrusted and discouraged from even taking action. We need to be able to discuss challenging things while earnestly respecting the notion that *talking about ideas gives those ideas power and has concrete effects on social reality*, and sometimes that can hurt people.
If we cannot figure out how to do that, I believe we will fail.
We need more people who are able and willing to try things that have never been done before. To stick with those things long enough to *get good at them*, to see if they can actually work. We need to help each other do impossible things. And we need to remember to check for and do the *possible*, boring, everyday things that are in fact straightforward and simple and not very inspiring.
If we cannot manage to do that, I believe we will fail.
We need to be able to talk concretely about what the *highest leverage actions in the world are*. We need to prioritize those things, because the world is huge and broken and we are small. I believe we need to help each other through a long journey, building bigger and bigger levers, building connections with people outside our community who are undertaking the same journey through different perspectives.
And in the process, we need to not make it feel like if *you cannot personally work on those highest leverage things, that you are not important.*
There's the kind of importance where we recognize that some people have scarce skills and drive, and the kind of importance where we remember that *every* person has intrinsic worth, and you owe *nobody* any special skills or prestigious sounding projects for your life to be worthwhile.
This isn't just a philosophical matter - I think it's damaging to our mental health and our collective capacity.
We need to recognize that the distribution of skills we tend to reward or punish is NOT just about which ones are actually most valuable - sometimes it is simply founder effects and blind spots.
We cannot be a community for everyone - I believe trying to include anyone with a passing interest in us is a fool's errand. But there are many people who had valuable skills to contribute who have turned away, feeling frustrated and un-valued.
If we cannot find a way to accomplish all of these things at once, I believe we will fail.
The thesis of Project Hufflepuff is that it takes (at least) a village to save a world.
It takes people doing experimental impossible things. It takes caretakers. It takes people helping out with unglorious tasks. It takes technical and emotional and physical skills. And while it does take some people who specialize in each of those things, I think it also needs many people who are at least a little bit good at each of them, to pitch in when needed.
Project Hufflepuff is not the only thing our community needs, or the most important. But I believe it is one of the necessary things that our community needs, if we're to get to 10x our current Truthseeking, Impact and Human-ing.
If we're to make sure that our home is okay.
The Invisible Badger
"A lone hufflepuff surrounded by slytherins will surely wither as if being leeched dry by vampires."
- Duncan
[Epistemic Status: My evidence for this is largely based on discussions with a few people for whom the badger seems real and valuable, and who report things being different in other communities, as well as some of my general intuitions about society. I'm 75% sure the badger exists, 90% that it's worth leaning into the idea of the badger to see if it works for you, and maybe 55% sure that it's worth trying to see the badger if you can't already make out its edges.]
If I *had* to pick a clear thing that this conference is about without using Harry Potter jargon, I'd say "Interpersonal dynamics surrounding trust, and how those dynamics apply to each of the Impact/Truth/Human focuses of the rationality community."
I'm not super thrilled with that term because I think I'm grasping more for some kind of gestalt. An overall way of seeing and being that's hard to describe and that doesn't come naturally to the sort of person attracted to this community.
Much like the blind folk and the elephant, who each touched a different part of the animal and came away with a different impression (the trunk seems like a snake, the legs seem like a tree), I've been watching several people in the community try to describe things over the past few years. And maybe those things are separate but I feel like they're secretly a part of the same invisible badger.
Hufflepuff is about hard work, and loyalty, and camaraderie. It's about emotional intelligence. It's about seeing value in day to day things that don't directly tie into epic narratives.
There's a bunch of skills that go into Hufflepuff. And part of what I want is for people to get better at those skills. But I think there's a mindset, an approach, that is fairly different from the typical rationalist mindset, that makes those skills easier. It's something that's harder when you're being rigorously utilitarian and building models of the world out of game theory and incentives.
Mindspace is deep and wide, and I don't expect that mindset to work for everyone. I don't think everyone should be a Hufflepuff. But I do think it'd be valuable to the community if more people at least had access to this mindset and more of these skills.
So what I'd like, for tonight, is for people to lean into this idea. Maybe in the end you'll find that this doesn't work for you. But I think many people's first instinct is going to be that this is alien and uncomfortable and I think it's worth trying to push past that.
The reason we're doing this conference together is because the Hufflepuff way doesn't really work if people are trying to do it alone - I think it requires trust and camaraderie and persistence to really work. I don't think we can have that required trust all at once, but I think if there are multiple people trying to make it work, who can incrementally trust each other more, I think we can reach a place where things run more smoothly, where we have stronger emotional connections, and where we trust each other enough to take on more ambitious projects than we could if we're all optimizing as individuals.
Meta-Meetups Suck. Let's Not.
This unconference is pretty meta - we're talking about norms and vague community stuff we want to change.
Let me tell you, meta meetups are the worst. Typically you end up going around in circles complaining and wishing there were more things happening and that people were stepping up and maybe if you're lucky you get a wave of enthusiasm that lasts a month or so and a couple things happen but nothing really *changes*.
So. Let's not do that. Here's what I want to accomplish and which seems achievable:
1) Establish common knowledge of important ideas and behavior patterns.
Sometimes you DON'T need to develop a whole new skill, you just need to notice that your actions are impacting people in a different way, and maybe that's enough for you to decide to change some things. Or maybe someone has a concept that makes it a lot easier for you to start gaining a new skill on your own.
2) Establish common knowledge of who's interested in trying which new norms, or which new skills.
We don't actually *know* what the majority of people want here. I can sit here and tell you what *I* think you should want, but ultimately what matters is what things a critical mass of people want to talk about tonight.
Not everyone has to agree that an idea is good to try it out. But there's a lot of skills or norms that only really make sense when a critical mass of other people are trying them. So, maybe of the 40 people here, 25 people are interested in improving their empathy, and maybe another 20 are interested in actively working on friendship skills, or sticking to commitments. Maybe those people can help reinforce each other.
3) Explore ideas for social and skillbuilding experiments we can try, that might help.
The failure mode of Ravenclaws is to think about things a lot and then not actually get around to doing them. A failure mode of ambitious Ravenclaws, is to think about things a lot and then do them and then assume that because they're smart, that they've thought of everything, and then not listen to feedback when they get things subtly or majorly wrong.
I'd like us to end by thinking of experiments with new norms, or habits we'd like to cultivate. I want us to frame these as experiments, that we try on a smaller scale and maybe promote more if they seem to be working, while keeping in mind that they may not work for everyone.
4) Commit to actions to take.
Since the default action is for them to peter out and fail, I'd like us to spend time bulletproofing them, brainstorming and coming up with trigger-action plans so that they actually have a chance to succeed.
Tabooing "Hufflepuff"
Having said all that talk about The Hufflepuff Way...
...the fact is, much of the reason I've used those words is to paint a rough picture to attract the sort of person I wanted to attract to this unconference.
It's important that there's a fuzzy, hard-to-define-but-probably-real concept that we're grasping towards, but it's also important not to be talking past each other. Early on in this project I realized that a few people who I thought were on the same page actually meant fairly different things. Some cared more about empathy and friendship. Some cared more about doing things together, and expected deep friendships to arise naturally from that.
So I'd like us to establish a trigger-action-plan right now - for the rest of this unconference, if someone says "Hufflepuff", y'all should say "What do you mean by that?" and then figure out whatever concrete thing you're actually trying to talk about.
II. Common Knowledge
The first part of the unconference was about sharing our current goals, concerns and background knowledge that seemed useful. Most of the specifics are covered in the notes. But I'll talk here about why I included the things I did and what my takeaways were afterwards on how it worked.
Time to Think
The first thing I did was have people sit and think about what they actually wanted to get out of the conference, and what obstacles they could imagine getting in the way of that. I did this because often, I think our culture (ostensibly about helping us think better) doesn't give us time to think, and instead has people who are quick-witted and conversationally dominant end up doing most of the talking. (I wrote a post a year ago about this, the 12 Second Rule). In this case I gave everyone 5 minutes, which is something I've found helpful at small meetups in NYC.
This had mixed results - some people reported that while they can think well by themselves, in a group setting they find it intimidating and their mind starts wandering instead of getting anything done. They found it much more helpful when I eventually let people-who-preferred-to-talk-to-each-other go into another room to talk through their ideas out loud.
I think there's some benefit to both halves of this and I'm not sure how common each set of preferences is. It's certainly true that it's not common for conferences to give people a full 5 minutes to think so I'd expect it to be somewhat uncomfortable-feeling regardless of whether it was useful.
But an overall outcome of the unconference was that it was somewhat lower energy than I'd wanted, and opening with 5 minutes of silent thinking seemed to contribute to that, so for the next unconference I run, I'm leaning towards a shorter period of time for private thinking (Somewhere between 12 and 60 seconds), followed by "turn to your neighbors and talk through the ideas you have", followed by "each group shares their concepts with the room."
"What do you want to improve on? What is something you could use help with?"
I wanted people to feel like active participants rather than passive observers, and I didn't want people to just think "it'd be great if other people did X", but to keep an internal locus of control - what can *I* do to steer this community better? I also didn't want people to be thinking entirely individualistically.
I didn't collect feedback on this specific part and am not sure how valuable others found it (if you were at the conference, I'd be interested if you left any thoughts in the comments). Some anonymized things people described:
- When I make social mistakes, consider it failure; this is unhelpful
- Help point out what they need help with
- Have severe akrasia, would like more “get things done” magic tools
- Getting to know the bay area rationalist community
- General bitterness/burned out
- Reduce insecurity/fear around sharing
- Avoiding spending most words signaling to have read a particular thing; want to communicate more clearly
- Creating systems that reinforce unnoticed good behaviour
- Would like to learn how to try at things
- Find place in rationalist community
- Staying connected with the group
- Paying attention to what they want in the moment, in particular when it’s right to not be persistent
- Would like to know the “landing points” to the community to meet & greet new people
- Become more approachable, & be more willing to approach others for help; community cohesiveness
- Have been lonely most of life; want to find a place in a really good healthy community
- Re: prosocialness, being too low on Maslow’s hierarchy to help others
- Abundance mindset & not stressing about how to pay rent
- Cultivate stance of being able to do helpful things (action stance) but also be able to notice difference between laziness and mental health
- Don’t know how to respect legit safety needs w/o getting overwhelmed by arbitrary preferences; would like to model people better to give them basic respect w/o having to do arbitrary amount of work
- Starting conversations with new people
- More rationalist group homes / baugruppe
- Being able to provide emotional support rather than just logistics help
- Reaching out to people at all without putting too much pressure on them
- Cultivate lifelong friendships that aren’t limited to particular time and place
- Have a block around asking for help bc doesn’t expect to reciprocate; would like to actually just pay people for help w stuff
- Want to become more involved in the community
- Learn how to teach other people “ops skills”
- Connections to people they can teach and who can teach them
Lightning Talks
It turned out we had more people than I'd originally planned time for, so we ended up switching to two minute talks. I actually think this was even better, and my plan for next time is do 1-minute timeslots but allow people to sign up for multiple if they think their talk requires it, so people default to giving something short and sweet.
Rough summaries of the lightning talks can be found in the notes.
III. Discussing the Problem
The next section involved two "breakout sessions" - two 20 minute periods for people to split into smaller groups and talk through problems in detail. This was done in a somewhat impromptu fashion, with people writing down the talks they wanted to do on the whiteboard and then arranging them so most people could go to a discussion that interested them.
The talks were:
- Welcoming Newcomers
- How to handle people who impose costs on others?
- Styles of Leadership and Running Events
- Making Helping Fun (or at least lower barrier-to-entry)
- Circling session
There was a suggested discussion about outreach, which I asked to table for a future unconference. My reason was that outreach discussions tend to get extremely meta and seem to be an attractor (people end up focusing on how to bring more people into the community without actually making sure the community is good, and I wanted the unconference to focus on the latter.)
I spent some time drifting between sessions, and was generally impressed both with the practical focus each discussion had, as well as the way they were organically moderated.
Again, more details in the notes.
IV. Planning Solutions and Next Actions
Solving this fully requires a few different things at once, and I'm not sure I have a clear picture of what it looks like, but one stepping stone people came up with was creating explicit norms for a given space, and a practice of reminding people of those norms in a low-key, nonjudgmental way.
I think this will require a lot of deliberate effort and practice on the part of hosts to avoid alternate bad outcomes like "the norms get disproportionately enforced on people the hosts like and applied unfairly to people they aren't close with". But I do think it's a step in the right direction to showcase what kind of space you're creating and what the expectations are.
Different spaces can be tailored for different types of people with different needs or goals. (I'll have more to say about this in an upcoming post - doing this right is really hard, I don't actually know of any groups that have done an especially good job of it.)
I *was* impressed with the degree to which everyone in the conversation seemed to be taking into account a lot of different perspectives at once, and looking for solutions that benefited as many people as possible.
The exact details are still under development, but I think the basic idea is to have a network of people who are interested
The idea is to have a group of people who go to different events, playing the role of the welcomer. I think the idea is sort of an "Uber for welcomers" network (i.e. it provides both a place for people running events to go to ask for help with welcoming, and a way for people who are interested in welcoming to find events that need welcomers).
It also included some ideas for better infrastructure, such as reviving "bayrationality.org" to make it easier for newcomers to figure out what events are going on (possibly including links to the codes of conduct for different spaces as well). In the meantime, one simple change was the introduction of a Facebook group for Bay Area Rationalist Social Events.
https://goo.gl/forms/MzkcsMvD2bKzXCQN2
Opinions were divided as to whether this was something that made sense for "rationalists to do on their own", or whether it made more sense to visit more explicitly Circling-focused communities, but several people expressed interest in trying it again.
There were two main sets of habits that seemed worth cultivating:
1) Making it clear to newcomers that they're encouraged to help out with events, and that this is actually a good way to make friends and get more involved.
2) For hosts and event planners, look for opportunities to offer people things that they can help with, and make sure to publicly praise those who do help out.
Some of this might dovetail nicely with the Welcoming Committee, both as something people can easily get involved with, and (if there ends up being a public facing website to introduce people to the community) as a way to connect people with events that could use help.
A vague cluster of skills that's in high demand is "predict logistical snafus in advance to head them off, and notice logistical snafus happening in realtime so you can do something about them." Earlier this year there was an Ops Workshop that aimed to teach this sort of skill, which went reasonably well but didn't really lead into a concrete use for the skills to help them solidify.
One idea was to do Ops workshops (or other specialized training) in the month before a major event like Solstice or EA Global, giving attendees an opportunity to practice skills and making that particular event run smoother.
V. Parting Words
To wrap up the event, I focused on some final concepts that underlie this whole endeavor.
The thing we're aiming for looks something like this:
In a couple months (hopefully in July), there'll be a followup unconference. The theme will be "Innovation and Excellence", addressing the twofold question "how do we encourage more people to start cool projects", and "how do we get to a place where longterm projects ultimately reach a high quality state?"
Both elements feel important to me, and they require somewhat different mindsets (both on the part of the people running the projects, and the part of the community members who respond to them). Starting new things is scary and having too high standards can be really intimidating, yet for longterm projects we may want to hold ourselves to increasingly high standards over time.
My current plan (subject to lots of revision) is for this to become a series of community unconferences that happen roughly every 3 months. The Bay Area is large enough, with different overlapping social groups, that it seems worthwhile to get together every few months and have an open-structured event to see people you don't normally see, share ideas, and get on the same page about important things.
Current thoughts for upcoming unconference topics are:
Innovation and Excellence
Personal Epistemic Hygiene
Group Epistemology
An important piece of each unconference will be revisiting things from the previous one, to see if projects, ideas or experiments we talked about were actually carried out and what we learned from them (most likely with anonymous feedback collected beforehand so people who are less comfortable speaking publicly have a chance to express any concerns). I'd also like to build on topics from previous unconferences so they have more chance to sink in and percolate (for example, have at least one talk or discussion about "empathy and trust as related to epistemic hygiene").
Starting and Finishing Unconferences Together
My hope is to get other people involved sooner rather than later so this becomes a "thing we are doing together" rather than a "thing I am doing." One of my goals with this is also to provide a platform where people who are interested in getting more involved with community leadership can take a step further towards that, no matter where they currently stand (ranging anywhere from "give a 30 second lightning talk" to "run a discussion, or give a keynote talk" to "be the primary organizer for the unconference.")
I also hope this is able to percolate into online culture, and to other in-person communities where a critical mass of people think this'd be useful. That said, I want to caution that I consider this all an experiment, motivated by an intuitive sense that we're missing certain things as a culture. That intuitive sense has yet to be validated in any concrete fashion. I think "willingness to try things" is more important than epistemic caution, but epistemic caution is still really important - I recommend collecting lots of feedback and being willing to shift direction if you're trying anything like the stuff suggested here.
(I'll have an upcoming post on "Ways Project Hufflepuff could go horribly wrong")
Most importantly, I hope this provides a mechanism for us to collectively take ideas more seriously that we're ostensibly supposed to be taking seriously. I hope that this translates into the sort of culture that The Craft and The Community was trying to point us towards, and, ideally, eventually, a concrete sense that our community can play a more consistently useful role at making sure the world turns out okay.
If you have concerns, criticism, or feedback, I encourage you to comment here if you feel comfortable, or on the Unconference Feedback Form. So far I've been erring on the side of "move forward and set things in motion", but I'll be shifting for the time being towards "getting feedback and making sure this thing is steering in the right direction."
-
In addition to the people listed throughout the post, I'd like to give particular thanks to Duncan Sabien for general inspiration and a lot of concrete help, Lahwran for giving the most consistent and useful feedback, and Robert Lecnik for hosting the space.
Physical actions that improve psychological health
Physical health impacts well-being. However, existing preventative health guidelines are inaccessible to the public because they are highly technical and require specific medical equipment. These notes are not medical advice nor meant to treat any illness. This is a compilation of findings I have come across at one time or another in relation to physical things that relate back to psychological health. I have not systematically reviewed the literature on any of these topics, nor am I an expert nor even familiar with any of them. I am extremely uncertain about the whole thing. But, I figure better to write this up and look stupid than keep it inside and act stupid. The hyperlinks point to the best evidence I could find on the matter. I write to solicit feedback, corrections and advice.
Microwaves are safe, but cockroaches and even ants are dangerous, and finally: happiness is dietary. If you want the well-being boosts associated with fruit (careful about fruit juice sugar though!), coffee’s aroma [text] [science news], vanilla yoghurt [news], sufficient B vitamins and choline (alt), binge drinking or drinking in general, however, I don’t have any easy answers for you. Don’t worry about the smart drugs, nootropics are probably a misnomer. On the other hand, probiotics can treat depression [Wikipedia].
If your diet is out of control: Mental contrasting is useful for diabetes self-management, dieting etc. Tangent: During a seminar I attended in Geneva, the World Health Organisation's chief dietary authority said that suggesting dietary patterns (e.g. the Mediterranean diet) rather than individual nutrient intake (protein, creatine, carbs) is preferable. But I have yet to identify substantiating evidence. The broad consensus among lay skeptical scrutineers of the field of nutrition is that most truths, even those broadly accepted ones, are still unclear. However, I have yet to analyse the literature myself.
Exercise and sport are good for subjective well-being, quality of life, depression, anxiety, stress and more. Plus, they are fun. You may not enjoy pleasant, wellbeing related activities. Do those activities anyway. I seldom enjoy correcting my posture. I tend to slouch and I have been specifically advised by a specialised physiotherapist to correct for that. But, slouching typically doesn’t cause pain - posture correction is pseudoscience! So are many interventions related to posture correction, like standing desks. On the other hand, I love to get massages - but their benefits are short lived - so get them regularly!
I particularly enjoy them after resistance training or 1 minute workouts (high intensity interval training). Be careful about stretching, passive stretching can cause injury, unlike active stretching: 'Passive stretching is when you use an outside force other than your own muscle to move a joint or limb beyond its active range of motion, to put your body into a position that you couldn’t do by yourself (such as when you lean into a wall, or have a partner push you into a deeper stretch). Unfortunately, this is the most common form of stretching used.'
However, if you aim to bodybuild, protein supplementation is pseudoscientific broscience. And ‘form’, well, there’s broscience - like squat with your knees outwards - but probably also lots of credible safety related information one ought to heed. For weight loss, if you want a real cheat sheet - weight loss aspirants can get it from a SNP sequencing kit costing a couple of hundred dollars. But, I would be cautious about gene sequence driven health prescription, some services running that business rely on weak evidence. There are other ‘fad’ fitness ideas that are not grounded in science. For instance: 20 seconds of foam rolling (just as effective as 60 seconds) enhances flexibility (...for no longer than 10 minutes, unless it is done regularly - then it improves long term flexibility) but it is unclear whether it improves athletic performance or post-performance recovery.
Stretching for runners, but no other kinds of sports, prevents injuries and increases range of motion [Wikipedia]. Shoe inserts don’t work reliably either [Wikipedia]. Martial arts therapy is a thing. Physical exercise is good for you. Tai chi, qigong, and meditation (other than mindfulness) such as transcendental meditation are ineffective in treating depression and anxiety. If you are injured, try rehabilitation exercises. Exercise or performance enhancing drugs are both cognitive enhancers. Exercise for chronic lower back pain is a good idea.
Environment: Avoid outdoor air pollution near residences due to dementia/other-health risks. And, avoid chimney smoke fireplaces.
Anecdotally, hygiene improves self-esteem and well-being. Wipe with wet wipes if you wipe hard enough to cause blood to form, cover the toilet seat with toilet paper or don’t - it doesn’t matter safety wise unless the contaminant is <~1hr old, shower with soap, remove eye mucus, remove earwax (but not the way you think, likely), brush softly twice a day with the correct technique, replacing your toothbrush every few months. 'Don't rinse with water straight after toothbrushing'. Floss once a day (with a different piece of floss each flossing session) but do not brush immediately after drinking acidic substances. The effectiveness of Tooth Mousse is questionable. Visit the dentist for a check-up every now and then - I’d say about every year at least (does anyone know how to format this sentence consistent with the rest of the text - it doesn't appear to be a font size or type issue).
Consider sleeping with a face mask and earplugs for better sleep, blow your nose, clean under your nails and trim them. Eye examinations should be conducted every 2-4 years for those under 40, and up to every 6 months for those 65+. There are health concerns around memory foam pillows/mattresses, so latex pillows may be preferable for those who prefer a sturdier option than traditional pillows/mattresses. Anecdotally, setting alarms to remind you to do things is a simple way to manage your time, not just for waking up. Light therapy is also helpful in treating delayed sleep phase disorder (being a night owl!). Oh, and don’t bother loading the dishwasher with pre-washed dishes (as long as you clean the filter regularly).
There are misconceptions around complementary therapies. The Australian Government reviewed the effectiveness of the Alexander technique, homeopathy, aromatherapy, Bowen therapy, Buteyko, Feldenkrais, herbalism, iridology, kinesiology, massage therapy, pilates, reflexology, rolfing, shiatsu, tai chi, and yoga. Only for the Alexander technique, Buteyko, massage therapy (esp. remedial massage?), tai chi and yoga was there credible (albeit low to moderate quality) evidence that they are useful for certain health conditions.
Stressed out reading all this? Pressing on your eyelids gently to temporarily forgo a headache can work. Traumatically stressed out? Video games can treat PTSD. Animal-assisted therapy, with service dogs and other therapeutic animals, is also wonderful.
Thank you!
Open thread, May 22 - May 28, 2017
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
On-line google hangout on approaches to communication around agi risk (2017/5/27 20:00 UTC)
We have a number of charities that are working on different aspects of AGI risk
- The theory of the alignment problem (MIRI/FHI/more)
- How to think about problems well (CFAR)
However, we don't have a body dedicated to making and testing a coherent communication strategy to help postpone the development of dangerous AIs.
I'm organising an on-line discussion around what we should do about this issue next Saturday.
In order to find out when people can do it, I've created a Doodle here. I'm trusting that Doodle works well with timezones. The time slots should be between 1200 and 2300 UTC; let me know if they are not.
We'll be using the optimal brainstorming methodology.
Give me a message if you want an invite, once the time has been decided.
I will take notes and post them here again.
AGI and Mainstream Culture
Hi all,
So, as you may know, the first episode of Doctor Who, "Smile", was about a misaligned AI trying to maximize smiles (ish). And the latest, "Extremis", was about an alien race who instantiated conscious simulations to test battle strategies for invading the Earth, of which the Doctor was a subroutine.
I thought the common threat of AGI was notable, although I'm guessing it's just a coincidence. More seriously, though, this ties in with an argument I thought of, and want to know your take on:
If we want to avoid an AI arms race, so that safety research has more time to catch up to AI progress, then we would want to prevent, if at all possible, these issues from becoming more mainstream. The reason is that if AGI in public perception becomes disassociated with Terminator (i.e. laughable, nerdy, and unrealistic) and more like a serious whoever-makes-this-first-can-take-over-the-world situation, then we will get an arms race faster.
I'm not sure I believe this argument myself. For one thing, being more mainstream has the benefit of attracting more safety research talent, government funding, etc. But maybe we shouldn't be spreading awareness without thinking this through some more.
CFAR workshop with new instructors in Seattle, 6/7-6/11
CFAR is running its first workshop in Seattle!
Over the past several months, CFAR has been training a new batch of instructors, including me. We're now running a workshop, without the core instructors, in Seattle from June 7th to June 11th. You can apply here, and we have an FAQ here.
AI safety: three human problems and one AI issue
Crossposted at the Intelligent agent foundation.
There have been various attempts to classify the problems in AI safety research, from our old Oracle paper, which classified then-theoretical methods of control, to more recent classifications that grow out of modern, more concrete problems.
These all serve their purpose, but I think a more enlightening classification of the AI safety problems is to look at the issues we are trying to solve or avoid. And most of these issues are problems about humans.
Specifically, I feel AI safety issues can be classified as three human problems and one central AI issue. The human problems are:
- Humans don't know their own values (sub-issue: humans know their values better in retrospect than in prediction).
- Humans are not agents and don't have stable values (sub-issue: humanity itself is even less of an agent).
- Humans have poor predictions of an AI's behaviour.
And the central AI issue is:
- AIs could become extremely powerful.
Obviously, if humans were agents and knew their own values and could predict whether a given AI would follow those values or not, there would be no problem. Conversely, if AIs were weak, then the human failings wouldn't matter so much.
The points about human values are relatively straightforward, but what's the problem with humans not being agents? Essentially, humans can be threatened, tricked, seduced, exhausted, drugged, modified, and so on, in order to act seemingly against our interests and values.
If humans were clearly defined agents, then what counts as a trick or a modification would be easy to define and exclude. But since this is not the case, we're reduced to trying to figure out the extent to which something like a heroin injection is a valid way to influence human preferences. This both makes humans susceptible to manipulation and makes human values hard to define.
Finally, the issue of humans having poor predictions of AI is more general than it seems. If you want to ensure that an AI has the same behaviour in the testing and training environment, then you're essentially trying to guarantee that you can predict that the testing environment behaviour will be the same as the (presumably safe) training environment behaviour.
How to classify methods and problems
That's well and good, but how do various traditional AI methods or problems fit into this framework? This should give us an idea as to whether the framework is useful.
It seems to me that:
- Friendly AI is trying to solve the values problem directly.
- IRL and Cooperative IRL are also trying to solve the values problem. The greatest weakness of these methods is the not agents problem.
- Corrigibility/interruptibility are also addressing the issue of humans not knowing their own values, using the sub-issue that human values are clearer in retrospect. These methods also overlap with poor predictions.
- AI transparency is aimed at getting round the poor predictions problem.
- Laurent's work on carefully defining the properties of agents is mainly also about solving the poor predictions problem.
- Low impact and Oracles are aimed squarely at preventing AIs from becoming powerful. Methods that restrict the Oracle's output implicitly accept that humans are not agents.
- Robustness of the AI to changes between testing and training environment, degradation and corruption, etc... ensures that humans won't be making poor predictions about the AI.
- Robustness to adversaries is dealing with the sub-issue that humanity is not an agent.
- The modular approach of Eric Drexler is aimed at preventing AIs from becoming too powerful, while reducing our poor predictions.
- Logical uncertainty, if solved, would reduce the scope for certain types of poor predictions about AIs.
- Wireheading, when the AI takes control of the reward channel, is a problem that humans don't know their values (and hence use an indirect reward) and that humans make poor predictions about the AI's actions.
- Wireheading, when the AI takes control of the human, is as above but also a problem that humans are not agents.
- Incomplete specifications are either a problem of not knowing our own values (and hence missing something important in the reward/utility) or making poor predictions (when we thought that a situation was covered by our specification, but it turned out not to be).
- AIs modelling human knowledge seem to be mostly about getting round the fact that humans are not agents.
Putting this all in a table:
Method | Values | Not Agents | Poor Predictions | Powerful |
---|---|---|---|---|
Friendly AI | X | | | |
IRL and CIRL | X | | | |
Corrigibility/interruptibility | X | | X | |
AI transparency | | | X | |
Laurent's work | | | X | |
Low impact and Oracles | | X | | X |
Robustness | | | X | |
Robustness to adversaries | | X | | |
Modular approach | | | X | X |
Logical uncertainty | | | X | |
Wireheading (reward channel) | X | | X | |
Wireheading (human) | X | X | X | |
Incomplete specifications | X | | X | |
AIs modelling human knowledge | | X | | |
Further refinements of the framework
It seems to me that the third category - poor predictions - is the most likely to be expandable. For the moment, it just incorporates all our lack of understanding about how AIs would behave, but it might be more useful to subdivide it.
Instrumental Rationality Sequence Update (Drive Link to Drafts)
Hey all,
Following my post on my planned Instrumental Rationality sequence, I thought it'd be good to give the LW community an update of where I am.
1) Currently collecting papers on habits. Planning to go through a massive sprint of the papers tomorrow. The papers I'm using are available in the Drive folder linked below.
2) I have a publicly viewable Drive folder here of all relevant articles and drafts and things related to this project, if you're curious to see what I've been writing. Feel free to peek around everywhere, but the most relevant docs are this one which is an outline of where I want to go for the sequence and this one which is the compilation of currently sorta-decent posts in a book-like format (although it's quite short right now at only 16 pages).
Anyway, yep, that's where things are at right now.
Reaching out to people with the problems of friendly AI
There have been a few attempts to reach out to broader audiences in the past, but mostly in very politically/ideologically loaded topics.
After seeing several examples of how little understanding people have about the difficulties in creating a friendly AI, I'm horrified. And I'm not even talking about a farmer on some hidden ranch, but about people who should know about these things, researchers, software developers meddling with AI research, and so on.
What made me write this post was a highly voted answer on stackexchange.com, which claims that the danger of superhuman AI is a non-issue, and that the only way for an AI to wipe out humanity is if "some insane human wanted that, and told the AI to find a way to do it". And the poster claims to be working in the AI field.
I've also seen a TEDx talk about AIs. The speaker had not even heard of the paperclip maximizer. The talk was about the dangers presented by AIs as depicted in movies like the Terminator, where an AI "rebels". The argument was that AIs cannot feel emotion and so would not rebel, that the events depicted in such movies will therefore probably not happen, and that all we have to do is be ethical ourselves and not deliberately write malicious AI, and then everything will be OK.
The sheer and mind-boggling stupidity of this makes me want to scream.
We should find a way to increase public awareness of the difficulty of the problem. The paperclip maximizer should become part of public consciousness, a part of pop culture. Whenever there is a relevant discussion about the topic, we should mention it. We should increase awareness of old fairy tales with a jinn who misinterprets wishes. Whatever it takes to ingrain the importance of these problems into public consciousness.
There are many people graduating every year who've never heard about these problems. Or if they have, they dismiss them as a non-issue, a contradictory thought experiment which can be dismissed without a second thought:
A nuclear bomb isn't smart enough to override its programming, either. If such an AI isn't smart enough to understand people do not want to be starved or killed, then it doesn't have a human level of intelligence at any point, does it? The thought experiment is contradictory.
We don't want our future AI researchers to start working with such a mentality.
What can we do to raise awareness? We don't have the funding to make a movie which becomes a cult classic. We might start downvoting and commenting on the aforementioned stackexchange post, but that would not solve much if anything.
The robust beauty of improper linear models
It should come as no surprise to people on this list that models often outperform experts. But these are generally finely calibrated models, integrating huge amounts of data, so this seems less surprising. How can the poor experts compete against that?
But sometimes the models are much simpler than that, and still perform better. For instance, the models could be linear, rather than having higher order complexities. These models can still outperform experts, because in practice, despite their beliefs that they are doing a non-linear task, expert decisions can often best be modelled as being entirely linear.
But surely the weights of the linear models are subtle and need to be set exactly? Not really. It seems that if you take a linear model, and weigh each variable by +1 or -1 depending on whether it has a positive or negative impact on the result, then you will get a model that still often outperforms experts. These models with ±1 weights are called improper linear models.
What's going on here? Well, there's been a bit of a dodge. I've been talking about "taking" a linear model, with "variables", and weighing the factors depending on a positive or negative "impact". And to do all that, you need experts. They are the ones that know which variables are important, and know the direction (positive or negative) in which they impact the result. They don't choose these variables by just taking random possibilities and then figuring out what the direction is. Instead, they understand the situation, to some extent, and choose important variables.
So that's the real role of the expert here: knowing what should go into the model, what really makes the underlying dependent variable change. Selecting and coding the variable information, in the terms that are often used.
But, just as experts can be very good at that task, they are human, and humans are terrible at integrating lots of information together. So, having selected the variables, they get regularly outperformed by proper linear models. And when you add the fact that the experts have selected variables of comparable importance, and that these variables are often correlated with each other, it's not surprising that they get outperformed by improper linear models as well.
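As a rough illustration of the ±1 weighting described above, here is a minimal Python sketch. The applicant data, the variable choices and the function names are all hypothetical, and standardising each variable before summing is an assumption of this sketch (a common way of putting variables on a comparable scale), not something stated in the post.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one variable to zero mean and unit (population) standard deviation."""
    m = mean(column)
    s = pstdev(column)
    if s == 0:
        # A constant variable carries no information; treat it as all zeros.
        return [0.0 for _ in column]
    return [(x - m) / s for x in column]


def improper_linear_scores(rows, signs):
    """Score each row as the sum of its standardized variables, weighted by +1 or -1.

    rows  -- list of tuples, one tuple of variable values per case
    signs -- one +1 or -1 per variable: the expert's judgement of its direction
    """
    columns = list(zip(*rows))
    z_columns = [standardize(col) for col in columns]
    return [
        sum(sign * z[i] for sign, z in zip(signs, z_columns))
        for i in range(len(rows))
    ]


# Hypothetical cases: (test score, years of experience, missed deadlines).
applicants = [(80, 5, 1), (60, 2, 4), (70, 3, 2)]
# The "expert" only supplies the variables and their directions.
directions = (+1, +1, -1)

scores = improper_linear_scores(applicants, directions)
ranking = sorted(range(len(applicants)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

Note that the only expert input here is the choice of the three variables and the tuple of signs; no weights are fitted to data, which is the point of the "improper" label.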
Are causal decision theorists trying to outsmart conditional probabilities?
Presumably, this has been discussed somewhere in the past, but I wonder to what extent causal decision theorists (and many other non-evidential decision theorists, too) are trying to make better predictions than (what they think to be) their own conditional probabilities.
To state this question more clearly, let’s look at the generic Newcomb-like problem with two actions a1 and a2 (e.g., one-boxing and two-boxing, cooperating or defecting, not smoking or smoking) and two states s1 and s2 (specifying, e.g., whether there is money in both boxes, whether the other agent cooperates, whether one has cancer). The Newcomb-ness is the result of two properties:
- No matter the state, it is better to take action a2, i.e. u(a2,s1)>u(a1,s1) and u(a2,s2)>u(a1,s2). (There are also problems without dominance where CDT and EDT nonetheless disagree. For simplicity I will assume dominance, here.)
- The action cannot causally affect the state, but somehow taking a1 gives us evidence that we’re in the preferable state s1. That is, P(s1|a1)>P(s1|a2) and u(a1,s1)>u(a2,s2).
Then, if the latter two differences are large enough, it may be that
E[u|a1] > E[u|a2].
I.e.
P(s1|a1) * u(a1,s1) + P(s2|a1) * u(a1,s2) > P(s1|a2) * u(a2,s1) + P(s2|a2) * u(a2,s2),
despite the dominance.
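To make the inequality concrete, here is a small Python sketch with illustrative numbers. The payoffs (in dollars) and the 0.99/0.01 conditional probabilities are assumptions chosen to resemble the usual Newcomb setup; they are not taken from the post.

```python
# Assumed payoffs u[(action, state)]: a1 = one-boxing, a2 = two-boxing,
# s1 = the million is there, s2 = it is not.
u = {
    ("a1", "s1"): 1_000_000,
    ("a1", "s2"): 0,
    ("a2", "s1"): 1_001_000,
    ("a2", "s2"): 1_000,
}

# Assumed conditional probabilities P(s1 | action); P(s2 | action) = 1 - P(s1 | action).
p_s1_given = {"a1": 0.99, "a2": 0.01}


def a2_dominates():
    """First property: a2 is better than a1 in every state."""
    return all(u[("a2", s)] > u[("a1", s)] for s in ("s1", "s2"))


def conditional_expected_utility(action):
    """E[u | action] = P(s1|a) * u(a,s1) + P(s2|a) * u(a,s2)."""
    p_s1 = p_s1_given[action]
    return p_s1 * u[(action, "s1")] + (1 - p_s1) * u[(action, "s2")]


eu_a1 = conditional_expected_utility("a1")
eu_a2 = conditional_expected_utility("a2")
print(a2_dominates(), eu_a1, eu_a2)
```

With these assumed numbers, a2 dominates in both states, yet E[u|a1] comes out at roughly 990,000 against roughly 11,000 for E[u|a2], so the conditional expectation favours a1.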
Now, my question is: After having taken one of the two actions, say a1, but before having observed the state, do causal decision theorists really assign the probability P(s1|a1) (specified in the problem description) to being in state s1?
I used to think that this was the case. E.g., the way I learned about Newcomb’s problem is that causal decision theorists understand that, once they have said the words “both boxes for me, please”, they assign very low probability to getting the million. So, if there were a period between saying those words and receiving the payoff, they would bet at odds that reveal that they assign a low probability (namely P(s1|a2)) to money being under both boxes.
But now I think that some of the disagreement might implicitly be based on a belief that the conditional probabilities stated in the problem description are wrong, i.e. that you shouldn’t bet on them.
The first data point was the discussion of CDT in Pearl’s Causality. In sections 1.3.1 and 4.1.1 he emphasizes that he thinks his do-calculus is the correct way of predicting what happens upon taking some actions. (Note that in non-Newcomb-like situations, P(s|do(a)) and P(s|a) yield the same result, see ch. 3.2.2 of Pearl’s Causality.)
The second data point is that the smoking intuition in smoking lesion-type problems may often be based on the intuition that the conditional probabilities get it wrong. (This point is also inspired by Pearl’s discussion, but also by the discussion of an FB post by Johannes Treutlein. Also see the paragraph starting with “Then the above formula for deciding whether to pet the cat suggests...” in the computer scientist intro to logical decision theory on Arbital.)
Let’s take a specific version of the smoking lesion as an example. Some have argued that an evidential decision theorist shouldn’t go to the doctor because people who go to the doctor are more likely to be sick. If a1 denotes staying at home (or, rather, going anywhere but a doctor) and s1 denotes being healthy, then, so the argument goes, P(s1|a1) > P(s1|a2). I believe that in all practically relevant versions of this problem this difference in probabilities disappears once we take into account all the evidence we already have. This is known as the tickle defense. A version of it that I agree with is given in section 4.3 of Arif Ahmed’s Evidence, Decision and Causality. Anyway, let’s assume that the tickle defense somehow doesn’t apply, such that even if taking into account our entire knowledge base K, P(s1|a1,K) > P(s1|a2,K).
I think the reason why many people think one should go to the doctor might be that while asserting P(s1|a1,K) > P(s1|a2,K), they don’t upshift the probability of being sick when they sit in the waiting room. That is, when offered a bet in the waiting room, they wouldn’t accept exactly the betting odds that P(s1|a1,K) and P(s1|a2,K) suggest they should accept.
Maybe what is going on here is that people have some intuitive knowledge that they don’t propagate into their stated conditional probability distribution. E.g., their stated probability distribution may represent observed frequencies among people who make their decision without thinking about CDT vs. EDT. However, intuitively they realize that the correlation in the data doesn’t hold up in this naive way.
This would also explain why people are more open to EDT’s recommendation in cases where the causal structure is analogous to that in the smoking lesion, but tickle defenses (or, more generally, ways in which a stated probability distribution could differ from the real/intuitive one) don’t apply, e.g. the psychopath button, betting on the past, or the coin flip creation problem.
I’d be interested in your opinions. I also wonder whether this has already been discussed elsewhere.
Acknowledgment
Discussions with Johannes Treutlein informed my view on this topic.