Should Surgeons Keep Score? (medium.com)
186 points by jsomers 1 day ago | 84 comments




shanusmagnus 1 day ago | link

I'm astounded at the tenor of comments on this article, which I would expect to find at any site other than this one. Yes, measuring surgical outcome data and surgeon performance could lead to a variety of complications, perverse incentives, regulatory capture, etc. etc. All true.

But you know what's worse than that? Having no fucking information about anything. I'm in the process of trying to find a surgeon to help with an orthopedic procedure, and it is so incredibly frustrating that I have a thousand times as much information on which refrigerator I should buy, or which phone, than about something that will affect my life more intimately than any consumer product ever could.

And yes, of course I get that rating a commodity with a fixed function that is used by zillions of people is way more straightforward. But look, all of these apocalyptic scenarios about brain surgeons who only take trivial cases so they can have a better score on the leaderboard? I'm just not worried about it. At all. Because people are generally proud and want to do better. Because of the social and professional stigma that would come from such behavior. Because if you collect a rich dataset, you can account for most kinds of gaming, just as you can do for teacher outcomes, or every other damn thing.

But probably most of all because the current state of affairs is so abjectly wretched, and literally any effort at measurement and accountability would be better under every reasonable scenario.

reply

hermanhermitage 1 day ago | link

In case you find this useful:

Just lying in a hospital bed here, about 36 hours after the 2nd revision on a THR (total hip replacement) first done in 1987. In the absence of clear public metrics, I used the following approach to choose my most recent surgeon:

1. Age/experience. I was looking for someone about 2/3 of the way through their career, assuming experience and survival bias were positive indicators.

2. I got a series of second opinions asking surgeons the direct question of whom they would get to operate on themselves if they needed the procedure done.

3. I consulted a series of family doctors on their recommendations.

4. I went with someone with a strong track record of designing prostheses and also a long history of performing the particular procedure I was having.

There are always trade-offs. My surgeon works long shifts, and so I got them 12 hours into a shift on a Friday, which had me a little nervous (fatigue-wise).

So my main advice is: don't be shy about getting a lot of opinions and advice. I will probably need another 2-3 revisions in my lifetime on this hip alone, so I thought it worth my while taking a thorough approach.

Still at the end of the day bad luck can always happen. Expect the best but be mentally prepared for the worst.

reply

Ntrails 5 hours ago | link

So who do you nominate to be operated on by the surgeons exactly 1/5 of the way through their career? The poor? The desperate? How about those over the hill all the way over at 7/8ths through? The guy who has only done this procedure twice before, shall we have him work only on the mentally deficient who don't know any better?

By definition we cannot all have the best surgeon (and there is one). Therefore you can only ration access to this "best" surgeon in one of three ways: need, luck, or money. Personally I'd rank them in that order in terms of sense - and none of them truly involve patient choice.

Metrics are great for use internally, as oversight. They allow hospitals to assess doctors, to help doctors, to improve practices and better themselves. As a "consumer" of healthcare I think they're mostly damaging.

reply

tomcam 1 day ago | link

I believe #2 is key and have thought for years this could be the basis of a super-effective site. But not a lucrative one: a key problem is that any kind of monetization that would benefit the doctors doing the rating could pervert the results.

reply

imaginenore 1 day ago | link

It could easily turn into "I scratch your back, you scratch mine" type of reviews. Even voting rings of doctors.

reply

hermanhermitage 1 day ago | link

Definitely detected that in my travels - diabolical surgeon golfers ring :). I learnt to ask them what handicap they play off.

reply

shanusmagnus 19 hours ago | link

Thank you, that is indeed useful. I've been doing a kind of crappy and ad-hoc version of 1 and 3, although the latter (and recommendations in general) has been less useful than I thought: the GPs I know aren't in a great position to evaluate orthopedic surgeons -- their experience is limited, and they also lack data -- and I can also get a strong rec from someone I trust to a surgeon that my insurance won't cover. But that's a whole nother topic.

Your systematicity in the search is inspiring, though. Thanks for the advice.

reply

patcheudor 1 day ago | link

The thing is, many of us have been around the block on more than one occasion with poorly thought out metrics which led to unintended and downright bad consequences. A few years ago I was railing against a system used to rate and rank the relative risk of software systems based on a score. Many times I was told: "at least it's better than nothing." Ultimately it was discovered that the scoring had nearly an inverse relationship to reality: the bits of software which ranked as the most secure ended up being the least secure, and vice versa. This happened because developers unaware of security concepts assumed security was happening elsewhere rather than taking the initiative to practice secure coding. Interestingly enough, this entirely invalidated the argument that doing something was better than nothing: the system drove misplaced focus and reduced the security of the overall environment, since the bad bits of software outnumbered the good. It also led to poor staffing decisions. Management saw the score, figured everything was getting better, and concluded the environment needed fewer security resources, when the reality was that many more were needed.

Complex systems are challenging, and one study isn't going to provide a miracle solution. Surgery is not a refrigerator, and thinking every problem can immediately be solved by a trivial metric is a dangerous game, especially when lives are at stake. When it comes to health care I tend to look at the macro rather than the micro: which hospitals have higher ratings, lower insurance claims and rates, etc. However, I understand that information can be gamed as well, so I take it with a grain of salt. At the end of the day I hedge my bets in this space by eating well, working out daily, and otherwise trying to avoid the medical industry at all costs.

reply

nostromo 1 day ago | link

So we measure nothing? As pointed out in your comment, the fix is the consumer of the information understanding it and using it correctly, not avoiding the data entirely.

This reminds me of the debate about improving schools in the U.S. We hear objections from teachers' unions that it's literally impossible to measure the efficacy of schools and teachers.

What other industries can afford to not rate their employees' effectiveness, outside of the broken markets of primary education and healthcare?

reply

fixedd 1 day ago | link

Schools and teachers should be amongst the easiest to test since we have so much data... measured multiple times a year.

reply

dmm 1 day ago | link

You have to be careful what you measure. Imagine you have a 6th grader who is reading at a 2nd grade level. At the end of 6th grade you test her and find out now she's reading at a 5th grade level. Wow! A big improvement! Her teacher must be great.

Unfortunately the test shows she's reading below grade level and the teacher is put on an improvement plan.
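
To make the contrast concrete, a toy sketch (all numbers made up) of the two ways to score the same student:

    # Toy sketch (invented numbers): the same student scored two ways.
    # An absolute threshold flags her teacher as failing; a growth
    # metric credits three grade levels of progress in one year.

    GRADE = 6
    start_level, end_level = 2.0, 5.0

    meets_threshold = end_level >= GRADE   # False: "below grade level"
    growth = end_level - start_level       # 3.0 grade levels gained
    expected_growth = 1.0                  # typical gain in one year

    print("passes threshold metric:", meets_threshold)        # False
    print("value-added vs. expected:", growth - expected_growth)  # +2.0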

reply

ScottBurson 1 day ago | link

I agree that bad metrics can be worse than none. I've certainly seen other examples in books on management.

In this case, though, it sounds like they've put a lot of effort into coming up with good metrics that are hard to game and, most importantly, are convincing to the surgeons themselves. The article talks about that at some length.

(BTW thanks for plugging static code analysis -- I work in that field :-)

reply

[deleted]
dragonwriter 1 day ago | link

> I'm having a hard time understanding how this could be. How could attempting to meet a list of security requirements result in less security than doing nothing?

IME, people given a list of "X requirements" often assume that the appropriate experts have addressed subdomain X and that it is no longer necessary (or even appropriate) to expend further attention on that subdomain beyond meeting those requirements. Thus a standard security checklist that isn't well tailored to a particular project may decrease the quality of the result for that project, because the people on the project are less likely to consider security in the context of the particular project, and instead focus their efforts on crossing boxes off the checklist.

reply

patcheudor 1 day ago | link

Okay, let's take a simple one:

"Have you implemented input validation and output encoding?"

What could go wrong, right? A naive developer with little security understanding will look at that question and might even ask other members of the team about it. After a short bit of research the developer answers the question as "yes the application implements input validation and output encoding" based on finding that the framework utilized does in fact have input validation and output encoding functionality built in.

Of course, while the framework might have such controls, and they may even be implemented for the particular code base, it turns out that relying on the framework alone isn't sufficient, because there are many places in the code where that framework can be subverted or entirely bypassed. Rather than stopping at the framework controls, the developer actually needed to perform a code audit and find everywhere user data might flow to a sink that could result in unauthorized code execution, SQLi, or XSS.

This is a particular scenario I've seen from the days of PHP magic quotes all the way to the latest ASP.NET request validation. It turns out the question was too simplistic. To solve this problem we need to dive into a great level of detail, not only on the framework controls available within the environment, but also on how the code is actually implemented -- ultimately scrapping the checklist-only approach in favor of static code analysis. Of course, if a management team only has a checklist and they are getting false answers as shown above, they have no idea why they need static code analysis or why they should pay for the software and resources to use it.
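
To make the failure mode concrete, a minimal Python sketch (not tied to any real framework; both helper functions are hypothetical). The "blessed" path is parameterized, but a forgotten second sink bypasses it, so the checklist answer stays "yes" while the app stays injectable:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    def find_user_safe(name):
        # The "framework-blessed" path: parameterized query, not injectable.
        return conn.execute(
            "SELECT role FROM users WHERE name = ?", (name,)).fetchall()

    def find_user_report(name):
        # A forgotten second sink: string concatenation bypasses the
        # parameterization entirely. The checklist answer "yes, we do
        # input validation" remains technically true.
        return conn.execute(
            "SELECT role FROM users WHERE name = '" + name + "'").fetchall()

    print(find_user_safe("x' OR '1'='1"))    # [] -- treated as a literal
    print(find_user_report("x' OR '1'='1"))  # [('admin',)] -- SQLi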

reply

noonespecial 1 day ago | link

It may conceivably be a feature and not a bug, at least as American healthcare is implemented today. Imagine if it was known beyond doubt who the single best surgeon in the country was (and people were more aware of just how big a difference in outcome surgeon skill makes).

How many people would demand and feel entitled to that surgeon? Who would shrug and say "oh well, I'm on Medicare so 146th best will be fine"?

There's a myth floating around today that everyone is entitled to and can have "the very best" medical care. (Or this is the way it would be except demo/publicans.)

Objective scores would severely damage this myth.

reply

refurb 1 day ago | link

You make a great point. At the same time, the US is already kind of there. In Canada most folks just go to their local hospital; only if that hospital can't provide care do you go somewhere else.

In the US, I've noticed that folks often pursue the best care they can, especially if it's a serious condition. Of course, that's if your insurance covers it.

reply

arjie 1 day ago | link

Certainly you may not be worried about this, but when there are human beings involved one has to be more careful. With any physical process, you can change something just by applying force. Changing the status quo in situations like these is harder: the harder you push, the more they'll form a rock wall against your demands.

Nobody wants to protect the poor surgeon. But everyone knows the tragic fact: each one of them was a poor surgeon at some procedure at some point. Experience made them better.

The article stresses the importance of the way these things are done. In order to have surgeons participate you need to earn their trust. Show them that you're not out to 'expose the failures' so to speak.

Besides, the idea that surgeons will only take easy cases is not hypothetical. The article lists instances where gaming occurred. In fact, it stresses the importance of confidentiality in gaining surgeons' trust and in ensuring this doesn't happen.

Like everything else, how to change systems should be treated as a science, and the evidence looks like it favours Amplio's approach to the problem. Maybe some years from now things will be different and more information can be made public. But for that to succeed we need to get there carefully.

reply

hueving 1 day ago | link

Taking easy cases could easily be accounted for by tracking the cases the surgeon chose not to take as well.

reply

teddyh 1 day ago | link

> But you know what's worse than that? Having no fucking information about anything.

Sometimes, having no information, and choosing truly randomly, can be better than having bad information and choosing according to your own biased and faulty intuition and preconceptions.

reply

moonka 1 day ago | link

This seems to be the case for everything to do with healthcare. Even choosing a healthcare plan is confusing and terrifying, and then figuring out whether what you need is covered is hard to do.

reply

k-mcgrady 1 day ago | link

Is 'choosing a surgeon' an American thing or a private health care thing? Personally, if I'm ill enough to be in hospital I get admitted, they assign me a consultant, and if necessary he assigns me a surgeon. No choosing involved.

reply

jacalata 1 day ago | link

>Because if you collect a rich dataset, you can account for most kinds of gaming, just as you can do for teacher outcomes, or every other damn thing.

Has this actually been solved for teacher outcomes or are you just saying it could be?

reply

the_cat_kittles 1 day ago | link

I think the kind of doctors that would "game the system" are probably already shitty doctors, so I think the risk of surgeons only taking trivial cases is even less of a big deal.

reply

bokonist 1 day ago | link

Recently I had a major shoulder operation. I was shocked at how little information I had about who was a good surgeon. If I didn't have a family member who worked in the complaints department of the local big hospital, I would have had no way of knowing who was considered good or bad.

My surgeon told me that I had a 90% chance of success, and a 1% chance of nasty complications like nerve damage. The literature on the procedure said that typical success rates were 75% and the complication rate more like 5%. Was my surgeon particularly good? Did I have a better shot because I was young and healthy? Or was my surgeon suffering from the Lake Wobegon effect and overconfidence? There was no way for me to know as a patient.

That said, naive score keeping could go very awry, for very obvious reasons that others in thread have mentioned.

Here is the system I would like to see. Tell me how this could get gamed:

Surgeons should be required to give official, written, probabilities to all potential patients. So for instance, a surgeon might say that there is a 90% chance that I can play football again, a 1% chance that my arm will end up worse than before the surgery, and a .01% chance of death.

Then surgeons should simply be measured against their own predictions. When I go to a surgeon, I should have access to that data. The surgeon has no incentive to be overly conservative with the probabilities, because then I will go to a surgeon who is more skilled, can predict better outcomes, and has the track record to prove it. Nor does the surgeon have an incentive to be overly optimistic, because then they will get dinged for not scoring according to their own predictions. Nor does the surgeon have an incentive to turn away high-risk patients; they just need to state the risks accurately.

The patient wins because the patient can finally have the most accurate information possible about the risks and benefits of a surgery, and can get multiple opinions, compare them, and have good data about which surgeons are reliable in their predictions.
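
For what it's worth, scoring people against their own stated probabilities is a well-studied problem. One standard tool (my suggestion, not part of the proposal above) is the Brier score, a "proper" scoring rule that a predictor can't improve by shading their numbers in either direction. A minimal sketch with invented numbers:

    # Brier score: mean squared error between stated probabilities and
    # what actually happened (1 = success, 0 = failure). Lower is
    # better. Because it is a proper scoring rule, a surgeon's expected
    # score is best when they report their true belief, so neither
    # sandbagging nor overconfidence pays. Numbers below are made up.

    def brier(predictions, outcomes):
        return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)

    # Surgeon A is well calibrated; surgeon B has the same outcomes
    # but systematically overpromises.
    outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]   # 80% actual success
    honest   = [0.80] * 10
    hype     = [0.95] * 10

    print(brier(honest, outcomes))  # 0.16
    print(brier(hype, outcomes))    # 0.1825 -- overconfidence is penalized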

reply

imaginenore 1 day ago | link

Why ask for the doctor's opinion about the data, and not for the data itself?

And what's the punishment if the doctor's estimate is wrong? Or he/she is lying?

Doctors have enough shit on their plate; we simply need access to the hospital data with the doctor names attached.

reply

bokonist 1 day ago | link

I ask for the doctor's official opinion/prediction about the probabilities for my own surgery. This must be the doctor's opinion because every person and every surgery is different. A doctor's job is to analyze each person's situation, and then use their experience, knowledge, and professional judgement to give the patient the doctor's best estimate of the benefits and risks of a given procedure. This is not "adding sh*t to their plate", this is formalizing a core job function of every doctor.

The patient would get the doctor's prediction track record directly from the hospital or a third party monitoring agency. That way the patient knows if the doctor is generally accurate in their predictions, or if they are consistently overconfident in their own abilities.

If a doctor was wrong once, they are wrong. If they are consistently wrong, then that shows up in their stats. Patients will no longer trust their predictions, and will seek other doctors. The doctor will have to really improve their prediction ability (a good thing) or else go out of business for lack of patients.

reply

pedrosorio 1 day ago | link

I don't see how this solves the problem of some doctors tackling harder/easier cases.

How do you distinguish two doctors with high prediction ability and low success rates (compared to the average for that procedure) if one is bad (and she knows it) and the other is tackling harder cases (and is actually one of the best in the field for cases with high probability of complications)?

Without input from other doctors (or simply using a lot of data where you can correlate hard procedures with other factors in the patient data) you'll never be able to distinguish the two doctors mentioned above.
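
One standard idea along those lines (not from the article, just the usual approach in outcomes research) is risk adjustment: predict each patient's complication probability from their risk factors, then compare a surgeon's observed complications to the expected count. A toy sketch, with an invented risk model:

    # Toy observed-to-expected (O/E) sketch. The risk model here is
    # invented for illustration; real risk adjustment is fit on large
    # datasets of patient factors (age, comorbidities, etc.).

    def risk(patient):
        # Hypothetical model: 3% baseline complication risk, plus more
        # for age over 50 and for diabetes.
        p = 0.03 + 0.002 * max(0, patient["age"] - 50)
        return p + (0.05 if patient["diabetic"] else 0.0)

    def oe_ratio(cases):
        observed = sum(c["complication"] for c in cases)
        expected = sum(risk(c) for c in cases)
        return observed / expected  # < 1: better than the case mix predicts

    # A surgeon taking hard cases can beat expectations even with a raw
    # complication rate worse than a colleague who cherry-picks.
    hard = [{"age": 78, "diabetic": True, "complication": c}
            for c in (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)]
    print(round(oe_ratio(hard), 2))  # ~0.74: fewer complications than expected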

reply

bokonist 1 day ago | link

"How do you distinguish two doctors with high prediction ability and low success rates (compared to the average for that procedure)"

The doctor is never compared against the "average for that procedure" for exactly the reasons you give. The doctor is only compared against that doctor's own predictions.

So as a patient in need of a surgery, you would get opinions from 3-4 different surgeons, each one would offer their personal outcome probabilities. The patient gets access to that doctor's stats that score their actual track record against their own predictions. The patient should then choose the surgeon who gives the best odds but also has a track record of hitting their predictions.

A doctor who tackles hard cases should still have a good success rate against their own predictions. Such a doctor will just lower their predictions according to the riskiness of the case. If the doctor is good, such a doctor will still get business, because the skilled doctor will still offer better odds (odds that the patient can actually trust) than can be reliably offered by a less skilled doctor.

The one weakness of my system is that it does not give any sort of global score. There is still the problem of having to find 3 to 4 good surgeons to ask for an opinion in the first place. But at least once you have gotten to that point, you can have trustworthy predictions upon which to base your decisions.

reply

sokoloff 23 hours ago | link

Doctors in such a system may still have an incentive to give overly negative predictions on the tough cases, and specifically by an amount greater than their peers.

The hope would be to "send this tough or impossible case to someone else," so that the doctor's success rate stats remain high and the outcome prediction stats are unaffected (as the "trial" would go to another doctor).

I'm all for having more information available, and when my extended family faces a serious medical concern, we seek out friends and family in medicine, asking "if you faced this situation, what doctor would you trust?" I don't know of a way to globally institutionalize that process.

reply

MarkMc 14 hours ago | link

But the track record for such a doctor will clearly show that they aren't tackling the difficult cases. Would a doctor want a high success rate if it comes with a reputation for only taking easy cases?

Even if the answer to the above question is Yes, the reduced number of doctors willing to tackle difficult cases will be able to charge higher fees. So at some point you would reach a market equilibrium where the desire to tackle easy cases is balanced by the desire to earn a higher income. That is, when compared to the current system the easy cases will become cheaper and the more difficult cases will become more expensive - but maybe that is an acceptable outcome if it means the system as a whole is more efficient?

reply

sokoloff 12 hours ago | link

It depends on the goals of the doctor. A doctor can do very well aspiring to a massive volume of fixed-rate, "easy" procedures (look at cataract surgeons benefiting from the advances in that field while insurance and Medicare reimbursement rates remained constant [and high]). I'm not knocking those docs; they provided real and tangible benefits for millions of patients with cataracts and I don't begrudge them their money. It's just super amusing to me to walk down the multi-million dollar warbird parking area and find that half the owners are eye doctors.

As for aspiring to have a reputation for efficacy in difficult cases and assuming that you'll be able to charge more due to market forces? I don't see that playing out in any Western medicine economy. IMO, you can't build a functional ecosystem around the very few patients who are self-paying and willing to pay large sums for better care.

Very wealthy individuals and pro sports teams are the only customers I can see for that. The overwhelming majority of people (far in excess of 99%) are going to have two hurdles to procure your expensive services. First, they have to find and select you. Second, they have to convince their insurance provider to pay your rate, instead of the "going rate". That seems uphill, probably steeply so.

reply

mjevans 1 day ago | link

Even if this does work, all it validates (measures) is the accuracy of outcome prediction. It does not (as mentioned via the reference to a global score) actually measure what the best achievable odds are.

The system also does not account for new entrants into the surgical field, nor for doctors changing in skill and accuracy over time (the data doesn't have aging parameters).

reply

MarkMc 1 day ago | link

You can distinguish the two doctors very easily: for any given patient, the doctor with low success rates will give that patient lower odds than the doctor who only tackles harder cases.

reply

MarkMc 1 day ago | link

This article reminds me of Bill Gates's emphasis on measuring outcomes - here's a quote from the Gates Foundation 2013 letter [1]:

-------------- Begin Gates Quote ---------------

Over the holidays I read The Most Powerful Idea in the World, a brilliant chronicle by William Rosen of the many innovations it took to harness steam power. Among the most important were a new way to measure the energy output of engines and a micrometer dubbed the "Lord Chancellor," able to gauge tiny distances.

Such measuring tools, Rosen writes, allowed inventors to see if their incremental design changes led to the improvements -- higher-quality parts, better performance, and less coal consumption -- needed to build better engines. Innovations in steam power demonstrate a larger lesson: Without feedback from precise measurement, Rosen writes, invention is "doomed to be rare and erratic." With it, invention becomes "commonplace."

[Image caption: Starting around 1805, the “Lord Chancellor” micrometer, according to author William Rosen, was “an Excalibur of measurement, slaying the dragon of imprecision,” for inventors in the Industrial Revolution. (© Science Museum, London)]

Of course, the work of our foundation is a world away from the making of steam engines. But in the past year I have been struck again and again by how important measurement is to improving the human condition. You can achieve amazing progress if you set a clear goal and find a measure that will drive progress toward that goal -- in a feedback loop similar to the one Rosen describes. This may seem pretty basic, but it is amazing to me how often it is not done and how hard it is to get right.

-------------- End Gates Quote ---------------

Nobody questions the need for measurement in engineering, but when Mr Gates tried to apply the same logic to measuring teacher effectiveness [2] he received a lot of pushback from people who say his method is flawed [3,4] or simply that teaching effectiveness cannot be reliably measured.

This is a controversial topic. Here is an interesting take on this subject from a book called Teaching as Leadership [5]:

-------------- Begin Teaching as Leadership Quote -----------

As we see modeled by these teachers, the less tangible nature of such longer term dispositions, mindsets, and skills does not mean they cannot be tracked and, in some sense, measured. In fact, if these ideas are going to be infused into a big goal, you must have a way to know that you are making progress toward them.

Mekia Love, a nationally recognized reading teacher in Washington, D.C., sets individualized, quantifiable literacy goals for each of her students but also frames them in her broader vision of "creating lifelong readers." This is a trait she believes is a key to her students' opportunities and fulfillment in life. In order for both Ms. Love and her students to track their progress toward creating lifelong readers, Ms. Love developed a system of specific and objective indicators (like students' self-driven requests for books, students' own explanations of their interest in reading, and the time students are engaged with a book). By setting specific quantifiable targets for and monitoring each of those indicators, she was able to demonstrate progress and success on what would otherwise be a subjective notion.

Strong teachers -- because they know that transparency and tracking progress add focus and urgency to their and their students' efforts -- find a way to make aims like self-esteem, writing skills, "love of reading," or "access to high-performing high schools" specific and objective. These teachers -- like Ms. Love, Mr. Delhagen, and Ms. Jones -- ask themselves what concrete indicators of resilience or independence or "love of learning" they want to see in their students by the end of the year and work them into their big goals.

In our experience, less effective teachers may sometimes assume that because a measurement system may be imperfect or difficult, it must be wrong or impossible. As Jim Collins reminds us in his studies of effective for-profit and nonprofit organizations:

"To throw our hands up and say, But we cannot measure performance in the social sectors the way you can in a business is simply lack of discipline. All indicators are flawed, whether qualitative or quantitative. Test scores are flawed, mammograms are flawed, crime data are flawed, customer service data are flawed, patient outcome data are flawed. What matters is not finding the perfect indicator, but settling upon a consistent and intelligent method of assessing your output results, and then tracking your trajectory with rigor."

-------------- End Teaching as Leadership Quote -----------

Lastly, on a personal note I have found that I simply cannot lose weight unless I keep track of the number of calories I eat. There is something about seeing that number that has a strong influence over my behaviour.

-------------- References --------------

[1] http://www.gatesfoundation.org/Who-We-Are/Resources-and-Medi...

[2] http://www.metproject.org/

[3] http://jaypgreene.com/2013/01/09/understanding-the-gates-fou...

[4] http://garyrubinstein.teachforus.org/2013/01/09/the-50-milli...

[5] http://www.amazon.com/Teaching-As-Leadership-Effective-Achie...

reply

conorh 1 day ago | link

My wife is a highly specialized surgeon; she does one operation, and she does it around 600 times a year (parathyroidectomy). An average endocrine surgeon might do 20 of these a year. She went through training as an endocrine surgeon, and she tells me that the difference between the operation they do at her center and what an average endocrine surgeon will do is like night and day. It is just not possible for surgeons with normal volumes to achieve that level of skill. What helped them recently was the release of the Medicare volume data [1]. This data is probably, right now, the only way to get an idea of how many operations of a particular type your surgeon does (not for all operations unfortunately, not unless you know a lot about billing codes and practices!).

[1] http://blog.parathyroid.com/parathyroid-surgery-medicare/

reply

forrestthewoods 1 day ago | link

Why don't we? Because when it comes to health care we aren't rational. Emotions run high and win the day. And not only do they win the day, they win the day in court, in a lawsuit.

Ob-Gyns have some of the highest insurance rates among doctors. Possibly the highest. Why? Because when they screw up, it's a baby that dies. God help you if you think there's an ounce of rationality in a room with a dead baby.

Here's a similar article from earlier this year. The short version is that a man lost his wife to a mistake and... well, that was it. The NHS (UK) doesn't do investigations into what happened or why. That just means it will happen again. When a plane crashes there is a gigantic investigation and the results are shared. There are some famous cases where a series of basic mistakes needlessly led to a crash and everyone died. All pilots know about these cases, so they don't happen anymore. Hospitals don't do that, because it ends in pointing the blame finger. And then people lose their jobs, lose their licenses, and get the ever-living shit sued out of them.

http://www.newstatesman.com/2014/05/how-mistakes-can-save-li...

reply

rokhayakebe 1 day ago | link

The difference with the airline industry is that when there is a crash, everyone dies -- including whoever made the mistake. If my father is dead and the doctor who made the mistake is alive, you can imagine she is not going to volunteer the information.

reply

kijiki 1 day ago | link

This is only true if the cause is pilot error. If the cause is improper maintenance, the responsible party is still alive.

In the case of the NTSB, there is a strong culture of not penalizing errors unless they were criminal or egregious, which works well in promoting cooperation with the investigation. This would likely be very difficult to achieve with doctors, especially in the US.

reply

forrestthewoods 1 day ago | link

You're right. That's the problem. Doctors make mistakes. Every doctor will be directly responsible for needlessly causing someone to die. Every single one.

The doctor should volunteer the information, because she should not have to fear losing her job or getting sued. Instead, nothing will happen and more people will needlessly die.

reply

ytturbed 1 day ago | link

Perhaps would-be surgeons ought to have their manual coordination assessed before they commence years of expensive training. The irony is that for my father's generation, in England, prowess on the rugby field was considered important in getting into medical school. (And that might actually have been a good thing. I suspect top athletes, musicians and surgeons all possess the same talent which would cease to be a 'talent' if only we could explain it.)

reply

IndianAstronaut 1 day ago | link

Right now the main criterion for medical school admission is simply how much information you can cram into your head and regurgitate. Critical thinking, thinking outside the box, dexterity, compassion... irrelevant.

reply

robert_tweed 1 day ago | link

The ideas presented here seem pretty sensible. However, there is a risk that if such data is collected, it may later be used to compare surgeons to each other. One of the reasons that is risky is that good surgeons tend to deliberately take on more complex and riskier operations than those less capable, and therefore have statistically worse outcomes. There can also be geographical biases if statistics are summarised, such as certain hospitals having poor outcomes because they happen to get a lot of gunshot wounds, or stabbings, or the local population is older than average, or childbirth rates are higher, etc. It would be difficult to control for all such possible variables, and even more difficult to explain the methodology to the public.

It's probably a universally good idea as long as there are safeguards to ensure that the data is not shared generally, which might lead to misinformed reactions and subsequently, target-chasing (see: Goodhart's law). Using the data purely for personal improvement seems like it should be effective. I think it's reasonable to assume that most surgeons would want to self-improve if they can. Also surgeons tend to be especially competitive, so creating a competition against themselves could be a good motivator in itself.

The data could perhaps be used to weed out poorly performing surgeons too. I believe the right way to do that would be to share the data anonymously with peers who are able to interpret it properly. These peers could then flag any worrying anomalies in the data that can't be explained away, which should trigger a follow-up investigation of the individual concerned. This could perhaps stop the next Harold Shipman, as well as weeding out incompetence.

Of course, it may not be possible to usefully anonymise case data, since some of the more complex cases may be sufficiently unique that the surgeon could be personally identified from the description. This would only work if cases can be classified broadly enough to avoid personal identification and narrowly enough to provide enough information for peer review.

reply

arjie 1 day ago | link

As the case of Dr. Christopher Duntsch revealed, the limiting factor in these things isn't that other surgeons fail to notice an incredible lack of skill (they did notice), but that the nature of the problem requires a great deal of time before the board involved can collect sufficient evidence of incompetence to the degree of negligence. It looks like this sort of thing would still help, but you'd still need to determine that the surgeries themselves were done ineptly.

reply

busyant 1 day ago | link

I heard this first hand from Judah Folkman (http://en.wikipedia.org/wiki/Judah_Folkman) at a seminar in 1999. He was discussing news reports that other labs were unable to reproduce the anti-cancer properties of certain proteins discovered by an MD in his lab.

He said (paraphrasing): Scientists become upset when other researchers cannot reproduce their results. Surgeons become upset when other surgeons can.

reply

tokenadult 1 day ago | link

I like this paragraph of the article best (but there are a lot of other good paragraphs, building to an overall good whole, so I encourage you to read the whole article): "In Better, Atul Gawande argues that when we think of improving medicine, we always imagine making new advances, discovering the gene responsible for a disease, and so on — and forget that we can simply take what we already know how to do, and figure out how to do it better. In a word, iterate." That's exactly it. Medicine improves most dramatically simply by spreading the word about how to prevent and how to treat illnesses better to everyone who hasn't mastered that yet. That's the biggest single factor in steadily reducing death rates at all ages all over the developing world.[1]

The one time I had an immediate family member who needed treatment for a puzzling disease, my mom was still working as a surgical nurse in our state's main research university's teaching hospital. She knew who the best surgeon was, who the best surgical resident was, who the best anesthesiologist was, and who the best surgical nurses were. My relative was able to recover fully very soon after surgery that THAT surgeon thought had "nil" risk--he was confident of his abilities, with justification. Now that my mom is not in active practice of nursing anymore, I would like a better consumer-facing channel for information about which surgeons are the best in town. I would definitely ask the nurses I know who work at teaching hospitals if the question came up again for my family.

[1] An article in a series on Slate, "Why Are You Not Dead Yet? Life expectancy doubled in past 150 years. Here's why," provides some of the background:

http://www.slate.com/articles/health_and_science/science_of_...

Life expectancy at age 40, at age 60, and at even higher ages is still rising throughout the developed countries of the world:

http://www.nature.com/scientificamerican/journal/v307/n3/box...

reply

healthenclave 1 day ago | link

I would like to give my 2 cents to the discussion, as a medical doc who was training to become an orthopedic surgeon.

One of the problems with the surgical branches is that beyond a certain point (i.e., beyond knowing how to do a procedure) the act of performing a surgery is essentially an art form, a skill. I learned this from one of the leading orthopedic surgeons in India. And that is one of the most crucial factors that differentiates a good surgeon from a bad one.

Although skill itself can NOT be quantified, in the case of surgery we can certainly quantify the results of that skill in terms of complications of surgery, recovery, and patient satisfaction.

One solution to the problem would be:

(A) To have a feedback mechanism for doctors, where they receive a score on their performance and can compare it to other surgeons performing similar procedures.

The surgeon would upload a video of all the types of procedures they do every 3 months, and, just like at NEJM, a committee of people would provide input and a rating on the surgeon's skills. This score, in combination with the complication rate and patient feedback, would make up the overall score of the doctor. The doctor would be able to see where they stand compared to their colleagues from across the country (possibly the world). And for newer (or bad) surgeons, this system would provide a way to learn from the best in the field and improve their skills.

(B) If you try to make such a score public initially, it will receive a huge backlash from doctors and the industry. But having an internal score keeping mechanism is much better than having no score/rating system.

(C) Some hospitals actually do have internal metrics where they track a surgeon's performance, in terms of complication rate and other measures -- but this data is RARELY available to the public.

(D) Unless some kind of law is passed at a Federal level in the US, I am not very optimistic about the situation improving.

reply

alextgordon 1 day ago | link

I've been watching TCEC this week - the chess world championship for computers.

The joint #1 engine, Stockfish, has had a series of embarrassing losses to Gull, a much inferior engine. Everybody was convinced that Stockfish was surely broken because its level of play was so poor.

Then suddenly, in the last couple of days, it has turned itself around. After a series of wins it's only half a point off the top spot (if it draws the current game).

Here's the thing: nothing changed. It's the same code, the same algorithms, the same build. That bad run it had was just bad luck, there was never anything wrong with it.

An excess of information can absolutely be a force for ignorance. People will see patterns even when none exist. Often the only way to stop people from misinterpreting data is to not have any data at all.

If people can be so easily misled by the fortunes of an unchanging, consistent algorithm, just think how destructive data on surgeons could be.
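
To put a rough number on "bad luck": a quick simulation (parameters invented) of an engine whose strength never changes still hits ugly losing streaks a substantial fraction of the time:

    import random

    # How often does a genuinely strong player (60% chance of a good
    # result per game, nothing ever changes) still hit a streak of 5
    # bad games somewhere in a 50-game event? Parameters are invented.
    random.seed(1)

    def has_bad_streak(games=50, p_win=0.6, streak=5):
        run = 0
        for _ in range(games):
            run = run + 1 if random.random() > p_win else 0
            if run >= streak:
                return True
        return False

    trials = 20_000
    hits = sum(has_bad_streak() for _ in range(trials))
    print(hits / trials)  # on the order of 0.25-0.3: bad runs are routine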

reply

lucio 1 day ago | link

Question: bad luck in chess? I don't get it. Maybe I'm simplifying, but the program which searches deeper into the move tree should always win.

reply

shadowfox 1 day ago | link

> but the program which gets deeper in the movement tree, should always win

It doesn't always work like that. Since examining the whole move tree is prohibitively expensive, heuristics kick in at various points, providing assessments of positions (and thus helping to prune the tree). It is quite possible to land in a position where your set of heuristics is not quite right.
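
Roughly what that looks like in code -- a toy depth-limited minimax over a hand-built tree, where a heuristic guess stands in for the true value once the depth budget runs out:

    # Toy depth-limited minimax. Leaves of the real tree hold true
    # values; when the depth budget runs out mid-tree, a heuristic
    # guess is used instead -- and a wrong guess means a wrong move.

    def heuristic(node):
        # Stand-in evaluation: real engines use material, king safety,
        # etc. Here we just return a (possibly misleading) stored guess.
        return node["guess"]

    def minimax(node, depth, maximizing):
        if "value" in node:              # true terminal value
            return node["value"]
        if depth == 0:                   # budget exhausted: guess
            return heuristic(node)
        best = max if maximizing else min
        return best(minimax(c, depth - 1, not maximizing)
                    for c in node["children"])

    # A trap: looks great to the heuristic, loses on deeper search.
    trap = {"guess": +5, "children": [{"value": -10}]}
    safe = {"guess": +1, "children": [{"value": +1}]}
    root = {"children": [trap, safe]}

    print(minimax(root, 1, True))  # 5: shallow search falls for the trap
    print(minimax(root, 2, True))  # 1: deeper search avoids it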

reply

peterfirefly 1 day ago | link

The decision tree is enormous. One of the ways one can cope with that is to sample it, i.e. use randomness.

http://en.wikipedia.org/wiki/Monte_Carlo_tree_search

reply

grkvlt 22 hours ago | link

Aside - I'm very surprised that this comment got enough downvotes to gray it out. It's sad that someone asking an honest question (I had the same thought occur to me) about a seemingly strange result loses karma for it, when this sort of debate is exactly what should be encouraged. Fortunately there are a couple of replies that do explain why randomness and luck are factors in computer chess programming...

reply

johnorourke 1 day ago | link

Developers could learn from this. I've a few grey hairs, many caused by late nights of coding, and I now run a dev team. For example, looking at the 3rd video in the YouTube playlist from the article, and at the criteria the surgeons judge each other on, let's see how transferable they are:

- minimal movement: using clear, concise code, or few commands, to solve a problem

- lack of repeated actions: exactly that. Why did I just look at that same log file 5 times?

- confident use of tools: do I have a set of tools (IDEs, editors, commands) I know intimately?

- awkwardness of actions: can I think several steps ahead in the problem and bring things into line to form the solution?

And so on. This is a raw, unrefined thought and I hope it gets thoroughly pulled apart in any replies.

reply

codingdave 1 day ago | link

I think this would lose some practicality when put into practice. Clumsy, awkward code may still offer great business value. Brilliant, precise code, when misapplied to its purpose, can still fail. Of course everyone wants good code... but I do not think the direct correlation of quality to outcomes seen in surgery would exist in the coding world.

reply

c0rtex 1 day ago | link

Maybe it's sufficient to ask: what would be useful for developers to keep score of in order to improve professionally?

Surgeons keep score by measuring patient outcomes in order to draw conclusions about their own performance, the effect of which is to "shrink the outliers" - people see where they can improve and they go ask their colleagues, "hey, how'd you do that?". So what would the equivalent of that be for a developer?

The best tool I can think of for this isn't an automated system for scorekeeping - it's soliciting feedback [1]. You ask a more seasoned dev who is familiar with your work where you can improve.

How do you "shrink the outliers" on a team of developers? Get people to work together. Take each other's code apart.

[1] Particularly negative feedback, according to Elon Musk at the end of this vid: http://www.ted.com/talks/elon_musk_the_mind_behind_tesla_spa...

reply

ScottBurson 1 day ago | link

Just ask the developers whose code they'd rather have to maintain.

reply

lumberjack 1 day ago | link

This doesn't solve the information asymmetry (and I don't really think you can solve the information asymmetry short of getting an MD yourself). What it does is abstract it behind tables of scores, but at the end of the day the patient still needs to trust that the system is logical and that the compiled data is accurate, and then somehow bridge the gap between the generalized case of the compiled data and his own case -- which is probably the trickiest part for somebody without medical training.

Maybe you should just trust your GP on this one. Or if you think you cannot, maybe find a healthcare system that aligns the incentives of your GP with your well-being.

reply

xorcist 1 day ago | link

A big problem with scoring systems is that they incentivise risk taking. If you enter one of the stock picking competitions, for example, your rational choice is to take extreme risks.

You probably won't pick a winning stock, but if you do, you're likely to have a good chance at winning the competition. This is the opposite of what an investor wants to do with his own money, which is to manage the risk taking.

This is a common problem in designing scoring systems for measuring performance, even beyond the more obvious problems with natural variation.
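
The effect is easy to demonstrate with a toy simulation (return distributions invented): the prudent strategy has the higher expected return, yet almost never wins a winner-take-all contest against a field of gamblers:

    import random

    # Toy model: 100 entrants. "Safe" returns ~N(5%, 5%); "wild"
    # returns ~N(0%, 40%). Safe is better for your own money (higher
    # mean), but the contest goes to the single highest return.
    random.seed(1)

    def safe_entrant_wins():
        safe = random.gauss(0.05, 0.05)
        wild_field = [random.gauss(0.0, 0.40) for _ in range(99)]
        return safe > max(wild_field)

    trials = 20_000
    wins = sum(safe_entrant_wins() for _ in range(trials))
    print(wins / trials)  # nearly 0: the prudent strategy almost never wins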

reply

learnstats2 1 day ago | link

That's true of stock picking competitions/tournament-style gambling, yes, but it wouldn't apply here.

If I'm a surgeon with a good score and I make bad judgements that start coming off poorly, I won't have a good score any more. Am I motivated to take a bad risk? Not unless I have some other reason to.

reply

xorcist 1 day ago | link

It's not directly applicable because it is a different problem domain, but the general reasoning holds. The problem is that the optimum strategy for maximizing your score is not to be as good a surgeon as possible.

(Which I stupidly say without having the slightest idea how this works beyond what is described in the article. But I have yet to see a scoring system where the optimum strategy aligns perfectly with the desired outcome, and this holds for everything from grades in school to karma systems.)

reply

learnstats2 1 day ago | link

No: the general reasoning doesn't hold.

The reasoning for tournaments is that you have to be first at all cost, so you should consider taking the largest risk available to avoid coming second. That reasoning doesn't hold for surgeons: there's room for more than one surgeon.

There are problems with scoring systems, yes, but a complete lack of information for consumers is considered terrible in any other industry.

reply

bawana 1 day ago | link

Measuring the quality of surgeons is like measuring the quality of policemen or firemen or lawyers or politicians. It just isn't done.

And even if it could be done -- how are you going to enforce quality once you measure it? By suspending the bottom 20%? There is already a shortage of surgeons. And surgeons are paid poorly for their work and their investment in education. You cannot pay the better ones more -- it is not a free market; prices are fixed by government decree, and physician reimbursement is tied ONLY to work volume (Relative Value Units billed). Frankly, I would be scared to be a patient in the US now: cannon fodder in the war of the medical-industrial complex versus the third-party payers. Insurers would rather pay multimillion-dollar salaries to executives and hospital CEOs than fund a quality assurance program. Incentives are aligned to maximize profit, which is constantly and carefully measured. Shareholder return is the single most important metric of a publicly held company like Blue Cross, United Health Care, etc. Quality is a pass/fail grade based on outdated measures designed to be politically correct.

reply

k2enemy 1 day ago | link

Here's a study of hospital report cards that shows a decrease in patient welfare from the increased transparency. As others in this thread have noted, the report cards led to doctors not wanting to take on risky patients.

http://www.kellogg.northwestern.edu/faculty/satterthwaite/re...

reply

lostlogin 1 day ago | link

The article mentions this problem and addresses it towards the end. The key part of addressing it is confidentiality. Interestingly, I know of another piece of software, being developed with the aim of reducing inappropriate dosages during medical care, that works in a similar way and uses anonymous comparison as well.

reply

patcheudor 1 day ago | link

Like anything in life, the devil is in the details. What measures and metrics could possibly be used to determine the score? If you go by mortality rates alone, you risk creating an environment where no one wants to operate on the most at-risk patients. You'll quickly find that the best-scoring surgeons aren't necessarily the most surgically skilled, but instead those who do the best job of pre-screening surgical candidates. If this continues, soon you'll have patients whom doctors flat out refuse to cut open. If instead of mortality rates you score the doctors on skill of movement in their procedures, again you'll find doctors gaming the system by picking healthier patients with lower body fat percentages.

Fundamentally, if a system were in place to score surgeons, a lot of checks and balances would need to be enacted to avoid lowering the quality of care, by ensuring doctors, hospital staff, and administrators couldn't simply pick and choose which surgeon gets which patient. I really see this as a neat data science project after the fact, but if implemented it could have a significant downside impact on patient outcomes for some.

reply

Kliment 1 day ago | link

Well, the solution proposed in the article is to keep the scores hidden from everyone but the particular surgeon. That is, have the score be a motivator and guide for personal improvement rather than an external quality indicator. They report on trials where similar scores were published in NY and the end result was that the highest risk patients got shipped off to Ohio, exactly as you describe. So it's critical that the data is not shown to anyone but the surgeon in question.

reply

patcheudor 1 day ago | link

The problem is that by keeping it hidden from everyone but the surgeon, while still allowing the surgeon a say in who they operate on, they can still game the system. There are no external controls at that point. Now, if it's only for their own benefit, what possible reason would a surgeon have to game the score, you might ask?

You can provide all the assurances in the world that the score is a personal motivator, but we all know that could change at the drop of a hat. I've seen this far too often in the security field: someone will come up with a great measure, will reassure everyone that all unintended consequences have been considered, and then boom -- a year down the line an executive, unaware of the ramifications, wants to use the metric for performance evaluations. I think any surgeon who believes the score is only there to help them, and who doesn't plan for a future where someone pushes to make it more public, is being a bit naive in the ways of the world.

This isn't to say I don't think there could be a solution. The scoring system needs to be built from the ground up with an offsetting risk equation which provides significant incentive to operate on at-risk patients. Maybe a surgeon gets three points for operating on the at-risk patient, and if the patient dies they only lose one. However, if they have a healthy patient, maybe they start off with one point, with just one to lose. Obviously all the unintended consequences of this model would need to be explored at length.
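
Under one reading of that scheme (+3/-1 for an at-risk patient, +1/-1 for a healthy one; the details above are left open, so this is just my interpretation), the incentive math works out like this:

    # Expected points under one interpretation of the proposed scheme:
    # at-risk case: +3 if the patient survives, -1 if not;
    # healthy case: +1 if the patient survives, -1 if not.
    # Survival probabilities below are invented for illustration.

    def expected_points(p_survive, win, lose=-1):
        return p_survive * win + (1 - p_survive) * lose

    risky   = expected_points(0.80, win=3)   # 0.80*3 - 0.20 = 2.2
    healthy = expected_points(0.99, win=1)   # 0.99*1 - 0.01 = 0.98

    print(risky, healthy)  # 2.2 vs 0.98: the hard case now pays more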

reply

jerf 1 day ago | link

"There are no external controls at that point."

Well... there appear to be no external controls at this point, either...

reply

baddox 1 day ago | link

Right. Even in the current situation, surely there is some incentive for surgeons to turn down high-risk patients, because of the inevitable stress and potential legal trouble from a failed operation.

reply

wesleyy 1 day ago | link

If the score exists, the top-end surgeons are going to show it to their patients, which will force the low-end surgeons to do so as well. The only way to keep the score secret is to design regulations for it, which I just don't see happening.

reply

k__ 1 day ago | link

I agree, but it could be multi-dimensional:

How many survived, and how critical were they.

So the score would reflect whether the surgeon avoided critical patients.

reply

visarga 1 day ago | link

Surgeons would optimize for the test criteria instead of focusing on the operation.

reply

tallTrees 1 day ago | link

This is worth reading, to develop your software engineering skills.

http://www.amazon.com/Introduction-Personal-Software-Process...

reply

ankit84 1 day ago | link

After watching the videos, my answer is YES.

If a bad programmer is 10x slower, a bad surgeon makes you 10x more likely to die, have complications, undergo reoperation, and be readmitted after hospital discharge.

reply

TazeTSchnitzel 1 day ago | link

Honestly, rating surgeons is probably much more difficult than rating programmers.

reply

gd1 1 day ago | link

Not so sure about that; these surgeons are performing the same procedure again and again, and being compared to surgeons who also perform the same procedure.

We don't tend to write the exact same code repeatedly. Or the exact same code as other coders.

reply

robert_tweed 1 day ago | link

I suppose it is just like writing exactly the same app over and over again. But in a different Lisp dialect every time. And several of the system libraries are missing, but you never know which ones until you try running the code.

reply

baldfat 21 hours ago | link

“You can think of surgery as not really that different than golf.” ... The difference is that golfers keep score.

No. Golfers don't think they are God.

reply

known 1 day ago | link

http://blogs.law.harvard.edu/abinazir/2005/05/23/why-you-sho...

reply

fiatjaf 1 day ago | link

Here's an interesting story about medical open data and dramatically improving medical procedures: http://www.newyorker.com/magazine/2004/12/06/the-bell-curve

reply

Zigurd 1 day ago | link

Measuring surgeon performance directly is both very difficult and unlikely to do what you want: increase your chances of living through a surgery.

BUT, surgeons do not perform in a vacuum. There are a number of things you can measure and get usable information.

You can measure the number of surgeries of the kind you are getting performed by your surgeon vs. by practices specializing in such procedures. You will almost always find that the more a surgeon does a procedure, the better the outcomes.

You can measure re-admissions after surgery. You can measure infection rates. Etc. These will tell you the quality of the hospital where the surgery will be performed.

reply

bayesianhorse 1 day ago | link

Surgeons, at least in first-world countries (and not only there), already operate at a skill level which is hard to measure at all.

In procedures where there is a high probability of success, many surgeons would need to collect data points for years even to reliably tell whether they are better or worse than a particular other doctor.

Very challenging, statistically...
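
A back-of-the-envelope power calculation supports this (standard two-proportion sample-size formula; the success rates are invented): telling a 95% surgeon from a 90% one takes hundreds of cases each:

    from math import sqrt, ceil

    # Standard sample-size formula for comparing two proportions
    # (normal approximation, alpha = 0.05 two-sided, 80% power).
    # Rates are invented: surgeon A succeeds 95% of the time, B 90%.
    z_a, z_b = 1.96, 0.84
    p1, p2 = 0.95, 0.90
    pbar = (p1 + p2) / 2

    n = ((z_a * sqrt(2 * pbar * (1 - pbar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)

    print(ceil(n))  # ~434 cases *per surgeon* -- decades at 20 cases a year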

reply

kens 1 day ago | link

The whole point of the article is that surgeon ability can be reliably determined from looking at short videos, and this ability is correlated with how well the patient does.

For another interesting article on this, see http://well.blogs.nytimes.com/2013/10/31/a-vital-measure-you...

reply

apetresc 1 day ago | link

And not only that, by ten-year-old daughters, too!

reply

bronbron 1 day ago | link

Hm, this article reads like kind of a puff piece for Amplio.

>  similar efforts to “grade” American schoolteachers, for instance, have perhaps generated more controversy than results.

Yes, for good reasons, namely...

> It’s all about trust.

No, it's not. At all. The author even notes the problem with scoring systems that happened in this exact field: when you start scoring people, they start gaming the system to increase that score. It's the same problem as with "grading teachers". You give surgeons huge incentives to start "fudging the truth" about their patients' surgical risk.

"Oh blah blah blah it's private". Great. Hopefully everyone involved can see the obvious future problems (which 7 comments in, other HN posters have zeroed in on), but they haven't given any assurances that these fears will never come to fruition. Or any prevention plans.

> It’s like Vickers said to me one night in early November, as we were discussing Amplio, “Having been in health research for twenty years, there’s always that great quote of Martin Luther King: The arc of history is long, but it bends towards justice.”

I actually laughed when I read this. How pretentious.

reply



