Facebook Stored Hundreds of Millions of User Passwords in Plain Text for Years (krebsonsecurity.com)
1041 points by snaky 9 hours ago | 382 comments





(Edit: reworded a bit to make it clear I don't think this is acceptable)

Sounds a lot like some service was logging the full body of a signup/login request, which then was readable for anyone with access to the logging/tracing infrastructure.

Dumb mistake but it's not hard to imagine this happening, considering that FB probably has a bunch of services involved in the login/signup flow to prevent bots/spam, abuse, etc.

Not to imply this is acceptable, especially at an IT company like FB with vast resources and know-how. Raw passwords are an especially big screw-up. There are a lot of failures here, from actually logging something so sensitive, to giving access to so many employees, to not noticing this for years. (Assuming this was actually log data.)

BUT if we are honest, anonymizing log data is rarely a priority. Even if it is, sensitive data can leak easily at a lot of different points in the infrastructure: in actual application code, in client and server exception tracing (just imagine a deserialization exception that contains part of the input), in the web server, load balancer, proxies, service mesh... There is a lot of interesting stuff hiding in the logs at pretty much every company.

This is a good time to look in the mirror and audit your logging and tracing data. Unless you are in a highly regulated field like finance/healthcare or there is a strong company-wide culture for security/privacy with regular audits already, I can almost guarantee you will find at least one data point that should not be where it is.

Protecting sensitive data needs to be a big consideration for every dev, ops and especially management, which has to allocate enough time for security reviews and audits.


At the scale of Facebook, with literally billions of users, this is pretty much inexcusable.

This community usually scoffs at entities that store passwords in plain text. Why give Facebook a pass?


Because there's two types of storing passwords in plain text. There's the "your password is stored in plaintext in the database" way which everyone agrees is 10 different kinds of stupid, and then there's the "we accidentally logged the body of all requests that went through this system, and it turns out login requests came through here" kind. One is a bad security decision (because you must decide how to store passwords in the DB), and the other is a very easy mistake to make which can go unnoticed for a long time (because you can be attempting to log things completely unrelated to logins).

Same same but different.


I've worked in healthcare / biotech for more than a decade and I can promise you that the FDA would see no difference between the two types of gaffes. As custodians of sensitive information it was our responsibility to ensure that said information didn't leak, period.

I don't know anything about FB's infrastructure, but when I was a lead I would have viewed leaks in logs as far worse than something in the DB, because our DBs were harder to gain access to.

I get what you're saying, but it's irrelevant. Easy to screw up, hard to screw up, doesn't matter; just don't screw up because the result is the same. This stuff is security 101. If you're logging requests then you need to ensure they don't contain sensitive info.


The problem is, you can't ensure that. You can do best effort filtering/scanning only. Let's say you have an API and someone writes a client with a memory corruption bug which results in password and some other field being sent swapped 1 in 10k times. Now you have password logged. You maybe can't even tell exactly where and it's close to impossible to detect automatically. "just don't screw up" is not possible in this case and the chance you'll learn about this soon is really small.

Or if you think a client bug is a stretch: If you're running a large enough website which logs usernames on failed logins, I bet you have at least one password or a concatenated usernamepassword in your logs. Just because someone wasn't paying attention to the text box focus.


Large orgs could have a central knowledge base of all places someone could type a password (as required by code reviews, and perhaps by automated detection of password-type inputs), and a bot which attempts to fill correct and incorrect passwords for a number of canary users, and scans across all logs for plaintext canary passwords. It's shocking that someone like Facebook, which has so many partner-facing and user-facing interfaces, wouldn't have developed something like this.
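
To make the bot phase concrete, here is a minimal Python sketch under stated assumptions: the endpoints, the canary account values, and the use of the requests library are all illustrative, and a real deployment would enumerate every interface in the knowledge base mentioned above.

  # Hypothetical canary bot: submit correct and incorrect canary credentials
  # to every known login path, so a separate scanner can hunt the canary
  # password across all log stores. Endpoints and account values are made up.
  import requests

  CANARY_USER = "canary-7f3a@example.com"       # dedicated canary account
  CANARY_PASSWORD = "canary-pw-1d4c9e2b8f"      # unique, greppable value
  LOGIN_ENDPOINTS = [                           # hypothetical; a real run would
      "https://www.example.com/login",          # cover every password-type input
      "https://m.example.com/login",
      "https://api.example.com/v1/auth",
  ]

  def exercise_login_paths():
      for url in LOGIN_ENDPOINTS:
          for password in (CANARY_PASSWORD, CANARY_PASSWORD + "-wrong"):
              try:
                  requests.post(url, data={"email": CANARY_USER, "password": password},
                                timeout=10)
              except requests.RequestException:
                  pass  # failures are fine; we only need the traffic to flow

  if __name__ == "__main__":
      exercise_login_paths()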

Free startup idea: build a SaaS for the bot phase of this, and make the log-scanner simple and open source so enterprise security teams can deploy it freely. Offer a certification process and consulting. Hire lobbyists (and folks with inroads into insurance companies) to paint plaintext passwords as the devil and make yourself a de facto legal requirement. And make sure your log-scanner doesn't do any logging of its own :)


You can never ensure anything completely in engineering. Perfection is never the goal, but you can do a much better job than FB has here.

You're talking in hypotheticals. Nowhere did I say "nothing ever goes wrong if you know what you're doing". What about the issue at hand? What's your opinion on passwords being logged on every request for years?


> This stuff is security 101. If you're logging requests then you need to ensure they don't contain sensitive info.

If what you meant by that is "best effort" then sure. The issue at hand is only obvious now that we're reading about it. It could easily be missed depending on how their infrastructure works.


>The issue at hand is only obvious now that we're reading about it

No, it's not. It's completely obvious to anyone who has ever worked on a system which deals with sensitive information. It can only "easily be missed" if you're not auditing your code and logs. This stuff is basic to anyone who works in industries where the data is a liability.


I don't think anyone's arguing that it's ok, just that it's less "easy to screw up" vs "hard to screw up" and more "did something really stupid on purpose" and "did something really stupid by accident."

But yes, either way the end result is the same and you shouldn't do it.


Yes but how many people looked at these logs, saw the passwords, and said nothing? If it was logged presumably the logs were viewed from time to time. The mistake is easy to make, but it's also easy to correct if there is a culture to report such mistakes.

The article indicates it was searchable and used for something by employees, so most of these comments to the effect of ‘they must have been doing this accidentally because Facebook is just so big therefore they are excused’ are invalid. Some group of people did this on purpose and knew it was happening.

You're probably being downvoted for speculation, but that sounds completely reasonable to me. At least one person had to have noticed a password in a log file they were viewing at some time. Most people viewing log files know that passwords should not be there. It would trigger alarm bells, and depending on the person, also excitement -- you have somebody's facebook password.

It's conceivable that someone then told their friend at work, and a few of these 2,000 developers knew of this secret internal stash of passwords they could access whenever they wanted to "prank" someone on facebook...


Why would humans be reading these log files?

For plenty of reasons, I'm sure. From the article:

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.


The logs being in a searchable index makes it more likely that the password storage was inadvertent, not less. It implies that the primary usage model for the logs was targeted queries, not people starting at the top of the logs and reading down in such a way that nothing could have been missed.

That's true. Krebs' article says:

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.


Those numbers are pretty meaningless without some concept of what queries they were making. When I was working with large logs I'd regularly pull huge amounts of logs to only look at maybe a handful of lines, and even then I was often only scanning for the things I was specifically working on. Not saying this is what happened, just that those 2,000 employees almost definitely were not doing careful analysis on the complete results of their 9 million queries (and shouldn't have been!).

'Only scanning for the things one is working on' to the point of missing huge security holes or safety risks is exactly what's wrong with the industrial division-of-labor approach. Workers start to assume that even if they do notice a problem, someone else is either going to fix it or has examined and OK'd the current procedure. Workers who speak up and assert that something they noticed actually looks more important than whatever task they were originally assigned are typically punished rather than rewarded, and fired if they make too much of a fuss, because otherwise managerial authority might be diminished. Most firms run internally like a dictatorship, and given Facebook's absurd ownership structure (where there are voting and non-voting shares and Zuckerberg retains a majority of the voting shares) it's a recipe for dysfunction.

This is true but not very interesting. Like literally every large tech company in the entire industry, Facebook has a specialized team of people that work on security --- it has one of the better versions of that team, for what it's worth. Just like ordinary Apple engineers, ordinary Google engineers, ordinary LinkedIn engineers, and ordinary Airbnb engineers, ordinary Facebook engineers don't have an end-to-end picture of how all of security at Facebook works.

There's no evidence that there was whistleblowing suppression about this issue anywhere at Facebook, is there?


Out of 2,000 people, not one ever had a look at a username/password pair? Impressive. And developers see this data during development, too. Seriously, this is not justifiable; it's pure laziness, if we want to put it charitably.

What if one among the 2,000 was curious?

I understand that, but I believe they do not have the correct mentality here. I'm not certain how many of the people saying that have worked on systems as sensitive as FB. They think it's easier to screw this up because, for them, it is, but they don't operate at this level.

This sort of thing should not be any easier to screw up when you operate at the scale of FB. If it happens then you have the wrong procedures in place. In my personal example our logs and logging practices/code were audited in the same way our DB layer was. There was no difference; if a system touches sensitive data, it is a massive vulnerability.


This. If you have two or three engineers at a startup it's a very easy mistake to make. With thousands of engineers it's an impossible mistake to make, so it could have only been done purposely.

"could only have been done purposely". In this case there's also a chance it was negligence. Who gives a shit. Move fast. A negligent culture can probably do more damage than the odd individual doing something deliberately.

And even with a startup of two or three engineers, it's pretty much unforgivable.

It depends on how many users you have and what sort of data you're protecting. If you have a billion users then sending password reset tokens to your analytics provider is a very serious vulnerability. If you have 100 users and you're running a generic message board then it's not really a vulnerability because the difficulty of that method of attacking the assets under protection almost certainly exceeds their value.

> that method of attacking the assets under protection almost certainly exceeds their value

This was most likely the reasoning Sony had used (prior to 2012) when deciding how to safeguard the account info of users who had registered for marketing promotions.

It turned out that user credentials are unlike, say, office furniture, in that the cost of a data breach can vastly exceed the "value" of the protected "asset".


That's the best practice in the security world though. There's no such thing as an unhackable system, only systems where A) the cost of attacking them is more than the value of the assets under protection B) where the time it takes to attack the system is long enough for the attack to be detected.

C.f. Bruce Schneier's book: https://www.schneier.com/books/beyond_fear/


That book was from 2003.

This is Schneier in 2016: https://www.schneier.com/essays/archives/2016/03/data_is_a_t...

Data Is a Toxic Asset, So Why Not Throw It Out?

"All this makes data a toxic asset, and it continues to be toxic as long as it sits in a company's computers and networks. The data is vulnerable, and the company is vulnerable. It's vulnerable to hackers and governments. It's vulnerable to employee error. And when there's a toxic data spill, millions of people can be affected. The 2015 Anthem Health data breach affected 80 million people. The 2013 Target Corp. breach affected 110 million.

If data is toxic, why do organizations save it?

..."


This is exactly how Zuck was able to steal login information in the stories about founding Facebook.

I find it hard to believe this is accidental.


This is borderline illegal, if not simply illegal. What if he got into PayPal accounts or bank accounts? I don't want a world where, if I enter the wrong password, companies go poring over my other accounts trying to breach them. Let alone if I reuse a password by accident.

Do you have a reference? I was wondering the same but have no reference.


Yep, I remember this story. So clever and yet so dirty.

Wow! That does not look accidental at all!

I've worked fairly extensively in health care technology since ~2005, and I am not aware of the particular regulation that would cause the "FDA" to react to a password storage security gap. Could you be more specific about the regulations you're referring to?

Unsafe logging of PHI would be explicitly problematic, but passwords aren't PHI.


I never said the FDA was specifically interested in passwords. The FDA was brought up in reference to my work with PHI. The FDA does care about how you secure access to sensitive data however. If you leak passwords which allow unauthorized access to PHI, you're going to have a problem.

The analogy was intended to focus on practices which should be employed by companies which handle sensitive data, not the specifics of what the FDA is looking for.


These aren't "leaked" passwords; they're logged in plaintext internally, which is not great, but is not the same thing as having leaked them.

"leaked" in the sense that they're somewhere they shouldn't be. I could have chosen a better word. if I were to store database passwords on a file share the entire company had access to we would consider that a "leak" even though they weren't exposed externally.

HIPAA/HITECH would not necessarily; the analysis would have to include compensating controls and who had access. The fact pattern in this Facebook case would probably not support a violation. Again: crappy stored passwords aren't PHI.


> I don't know anything about FB's infrastructure

This explains a lot of the comments here. Facebook’s scale is not like anything most engineers have worked on. Facebook probably has logs in the 100s of terabytes. Ensuring that sensitive data isn’t logged takes more than some occasional greps.


My point is that it's irrelevant. Have more data? You need more auditing. Any system which touches sensitive data is subject to security review. Yes, FB systems are massive. They should have massive oversight as well. You may as well be defending a nuclear meltdown because designing nuclear power plants is hard.

>Ensuring that sensitive data isn’t logged takes more than some occasional greps.

Right; it requires investment in process, requirements, testing, and oversight. Most importantly, it requires a company-wide, top-down mentality that your customers' privacy and protection is more important than your margins.

If you can't (won't) dedicate the resources required to ensure your customers' data is protected, then you have no business operating at such a scale.


Everything is a tradeoff. How much cost will FB incur due to this incident? I would guess not much, at least not enough to warrant the massive resources that would have been needed.

Well, that's the problem really; they obviously don't care.

According to the timeline of events in The Fine Article, this story only exists because someone cared. At many companies the story would end at "one diff reviewer noticed passwords getting logged in one diff". All of the numbers in this story come from an internal investigation to see where else they're making the same mistake, _so they stop doing that_. That's not what you do if you don't care.

If FB truly cared it would likely never have happened, let alone gone on for years. I'm not saying no employee at FB cares; I'm saying FB as an organization doesn't, and we have plenty of "I'm sorry, we'll do better" statements to back that up.

Exactly. Can every piece of data logged be tied back to a legitimate business purpose? What’s needed here is a mentality change: These logs should be thought of as liabilities rather than assets. You should log only what you need, while you need it, and then turn off the log when you’re done. If your mentality is “log everything, always, because maybe we’ll need it later” then these privacy and security trash fires should be expected.

Log everything: you don't care about privacy.

Treat logs as liabilities: why can't you solve the issue I experienced yesterday?

You can't win. Either you log more than you think you need right now or you can't do engineering investigations on past data. You're going to end up somewhere in the middle realistically.


This feels like a slightly differently framed version of the "Too big to fail" argument and does pretty much nothing for me.

Collecting data on such massive scales is literally FB's whole business.

But with that also comes a responsibility that shouldn't simply be waved away with "But they are so big, it's so difficult!"

Because when it's about monetizing their massive amounts of (often illegally collected) data, FB seems to have no issue having everything in order and getting stuff to work, regardless of how "difficult" it might be.

Probably has to do with the fact that there's no money in protecting users' data properly, and FB seems to be pretty much immune from negative PR having any bad consequences.


>takes more than some occasional greps

Of course at FB scale you'd automate this by creating a set of canary accounts with unique passwords that you perform a search for in the ETL pipeline, or some other handy place. This will at least catch inadvertent plaintext password logging.
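
As a rough illustration of that canary search, here is a minimal Python sketch; the file paths, the canary value, and the choice of encodings are assumptions, and a production version would run inside the ETL pipeline rather than over flat files.

  # Scan log files for a canary account's password, including hex- and
  # base64-encoded variants. Paths and the canary value are illustrative.
  import base64
  import glob

  CANARY_PASSWORD = "canary-pw-1d4c9e2b8f"

  def needle_variants(secret):
      raw = secret.encode()
      return [
          secret,                                  # plaintext
          raw.hex(),                               # hex-encoded
          base64.b64encode(raw).decode(),          # base64
          base64.urlsafe_b64encode(raw).decode(),  # URL-safe base64
      ]

  def scan_logs(pattern="/var/log/app/*.log"):
      needles = needle_variants(CANARY_PASSWORD)
      hits = []
      for path in glob.glob(pattern):
          with open(path, errors="replace") as fh:
              for lineno, line in enumerate(fh, 1):
                  if any(n in line for n in needles):
                      hits.append((path, lineno))
      return hits

  if __name__ == "__main__":
      for path, lineno in scan_logs():
          print(f"canary password found: {path}:{lineno}")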


This is how professional software engineering and infrastructure implementation is done, and I appreciate you pointing it out so forcefully.

Corollary: most web development simply lacks the rigor of engineering.

A lot of HNers probably have the job title "software engineer" without any of the qualifications of an "engineer"... that's probably why you're being down-voted.

But you're not wrong.


Most software engineers aren’t. Engineering requires licensing and exams through a governing body. Can’t have the prestige of the title without the responsibility. Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

Pile on the regulation (GDPR for starts, more PII protection to follow up, extremely painful fines for failures).


In the US, you only need certifications and testing for a "professional engineer". Most electrical, chemical, mechanical, or civil engineers are not certified PEs. Yet I don't think anyone is claiming they aren't engineers.

I have a PhD in chemical engineering, but not a PE, since I didn't go the industry route. I wouldn't feel comfortable calling myself an Engineer without it, and that was the attitude expressed by my professors and cohort coming out of undergrad.

It's my opinion that the title "Engineer" should be on par with things like "Doctor" or "Lawyer". It should require certification and there should be consequences for claiming to be one if you aren't.

I don't think anyone would get in trouble for job titles like "computer doctor" or "network protocol lawyer". Practicing medicine or practicing law (or in some cases, practicing as an engineer) without the required qualifications, regardless of job title ... different story.


> Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

This doesn't come from software developers.

This comes from the SV culture of "move fast and break things."


I painted with too broad a brush. I agree, and admit my mistake.

Just replace "software developer" with "self-proclaimed software engineer" in your comment and you'd be spot-on.

Nothing about engineering really covers security best principles and practices.

I went through classical engineering school, and I'd summarize with three categories:

* Newtonian Physics & The Mathematics Behind It (i.e. Differential Equations)

* Ethics

* Technical Communications


That's because there is no software engineering program, and other engineering programs don't deal with "security" per se.

My alma mater offers a degree in Software Engineering.

> Engineering requires licensing and exams through a governing body.

Offering engineering services to the public requires licensure. Working at places like Boeing, General Motors, Texas Instruments, and Caterpillar does not (for the vast majority of positions); it just requires an accredited engineering degree.

Note that this is independent of the software industry needing more engineering rigor, which it desperately does.


> Most software engineers aren’t.

"Software Engineer" is just a job title.

> Engineering requires licensing and exams through a governing body.

"Software Engineering" doesn't, it's just a job title.

> Can’t have the prestige of the title without the responsibility.

There is no real-world prestige associated with having "Software Engineer" as a job title (as opposed to "Software Developer").

> Software developers want to have their cake and eat it too (power and respect with no oversight and governance).

"Power and respect" is not actually something that is commonly associated with a guy sitting at a desk for a salary.

> Pile on the regulation (GDPR for starts, more PII protection to follow up, extremely painful fines for failures).

You just learned that Facebook stored passwords in plaintext for years and nothing bad happened. In fact, it's hard to conceive that anything particularly disastrous could have happened. Yet, you call for more regulation that will drive up costs for everything. Whole industries could become unprofitable and those high-paid "Software Engineers" would become redundant.

I have the opposite opinion: The fact that computer security is generally poor teaches people how to properly use computers. They need to assume that there are no secrets, because that is the plain reality. No amount of regulation will make computers secure. No certification or best-practice encryption scheme will prevent users from writing their password on post-it notes, or from opening "sexy.jpg.exe", or from answering that call from the "Microsoft Service Center".


In Ontario the Professional Engineers (http://www.peo.on.ca/) crack down on unlicensed use of the "Engineer" word in job titles pretty aggressively.

The point is that the public should have confidence in engineers. Software developers who tout themselves as "software engineers" actively piggy-back upon, and undermine this confidence. It's malicious, and you should feel bad for defending the practice.

If you still don't see why, "Lawyer" and "Doctor" are "just job titles" but obviously that's a problem. I trust you can see that.


> In Ontario the Professional Engineers (http://www.peo.on.ca/) crack down on unlicensed use of the "Engineer" word in job titles pretty aggressively.

Yeah well, good for them I guess. I don't think anybody else cares.

> The point is that the public should have confidence in engineers.

Why? "Engineer" is a very broad term used in a variety of occupations for very different things. If you're looking at someone's credentials and you're so uninformed that you can't tell the difference between a licensed engineer in a particular profession and a guy that has "Engineer" written on their business card, you're not qualified to make a decision either way.

> If you still don't see why, "Lawyer" and "Doctor" are "just job titles" but obviously that's a problem. I trust you can see that.

I can see the argument for why a medical doctor or a lawyer should be afforded some amount of protection, because they are directly dealing with laymen. I don't buy the same argument for the word "Engineer", much less "Software Engineer". It just doesn't really mean anything. We already have degrees and certifications for when specifics matter.


> "Software Engineer" is just a job title.

And like the job title "sandwich engineer", it should probably be derided.

Apologies to anybody working in the honorable trade of sandwich crafting. I love sandwiches and appreciate your work. They bring me much more enjoyment than software.


I agree. The job title is literally _engine_er. Unless you oversee the operation of engines (as in locomotives), you are not an engineer.

But seriously, the definition of engineer in regular lexicon is someone who designs, builds, or maintains complex systems. I struggle to think how software does not tick all of those boxes. Sandwiches are a different story.

Perhaps you are confusing engineer with professional engineer? I can see how it might be easy to mix them up if you are not paying attention, but professional engineer carries a different meaning. Most software engineers are not professional engineers but are unquestionably engineers.


The word engine signifies a product of ingenuity. A siege engine is a clever device to attack fortifications. Engineers don't just have to be engaged with engines in the modern sense to do engineering. The PE bureaucracy is just gate-keeping by the guild masters.

It becomes a meaningless word if anyone can just call themselves one. There has to be some standard, some governing body, some degree of rigour applied here.

> It becomes a meaningless word if anyone can just call themselves one.

Only to the extent that all words are meaningless. However, we keep a reference known as a dictionary that helps maintain some consistency around meanings of words.

The Oxford Dictionary defines engineer as:

1. A person who designs, builds, or maintains engines, machines, or structures.

2. A person who controls an engine, especially on an aircraft or ship.

3. A skilful contriver or originator of something.

Is software an engine, machine, or structure? That is debatable, although I would suggest that it does meet the definition of machine. Under that suggestion, software engineer is a perfectly appropriate term. Designing, building, and maintaining is exactly what software engineers do.

> There has to be some standard, some governing body, some degree of rigour applied here.

But this is certainly not why software engineers are not engineers. Perhaps you are thinking of professional engineer, which has its own definition that more closely resembles what you are trying to get at here? Software engineers may not be professional engineers (although they can be).


A 'software engineer' need not even have a college degree, the only qualification for calling yourself a software engineer is finding an employer willing to give you that title. Even non-PE mechanical engineers at least have the bare minimum decency to get an undergraduate engineering degree before calling themselves engineers.

Of course, some people get software engineering degrees, or computer science degrees (that term is another can of worms entirely), but that is not the rule.


Professional engineers do engineering.

Title only engineers work on complex systems.

This distinction has largely been lost and most "Software Engineers" think they are doing PE Engineering when they are not.


Do you think that if someone calls themselves a sandwich engineer people will begin to believe that designing cars is like makings subs? Do you really think the public can't tell the difference?

Same with electrical, chemical, mechanical, and civil engineers since almost none of them are PEs?

Seems a bit silly.


I couldn't agree more.

What I'm hearing is that, because they collected it and stored the passwords as part of their reckless attempt to log every action that every human makes online in order to manipulate them, it's more excusable than if it was in a specific and protected user credential database?

It's only an easy mistake when you're vacuuming up everything people do.


Data anonymization and de-identification is nothing new. Especially when it comes to logging. I don't get why a lot of you are downplaying it in this thread. This is just as bad as plain text passwords.

The difference is intent and/or gross negligence. Storing passwords in plain text in a database requires a gross lack of security mentality in both the design and implementation stages. It is also drilled into people's brains constantly how bad of an idea it is. To put it simply, it cannot happen accidentally.

Logging like this can easily be attributed to an accident. The person who implemented this logging should get hit with some repercussions, because they surely tested the logging and must have seen the passwords when glancing over the output. But other than that, this was clearly a minor oversight.


Yes. Consider this not-unlikely scenario: The people who implemented the logging did it as a feature of a generic API proxy (there’s no way Facebook implements logging separately for each of their bazillion services!), and no doubt put in a provison for masking sensitive data. They tested it and it worked fine.

Then some devs miles and years away didn't use that feature properly and accidentally ended up logging passwords from incoming requests. They may not even have been looking at those request logs, because that's not the request they were testing.

Then that feature went into production and this oversight was magnified millions of times.

At large scale you don't just tail the production log firehose and look for stuff. You have to search for specifics to find anything at all. So if nobody was debugging this thing in production, it's quite plausible nobody saw the passwords in the log.

One way to catch this sort of thing is sentinel data — in this case, have a unique value for a test account’s password and test every service with it, then search everywhere you can think of for that value.


So here is the thing: It was presumably relatively easy for you to come up with that scenario, which you called "not-unlikely". Then what you do is you put that scenario into your risk analysis when you're designing the authentication architecture, and figure out mitigations to make sure that particular mistake becomes (very) unlikely.

The notion that "it could easily happen" that is being brought up throughout this thread really only suggests that people aren't doing even rudimentary security assessments (or, hopefully, that they're not working with security-sensitive software).

If you can't solve it technically, you solve it through processes and training. The same goes for any other industry: if a construction worker said they were just one bad morning away from dropping a two-tonne girder on a playground, we would never accept that. Or a pilot crashing an airliner into the waiting hall when they're supposed to land. Somehow it seems that large parts of the software industry simply haven't reached the level of maturity we expect from pretty much all other industries.

Facebook is an enormous company. They should be able to have entire departments working on these topics. It's not a one-person hobby project we're talking about.


I'm sure they did figure out mitigations. They failed. Things fail. Two airliners just failed rather spectacularly, and that's the very industry you're benchmarking against.

>Somehow it seems that large parts of the software industry simply hasn't reached the level of maturity we expect from pretty much all other industries.

True, but that's a rather broad brush — in terms of actual risk of damages there is nowhere near an equivalence between "airliner crashing into waiting hall" and "logging some plaintext passwords".

Of course the culture, priorities, and domain are also very different between social network engineering and airliner engineering, which is by the way one reason Facebook could grow from nothing to mind-bogglingly gigantic in a decade, while it takes a decade to get just one new airliner into production.


The point I was making by comparing to a pilot, which I realise I could have expressed a lot more clearly, is that it is perfectly possible to mitigate risks through proper training and procedures even if it's not possible technically. (I.e. all it takes for a plane to crash is to turn the flight controls a few centimetres in the wrong way at the wrong time, yet it almost never happens.)

Of course things fail and people screw up. What I don't agree with are arguments along the lines of this just being a slight oversight, and that those can easily happen. It should require serious failure on multiple levels for anything like this to happen at that scale, if they are implementing things properly, not minor oversight.


Exactly — my scenario was an example of how failures at multiple levels could have caused this to happen. My "not-unlikely" is meant retroactively — now that it's happened, what's a not-unlikely explanation for how it was allowed to happen in a company the size of Facebook?

I didn't intend to imply it was a "slight oversight" — it's clearly a significant oversight — but there are people saying it's obviously gross negligence because how could this ever happen in a company that wasn't completely incompetent, etc. No, terrible accidents can and do happen even in companies that are trying hard to do a good job. Just like when a 737 crashes, you shouldn't assume Boeing is totally incompetent, but rather that several things must have gone wrong at once.


> The notion that "it could easily happen" that is being brought up throughout this thread should really only suggests that people aren't doing even rudimentary security assessments

Precisely.


How is this in any way better or more excusable? Calling it a 'minor oversight' could also apply to the DB being in plaintext.

Storing passwords in a plaintext DB happens in the same manner: a dev is lazy or ignorant of the security repercussions. Which is what happened here; barring evidence this was done maliciously, we can assume it was accidental. But that doesn't make it any more excusable, since it should be clear that logs can also contain sensitive data that needs to be protected/anonymized.

Considering how selective FB is for hiring, I would hope we could expect a higher standard.


Not to mention 200-600 million users had their passwords exposed for many years. That must be a massive trove of log files.

I see this differently. If someone stores passwords in plaintext in the database, they’re idiots that don’t know any better.

If someone logs your password in plaintext but has it encrypted in the database that’s grossly negligent.


You assume that storing passwords in plaintext implies intent as opposed to gross negligence. In many past instances it's been both intentional and grossly negligent, because the people making all the design and implementation choices were not knowledgeable about best practices.

Because we all know we are one bad morning away from doing it ourselves. People are distinguishing between extreme incompetence (storing plaintext passwords in databases) vs people trying to do their jobs.

It's the same reason why, when there are large outages, the comments are split between enraged customers and the ones saying "Man, sucks to be them", as they know they could easily have been the one that got the config push wrong.


The fact that sensitive data like this is just one config push away from being exposed means it's a brittle design (not just the code, but the entire design /review / deployment process - this is a platform with 1 billion users, not your 1-dev WordPress webpage). That said, this is further aggravated by the simple fact that for whatever reason, 20k of their employees had access to these logs.

All in all it just paints a picture of an irresponsible company where their house is not in order.


Exactly. It's one thing not to have enough controls in place to catch something like this on the forum for your warcraft guild. It's another thing entirely when you operate at the scale of Facebook.

"people are distinguishing between extreme vs people trying to do their jobs"

Those things are in no way exclusive. Storing passwords in plaintext is awful practise. Including the contents of form submissions in logging is a similarly awful practise. I really don't see what the difference is.


I like logging, and at our company we blanket-log all GET parameters on every request. Similarly, I think it's a Good Idea™ to never log POST parameters, but there are some pages where we've started logging a few whitelisted ones because they can help diagnose errors we run across. In an environment where all data comes into the server as JSON blobs attached to request bodies, though, it can be a bit harder to make any sort of blanket whitelist.

Earlier in the thread:

>Because there's two types of storing passwords in plain text. There's the "your password is stored in plaintext in the database" way which everyone agrees is 10 different kinds of stupid, and then there's the "we accidentally logged the body of all requests that went through this system, and it turns out login requests came through here" kind.

Basically, the conscious decision to store passwords in plain text is worse than unintentionally doing so in logs. One is purposeful, one is not. Yes it's bad, but it's not as bad as if it was done on purpose. And generally, this is how most laws are enforced or implemented.


Can't both be attributed to ignorance? You could store passwords in plaintext in a database because you didn't know better. You can store plaintext passwords in logs because you don't know better. If I store passwords in plaintext am I being purposeful or am I being ignorant of best practises?

I think people are drawing a distinction where one doesn't really exist.


>You can store plaintext passwords in logs because you don't know better.

This whole conversation is about how it likely was not done on purpose, and that it was the result of logging HTTP requests and responses.

If they wrote logger.info(password), yeah, I'd say that's just as outrageous.


I'm working with Rails and Phoenix on two different projects right now, and both of them filter out passwords in the log files [1] [2].

I'm also working on a Django project but we're not logging HTTP calls arguments there. I think we could use a filter like [3] but I'd rather have the framework to automatically take care of that.

And if I were writing a web app from scratch, no matter that I've been doing web for 25 years, I'm sure I'd make a lot of silly mistakes. That's why I prefer to build on top of frameworks.

[1] https://guides.rubyonrails.org/configuring.html#rails-genera...

[2] https://hexdocs.pm/phoenix/Phoenix.Logger.html

[3] https://djangosnippets.org/snippets/2966/
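
For the Django case, a generic sketch in the spirit of the filters linked above might look like this; the key list and logger name are assumptions, and this is not the linked snippet verbatim.

  # Scrub known sensitive keys from a payload before it reaches a log call.
  import logging

  logging.basicConfig(level=logging.INFO)
  SENSITIVE_KEYS = {"password", "password_confirmation", "token", "secret", "authorization"}

  def scrub(payload):
      """Recursively replace values of sensitive keys with a redaction marker."""
      if isinstance(payload, dict):
          return {k: ("[FILTERED]" if k.lower() in SENSITIVE_KEYS else scrub(v))
                  for k, v in payload.items()}
      if isinstance(payload, (list, tuple)):
          return [scrub(v) for v in payload]
      return payload

  logger = logging.getLogger("requests.audit")
  logger.info("login attempt: %s", scrub({"email": "a@example.com", "password": "hunter2"}))
  # -> login attempt: {'email': 'a@example.com', 'password': '[FILTERED]'}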


If a single developer is able to push something like this to production without scrutiny in "one bad morning" on a platform with billions of users, the company itself is the problem.

You're right, the impact is just as bad. I think people in this thread (including myself) are downplaying it because it's a very easy mistake to make, rather than a terrible design decision.

The major difference is that we don't have standard patterns for protecting against one attack, vs the other.

With password storage everyone knows the patterns, or we expect them to.

With everything between the request and password storage, we don't.

This type of attack could easily be prevented. When secrets come in, immediately store them (ideally at the web framework level) in a type that overrides print/debug formatting. Then add a "get_raw" to it, and you can now grep for that being used anywhere outside of storage to a DB (and your DB libs should take the Secret type too). Or don't use a `get_raw` and instead use a `hash` method that returns a safely hashed version of it.

Further, your secret type could at the very least add a round of SHA256, maybe even with a pepper?, just to be sure.

This isn't hard, I've done it before. The problem is that it isn't something that people feel embarrassed not to do, vs storing plaintext creds.
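
For what it's worth, here's a minimal sketch of that secret-type idea in Python; the names (Secret, get_raw, prehash) and the HMAC-SHA256 pepper round are illustrative, not any particular framework's API.

  # Wrap sensitive strings so accidental logging/printing shows a placeholder.
  import hashlib
  import hmac

  class Secret:
      def __init__(self, value):
          self._value = value

      def __repr__(self):                 # what print()/f-strings/loggers will show
          return "Secret(****)"

      __str__ = __repr__

      def get_raw(self):
          """Explicit escape hatch; grep the codebase for calls to this."""
          return self._value

      def prehash(self, pepper):
          """Optional extra round of HMAC-SHA256 with a pepper before the real KDF."""
          return hmac.new(pepper, self._value.encode(), hashlib.sha256).hexdigest()

  password = Secret("hunter2")
  print(f"login attempt with password={password}")   # -> password=Secret(****)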

Impact is the same - creds are plaintext in a DB. Attackers always expect sensitive data in logs, so it isn't as if you'd get lucky and they'd miss this.


> When secrets come in, immediately store them (ideally at the web framework level) ...

But, in practice, a lot of this stuff is being logged by things like proxies that could be several layers in front of the "web framework level".


Do proxies usually log POST data? Are these proxies terminating TLS?

Either way, yeah, you're right - it is not a perfect solution. But it means that for any system your engineers build, so long as they build them using your web framework that imposes this password type, you can grep your codebase for bugs.

You could implement client side hashing as a best-effort "in transit" mechanism but that has obvious downsides. Not sure how I'd feel about that approach in practice, but I can't see a big downside.


A mistake that should have been found very quickly, given the number of eyes on it. Not one that goes on for years without apparently ever being noticed (I find that hard to believe).

When you're FB, Google, ADP, a bank, whatever, you have failed if this sort of thing is "an easy mistake to make." Responsibility should never fall on a single dev or team to begin with. This was going on for _years_.

It's either amateur hour over there or the organization simply doesn't care enough to invest heavily in the protection of its users.


This happened to Apple on the desktop, CVE-2014-1317 - In that case, they were logging the body of failed API requests (hex encoded) to a file in /var/log. It turned out that the login request occasionally hit an error, logging your AppleID password.

It was easy to overlook, since as you said, it was intended to log something else (failed API requests), it only happened in the case of an error, and it was hex encoded. I only happened to stumble across it out of curiosity.


The Apple ID team at Apple is one of the worst.

Interesting. Care to elaborate?

Anecdotally, Apple ID auth on iOS/macOS has been a mess for me. Changing my password would result in multiple login prompts per device. Sometimes inputting the new password works, sometimes another prompt comes up after a few minutes.

Also, though the flexibility of being able to use a different ID for iCloud, home sharing, iTunes, App Store, Messages, etc. is neat, it's pretty annoying to need to set it in each of those places. (And TBH it seems like being able to share purchases/access/etc between IDs is more useful than being able to have separate ones, yet the iCloud family stuff took a while to arrive.)

The issues I've seen aren't as bad now as they used to be, but they haven't left a good impression.


> and the other is a very easy mistake to make which can go unnoticed for a long time (because you can be attempting to log things completely unrelated to logins).

This is not an excuse. If you're logging all request data, you need to strip or encrypt sensitive information in that request data. Handling persistence of sensitive data is web development 101. Just because it's not in a database doesn't give you a pass to leave it unprotected.

This level of incompetency is unacceptable.


Different causes, identical security implications for the user's passwords that were improperly stored.

Both are security fails, regardless of the cause of the fail.

And a company the size of, and with the resources of, Facebook most certainly should not get a pass for the "oops, we logged more than we should have" cause. A corp of their size, and with their resources, should be doing password handling correctly, every single time, no exceptions, no excuses.


The point perhaps is that they’re both terrible security designs and the impact to the user is the same regardless of how the “accident” happened. Nobody in FB’s position gets a pass for being irresponsible and negligent.

I disagree that they are both terrible security designs. We expect every company to not use plaintext for auth. We do not expect every company to have infra/ops setup to prevent logging on login requests.

By extending this logic, a car manufacturer should be blamed for not designing proper brakes for their car, but if a worker then accidentally installs the brakes wrong they are not responsible?

Imho, a company (especially one as big as Facebook) should have the right processes and procedures to prevent these kinds of problems, and ensure developers have proper training to make them aware of the consequences of their actions.


Interesting analogy and I see your point. If a car manufacturer's process made it easy to install the brakes wrong they would be held responsible (probably with a recall or damages for lives lost due to faulty brakes).

I guess part of this is that passwords aren't considered that important to many people :(


Who is "we" in this scenario? Governing bodies do. Engineers I've worked with in health care do, as do our PM's, security officers, etc.

I designed a clinical testing platform a couple of years ago. Our initial requirements stated very clearly that PHI and PII were not to appear in logs. This is basic stuff for anyone who actually works at this scale / level of sensitivity.


>We do not expect every company to have infra/ops setup to prevent logging on login requests.

What? I absolutely expect every company to not log my password in plain text. In my 15 years as a developer across several companies and industries, I have never seen anybody log passwords, or advocate for logging passwords.

I'm struggling to think why any employee of any company should be able to view a plain text password in any form. Why would there not be an expectation here?


You're talking about the expectation not to log the password. That's fine.

The parent was talking about the expectation of infrastructure that validates this. That is both very uncommon and impossible to do 100% correctly. You can scan logs for a prefix (password=), you can do entropy counting, you can try to decode hex values in text. But if you find a base64-encoded hex string representing "foobar", how do you even know it's a password?

Short of trying all possible decodings of all possible substrings against your full password database, this is an impossible task. (You can do best-effort things though)


Ah ok thanks, I misunderstood.

That’s certifiably depressing.

It might be depressing, but I think it's true. What do you think?

It is not true in my experience. It may seem true to the typical CRUD app web dev, but do they really know what it's like to be a custodian of so much sensitive information?

I would expect FB to have that, though.

there's two types of storing passwords in plain text

nope.

there are n types of storing passwords in plain text (including storing them in memory).

you just named 2.

all n types fall under infosec responsibility to prevent, and none are acceptable. there is no pass because logs were less intentional than storing in a table


I think they do indeed have the same outcome, but I'd count plain-text DB passwords as grossly negligent whereas this was merely negligent. Clearly both are terrible and deserve a shaming, but not hashing passwords going into a DB is a decision that someone pretty much necessarily had to have the full scope of and made a bad call on. This logging issue could potentially be attributed to a mistake in team communication (i.e. the people we hired to auto-log requests didn't realize that `/login` should be blacklisted for logging, and for some reason we never explicitly told them to do so).

So I agree that these actions are indeed same but different.

They ultimately speak to a failure at Facebook, but it doesn't speak to utter incompetence at Facebook.


Yeah, there's so many ways this could happen. The one that came to mind for me was an engineer who thought passwords were being filtered by the time it got to their part of the stack, but somehow didn't notice they weren't because the payload was minified or because they tested in a test environment where they knew there wasn't the filtering but didn't look at prod (where there was also not the filtering).

So many ways this could have gone down, so easy to do.


>the other is a very easy mistake to make which can go unnoticed for a long time

Sure, then the question for anyone at Facebook would be from the Bobs: "What exactly is it, you'd say, you do around here?"

FB claims to have "top minds" in essentially every discipline...

Heck, they are IMO a revolving door to the .gov/nsa/infosec community...

So... I call BS on your characterization of this as "whoops! Easy mistake!"

WTF is it that you'd say you do around here, Mr. FB-Security-Guy??


The second is way worse. You don’t need access to the database for it. Most systems make it difficult to log a plaintext password; it should be filtered in your logs. This is day 1 shit.

>Same same but different.

They're both issues with which any moderately competent engineer would be very familiar. Heck, I've seen commercial contracts that specified that audits be done to ensure no passwords and PII leak into log files. Everywhere I've worked in the past 20 years that logged stuff conducted audits on a regular basis to check that sensitive data wasn't being logged or inadvertently stored in a database.

The grey area might be, for example, when some data gets logged as a blob of hex that turns out to have a password in it if you run the right decoder function on it. I've seen cases like that show up in audits, though: you see a blob of hex or MIME and go find out what might be in it.

Anyway, it really isn't accurate to say that this is a kind of <slaps forehead> <doh, why didn't we think of that> thing. It gets thought about all the time.


Maybe the HN crowd needs to re-think login security. About 25 years ago challenge-response was a HUGE thing in login security. As https gained momentum and traction that went away. But the nice thing about challenge-response (at least for JavaScript-enabled clients) is that the password is one-way hashed before being sent to the server.

I wonder if it's time to up our game again and go back to that model.


I was just thinking the same thing. There is no reason the password itself even has to be sent over the wire.

The logging issue is something we've become aware of more recently, but it's just as bad. There is no intrinsic difference between the statement "don't store passwords in plain text" and "don't log PII or passwords in plain text."

A company that has a lower acceptance rate than Harvard is "accidentally logging passwords"? Yeah, sure :D

All the more reason to make the client side send HMAC(HMAC(username + password) + Unix Epoch rounded to last 5 min block) over the wire in its POST to the auth endpoint.

All the transport encryption and DB encryption/hashing/salting won't protect you from this kind of logging mistake, but the above would.

P.S. There are ways to make the above even better by adding a nonce that has to be requested from the server before POST etc.


If they were following best practices, the server would never have access to plaintext passwords. The client / frontend would hash the password contents and send that across the wire.

That's actually not a best practice; on the contrary it's extremely uncommon. Not because it's bad, but because it just doesn't actually add a meaningful security improvement. On the other hand, it does add non-negligible complexity to your authentication system. In particular, it would have done absolutely nothing to prevent this specific vulnerability.

If you hash your users' passwords using a key-derivation algorithm on the client-side, each user's password simply becomes the original password's digest. From the server's perspective nothing has changed. Moreover the server will need to re-hash the password digest sent over the wire, because if the server is compromised the password digests can be directly replayed to the server to compromise corresponding accounts.

Additionally, since the shared secret between the client and server is the user's password digest, the password needs to be hashed using the same salt every time the user authenticates. So each user's password digest is still a de facto unique password which will be sent over the wire anyway. This scheme basically retrofits the server's job onto the client with added complexity. It's like the (slightly) faster horses version of password authentication, when we could really be experimenting with developing cars (like two/multi-factor authentication, more robust server-side controls and provable correctness).

That's not to say the scheme has no benefits whatsoever. It does mean that user passwords will be more complex, because the actual token stored by the server (the de facto password) is a digest. But there are two drawbacks - with enough users you'll still see many duplicated digests in your password database, even if you randomly generate salts on the client side. More importantly, you're offloading hashing to the client side in JavaScript. JavaScript can be very fast in 2019, but companies like Facebook and Google still maintain very low latency, substantially stripped-down versions of their websites[1] because client-side hashing isn't going to be nearly as fast as server-side hashing for a huge number of people. There's also a sizable population of people who don't even have JavaScript enabled, or who might have incompatible browsers.

tl;dr - Client-side hashing is not a best practice (and not widely deployed) because it comes with a nontrivial complexity increase, lower client compatibility and negligible security benefits. It also would not have prevented this vulnerability.

_______________________

1. For example, mbasic.facebook.com.


Your criticism of the suggestion (outside of complexity) may be unfounded; consider:

  Client: asks server for nonce
  Server: sends nonce
  ---- OR ----
  Nonce arrives with login page

  Client: sends HMAC(nonce + HMAC(username + password + appname) + Unix Epoch rounded to last 5 min block)

  Server: 
  1. gets response
  2. using username as key pulls HMAC(username + password + appname) from DB
  3. Computes HMAC(last nonce sent to username + DB HMAC + Unix Epoch rounded to last 5 min block) and compares to user token
  4. last nonce is cleared
This algorithm would have prevented the attack (only the client-computed HMAC would be in the logs) and is not subject to replay.
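
For concreteness, the client-side computation might look roughly like this in Python; the key/message split for each HMAC, the use of SHA-256, and the 5-minute bucket handling are assumptions filled in for illustration, since the outline above doesn't pin them down.

  # Client-side token computation for the challenge-response outline above.
  import hashlib
  import hmac
  import time

  def inner_digest(username, password, appname):
      # What the server stores per user instead of a reusable plaintext password.
      return hmac.new(password.encode(), (username + appname).encode(), hashlib.sha256).digest()

  def login_token(nonce, username, password, appname):
      # nonce: bytes previously issued by the server for this login attempt
      bucket = str(int(time.time() // 300) * 300)   # Unix epoch rounded down to 5 minutes
      inner = inner_digest(username, password, appname)
      return hmac.new(inner, nonce + bucket.encode(), hashlib.sha256).hexdigest()

  # The server repeats the outer HMAC using the stored inner digest, the last
  # nonce it issued for this username, and the current 5-minute bucket.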

To be fair what you're describing is a PAKE, which is substantially different from "merely" moving the key-derivation functionality of password hashing from the server to the client. They're categorically different things. But you're right - if you're going down the rabbit hole of client-side hashing, you might as well implement a PAKE instead.

This kind of gets to the heart of what I was referring to when I said client-side hashes are like faster horses rather than cars. If you're spending this much effort, a superior protocol is better than an unorthodox, modified one. SRP is a PAKE which basically takes your proposal and moves it into a different layer of abstraction (TLS), and OPAQUE makes improvements upon it which allow you to use elliptic curves[1]. There are other reasons not to use PAKEs, but they're a much more coherent and defensible suggestion than just bolting the key derivation system onto the client rather than the server.

______________________

1. https://blog.cryptographyengineering.com/2018/10/19/lets-tal...


This is the same as:

    Server sends nonce
    Client sends HMAC(nonce + password + time)
Your inner HMAC becomes the new password which now is stored in plaintext in the DB. You just call it something else.

There are better ways to implement this idea, like SRP/PAKE https://en.m.wikipedia.org/wiki/Secure_Remote_Password_proto...


That’s hardly a best practice and if followed consistently just means that the hash is your actual password, which means if someone steals the hash from a log file they can still impersonate you.

Unless you actually mean some sort of challenge response scheme which is rather uncommon to see, e.g. http “digest” authentication or SRP.


The big problem with a service having your true password (instead of a hash of it) is that many users reuse the same password across a multitude of services (Gmail, Hotmail, Yahoo, Amazon, ...).

So it's bad practice to keep or transfer the user's cleartext password. It should never leave her browser/client. Period.


If the service has only the hash of the password, then that's the password, which would then be subject to the same problems as a "cleartext password".

Password managers and service specific 2FA solve that problem quite nicely for now (edit: although yes most users aren’t willing to do that).

The irony of this post is that most people would probably beef up the security of their DB, far beyond whatever layer of security they prop up for their logs.

Just because it's an easier oversight to make doesn't mean FB should be cut any more slack over it.

This isn't a case of a company intentionally storing passwords in plaintext in their database. That's a big security no-no, because password hashing is such a basic and fundamental part of password-based authentication that it's nearly impossible to miss if anyone involved in maintaining that system is remotely security-conscious.

This situation, on the other hand, seems to be more of an unintentional capture of passwords by their logging system. That's much harder to notice and more akin to a bug or security vulnerability, which is something pretty much every system suffers from sooner or later.

In such cases I'm more concerned with the company's response to the vulnerability than the fact that it exists at all. Here it seems like Facebook noticed the problem and proactively fixed it before it was exploited, which to me is a good sign.


Impossible is just a couple of mistakes away. Paper holds whole worlds.

Agree this is pretty much inexcusable.

Logging request or response payloads without an explicit whitelist should raise flags for any developer. There are very few cases where you can assert that not only in the present but also for all future use cases of a system, the entirety of a payload will not contain sensitive user data.

Only a whitelist will suffice to maintain good security. It's common for developers to attach sensitive data for debugging and other use cases under arbitrary paths.

Systems can improve further by adding patterns and other heuristics to drop whitelisted values that still look like sensitive data. A minimal sketch of the whitelist idea follows (the field names are hypothetical, not any particular system's schema):
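
    # Whitelist-only payload logging: anything not explicitly listed is dropped,
    # so fields added later are never logged by accident.
    import logging
    logging.basicConfig(level=logging.INFO)

    LOGGABLE_FIELDS = {"user_id", "endpoint", "status_code", "latency_ms"}

    def loggable_view(payload: dict) -> dict:
        return {k: v for k, v in payload.items() if k in LOGGABLE_FIELDS}

    payload = {"user_id": 42, "endpoint": "/login", "password": "hunter2"}
    logging.info("request handled: %s", loggable_view(payload))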


I would argue that an organization the scale of Facebook is the most likely place for this mistake to happen.

Log in and sign up pages -- especially on mobile web -- are being constantly iterated on by "growth" and "emerging markets" teams whose first priority is getting graphs to go up and to the right, not making sure the pages are secure. Their entire mission is to make things work with yesterday's technology (feature phones, old Android tablets, and so forth).

On the other hand, the dedicated security folks are focusing on ensuring there are multiple factors protecting high-profile targets, so they're working on SMS as a second factor, Login Codes, Security Keys, and primarily focused on Desktop Web, iOS, and Android. Or they're doing fuzzing, or building robots to chase down engineers for buffer overflows and the like.


Dude, you'd be surprised how much data your average company leaks in logging.

Ask any HIPAA-compliant company how they scrub their logs of errors that include medical data and you'll truly see how bad things are.


This is, supposedly, one of the top 4 global tech companies. Not some regional med-tech enterprise shop with a fleet of ageing WindowsNT servers. It is literally inexcusable.

It happens at Fortune 5 medical companies. When you sign up to Facebook you know you're signing up to Facebook. When you go to the doctor's office you are probably completely unaware that your PHI is being shared with a couple dozen other companies, often intentionally and sometimes unintentionally. If you are a human in the United States there is a high probability your PHI is sitting in a log file on a server owned by some company you've never even heard of.

Fwiw, none of this is necessarily a breach of HIPAA.


Wait till you get into companies that allow doctors to have other doctors review your charts... your data gets silently shipped to a third party company who shares it with doctors... and in most cases you don't even get told it happens.

OT: how much of a pain is it under HIPAA to deal with medical data provided by the patients themselves through channels that aren't supposed to be used for such data?

I've never dealt with medical data, but do deal with credit card data so have to deal with PCI. With credit card data we run into customers who send emails, or use online chat, or leave voice mails telling us they have a new card and giving the number. We never ask for them to do this, and indeed everything we tell them says never to send us a credit card by such means, but they do it anyway.

That brings all those systems into scope for PCI, which is a pain in the ass.

We use helpdesk and support chat software licensed from a third party. We had to write scripts that understand its DB schema and data formats and can find and remove credit card information, and keep them up to date as the helpdesk and chat software is updated.

A few years ago, the vendor tried to discontinue the stand-alone version and move its users to their cloud service version, but had to drop that plan when they found out that a lot of their customers were doing the same thing we were, and absolutely could not move to a cloud service unless that cloud service handled PCI issues. Apparently they didn't want to deal with making their cloud service handle that, and there has been no talk since then of dropping the stand-alone version.

Do people dealing with HIPAA run into similar issues?


Yes, this is why people build in HIPAA-compliant clouds like AWS... but I left that business years ago. You should always audit your third-party services to see what data is sent in debug logs.

Just as alarming for me is that Facebook engineers don't seem to understand risk management:

"In this situation what we’ve found is these passwords were inadvertently logged but that there was no actual risk that’s come from this. We want to make sure we’re reserving those steps and only force a password change in cases where there’s definitely been signs of abuse."

Inadvertently logging passwords is a risk. If those logs were accessed then that's a bigger risk. Signs of abuse is an issue. There is no such thing as an "actual risk", there are just probabilities (and possible consequences). Once a consequence happens, it is no longer a risk -- then it's actual.


Because storing plain-text passwords in the DB is a sign of pure incompetence, while accidentally logging a password is a huge screw-up, but an unintentional one - and frankly a type of mistake that could happen to any of us. You need to log full requests to urgently debug some issue, and in all the mess you forget to remove the logger afterwards... and boom, you've got yourself a log full of passwords, credit cards, and all other kinds of secrets. In the real world it's just like that, humans make errors; that's why your logs should always be heavily protected and regularly audited.

I had this kind of issue, a sloppy logging tool, when I had <100 customers. It was fixed before we got to 200.

Early FB (c. 2005) could get the excuse - new team, just getting started. But no excuse after the first round of funding.


We scoff at companies that intentionally store the passwords in plain text, then compare that stored value with the credentials provided to authenticate a session. This is a common mistake for novice engineers. We are dismayed by - but I wouldn't quite say we scoff at - accidental logging of sensitive data, which is a much more common accident and perpetrated by otherwise educated and advanced engineers every day.

Yeah I agree this is pretty much inexcusable. I'm not sure why the pass.

I don’t think “pretty much”, it’s just inexcusable.

the_duke is not "the community"

so when you say, "the community usually scoffs at entities which store passwords in plain text. Why do you give Facebook a pass?", that's a fallacy of composition.


People scoff at companies that store passwords in plain text on purpose. Inadvertently logging them is not the same thing. And Facebook’s scale is why it’s understandable how it could happen.

Someone had to write the code and then use the logs. Individuals, just like at any company. Then the article says it was actually searchable and used for something by employees. Why does the entire company's scale make this excusable?

I don’t get it!!!

Facebook is supposed to be hiring the best developers. It has tons of money. This is the most basic security consideration - hashing passwords on the client at least. At LEAST.

I am constantly surprised by how basic practices were ignored by these corporations that got so big, while the small guys implement them. I guess people really were "dumb fucks" to trust Zuckerberg with their passwords.


so yeah, devs have access to the password of your account, but really they don't need it; they probably have access to the hash in the database and could log in anyway. So really NO harm has been done

Having (read) access to the hash shouldn't give any access to the account.

Edit: You've made this comment twice in these threads. It's wrong.


Yeah I was wrong, it is only true if the dev has access to the prod database

It is neither a standard practice nor a best practice to give your devs access to your production auth databases.

Perhaps not in more mature organizations, but it's standard practice at every startup I've ever worked for. One place had the dev office VPN'd into production at all times.

Did any of those startups have 200-600 million users?

No, these were small organizations. I'm just saying it's not uncommon. Sadly, some places don't even have a dev environment...

> BUT if we are honest, anonymizing log data is rarely a priority. Even if it is, leaking sensitive data can happen easily in a lot of different points in the infrastructure. In actual application code, client + server exception tracing (just imagine a deserialization exception which contains part of the input) , web server, load balancer, proxies, service mesh... There is a lot of interesting stuff hiding in the logs at pretty much every company.

I think so too, and I agree with you.

A couple of months ago we had to have a discussion with a colleague who wanted to log the request/response body because "without it, it makes debugging almost impossible". We certainly don't log it, and we made sure that even exceptions don't spit out internal data; if it ever happens, it's an exceptional case, a bug (so far, I haven't seen one). It's tough, because of course when you have access to plain data you can do anything you want, but we owe our customers this additional step. Laziness can't be justified in this context.

However, generally speaking, I think it's common practice in many companies; it's not the first time I've seen this happen, and I bet it won't be the last.

It's a matter of mindset. The more I read such articles, the more I lose trust in even large companies. Typically smaller companies don't care, in order to move "fast". But when this happens at large corporations, where does it end? Today it's Facebook with passwords; tomorrow it might be Amazon with tons of credit card numbers because of a legacy system that's no longer maintained...


On that note, it might be better to change the title to "Facebook logged... passwords in plaintext". When you say FB stored them in plaintext, that's generally understood to mean "used plaintext as the primary mode for storage when authenticating users".

> readable for anyone with access to the logging/tracing infrastructure

The article says more than 20,000 Facebook employees. A quick search shows that they have around 35k now - do 60% of employees at Facebook really need access to all of those logs?


Or is the article actually correct?

You are technically correct, it is a dumb mistake and sadly not that hard to imagine happening. It's also inexcusable, and I would expect even junior engineers to know better than to log credentials as part of request processing.

I doubt someone wrote "log.print(user.creds)". They probably wrote "log.print(req.args)" in what they felt was an unrelated section of code. Sucks, but could easily happen.

I'd be interested in a system or tooling that could identify that something sensitive made it into a log. I think it is practically impossible, but would be interesting.


> I'd be interested in a system or tooling that could identify that something sensitive made it into a log. I think it is practically impossible, but would be interesting.

Have all strings printed to logs go through a common checking routine. That checking routine simply checks for the presence of certain hard coded sequences, and raises an alarm if they are found.

Whenever a production system is updated, run a test suite. The tests includes logging in to a test account whose password is one of the aforementioned hard coded sequences. If your system accepts payments, the tests can include a test purchase using a test credit card number that is one of the aforementioned hard coded sequences. In general, for each type of sensitive information, have a test that supplies sensitive information of that type, with that information being one of the hard coded sequences the log checkers checks for.

This won't stop you from accidentally logging sensitive data in production, but it should catch it during the post-deployment tests so you can fix it quickly.
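
A toy version of that idea in Python's logging module is below; the canary values, the handler wiring and the alert mechanism are placeholders, not a production design:

    # Handler-level filter that alarms when a known canary secret shows up in a log line.
    import logging

    CANARIES = {"Canary-Passw0rd!", "4111111111111111"}   # test password, test card

    class CanaryAlarm(logging.Filter):
        def filter(self, record: logging.LogRecord) -> bool:
            if any(c in record.getMessage() for c in CANARIES):
                # A real system would page someone here rather than print.
                print("ALERT: canary secret found in logs:", record.name)
            return True   # never drop the record, just raise the alarm

    handler = logging.StreamHandler()
    handler.addFilter(CanaryAlarm())
    logging.getLogger().addHandler(handler)

    # Post-deployment test: log in with the canary password and watch for the alert.
    logging.getLogger("signup").error("new user: %s / %s", "test", "Canary-Passw0rd!")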


How is this not a priority? And even worse, how is it possible that a bazillion of the supposedly smartest engineers did not fix it? In many other industries you'd get killed over this as a software engineer.

I don't buy that. In any other industry, you'd have a hard time even explaining what the problem is. "Someone wrote down my password in plaintext - yeah, that was me, it's sitting right there on my monitor".

The fact that the top comment is Facebook apologism informing others how this is "understandable" makes me wonder if there's something more than organic voting going on...

To think the only organic response is mob mentality against anything Facebook is silly. This is HN. We're here to discuss technology. As much as I stay away from Facebook these days, this particular flavor of security issue has happened to the best of companies in the past year, and it's natural to discuss that and why.

To genuflect with "I hate Facebook too" before having a rational discussion is worthless virtue signaling.


There are still lots of "organic posts" on Reddit threads about Facebook's fuck ups implying that the "anti-facebook narrative" is a "Soros conspiracy", which leads me to believe Facebook is still working with the "Definers".

If you read the Google Dapper paper from way back in 2010 it has this to say about sensitive information and logging:

> Logging some amount of RPC payload information would enrich Dapper traces since analysis tools might be able to find patterns in payload data which could explain performance anomalies. However, there are several situations where the payload data may contain information that should not be disclosed to unauthorized internal users, including engineers working on performance debugging. Since security and privacy concerns are nonnegotiable, Dapper stores the name of RPC methods but does not log any payload data at this time. Instead, application-level annotations provide a convenient opt-in mechanism: the application developer can choose to associate any data it determines to be useful for later analysis with a span.

Given how influential this paper was and how it likely influenced FB's own tracing system, it's crazy that they would choose an opt-out model. To me this system design is their biggest mistake and one that could have easily been prevented.


The article does clearly address that Twitter and Github had to admit to the same issue but the scope and duration of the problem was far smaller.

> Both Github and Twitter were forced to admit similar stumbles in recent months, but in both of those cases the plain text user passwords were available to a relatively small number of people within those organizations, and for far shorter periods of time.

That is to say, it is somewhat remarkable due to all the services that use Facebook as a login.


Any privacy-conscious company would not allow any random data, especially auth data, to be logged. I guess we are used to FB not caring about any of that, but it's not normal nor acceptable.

I don't mean to engage in whataboutism here, but unless you're under the impression that no tech company is privacy conscious, what you're saying isn't true (though that'd be a reasonable impression, to be fair).

It's absolutely a security failure and it's not acceptable. But it's not a matter of "allowing" it to happen so much as, "Which vulnerability will we be caught by?" And it actually is a pretty normal vulnerability. For example, Apple[1], GitHub[2] and Twitter[3] have been vulnerable to this exact issue in recent memory.

I also don't mean to be defeatist. This kind of problem is preventable. But it's merely one dumb mistake in a universe of dumb mistakes that leads to serious security failures, all of which are easy to make. The most sophisticated and well-funded information security teams in the world - usually the FAANG teams - still miss things which look pretty silly in isolation.

At this scale being privacy conscious is necessary but insufficient. You can't realistically conclude anything about a company's dedication to privacy based on whether or not it was impacted by this kind of vulnerability. Making a corporate policy to hash passwords in the database instead of storing them in plaintext is easy to codify, easy to implement and easy to verify. A corporate policy to never log authentication credentials is not nearly as well-defined, even if it's equally as important. That means more mental overhead, disagreement and uncertainty in preventing it. Ultimately, it also means more mistakes can be - and are - made.

________________________

1. https://darthnull.org/security/2014/03/10/cve-2014-1279-touc...

2. https://www.zdnet.com/article/github-says-bug-exposed-accoun...

3. https://arstechnica.com/information-technology/2018/05/twitt...


How do you stop it and still have an effective development org? Services need to be debugged, so requests and responses need to be logged...

It's pretty easy: you configure your logging library NOT to log the attribute, key/value pair, or whatever contains the credential. If you can't modify it on the server side (which you can, lazy bones), you tell your central logging system to mask it out before it is written to disk.

This isn't difficult or non-standard. If you are logging full client requests/responses - including auth creds, credit cards, SSNs, etc. - you are likely doing it wrong, and possibly violating some industry regulations.
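
As a rough illustration of scrubbing structured log events before they're written (the key names are examples, not anyone's actual schema):

    # Replace blacklisted keys wherever they appear, recursing into nested dicts.
    import logging

    REDACT_KEYS = {"password", "passwd", "token", "authorization", "card_number"}

    def redact(event):
        if isinstance(event, dict):
            return {k: "[REDACTED]" if k.lower() in REDACT_KEYS else redact(v)
                    for k, v in event.items()}
        return event

    logging.warning("login request: %s",
                    redact({"user": "alice", "password": "hunter2"}))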


At a company I worked for, if we logged any production data, we had to confirm there was no PII in there and no passwords or tokens, and very few people had access to these logs.

There are many layers of wrong in what FB did: carelessly logging production data, letting thousands of employees access these logs, and of all these people apparently none of them cared to mention there was a problem here, or if they did, it was ignored by management. They don't have any excuse here.


You implement your logging infrastructure with awareness of PII and other sensitive information. Whitelisting fields to log would nearly fix it, blacklisting fields would be a bare minimum.

This is a good question and I have been at multiple shops which had this bug induced by a "let's just log everything" accident somewhere in the codebase. It's very nice to think of logging as a sort of "aspect" or a middleware that gets deployed across the whole stack, but it's a bit of a mistake.

You have something like four options for fixing it:

1. Every request is responsible for its own logging. This is actually not a bad approach because state-altering requests really need to be logged whereas state-viewing requests are much more optional, they help to try and guess “what were they doing when they ran into this bug they’ve reported?” but mostly they just occupy database rows. The risk is that someone is in a rush and commits something which does not discharge its logging obligation. You can build a system which forces this if you want, “the router will dispatch to your function and one of your function's arguments will be a locally-stateful logger, and once you are finished I will check whether the logger has handled anything, and if not I will log an error. So you should always `$logger->noLoggingNecessary()` somewhere explicitly in the codebase and then if this is wrong it gets caught in code review more consistently.”

2. The sensitive data is used to generate a bearer token and this flow is outsourced to its own un-logged server. You explicitly use the bearer token to construct everything important about the user account in a step before the logging begins, then delete the bearer token from the rest of the request. This flow can actually get really slick: the bearer token can contain the user data, optionally encrypted, with a message authentication code to ensure the user didn't tamper with it: you can then hit a near-empty Redis instance (or a near-empty table) looking for revoked bearer tokens super-fast, since you probably don’t see too much session revocation. So, user data lookup actually becomes unbelievably cheap because it's mostly CPU bound with (check empty key/value store, MAC-or-decrypt, parse body, pass to the handler function).

3. The logging service becomes controller-aware: each controller specifies whether it is supposed to be logged and the logging service just respects that flag and is otherwise global. So it might log that the login controller was accessed, but it doesn't log anything else about the controller.

4. The logging service becomes message-model-aware. This one is actually kind of slick, too, it means that you describe declaratively what sorts of data types are present in the messages that are transmitted to and from the server: and the first thing you do when you get a request is to validate the request against the model you have declared for messages to that request's namespace. So you will have a `validate($model, $value)` function that takes some arbitrary JSON data and a model and returns a normalized version of that data; a natural extension to this traversal that you're already doing (either by returning two normalized results or calling the function with an extra `options={removeSensitiveData: true}` type of argument) will allow you to define in the message-model itself whether the property is sensitive and should never be logged.
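
A toy illustration of that fourth option, with a made-up model format and a hypothetical validate() helper (none of this is a specific framework's API):

    from dataclasses import dataclass

    @dataclass
    class Field:
        name: str
        type: type
        sensitive: bool = False

    LOGIN_MODEL = [Field("username", str), Field("password", str, sensitive=True)]

    def validate(model, value: dict, remove_sensitive: bool = False) -> dict:
        # Normalize a request against its declared model; optionally strip
        # fields that are marked sensitive so they never reach the logs.
        out = {}
        for f in model:
            v = value[f.name]
            if not isinstance(v, f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")
            if f.sensitive and remove_sensitive:
                continue
            out[f.name] = v
        return out

    request = {"username": "alice", "password": "hunter2"}
    handler_input = validate(LOGIN_MODEL, request)                    # full data
    log_view = validate(LOGIN_MODEL, request, remove_sensitive=True)  # safe to log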


If I can make sure that passwords are securely stored and nowhere available in plain text, even for hobby projects that will never see real users, how can Facebook, with allegedly the top developers, not manage the same thing? And if you log something like this, then everybody who ever saw these logs should be alarmed and press for changing that. Inexcusable to ever have that going on for more than 2 minutes. In 2012 Facebook was not a small startup with just Zuck zucking along.... This company is just so bad.

100% agree. Logging sensitive data is something that we should never do, but in orgs with 100s-1000s of devs, it's something that almost always happens. At some point somewhere a piece of middleware or whatever logs full request bodies, query params, etc., and nobody notices. It's an incredibly common mistake because it's so easy to make, and so easy to not notice.

This is still a very significant failure by Facebook, but as a dev I understand how it happens. It's a mistake that happens at the vast, vast majority of tech companies.


Can happen for sure. Have seen it happen in other places too.

But this going on for up to 7 years? Yeah, I'm less than impressed by FB.

Had a less than stellar opinion of them before; they are definitely hitting the Mariana Trench now.


> Sounds a lot like some service was logging the full body of a signup/login request, which then was readable for anyone with access to the logging/tracing infrastructure.

Wouldn't that also include failed login attempts complete with the failed passwords?

I mention this due to FB's history of even logging the entered passwords of failed attempts, which Zuck supposedly used to hack into people's Email accounts [0].

Because if that also applies here, then a whole lot of people just had way more leaked than just their FB password.

[0] https://www.businessinsider.com/henry-blodget-okay-but-youve...


A written statement from Facebook provided to KrebsOnSecurity says the company expects to notify “hundreds of millions of Facebook light users, tens of millions of other Facebook users, and tens of thousands of Instagram users.”

Does this imply it was a Facebook Lite-specific problem?


Yes

This does not sound like logging, because I would imagine that logs wouldn't be kept for years (2012!). The article implies that the passwords were stored and searchable in clear text. Yes, logs can be searchable, but this isn't what the article implies.

The article doesn't claim that passwords from 2012 are viewable today, just that they have been storing them in plaintext since 2012.

I really don’t know about this. Encrypting our log information was literally one of the first things we did when starting our latest product.

We store maybe a few million customer records without any significant PII.

Now there is Facebook which apparently hasn’t even bothered? And is logging the full body of requests somewhere? That’s a major WTF, even for Facebook.


>prevent bots/spam, abuse, etc.

Do they inspect the passwords too?


Dumb mistake?

This was a feature for the government. Plain and simple.


Yes, because the government would want Facebook to store user passwords. But only some of them, specifically Facebook Lite users - you know, the users who can't even afford a modern phone and reasonable internet. And they would want to use the passwords directly, instead of having some other backdoor for getting data from these accounts, so that the users they spy on can see that they're being spied on because of the suspicious logins using their password. You've cracked the case =)

> My Facebook insider said access logs showed some 2,000 engineers or developers made approximately nine million internal queries for data elements that contained plain text user passwords.

This imo is the truly alarming takeaway. FB employees were retrieving user passwords? Around two thousand FB employees? How in God's name is Zuckerberg going to perform his usual performative contrition about that one?

I'm just trying to imagine the data structures that were being retrieved from databases. Either they stored something like a big user account data type that contained their password in plaintext, which imo is a really weird design choice, or logs for other services were being mixed in with logs leaking the user/pass combos.

Surely one of the engineers could have noticed and said 'wait a minute... those are logins' over the course of the years? We hear all the time that people want FB to follow a responsible social practices (the debate on what those are rages on, which is great imo), but can't FB at least wrangle its own code base?

On the other hand, we shouldn't take the stance that heads should roll, imo - it would just create a chilling effect that would deter other companies from ever going public about their own security mishaps.

edit: I should probably tone it down in this comment but I'll leave it for posterity


>FB employees were retrieving user passwords? Around two thousand FB employees?

This sounds to me like they were writing it into some log analysis tool, and people happened to pull it up.

I have no knowledge of FB internals, but as an example let's say they have Splunk, which is a popular distributed log analysis tool (or some custom equivalent, because FB is huge). Splunk makes it very easy to pull up tons of logging for any app - that's what it's for. So write a poorly-constrained query and you can get millions of log lines from many different apps.

It's also feasible that everyone (or nearly so) in the company has access to Splunk.

So if somebody, somewhere accidentally writes passwords into a logfile, and it gets indexed into Splunk, a ton of people might theoretically see it, but just skim past it because it's not what they were looking for. Picture trying to hunt down an error, and wading through many screens full of logging from different apps - if there's a password in there you could easily not notice if you're not looking for it.

That's all speculation, but it's also entirely plausible. All it takes is accidentally logging out those passwords to somewhere that Splunk can see.

Also not trying to say this is in any way OK or acceptable, just that I can understand how a seemingly-small mistake could quickly result in thousands of people having access to passwords.


This should be the main takeaway. If true, then 2,000 people decided not to raise the alarm. And 9,000,000 queries with results that had passwords. This completely invalidates the other thread of people discussing giving Facebook a pass because accidental logging could happen to any company.

> If true, then 2000 people decided not to raise the alarm

Have you ever worked at a software co, discovered <terrible software practice>, raised a flag, and were told "thanks, we've added a ticket to the backlog"?

I'd bet that some of them did flag it.


Explicitly logging a password is one of those practices that doesn't sit on the backlog.

It's probably a bit more complicated than that. Usually the things that I encounter have to do with how HTTP requests are logged.

For example, putting sensitive information in a URL that's loaded over HTTPS is considered insecure because many companies have policies where they log every URL that their employees visit. (Think of a password reset link.)

A lot of inexperienced programmers don't realize this, because they don't realize that you can man-in-the-middle yourself, and that most corporate computers come preconfigured to allow the employer to man-in-the-middle everyone.

So, if a password reset link never expires, it means that some guy in IT can own an account that was reset on a corporate computer.

(This, basically, is how they catch people viewing porn on their work computers.)

Anyway, my point is that the problem is probably something where a junior programmer transmitted a password in a way that they didn't realize was being logged.


You're probably right, and I would love to see that confirmed. FB wouldn't have to link to the Jira ticket or give the name or details of the ticket, but verify that was the case and then explain why nothing was done.

Still interesting that not one of them went all out and publicly derided the company. The whistleblower clause would probably have protected them. However, our whistleblower laws could use a brush-up.

How much do you want to bet that some of those FB employees decided to log in with the "location detection" flag turned off and in an incognito browser?

I would not be surprised. Some of those 9,000,000+ queries and 2,000+ employees must have had some nefarious use. Statistically speaking...


Judging by the maturity level of the discourse on teamblind, this is precisely what happened.

Judging by working with hundreds of software engineers over a decade, this seems very unlikely to have happened.

Isn't it possible that those developers just wanted to look at something harmless, like failed logins for example, and did not even notice or reflect on the fact that the log messages also included the password? But if so, it is a bit worrying that 2,000 sets of eyes did not spot the bug.

Maybe many people were just using SELECT * FROM ... and got plain text passwords they weren't necessarily looking for.

I read the article and this is what I got from it too. I `SELECT *` all the time and see hashed passwords. It is very rare that I need that hash, usually I'm doing something entirely unrelated, but it's just easy to get all the rows and only use what you need, especially when you are troubleshooting and don't even know what you need yet.

What that quote is saying is that the logs that contained the passwords were accessed by 2,000 engineers. Most of those engineers would only be looking at data relevant to their job; only a security engineer would be in a position to notice the passwords, which is what happened.

No you're misrepresenting what (probably) happened. They had some system that logged API requests (fine and normal). Some API requests include plaintext passwords (also fine and normal).

The issue is presumably that they had no exclusion to the logging for sensitive information like passwords, which is honestly very easy to overlook.

So two thousand Facebook employees were not "retrieving passwords". They were looking at the API logs, which is a normal thing to do.


Probably some poor engineers trying to check if their partners were cheating on them. Tools like FB make it easier after all.

I think the technical security problem here (properly whitelisting parameters in logs) is just a symptom and not the core underlying concern (as the article mentions, Twitter and GitHub just dealt with similar issues).

To me, it seems very likely someone before 2019 laid eyes on these logs and either:

  a) Decided not to report it (implies serious security culture issue)
 
  b) Reported it and no action was taken (implies serious security process issue)

  c) Didn't even acknowledge it was inappropriate (implies a serious security training failure)
If you've already become complicit in regularly violating the privacy of your end-users, one can easily understand an employee devaluing the seriousness of clear-text passwords in a log.

Are FB employees so regularly exposed to sensitive data that they have become desensitized to the seriousness of clear-text passwords in an internally accessible log?


This is very basic stuff, all user-identifying information should be tokenized before being logged. Controlling access to production login logs or exposing them on only a need to know basis is another basic security principle. Sounds like FB is actually a wild wild west internally.

What is now common practice was nonexistent a few years ago. 20 years ago you could bypass the Windows login with a few clicks. 15 years ago CORS was nonexistent. 15 years ago it was common to send sensitive data unencrypted... and so on.

Ex-Facebook people told me that until around 2010 FB management turned a blind eye to its employees digging around their databases. Then they released a warning that people should stop and started locking prod data down. A few months later those who still peeked around prod data were let go.


As the article mentions, Twitter disclosed a similar mistake a year ago...and GitHub before them. These are just the organizations responsible enough to say something.

Can we take a moment to acknowledge that this is an easy mistake to make? A logger doesn't care if it's a password or not. Strings are strings. As long as the answer is "humans should be more careful" we'll be seeing these kinds of disclosures regularly across the industry.

My best attempt to address this in my teams has been to use different data types for different data classifications. Naked strings must be loaded in one of these data types after input sanitization. That makes it easier to catch accidental inappropriate use. This is useful for managing PII as well.
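
Something like this, as a minimal sketch (the class and method names are mine, not a specific library's):

    # A wrapper type whose string form is always redacted.
    class Secret:
        def __init__(self, value: str):
            self._value = value

        def reveal(self) -> str:
            # The only way to get the raw value out; easy to grep for in review.
            return self._value

        def __str__(self) -> str:
            return "[REDACTED]"

        __repr__ = __str__

    password = Secret("hunter2")
    print(f"login attempt, password={password}")   # prints "[REDACTED]"
    # Anything that genuinely needs the raw value has to call password.reveal().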


Interestingly, Apple seems to have something like this in place where format strings, by default, are not logged (they show up as <private>). I'm thinking that something like this might be useful to have in general.

That's why modern languages need something like a typedef!

At least C# has a concept of a SecureString for things like passwords.


How does SecureString actually work? Presumably it's possible to coerce one into a "normal" string, and I'm sure that someone will end up doing this in the codebase because it's more convenient to work with.

Yes, this is an industry-wide problem. Many logging APIs are designed for convenience, not security. We need systematic solutions that are still easy to use (or they won't be used).

I mean, yeah, it is easy to make, but it is also easy to fix. How long did this occur prior to correction? Thousands of searches for said data in the logs. I suspect we are only hearing about it because Facebook, all of a sudden, has been publicly humiliated multiple times for poor engineering/business practices and has decided that it is time to grow up and take privacy seriously. Otherwise, they risk their existence.

Yet another example of why it's important to use _unique_ passwords for every site you have an account on. Even if the site you're using does password storage properly, that's no guarantee plaintext credentials couldn't leak through other means, such as, in this case, improperly configured logging systems.

In the future, WebAuthn may be able to solve this problem for good, as sites will only have access to a unique public key rather than a plaintext password. Until then, a password manager is your best defense against this type of issue.


This is so prevalent in technology companies that it is funny to read this thread, with everybody throwing mud at Facebook without considering that quite probably the company they work for has had (or maybe even currently has) the same issue.

That's a fallacy my dude. You can throw shade at your own company and Facebook for the same reason. It doesn't and shouldn't minimize what Facebook has done.

Posted this on the other thread from Facebook, but at what point do we start imposing strict fines on companies that are found to have done this?

Granted, I guess we wouldn't be hearing about this instance at all if there was to be some sort of fine attached - it would have just been swept under the rug - so maybe that's not a good idea. I'm just tired of the "oops we stored your passwords in plaintext lol" from companies with engineers that should clearly know better.


We can start fining companies for these mistakes when developers can start getting fired immediately for any mistake.

I don't care about Facebook, but storing user passwords in the plain is not a "privacy violation" - without the password they still have access to all your data. Storing user passwords by logging them is a stupid amateur security mistake.

I could understand fining them if Facebook had stored the passwords and used them to access your other accounts with permission to invite users - then sure, fine 'em. But mistakes are mistakes, and they owned up to it.


Storing passwords in plain text is beyond a simple slap-on-the-wrist mistake, and it has real security implications.

I wouldn't mind that; more responsibility on the software developer means more leverage to push back. I guarantee you I'm not rushing for a deadline if I think I'm compromising security in a way that may put a black mark on my career.

>>We can start fining companies for these mistakes when developers can start getting fired immediately for any mistake.

Sounds good to me. Plenty of people in other industries get fired all the time for violating best practices, regulations, and so on. Why should software be any different? Are we special?


Most tech companies have a “blame the process, learn, and fix the process” approach. I’m not sure what industries you’re talking about but manufacturing and aviation seem like they have a similar process.

An internal tool that allowed for logging clear text passwords was a mistake. A culture that allowed said system to exist for 7 years without being surfaced by any type of internal security audit is something else entirely. Financial penalties could/should/would target the latter not the former.

It's likely to have been a breach of GDPR, so if this situation had existed when GDPR was in force, the answer to your question would be "at this point".

I guess that's true, yeah. Will be interesting to see if there are fines from the EU.

It all depends on if one can prove negligence.

I agree, it's time for there to be criminal negligence penalties for these most egregious failures of even basic security practice.

If being bad at your job is a crime then lock me up.

Depends on the job. If you're a licensed professional, this could very much be the result of you "being bad at your job".

If you get a license and are bad at your job then lock up the licensor.
