Hacker News | stonemetal12's comments

Rather, any logic puzzle you post on the internet as something AIs are bad at ends up in the next round of training data, so AIs get better at that specific question. Not because AI companies are optimizing for a benchmark, but because they suck up everything.

ARC has two test sets that are not posted on the Internet. One is kept completely private and never shared. It is used when testing open source models and the models are run locally with no internet access. The other test set is used when testing closed source models that are only available as APIs. So it could be leaked in theory, but it is still not posted on the internet and can't be in any web crawls.

You could argue that the models can get an advantage by looking at the training set which is on the internet. But all of the tasks are unique and generalizing from the training set to the test set is the whole point of the benchmark. So it's not a serious objection.


Given the delivery mechanism for OpenAI, how do they actually keep it private?

> So it could be leaked in theory

That's why they have two test sets. But OpenAI has legally committed to not training on data passed to the API. I don't believe OpenAI would burn their reputation and risk legal action just to cheat on ARC. And what they've reported is not implausible IMO.


Yeah I'm sure the Microsoft-backed company headed by Mr. Worldcoin Altman whose sole mission statement so far has been to overhype every single product they released wouldn't dare cheat on one of these benchmarks that "prove" AGI (as they've been claiming since GPT-2).

Wichita, Kansas is home to about 20% of aircraft-manufacturing workers in the United States. Not exactly known for HCOL.

>I didn’t say anything about pot. I said drug. That’s any illicit drug or controlled substance including prescription misuse.

Pot is still illegal at the federal level where a clearance would be taking place.


Given that AI output can't be copyrighted, how do you protect and distribute the project you are working on?

I don't know? I guess I don't worry about that. Maybe I am Borg.

>No one is born knowing how to speak or anything about their culture.

Not really the point though. Humans learn about their culture then evolve it so that a new culture emerges. To show an LLM evolving a culture of its own, you would need to show it having invented its own slang or way of putting things. As long as it is producing things humans would say it is reflecting human culture not inventing its own.


>gradually moving in that direction

Lead paint on children's toys and asbestos in the walls suggest there has never been a culture of making sure it is safe before putting it in production even in the physical world.


Lead pipes, yes.

But asbestos in the walls is great for the safety of those living there — the place doesn't burn down so easily. It's the safety of anyone who inhales the dust when the place gets broken down that's endangered by asbestos.


Move to Alaska, Delaware, Montana, New Hampshire or Oregon. No sales tax.

most states have some trick.

I think you have to look at it wholistically:

https://upload.wikimedia.org/wikipedia/commons/9/9f/Median_h...


Not a trick, just they get their income from other sources / taxes. But we're specifically talking about knowing the final price you will pay, in advance, which requires knowing the sales tax.

nitpick: "holistically"

Lol, I didn't even think when I wrote it that way. That said, it seems to be in the dictionary as an alternate spelling of holistic.

I guess the English language lets this stuff happen a lot, but not alot. I definitely did it by accident (not "on accident"!)


As a v0.1 it seems great. Reminds me of the Jaws ad in Back to the Future Part II. Mall advertising is probably the only thing it is suitable for now. As it gets higher res I could see it replacing TVs.

>It seems like AR, of any stripe, can do better.

Only if everyone in the group has AR goggles. I could see this being better for group activities where everyone can be around and reference the same "thing".


Closer to the projection of Leia at the start of Star Wars, I would have said.

Either way, pretty cool, but I don't see it replacing TVs any time soon.


Either they follow the law or it is illegal to sell in the covered jurisdiction.

There are already US laws that work that way. Consider DDT: if fruit was grown with DDT, you can't sell it in the US. It doesn't matter that DDT is legal in the grower's country.


Taking as given the statement in the parent that "there's no moral principle by which it would be right to expose and open-source their code.", that sort of transitive requirement ought not be imposed. Though I don't agree with that statement.

It is public key cryptography. You give websites your public key, and keep your private key hidden. When you sign in to a website, they send you a nonce. You then digitally sign the nonce with your private key. They use your public key to verify that the signature was produced by your private key, which logs you in.

There is no private info (aka a password) going out in public so you don't have to trust anyone to keep your password secret.

It greatly reduces the attack surface of logging in, but the attack surface is moved to the weakest part of the system, aka the user.
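
A rough sketch of that nonce-signing round trip, in Python. This is only an illustration of the flow described above, not how any real site or passkey implementation does it: it uses toy textbook RSA (tiny key, no padding, built from two known Mersenne primes) so it runs on the standard library alone, and the names `sign`, `verify`, and `login_round_trip` are made up for the example. A real deployment would use a vetted signature scheme and library.

```python
import hashlib
import secrets

# Toy textbook-RSA key built from two known Mersenne primes.
# ASSUMPTION / illustration only: far too small and unpadded to be secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus (shared with the site)
E = 65537                            # public exponent (shared with the site)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, never leaves the user


def digest(nonce: bytes) -> int:
    """Hash the nonce and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(nonce).digest(), "big") % N


def sign(nonce: bytes, d: int = D) -> int:
    """User side: sign the site's nonce with the private key."""
    return pow(digest(nonce), d, N)


def verify(nonce: bytes, signature: int, e: int = E) -> bool:
    """Site side: check the signature using only the public key."""
    return pow(signature, e, N) == digest(nonce)


def login_round_trip() -> bool:
    nonce = secrets.token_bytes(32)   # site sends a fresh random challenge
    signature = sign(nonce)           # user signs it locally
    return verify(nonce, signature)   # site verifies with the public key
```

Because the nonce is fresh on every login, a signature captured from one session does not verify against the next session's nonce, and the site never holds anything that could be used to sign in as you.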

