This is an archived post. You won't be able to vote or comment.

All 11 comments

[–]raldi 14 points 15 points  (0 children)

Can I summon the bot? How? Who maintains it? It would be neat if there were a way for commenters to suggest sentences the bot should've included or rejected, and then upvote those comments to provide emphasis, so the bot gets smarter over time.

[–]cruyff8 5 points 6 points  (7 children)

Is the bot open-source?

[–][deleted]  (6 children)

[deleted]

    [–]cruyff8 6 points 7 points  (5 children)

    send it to SMMRY

    The SMMRY service you alluded to is what interests me.

    [–]iforgot120 2 points 3 points  (3 children)

    It most likely uses something like Stanford's NLP module (which is open source) to process individual words, then uses some form of a TF-IDF algorithm/formula (depending on how complex it is) to identify key phrases and sentences.

    You can use some machine learning and context forests to help improve accuracy, but that's the basics of it.

    [–]cruyff8 0 points 1 point  (2 children)

    some form of a TF-IDF algorithm/formula

    I'm familiar with the Stanford NLP packages, but this is what I was curious about. Thank you... Perhaps more specifics would be even more grand.

    [–]iforgot120 3 points 4 points  (1 child)

    Specifics on TF-IDF? It's a very simple algorithm, so there really isn't all too much to it; you can try to improve accuracy by playing with the numbers, but the idea is the same.

    The idea behind TF-IDF (which stands for "term frequency-inverse document frequency") is that it analyzes a single document (e.g. a posted article) for individual word counts (how often each word appears in the document). Words that appear more frequently are most likely important to that document, but that'll be skewed by words that are simply frequent throughout the English language (e.g. conjunctions [and, or, but, etc.], determiners [this, that, each, my, the, etc.], common verbs [is, are, was, etc.], and so on).

    To offset that, you need to weight the term frequency by the inverse document frequency, which looks at a body of different documents (called a "corpus" in NLP). Words that appear (however many times) in all or many of the documents are probably just common in the English language, while words that are rare across the corpus are more likely specific to a single document.

    So if you have a word that appears often in a single document, but only in that single document and in no other documents, then that's probably a relevant word to said document, meaning sentences containing that word probably have higher importance.
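
    A minimal Python sketch of the scoring described above. This is not SMMRY's or autotldr's actual code; the function names, the regex tokenizer, the plain log(N / df) form of IDF, and the averaging of word scores per sentence are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(document, corpus):
    """Return {word: score} for `document`.

    `corpus` is a list of document strings and is assumed to include
    `document` itself, so every word has a document frequency of at least 1.
    """
    words = tokenize(document)
    counts = Counter(words)
    doc_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for word, count in counts.items():
        tf = count / len(words)                      # term frequency in this document
        df = sum(1 for s in doc_sets if word in s)   # number of documents containing the word
        idf = math.log(len(corpus) / max(df, 1))     # guard against df == 0
        scores[word] = tf * idf
    return scores


def rank_sentences(document, corpus):
    """Sort the document's sentences by average TF-IDF score, highest first."""
    scores = tf_idf(document, corpus)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())

    def sentence_score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(scores.get(t, 0.0) for t in tokens) / len(tokens)

    return sorted(sentences, key=sentence_score, reverse=True)


corpus = [
    "The cat sat on the mat. The cat purred.",
    "The dog sat on the log.",
    "The bird sat on the wire.",
]
scores = tf_idf(corpus[0], corpus)
best_first = rank_sentences(corpus[0], corpus)
```

    In this toy corpus "the" appears in every document, so its IDF is log(1) = 0 and it contributes nothing, while "cat" appears twice in the first document and nowhere else, so it gets the highest score there.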

    [–]cruyff8 0 points 1 point  (0 children)

    Oh, I wasn't familiar with the acronym... :)

    [–]aristeiaa 1 point 2 points  (1 child)

    TLDR from SMMRY

    [FAQ] AutoTLDR Bot Autotldr will only post if the content can be reduced by at least 70%. So if the summary is only 50% shorter than the original, autotldr will not post it.
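
    The 70% rule quoted above could be checked roughly like this. The FAQ does not say whether length is measured in characters, words, or sentences, so character count is an assumption here, and the function names are hypothetical.

```python
def reduction(original, summary):
    """Fraction by which `summary` is shorter than `original`.

    Measured in characters, which is an assumption; the FAQ does not
    specify the unit.
    """
    if not original:
        return 0.0
    return 1 - len(summary) / len(original)


def should_post(original, summary, threshold=0.70):
    """True if the summary cuts the original by at least `threshold`."""
    return reduction(original, summary) >= threshold


article = "x" * 1000
half_length = "x" * 500      # only 50% shorter
fifth_length = "x" * 200     # 80% shorter
```

    Under that assumption, should_post(article, half_length) is False and should_post(article, fifth_length) is True, matching the FAQ's 50% example.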

    If you have valid reasons for blacklisting/banning autotldr please contribute to the theory of autotldr discussion.

    [–]polysemous_entelechy 0 points 1 point  (0 children)

    autotldr does not summarize self posts, as the responsibility of providing that tl;dr should lie with the OP.