Minds Backend - Engine

Opened 2 months ago by Brian Hatchet @brianhatchet

Elasticsearch aggregation for top domains

Create an ES aggregation

Significant terms analysis

Find all URLs

Provide a mechanism for flagging top-level domains, being careful not to ban good domains

Flagging a URL results in the content getting removed, not deleted
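
The first two items above (an ES aggregation plus significant terms analysis) could be sketched as a single search body. This is only an illustration: the field names `link_domain` and `@timestamp` are hypothetical, and it assumes the domain of each shared link is already indexed as a keyword field. The `terms` aggregation lists the most frequent domains, while `significant_terms` highlights domains that are unusually frequent in the recent window compared with the whole index.

```python
def build_top_domains_query(domain_field="link_domain", days=7, size=50):
    """Build an Elasticsearch search body that returns aggregations only.

    `domain_field` and `@timestamp` are hypothetical field names.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "query": {
            # Foreground set: documents from the last `days` days.
            "range": {"@timestamp": {"gte": f"now-{days}d/d"}}
        },
        "aggs": {
            # Most frequent domains in the window.
            "top_domains": {
                "terms": {"field": domain_field, "size": size}
            },
            # Domains over-represented in the window versus the index.
            "unusual_domains": {
                "significant_terms": {"field": domain_field, "size": size}
            },
        },
    }
```

The returned dictionary would be passed as the body of a search request against whichever index holds the posts.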

Edited 2 months ago by Brian Hatchet

  • Brian Hatchet @brianhatchet added Status::Planning Breakdown scoped label 2 months ago

  • Brian Hatchet @brianhatchet changed the description 2 months ago

  • Brian Hatchet @brianhatchet added to epic &102 2 months ago

  • Brian Hatchet @brianhatchet assigned to @ramialbatal 2 months ago

  • Brian Hatchet @brianhatchet assigned to @brianhatchet 2 months ago

  • Brian Hatchet @brianhatchet · 2 months ago
    Developer

    Need to talk to @ramialbatal about what to do with the data when we ban content in terms of full text analysis

  • Rami Albatal @ramialbatal · 1 month ago
    Developer

    @brianhatchet here are my thoughts:

    URL queue

    Admins, like any Minds user, can report channels as spam.

    Admins can also report a URL/domain as spam.

    • Once a URL/Domain is reported, it will be immediately sent to a URL queue.
    • Once a channel is reported, we need to automatically scan this channel, extract its URLs, and send them to the URL queue.
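
A minimal sketch of the channel scan described above, assuming the channel's posts are available as plain strings and using an in-memory `deque` as a stand-in for the real URL queue. The function names and the regular expression are hypothetical, not taken from the Minds codebase.

```python
import re
from collections import deque
from urllib.parse import urlsplit

# Deliberately simple pattern: anything starting with http(s):// up to
# the next whitespace or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def extract_urls(text):
    """Return the URLs found in `text`, without trailing punctuation."""
    return [m.rstrip(".,;:!?)\"'") for m in URL_PATTERN.findall(text)]


def domain_of(url):
    """Return the lowercased host of `url`, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def enqueue_channel_urls(posts, queue, seen):
    """Push every not-yet-seen URL from a reported channel's posts."""
    for text in posts:
        for url in extract_urls(text):
            if url not in seen:
                seen.add(url)
                queue.append(url)


queue, seen = deque(), set()
enqueue_channel_urls(["buy now http://spam.test/x", "again http://spam.test/x"], queue, seen)
```

The `seen` set keeps the same URL from being queued twice when a channel repeats it across posts.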

    UI and data to retrieve

    • Each time an admin loads the moderation page for potential spam URLs, they will see the following information and metrics:
      • URL
      • Is_Spam
      • date of last occurrence
      • number of occurrences (all time)
      • number of occurrences (in the 7 days preceding the last occurrence)
      • number of channels that mentioned this URL (all time)
      • number of channels that mentioned this URL (in the 7 days preceding the last occurrence)
    • Four radio buttons should be displayed beside each URL:
      • Flag the URL as spam.
      • Flag the domain as spam.
      • Flag the URL as healthy.
      • Flag the full domain as healthy.
    • Beside the four radio buttons there should be a "submit" button.
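
The per-URL row listed above could be computed roughly as follows. This is a sketch under assumptions: each occurrence is given as a `(timestamp, channel_id)` pair, the 7-day window is taken to include the last occurrence itself, and the names `UrlMetrics` and `compute_metrics` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class UrlMetrics:
    url: str
    is_spam: Optional[bool]  # None until an admin has decided
    last_occurrence: datetime
    occurrences_all: int
    occurrences_7d: int
    channels_all: int
    channels_7d: int


def compute_metrics(url, occurrences, is_spam=None):
    """Build one moderation row from a non-empty list of occurrences.

    `occurrences` is a list of (timestamp, channel_id) pairs.
    """
    last = max(ts for ts, _ in occurrences)
    # Assumption: the window covers the 7 days up to and including `last`.
    window_start = last - timedelta(days=7)
    recent = [(ts, ch) for ts, ch in occurrences if ts >= window_start]
    return UrlMetrics(
        url=url,
        is_spam=is_spam,
        last_occurrence=last,
        occurrences_all=len(occurrences),
        occurrences_7d=len(recent),
        channels_all=len({ch for _, ch in occurrences}),
        channels_7d=len({ch for _, ch in recent}),
    )
```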

    How to display the list of URLs?

    The URLs can be listed in decreasing order of a score. This score can be a combination of the metrics above.

    Score = (alpha * number of days since last occurrence) + (beta * number of occurrences in the 7 days preceding the last occurrence) + (gamma * number of all occurrences) + (delta * number of channels that mentioned this URL in the 7 days preceding the last occurrence) + (epsilon * number of all channels that mentioned this URL)

    Suggested parameter values (to be tuned later based on feedback from the admins):

    • alpha = 25
    • beta = 5
    • gamma = 1
    • delta = 10
    • epsilon = 5
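
As a worked example of the score with the suggested values, here is a minimal sketch. It assumes the five weighted terms are simply summed; `ScoreWeights` and `spam_review_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    alpha: float = 25    # days since last occurrence
    beta: float = 5      # occurrences in the 7-day window
    gamma: float = 1     # occurrences, all time
    delta: float = 10    # channels in the 7-day window
    epsilon: float = 5   # channels, all time


def spam_review_score(days_since_last, occurrences_7d, occurrences_all,
                      channels_7d, channels_all, w=ScoreWeights()):
    """Weighted sum of the five metrics (assumed to be purely additive)."""
    return (w.alpha * days_since_last
            + w.beta * occurrences_7d
            + w.gamma * occurrences_all
            + w.delta * channels_7d
            + w.epsilon * channels_all)


# 25*2 + 5*10 + 1*40 + 10*3 + 5*8 = 210
example = spam_review_score(2, 10, 40, 3, 8)
```

Rows would then be sorted by this value in decreasing order before being shown to the admins. Keeping the weights in one small object makes it easy to adjust them once admin feedback comes in.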

    Actions

    • Once an admin clicks the submit button:
      • the metrics mentioned above and the URL will be stored in Elasticsearch or Cassandra along with the admin's decision. This is necessary for two reasons:
        • for ML training
        • to avoid displaying the same URLs/domains to the admins in the future.
      • any post/comment/blog containing a URL or a domain marked as spam will be removed (not deleted).
      • any channel or group that shared a spam URL/domain will be forwarded to the admins for a spam check.
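
The submit actions above can be sketched with in-memory stand-ins: a list in place of the Elasticsearch/Cassandra store, a `removed` flag in place of the real soft-removal mechanism, and a list as the admin review queue. All names here (`Decision`, `Post`, `apply_decision`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlsplit


class Decision(Enum):
    URL_SPAM = "url_spam"
    DOMAIN_SPAM = "domain_spam"
    URL_HEALTHY = "url_healthy"
    DOMAIN_HEALTHY = "domain_healthy"


@dataclass
class Post:
    guid: str
    channel: str
    urls: list
    removed: bool = False  # soft flag: content is hidden, the record is kept


def apply_decision(url, decision, metrics, posts, decision_log, review_queue):
    """Record the admin decision, then soft-remove matching content."""
    # Stored for ML training and to avoid re-showing the same URL later.
    decision_log.append(
        {"url": url, "decision": decision.value, "metrics": metrics}
    )
    if decision not in (Decision.URL_SPAM, Decision.DOMAIN_SPAM):
        return
    domain = urlsplit(url).hostname
    for post in posts:
        if decision is Decision.URL_SPAM:
            hit = url in post.urls
        else:
            hit = any(urlsplit(u).hostname == domain for u in post.urls)
        if hit:
            post.removed = True  # removed, not deleted
            if post.channel not in review_queue:
                review_queue.append(post.channel)  # forward for a spam check
```

Nothing is ever dropped from `posts`, which mirrors the "removed (not deleted)" requirement and leaves room to restore content if a decision is reversed.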
  • Brian Hatchet @brianhatchet changed milestone to %Sprint::02/26 Calculated Cricket 1 day ago

  • Brian Hatchet @brianhatchet assigned to @markeharding 1 day ago

  • Brian Hatchet @brianhatchet · 1 day ago
    Developer

    Need to break this down into cards for implementing Rami's suggestions. Any thoughts on this, @markeharding?

  • Rami Albatal @ramialbatal · 1 hour ago
    Developer

    @brianhatchet as a temporary solution, I have a simple script that extracts the popular URLs. I can run it a couple of times per week and check whether any suspicious URLs show up, unless we have the resources to implement a better solution like the one I described above.
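
The script mentioned here is not shown in the thread. A hypothetical stand-in for such a periodic check, assuming post texts are available as plain strings, might look like this:

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def popular_urls(texts, top_n=20):
    """Return the `top_n` most shared URLs as (url, post_count) pairs."""
    counts = Counter()
    for text in texts:
        # Count each URL once per post, so a single post that repeats
        # a link many times does not inflate its rank.
        counts.update({u.rstrip(".,;:!?") for u in URL_PATTERN.findall(text)})
    return counts.most_common(top_n)
```

The output would still need a manual look, since a popular URL is not necessarily a spam URL.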

Assignees: Mark Harding, Rami Albatal, Brian Hatchet
Epic: Bot, Spam & Fraud Prevention
Milestone: Sprint::02/26 Calculated Cricket
Labels: Status::Planning Breakdown
Participants: 3
Reference: minds/engine#1236