New Subsystem Layer: Distributed Endpoint Identity Generation Engine

I’ve already built most of the pieces for this and have been using them for testing. Automated identity creation on IRC networks is surprisingly easy, even in highly restrictive environments.

Nobody is going to change the IRC protocol, so its constraints are givens.

NickServ behavior varies from network to network, depending on which services bots are in use and how they’re configured.  So some components will need to be network-dependent.

Given inputs:

  • email
  • host (meta, registered vs used)
  • user
  • ident
  • password

This should be able to operate as a single command that connects and does everything.
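As a rough sketch of that single-command shape (all names here are mine, not from the existing tooling, and the NickServ exchange is the generic one — real networks will need per-network variants):

```python
from dataclasses import dataclass

@dataclass
class IdentitySpec:
    """Inputs for one generated identity. Field names follow the list above."""
    email: str
    host: str       # registration host; the host actually used may differ
    user: str
    ident: str
    password: str

def register(spec: IdentitySpec) -> str:
    """Single-command flow: the raw IRC lines a client would send to
    introduce the identity and register it with NickServ.
    Network-dependent NickServ quirks would be plugged in here."""
    steps = [
        f"NICK {spec.user}",
        f"USER {spec.ident} 0 * :{spec.user}",
        f"PRIVMSG NickServ :REGISTER {spec.password} {spec.email}",
    ]
    return "\r\n".join(steps)

print(register(IdentitySpec("a@b.example", "vps1.example", "alice", "al", "s3cret")))
```

The spec object is the thing orchestration would hand to each iteration; everything else hangs off it.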

There are some problems to solve there:

N, R and E are separate hosts.

In addition to that, R and E need to be random and disparate between iterations for a system like this to really work.

The two problems introduced there are:

  • orchestration
  • endpoint creation

My two major pain points in all things.

Endpoint creation for R is relatively easy.  Endpoint creation for E is slightly more complicated, since you need to be able to open ports on the host, which requires root access.  For truly dynamic endpoint creation you’d almost need to generate the OS image and spawn it dynamically.

I have an idea that I want to test.

Field of Visibility Change


Updated Field Of Visibility

Joins and Parts are Removed from API Returns

The data is still present in the backend, so identity research pages still work; the joins and parts just aren’t displayed in the log viewer.  Almost nothing is lost by this, and plenty is gained.

IRCTHULU Logs

Metadata

  • The time the logs resume (correlates to joins in local log)
  • The time the logs stop after a kline (correlates to kline in local log)
  • [any entries present in local logs not present in ircthulu logs — mitigated by random capture delay at client level]

Their Local Logs

Data

  • joins (completely mitigated)
    • user
    • host
    • ident
    • realname
  • registration
    • email (if I reduce this to the last one I’ll be able to confirm they’re tracking users’ emails)
    • host
    • user
    • ident

Metadata

  • age of registration on join
  • profile of joined channels
  • vps provider profile

This Blog

This blog is now a great source of information for predictive analytics, as it is intended to be.

What I Learned

Network Field of Visibility

There are defined datapoints visible in the network staff’s field of vision:

  • [IRCTHULU’s Logs]
  • [Their Local Logs]
    • [Registration Data]

Among those data points there is relevant metadata.

IRCTHULU Logs

Data

  • joins
    • user
    • host
    • ident

Metadata

  • The time the logs resume.
  • The time the logs stop after a kline.

Their Local Logs

Data

  • joins
    • user
    • host
    • ident
    • realname
  • registration
    • email
    • host
    • user
    • ident

Metadata

  • age of registration on join
  • profile of joined channels
  • vps provider profile

Conclusion

There will be varying levels of misanalysis that this will need to depend on.  One notable example involves the police-state posture some of the staff will take, klining random suspected users with no evidence or usable data, which I’ve already seen them start to do.  They’ll end up klining whole network blocks, which will start to have a user impact.  This is to my advantage: the hidden pressure they place on themselves, their communities, and their users will manifest as curbed growth, and even shrinkage in some communities.  There will be missed hits based on metadata manipulation on my end.  VPS usage on the study network will drop to almost nothing after a while.

This is a ‘good’ masked as a ‘bad’: those people need to go, and eventually will over it.  It will also help identify them to network owners, who generally do care about such things.  And if the owners don’t care, they’ll appropriately lose hard.  If their priorities are right, I win.  If they aren’t, I still win.

I’m still analyzing their field of vision and ways to scramble it, but there’s been pretty good progress after yesterday’s bait-and-tackle.

Needs

At this point I have identified a need to introduce a control channel in the message bus being used for the data feeds.  This should relay control commands to the tenta clients to turn feeds on and off.

There will also need to be a reporting channel, processed differently from the feeds, to give a wide view of client status so that I can start development on orchestration components.  Using syslog was a bad idea for this use case.
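A sketch of what those two channels might carry (topic names and the JSON shapes are hypothetical; the post doesn’t specify the bus or its message format):

```python
import json

# Hypothetical bus topics, not from the project:
CONTROL_TOPIC = "ircthulu.control"   # orchestrator -> tenta clients
REPORT_TOPIC = "ircthulu.report"     # tenta clients -> orchestrator

def control_msg(client_id: str, command: str) -> str:
    """Control frame: tell one client (or '*' for all) to start or stop feeding."""
    assert command in ("feed_on", "feed_off")
    return json.dumps({"to": client_id, "cmd": command})

def report_msg(client_id: str, feeding: bool, pooled: int) -> str:
    """Status frame: processed separately from the data feeds, giving the
    wide view of client state that orchestration needs."""
    return json.dumps({"from": client_id, "feeding": feeding, "pooled": pooled})

print(control_msg("*", "feed_off"))
print(report_msg("runner-7", False, 1342))
```

Keeping status on its own topic means the orchestrator never has to parse feed traffic just to learn whether a client is alive, which is the mistake syslog forced.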

Deployment needs to be re-evaluated.

Identity generation needs to be re-evaluated.

New Year Trouble

Well, I’ve got good news and bad news.

The operation conducted over pretty much all of today to break the feedback loop for Freenode and OFTC staff uncovered a minor but critical vulnerability in the data shape produced by tenta.

Now that it’s mostly over, or at least mitigated, I can reveal the details.

Problem

Technically it’s not a bug, as the issue is in the “negative spaces” of the data that’s created.  When the tenta client joins a channel, it currently omits its own user from the logs.  This is bad, because that omission can be used to root out the runners.  I’ll explain more below.
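The shape of the fix is simple: stop special-casing the client’s own user. A minimal sketch (function names and the line format are mine, not tenta’s actual internals):

```python
def format_join(channel: str, nick: str, ident: str, host: str) -> str:
    """Render a join line for the feed."""
    return f"JOIN {channel} {nick}!{ident}@{host}"

def capture_join(channel: str, nick: str, ident: str, host: str,
                 self_nick: str, lines: list) -> None:
    """Capture every join, including the client's own.

    The vulnerability: filtering out self_nick leaves a 'negative space'
    (a logged channel with one present user who never appears to join)
    that staff can cross-reference against local logs to find the runner.
    """
    # BAD (old behavior): if nick == self_nick: return
    lines.append(format_join(channel, nick, ident, host))
```

The `self_nick` parameter stays only so the old filtering behavior is visible in the comment; the fix is that it is no longer consulted.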

Certainty

I’ve been able to confirm that this is the method the staff were using to identify the “bait bots”.  I’d originally thought they were processing some server-side information, and I’m sure they did in some cases, but I was able to conduct thorough A/B and isolation tests to verify that they are also cross-referencing their local logs with Presenta logs.  I found this by making minor adjustments to their field of vision, then repeatedly letting a bait hook in a controlled manner, comparing page views to klined bots in a predetermined way after assessing what their visible data points were.  They were processing the joins listed in the Presenta logs, checking for missing user data there, and comparing against their local logs.

Impact

This has to be fixed before we can use any more runner data.  When I first suspected the problem, I went ahead and deleted random rows from the database to obfuscate the existing data, so we don’t have to lose the whole database, but I will not be turning the feeds back on until the next update to tenta.  All pooled data is useless without compromising the runners.

Otherwise, A Relative Success

In other, better news, the staff used approximately 6,086 IP addresses in total during the operation to view the logs.  I think we’ve just about got their loop compromised.

Here is a list of those IP addresses, in case you’d like to do something similar on a rogue clone of IRCTHULU PRESENTA hosted on your own PHP/Apache server.  Dropping this in an include should pretty much ghost out the whole Tor network, most VPNs known for abuse, and almost all of the relevant staff’s various proxies and owned IP addresses:

http://paste.silogroup.org/axohacugej.apache

The process for adding them to the ban list was automated about 10 minutes in, but I needed to disable the banning for a good long stretch or they’d have caught on to what was really going on.  One of them was really smart and added some well-crafted characters to try to slide through a grep, and I didn’t see what they were doing until about an hour in.  Whoever that was knew exactly what was up.

Some of them will still be able to access it, but it’s pretty straightforward now.  This buys plenty of time, since I can’t use the runners until the tenta update.  A new version of Nerve will accompany it, adding the ability to clear out pooled messages on restart.

I’m pretty excited — this was a total blast.  This whole project’s been like that.

Recap

  • This operation did indeed confirm that OFTC and FNODE network staff are actively targeting the runners.
  • The FNODE and OFTC feedback loop is mostly broken, so they won’t be able to for much longer.
  • They did my bug testing and risk analysis for me today, identifying the vulnerability they’d use to find the runners.
  • Unfortunately it was significant enough that I can’t turn the runners back on without compromising their identities.
  • I obtained excellent data that can be leveraged to conduct “further WTF”, which I will certainly be doing.

Operation in Progress

Yes, the Feeds are Disabled

You might have noticed I’ve turned the feeds off.

Relax.  Your runners will pool messages until I turn the feeds back on.

I’m conducting a mixed signals operation to ensure you’re protected from your network.  They’re actively targeting some of you.

I’ll flip the switch back on when it’s done.  This needs to happen.  We’ve got to shut off their eyes.

Other News

IRCTHULU just got Google-indexed.  I need to fix that title for the next crawl.

http://i.imgur.com/ShWCc4J.png

-C

IRCTHULU is LIVE

The Presenta layer is complete.

Presenta is an “example” UI.  I hope someone comes along and builds a more robust, fully featured UI, but this one works, and it works on mobile phones.

Official IRCTHULU Log Portal:

presenta.silogroup.org

IRCTHULU ARRIVES

The Example UI is crude, simple, and functional.

The API and UI will be moved to alpha and will be publicly exposed later today.

This version should be rather sturdy.

An announcement with URLs will be made shortly after the new services are up.

New Direction for Presenta

I’ve decided to take the Presenta example UI in a different direction.

Since this will be tailored primarily for Google-indexed log viewing and identity research, we need something more palatable to the index crawlers than a static HTML page would easily provide.

So I’m moving it to a drill-down style navigation with breadcrumbs, and making it PHP-based to speed things up a bit.

High Level Nav Structure

[Log Selector] -> [Log Viewer] -> [Identity Research]

Log Selector

[Network] -> [Channel] -> [ Date Start, Date End]

Log Viewer

[Log]

Identity Research

[Host Research] | [Ident Research] | [User Research]

Host Research

[ Host, Associated Idents[], Associated Users[] ]

Ident Research

[ Ident, Associated Hosts[], Associated Users[] ]

User Research

[ User, Associated Hosts[], Associated Idents[] ]
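The three research views above form one symmetric cross-reference structure.  A sketch of the backing index (class and method names are mine; the real backend sits behind the PHP UI and is presumably SQL):

```python
from collections import defaultdict

class IdentityIndex:
    """Cross-reference joins by host, ident, and user.

    Each research view is the key field plus the sets of values it
    co-occurred with, mirroring the drill-down pages above.
    """
    def __init__(self):
        self.by_host = defaultdict(lambda: {"idents": set(), "users": set()})
        self.by_ident = defaultdict(lambda: {"hosts": set(), "users": set()})
        self.by_user = defaultdict(lambda: {"hosts": set(), "idents": set()})

    def add_join(self, user: str, ident: str, host: str) -> None:
        """Record one observed join into all three views at once."""
        self.by_host[host]["idents"].add(ident)
        self.by_host[host]["users"].add(user)
        self.by_ident[ident]["hosts"].add(host)
        self.by_ident[ident]["users"].add(user)
        self.by_user[user]["hosts"].add(host)
        self.by_user[user]["idents"].add(ident)
```

Because every join is written into all three views, any single field observed in a log is enough to pivot into the other two.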

-C

Presenta in Early Draft

The final layer is now in early draft.

I have not yet exposed the API to the public, as I’ve ended up reshaping a lot of it to make it easier to build sites on top of.

Presenta is going to have a red, white, and blue theme.  Here’s a screenshot of where it currently stands, before I start dropping in the identity research and cross-referencing modal windows.