Discussion:
Let's decide on the inclusion criteria
Ryan Thompson
21 years ago
Hey all,

Here are my (edited) thoughts on how and why domains should be included
in UC. There are some fuzzy parts, so comments are not only welcome, but
strongly encouraged. :-) Editorial comments/questions in [ square
brackets ].

1. Domains worthy of inclusion into outright blocklists (WS, AB, et al.)
should still be submitted directly to those lists, and not
submitted to UC. We don't want to list *everything*... just domains
that can't otherwise be included in SURBL.

2. Domains that have significant definite legitimate uses should *not*
be sent to UC. [ I need help defining what "significant" means! Can
we draw some kind of line, even a broad one, for now? ]

3. Everything else in the middle ground is a UC candidate. We only want
to list domains that are very spammy. Domains can be submitted (with
justification) to this mailing list, or they can be submitted
privately to ***@sasknow.com.

[ Should we only accept domains that have already been hand-checked by
the submitters? ]

To be included in the list, domains need at least two *hand-checked*
votes. [ Should the submitter count as a vote? Should I
count as a vote? Should we require three votes? I'm thinking two is
sufficient and will keep the inclusion lag to a minimum. ]
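
[ A minimal sketch of the voting rule above, assuming each vote is recorded against a voter ID. The names `Candidate`, `vote`, and `is_listable` are hypothetical, and whether the submitter's own vote counts is left to whoever records the votes, since that question is still open. ]

```python
from dataclasses import dataclass, field

# Two votes is the figure proposed above; three was floated as an alternative.
REQUIRED_VOTES = 2


@dataclass
class Candidate:
    """A domain awaiting inclusion (hypothetical structure, for illustration)."""
    domain: str
    # IDs of people who hand-checked the domain and voted to include it.
    hand_checked_votes: set = field(default_factory=set)

    def vote(self, voter_id: str) -> None:
        # A set ignores repeat votes from the same person.
        self.hand_checked_votes.add(voter_id)

    def is_listable(self, required: int = REQUIRED_VOTES) -> bool:
        # Listable once enough distinct people have hand-checked it.
        return len(self.hand_checked_votes) >= required
```

Counting distinct voters rather than raw votes keeps one enthusiastic reporter from listing a domain alone.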

I'll privately track the number of unique submissions, and maybe a few
other bits of meta-data.

- Ryan
--
Ryan Thompson <***@sasknow.com>

SaskNow Technologies - http://www.sasknow.com
901-1st Avenue North - Saskatoon, SK - S7K 1Y4

Tel: 306-664-3600 Fax: 306-244-7037 Saskatoon
Toll-Free: 877-727-5669 (877-SASKNOW) North America
--
The SURBL Unconfirmed mailing list: uc-***@sasknow.com
More Information: http://uc.sasknow.com/
Alex Broens
21 years ago
Please note that the following is my personal opinion, often written in
a blunt way (I'm no diplomat), so pls don't take any of my comments
personally.
Also take into account that English is not my first language.
Post by Ryan Thompson
Hey all,
Here are my (edited) thoughts on how and why domains should be included
in UC. There are some fuzzy parts, so comments are not only welcome, but
strongly encouraged. :-) Editorial comments/questions in [ square
brackets ].
1. Domains worthy of inclusion into outright blocklists (WS, AB, et al.)
should still be submitted directly to those lists, and not
submitted to UC. We don't want to list *everything*... just domains
that can't otherwise be included in SURBL.
[ Nope... reports have to be fresh. Spammer trends should be caught
before they've flooded everyone, and if I have to decide where to put a
domain and go somewhere else to do it 150 times a day, I end up not
reporting: I just dump them in a local zone, my data stays at home,
and nobody gets the benefit of it. A reporter won't have the patience to
jump back and forth. ]
Post by Ryan Thompson
2. Domains that have significant definite legitimate uses should *not*
be sent to UC. [ I need help defining what "significant" means! Can
we draw some kind of line, even a broad one, for now? ]
[ mardoxDOTcom is one of these examples... the phrase "legitimate use"
makes me get the surbl.org itch...]
Post by Ryan Thompson
3. Everything else in the middle ground is a UC candidate. We only want
to list domains that are very spammy. Domains can be submitted (with
justification) to this mailing list, or they can be submitted
[ Should we only accept domains that have already been hand-checked by
the submitters? ]
[ I believe the public report form @ the rulesemporium is one of the
most dangerous procedures imaginable. No way to track reports.

Submitting to a list or to any other person delays publishing the data,
possibly making it stale in a matter of 30 minutes.

Submissions should not be totally public... I like Bill's method, with
which he can track what comes from whom, and where the reporter's manual
intervention is kept to a productive minimum. ]
Post by Ryan Thompson
To be included in the list, domains need at least two *hand-checked*
votes. [ Should the submitter count as a vote? Should I
count as a vote? Should we require three votes? I'm thinking two is
sufficient and will keep the inclusion lag to a minimum. ]
[ So if I report a flood while you are snoring, you'll be flooded by the
time you take a look at my submissions... and by then the flood has
stopped.

I report approx. 25 domains/day between 06:00 and 11:00 GMT+2.
Each is manually checked at the emporium and 99% are new domains.
Many are so fresh that by now I can tell they're fresh.
I update my local zones every minute and beam to Bill every 5 or 6 minutes;
if SURBL updated more often, that data would be even fresher. ]
Post by Ryan Thompson
I'll privately track the number of unique submissions, and maybe a few
other bits of meta-data.
And pls let us know these stats... it's the only way to know if our
data is of any use.

Thanks for letting me make some noise....

Alex





Chris Santerre
21 years ago
...
Just a few things.

English is my first language, and you speak it better than me :)

Oh, I can track submissions thru rulesemporium, we just don't make it public
that we can do that ;)

You are right on the freshness issue. I am only one man.

For the most part I agree with Ryan on these issues. BL first if you can, UC
second.

And lastly, NY red chowder, is NOT real chowder!

--Chris
Ryan Thompson
21 years ago
Ugh. Sorry for slow replies. I hurt.
Post by Chris Santerre
Post by Alex Broens
And pls let us know these stats... it's the only way to know if our
data is of any use.
Thanks for letting me make some noise....
Just a few things.
English is my first language, and you speak it better than me :)
Oh, I can track submissions thru rulesemporium, we just don't make it
public that we can do that ;)
You are right on the freshness issue. I am only one man.
Even if there were a few of us checking and including submissions (so
far I'm the only guy for UC :-) delays are going to be inevitable.
However, I think the additional hand-checking is important. I know we're
less stringent than outright BLs, but I've already had submissions to
***@sasknow.com containing domains that haven't been hand-checked at all
by *anyone*, and, in going through a few of the domains, a large
proportion of them are obvious white-hat domains which were, at best,
collateral damage along with real spam.

I suppose we could look at some sort of layered trust system where there
is a very small kernel of highly trusted individuals who have 100%
inclusion and veto power (i.e., direct access to the list data), but
even that, in many cases, would rely on consensus. Outside of that, we
build layers of folks, in order of coolness. Submitters with known
methods and excellent proven track records may well be able to have
their domains automatically added to the list, and then put in a queue
to be hand-checked by a kernel member within a day or so. That eliminates
the lag, and greatly mitigates the impact of FPs being discovered weeks
or months later.

Outside of that, we could have any number of submitters whose
submissions are automatically pending review. These would essentially be
"from the public". Kernel members can approve or veto on one vote. For
scalability, we can allow submitters in the "cool" layer to comment on
submissions, and, maybe after a couple of votes from the "cool" group,
domains can be automatically included. As long as we have representation
from a few time zones, things can still happen pretty quickly.
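
[ The layering described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the layer names, the `status` function, its return strings, and the choice of two as "a couple of votes" are all hypothetical, not an agreed design. ]

```python
from enum import Enum


class Layer(Enum):
    KERNEL = "kernel"    # small core with inclusion and veto power
    TRUSTED = "trusted"  # the "cool" layer: known methods, proven track record
    PUBLIC = "public"    # everyone else; submissions start out pending


# "A couple of votes" from the trusted layer; the exact number is an assumption.
TRUSTED_VOTES_NEEDED = 2


def status(submitter, kernel_veto=False, kernel_approvals=0, trusted_votes=0):
    """Return the listing status of one submitted domain."""
    if kernel_veto:
        # A single kernel veto overrides everything else.
        return "rejected"
    if submitter is Layer.KERNEL or kernel_approvals >= 1:
        # Kernel members approve on one vote.
        return "listed"
    if submitter is Layer.TRUSTED:
        # Added right away to avoid lag, then hand-checked by a kernel member.
        return "listed, queued for kernel review"
    if trusted_votes >= TRUSTED_VOTES_NEEDED:
        # Enough trusted-layer votes include a public submission automatically.
        return "listed"
    return "pending review"
```

For example, `status(Layer.PUBLIC)` stays "pending review" until either one kernel member approves or two trusted submitters vote for it.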

Of course, we can have as many or as few layers as we need. At first, we
probably just need the kernel group (1-3 people max) and another group
of trusted submitters (4-5?).

It doesn't have to be terribly formal or complicated... this can all be
done on closed mailing lists for now.

This is just one way to do it... and maybe, for now, we can keep it a
bit simpler, and move towards more rigour as volume increases.
Post by Chris Santerre
For the most part I agree with Ryan on these issues. BL first if you
can, UC second.
Yeah, I really think it pays to have "bins" of domains that fall most
strongly into one category. I'll concede that there *may* be some
overlap, but, in this case, I think we should strive to avoid as much
overlap as possible. Certainly, listing in BL should trump listing in
UC. If the domain is listed in SURBL, there's no real point to us
listing it, too.
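
[ As a rough sketch of "BL trumps UC": assuming the blocklisted domains are available locally as a plain collection of names, submissions could be filtered before they ever reach UC. The function name `uc_candidates` and both arguments are hypothetical. ]

```python
def uc_candidates(submitted, blocklisted):
    """Return submitted domains that are not already on an outright blocklist.

    Names are lowercased and any trailing dot is dropped before comparing,
    and duplicate submissions are collapsed, keeping first-seen order.
    """
    blocked = {d.lower().rstrip(".") for d in blocklisted}
    seen = set()
    result = []
    for domain in submitted:
        key = domain.lower().rstrip(".")
        if key in blocked or key in seen:
            continue  # already in a BL, or already queued for UC
        seen.add(key)
        result.append(key)
    return result
```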
Post by Chris Santerre
And lastly, NY red chowder, is NOT real chowder!
;-) Also, 7-Up isn't green. I dunno if they have 7-Up in Europe..

- Ryan
Ryan Thompson
21 years ago
Post by Ryan Thompson
Ugh. Sorry for slow replies. I hurt.
Oh yeah, and, what I meant to say first, was, "great ideas, guys!" Once
the fog clears in my head, this'll be way more fun. :-)

- Ryan
Bret Miller
21 years ago
I received an off-list message from somebody wanting to help
out. Just in case they didn't want to identify themselves
on-list, I've removed any identifiable bits so I can forward
my reply, because it is generally applicable to folks that
want to help.
No... Wasn't trying to be anonymous. Just have to get used to using "reply
all" on another list. Most of the lists I use, a simple reply gets directed
back to the list, so it's just a matter of retraining myself for this one.

I will (when I have a few minutes) download getURI and see how it works on
windoze. The other thing is that my web mail client has very good search
capabilities, so I can easily check a domain against the spam/ham folders and
double-check that the messages it hits on are really classified correctly.

I may not be able to keep up with checking all the domains against my
messages if the volume gets large, but I'll certainly try to do what I can.

Plus, it's been my practice to only submit false negatives when they occur.
It'd probably be better for all of us if I could take the time occasionally
to go through my spam corpus, pull all the domains, hand-check them, and
report everything.

Bret

Ryan Thompson
21 years ago
...
:-) OK, I just didn't want to jump the gun. :-)

Everyone, I can set an automatic Reply-To for this list. Would that be
preferable? Personally, I'm more used to replying to all, but this is
your list, too. :-)
Post by Bret Miller
I will (when I have a few minutes) download getURI and see how it
works on windoze. The other thing is that my web mail client has very
good search capabilities, so I can easily check a domain against the
spam/ham folders and double-check that the messages it hits on are
really classified correctly.
Cool! As for GetURI, you can let me know off-list if you have any
questions/problems. I really have no idea how it'll run in Windows, but,
if it's not too difficult, I'll see if I can make it work. Check the
README first for the other module requirements.
Post by Bret Miller
I may not be able to keep up with checking all the domains against my
messages if the volume gets large, but I'll certainly try to do what I can.
Sure. At least with a larger corpus, you can still just take a smaller
sample.
Post by Bret Miller
Plus, it's been my practice to only submit false negatives when they
occur. It'd probably be better for all of us if I could take the time
occasionally to go through my spam corpus, pull all the domains,
hand-check them, and report everything.
Sure, that'd be great.

- Ryan
Bret Miller
21 years ago
Post by Ryan Thompson
Cool! As for GetURI, you can let me know off-list if you have
any questions/problems. I really have no idea how it'll run
in Windows, but, if it's not too difficult, I'll see if I can
make it work. Check the README first for the other module
requirements.
See the requirements. This will have to wait until I get SA 3.0 installed, I
see. Since the release was "imminent" in July, I decided to wait "a couple
weeks" for the released version. And now, it's been a couple months, so I'm
wondering if I should just install RC3 for now... Guess I'll see how my free
time goes.

So, the question then remains in my mind, how do I get access to submitted
domains to check them? Or will they simply be posted to this list?

Bret


Ryan Thompson
21 years ago
...
Actually, if you don't mind trying the 1.6-DEVEL version, I have
re-added support for 2.6x (by popular demand :-). It just needs some
testing. It's not linked on the site, but I can send you a copy
off-list. Just let me know.
Post by Bret Miller
So, the question then remains in my mind, how do I get access to
submitted domains to check them? Or will they simply be posted to this
list?
Working on that (actually, that was the original point of this thread).
Since subscription to this list is open, and submissions are private, I
don't think it'd be a good idea to post submissions. However, I could
indeed grant access to the ***@sasknow.com address to a short list of
submission checkers... or pre-process submissions with GetURI and then
send those to a select few checkers. What'd be the best?

- Ryan
Bret Miller
21 years ago
Post by Ryan Thompson
Actually, if you don't mind trying the 1.6-DEVEL version, I
have re-added support for 2.6x (by popular demand :-). It
just needs some testing. It's not linked on the site, but I
can send you a copy off-list. Just let me know.
Cool. Sure! I'm always up for testing stuff when it doesn't affect my
production server.
...
I'm new to this process myself, so my opinion doesn't come from
experience. But you could set up a second list for checkers and just send
out the domains there; checkers could then do their stuff and report their
comments back to that list. In fact, you could easily make the submission
form simply send the submission to the checker list.

Bret

Ryan Thompson
21 years ago
Post by Bret Miller
Post by Ryan Thompson
Actually, if you don't mind trying the 1.6-DEVEL version, I have
re-added support for 2.6x (by popular demand :-). It just needs some
testing. It's not linked on the site, but I can send you a copy
off-list. Just let me know.
Cool. Sure! I'm always up for testing stuff when it doesn't affect my
production server.
:-) Wise thinking.
...
Yeah. That's more or less what I had in mind, I think.

I could also just make the GetURI reports for submissions available on a
hidden or password-protected page, and anonymize the submitter (but
assign each submission a unique ID so I can track it down). Would any
submitters have objections to that?

- Ryan
Bret Miller
21 years ago
...
Either way works for me. I guess I prefer the e-mail method since that
doesn't require me to actually go check a page to see if there are new
submissions. It's just a personal preference though-- fits in better
with how I work.

I guess there must be some submitters that want to be anonymous. I don't
care if the checkers know that it was me submitting a domain. But I
prefer not to be in the spotlight either. So if it means ID-ing the
submissions and hiding the submitter from the rest of us, that's fine
with me too.

Bret




Ryan Thompson
21 years ago
...
Yeah, we'll still send things out via email.
Post by Bret Miller
I guess there must be some submitters that want to be anonymous. I
don't care if the checkers know that it was me submitting a domain.
But I prefer not to be in the spotlight either. So if it means ID-ing
the submissions and hiding the submitter from the rest of us, that's
fine with me too.
Agreed. Let's do it... unless anyone has objections.
I've started assigning submitter IDs and using those in the TXT field of
the entry, along with the count, in the format of:

YYYYMMDD-ID,ID,...

Where YYYYMMDD is the date the domain was added to UC, and the IDs are
4-digit IDs for submitters and approvers. I may remove that from the
zone and store it offline.
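
[ A small sketch of how a checker might read that TXT format back, assuming the value is exactly `YYYYMMDD-ID,ID,...` with 4-digit IDs as described. The function name `parse_uc_txt` and the sample value in the usage note are made up for illustration; the count mentioned above is simply taken as the number of IDs. ]

```python
from datetime import datetime


def parse_uc_txt(txt):
    """Split a 'YYYYMMDD-ID,ID,...' TXT value into (date_added, [ids])."""
    date_part, sep, id_part = txt.strip().partition("-")
    if not sep or len(date_part) != 8:
        raise ValueError("expected YYYYMMDD-ID,ID,...: %r" % txt)
    # strptime raises ValueError if the date portion is not a real date.
    added = datetime.strptime(date_part, "%Y%m%d").date()
    ids = id_part.split(",")
    for ident in ids:
        # Submitter and approver IDs are described as 4-digit numbers.
        if not (len(ident) == 4 and ident.isascii() and ident.isdigit()):
            raise ValueError("bad ID %r in %r" % (ident, txt))
    return added, ids
```

For a made-up entry such as `20040915-0001,0042`, this returns the date 2004-09-15 and the two IDs, so the vote count is `len(ids)`.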

- Ryan
Ryan Thompson
21 years ago
I received an off-list message from somebody wanting to help out. Just
in case they didn't want to identify themselves on-list, I've removed
any identifiable bits so I can forward my reply, because it is generally
applicable to folks that want to help.
Of course, I'd love to help in any way I can. However, I don't have a
large corpus of anything since we just don't have that high a volume.
Of course, large corpora are nice, but they aren't strictly necessary.
You may indeed be seeing some unique spams that the rest of us don't.
And, even if you're seeing something we're seeing, too, having the extra
data point would lead to higher accuracy.

Also, your ham corpus, regardless of size, might be just as valuable to
help identify legitimate uses and FPs.
Also, we're a purely Windows shop, so that generally leaves me lacking
for tools for doing the checking anyway. But I'll help if I can.
Well, if your volume isn't that high (what does that mean? 100 spams a
day? 1000? 10000?), that's fine. Even if it's 100, I could likely just
absorb your messages into my own corpora. Assuming that, with your low
volume, you're carefully hand-classifying ham and spam messages, there are a
bunch of ways I see you (or anyone!) being able to help:

1. Use GetURI against your corpus. It really shines in batches of a few
hundred to a few thousand messages. AFAIK, no one has ever tested it
on Windows, but it does use Perl and SA3.0, so it may be within the
realm of possibility. I could use a good Windows tester. :-)

2. Send me (privately) your hand-classified spam feed. You could do this
via an IMAP folder on our server (so you just "Save" the messages to
a different folder), or, if your Windows stuff uses (or can export
to) maildir or mbox, I can do the processing and let you go through
the results.

3. Help us go through submissions. Research submitted domains and make
your comments. Most of the research can be done with web-based tools,
and there are also freeware Windows versions of many UNIX utilities
like whois, dig, host, etc, to make your life a little easier.

4. Post your perspectives and suggestions on this mailing list.

5. If you're handy with HTML and/or graphics, the site at
http://uc.sasknow.com/ could use maintaining. The initial version
took me 32 minutes from scratch with vim, judging from the timestamps
in the public_html directory. ;-) We'll need to publish submission
criteria (when we have some), and possibly set up some forms for
public submission (or, Chris, are we going to piggy-back on the
SARE/SURBL one?)

6. Watch the SURBL discuss list and other lists for reports of domains
that may be "grey" and worthy of inclusion. I know there were a few
posted to SURBL discuss (I'm thinking of the ones Chris posted) that,
with a little hand-checking, would probably be excellent candidates
for UC.

- Ryan