I have a quote reject script (one from the Smartlist FAQ) set on all my lists to keep quoting to a minimum. Lately, more and more of my subscribers -- mostly newbies (both on my list and on the Internet itself) -- are using unconventional quoting styles and their posts are sneaking past the quote reject, often with massive amounts of requoting.

My current quote reject script looks for the standard greater-than carets (>) at the beginning of lines. I now have several people who simply put quotation marks around quoted sections. For example:

"This is stuff someone else said. I'm requoting it and then I'm going to follow it by saying nothing more than that I agree with what they said. I will not add anything of substance in my response, and I will do this repeatedly throughout the rest of my 40k post to this list ... today, tomorrow, and as long as I am a subscriber."

And then they write their response and go into the next quoted paragraph. It's annoying as all hell and I'm getting complaints.

Has anyone here devised a script to reject based on this style of quoting when it's used excessively? I would really rather not have to start siphoning off certain subscribers' posts just to count the lines by hand. That's such a pain. But my requests for these folks to monitor themselves have not worked. I have to do something.

Any help or suggestions are appreciated.

Violet
xoxox
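For illustration only, here is a rough sketch of how the quotation-mark style described above might be measured. It is not from the Smartlist FAQ or from any existing quote reject script; the function name quoted_percent is made up for this example, and it assumes quoted sections are whole paragraphs (separated by blank lines) that begin and end with a double quote.

```shell
#!/bin/sh
# Hypothetical helper (not part of SmartList): read a message body on
# stdin and print the percentage of non-blank lines that sit inside
# paragraphs wrapped in double quotes.
quoted_percent() {
    # RS="" puts awk in paragraph mode: each blank-line-separated block
    # is one record, and with FS="\n" each line of it is one field.
    awk 'BEGIN { RS = ""; FS = "\n" }
         {
             total += NF
             # A paragraph counts as quoted when it opens and closes
             # with a double quote (surrounding blanks are allowed).
             if ($0 ~ /^[ \t]*"/ && $0 ~ /"[ \t]*$/) quoted += NF
         }
         END {
             if (total == 0) print 0
             else print int(100 * quoted / total)
         }'
}
```

A wrapper could compare the printed percentage against a threshold of the list owner's choosing and reject the post when it is exceeded. Paragraphs that merely happen to start and end with a quotation mark would be counted too, so the threshold would need some tuning.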
On Thu, 22 Nov 2001 violet@torithoughts.org wrote:
My current quote reject script looks for the standard greater-than carets (>) at the beginning of lines. I now have several people who simply put quotation marks around quoted sections.
Has anyone here devised a script to reject based on this style of quoting when it's used excessively?
The following is a shell script I wrote some years ago for a list that collects topical postings for an interest group. By nature, lots of the subscribers would come across the same article on different news sites or what-have-you at about the same time, and would all forward them to the topical list without checking whether they'd been forwarded already. The script was intended to eliminate most of these duplicate postings.

It works by searching the list archive for previous postings that contain "too much" of the same content as the new posting. If it finds something, it outputs a message to be mailed back to the poster and exits 0.

Note that it has a section that attempts to exclude follow-ups from the filtering; it's from the beginning of the while-do loop up to the "At this point ..." comment. Simply delete that to filter for excessively-quoted follow-ups. You may need to fiddle with the similarity ratios in that case.

To use this, put it in a file named "similar" in the .bin directory of the smartlist installation, uncomment the RC_LOCAL_SUBMIT_20 lines in rc.submit, and add the following recipe lines to the rc.local.s20 file:

--- recipe ----
:0 fW
| similar

:0 a
! $sendmailOPT -t
---- end recipe ----

About the "Zanshin Public License" -- it's essentially identical to the Mozilla Public License. It says that you can use the following all you like, with or without changes, as long as you don't republish it, but that if you republish it with any modifications then you have to add your name to the Contributors list and clearly document what parts of the new file comprise your changes.

---- 8< ---- snip ---- 8< ---- snip ---- 8< ----
#!/bin/sh
# The contents of this file are subject to the Zanshin Public License
# Version 1.0 (the "License"); you may not use this file except in
# compliance with the License. You may obtain a copy of the License at
# http://www.zanshin.com/ZPL.html
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
# License for the specific language governing rights and limitations
# under the License.
#
# The Original Code is SQK version 0.1, Thu Sep 24 17:27:11 PDT 1998.
#
# The Initial Developer of the Original Code is Zanshin, Inc.
# Portions created by Zanshin are Copyright (C) 1998 Zanshin, Inc.
# All Rights Reserved.
#
# Contributor(s): ______________________________________.

# This script is intended to run as a procmail recipe filter from one of
# the RC_LOCAL_SUBMIT scripts in the SmartList mailing list manager. It
# expects to be run from the top-level directory of a SmartList-managed
# mailing list, and to find an RFC822 (including MIME) message on stdin.

# The filter exits with status 0 if the input message is substantially
# similar to another message already in the archive of the mailing list,
# and is not a reply to that similar message. Otherwise the filter exits
# nonzero. The message is not delivered anywhere in either case.

# The algorithm:
# 1. Extract "interesting" lines from the body of the input message.
# 2. Search archived messages of approximately the same size as the input
#    for those lines. (Some size fudge is permitted, to detect e.g. the
#    case where a digest forwarded to the list contains messages that have
#    already been submitted individually.)
# 3. If any archived message contains a significant number of the input
#    lines, check whether the input message is a reply to that message.
# 3a. If so, let the input message pass.
# 3b. If not, exit 0 (return successful filter completion to procmail).
# 4. After searching all archived messages, exit 1.

# Configuration:

min_interesting=15      # Don't filter messages shorter than this (lines)
cmp_ratio="1 / 2"       # Compare to messages bigger than size * cmp_ratio
same_ratio="1 / 2"      # Same if more than same_ratio of lines in common

temppfx=archive/$$
cmpfile=${temppfx}cmp.tmp
hdrfile=${temppfx}hdr.tmp
exitcode=${temppfx}exit.tmp

# SmartList oddities:

# There's no hook in the default SmartList rc.submit for adding rules to
# test whether the message should be archived. The closest we can come
# without modifying rc.submit is RC_LOCAL_SUBMIT_20, which happens after
# the message has been archived. We can check $ARCHIVE to see if the
# input message is already in the archive directory somewhere.

skip=`echo "stdin$ARCHIVE" | sed 's@.*/@@'`

# Unless testing, bail when not run by SmartList.
case $- in
*[xn]*) ;;
*) [ -z "$listaddr" ] &&
    echo 1>&2 "This script should be run by SmartList from rc.submit" &&
    exit 64 ;;
esac

# This silly stuff is to work around shell forking behavior, so that
# the exit codes are correct. Not all shells exit on "exit", sigh.
echo 1 > $exitcode || exit 1
trap 'fail=`cat $exitcode`; /bin/rm -f ${temppfx}*.tmp; exit $fail' 0 1 2 3

# Strip blank and otherwise uninteresting lines and excerpt prefixing
# from the incoming message. This is the data we use to check for a
# similar message already in the archive. Also stash the header so
# we can parse interesting stuff out of it later.
: > $hdrfile
sed -e '1,/^$/'"{
w $hdrfile
d
}" \
    -e 's/^[>}: ]*//' \
    -e '/.....................$/!d' \
    -e '/^[- ~!@#$%^&*()_+={}:;,.<>?/]*$/d' > $cmpfile || exit 1

# Compute the sizes of the incoming data and the relative sizes of messages
# in the archive to be interested in. The curly-brace construct here is
# for silly shells like bash that fork the right-hand-side of pipelines,
# thus making a pipe into read useless without it.
wc < $cmpfile | {
  read lsize wsize csize
  [ "$lsize" -lt "$min_interesting" ] && exit 1
  chalf=`expr $csize \* $cmp_ratio`
  lhalf=`expr $lsize \* $same_ratio`
  messageids=`formail -x in-reply-to -x references < $hdrfile`

  # Here we search all "large enough" archived messages for lines found in
  # the incoming message, and discard any that don't have sufficiently many
  # lines in common with the incoming message.
  find archive/latest -type f -size +${chalf}c \! -name $skip -print |
  { xargs fgrep -cf $cmpfile ; echo x:$lhalf ; } |
  sort -nrt: +1 | sed -e "/:$lhalf"'$/,$d' -e 's/:/ /' |
  while read archived same
  do
    # We now have candidates for messages with equivalent content.
    # Screen out any message to which this one is a reply, in case
    # the sender's only offense is excessive excerpting.
    archiveid=`formail -x message-id < $archived | sed 's/[><]//g'`
    echo "$messageids" | fgrep -q "$archiveid" && break

    # Problem: Some MicroSoft mail programs, when replying, include
    # the text of the original message as if it had been forwarded,
    # and do not use In-Reply-To or References headers. (I believe
    # that the MS Exchange inbox client on Win95 has this problem;
    # clients identifying themselves as "Internet Mail Service" too.)
    # The string "-----Original Message-----" appears in this case.
    MS_original=`fgrep -e "-----Original Message-----" $cmpfile`
    if test -n "$MS_original"
    then
      archivesubj=`formail -zx subject < $archived`
      messagesubj=`fgrep -e "$archivesubj" $hdrfile`
      test -n "$messagesubj" && break
    fi

    # At this point we think we have a duplicated submission.
    echo 1>&2 "$skip: $archived has the same content ($same/$lsize)."

    # Use formail to generate a reply to the input message. Add the
    # anti-loop headers in case we accidentally route to the list.
    sed '/^Resent/d' < $hdrfile | formail -rti"From: $listreq" \
        -A"X-Loop: $listaddr" \
        -I"Precedence: junk"
    echo "Your recent submission to $listaddr includes content"
    echo "that is very similar to the content of a previous submission."
    if [ "$skip" != stdin ]
    then
      echo "Your submission has been recorded at the $listreq"
      echo "archive server as latest/$skip but has NOT been forwarded to the"
    else
      echo "Therefore, your submission has NOT been forwarded to the"
    fi
    echo "recipients of the mailing list. The previous submission is:"
    echo ""
    formail -Xfrom: -Xdate -Xto -Xcc -Xsubject < $archived | sed 's/^/ /'
    echo ""
    echo "You can obtain the entire previous submission by sending a message:"
    echo ""
    echo " To: $listreq"
    echo " Subject: archive"
    echo ""
    echo " get latest/`echo $archived | sed 's@.*/@@'`"
    echo " quit"
    echo ""
    echo "Thank you for your help in reducing duplication on $listaddr."
    echo 0 > $exitcode
    exit 0
  done
  exit `cat $exitcode`
}
exit `cat $exitcode`
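For anyone who wants to experiment with the similarity ratios before wiring the script into rc.submit, the core comparison in the script above can be tried on two plain files. This is only a minimal sketch: the function names interesting_lines and too_similar are made up for the example, the 20-character minimum and the letters-or-digits test are simplified stand-ins for the script's own sed filters, and no headers, archive search, or reply detection are involved.

```shell
#!/bin/sh
# Sketch only: names and thresholds here are illustrative assumptions,
# not part of the script above.

# Print the "interesting" lines of a message body read from stdin:
# strip excerpt prefixes, drop lines shorter than 20 characters, and
# drop lines that contain no letter or digit.
interesting_lines() {
    sed -e 's/^[>}: ]*//' \
        -e '/^.\{20,\}$/!d' \
        -e '/[[:alnum:]]/!d'
}

# too_similar NEWBODY ARCHIVED [NUM] [DEN]
# Succeeds (exit 0) when the number of lines in ARCHIVED that contain
# some interesting line of NEWBODY exceeds NUM/DEN of the interesting
# line count. The default ratio is 1/2, like same_ratio above.
too_similar() {
    new=$1 archived=$2 num=${3:-1} den=${4:-2}
    tmp=${TMPDIR:-/tmp}/similar.$$
    interesting_lines < "$new" > "$tmp"
    # $(( )) strips the padding some wc implementations print.
    total=$(( $(wc -l < "$tmp") ))
    same=$(( $(grep -F -f "$tmp" "$archived" | wc -l) ))
    rm -f "$tmp"
    [ "$total" -gt 0 ] && [ $((same * den)) -gt $((total * num)) ]
}
```

Running something like `too_similar newpost.txt oldpost.txt 1 3 && echo similar` with different ratios gives a feel for how aggressive a setting is; the two file names are placeholders.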
Hi all. I'm currently running a mailing list with over 12,000 subscribers on Smartlist. Many of my subs have Hotmail and Yahoo addresses. The way choplist works right now, I believe it hacks the list up by domain name and sends it out. Trouble is, Yahoo will only allow so many emails before it refuses the rest as "too many recipients". (I'm not even sure how many are allowed before it denies the rest...)

My question is, is there a way to modify choplist to get around this problem? And if so, how? Our mailing list is only going to get bigger...

Please keep in mind that although I am computer literate when it comes to HTML, I am new to mailing list management and need any explanations in laypersons' terms. :)

Thanks a bunch!

Autumn Williams
"B & A" <prjctlnk@pacinfo.com> writes:
Hi all. I'm currently running a mailing list with over 12,000 subscribers on Smartlist. Many of my subs have hotmail and yahoo addresses. The way choplist works right now, I believe it hacks the list up by domain name and sends it out. Trouble is, yahoo will only allow so many emails before it refuses the rest as "too many recipients". (I'm not even sure how many are allowed before it denies the rest...)
My question is, is there a way to modify choplist to get around this problem? And if so, how? Our mailing list is only going to get bigger...
Since choplist should be passing the message and addresses to your local MTA, this is really a problem for your MTA to handle. It should be automatically using multiple SMTP transactions (but not necessarily multiple connections) to deliver to all the recipients. Are you actually seeing bounce messages from this?

Philip Guenther
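To picture what "multiple SMTP transactions" means here, the idea is simply that a long recipient list is cut into batches, and each batch goes out in its own transaction. The sketch below only illustrates that batching; it is not how choplist or any particular MTA is implemented, the function name batch_recipients is made up, and the batch size is arbitrary, since the limit Yahoo actually enforces is not known in this thread.

```shell
#!/bin/sh
# Hypothetical illustration: read one address per line on stdin and
# print one line per batch, with at most MAX addresses on each line.
# Each output line stands for the recipients of one SMTP transaction.
batch_recipients() {
    awk -v max="$1" '
        {
            # Separate addresses within a batch by a single space.
            printf "%s%s", (n ? " " : ""), $0
            if (++n == max) { print ""; n = 0 }
        }
        END { if (n) print "" }'
}
```

For example, `batch_recipients 2` fed three addresses prints two lines, the first with two addresses and the second with one. A real MTA does this kind of splitting internally according to its own configuration.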
participants (4)
- B & A
- Bart Schaefer
- Philip Guenther
- violet@torithoughts.org