Filter on Scandinavian characters in subject

Hi,

When I use recipes like these to filter messages with Scandinavian characters (æ, ø, å) in the Subject, they fail to match. My locale is nb_NO.UTF-8. Is there a recipe that can be used to match these cases?

:0
* ^Subject:.*lån
innboks/IN-spam/

:0
* Subject:.*belønning
innboks/IN-spam/

Jostein

On Fri, 14 Sep 2018, at 15:40, Jostein Berntsen wrote:
when I use recipes like these to filter messages with Scandinavian characters (æ,ø,å) in Subject it fails to work. My locale is nb_NO.UTF-8. Is there a recipe that can be used to match these cases?
:0
* ^Subject:.*lån
innboks/IN-spam/
A proper message must have such characters encoded. Look at the source of the messages. You will see something like this (for "lån"):
=?UTF-8?B?bMOlbg==?=
When you match against this (mind the ? characters and escape them as \?), it should work.
-- Andreas :-)
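
To make the encoded form above concrete, here is a minimal sketch of how the Base64 payload for "lån" can be reproduced on the command line and then escaped for use in a procmail condition. It assumes a `base64` utility (as shipped with GNU coreutils and macOS) and a POSIX `printf`; the variable name `encoded` is just illustrative. The word is spelled with octal escapes so the result does not depend on the terminal's encoding.

```shell
# "lån" as raw UTF-8 bytes: å is the two bytes 0xC3 0xA5 (octal 303 245).
encoded=$(printf 'l\303\245n' | base64)

# Wrap the payload in the encoded-word syntax seen in the message source.
printf '=?UTF-8?B?%s?=\n' "$encoded"
# prints: =?UTF-8?B?bMOlbg==?=

# Escape every "?" so the string can be pasted into a procmail regex.
printf '=?UTF-8?B?%s?=\n' "$encoded" | sed 's/?/\\?/g'
# prints: =\?UTF-8\?B\?bMOlbg==\?=
```

Note that senders may also use the Q (quoted-printable) variant or a different charset, so a pattern built this way only covers one of several possible encodings of the same word.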

On 14.09.18,17:34, Andreas Schamanek wrote:
On Fri, 14 Sep 2018, at 15:40, Jostein Berntsen wrote:
when I use recipes like these to filter messages with Scandinavian characters (æ,ø,å) in Subject it fails to work. My locale is nb_NO.UTF-8. Is there a recipe that can be used to match these cases?
:0
* ^Subject:.*lån
innboks/IN-spam/
A proper message must have such characters encoded. Look at the source of messages. You will see something like (for "lån")
=?UTF-8?B?bMOlbg==?=
When you match against this (mind the ? and escape them as \?) it should work.
Thanks. I solved it doing this:

:0 h
* ^Subject:.*=\?
SUBJECT=| formail -cXSubject: | perl -MEncode -ne 'print encode("UTF8",decode("MIME-Header",$_))'

:0 hE
SUBJECT=| formail -cXSubject:

:0
* SUBJECT ?? ^Subject:.*lån
innboks/IN-spam/

Something for the manual maybe?

Jostein

On 14 Sep 2018, at 09:57, Jostein Berntsen <jbernts@broadpark.no> wrote:
On 14.09.18,17:34, Andreas Schamanek wrote:
On Fri, 14 Sep 2018, at 15:40, Jostein Berntsen wrote:
when I use recipes like these to filter messages with Scandinavian characters (æ,ø,å) in Subject it fails to work. My locale is nb_NO.UTF-8. Is there a recipe that can be used to match these cases?
:0
* ^Subject:.*lån
innboks/IN-spam/
A proper message must have such characters encoded. Look at the source of messages. You will see something like (for "lån")
=?UTF-8?B?bMOlbg==?=
When you match against this (mind the ? and escape them as \?) it should work.
Thanks. I solved it doing this:
:0 h
* ^Subject:.*=\?
SUBJECT=| formail -cXSubject: | perl -MEncode -ne 'print encode("UTF8",decode("MIME-Header",$_))'
By rewriting the message to include UTF-8 characters in the headers you have just made your message invalid as the mail headers can only contain 7-BIT ASCII and anything else must be encoded. However, it's your mail, do as you will. You *will* have issues if you try to do something else with those messages, ever. Like, for example, import them into a different client. Or put them on an IMAP server.
Something for the manual maybe?
No. Andreas gave you the right solution: match against the encoded text in the subject.

:0
* ^Subject:.*=\?UTF-8\?B\?bMOlbg
{
  do stuff
}

Or, save your UTF-8 decoded subject into a variable like UTFSUB=| formail…

-- Space Directive 723: Terraformers are expressly forbidden from recreating Swindon.

On Wed, 19 Sep 2018, at 14:07, @lbutlr wrote:
:0 h
* ^Subject:.*=\?
SUBJECT=| formail -cXSubject: | perl -MEncode -ne 'print encode("UTF8",decode("MIME-Header",$_))'
By rewriting the message to include UTF-8 characters in the headers you have just made your message invalid ...
This is not rewriting a message, it is assigning a variable. It's exactly what you later suggested yourself, and I agree that it is the more versatile solution:
Or, save your UTF-8 decoded subject into a variable like UTFSUB=| formail…
-- -- Andreas :-)
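
One way to see why the decoded variable is the more versatile option: a Base64 payload depends on where a word falls within the encoded text, so a fixed encoded pattern can miss the same word in a longer subject. The sketch below assumes a `base64` utility is available; "Nytt lån" is a made-up subject used only for illustration, and the variable names are hypothetical.

```shell
# "lån" on its own, as raw UTF-8 bytes.
aligned=$(printf 'l\303\245n' | base64)

# The same word preceded by five other bytes ("Nytt ").
shifted=$(printf 'Nytt l\303\245n' | base64)

echo "$aligned"   # bMOlbg==
echo "$shifted"   # Tnl0dCBsw6Vu

# Check whether the pattern for the bare word still appears in the longer payload.
case "$shifted" in
  *bMOlbg*) found=yes ;;
  *)        found=no ;;
esac
echo "$found"     # no
```

Matching against the decoded text avoids this, because the condition then sees "lån" regardless of its position or of the encoding the sender chose.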

On 21.09.18,13:05, Andreas Schamanek wrote:
On Wed, 19 Sep 2018, at 14:07, @lbutlr wrote:
:0 h
* ^Subject:.*=\?
SUBJECT=| formail -cXSubject: | perl -MEncode -ne 'print encode("UTF8",decode("MIME-Header",$_))'
By rewriting the message to include UTF-8 characters in the headers you have just made your message invalid ...
This is not rewriting a message, it is assigning a variable. It's exactly what you later suggested yourself, and I agree that it is the more versatile solution:
So my approach is a good one after all? :) Jostein
Or, save your UTF-8 decoded subject into a variable like UTFSUB=| formail…
-- -- Andreas
:-)

On 2018-09-24 19:58, Jostein Berntsen wrote:
On 21.09.18,13:05, Andreas Schamanek wrote:
On Wed, 19 Sep 2018, at 14:07, @lbutlr wrote:
:0 h
* ^Subject:.*=\?
SUBJECT=| formail -cXSubject: | perl -MEncode -ne 'print encode("UTF8",decode("MIME-Header",$_))'
By rewriting the message to include UTF-8 characters in the headers you have just made your message invalid ...
This is not rewriting a message, it is assigning a variable. It's exactly what you later suggested yourself, and I agree that it is the more versatile solution:
So my approach is a good one after all? :)
Yes. See also procmailex.
-- Ruud