Get domain and tld ?

Professional Software Engineering PSE-L at
Sun Jan 25 02:36:28 CET 2009

At 01:35 2009-01-25 +0100, Xavier Maillard wrote:
>I am struggling with something really simple: have multiple
>"match" from a single rule.

Keep struggling.  You need a series of rules.

>I want to get both tld and *last* part of the domain for any
>processed email.

From where?  A From: header, a Received line, a To:, what?

>For example: would give TLD=com and

Since you're not showing an email address, I'll presume you already have a 
recipe that assigns that string to a variable.  Let's say that it has been 
assigned to $FROMDOMAIN by some prior act on your part.  For example's sake 
here, I'll deliberately assign it:

# note the recipe will still work even if this is ""

# first, match the domain down to JUST the rightmost two domain tokens
# (i.e. remove the optional hostname levels).  As parsed here, I'm allowing
# for the FROMDOMAIN to actually be an email address - this will still work.
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
         # preserve the match result - you could repeat the above match
         # instead, but I prefer to do the work once.

         # next, get the TLD portion.  You could use TOPDOMAIN here, but I'm
         # demonstrating that because MATCH still contains the result of the
         # prior match, you can use it as the source to match against.
         * MATCH ?? .*\.\/[^.]+$

         # we need to fall back to the saved TOPDOMAIN and get the
         # domain portion (because the recipe above has truncated MATCH).
         * TOPDOMAIN ?? ^\/[^.]+
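
Put together as one complete nested recipe, the conditions above might look 
like the following sketch.  The sample address, the DOMAIN variable name, and 
the recipe start lines, braces and assignments are assumptions filled in for 
illustration; only the three conditions and the FROMDOMAIN, TOPDOMAIN and TLD 
names come from the text.

```procmail
# Hypothetical sample value; any address or bare hostname should do.
FROMDOMAIN="user@mail.example.com"

:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
{
        # MATCH holds the rightmost two tokens ("example.com"); save it,
        # because the next successful \/ match overwrites MATCH.
        TOPDOMAIN=$MATCH

        :0
        * MATCH ?? .*\.\/[^.]+$
        { TLD=$MATCH }

        # DOMAIN is a hypothetical variable name for the remaining token.
        :0
        * TOPDOMAIN ?? ^\/[^.]+
        { DOMAIN=$MATCH }
}
```

With the sample value, this should leave TLD set to "com" and DOMAIN set to 
"example".  If the outer condition does not match, the nested block is skipped 
and neither variable is assigned.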

BTW, you do realize that, outside of the generic TLDs such as .com, .org, 
.net, .biz, etc., some country-specific TLDs have their own secondary 
hierarchy?  For example:

Your desire to parse this will net you:

Which frankly, won't get you far.
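
To make that concrete, here is a sketch with a hypothetical hostname under a 
country-code hierarchy.  The value www.example.co.uk is an illustration only, 
not taken from the original message.

```procmail
# Hypothetical input, chosen only for illustration.
FROMDOMAIN="www.example.co.uk"

:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
{
        # MATCH is now "co.uk": just the rightmost two tokens.
        TOPDOMAIN=$MATCH
}
```

Splitting TOPDOMAIN on its dot then gives "uk" as the TLD and "co" as the 
domain.  The "example" label is never captured.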

>Is there any way to achieve this with one single rule ?

While a series of conditions in a rule could acquire a match and then re-use 
that match to acquire a subsequent match (as above), the original match is 
then lost.  You must assign the result (within the action portion), and 
then run a new match condition (as a subsequent rule or a nested rule - see 

This is (AFAIK) the most concise way to write the above extraction:

* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
* MATCH ?? .*\.\/[^.]+$

* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
* MATCH ?? ^\/[^.]+
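
Written out in full, the two standalone recipes might look like this sketch.  
The ":0" lines and the assignment actions are assumed; TLD follows the name in 
the quoted request, and DOMAIN is a hypothetical name for the other part.

```procmail
# Recipe 1: narrow to the last two tokens, then keep what follows the dot.
:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
* MATCH ?? .*\.\/[^.]+$
{ TLD=$MATCH }

# Recipe 2: repeat the narrowing, then keep what precedes the dot.
:0
* FROMDOMAIN ?? [@.]?\/[^@.]+\.[^.]+$
* MATCH ?? ^\/[^.]+
{ DOMAIN=$MATCH }
```

In each recipe the second condition matches against the MATCH left behind by 
the first, so no intermediate variable is needed.  The cost is the duplicated 
first condition.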

However, having two identical initial extractions is icky - if you later 
determine that you need to update it to deal with some funky variation, you 
need to remember to do them BOTH.  The nested approach doesn't have that 
problem.  The trade off is that the nested approach needs to use an 
intermediate variable to hold the initial results so that they can be reused.

I must wonder, is this an extension of trying to sort list mail 
automatically?  Have you seen the listname_id.rc ruleset?  Search the archives.

  Sean B. Straw / Professional Software Engineering

  Procmail disclaimer: <>
  Please DO NOT carbon me on list replies.  I'll get my copy from the list.
