.\" date, expiry date, copyright year, and revision
.DA "26 Feb 2002"
.ds e "26 Aug 2002
.ds c 2002
.ds r 02
.\" boilerplate
.pl 10i
.nr PL 10i
.po 0
.nr PO 0
.ll 7.2i
.nr LL 7.2i
.lt 7.2i
.nr LT 7.2i
.hy 0
.nr HY 0
.ad l
.nr PD 1v
.\" macros for paragraph, section header, reference, TOC
.de P
.br
.LP
.in 3
..
.de H
.br
.ne 5
.LP
.in 0
..
.de R
.IP "   [\\$1]" 14
..
.de T
.ie \\$1=1 \{\
.nf
.ta \n(LLu-3nR
.\}
.el \{\
.fi
.\}
..
.de S
.ie '\\$1'' \\$2 \a \\$3
.el \\$1. \\$2 \a \\$3
..
.\" headers/footers
.ds LH "Internet Draft
.ds CH "IKE Implementation Issues
.ds RH "\*(DY
.ds LF "Spencer & Redelmeier
.ds CF "
.ds RF "[Page %]
.\" and let's get started
.RT
.nf
.tl 'Network Working Group''Henry Spencer'
.tl 'Internet Draft''SP Systems'
.tl 'Expires: \*e''D. Hugh Redelmeier'
.tl '''Mimosa Systems'
.tl '''\*(DY'
.sp
.ce 99
IKE Implementation Issues
<draft-spencer-ipsec-ike-implementation-\*r.txt>
.ce 0
.H
Status of this Memo
.P
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
.P
(If approved as an Informational RFC...)
This memo provides information for the Internet community.
This memo does not specify an Internet standard of any kind.
.P
Distribution of this memo is unlimited.
.P
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.
Note that
other groups may also distribute working documents as Internet-Drafts.
.P
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.
It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
.P
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
.P
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
.P
This Internet-Draft will expire on \*e.
.H
Copyright Notice
.P
Copyright (C) The Internet Society \*c.  All Rights Reserved.
.bp
.H
Table of Contents
.P
.T 1
.S "1" "Introduction" "3"
.S "2" "Lower-level Background and Notes" "4"
.S "2.1" "Packet Handling" "4"
.S "2.2" "Ciphers" "5"
.S "2.3" "Interfaces" "5"
.S "3" "IKE Infrastructural Issues" "5"
.S "3.1" "Continuous Channel" "5"
.S "3.2" "Retransmission" "5"
.S "3.3" "Replay Prevention" "6"
.S "4" "Basic Keying and Rekeying" "7"
.S "4.1" "When to Create SAs" "7"
.S "4.2" "When to Rekey" "8"
.S "4.3" "Choosing an SA" "9"
.S "4.4" "Why to Rekey" "9"
.S "4.5" "Rekeying ISAKMP SAs" "10"
.S "4.6" "Bulk Negotiation" "10"
.S "5" "Deletions, Teardowns, Crashes" "11"
.S "5.1" "Deletions" "11"
.S "5.2" "Teardowns and Shutdowns" "12"
.S "5.3" "Crashes" "13"
.S "5.4" "Network Partitions" "13"
.S "5.5" "Unknown SAs" "14"
.S "6" "Misc. IKE Issues" "16"
.S "6.1" "Groups 1 and 5" "16"
.S "6.2" "To PFS Or Not To PFS" "16"
.S "6.3" "Debugging Tools, Lack Thereof" "16"
.S "6.4" "Terminology, Vagueness Thereof" "17"
.S "6.5" "A Question of Identity" "17"
.S "6.6" "Opportunistic Encryption" "17"
.S "6.7" "Authentication and RSA Keys" "17"
.S "6.8" "Misc. Snags" "18"
.S "7" "Security Considerations" "19"
.S "8" "References" "19"
.S "" "Authors' Addresses" "20"
.S "" "Full Copyright Statement" "21"
.T 0
.bp
.H
Abstract
.P
The current IPsec specifications for key exchange and connection management,
RFCs 2408 [ISAKMP] and 2409 [IKE],
leave many aspects of connection management unspecified,
most prominently rekeying practices.
Pending clarifications in future revisions of the specifications,
this document sets down some successful experiences,
to minimize the extent to which new implementors have to rely
on unwritten folklore.
.P
The Linux FreeS/WAN implementation of IPsec interoperates
with almost every other IPsec implementation.
This document describes how the FreeS/WAN project has resolved
some of the gaps in the IPsec specifications
(and plans to resolve some others),
and what difficulties have been encountered,
in hopes that this generally-successful experience
might be informative to new implementors.
.P
This is offered as an Informational RFC.
.P
This -\*r revision mainly:
discusses ISAKMP SA expiry during IPsec-SA rekeying (4.5),
revises the discussion of bidirectional Deletes (5.1),
suggests remembering the parameters of successful negotiations
for later use (4.2, 5.3),
notes an unsuccessful negotiation from the other end as a hint of a possibly
broken connection (5.5),
and adds sections on network partitions (5.4),
authentication methods and the subtleties of RSA public keys (6.7),
and miscellaneous interoperability concerns (6.8).
.H
1. Introduction
.P
The current IPsec specifications for key exchange and connection management,
RFCs 2408 [ISAKMP] and 2409 [IKE],
leave many aspects of connection management unspecified,
most prominently rekeying practices.
This is a cryptic puzzle which
each group of implementors has to struggle with,
and differences in how the ambiguities and gaps are resolved are
potentially a fruitful source of interoperability problems.
We can hope that future revisions of the specifications will clear this up.
Meanwhile, it seems useful to set down some successful experiences,
to minimize the extent to which new implementors have to rely
on unwritten folklore.
.P
The Linux FreeS/WAN implementation of IPsec interoperates
with almost every other IPsec implementation,
and because of its free nature,
it also sees some use as a reference implementation by other implementors.
The high degree of interoperability is noteworthy
given its organizers' strong minimalist bias,
which has caused them to implement only
a small subset of the full glory of IPsec.
This document describes how the FreeS/WAN project has resolved
some of the gaps in the IPsec specifications
(and plans to resolve some others),
and what difficulties have been encountered,
in hopes that this generally-successful experience
might be informative to new implementors.
.P
One small caution about applicability:
this experience may not be relevant
to severely resource-constrained implementations.
FreeS/WAN's target environment is previous-generation PCs,
now available at trivial cost (often,
within an organization, at no cost),
which have quite impressive CPU power and memory by the standards
of only a few years ago.
Some of the approaches discussed here may be inapplicable to
implementations with severe external constraints which prevent them
from taking advantage of modern hardware technology.
.H
2. Lower-level Background and Notes
.H
2.1. Packet Handling
.P
FreeS/WAN implements ESP [ESP] and AH [AH] straightforwardly,
although AH sees little use among our users.
Our ESP/AH implementation cannot currently handle packets
with IP options;
somewhat surprisingly, this has caused little difficulty.
We insist on encryption and do not support authentication-only
connections, and this has not caused significant difficulty either.
.P
MTU and fragmentation issues, by contrast, have been a constant headache.
We will not describe the details of our current approach to them,
because it still needs work.
One difficulty we have encountered is that many combinations of
packet source and packet destination
apparently cannot cope with an "interior minimum" in the path MTU,
e.g. where an IPsec tunnel intervenes and its headers reduce the MTU
for an intermediate link.
This is particularly prevalent when using common PC software to
connect to large well-known web sites;
we think it is largely due to
misconfigured firewalls which do not pass ICMP
Fragmentation Required messages.
The only solution we have yet found is to lie about the MTU of the tunnel,
accepting the (undesirable) fragmentation of the ESP packets
for the sake of preserving connectivity.
.P
We currently zero out the TOS field of ESP packets,
rather than copying it from the inner header,
on the grounds that it lends itself too well to traffic analysis
and covert channels.
We provide an option to restore RFC 2401 [IPSEC] copying behavior,
but this appears to see little use.
.H
2.2. Ciphers
.P
We initially implemented both DES [DES] and 3DES [CIPHERS] for both
IKE and ESP,
but after the Deep Crack effort [CRACK] demonstrated its inherent insecurity,
we dropped support for DES.
Somewhat surprisingly,
our insistence on 3DES has caused almost no interoperability problems,
despite DES being officially mandatory.
A very few other systems either do not support 3DES or support it only
as an optional upgrade,
which inconveniences a few would-be users.
There have also been one or two cases of systems
which don't quite seem to know the difference!
.P
See also section 6.1 for a consequence of our insistence on 3DES.
.H
2.3. Interfaces
.P
We currently employ PF_KEY version 2 [PFKEY],
plus various non-standard extensions,
as our interface between keying and ESP.
This has not proven entirely satisfactory.
Our feeling now is that keying issues and policy issues
do not really lend
themselves to the clean separation that PF_KEY envisions.
.H
3. IKE Infrastructural Issues
.P
A number of problems in IPsec connection management become easier if
some attention is first paid to providing an infrastructure
to support solving them.
.H
3.1. Continuous Channel
.P
FreeS/WAN uses an approximation to the "continuous channel" model,
in which ISAKMP SAs are maintained between IKEs
so long as any IPsec SAs are open between the two systems.
The resource consumption of this is minor:
the only substantial overhead is occasional rekeying.
IPsec SA management becomes significantly simpler if there is always
a channel for transmission of control messages.
We suggest (although we do not yet fully implement this) that
inability to maintain (e.g., to rekey) this control path
should be grounds for tearing down the IPsec SAs as well.
.P
As a corollary of this,
there is one and only one ISAKMP SA maintained between a pair of IKEs
(although see sections 5.3 and 6.5 for minor complications).
.H
3.2. Retransmission
.P
The unreliable nature of UDP transmission is a nuisance.
IKE implementations should always be prepared to retransmit the most recent
message they sent on an ISAKMP SA,
since there is some possibility that the other end did not get it.
This means, in particular,
that the system sending the supposedly-last message of an exchange
cannot relax and assume that the exchange is complete,
at least not until a significant timeout has elapsed.
.P
Systems must also retain information about the message most recently received
in an exchange,
so that a duplicate of it can be detected
(and possibly interpreted as a NACK for the response).
.P
The retransmission rules FreeS/WAN follows are:
(1) if a reply is expected, retransmit only if it does not appear
before a timeout;
and (2) if a reply is not expected (last message of the exchange),
retransmit only on receiving a retransmission of the previous message.
Notably, in case (1) we do NOT retransmit on receiving a retransmission,
which avoids possible congestion problems arising from packet duplication,
at the price of slowing response to packet loss.
The timeout for case (1) is 10 seconds for the first retry,
20 seconds for the second, and 40 seconds for all subsequent
retries (normally only one,
except when
configuration settings call for persistence and the message is
the first message of Main Mode with a new peer).
These retransmission rules have been entirely successful.
.P
(Michael Thomas of Cisco has pointed out that the retry timeouts should
include some random jitter, to de-synchronize hosts which are
initially synchronized by, e.g., a power outage.
We already jitter our rekeying times,
as noted in section 4.2,
but that does not help with initial startup.
We're implementing jittered retries,
but cannot yet report on experience with this.)
.P
There is a deeper problem, of course, when an entire "exchange" consists
of a single message,
e.g. the ISAKMP Informational Exchange.
Then there is no way to decide whether or when a retransmission is
warranted at all.
This seems like poor design, to put it mildly
(and there is now talk of fixing it).
We have no experience in dealing with this problem at this time,
although it is part of the reason why we have delayed implementing
Notification messages.
.H
3.3. Replay Prevention
.P
The unsequenced nature of UDP transmission is also troublesome,
because it means that higher levels must consider the possibility
of replay attacks.
FreeS/WAN takes the position that systematically eliminating this
possibility at a low level is strongly preferable to forcing careful
consideration of possible impacts at every step of an exchange.
RFC 2408 [ISAKMP] section 3.1 states that the Message ID of an
ISAKMP message must be "unique".
FreeS/WAN interprets this literally,
as forbidding duplication of Message IDs
within the set of all messages sent via a single ISAKMP SA.
.P
This requires remembering all Message IDs until the ISAKMP SA is
superseded by rekeying,
but that is not costly (four bytes per sent or received message),
and it ELIMINATES replay attacks from consideration;
we believe this investment of resources is well worthwhile.
If the resource consumption becomes excessive\(emin our experience
it has not\(emthe ISAKMP SA can be rekeyed early to collect the garbage.
.P
There is theoretically an interoperability problem when talking to
implementations which interpret "unique" more loosely
and may re-use Message IDs,
but it has not been encountered in practice.
This approach appears to be completely interoperable.
.P
The proposal by
Andrew Krywaniuk [REPLAY],
which advocates turning the Message ID into an anti-replay counter,
would achieve the same goal without the minor per-message memory overhead.
This may be preferable,
although it means an actual protocol change and more study is needed.
.H
4. Basic Keying and Rekeying
.H
4.1. When to Create SAs
.P
As Tim Jenkins [REKEY] pointed out,
there is a potential race condition in Quick Mode:
a fast lightly-loaded Initiator might start using IPsec SAs very
shortly after sending QM3 (the third and last message of Quick Mode),
while a slow heavily-loaded Responder might
not be ready to receive them until after spending
a significant amount of time creating its inbound SAs.
The problem is even worse if QM3 gets delayed or lost.
.P
FreeS/WAN's approach to this is what Jenkins called "Responder Pre-Setup":
the Responder creates its inbound IPsec SAs before it sends QM2,
so they are always ready and waiting
when the Initiator sends QM3 and begins sending traffic.
This approach is simple and reliable,
and in our experience it interoperates with everybody.
(There is potentially still a problem if FreeS/WAN is the Initiator
and the Responder does not use Responder Pre-Setup,
but no such problems have been seen.)
The only real weakness of Responder Pre-Setup
is the possibility of replay attacks,
which we have eliminated by other means (see section 3.3).
.P
With this approach, the Commit Bit is useless,
and we ignore it.
In fact, until quite recently we discarded any IKE message containing it,
and this caused surprisingly few interoperability problems;
apparently it is not widely used.
We have recently been persuaded that simply ignoring it is preferable;
preliminary experience with this indicates that the result is successful
interoperation with implementations which set it.
.H
4.2. When to Rekey
.P
To preserve connectivity for user traffic,
rekeying of a connection
(that is, creation of new IPsec SAs to supersede the current ones)
must begin before its current IPsec SAs expire.
Preferably one end should predictably start rekeying negotiations first,
to avoid the extra overhead of two simultaneous negotiations,
although either end should be prepared to rekey if the other does not.
There is also a problem with "convoys" of keying negotiations:
for example, a "hub" gateway with many IPsec connections
can be inundated with rekeying negotiations
exactly one connection-expiry time after it reboots,
and the massive overload this induces tends to make this
situation self-perpetuating,
so it recurs regularly.
(Convoys can also evolve gradually from initially-unsynchronized negotiations.)
.P
FreeS/WAN has the concept of a "rekeying margin", measured in seconds.
If FreeS/WAN was the Initiator for the previous rekeying
(or the startup, if none) of the connection,
it nominally starts rekeying negotiations at expiry time
minus one rekeying margin.
Some random jitter is added to break up convoys:
rather than starting rekeying exactly at minus one margin,
it starts at a random time between minus one margin
and minus two margins.
(The randomness here need not be cryptographic in quality,
so long as it varies over time and between hosts.
We use an ordinary PRNG seeded with a few bytes from a cryptographic
randomness source.
The seeding mostly just ensures that the PRNG sequence is different
for different hosts, even if they start up simultaneously.)
.P
If FreeS/WAN was the Responder for the previous rekeying/startup,
and nothing has been heard from the previous Initiator
at expiry time minus one-half the rekeying margin,
FreeS/WAN will initiate rekeying negotiations.
No jitter is applied;
we now believe that it should be jittered,
say between minus one-half margin and minus one-quarter margin.
.P
Having the Initiator lead the way is an obvious way of deciding
who should speak first,
since there is already an Initiator/Responder asymmetry in the connection.
Moreover, our experience has been that Initiator lead gives a significantly
higher probability of successful negotiation!
The negotiation process itself is asymmetric,
because the Initiator must make a few specific proposals which the Responder
can only accept or reject,
so the Initiator must try to guess where its "acceptable" region
(in parameter space)
might overlap with the Responder's.
We have seen situations where negotiations would succeed or fail
depending on which end initiated them,
because one end was making better guesses.
Given an existing connection,
we KNOW that the previous Initiator WAS able to initiate a successful
negotiation,
so it should (if at all possible) take the lead again.
Also, the Responder should remember the Initiator's successful proposal,
and start from that
rather than from his own default proposals if he must take the lead;
we don't currently implement this completely but plan to.
.P
FreeS/WAN defaults the rekeying margin to 9 minutes,
although this can be changed by configuration.
There is also
a configuration option to alter the permissible range of jitter.
The defaults were chosen somewhat arbitrarily,
but they work extremely well
and the configuration options are rarely used.
.H
4.3. Choosing an SA
.P
Once rekeying has occurred,
both old and new IPsec SAs for the connection exist,
at least momentarily.
FreeS/WAN accepts incoming traffic
on either old or new inbound SAs,
but sends outgoing traffic only on the new outbound ones.
This approach appears to be significantly more robust than
using the old ones until they expire,
notably in cases where renegotiation has occurred because something has
gone wrong on the other end.
It avoids having to pay meticulous attention to the state of the other end,
state which is difficult to learn reliably given the limitations of IKE.
.P
This approach has interoperated successfully with ALMOST all other
implementations.
The only (well-characterized) problem cases have been implementations
which rely on receiving a Delete message for the old SAs to tell them
to switch over to the new ones.
Since delivery of Delete is unreliable,
and support for Delete is optional,
this reliance seems like a serious mistake.
This is all the more true because Delete
announces that the deletion has
already occurred [ISAKMP, section 3.15], not that it is about to occur,
so packets already in transit in the other direction could be lost.
Delete should be used for resource cleanup, not for switchover control.
(These matters are discussed further in section 5.)
.H
4.4. Why to Rekey
.P
FreeS/WAN currently implements only time-based expiry (life in seconds),
although we are working toward
supporting volume-based expiry (life in kilobytes) as well.
The lack of volume-based expiry has not been an interoperability
problem so far.
.P
Volume-based expiry does add some minor complications.
In particular, it makes explicit Delete of now-disused SAs more important,
because once an SA stops being used,
it might not expire on its own.
We believe this lacks robustness and is generally unwise,
especially given the lack of a reliable Delete,
and expect to use volume-based expiry only as a supplement
to time-based expiry.
However, Delete support (see section 5) does seem advisable
for use with volume-based expiry.
.P
We do not believe that volume-based expiry alters the desirability
of switching immediately to the new SAs after rekeying.
Rekeying margins are normally a small fraction of the total life of an SA,
so we feel there is no great need to "use it all up".
.H
4.5. Rekeying ISAKMP SAs
.P
The above discussion has focused on rekeying for IPsec SAs,
but FreeS/WAN applies the same approaches to rekeying for ISAKMP SAs,
with similar success.
.P
One issue which we have noticed, but not explicitly dealt with,
is that difficulties may ensue if an IPsec-SA rekeying negotiation
is in progress at the time when the relevant ISAKMP SA gets rekeyed.
The IKE specification [IKE] hints, but does not actually say,
that a Quick Mode negotiation should remain on a single ISAKMP SA throughout.
.P
A reasonable rekeying margin will generally
prevent the old ISAKMP SA from actually expiring during a negotiation.
Some attention may be needed to prevent in-progress negotiations from
being switched to the new ISAKMP SA.
Any attempt at pre-expiry deletion of the ISAKMP SA must be postponed
until after such dangling negotiations are completed,
and there should be enough delay between ISAKMP-SA rekeying and a
deletion attempt to (more or less)
ensure that there are no negotiation-starting packets still in transit
from before the rekeying.
.P
At present, FreeS/WAN does none of this,
and we don't KNOW of any resulting trouble.
With normal lifetimes, the problem should be uncommon,
and we speculate that an occasional disrupted negotiation simply gets retried.
.H
4.6. Bulk Negotiation
.P
Quick Mode nominally provides for negotiating possibly-large numbers of
similar but unrelated IPsec SAs simultaneously
[IKE, section 9].
Nobody appears to do this.
FreeS/WAN does not support it, and its absence has caused no problems.
.H
5. Deletions, Teardowns, Crashes
.P
FreeS/WAN currently ignores all Notifications and Deletes,
and never generates them.
This has caused little difficulty in interoperability,
which shouldn't be surprising (since Notification and Delete support is
officially entirely optional) but does seem to surprise some people.
Nevertheless, we do plan some changes to this approach
based on past experience.
.H
5.1. Deletions
.P
As hinted at above,
we plan to implement Delete support, done as follows.
Shortly after rekeying of IPsec SAs,
the Responder issues a Delete for its old inbound SAs
(but does not actually delete them yet).
The Responder initiates this because the Initiator started using the
new SAs on sending QM3, while the Responder started using them only
on (or somewhat after) receiving QM3,
so there is less chance of old-SA packets still being in transit from
the Initiator.
The Initiator issues an unsolicited Delete only if it does not hear one
from the Responder after a longer delay.
.P
Either party, on receiving a Delete
for one or more of the old outbound SAs of a connection,
deletes ALL the connection's SAs,
and acknowledges with a Delete for the old inbound SAs.
A Delete for nonexistent SAs
(e.g., SAs which have already been expired or deleted) is ignored.
There is no retransmission of unacknowledged Deletes.
.P
In the normal case,
with prompt reliable transmission (except possibly for loss of the
Responder's initial Delete)
and conforming implementations
on both ends, this results in three Deletes being transmitted,
resembling the classic three-way handshake.
Loss of a Delete after the first, or multiple losses,
will cause the SAs not to be deleted on at least one end.
It appears difficult to do much better without at least
a distinction between request and acknowledgement.
.P
RFC 2409 section 9 "strongly suggests" that there be no response to
informational messages such as Deletes,
but the only rationale offered is prevention of infinite loops
endlessly exchanging "I don't understand you" informationals.
Since Deletes cannot lead to such a loop
(and in any case, the nonexistent-SA rule prevents more than one
acknowledgement for the same connection),
we believe this recommendation is inapplicable here.
.P
As noted in section 4.3, these Deletes are intended for
resource cleanup, not to control switching between SAs.
But we expect that they will improve interoperability
with some broken implementations.
.P
We believe strongly that connections need to be considered as a whole,
rather than treating each SA as an independent entity.
We will issue Deletes only for the full set of inbound SAs of
a connection,
and will treat a Delete for any outbound SA as equivalent to deletion
of all the outbound SAs for the associated connection.
.P
The above is phrased in terms of IPsec SAs,
but essentially the same approach can be applied to ISAKMP SAs
(the Deletes for the old ISAKMP SA should be sent via the new one).
.H
5.2. Teardowns and Shutdowns
.P
When a connection is not intended to be up permanently,
there is a need to coordinate teardown,
so that both ends are aware that the connection is down.
This is both for recovery of resources,
and to avoid routing packets through
dangling SAs which can no longer deliver them.
.P
Connection teardown will use the same bidirectional exchange of Deletes
as discussed in section 5.1:
a Delete received for current IPsec SAs (not yet obsoleted by rekeying)
indicates that the other host wishes to tear down the associated connection.
.P
A Delete received for a current ISAKMP SA indicates that the other host
wishes to tear down not only the ISAKMP SA but also all IPsec SAs
currently under the supervision of that ISAKMP SA.
The 5.1 bidirectional exchange might seem impossible in this case,
since reception of an ISAKMP-SA Delete indicates that the other end
will ignore further traffic on that ISAKMP SA.
We suggest using the same tactic discussed in 5.1 for IPsec SAs:
the first Delete is sent without actually doing the deletion,
and the response to receiving a Delete is to do the deletion and reply
with another Delete.
If there is no response to the first Delete,
retry a small number of times and then give up and do the deletion;
apart from being robust against packet loss,
this also maximizes the probability that an implementation which does
not do the bidirectional Delete will receive at least one of the Deletes.
.P
When a host with current connections knows that it is about to shut down,
it will issue Deletes for all SAs involved (both IPsec and ISAKMP),
advising its peers (as per the meaning of Delete [ISAKMP, section 3.15])
that the SAs have become useless.
It will ignore attempts at rekeying or connection startup thereafter,
until it shuts down.
.P
It would be better to have a Final-Contact notification,
analogous to Initial-Contact but indicating that no new negotiations
should be attempted until further notice.
Initial-Contact actually could be used for shutdown notification (!),
but in networks where connections are intended to exist permanently,
it seems likely to provoke unwanted attempts
to renegotiate the lost connections.
.H
5.3. Crashes
.P
Systems sometimes crash.
Coping with the resulting loss of information is easily the most
difficult problem we have found in implementing robust IPsec systems.
.P
When connections are intended to be permanent,
it is simple to specify renegotiation on reboot.
With our approach to SA selection (see section 4.3),
this handles such cases robustly and well.
We do have to tell users that BOTH hosts should be set this way.
In cases where crashes are synchronized (e.g. by power interruptions),
this may result in simultaneous negotiations at reboot.
We currently allow both negotiations to proceed to completion,
but our use-newest selection method
effectively ignores one connection or the other,
and when one of them rekeys,
we notice that the new SAs replace those of both old connections,
and we then refrain from rekeying the other.
(This duplicate detection is desirable in any event, for robustness,
to ensure that the system converges on a reasonable state eventually
after it is perturbed by difficulties or bugs.)
.P
When connections are not permanent, the situation is less happy.
One particular situation in which we see problems is when a number of
"Road Warrior" hosts occasionally call in to a central server.
The server is normally configured not to initiate such connections,
since it does not know when the Road Warrior is available (or what IP
address it is using).
Unfortunately, if the server crashes and reboots,
any Road Warriors then connected have a problem:
they don't know that the server has crashed,
so they can't renegotiate,
and the server has forgotten both the connections and
their (transient) IP addresses,
so it cannot renegotiate.
.P
We believe that the simplest answer to this problem is what John Denker
has dubbed "address inertia":
the server makes a best-effort attempt to remember (in nonvolatile storage)
which connections were active and what the far-end addresses were
(and what the successful proposal's parameters were),
so that it can attempt renegotiation on reboot.
We have not implemented this yet, but intend to;
Denker has implemented it himself,
although in a somewhat messy way,
and reports excellent results.
.H
5.4. Network Partitions
.P
A network partition, making the two ends unable to reach each other,
has many of the same characteristics as having the other end crash... until
the network reconnects.
It is desirable that recovery from this be automatic.
.P
If the network reconnects before any rekeying attempts
or other IKE activities occurred,
recovery is fully transparent,
because the IKEs have no idea that there was any problem.
(Complaints such as ICMP Host Unreachable messages are unauthenticated
and hence cannot be given much weight.)
This fits the general mold of TCP/IP:
if nobody wanted to send any traffic, a network outage doesn't matter.
.P
If IKE activity did occur,
the IKE implementation will discover that the other end doesn't seem
to be responding.
The preferred response to this depends on the nature of the connection.
If it was intended to be ephemeral (e.g. opportunistic encryption [OE]),
closing it down after a few retries is reasonable.
If the other end is expected to sometimes drop the connection without
warning, it may not be desirable to retry at all.
(We support both these forms of configurability,
and indeed we also have a configuration option to suppress
rekeying entirely on one end.)
.P
If the connection was intended to be permanent, however,
then persistent attempts to re-establish it are appropriate.
Some degree of backoff is appropriate here,
so that retries get less frequent as the outage gets prolonged.
Backoff should be limited,
so that re-established connectivity is not followed by a long delay
before a retry.
Finally, after many retries (say 24 hours' worth),
it may be preferable to just declare the connection down and rely
on manual intervention to re-establish it,
should this be desirable.
We do not yet fully support all this.
.H
5.5. Unknown SAs
.P
A more complete solution to crashes
would be for an IPsec host to note the arrival
of ESP packets on an unknown IPsec SA,
and report it somehow to the other host, which can then decide to renegotiate.
This arguably might be preferable in any case\(emif
the non-rebooted host has no traffic to send,
it does not care whether the connection is intact\(embut
delays and packet loss will be reduced
if the connection is renegotiated BEFORE there is traffic for it.
So unknown-SA detection is best reserved as a fallback method,
with address inertia used to deal with most such cases.
.P
A difficulty with unknown-SA detection is,
just HOW should the other host be notified?
IKE provides no good way to do the notification:
Notification payloads (e.g., Initial-Contact) are unauthenticated
unless they are sent under protection of an ISAKMP SA.
A "Security Failures - Bad SPI" ICMP message [SECFAIL]
is an interesting alternative,
but has the disadvantage of likewise being unauthenticated.
It's fundamentally unlikely that there is a simple solution to this,
given that almost any way of arranging or checking authentication for such a
notification is costly.
.P
We think the best answer to this is a two-step approach.
An unauthenticated Initial-Contact or
Security Failures - Bad SPI cannot be taken as a reliable
report of a problem,
but can be taken as a hint that a problem MIGHT exist.
Then there needs to be some reliable way of checking such hints,
subject to rate limiting since the checks are likely to be costly
(and checking the same connection repeatedly at short intervals is unlikely
to be worthwhile anyway).
So the rebooted host sends the notification,
and the non-rebooted host\(emwhich still thinks it has a connection\(emchecks
whether the connection still works,
and renegotiates if not.
.P
Also, if an IPsec host which believes it has a connection to another host
sees an unsuccessful attempt by that host to negotiate a new one,
that is also a hint of possible problems,
justifying a check and possible renegotiation.
("Unsuccessful" here means a negotiation failure due to lack of a
satisfactory proposal.
A failure due to authentication failure
suggests a denial-of-service attack by a third party,
rather than a genuine problem on the legitimate other end.)
As noted in section 4.2,
it is possible for negotiations to succeed or fail based on which
end initiates them, and some robustness against that is desirable.
.P
We have not yet decided what form the notification should take.
IKE Initial-Contact is an obvious possibility,
but has some disadvantages.
It does not specify which connection has had difficulties.
Also, the specification [IKE section 4.6.3.3]
refers to "remote system" and "sending system"
without clearly specifying just what "system" means;
in the case of a multi-homed host using multiple forms of identification,
the question is not trivial.
Initial-Contact does have the fairly-decisive advantage
that it is likely to convey the right general
meaning even to an implementation which does not do things
exactly the way ours does.
.P
A more fundamental difficulty is what form the reliable check takes.
What is wanted is an "IKE ping",
verifying that the ISAKMP SA is still intact
(it being unlikely that IPsec SAs have been lost while the ISAKMP SA has not).
The lack of such a facility is a serious failing of IKE.
An acknowledged Notification of some sort would be ideal,
but there is none at present.
Some existing implementations are known
to use the private Notification values 30000 as ping
and 30002 as ping reply,
and that seems the most attractive choice at present.
If it is not recognized, there will probably be no reply,
and the result will be an unnecessary renegotiation,
so this needs strict rate limiting.
(Also, when a new connection is set up,
it's probably worth determining by experiment whether the other end
supports IKE ping, and remembering that.)
.P
While we think this facility is desirable,
and is about the best that can be done with the poor tools available,
we have not gotten very far in implementation and cannot comment
intelligently about how well it works or interoperates.
.H
6. Misc. IKE Issues
.H
6.1. Groups 1 and 5
.P
We have dropped support for the first Oakley Group (group 1),
despite it being officially mandatory,
on the grounds that it is
grossly too weak to provide enough randomness for 3DES.
There have been some interoperability problems,
mostly quite minor:
ALMOST everyone supports group 2 as well,
although sometimes it has to be explicitly configured.
.P
We also support the quasi-standard group 5 [GROUPS].
This has not been seriously exercised yet,
because historically
we offered group 2 first and almost everyone accepted it.
We have recently changed to offering group 5 first,
and no difficulties have been reported.
.H
6.2. To PFS Or Not To PFS
.P
A persistent small interoperability problem is that
the presence or absence of PFS (for keys [IKE, section 5.5])
is neither negotiated nor announced.
We have it enabled by default,
and successful interoperation often requires having
the other end turn it on in their implementation,
or having the FreeS/WAN end disable it.
Almost everyone supports it, but it's usually not the default,
and interoperability is often impossible unless the two ends
somehow reach prior agreement on it.
.P
We do not explicitly support the other flavor of PFS,
for identities [IKE, section 8],
and this has caused no interoperability problems.
.H
6.3. Debugging Tools, Lack Thereof
.P
We find IKE lacking in basic debugging tools.
Section 5.4, above,
notes that an IKE ping would be useful for connectivity verification.
It would also be extremely helpful for determining that UDP/500
packets get back and forth successfully between the two ends,
which is often an important first step in debugging.
.P
It's also quite common to have IKE negotiate a connection successfully,
but to have some firewall along the way blocking ESP.
Users find this mysterious and difficult to diagnose.
We have no immediate suggestions on what could be done about it.
.H
6.4. Terminology, Vagueness Thereof
.P
The terminology of IPsec needs work.
We feel that both the specifications and user-oriented
documentation would be greatly clarified by concise, intelligible names for
certain concepts.
.P
We semi-consistently use "group" for the set of IPsec SAs which are
established in one direction
by a single Quick Mode negotiation and are used together
to process a packet (e.g., an ESP SA plus an AH SA),
"connection" for the logical packet path provided
by a succession of pairs of groups
(each rekeying providing a new pair, one group in each direction),
and "keying channel" for the corresponding supervisory path provided
by a sequence of ISAKMP SAs.
.P
We think it's a botch that "PFS" is used to refer to two very different things,
but we have no specific new terms to suggest, since we only implement
one kind of PFS and thus can just ignore the other.
.H
6.5. A Question of Identity
.P
One specification problem deserves note:
exactly when can an existing phase 1 negotiation
be re-used for a new phase 2 negotiation,
as IKE [IKE, section 4] specifies?
Presumably,
when it connects the same two "parties"... but exactly what is a "party"?
.P
As noted in section 5.4,
in cases involving multi-homing and multiple identities,
it's not clear exactly what criteria are used for deciding
whether the intended far end for a new negotiation is the same one
as for a previous negotiation.
Is it by Identification Payload?
By IP address?
Or what?
.P
We currently use a somewhat-vague notion of "identity",
basically what gets sent in Identification Payloads,
for this, and this seems to be successful,
but we think this needs better specification.
.H
6.6. Opportunistic Encryption
.P
Further IKE challenges appear in the context of Opportunistic Encryption
[OE],
but operational experience with it is too limited as yet for us
to comment usefully right now.
.H
6.7. Authentication and RSA Keys
.P
We provide two IKE authentication methods:
shared secrets ("pre-shared keys")
and RSA digital signatures.
(A user-provided add-on package generalizes the latter to limited
support for certificates;
we have not worked extensively with it ourselves yet and cannot comment
on it yet.)
.P
Shared secrets, despite their administrative difficulties,
see considerable use,
and are also the method of last resort for interoperability problems.
.P
For digital signatures,
we have taken the somewhat unorthodox approach of using "bare" RSA public keys,
either supplied in configuration files or fetched from DNS,
rather than getting involved in the complexity of certificates.
We encode our RSA public keys using the DNS KEY encoding [DNSRSA]
(aka "RFC 2537", although that RFC is now outdated),
which has given us no difficulties and which we highly recommend.
We have seen two difficulties in connection with RSA keys, however.
.P
First,
while a number of IPsec implementations are able to take "bare" RSA public keys,
each one seems to have its own idea of what format should be used
for transporting them.
We've had little success with interoperability here,
mostly because of key-format issues;
the implementations generally WILL interoperate successfully if you can
somehow get an RSA key into them at all, but that's hard.
X.509 certificates seem to be the lowest (!)
common denominator for key transfer.
.P
Second,
although the content of RSA public keys has been stable,
there has been a small but subtle change over time in the content
of RSA private keys.
The "internal modulus",
used to compute the private exponent "d" from the public exponent "e"
(or vice-versa)
was originally [RSA] [PKCS1v1] [SCHNEIER] specified to be (p-1)*(q-1),
where p and q are the two primes.
However, more recent definitions [PKCS1v2] call it
"lambda(n)" and define it to be lcm(p-1,\ q-1);
this appears to be a minor optimization.
The result is that private keys generated with the new definition
often fail consistency checks in implementations using the old definition.
Fortunately, it is seldom necessary to move private keys around.
Our software now consistently uses the new definition
(and thus will accept keys generated with either definition),
but our key generator also has an option to generate old-definition keys,
for the benefit of users who upgrade their networks incrementally.
.H
6.8. Misc. Snags
.P
Nonce size is another characteristic that is neither negotiated nor announced
but that the two ends must somehow be able to agree on.
Our software accepts anything between 8 and 256, and defaults to 16.
These numbers were chosen rather arbitrarily,
but we have seen no interoperability failures here.
.P
Nothing in the ISAKMP [ISAKMP] or IKE [IKE] specifications says
explicitly that a normal Message ID must be non-zero,
but a zero Message ID in fact causes failures.
.P
Similarly, there is nothing in the specs which says that ISAKMP cookies
must be non-zero, but zero cookies will in fact cause trouble.
.H
7. Security Considerations
.P
Since this document discusses aspects of building robust and
interoperable IPsec implementations,
security considerations permeate it.
.H
8. References
.R AH
Kent, S., and Atkinson, R.,
"IP Authentication Header",
RFC 2402,
Nov 1998.
.R CIPHERS
Pereira, R., and Adams, R.,
"The ESP CBC-Mode Cipher Algorithms",
RFC 2451,
Nov 1998.
.R CRACK
Electronic Frontier Foundation,
"Cracking DES:
Secrets of Encryption Research, Wiretap Politics and Chip Design",
O'Reilly 1998,
ISBN 1-56592-520-3.
.R DES
Madson, C., and Doraswamy, N.,
"The ESP DES-CBC Cipher Algorithm",
RFC 2405,
Nov 1998.
.R DNSRSA
D. Eastlake 3rd,
"RSA/SHA-1 SIGs and RSA KEYs in the Domain Name System (DNS)",
RFC 3110,
May 2001.
.R ESP
Kent, S., and Atkinson, R.,
"IP Encapsulating Security Payload (ESP)",
RFC 2406,
Nov 1998.
.R GROUPS
Kivinen, T., and Kojo, M.,
"More MODP Diffie-Hellman groups for IKE",
<draft-ietf-ipsec-ike-modp-groups-04.txt>,
13 Dec 2001 (work in progress).
.R IKE
Harkins, D., and Carrel, D.,
"The Internet Key Exchange (IKE)",
RFC 2409, Nov 1998.
.R IPSEC
Kent, S., and Atkinson, R.,
"Security Architecture for the Internet Protocol",
RFC 2401, Nov 1998.
.R ISAKMP
Maughan, D., Schertler, M., Schneider, M., and Turner, J.,
"Internet Security Association and Key Management Protocol (ISAKMP)",
RFC 2408, Nov 1998.
.R OE
Richardson, M., Redelmeier, D. H., and Spencer, H.,
"A method for doing opportunistic encryption with IKE",
<draft-richardson-ipsec-opportunistic-06.txt>,
21 Feb 2002 (work in progress).
.R PKCS1v1
Kaliski, B.,
"PKCS #1: RSA Encryption, Version 1.5",
RFC 2313, March 1998.
.R PKCS1v2
Kaliski, B., and Staddon, J.,
"PKCS #1: RSA Cryptography Specifications, Version 2.0",
RFC 2437, Oct 1998.
.R PFKEY
McDonald, D., Metz, C., and Phan, B.,
"PF_KEY Key Management API, Version 2",
RFC 2367, July 1998.
.R REKEY
Tim Jenkins, "IPsec Re-keying Issues",
<draft-jenkins-ipsec-rekeying-06.txt>,
2 May 2000 (draft expired, work no longer in progress).
.R REPLAY
Krywaniuk, A.,
"Using Isakmp Message Ids for Replay Protection",
<draft-krywaniuk-ipsec-antireplay-00.txt>,
9 July 2001
(work in progress).
.R RSA
Rivest, R.L., Shamir, A., and Adleman, L.,
"A Method for Obtaining Digital Signatures and Public-Key
Cryptosystems",
Communications of the ACM v21n2, Feb 1978, p. 120.
.R SCHNEIER
Bruce Schneier, "Applied Cryptography", 2nd ed.,
Wiley 1996, ISBN 0-471-11709-9.
.R SECFAIL
Karn, P., and Simpson, W.,
"ICMP Security Failures Messages",
RFC 2521,
March 1999.
.H
Authors' Addresses
.P
.nf
.ne 8
Henry Spencer
SP Systems
Box 280 Stn. A
Toronto, Ont. M5W1B2
Canada

henry@spsystems.net
416-690-6561
.ne 8
.sp 2
D. Hugh Redelmeier
Mimosa Systems Inc.
29 Donino Ave.
Toronto, Ont. M4N2W6
Canada

hugh@mimosa.com
416-482-8253
.bp
.H
Full Copyright Statement
.P
Copyright (C) The Internet Society \*c. All Rights
Reserved.

This document and translations of it may be copied and
furnished to others, and derivative works that comment on or
otherwise explain it or assist in its implmentation may be
prepared, copied, published and distributed, in whole or in
part, without restriction of any kind, provided that the above
copyright notice and this paragraph are included on all such
copies and derivative works.  However, this document itself may
not be modified in any way, such as by removing the copyright
notice or references to the Internet Society or other Internet
organizations, except as needed for the  purpose of developing
Internet standards in which case the procedures for copyrights
defined in the Internet Standards process must be followed, or
as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will
not be revoked by the Internet Society or its successors or
assigns.

This document and the information contained herein is provided
on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.
