The ToCRLF/StringToCRLF functions are not very performance critical, but
we call it for each mail, and the current implementation is very
inefficient (mainly because it goes one byte at a time).
This patch replaces it with a better implementation that goes line by line.
The new implementation of ToCRLF is ~40% faster, and StringToCRLF is ~60%
faster.
```
$ benchstat old.txt new.txt
goos: linux
goarch: amd64
pkg: blitiri.com.ar/go/chasquid/internal/normalize
cpu: 13th Gen Intel(R) Core(TM) i9-13900T
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
ToCRLF-32 162.96µ ± 6% 95.42µ ± 12% -41.44% (p=0.000 n=10)
StringToCRLF-32 190.70µ ± 14% 76.51µ ± 6% -59.88% (p=0.000 n=10)
geomean 176.3µ 85.44µ -51.53%
```
The RFCs are very clear that in DATA contents:
> CR and LF MUST only occur together as CRLF; they MUST NOT appear
> independently in the body.
https://www.rfc-editor.org/rfc/rfc5322#section-2.3https://www.rfc-editor.org/rfc/rfc5321#section-2.3.8
Allowing "independent" CR and LF can cause a number of problems.
In particular, there is a new "SMTP smuggling attack" published recently
that involves the server incorrectly parsing the end of DATA marker
`\r\n.\r\n`, which an attacker can exploit to impersonate a server when
email is transmitted server-to-server.
https://www.postfix.org/smtp-smuggling.htmlhttps://sec-consult.com/blog/detail/smtp-smuggling-spoofing-e-mails-worldwide/
Currently, chasquid is vulnerable to this attack, because Go's standard
libraries net/textproto and net/mail do not enforce CRLF strictly.
This patch fixes the problem by introducing a new "dot reader" function
that strictly enforces CRLF when reading dot-terminated data, used in
the DATA input processing.
When an invalid newline terminator is found, the connection is aborted
immediately because we cannot safely recover from that state.
We still keep the internal representation as LF-terminated for
convenience and simplicity.
However, the MDA courier is changed to pass CRLF-terminated lines, since
that is an external program which could be strict when receiving email
messages.
See https://github.com/albertito/chasquid/issues/47 for more details and
discussion.
This patch adds a missing docstrings for exported identifiers, and
adjust some of the existing ones to match the standard style.
In some cases, the identifiers were un-exported after noticing they had
no external users.
Besides improving documentation, it also reduces the linter noise
significantly.
This patch extends various packages and integration tests, increasing
test coverage. They're small enough that it's not worth splitting them
up, as it would add a lot of noise to the history.
We should ignore the domains' case, and treat them uniformly, specially when it
comes to local domains.
This patch extends the existing normalization (IDNA, keeping domains as
UTF8 internally) to include case conversion and NFC form for
consistency.
This patch implements local username normalization using PRECIS
(https://tools.ietf.org/html/rfc7564,
https://tools.ietf.org/html/rfc7613)
It makes chasquid accept local email and authentication regardless of
the case. It covers both userdb and aliases.
Note that non-local usernames remain untouched.