When we put something in the queue and respond "250 ok" to the client,
that is taken as accepting the email.
As part of putting something in the queue, we write it to disk, but
today we don't do an fsync on that file.
That leaves a gap where a badly timed crash on some systems could lead
to the file being empty, causing us to lose an email that we accepted.
To eliminate (or drastically reduce on some filesystems) the chances of
that situation, we call fsync on the file that gets written when we put
something in the queue.
Thanks to nolanl@github for reporting this in
https://github.com/albertito/chasquid/issues/78.
Today, the maximum number of items in the queue, as well as how long we
keep attempting to send each item, are hard-coded and cannot be changed
by end users.
While they are totally adequate for chasquid's main use cases, it can
still be useful for some users to change them.
So this patch adds two new configuration options for those settings.
They're marked experimental for now, so we can adjust them if needed
after they get more exposure.
Thanks to Lewis Ross-Jones <lewis_r_j@hotmail.com> for suggesting this
improvement, and for helping test it.
This patch implements "via" aliases, which let us explicitly select a
server to use for delivery.
This feature is useful in different scenarios, such as a secondary MX
server that forwards all incoming email to a primary.
For now, it is experimental and the syntax and semantics are subject to
change.
This commit updates the uses of math/rand to math/rand/v2, which was
released in Go 1.22 (2024-02).
The new package is generally safer; see https://go.dev/blog/randv2 for
the details.
There are no user-visible changes: it only adjusts function names,
simplifies some code thanks to v2 having a better API, etc.
This patch regenerates the auto-generated files. There are no
significant changes.
- Protobuf files updated the comment formatting to match recent changes
in Go libraries.
- The IANA assignment for AEGIS (currently an IETF draft) has been
updated.
- The link to the human-readable IANA assignment tables from the
generator was manually updated.
This patch regenerates the auto-generated files.
There are no significant changes: the protobufs just get an updated
comment due to a protoc version change, which is purely informational.
Two new TLS ciphers are added, matching the new IANA assignments.
This patch changes several internal packages to receive and pass tracing
annotations, making use of the new tracing library, so we can have
better debugging information.
This patch does a general pass updating Go modules to recent versions, and
regenerates the protobufs accordingly.
The main purpose is to make sure people building from source are using
relatively recent versions of our dependencies.
This patch implements support for catch-all aliases, where users can add
a `*: destination` alias. Mails sent to unknown users (or other aliases)
will not be rejected, but sent to the indicated destination instead.
Please see https://github.com/albertito/chasquid/issues/23 and
https://github.com/albertito/chasquid/pull/24 for more discussion and
background.
Thanks to Alex Ellwein (aellwein@github) for the alternative patch and
help with testing; and to ThinkChaos (ThinkChaos@github) for help with
testing.
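The catch-all lookup order described above (exact alias, then local
user, then catch-all) could be sketched roughly as follows. The data
model here is an assumption made for the example: aliases keyed by full
address, with the catch-all stored under "*@domain". It is not how
chasquid stores aliases internally.

```go
package main

import (
	"fmt"
	"strings"
)

// resolve returns the destination for addr, and whether it should be
// accepted. Hypothetical helper and data layout, for illustration only.
func resolve(aliases map[string]string, users map[string]bool, addr string) (string, bool) {
	// Exact aliases take precedence.
	if dst, ok := aliases[addr]; ok {
		return dst, true
	}
	// Known local users are delivered as usual.
	if users[addr] {
		return addr, true
	}
	// Otherwise, fall back to the domain's catch-all, if there is one.
	_, domain, found := strings.Cut(addr, "@")
	if !found {
		return "", false
	}
	if dst, ok := aliases["*@"+domain]; ok {
		return dst, true
	}
	return "", false
}

func main() {
	aliases := map[string]string{"*@example.com": "admin@example.com"}
	users := map[string]bool{"alice@example.com": true}

	for _, addr := range []string{
		"alice@example.com", "nobody@example.com", "nobody@other.org",
	} {
		dst, ok := resolve(aliases, users, addr)
		fmt.Printf("%s -> %q (accepted: %v)\n", addr, dst, ok)
	}
}
```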
This patch does a general pass updating Go modules to recent versions,
and regenerates the protobufs accordingly.
The main purpose is to make sure people building from source are using
relatively recent versions of our dependencies.
We also regenerate protobufs since the newer versions of the libraries
have a much cleaner dependency tree, which speeds up fetches.
The queue protobuf definition currently uses the well-known timestamp
protobuf package.
This adds a build-time dependency on it, which is fairly harmless when
building from source (since the golang protobuf compiler includes it
already), but adds overhead for packaging on distributions.
Since this is the only external proto dependency we have, and the
protobuf message itself is trivial, this patch removes it and instead
embeds a compatible definition.
That way we remove the dependency and simplify packaging, with almost
negligible code overhead.
The change is fully backwards compatible and has no functional changes.
This patch makes chasquid's monitoring server expose an OpenMetrics
metrics endpoint.
It adds a new package "expvarom" which implements an HTTP handler that
exports expvar variables in the OpenMetrics text format.
Then, the handler is registered by the monitoring server at /metrics
(where most things expect it to be).
The existing exported variables are also extended with descriptions,
which is optional, but improves the readability of the metrics.
There is a new protobuf library (and corresponding code generator) for
Go: google.golang.org/protobuf.
It is fairly compatible with the previous v1 API
(github.com/golang/protobuf), but there are some changes.
This patch adjusts the code and generated files to the new API.
The on-wire/on-disk format remains unchanged so this should be
transparent to the users.
When creating a new Queue instance, we os.MkdirAll the queue directory.
Currently we don't check if it fails, which will cause us to find out
about problems when the queue is first used, where it is more annoying
to troubleshoot.
This patch adjusts the code so that we check and propagate the error.
That way, problems with the queue directory will be more evident and
easier to handle.
The testing couriers are currently only used in the queue tests, but we
also want to use them in smtpsrv tests so we can make them more robust
by checking the emails got delivered.
This patch moves the testing couriers to testlib, and makes both queue
and smtpsrv use them.
This patch updates the auto-generated code to match the latest tooling
versions.
In particular, the protobufs are regenerated, and the new version no
longer supports unkeyed literals, so some minor changes are needed.
Other than that, the cipher list is extended with the latest ciphers.
There are a few context.WithDeadline calls that can be simplified by
using context.WithTimeout.
At the time they were added, WithTimeout was too new so we didn't want
to depend on it. But now that the minimum Go version has been raised to
1.9, we can simplify the calls.
This patch does that simplification, which is purely mechanical, and
does not change the logic itself.
This patch contains some changes to generate tidier DSNs, which should
make them slightly more readable.
In particular, it handles multi-line errors much better than before.
Our non-delivery status notifications are quite simple today, but that
makes it much more difficult to support internationalization and
cross-language reporting.
There is a standard for internationalized DSNs, RFC 6533 (which builds
on top of the structured DSNs from RFC 3464).
This patch changes our DSN messages to be based on those standards, so
it is easier for MUAs to display reports according to the users'
language preferences.
Note we still use message/rfc822 + 8bit to transmit the message, instead
of message/global, for compatibility reasons. This seems to be more
universally compatible, but the decision might be revisited in the
future. See RFC 5335 (section 4.6 in particular).
This patch contains some minor code style improvements, to leave the
linter happier and generally follow best practices in some areas where
things snuck through.
Some transient issues might take more than 12h to resolve, especially if
they happen overnight.
20h gives a bit more margin for retries, while still being short enough
so that users are notified early.
This patch adds missing docstrings for exported identifiers, and
adjusts some of the existing ones to match the standard style.
In some cases, the identifiers were un-exported after noticing they had
no external users.
Besides improving documentation, it also reduces the linter noise
significantly.
This patch extends various packages and integration tests, increasing
test coverage. They're small enough that it's not worth splitting them
up, as it would add a lot of noise to the history.
We have many places in our tests where we create temporary directories,
which we later remove (most of the time). We have at least 3 helpers to
do this, and various places where it's done ad-hoc (and the cleanup is
not always present).
To try to reduce the clutter, and make the tests more uniform and
readable, this patch introduces two helpers in a new "testutil" package:
one for creating and one for removing temporary directories.
These new functions are safer, better tested, and make the tests more
consistent. All the tests are updated to use them.
Picking the domain used in the DSN message "From" header is more
complicated than it needs to be, causing confusing code paths and having
different uses for the hostname, which should be purely aesthetic.
This patch makes the queue pick the DSN "From" domain from the message
itself, by looking for a local domain in either the sender or the
original recipients. We should find at least one, otherwise it'd be
relaying.
This allows the code to be simplified, and we can narrow the scope of
the hostname option even further.
The queue IDs are internal to chasquid, and we don't need them to be
cryptographically secure, so this patch changes the ID generation to use
the PRNG.
This also helps avoid entropy issues.
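A sketch of what PRNG-based ID generation can look like. The alphabet,
length and function name are assumptions made for the example, not the
format chasquid uses.

```go
package main

import (
	"fmt"
	"math/rand"
)

// idAlphabet is an arbitrary choice for this example.
const idAlphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// newID returns a random ID of n characters using the (non-cryptographic)
// math/rand generator, so it never blocks waiting for system entropy.
func newID(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = idAlphabet[rand.Intn(len(idAlphabet))]
	}
	return string(b)
}

func main() {
	fmt.Println(newID(16))
}
```

This is only appropriate because the IDs are internal. Anything exposed
as a secret should keep using crypto/rand.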
glog works fine and has great features, but it does not play well
with systemd or standard log rotators (as it does the rotation itself).
So this patch replaces glog with a new logging module "log", which by
default logs to stderr, in a systemd-friendly manner.
Logging to files or syslog is still supported.
Calculating the next delay based on the previous delay causes daemon
restarts to start from scratch, as we don't persist it.
This can cause a few server restarts to generate many unnecessary sends.
This patch changes the next delay calculation to use the creation time
instead, and also adds a <=1m random perturbation so that queued emails
are not all retried at the exact same time after a restart.
TestAliases is unfortunately racy, and causes occasional failures.
This patch rewrites it to be more end-to-end, similar to TestBasic,
which should remove the races while keeping the main objectives of the
test.
The queue currently only considers failed recipients when deciding
whether to send a DSN or not. This is a bug, as recipients that time out
are not taken into account.
This patch fixes that issue by including both failed and pending
recipients in the DSN.
It also adds more comprehensive tests for this case, both in the queue
and in the dsn generation code.
The default INFO logs are more oriented towards debugging and can be
a bit too verbose when looking for high-level information.
This patch introduces a new "maillog" package, used to log messages of
particular relevance to mail transmission at a higher level.
Today, we pick the domain the DSN is sent from based on what we
presented to the client at EHLO time, which itself may be based on the
TLS negotiation (which is not necessarily trusted).
This is complex, not necessarily correct, and involves passing the
domain around through the queue and persisting it in the items.
So this patch simplifies that handling by always using the main domain
as specified by the configuration.
This patch is the result of running go vet, gofmt -s and the linter,
and fixing some of the things they noted/suggested.
There shouldn't be any significant logic changes; it's mostly
readability improvements.
This patch simplifies the sending loop code:
- Move the recipient sending function from a closure to a method.
- Simplify the status update logic: we now update and write
unconditionally (as we should have been doing).
- Create a function for counting recipients in a given status.
It also adds a test for the removal of completed items from the queue,
which was not covered before and came up during development.
This patch introduces expvar counters to chasquid and the queue
packages.
For now there's only a handful of counters, but they will be expanded in
future patches.
The test courier has a racy map access, and this started to manifest
after some recent changes. This patch fixes the race by implementing the
corresponding locks.
This patch reviews various debug and informational messages, making more
uniform use of tracing, and extends the monitoring http server with
useful information like an index and a queue dump.
If there's an alias to forward email to a non-local domain, using the original
From is problematic, as we may not be an authorized sender for it.
Some MTAs (like Exim) will do it anyway, others (like Gmail) will construct a
special address based on the original address.
This patch implements the latter approach, which is safer and allows the
receiver to properly enforce SPF.
We construct a (hopefully) reasonable From based on the local user, and
embedding the original From (but transformed for IDNA, as the receiver may not
support SMTPUTF8).
This patch performs some minor cleanups for things detected by "go vet":
- Remove one line of unreachable code.
- Don't leak contexts until their deadline expires, cancel them.
When we permanently fail to deliver to one or more recipients, send delivery
status notifications back to the sender.
To do this, we need to extend a couple of internal structures, to keep track
of the original destinations (so we can include them in the message, for
reference), and the hostname we're identifying ourselves as (this is arguable
but we're going with it for now, may change later).
This patch makes the queue and couriers distinguish between permanent and
transient errors when delivering mail to individual recipients.
Pipe delivery errors are always permanent.
Procmail delivery errors are almost always permanent, except if the command
exited with code 75, which indicates a transient failure.
SMTP delivery errors are almost always transient, except if the DNS resolution
for the domain failed.
This patch tidies up the Procmail courier:
- Move the configuration options to the courier instance, instead of using
global variables.
- Implement more useful string replacement options.
- Use exec.CommandContext for running the command with a timeout.
As a consequence of the first item, the queue now takes the couriers via its
constructor.