A Windows DNS server behaves very peculiarly while it is starting up. With the right request timing, a client can end up with a poisoned DNS cache that lasts for 1 hour (by default). That just seems crazy to me.

To reproduce the issue, I loaded up a Windows DNS server with hundreds of zones and hundreds of records in each zone. This simulates a reasonably large environment where loading the DNS database when the DNS server service starts actually takes a few seconds instead of being near-instant.

To create all the zones and records, run the following PowerShell on a server with the DNS role installed:


# Create a large DNS database so that it takes a noticeable amount of time to load
# 250 zones x 250 records per zone = 62,500 records
# This takes roughly 5 minutes to run

$numZones=250
$recordsPerZone=250

For ($i=0; $i -lt $numZones; $i++) {
    $zoneName = "Zone$i.home.stevenpolley.net"
    Add-DNSServerPrimaryZone -Name $zoneName -ReplicationScope Forest
    $progressPercent = ($i*100)/$numZones
    Write-Progress -Activity "Adding DNS Zones" -Status "$progressPercent% complete.  Current zone: $zoneName" -PercentComplete $progressPercent
    For ($j=0; $j -lt $recordsPerZone; $j++) {
        Add-DnsServerResourceRecord -ZoneName $zoneName -A -Name "host$j" -IPv4Address "10.69.$i.$j" -TimeToLive 01:00:00 -AgeRecord
    }
}

We can then hammer the DNS server by repeatedly querying a record that is confirmed to be present, and for which the DNS server can give an authoritative response.


# Hammer the DNS server with queries
While ($true) {
    & nslookup host55.zone50.home.stevenpolley.net 2>&1 | Out-Null
}
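To pinpoint the moment the failures start without a packet capture, the loop could keep the nslookup output and log a timestamp whenever it reports a failure. The following is only a sketch: the helper name Test-NxDomainOutput is hypothetical, and it assumes an English-language nslookup that prints "Non-existent domain" for an NXDOMAIN response.

```powershell
# Hypothetical helper: returns $true when captured nslookup output
# reports an NXDOMAIN (assumes the English message text).
function Test-NxDomainOutput {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Text
    )
    $Text -match 'Non-existent domain'
}

# Possible use inside the query loop (commented out so the block is side-effect free):
# $out = & nslookup host55.zone50.home.stevenpolley.net 2>&1 | Out-String
# if (Test-NxDomainOutput -Text $out) { "$(Get-Date -Format o) NXDOMAIN" }
```

Timestamps logged this way can then be lined up against the DNS server's event log entries.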

We can then see DNS requests:

A normal DNS request

And we can also see the responses:

A normal DNS response

So far so good, so let’s restart the DNS server service.


Restart-Service DNS

Partway through the restart, the DNS suffix gets dropped and the query gets sent to the root servers.

A failed DNS request with a non-existent domain response

I suspect this forwarding occurs so that the root servers can point to another nameserver registered for the domain, which is great in a public DNS setting. In the case of a private / internal DNS server, such as the one in my lab environment, it results in the client getting an NXDOMAIN response.

The downside is that if a DNS client implementation caches negative responses, then even though the DNS server restart takes only seconds, the negative response may be cached for up to the minimum TTL value of the SOA record. On a Windows server this defaults to 1 hour; in the case of the root servers above, it's a full day!
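One way to check whether a Windows client has actually cached a negative answer is to filter the output of Get-DnsClientCache. This is a minimal sketch, not a definitive tool: the function name Select-NegativeCacheEntry is hypothetical, and it assumes cache entries expose a numeric Status property where 0 means success and any other value (for example 9003 for a name error) marks a negative entry.

```powershell
# Hypothetical filter: passes through cache entries under a given suffix
# whose Status is non-zero (assumed to mean a cached negative answer).
function Select-NegativeCacheEntry {
    param(
        [Parameter(ValueFromPipeline)] $InputObject,
        [Parameter(Mandatory)] [string] $Suffix
    )
    process {
        if ($InputObject.Status -ne 0 -and $InputObject.Entry -like "*.$Suffix") {
            $InputObject
        }
    }
}
```

On a client, piping Get-DnsClientCache into Select-NegativeCacheEntry -Suffix 'home.stevenpolley.net' should list any poisoned entries, and Clear-DnsClientCache flushes them without waiting for the TTL to expire.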

RFC2308 specifically calls out this case:

   Name servers authoritative for a zone MUST include the SOA record of
   the zone in the authority section of the response when reporting an
   NXDOMAIN or indicating that no data of the requested type exists.
   This is required so that the response may be cached.  The TTL of this
   record is set from the minimum of the MINIMUM field of the SOA record
   and the TTL of the SOA itself, and indicates how long a resolver may
   cache the negative answer.  The TTL SIG record associated with the
   SOA record should also be trimmed in line with the SOA's TTL.
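The TTL rule in that passage is simple enough to sketch in PowerShell. The function name and parameters below are hypothetical, and the example values are assumed to match the defaults mentioned above (1 hour for a Windows zone, 1 day for the root zone) rather than being read from a live server.

```powershell
# Hypothetical helper: per RFC 2308, the negative-cache TTL is the smaller of
# the SOA record's own TTL and its MINIMUM field (both in seconds).
function Get-NegativeCacheTtl {
    param(
        [Parameter(Mandatory)] [uint32] $SoaTtl,
        [Parameter(Mandatory)] [uint32] $SoaMinimum
    )
    [Math]::Min($SoaTtl, $SoaMinimum)
}

# Windows default zone (assumed): SOA TTL and MINIMUM are both 1 hour.
$windowsDefault = Get-NegativeCacheTtl -SoaTtl 3600 -SoaMinimum 3600

# Root zone SOA (assumed): TTL and MINIMUM are both 1 day.
$rootZone = Get-NegativeCacheTtl -SoaTtl 86400 -SoaMinimum 86400
```

Either field can shorten the cache time, so lowering the SOA MINIMUM on an internal zone limits how long a negative answer from that zone may be cached. It does not help when the NXDOMAIN comes from the root servers, as it does here.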

To avoid this case, it may be better if the DNS server did not service any traffic until all zones are loaded and fully initialized.

See below for the attached Wireshark packet capture and Windows Event Log.


Attachments

Event Viewer - Requires DNS role be installed to view

Wireshark packet capture - Recommend enabling UTC column to align with event viewer

References

RFC2308