How to Run email_scraper.py from Anywhere on Kali Linux

This guide shows you how to:

  • Install the required Python libraries

  • Save the email_scraper.py script

  • Create a global email-scraper command you can run from any folder

Tested on Kali Linux (default shell: zsh).


1. Install Required Python Packages

On Kali, the easiest way is to use apt instead of pip:

 
sudo apt update
sudo apt install -y python3-requests python3-bs4

This installs:

  • requests – for HTTP requests

  • BeautifulSoup (bs4) – for parsing HTML
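
As an optional sanity check, you can confirm that both libraries import cleanly before moving on (any error here means the apt install did not succeed):

# Both imports should succeed silently and print a confirmation
python3 -c "import requests, bs4; print('imports OK')"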


2. Create the Tools Folder

Create a folder to store the Python script:

 
mkdir -p ~/tools

3. Create the email_scraper.py Script

Create the script file in the tools folder:

 
nano ~/tools/email_scraper.py

Paste the following Python code into the file:

 

import re
import sys
import time
import argparse
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Simple email regex (good enough for recon)
EMAIL_REGEX = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_same_domain(start_netloc, url):
    try:
        netloc = urlparse(url).netloc
        # Strip 'www.' for loose matching
        return netloc.replace('www.', '') == start_netloc.replace('www.', '')
    except Exception:
        return False

def normalize_url(base_url, link):
    return urljoin(base_url, link.split('#')[0])  # drop fragments

def scrape_emails_from_text(text):
    return set(EMAIL_REGEX.findall(text))

def crawl_for_emails(start_url, max_pages=50, delay=0.5, timeout=10):
    start_url = start_url.strip()
    parsed = urlparse(start_url)

    if not parsed.scheme:
        start_url = "https://" + start_url
        parsed = urlparse(start_url)

    start_netloc = parsed.netloc

    visited = set()
    queue = deque([start_url])
    found_emails = set()

    session = requests.Session()
    session.headers.update({
        "User-Agent": "RedTeamEmailCrawler/1.0 (+authorized-testing)"
    })

    pages_visited = 0

    while queue and pages_visited < max_pages:
        url = queue.popleft()

        if url in visited:
            continue

        visited.add(url)
        pages_visited += 1

        print(f"[+] Visiting ({pages_visited}/{max_pages}): {url}")

        try:
            resp = session.get(url, timeout=timeout)
            content_type = resp.headers.get("Content-Type", "")
            if "text/html" not in content_type:
                print(" [-] Skipping non-HTML content")
                continue
        except requests.RequestException as e:
            print(f" [!] Request failed: {e}")
            continue

        # Extract emails from raw text
        page_text = resp.text
        emails = scrape_emails_from_text(page_text)
        if emails:
            print(f" [+] Found {len(emails)} email(s)")
            found_emails.update(emails)

        # Parse links for further crawling
        try:
            soup = BeautifulSoup(page_text, "html.parser")
        except Exception as e:
            print(f" [!] HTML parse error: {e}")
            continue

        for tag in soup.find_all("a", href=True):
            link = tag["href"]
            full_url = normalize_url(url, link)

            # Only follow http/https links
            parsed_link = urlparse(full_url)
            if parsed_link.scheme not in ("http", "https"):
                continue

            # Stay on same domain
            if not is_same_domain(start_netloc, full_url):
                continue

            if full_url not in visited:
                queue.append(full_url)

        time.sleep(delay)  # be polite

    return found_emails

def main():
    parser = argparse.ArgumentParser(
        description="Scrape a website (same domain) for email addresses. "
                    "Use ONLY with explicit permission."
    )
    parser.add_argument("url", help="Start URL, e.g. https://example.com")
    parser.add_argument("--max-pages", type=int, default=50,
                        help="Maximum pages to crawl (default: 50)")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay between requests in seconds (default: 0.5)")
    args = parser.parse_args()

    emails = crawl_for_emails(args.url, max_pages=args.max_pages, delay=args.delay)

    print("\n=== Unique email addresses found ===")
    for e in sorted(emails):
        print(e)

    print(f"\nTotal unique emails: {len(emails)}")

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Usage: python email_scraper.py https://target-domain.com")
        sys.exit(1)
    main()

Save and exit in Nano:

  • Press Ctrl + O, then Enter

  • Press Ctrl + X
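
Before creating the global command, you can optionally test the script by calling it with its full path (example.com is only a placeholder; run the scraper solely against domains you are authorized to test):

# Quick test run, limited to 5 pages
python3 ~/tools/email_scraper.py https://example.com --max-pages 5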


4. Create the bin Folder for Custom Commands

Create a bin directory in your home folder:

 
mkdir -p ~/bin

5. Create the Global email-scraper Command

Create a small wrapper script that calls email_scraper.py:

 

cat > ~/bin/email-scraper << 'EOF'
#!/usr/bin/env bash
# Wrapper to run the email scraper from anywhere

python3 "$HOME/tools/email_scraper.py" "$@"
EOF

Make the wrapper executable:

 
chmod +x ~/bin/email-scraper
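
The wrapper is not yet on your PATH, but you can already confirm it works by calling it with its full path (the --help output comes from argparse in the script):

# Should print the script's usage and options
~/bin/email-scraper --help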

6. Add ~/bin to Your PATH (zsh)

On Kali (with zsh), append the following line to your ~/.zshrc and reload it:

 
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Verify that the command is now visible:

 
which email-scraper

You should see something like:

 
/home/kali/bin/email-scraper
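
If your account uses bash instead of zsh, the same approach works with ~/.bashrc (this is only an alternative; it is not needed on a default Kali setup):

echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc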

7. Usage Examples

Now you can run the scraper from any folder:

Basic usage:

 
email-scraper https://example.com

Crawl more pages with a custom delay:

 
email-scraper https://targetdomain.com --max-pages 200 --delay 0.3

At the end of the run, the script prints:

  • A list of unique email addresses found

  • The total number of unique emails
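
If you want to keep the results for reporting, you can pipe the output through tee to save a copy while the crawl runs (scraped_emails.txt is just an example file name):

# Save the crawl log and the final email list to a file
email-scraper https://example.com | tee scraped_emails.txt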


8. Important Note (Red Team / Legal)

Use this script only when:

  • You have explicit, written permission from the owner of the target domain

  • The domain is in-scope for your engagement

  • You comply with the Rules of Engagement and local laws

This tool is intended for authorized security testing and training only.
