How to Run email_scraper.py from Anywhere on Kali Linux
This guide shows you how to:
Install the required Python libraries
Save the email_scraper.py script
Create a global email-scraper command you can run from any folder
Tested on Kali Linux (default shell: zsh).
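If you are not sure which shell your user account runs (older images defaulted to bash), you can check it now, since the PATH step later in this guide is shell-specific:

echo "$SHELL"

On a current Kali install this should print a zsh path such as /usr/bin/zsh.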
1. Install Required Python Packages
On Kali, the easiest way is to install these through apt rather than pip, since recent Kali releases mark the system Python as externally managed and block system-wide pip installs by default:
sudo apt update
sudo apt install -y python3-requests python3-bs4
This installs:
requests – for making HTTP requests
BeautifulSoup (bs4) – for parsing HTML
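To confirm both libraries are importable before moving on, you can run a quick one-liner against the system Python (both packages expose a standard __version__ attribute):

python3 -c "import requests, bs4; print(requests.__version__, bs4.__version__)"

If this prints two version numbers without an ImportError, the dependencies are in place.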
2. Create the Tools Folder
Create a folder to store the Python script:
mkdir -p ~/tools
3. Create the email_scraper.py Script
Create the script file in the tools folder:
nano ~/tools/email_scraper.py
Paste the following Python code into the file:
import re
import sys
import time
import argparse
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Simple email regex (good enough for recon)
EMAIL_REGEX = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')


def is_same_domain(start_netloc, url):
    try:
        netloc = urlparse(url).netloc
        # Strip 'www.' for loose matching
        return netloc.replace('www.', '') == start_netloc.replace('www.', '')
    except Exception:
        return False


def normalize_url(base_url, link):
    return urljoin(base_url, link.split('#')[0])  # drop fragments


def scrape_emails_from_text(text):
    return set(EMAIL_REGEX.findall(text))


def crawl_for_emails(start_url, max_pages=50, delay=0.5, timeout=10):
    start_url = start_url.strip()
    parsed = urlparse(start_url)
    if not parsed.scheme:
        start_url = "https://" + start_url
        parsed = urlparse(start_url)
    start_netloc = parsed.netloc

    visited = set()
    queue = deque([start_url])
    found_emails = set()

    session = requests.Session()
    session.headers.update({
        "User-Agent": "RedTeamEmailCrawler/1.0 (+authorized-testing)"
    })

    pages_visited = 0

    while queue and pages_visited < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        pages_visited += 1

        print(f"[+] Visiting ({pages_visited}/{max_pages}): {url}")

        try:
            resp = session.get(url, timeout=timeout)
            content_type = resp.headers.get("Content-Type", "")
            if "text/html" not in content_type:
                print("    [-] Skipping non-HTML content")
                continue
        except requests.RequestException as e:
            print(f"    [!] Request failed: {e}")
            continue

        # Extract emails from raw text
        page_text = resp.text
        emails = scrape_emails_from_text(page_text)
        if emails:
            print(f"    [+] Found {len(emails)} email(s)")
            found_emails.update(emails)

        # Parse links for further crawling
        try:
            soup = BeautifulSoup(page_text, "html.parser")
        except Exception as e:
            print(f"    [!] HTML parse error: {e}")
            continue

        for tag in soup.find_all("a", href=True):
            link = tag["href"]
            full_url = normalize_url(url, link)

            # Only follow http/https links
            parsed_link = urlparse(full_url)
            if parsed_link.scheme not in ("http", "https"):
                continue

            # Stay on same domain
            if not is_same_domain(start_netloc, full_url):
                continue

            if full_url not in visited:
                queue.append(full_url)

        time.sleep(delay)  # be polite

    return found_emails


def main():
    parser = argparse.ArgumentParser(
        description="Scrape a website (same domain) for email addresses. "
                    "Use ONLY with explicit permission."
    )
    parser.add_argument("url", help="Start URL, e.g. https://example.com")
    parser.add_argument("--max-pages", type=int, default=50,
                        help="Maximum pages to crawl (default: 50)")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay between requests in seconds (default: 0.5)")
    args = parser.parse_args()

    emails = crawl_for_emails(args.url, max_pages=args.max_pages, delay=args.delay)

    print("\n=== Unique email addresses found ===")
    for e in sorted(emails):
        print(e)
    print(f"\nTotal unique emails: {len(emails)}")


if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Usage: python email_scraper.py https://target-domain.com")
        sys.exit(1)
    main()
Save and exit in Nano:
Press Ctrl + O, then Enter to save
Press Ctrl + X to exit
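Before creating the global command, you can sanity-check the script by calling it with its full path. The domain below is only a placeholder; point it at a target you are explicitly authorized to test:

python3 ~/tools/email_scraper.py https://example.com --max-pages 5

A small --max-pages value keeps the test run short.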
4. Create the bin Folder for Custom Commands
Create a bin directory in your home folder:
mkdir -p ~/bin
5. Create the Global email-scraper Command
Create a small wrapper script that calls email_scraper.py:
cat > ~/bin/email-scraper << 'EOF'
#!/usr/bin/env bash
# Wrapper to run the email scraper from anywhere
python3 "$HOME/tools/email_scraper.py" "$@"
EOF
Make the wrapper executable:
chmod +x ~/bin/email-scraper
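Since ~/bin is not on your PATH yet, you can confirm the wrapper works by calling it with its full path; the help text comes from argparse inside email_scraper.py:

~/bin/email-scraper --help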
6. Add ~/bin to Your PATH (zsh)
On Kali (with zsh), add this line to your ~/.zshrc:
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
Verify that the command is now visible:
which email-scraper
You should see something like:
/home/kali/bin/email-scraper
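If your login shell is bash rather than zsh, the only difference is the startup file; put the same line in ~/.bashrc instead:

echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc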
7. Usage Examples
Now you can run the scraper from any folder:
Basic usage:
email-scraper https://example.com
Crawl more pages with a custom delay:
email-scraper https://targetdomain.com --max-pages 200 --delay 0.3
At the end of the run, the script prints:
A list of unique email addresses found
The total number of unique emails
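Because everything goes to standard output, you can capture a run for your engagement notes with ordinary shell redirection. The filenames here are just examples:

email-scraper https://example.com --max-pages 100 | tee scraper-output.txt
grep -E '^[^ ]+@[^ ]+\.[A-Za-z]{2,}$' scraper-output.txt | sort -u > emails.txt

The grep pattern keeps only the bare email lines from the saved output; adjust it if your results include unusual addresses.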
8. Important Note (Red Team / Legal)
Use this script only when:
You have explicit, written permission from the owner of the target domain
The domain is in-scope for your engagement
You comply with the Rules of Engagement and local laws
This tool is intended for authorized security testing and training only.
