How to Run email_scraper.py from Anywhere on Kali Linux

This guide shows you how to:

  • Install the required Python libraries

  • Save the email_scraper.py script

  • Create a global email-scraper command you can run from any folder

Tested on Kali Linux (default shell: zsh).


1. Install Required Python Packages

On Kali, the easiest way is to use apt instead of pip:

 
sudo apt update
sudo apt install -y python3-requests python3-bs4

This installs:

  • requests – for HTTP requests

  • BeautifulSoup (bs4) – for parsing HTML
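
As an optional sanity check, you can confirm that both libraries import cleanly before moving on (any error here means the apt install did not succeed):

# Both imports should succeed silently and print a confirmation
python3 -c "import requests, bs4; print('imports OK')"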


2. Create the Tools Folder

Create a folder to store the Python script:

 
mkdir -p ~/tools

3. Create the email_scraper.py Script

Create the script file in the tools folder:

 
nano ~/tools/email_scraper.py

Paste the following Python code into the file:

 

import re
import sys
import time
import argparse
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Simple email regex (good enough for recon)
EMAIL_REGEX = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_same_domain(start_netloc, url):
    try:
        netloc = urlparse(url).netloc
        # Strip 'www.' for loose matching
        return netloc.replace('www.', '') == start_netloc.replace('www.', '')
    except Exception:
        return False

def normalize_url(base_url, link):
    return urljoin(base_url, link.split('#')[0])  # drop fragments

def scrape_emails_from_text(text):
    return set(EMAIL_REGEX.findall(text))

def crawl_for_emails(start_url, max_pages=50, delay=0.5, timeout=10):
    start_url = start_url.strip()
    parsed = urlparse(start_url)

    if not parsed.scheme:
        start_url = "https://" + start_url
        parsed = urlparse(start_url)

    start_netloc = parsed.netloc

    visited = set()
    queue = deque([start_url])
    found_emails = set()

    session = requests.Session()
    session.headers.update({
        "User-Agent": "RedTeamEmailCrawler/1.0 (+authorized-testing)"
    })

    pages_visited = 0

    while queue and pages_visited < max_pages:
        url = queue.popleft()

        if url in visited:
            continue

        visited.add(url)
        pages_visited += 1

        print(f"[+] Visiting ({pages_visited}/{max_pages}): {url}")

        try:
            resp = session.get(url, timeout=timeout)
            content_type = resp.headers.get("Content-Type", "")
            if "text/html" not in content_type:
                print(" [-] Skipping non-HTML content")
                continue
        except requests.RequestException as e:
            print(f" [!] Request failed: {e}")
            continue

        # Extract emails from raw text
        page_text = resp.text
        emails = scrape_emails_from_text(page_text)
        if emails:
            print(f" [+] Found {len(emails)} email(s)")
            found_emails.update(emails)

        # Parse links for further crawling
        try:
            soup = BeautifulSoup(page_text, "html.parser")
        except Exception as e:
            print(f" [!] HTML parse error: {e}")
            continue

        for tag in soup.find_all("a", href=True):
            link = tag["href"]
            full_url = normalize_url(url, link)

            # Only follow http/https links
            parsed_link = urlparse(full_url)
            if parsed_link.scheme not in ("http", "https"):
                continue

            # Stay on same domain
            if not is_same_domain(start_netloc, full_url):
                continue

            if full_url not in visited:
                queue.append(full_url)

        time.sleep(delay)  # be polite

    return found_emails

def main():
    parser = argparse.ArgumentParser(
        description="Scrape a website (same domain) for email addresses. "
                    "Use ONLY with explicit permission."
    )
    parser.add_argument("url", help="Start URL, e.g. https://example.com")
    parser.add_argument("--max-pages", type=int, default=50,
                        help="Maximum pages to crawl (default: 50)")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay between requests in seconds (default: 0.5)")
    args = parser.parse_args()

    emails = crawl_for_emails(args.url, max_pages=args.max_pages, delay=args.delay)

    print("\n=== Unique email addresses found ===")
    for e in sorted(emails):
        print(e)

    print(f"\nTotal unique emails: {len(emails)}")

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Usage: python email_scraper.py https://target-domain.com")
        sys.exit(1)
    main()

Save and exit in Nano:

  • Press Ctrl + O, then Enter

  • Press Ctrl + X
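
Before creating the global command, you can optionally test the script by calling it with its full path (example.com is only a placeholder; run the scraper solely against domains you are authorized to test):

# Quick test run, limited to 5 pages
python3 ~/tools/email_scraper.py https://example.com --max-pages 5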


4. Create the bin Folder for Custom Commands

Create a bin directory in your home folder:

 
mkdir -p ~/bin

5. Create the Global email-scraper Command

Create a small wrapper script that calls email_scraper.py:

 

cat > ~/bin/email-scraper << 'EOF'
#!/usr/bin/env bash
# Wrapper to run the email scraper from anywhere

python3 "$HOME/tools/email_scraper.py" "$@"
EOF

Make the wrapper executable:

 
chmod +x ~/bin/email-scraper
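
The wrapper is not yet on your PATH, but you can already confirm it works by calling it with its full path (the --help output comes from argparse in the script):

# Should print the script's usage and options
~/bin/email-scraper --help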

6. Add ~/bin to Your PATH (zsh)

On Kali (with zsh), append the following line to your ~/.zshrc and reload it:

 
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

Verify that the command is now visible:

 
which email-scraper

You should see something like:

 
/home/kali/bin/email-scraper
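
If your account uses bash instead of zsh, the same approach works with ~/.bashrc (this is only an alternative; it is not needed on a default Kali setup):

echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc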

7. Usage Examples

Now you can run the scraper from any folder:

Basic usage:

 
email-scraper https://example.com

Crawl more pages with a custom delay:

 
email-scraper https://targetdomain.com --max-pages 200 --delay 0.3

At the end of the run, the script prints:

  • A list of unique email addresses found

  • The total number of unique emails
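
If you want to keep the results for reporting, you can pipe the output through tee to save a copy while the crawl runs (scraped_emails.txt is just an example file name):

# Save the crawl log and the final email list to a file
email-scraper https://example.com | tee scraped_emails.txt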


8. Important Note (Red Team / Legal)

Use this script only when:

  • You have explicit, written permission from the owner of the target domain

  • The domain is in-scope for your engagement

  • You comply with the Rules of Engagement and local laws

This tool is intended for authorized security testing and training only.
