Ensure the following prerequisites are installed on your system before running the script:
For Debian-based distributions (like Ubuntu), use the following commands:
sudo apt update
sudo apt install bash curl gawk
For Red Hat-based distributions (like CentOS), use:
sudo yum install bash curl gawk
For Android devices using Termux, follow these steps:
pkg update
pkg install bash curl gawk
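Regardless of platform, you can confirm the tools are available by checking their versions:
bash --version
curl --version
gawk --version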
This script automates searching for IT and technology-related keywords on the Mir Ali Shahidi website and processing the search results. In brief, it uses curl to fetch the HTML content of the search results page and saves it for processing.
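At its core, each iteration boils down to a single curl call against the Google search URL; a minimal sketch of that step, reusing the same file names as the full script and a sample keyword, looks like this:
curl -v -D headers.txt "https://www.google.com/search?q=IT+site:miralishahidi.ir" -o page_content.html
The full script is shown below: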
#!/bin/bash
# Define the list of search keywords
keywords=(
"IT" "ICT" "OT" "IIOT" "IOT" "network" "cybersecurity" "AI" "machine+learning" "data+science"
"cloud+computing" "blockchain" "automation" "digital+transformation" "big+data"
"analytics" "software+development" "IT+consulting" "networking" "virtualization"
"system+integration" "tech+trends" "IT+strategy" "smart+devices" "enterprise+IT"
"cyber+defense" "data+protection" "IT+infrastructure" "technology+solutions"
"security+services" "cloud+storage" "IT+support" "tech+innovation" "software+engineering"
"information+security" "IT+management" "digital+marketing" "IT+services" "enterprise+solutions"
"IT+architecture" "IT+operations" "mobile+computing" "IT+project+management" "IT+training"
"tech+consulting" "network+security" "IT+systems" "data+analytics" "IT+compliance"
"IT+governance" "IT+trends" "IT+support+services" "IT+outsourcing" "technology+consulting"
)
# Function to get a random keyword from the list
get_random_keyword() {
echo "${keywords[$RANDOM % ${#keywords[@]}]}"
}
# Function to mask headers
mask_headers() {
sed -e 's/Authorization: .*/Authorization: [REDACTED]/' \
-e 's/User-Agent: .*/User-Agent: [REDACTED]/' \
-e 's/Referer: .*/Referer: [REDACTED]/' \
-e 's/X-Forwarded-For: .*/X-Forwarded-For: [REDACTED]/' \
"$1"
}
# Infinite loop to continuously run the script
while true; do
# Get a random keyword
keyword=$(get_random_keyword)
# Define the search URL with the random keyword
search_url="https://www.google.com/search?q=${keyword}+site:miralishahidi.ir"
echo "--------------------------------------------------"
echo "Selected Keyword: $keyword"
echo "Search URL: $search_url"
# Fetch the page content and headers
echo "Fetching page content from: $search_url"
curl -v -D headers.txt "$search_url" -o page_content.html
# Check if the curl command was successful
if [ $? -eq 0 ]; then
echo "Page content fetched successfully."
else
echo "Failed to fetch page content."
continue
fi
# Mask and print the headers for debugging
echo "Masked HTTP Headers:"
mask_headers headers.txt
# Extract and clean up links
echo "Extracting and cleaning up links..."
links=$(grep -oP '(?<=href="/url\?q=)[^"]*' page_content.html | \
sed 's/&amp;/\&/g' | \
sed 's/%3A/:/g; s/%2F/\//g; s/%3F/?/g; s/%26/&/g; s/%2C/,/g; s/%2B/+/g; s/%20/ /g' | \
awk -F'&' '{print $1}' | \
sed 's/%20/ /g; s/%21/!/g; s/%2A/*/g; s/%28/(/g; s/%29/)/g; s/%7E/~/g' | \
sed 's/%2520/ /g' | \
grep '^https://' | \
grep '\.html$')
if [ -n "$links" ]; then
echo "Links extracted and cleaned successfully."
echo "Number of links found: $(echo "$links" | wc -l)"
else
echo "No valid links found."
fi
echo "Extracted HTML links:"
echo "$links"
# Simulate clicking on each link
echo "Simulating clicks on extracted links..."
for link in $links; do
echo "Processing link: $link"
# Fetch the link content and headers
echo "Fetching content from: $link"
response=$(curl -v -s -D link_headers.txt -w "%{http_code}" -o /tmp/link_content.html "$link")
# Because -o sends the body to a file, stdout holds only the status code produced by -w
http_status="$response"
# Mask and print the link headers
echo "Masked Link HTTP Headers:"
mask_headers link_headers.txt
# Check if the file exists before reading
if [ -f /tmp/link_content.html ]; then
echo "HTTP Status Code: $http_status"
# Display the first few lines of the response body for context
echo "Link content preview:"
head -n 10 /tmp/link_content.html
echo "Content fetched successfully from: $link"
else
echo "Failed to retrieve content for $link"
fi
echo "----------------------------------------"
done
# Clean up temporary files
rm -f page_content.html headers.txt link_headers.txt /tmp/link_content.html
echo "Temporary files cleaned up."
# Optional sleep to avoid hitting the server too frequently
sleep 10
done
The script runs in an infinite loop. The time for each iteration varies depending on network speed and the number of links extracted. On average, each iteration may take approximately 1 minute to complete.
Estimated Time for Each Iteration: approximately 1 minute (may vary based on network conditions and the number of links).
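Assuming the script is saved as search_loop.sh (the file name is only an example), it can be made executable and started as follows; press Ctrl+C to stop the infinite loop:
chmod +x search_loop.sh
./search_loop.sh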
Overall, this process allows the script to continually interact with the content on the Mir Ali Shahidi website using a variety of IT-related search queries.