Bash Script for IT Keyword Search

Installation Prerequisites

Ensure the following tools are installed on your system before running the script: bash, curl, and gawk (GNU awk).

Installing Prerequisites on Linux

For Debian-based distributions (like Ubuntu), use the following commands:


sudo apt update
sudo apt install bash curl gawk
    

For Red Hat-based distributions (like CentOS), use:


sudo yum install bash curl gawk
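
Regardless of distribution, you can quickly confirm that the required tools are on your PATH before running the script. This check uses only standard --version flags and is not specific to this script:


for tool in bash curl gawk; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version | head -n 1
    else
        echo "Missing: $tool" >&2
    fi
done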
    

Installing Prerequisites on Termux

For Android devices using Termux, follow these steps:

  1. Update the package lists: pkg update
  2. Install Bash: pkg install bash
  3. Install Curl: pkg install curl
  4. Install Gawk: pkg install gawk
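
If you prefer a single command, the steps above can be combined, since pkg accepts several package names at once:


pkg update && pkg install bash curl gawk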

Script Overview

This script automates searching for IT and technology-related keywords on the Mir Ali Shahidi website and processing the search results. In brief, it picks a random keyword from a predefined list, builds a Google search URL restricted to the site, fetches the results page, extracts and decodes the result links, fetches each linked page in turn, and repeats after a short pause.

Script Details


#!/bin/bash

# Define the list of search keywords
keywords=(
    "IT" "ICT" "OT" "IIOT" "IOT" "network" "cybersecurity" "AI" "machine+learning" "data+science"
    "cloud+computing" "blockchain" "automation" "digital+transformation" "big+data"
    "analytics" "software+development" "IT+consulting" "networking" "virtualization"
    "system+integration" "tech+trends" "IT+strategy" "smart+devices" "enterprise+IT"
    "cyber+defense" "data+protection" "IT+infrastructure" "technology+solutions"
    "security+services" "cloud+storage" "IT+support" "tech+innovation" "software+engineering"
    "information+security" "IT+management" "digital+marketing" "IT+services" "enterprise+solutions"
    "IT+architecture" "IT+operations" "mobile+computing" "IT+project+management" "IT+training"
    "tech+consulting" "network+security" "IT+systems" "data+analytics" "IT+compliance"
    "IT+governance" "IT+trends" "IT+support+services" "IT+outsourcing" "technology+consulting"
)

# Function to get a random keyword from the list
get_random_keyword() {
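    # $RANDOM yields an integer in 0..32767; modulo the array length gives a valid index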
    echo "${keywords[$RANDOM % ${#keywords[@]}]}"
}

# Function to mask headers
mask_headers() {
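    # Emits a masked copy of the header file passed as "$1" on stdout; the file itself is not modified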
    sed -e 's/Authorization: .*/Authorization: [REDACTED]/' \
        -e 's/User-Agent: .*/User-Agent: [REDACTED]/' \
        -e 's/Referer: .*/Referer: [REDACTED]/' \
        -e 's/X-Forwarded-For: .*/X-Forwarded-For: [REDACTED]/' \
        "$1"
}

# Infinite loop to continuously run the script
while true; do
    # Get a random keyword
    keyword=$(get_random_keyword)
    
    # Define the search URL with the random keyword
    search_url="https://www.google.com/search?q=${keyword}+site:miralishahidi.ir"
    
    echo "--------------------------------------------------"
    echo "Selected Keyword: $keyword"
    echo "Search URL: $search_url"
    
    # Fetch the page content, saving the response headers to headers.txt
    # (-sS suppresses the progress meter but still reports errors)
    echo "Fetching page content from: $search_url"
    if curl -sS -D headers.txt -o page_content.html "$search_url"; then
        echo "Page content fetched successfully."
    else
        echo "Failed to fetch page content."
        sleep 10   # pause before retrying so repeated failures do not hammer the server
        continue
    fi

    # Mask and print the headers for debugging
    echo "Masked HTTP Headers:"
    mask_headers headers.txt

    # Extract the /url?q= targets, drop Google's tracking parameters (everything
    # after the first &amp;), percent-decode the URLs, and keep only
    # https:// links that end in .html
    echo "Extracting and cleaning up links..."
    links=$(grep -oP '(?<=href="/url\?q=)[^"]*' page_content.html | \
        awk -F'&amp;' '{print $1}' | \
        sed 's/%3A/:/g; s/%2F/\//g; s/%3F/?/g; s/%26/\&/g; s/%2C/,/g; s/%2B/+/g; s/%20/ /g' | \
        sed 's/%21/!/g; s/%2A/*/g; s/%28/(/g; s/%29/)/g; s/%7E/~/g' | \
        sed 's/%2520/ /g' | \
        grep '^https://' | \
        grep '\.html$')

    if [ -n "$links" ]; then
        echo "Links extracted and cleaned successfully."
        echo "Number of links found: $(echo "$links" | wc -l)"
    else
        echo "No valid links found."
    fi

    echo "Extracted HTML links:"
    echo "$links"

    # Simulate clicking on each extracted link, reading line by line so that
    # decoded URLs containing spaces are not split into separate words
    echo "Simulating clicks on extracted links..."
    while IFS= read -r link; do
        [ -z "$link" ] && continue   # skip the empty line left when no links were found
        echo "Processing link: $link"
        
        # Fetch the link content and headers; with the body diverted to a file
        # by -o, the -w format leaves only the status code on stdout
        echo "Fetching content from: $link"
        http_status=$(curl -sS -D link_headers.txt -w "%{http_code}" -o /tmp/link_content.html "$link")
        
        # Mask and print the link headers
        echo "Masked Link HTTP Headers:"
        mask_headers link_headers.txt
        
        # Check if the file exists before reading
        if [ -f /tmp/link_content.html ]; then
            echo "HTTP Status Code: $http_status"
            
            # Display the first few lines of the response body for context
            echo "Link content preview:"
            head -n 10 /tmp/link_content.html
            
            echo "Content fetched successfully from: $link"
        else
            echo "Failed to retrieve content for $link"
        fi
        
        echo "----------------------------------------"
    done <<< "$links"
    
    # Clean up temporary files
    rm -f page_content.html headers.txt link_headers.txt /tmp/link_content.html
    echo "Temporary files cleaned up."

    # Optional sleep to avoid hitting the server too frequently
    sleep 10
done
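
To try the script, save it to a file (search_keywords.sh below is just an example name), make it executable, and run it. It loops forever, so stop it with Ctrl+C:


chmod +x search_keywords.sh
./search_keywords.sh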
    

Process Execution Time

The script runs in an infinite loop and never exits on its own (stop it with Ctrl+C). The time for each iteration varies with network speed and the number of links extracted; on average, an iteration takes roughly one minute.

Estimated time per iteration: ~1 minute (may vary based on network conditions and the number of links).
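
To measure this on your own connection, one minimal approach (a sketch, not part of the script above) is Bash's built-in SECONDS counter; in the real script you would set start=$SECONDS at the top of the loop body and print the difference just before the trailing sleep:


#!/bin/bash
# Sketch: timing one unit of work with Bash's built-in SECONDS counter
start=$SECONDS
sleep 2   # stand-in for the work done in one iteration (fetch, extract, click)
echo "Iteration finished in $((SECONDS - start)) seconds."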

Detailed Program Explanation

This Bash script automates querying IT and technology-related search terms on the Mir Ali Shahidi website and processing the results. Step by step:

  1. Keyword selection: get_random_keyword picks a random entry from the keywords array using $RANDOM modulo the array length.
  2. URL construction: the chosen keyword is embedded in a Google query restricted to the site with site:miralishahidi.ir.
  3. Fetching: curl downloads the results page to page_content.html and writes the response headers to headers.txt.
  4. Header masking: mask_headers redacts Authorization, User-Agent, Referer, and X-Forwarded-For values before the headers are printed.
  5. Link extraction: grep, awk, and sed pull the /url?q= targets out of the HTML, strip Google's tracking parameters, percent-decode the URLs, and keep only https:// links ending in .html.
  6. Simulated clicks: each extracted link is fetched in turn; its HTTP status code is reported and the first ten lines of the body are previewed.
  7. Cleanup and pacing: temporary files are removed and the script sleeps for 10 seconds before the next iteration.
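
To see the link-extraction step in isolation, you can pipe a hand-made sample through the same commands the script uses (the href value below is a made-up Google result anchor, not real output):


# Hypothetical sample of a Google result anchor, run through the same
# grep/awk extraction the script applies
echo '<a href="/url?q=https://miralishahidi.ir/services.html&amp;sa=U&amp;ved=abc123">' | \
    grep -oP '(?<=href="/url\?q=)[^"]*' | \
    awk -F'&amp;' '{print $1}'
# Output: https://miralishahidi.ir/services.html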

This process allows the script to continually interact with the content on the Mir Ali Shahidi website using a variety of IT-related search queries.