Getting started with web scraping in Selenium

By Chloe P (updated December 4, 2025)


Web scraping is the process of extracting data from web pages. This guide shows you how to scrape with a Multilogin profile by writing a simple script.

This article walks you through building the script step by step. To see the complete script, scroll to the end.

Step 1: Prepare an IDE or similar software

You need a tool to write the script. Any tool works, but we recommend an IDE. Follow the first 4 steps in this article: Getting started with automation scripts.

Step 2: Create a script that connects to the API and define its functions

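In this step, you set up the script to work with the API. The script contains:

API endpoints
Credential variables
Functions that sign in, start a profile, and stop a profile
Imported modules: requests, hashlib, time, and several Selenium modules
The sign-in request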
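The sign-in request sends your password as an MD5 hex digest, not as plain text. The sketch below moves that step into a helper, build_signin_payload, so you can check what the template sends without contacting the API. The helper's name is hypothetical. The field names email and password are copied from the template.

```python
import hashlib


def build_signin_payload(email: str, password: str) -> dict:
    """Hypothetical helper: build the JSON body that the template's signin() posts."""
    # The template hashes the plain-text password with MD5 and sends the hex digest
    password_hash = hashlib.md5(password.encode()).hexdigest()
    return {"email": email, "password": password_hash}


payload = build_signin_payload("user@example.com", "password")
print(payload["password"])  # → 5f4dcc3b5aa765d61d8327deb882cf99
```

This is also why the PASSWORD variable should hold your plain password: the script does the hashing.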

Use the following template:

import requests
import hashlib
import time
from selenium import webdriver
from selenium.webdriver.chromium.options import ChromiumOptions
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By

MLX_BASE = "https://api.multilogin.com"
MLX_LAUNCHER = "https://launcher.mlx.yt:45001/api/v1"
MLX_LAUNCHER_V2 = (
    "https://launcher.mlx.yt:45001/api/v2"  # recommended for launching profiles
)
LOCALHOST = "http://127.0.0.1"
HEADERS = {"Accept": "application/json", "Content-Type": "application/json"}
# TODO: Insert your account information in both variables below
USERNAME = ""
PASSWORD = ""
# TODO: Insert the Folder ID and the Profile ID below
FOLDER_ID = ""
PROFILE_ID = ""


def signin() -> str:
    payload = {
        "email": USERNAME,
        "password": hashlib.md5(PASSWORD.encode()).hexdigest(),
    }
    r = requests.post(f"{MLX_BASE}/user/signin", json=payload)
    if r.status_code != 200:
        print(f"\nError during login: {r.text}\n")
        raise SystemExit(1)  # without a token, every later request would fail
    token = r.json()["data"]["token"]
    return token


def start_profile() -> webdriver.Remote:
    r = requests.get(
        f"{MLX_LAUNCHER_V2}/profile/f/{FOLDER_ID}/p/{PROFILE_ID}/start?automation_type=selenium",
        headers=HEADERS,
    )
    if r.status_code != 200:
        print(f"\nError while starting profile: {r.text}\n")
        raise SystemExit(1)  # a failed launch returns no Selenium port
    print(f"\nProfile {PROFILE_ID} started.\n")
    selenium_port = r.json()["data"]["port"]
    driver = webdriver.Remote(
        command_executor=f"{LOCALHOST}:{selenium_port}", options=ChromiumOptions()
    )
    # For Stealthfox profiles use: options=Options()
    # For Mimic profiles use: options=ChromiumOptions()
    return driver


def stop_profile() -> None:
    r = requests.get(f"{MLX_LAUNCHER}/profile/stop/p/{PROFILE_ID}", headers=HEADERS)
    if r.status_code != 200:
        print(f"\nError while stopping profile: {r.text}\n")
    else:
        print(f"\nProfile {PROFILE_ID} stopped.\n")

token = signin()
HEADERS.update({"Authorization": f"Bearer {token}"})
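After sign-in, every launcher request carries the token in an Authorization header. start_profile() and stop_profile() also build their URLs from the folder and profile IDs. The sketch below recreates both pieces as pure functions so you can inspect the URLs before any request is sent. The names start_url, stop_url, and auth_headers are hypothetical. The URL patterns are copied from the template, and the IDs in the example are placeholders.

```python
MLX_LAUNCHER = "https://launcher.mlx.yt:45001/api/v1"
MLX_LAUNCHER_V2 = "https://launcher.mlx.yt:45001/api/v2"
BASE_HEADERS = {"Accept": "application/json", "Content-Type": "application/json"}


def auth_headers(token: str) -> dict:
    """Return a copy of the base headers with the bearer token added."""
    # Copying instead of calling HEADERS.update() leaves the base dict untouched
    return {**BASE_HEADERS, "Authorization": f"Bearer {token}"}


def start_url(folder_id: str, profile_id: str) -> str:
    """Same v2 start endpoint as start_profile(), requesting Selenium automation."""
    return (
        f"{MLX_LAUNCHER_V2}/profile/f/{folder_id}/p/{profile_id}"
        "/start?automation_type=selenium"
    )


def stop_url(profile_id: str) -> str:
    """Same v1 stop endpoint as stop_profile()."""
    return f"{MLX_LAUNCHER}/profile/stop/p/{profile_id}"


print(start_url("FOLDER", "PROFILE"))
print(stop_url("PROFILE"))
```

Printing these URLs is a quick way to spot an empty FOLDER_ID or PROFILE_ID before the launcher rejects the request.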

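The template is similar to the Selenium automation example, except that it also imports the following module at the top, which we need for scraping: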

from selenium.webdriver.common.by import By

Step 3: Choose a web page to scrape

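You can use any website that contains text. For this guide, we recommend the following page, which is well suited to practicing automation tasks: Large & Deep DOM.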

Step 4: Find the target information

In this example, we'll use the data in the table on that page.

We'll fetch all of the values in the table. Here's how:

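Open the developer tools in your browser. In Chromium- and Firefox-based browsers:
1. Windows and Linux: press Ctrl + Shift + I
2. macOS: press Cmd + Option + I
Make sure you are on the Elements tab (called Inspector in Firefox).
Use the search shortcut to find the target value:
1. Windows and Linux: Ctrl + F
2. macOS: Cmd + F
Type the text you're looking for. In this example, it's "table".
Find the element you need for scraping. In this example, it's: <table id="large-table">
Hover over that element in the Elements tab.
Right-click it, then click Copy → Copy selector.
Note this value down; you'll need it later.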

Step 5: Return to the IDE and add new lines of code

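Go back to your IDE of choice (for example, VS Code).
Click in the editor and add a variable that starts the profile and lets you control it: driver = start_profile()
Add driver.get("<your website>"). In this example, the command is: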
```
driver.get("https://the-internet.herokuapp.com/large")
```
Next, add a delay so the script waits 5 seconds after opening the page before running further commands: time.sleep(5)
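A fixed time.sleep(5) can cause two problems. On a fast connection it wastes time, and on a slow one it may not be long enough. An explicit wait solves both by polling until the element appears. Selenium provides this as WebDriverWait.

The sketch below is a simplified, standard-library stand-in for that idea, so you can try it with a fake driver. The name wait_for_element is hypothetical. It relies on find_element raising an exception while the element is missing; Selenium raises NoSuchElementException in that case.

```python
import time


def wait_for_element(driver, by: str, value: str, timeout: float = 10.0, interval: float = 0.5):
    """Poll driver.find_element(by, value) until it succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return driver.find_element(by, value)
        except Exception as exc:  # with Selenium, catch NoSuchElementException instead
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{by}={value!r} not found within {timeout} s") from exc
            time.sleep(interval)


class FakeDriver:
    """Stands in for a real driver: the element 'appears' on the third lookup."""

    def __init__(self):
        self.calls = 0

    def find_element(self, by, value):
        self.calls += 1
        if self.calls < 3:
            raise LookupError("not yet")
        return f"<element {by}={value}>"


fake = FakeDriver()
print(wait_for_element(fake, "id", "large-table", timeout=2, interval=0.01))  # → <element id=large-table>
```

With a real driver, you would call wait_for_element(driver, By.ID, "large-table") in place of time.sleep(5), since By.ID is the string "id". You could also use Selenium's WebDriverWait directly.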

Step 6: Write the code that finds the element

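Use this command to find an element: driver.find_element(By.<attribute on the page>, "<element>"). It tells the script what to look for on the page. The selector you copied in step 4 (typically #large-table) targets the element's id attribute. You can therefore locate the element by its ID, and your actual command looks like this. Using driver.find_element(By.CSS_SELECTOR, "#large-table") would find the same element.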

driver.find_element(By.ID, "large-table")

We need the element's value later, so store the result in a variable, for example fetch:

fetch = driver.find_element(By.ID, "large-table")
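Selenium's By.ID and By.CSS_SELECTOR are the strings "id" and "css selector". A small helper can therefore turn whatever you copied in step 4 into a locator tuple for find_element. The name locator_from_selector is hypothetical. It only recognizes a plain #id selector and falls back to CSS for anything else.

```python
import re

# Selenium's By.ID and By.CSS_SELECTOR constants hold these string values
BY_ID = "id"
BY_CSS_SELECTOR = "css selector"

# A bare "#name" selector: no classes, combinators, or pseudo-classes after it
_PLAIN_ID = re.compile(r"#([A-Za-z][\w-]*)")


def locator_from_selector(selector: str) -> tuple:
    """Turn a copied CSS selector into a (by, value) tuple for find_element()."""
    selector = selector.strip()
    match = _PLAIN_ID.fullmatch(selector)
    if match:
        return (BY_ID, match.group(1))
    return (BY_CSS_SELECTOR, selector)


print(locator_from_selector("#large-table"))          # → ('id', 'large-table')
print(locator_from_selector("#large-table > tbody"))  # → ('css selector', '#large-table > tbody')
```

In the script, you would unpack the tuple: fetch = driver.find_element(*locator_from_selector("#large-table")).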

Step 7: Print the result and stop the profile

Use the print() function to output the result. We want the table's text rather than the element object, so print the element's text property:
```
print(fetch.text)
```
At the end, add a call that stops the profile:
```
stop_profile()
```
Save the .py script. A few more steps are needed before you can run it.
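Any step between start_profile() and stop_profile() can raise an error, for example when the element is not found. If that happens, the script exits and leaves the profile running. Wrapping the scraping work in try/finally guarantees that the stop call runs.

The sketch below shows the pattern with the functions passed in, so it can be tried without the API. The name run_scrape is hypothetical. In the real script, you would pass start_profile, stop_profile, and a function of your own that takes the driver.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_scrape(start: Callable[[], object], stop: Callable[[], None],
               scrape: Callable[[object], T]) -> T:
    """Start a profile, run scrape(driver), and always stop the profile afterwards."""
    driver = start()
    try:
        return scrape(driver)
    finally:
        # Runs after success and after an exception, so the profile is never left open
        stop()


events = []


def fake_start():
    events.append("start")
    return "driver"


def fake_stop():
    events.append("stop")


result = run_scrape(fake_start, fake_stop, lambda d: f"scraped with {d}")
print(result, events)  # → scraped with driver ['start', 'stop']
```

In the full script, the scrape function would call driver.get(...), wait, and return driver.find_element(By.ID, "large-table").text.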

Step 8: Prepare the script before running it

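Install the following Python libraries (see your IDE's documentation for details):
1. requests
2. selenium
Replace the placeholders in the following variables with your own values:
1. USERNAME: your Multilogin X account email
2. PASSWORD: your Multilogin X account password, in plain text (the script applies the MD5 hashing itself)
3. FOLDER_ID, PROFILE_ID: find these values using our DevTools or Postman guide.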
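An empty placeholder is easy to miss, and it only shows up later as a failed sign-in or launch. A quick check before signin() can catch this. The name missing_settings is hypothetical, and it only checks that each value is non-empty, not that it is correct.

```python
def missing_settings(settings: dict) -> list:
    """Return the names of settings that are still empty or whitespace-only."""
    return [name for name, value in settings.items() if not str(value).strip()]


settings = {
    "USERNAME": "user@example.com",
    "PASSWORD": "",
    "FOLDER_ID": "  ",
    "PROFILE_ID": "1234",
}
missing = missing_settings(settings)
if missing:
    print("Fill in these variables first:", ", ".join(missing))  # → Fill in these variables first: PASSWORD, FOLDER_ID
```

In the script, you could build the dictionary from USERNAME, PASSWORD, FOLDER_ID, and PROFILE_ID and call raise SystemExit if anything is missing.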

Step 9: Run the script

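Open the desktop app (or, if you use the web interface, connect the agent).
By default, the script works with Mimic. To use it with Stealthfox, replace options=ChromiumOptions() with options=Options() in the following line: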
driver = webdriver.Remote(command_executor=f'{LOCALHOST}:{selenium_port}', options=ChromiumOptions())
Run the .py file that contains the automation code.
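You could pick the options class from the profile's browser type instead of editing the webdriver.Remote line by hand. The sketch below does this. The helper name options_for is hypothetical, and the labels "mimic" and "stealthfox" are chosen for this sketch; they are not values returned by the API. The Selenium classes are the same ones the template imports. They are imported inside the function so that only the one you need is loaded.

```python
def options_for(browser_type: str):
    """Return Selenium options for a Multilogin X browser core (labels are assumptions)."""
    kind = browser_type.strip().lower()
    if kind == "mimic":  # Chromium-based core
        from selenium.webdriver.chromium.options import ChromiumOptions
        return ChromiumOptions()
    if kind == "stealthfox":  # Firefox-based core
        from selenium.webdriver.firefox.options import Options
        return Options()
    raise ValueError(f"Unknown browser type: {browser_type!r}")
```

Usage: driver = webdriver.Remote(command_executor=f"{LOCALHOST}:{selenium_port}", options=options_for("stealthfox")).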

To run the script in VS Code, click Run → Run Without Debugging (or Start Debugging).

If everything is set up correctly, the result appears in the terminal.

Notes

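Congratulations on finishing your first scraping script! You aren't limited to the methods shown here. Python and Selenium are flexible tools that can do much more. Here are some suggestions: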

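If you need several elements that match the same locator (for example, the same class name), use the following function. It returns a list of elements, which is empty if nothing matches:
driver.find_elements(By.<attribute on the page>, "<element>")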
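For the Large & Deep DOM table, find_elements lets you read the table cell by cell instead of as one block of text. The sketch below assumes that rows are <tr> elements and that data cells are <td>. Header cells in <th> are skipped because their rows yield no <td>. The name table_rows is hypothetical, and "css selector" is the value of By.CSS_SELECTOR. The fake classes and their cell values exist only so the function can be tried without a browser.

```python
BY_CSS_SELECTOR = "css selector"  # same value as Selenium's By.CSS_SELECTOR


def table_rows(table) -> list:
    """Return the table's data rows as lists of cell texts."""
    rows = []
    for tr in table.find_elements(BY_CSS_SELECTOR, "tr"):
        cells = [td.text for td in tr.find_elements(BY_CSS_SELECTOR, "td")]
        if cells:  # header rows hold <th>, not <td>, so they come back empty
            rows.append(cells)
    return rows


class FakeElement:
    """Minimal stand-in for a WebElement: a text value plus children keyed by selector."""

    def __init__(self, text="", children=None):
        self.text = text
        self.children = children or {}

    def find_elements(self, by, value):
        return self.children.get(value, [])


table = FakeElement(children={"tr": [
    FakeElement(children={"th": [FakeElement("A"), FakeElement("B")]}),
    FakeElement(children={"td": [FakeElement("1.1"), FakeElement("1.2")]}),
    FakeElement(children={"td": [FakeElement("2.1"), FakeElement("2.2")]}),
]})
print(table_rows(table))  # → [['1.1', '1.2'], ['2.1', '2.2']]
```

With a real driver, you would call rows = table_rows(driver.find_element(By.ID, "large-table")).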
You can pass several values to `print()`. For example, you can add a label before fetch.text. This makes the output easier to read and is also useful when debugging. Here's an example to try in your script:
```
print("Your values:", fetch.text)
```
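A label per row works the same way when you print many values, and it makes long output easier to scan. The sketch below assumes the rows are lists of cell texts, such as those you can collect with find_elements. The name format_rows is hypothetical.

```python
def format_rows(rows: list, sep: str = " | ") -> str:
    """Number each row and join its cells, one row per line."""
    return "\n".join(
        f"Row {number}: {sep.join(cells)}" for number, cells in enumerate(rows, start=1)
    )


print(format_rows([["1.1", "1.2"], ["2.1", "2.2"]]))
# Row 1: 1.1 | 1.2
# Row 2: 2.1 | 2.2
```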
Selenium usually offers more than one way to do the same thing. For details, see the Selenium documentation.

Full script

import requests
import hashlib
import time
from selenium import webdriver
from selenium.webdriver.chromium.options import ChromiumOptions
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By

MLX_BASE = "https://api.multilogin.com"
MLX_LAUNCHER = "https://launcher.mlx.yt:45001/api/v1"
MLX_LAUNCHER_V2 = (
    "https://launcher.mlx.yt:45001/api/v2"  # recommended for launching profiles
)
LOCALHOST = "http://127.0.0.1"
HEADERS = {"Accept": "application/json", "Content-Type": "application/json"}
# TODO: Insert your account information in both variables below
USERNAME = ""
PASSWORD = ""
# TODO: Insert the Folder ID and the Profile ID below
FOLDER_ID = ""
PROFILE_ID = ""


def signin() -> str:
    payload = {
        "email": USERNAME,
        "password": hashlib.md5(PASSWORD.encode()).hexdigest(),
    }
    r = requests.post(f"{MLX_BASE}/user/signin", json=payload)
    if r.status_code != 200:
        print(f"\nError during login: {r.text}\n")
        raise SystemExit(1)  # without a token, every later request would fail
    token = r.json()["data"]["token"]
    return token


def start_profile() -> webdriver.Remote:
    r = requests.get(
        f"{MLX_LAUNCHER_V2}/profile/f/{FOLDER_ID}/p/{PROFILE_ID}/start?automation_type=selenium",
        headers=HEADERS,
    )
    if r.status_code != 200:
        print(f"\nError while starting profile: {r.text}\n")
        raise SystemExit(1)  # a failed launch returns no Selenium port
    print(f"\nProfile {PROFILE_ID} started.\n")
    selenium_port = r.json()["data"]["port"]
    driver = webdriver.Remote(
        command_executor=f"{LOCALHOST}:{selenium_port}", options=ChromiumOptions()
    )
    # For Stealthfox profiles use: options=Options()
    # For Mimic profiles use: options=ChromiumOptions()
    return driver


def stop_profile() -> None:
    r = requests.get(f"{MLX_LAUNCHER}/profile/stop/p/{PROFILE_ID}", headers=HEADERS)
    if r.status_code != 200:
        print(f"\nError while stopping profile: {r.text}\n")
    else:
        print(f"\nProfile {PROFILE_ID} stopped.\n")


token = signin()
HEADERS.update({"Authorization": f"Bearer {token}"})
driver = start_profile()
driver.get("https://the-internet.herokuapp.com/large")
time.sleep(5)
fetch = driver.find_element(By.ID, "large-table")
print(fetch.text)
stop_profile()

This article contains links to third-party sites that we do not formally endorse.
