Scraping Job Listings with Python

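Back from the New Year holiday, I started sending out résumés in bulk, and with GPT's help I quickly wrote a small scraper to pull relevant job postings from Boss直聘.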

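The site has anti-scraping measures, but using webdriver to drive a real browser and page through the results one page at a time has been fairly reliable. Feel free to take the script as-is: swap in your own base_url and it should work. Remember to append "&page=" to the end of the URL, since that parameter is used for paging. The parameters before it encode the filters (city, years of experience, salary range, job type, and so on); just apply the filters on the site and copy the resulting URL.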
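
As a rough illustration of the paging step, the sketch below sets the `page` query parameter with the standard library instead of relying on the URL ending in "&page=". The helper name `with_page` is made up for this example, and the sample URL is a shortened version of the one used in the script.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(search_url: str, page: int) -> str:
    """Return search_url with its `page` query parameter set to `page`."""
    parts = urlsplit(search_url)
    # Keep every existing filter (query, city, ...) but drop any old page value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    params.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Shortened sample URL (hypothetical filter set, for illustration only)
filtered = "https://www.zhipin.com/web/geek/job?query=%E4%BA%A7%E5%93%81%E7%BB%8F%E7%90%86&city=101280100"
page_urls = [with_page(filtered, n) for n in range(1, 4)]

for url in page_urls:
    print(url)
```

This way it does not matter whether the copied URL already contains a `page` value or not.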

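The data can be used to get a feel for the market: distribution by region, by salary, by industry, and so on. I did a quick pass in Excel and didn't bother with pandas for the analysis.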
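
For anyone who would rather do the salary breakdown in Python than in Excel, here is a minimal sketch using only the standard library. It assumes salary strings look like `15-25K` or `15-25K·13薪` (a monthly range in thousands of yuan, optionally followed by the number of monthly salaries per year); that format is an assumption, and anything that does not match is skipped. The sample rows are invented purely for illustration.

```python
import re
from collections import Counter
from statistics import median

# Assumed format: "<low>-<high>K", optionally followed by e.g. "·13薪".
SALARY_RE = re.compile(r"(\d+)-(\d+)K", re.IGNORECASE)


def salary_midpoint(text):
    """Midpoint of a monthly salary range in thousands of yuan, or None if unparseable."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


# Invented sample rows: (area, salary text)
rows = [
    ("广州·天河区", "15-25K"),
    ("广州·天河区", "20-30K·13薪"),
    ("广州·海珠区", "12-18K"),
    ("广州·海珠区", "面议"),  # does not match the assumed format, so it is skipped
]

midpoints = [salary_midpoint(salary) for _, salary in rows]
midpoints = [m for m in midpoints if m is not None]
by_area = Counter(area for area, _ in rows)

print("median midpoint (K/month):", median(midpoints))
print("listings per area:", by_area.most_common())
```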

That said, none of this really helps you actually land a job, haha.

import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Boss直聘 search URL: open the site in a browser, apply your filters,
# then paste the resulting URL below. It must end with "&page=".
base_url = 'https://www.zhipin.com/web/geek/job?query=%E4%BA%A7%E5%93%81%E7%BB%8F%E7%90%86&city=101280100&degree=203&jobType=1901&salary=406&page='

driver = webdriver.Chrome()

all_jobs = []
for page in range(1, 11):
    url = base_url + str(page)
    driver.get(url)
    # Wait one minute before reading the page
    time.sleep(60)

    try:
        # Then wait up to 10 more seconds for at least one job card to appear
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'job-card-wrapper'))
        )
    except TimeoutException:
        # No job cards showed up on this page; skip it instead of crashing
        continue

    # Collect each field for every card on the page
    job_name = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'job-name')]
    job_area = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'job-area')]
    salary = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'salary')]
    years_graduate = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'tag-list')]
    info_public = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'info-public')]
    company = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'company-name')]
    com_info = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'company-tag-list')]
    detail = [elem.text for elem in driver.find_elements(By.CLASS_NAME, 'info-desc')]
    job_links = [elem.get_attribute('href') for elem in driver.find_elements(By.CSS_SELECTOR, 'a.job-card-left')]

    # Combine the parallel lists into one tuple per job
    jobs = list(zip(job_name, job_area, salary, years_graduate, info_public, company, com_info, detail, job_links))
    all_jobs.extend(jobs)

# Close the browser once all pages have been read
driver.quit()

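# Load into a DataFrame; the column order must match the order of the zip() above
df = pd.DataFrame(all_jobs, columns=['Job title', 'Area', 'Salary', 'Experience required', 'Public info', 'Company', 'Company info', 'Details', 'Link'])

# Save to an Excel file (writing .xlsx requires the openpyxl package)
df.to_excel('boss直聘产品岗位列表.xlsx', index=False)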
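
One caveat with the script above: it collects each field into a separate page-wide list and then combines them with `zip()`, so a single card missing one field could shift every later row, and `zip()` would quietly stop at the shortest list. A possible alternative, sketched below, reads the fields card by card. The helper names `first_text`, `card_to_row`, and `scrape_cards` are hypothetical, the class names are the ones the script already uses, and the locator strategy is passed as the plain string `"css selector"` (the value of Selenium's `By.CSS_SELECTOR`) so that the helpers do not need to import Selenium themselves.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR

# Column name -> CSS selector, using class names from the script (subset for brevity)
FIELDS = {
    "Job title": ".job-name",
    "Area": ".job-area",
    "Salary": ".salary",
    "Company": ".company-name",
}


def first_text(card, selector):
    """Text of the first match inside `card`, or "" if the card has no such element."""
    found = card.find_elements(CSS, selector)
    return found[0].text if found else ""


def card_to_row(card):
    """Build one row (a dict) from a single job card element."""
    return {column: first_text(card, selector) for column, selector in FIELDS.items()}


def scrape_cards(driver):
    """Collect one row per `.job-card-wrapper` element on the current page."""
    cards = driver.find_elements(CSS, ".job-card-wrapper")
    return [card_to_row(card) for card in cards]
```

With a live driver, `all_jobs.extend(scrape_cards(driver))` would take the place of the list comprehensions and the `zip()` call, and `pd.DataFrame(all_jobs)` would pick up the column names from the dict keys.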