所需库#xff1a;requests#xff0…前言
This article is a small hands-on crawler exercise: it walks through the general process of analyzing a target page and writing the scraper, so that you have a clearer picture when building your own crawlers later.
Part 1: Environment Setup
Python version: 3.7
IDE: PyCharm
Required libraries: requests, bs4, xlwt
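If any of these libraries are missing, they can usually be installed with pip, for example: pip install requests beautifulsoup4 xlwt (beautifulsoup4 is the package that provides bs4).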
Part 2: Page Analysis
1. We need to find the User-Agent (and the Cookie used below), which can be copied from the request headers shown in the browser's developer tools.
Part 3: Writing the Code
1. Import the required libraries
import requests                  # sends the HTTP requests
from bs4 import BeautifulSoup    # parses the returned HTML
import xlwt                      # writes the results to an .xls file
2. Build the request headers and parameters
url = "https://trains.ctrip.com/TrainBooking/Search.aspx"

# Request headers copied from the browser's developer tools.
# The Cookie below is tied to the author's browser session and will have expired; replace it with the Cookie string from your own browser.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
    "Cookie": "UnionOUIDindexAllianceID4897SID155952SourceIDcreatetime1693561627Expires1694166426834; MKT_OrderClickASID4897155952AID4897CSID155952OUIDindexCT1693561626835CURLhttps%3A%2F%2Fwww.ctrip.com%2F%3Fsid%3D155952%26allianceid%3D4897%26ouid%3DindexVAL{}; _ubtstatus%7B%22vid%22%3A%221693561626984.ex3rp%22%2C%22sid%22%3A1%2C%22pvid%22%3A1%2C%22pid%22%3A102001%7D; MKT_CKID1693561627205.kumds.y2nu; MKT_CKID_LMT1693561627205; GUID09031035213146004963; _jzqco%7C%7C%7C%7C1693561627595%7C1.1256646287.1693561627210.1693561627210.1693561627210.1693561627210.1693561627210.0.0.0.1.1; _RF1183.230.199.69; _RSG..qaukvM.m2ykJjUVrQ3T8; _RDG28437eee4e4c56259b173f8be0c752f59b; _RGUID2c3e5b9b-b893-4fbe-8743-6b57deb53bbc; MKT_PagesourcePC; _bfaStatusPVSend1; _bfip1%3D102001%26p2%3D0%26v1%3D1%26v2%3D0; _bfaStatussuccess; nfes_isSupportWebP1; nfes_isSupportWebP1; Hm_lvt_576acc2e13e286aa1847d8280cd967a51693561632; UBT_VID1693561626984.ex3rp; __zpspc9.1.1693561627.1693561631.3%232%7Cwww.baidu.com%7C%7C%7C%25E6%2590%25BA%25E7%25A8%258B%7C%23; _resDomainhttps%3A%2F%2Fbd-s.tripcdn.cn; Hm_lpvt_576acc2e13e286aa1847d8280cd967a51693580464; _bfa1.1693561626984.ex3rp.1.1693580463154.1693580623580.1.6.10650065554; _pd%7B%22_o%22%3A30%2C%22s%22%3A154%2C%22_s%22%3A1%7D",
}

# Query parameters for the search page (巫山 and 重庆 are the Chinese names of the departure and arrival cities, Wushan and Chongqing)
params = {
    "from": "wushan",
    "to": "chongqing",
    "dayday": "false",
    "fronCn": "巫山",
    "toCn": "重庆",
    "date": "2023-09-02",
}
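Before moving on to the parsing step, it is worth confirming that the request is actually accepted. The lines below are a minimal sanity check of my own (not part of the original script) that reuses the url, headers and params defined above:

# Sanity check: fetch the search page once and look at the status code
check = requests.get(url, headers=headers, params=params)
print(check.status_code)   # 200 means the page came back; 403 or 503 usually means an anti-scraping check blocked the request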
3. Send the request, write the header row, and save the data to an Excel file
# Request the search page and parse the returned HTML
response = requests.get(url=url, headers=headers, params=params)
soup = BeautifulSoup(response.text, "html.parser")

# Each .list_item under #div_Result is one train in the search results
ticket_list = soup.select("#div_Result .list_item")

# Create the workbook and write the header row
# (column names are in Chinese: train number, departure time, arrival time, duration, remaining tickets)
workbook = xlwt.Workbook(encoding="utf-8")
worksheet = workbook.add_sheet("Ticket Info", cell_overwrite_ok=True)
worksheet.write(0, 0, label="车次")
worksheet.write(0, 1, label="出发时间")
worksheet.write(0, 2, label="到达时间")
worksheet.write(0, 3, label="历时")
worksheet.write(0, 4, label="余票")

row = 1
for ticket in ticket_list:
    # Basic fields of this train from the search-result row
    train_no = ticket.select(".numa")[0].text.strip()
    start_time = ticket.select(".cds .start_time")[0].text.strip()
    end_time = ticket.select(".cds .end_time")[0].text.strip()
    duration = ticket.select(".cds .time")[0].text.strip()
    remarks = ticket.select(".cds .note")[0].text.strip()

    # Query this train's detail page for the remaining-ticket information
    ticket_url = "https://trains.ctrip.com/TrainBooking/TrainQuery.aspx"
    ticket_params = {
        "from": "wushan",
        "to": "chongqing",
        "dayday": "false",
        "date": "2023-09-02",
        "trainNo": train_no,
    }
    ticket_response = requests.get(ticket_url, headers=headers, params=ticket_params)
    ticket_soup = BeautifulSoup(ticket_response.text, "html.parser")
    ticket_remaining = ticket_soup.select(".new_situation p span")[0].text.strip()

    # Write one row per train and move on to the next row
    worksheet.write(row, 0, label=train_no)
    worksheet.write(row, 1, label=start_time)
    worksheet.write(row, 2, label=end_time)
    worksheet.write(row, 3, label=duration)
    worksheet.write(row, 4, label=ticket_remaining)
    row += 1
    print(train_no, start_time, end_time, duration, remarks, ticket_remaining)

workbook.save("ticket_info.xls")
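Note that xlwt only produces the older .xls format; the resulting ticket_info.xls opens fine in Excel or WPS, but if you need an .xlsx file you would have to switch to a library such as openpyxl.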
That is the basic source code. Because the official 12306 site has strict anti-scraping measures, scraping 12306 directly is not recommended, and doing so without authorization may expose you to legal liability, which is why another site is used for this demonstration. Keep in mind that other sites also have anti-scraping mechanisms of their own, and these can cause the scraping to fail.
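If you want the script to fail more gracefully when such a mechanism does trigger, one common pattern is to slow the requests down and check every response before parsing it. The helper below is only a sketch of that idea; fetch_soup and its delay parameter are illustrative names of mine, not part of the original code:

import time
import requests
from bs4 import BeautifulSoup

def fetch_soup(url, headers, params, delay=1.0):
    # Pause briefly between requests so the site is not hammered
    time.sleep(delay)
    response = requests.get(url, headers=headers, params=params)
    # 403 or 503 here usually means an anti-scraping check refused the request
    if response.status_code != 200:
        return None
    return BeautifulSoup(response.text, "html.parser")

In the loop above, each requests.get call could go through a helper like this, and a None result (or an empty result from select) could then be skipped instead of raising an IndexError.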