当前位置：首页 > news >正文

大型网站建站公司上市中国菲律宾热身赛

news 2025/12/20 13:04:09

大型网站建站公司上市,中国菲律宾热身赛,joomla 做外贸网站好的,石家庄百度首页python爬虫的BeautifulSoup4 BeautifulSoup4导入模块解析文件创建对象python解析器beautifulsoup对象的种类Tag获取整个标签获取标签里的属性和属性值Navigablestring 获取标签里的内容BeautifulSoup获取整个文档Comment输出的内容不包含注释符号BeautifulSoup文档遍历Beautifu… python爬虫的BeautifulSoup4 BeautifulSoup4导入模块解析文件创建对象python解析器beautifulsoup对象的种类Tag获取整个标签获取标签里的属性和属性值Navigablestring 获取标签里的内容BeautifulSoup获取整个文档Comment输出的内容不包含注释符号BeautifulSoup文档遍历BeautifulSoup文档搜索 BeautifulSoup4 导入模块 from bs4 import BeautifulSoup解析文件如果是本地文件直接以写入权限打开并用bs解析 with open(index.html, r, encodingutf-8) as f:html f.read()如果是网页文件则需要先用爬虫爬取然后解析 response requests.get(urlurl, headersheaders) html response.text创建对象解析的第一步是构建一个BeautifulSoup对象基本用法: response requests.get(urlurl, headersheaders) html response.text soup beautifulsoup(html,html.parser) #处理html的解析器python解析器 soup beautifulsoup(html,html.parser) soup beautifulsoup(html,lxml) soup beautifulsoup(html,xml)beautifulsoup对象的种类 Beautiful Soup将复杂HTML文档转换成一个复杂的树形结构,每个节点都是Python对象,所有对象可以归纳为4种 TagNavigableStringBeautifulSoupComment Tag获取整个标签 tag中最重要的属性name和attributes from bs4 import BeautifulSoup # 逐一解析数据把html使用html.parser进行解析 bs BeautifulSoup(html,html.parser) print(bs.a) # 返回找到的第一个a标签返回时的整个标签 Tag print(bs.title) title百度一下你就知道title获取标签里的属性和属性值 bs BeautifulSoup(html,html.parser) print(bs.a.attrs) # 返回找到的第一个title标签的属性和属性值字典形式 {href: https://accounts.douban.com/passport/login?sourcemovie, class: [nav-login], rel: [nofollow]} print(bs.a.attrs[href]) #查看某个属性的值 https://accounts.douban.com/passport/login?sourcemovie# 获取p标签的属性 bs.a.attrs(返回字典) or soup.p.attrs[class](class返回列表其余属性返回字符串) bs.a.[class](class返回列表其余属性返回字符串) bs.a.get(class)(class返回列表其余属性返回字符串)Navigablestring 获取标签里的内容 bs BeautifulSoup(html,html.parser) print(bs.title.string) # 返回找到的第一个title标签的内容字符串百度一下你就知道 bs.title.string bs.title.text bs.title.get.text()BeautifulSoup获取整个文档 bs BeautifulSoup(html,html.parser) print(bs) # 返回整个文档的内容Comment输出的内容不包含注释符号 soup BeautifulSoup(p classt1!-- div classenvenv的信息内容/div --/p, html.parser) print(soup.p.string) #如果标签内部的内容是注释例如!-- --那么该NavigableSring对象会转换成Comment对象并且会把注释符号去掉。 div classenvenv的信息内容/div BeautifulSoup文档遍历 bs BeautifulSoup(html,html.parser) print(bs.a.contens) # 返回a中的所有contens 列表形式可以用列表遍历 print(bs.a.contens[2])BeautifulSoup文档搜索 1.find() 查找第一个与字符串完全匹配的内容 bs BeautifulSoup(html,html.parser) a_list bs.find(a) # 查找第一个的a标签返回一个对象 a_list bs.find(a) a_list bs.find(a, class_xxx) # 注意class后的下划线 a_list bs.find(a, titlexxx) a_list bs.find(a, idxxx) a_list bs.find(a, idcompile(rxxx))2.find_all() 字符串过滤会查找所有与字符串完全匹配的内容 bs BeautifulSoup(html,html.parser) a_list bs.find_all(a) # 查找所有的a标签 a_list bs.find_all(a) a_list bs.find_all([a,span]) #返回所有的a和span标签 a_list bs.find_all(a, class_xxx) a_list bs.find_all(a, idcompile(rxxx)) # 提取出前两个符合要求的 soup.find_all(a, limit3)3.find_parent 查找当前标签的父标签 bs BeautifulSoup(html,html.parser) a_list bs.find(a).find_parent(div) # 查找当前a标签的父div标签4.find_next_sibling 查找当前标签的下一个兄弟标签 bs BeautifulSoup(html,html.parser) a_list bs.find(a).find_next_sibling(div) # 查找当前a标签的下一个div标签5.find_previous_sibling 查找当前标签的前一个兄弟标签 bs BeautifulSoup(html,html.parser) a_list bs.find(a).find_previous_sibling(div) # 查找当前a标签的前一个div标签2.search() 正则表达式搜索:使用search()方法来匹配内容 a_list bs.find_all(re.compile(a))3.get_text() 获取标签内的文本内容 a_list bs.find(a).get_text()3.自己写方法查询 def name_is_exists(tag):return tag.has_attr(name) # 查询标签中属性的名字为name的t_list bs.find_all(name_is_exists) for tag in t_list:print(tag)4.kwargs 参数 t_list bs.find_all(idhead) # 查找所有的idhead的标签 t_list bs.find_all(classTrue) t_list bs.find_all(herfhttp://news.baidu.com)5.text参数 t_list bs.find_all(texthao123) # 查找所有的idhead的标签 t_list bs.find_all(text[hao123,新闻,贴吧]) for tag in t_list:print(tag) t_list bs.find_all(text re.compile(\d)) # 应用正则表达式来查找包含特定文本的内容6.limit参数 t_list bs.find_all(a,limit3) # 查找前三个a标签7.css选择器 t_list bs.select(a) # 查找所有的a标签 t_list bs.select(.mnav) # 查找所有的类名为.mnav标签 t_list bs.select(#u1) # 查找所有的id为#u1的标签 t_list bs.select(a[classbri]) # 查找属性为bri的标签 t_list bs.select(headtitle) # 查找head标签下的title标签 t list bs.select(.mnav ~ .bri) # 查找.mnav的兄弟标签.bri的text print(t_list[0].get_text())

查看全文

http://www.pierceye.com/news/322659/