Book description:

Project 1  Python Fundamentals
  Task 1  Getting Started with Python
    1. Introduction to Python
    2. Installing Python
    3. Installing PyCharm
    4. Python Syntax Conventions
  Task 2  Understanding the Components of Python Commands
    1. Basic Symbols
    2. Constants and Variables
    3. Data Types
    4. Functional Symbols
  Task 3  Understanding Program Structure
    1. Expression Statements
    2. Sequential Structure
    3. Selection Structure
    4. Loop Structure
    5. Conditional Expressions
    6. Program Flow Control
  Project Practice
    Practice: Printing the Baidu URL

Project 2  Web Crawler Fundamentals
  Task 1  Understanding Web Crawlers
    1. Basic Principles of Web Crawlers
    2. Web Crawler System Framework
    3. Crawling Strategies
    4. Types of Web Crawlers
    5. Open-Source Web Crawler Frameworks/Projects
  Task 2  Getting to Know HTTP
    1. How HTTP Works
    2. The Urllib Module Library
    3. URL Definition
    4. URL Encoding Settings
  Task 3  The Web Page Request Process
    1. Sending the Request Message
    2. Returning the Response
    3. HTTP Messages
  Project Practice
    Practice 1: Searching for Product URLs
    Practice 2: Searching for Food Price URLs

Project 3  Using the Urllib Request Module Library
  Task 1  Sending Web Page Requests
    1. Basic HTTP Requests
    2. Network Requests with Request
    3. Setting Request Headers
    4. Sending Requests with Handler Methods
    5. Setting a Proxy IP
    6. Authentication
  Task 2  Downloading Web Pages
    1. Web Page Structure
    2. Writing Web Page Files
    3. Downloading Web Page Files
  Project Practice
    Practice 1: Downloading a Python Learning Website
    Practice 2: Downloading a Company Web Page's HTML File

Project 4  Installing the Urllib3 Request Module Library and Sending Requests
  Task 1  Installing the Urllib3 Request Module Library
    1. Installing Anaconda
    2. Installing the Urllib3 Module Library
  Task 2  Sending Requests
    1. Creating a Proxy Object
    2. Request Methods
    3. Defining Request Headers
    4. Setting a Proxy IP
    5. Automatic Retries
    6. Redirects
  Project Practice
    Practice: Sending a Request to the Taobao Website

Project 5  Using the Requests Request Module Library
  Task 1  Web Page Requests
    1. Standard HTTP Requests
    2. Returning Response Messages
    3. JSON Data
  Task 2  Request Methods
    1. Sending GET Requests
    2. Sending POST Requests
    3. Other Request Methods
  Task 3  Complex Network Requests
    1. Complex Request Headers
    2. Uploading Files
    3. Cookies Verification
    4. Session Persistence
  Task 4  Exception Handling
    1. The try-except Statement
    2. The Urllib Exception Handling Module
    3. The Urllib3 Exception Handling Module
    4. The request Exception Handling Module
  Project Practice
    Practice: Crawling the URLs of Douban's Most Popular Film Reviews

Project 6  Parsing Web Pages
  Task 1  Parsing Web Pages with Regular Expressions
    1. Regular Expression Patterns
    2. Implementing Regular Expressions with the re Module
    3. String Searching
    4. String Replacement
    5. String Splitting
  Task 2  Parsing Web Pages with XPath
    1. XPath Overview
    2. XPath Web Page Parsing
    3. Getting Node Information
    4. Node Relationships
    5. Finding Node Information
    6. Attribute Nodes
    7. XPath Operators
    8. XML Node Axes
  Task 3  Parsing Web Pages with BeautifulSoup
    1. Installing BeautifulSoup
    2. Creating a BeautifulSoup Object
    3. Getting Node Content by Attribute
    4. Getting Nodes by Node Relationship
    5. Finding Node Content
    6. Finding Node Content with CSS Selectors
  Project Practice
    Practice 1: Getting the Postal Code and Area Code of Shijiazhuang, Hebei Province, from a Lookup Site
    Practice 2: Crawling the Titles of Best-Selling Books
    Practice 3: Downloading Images of Best-Selling Books

Project 7  The Scrapy Web Crawler Framework: Fundamentals and Application
  Task 1  Scrapy Web Crawler Framework Fundamentals
    1. Scrapy Web Crawler Framework Basics
    2. Common Scrapy Commands
    3. Creating a Scrapy Project
  Task 2  Creating Spider Files from Templates
    1. Commands for Creating Web Crawler Files
    2. Creating a basic Template File
    3. Creating a crawl Template File
    4. Creating a csvfeed Template File
    5. Creating an xmlfeed Template File
  Task 3  Scrapy Web Crawler Files
    1. The Spider Class
    2. Configuring the Web Crawler
    3. Starting the Web Crawler
    4. Extracting Data
  Project Practice
    Practice: Extracting Scenic Spot Names
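In the era of Internet big data, an explosive volume of data appears online and brings great convenience to people's lives. At the same time, most of that information is useless junk. Extracting the information that is actually needed from these scattered fragments has become a pressing need. The simplest way to obtain data is to search manually in a browser, but filtering by hand alone is not realistic, and web crawler technology emerged in response. The technology collects the relevant content, which is then analyzed and filtered to yield the information people really need.

A web crawler (also called a web spider or web robot) is a program that simulates a browser: it sends network requests, receives the responses, and automatically fetches information from the Internet according to a set of rules. Web crawlers can be used to collect tables, images, videos, and more; any data that can be accessed through a browser can be obtained with a web crawler.

This book proceeds step by step from the simple to the advanced and uses classic examples to introduce the capabilities of Python web crawlers in detail, with an emphasis on practical use. By studying this book, readers can master the basic principles and application methods of Python web crawlers.

1. Features of This Book

• Rich examples
The examples in this book are abundant in both number and variety. Drawing on a large number of Python web crawler examples, the book explains the basic principles of Python web crawlers in detail, so that readers absorb the application methods while working through the examples.

• Emphasis on skill development
The book aims to improve readers' practical ability with Python web crawlers across the board. Through in-depth analysis of examples, it enables readers to carry out Python web crawler tasks independently. Most of the examples come from Python web crawler project cases and have been carefully refined and adapted by the editors, so they help readers both learn the concepts and improve their hands-on skills.

• Skills closely integrated with ideological and political education
While introducing the technical knowledge of Python web crawlers, the book closely integrates the main themes of ideological and political education, so that readers strengthen that education while learning the subject.

• Project-based teaching with strong hands-on focus
The editors are all front-line university teachers who have taught and researched Python web crawlers for many years, with extensive experience in both classroom teaching and textbook writing. Years of teaching allow them to understand students' mindset and actual needs accurately. Drawing on their development experience and teaching insights, they aim to present the basic principles and application methods of Python web crawlers comprehensively and in detail.

• Project format with strong practicality
The book organizes its content as projects, breaking the theory of Python web crawlers down and integrating it into each project, which makes the book more practical.

2. Contents of This Book

The book consists of 7 projects: Python Fundamentals, Web Crawler Fundamentals, Using the Urllib Request Module Library, Installing the Urllib3 Request Module Library and Sending Requests, Using the Requests Request Module Library, Parsing Web Pages, and The Scrapy Web Crawler Framework.

3. About This Book's Services

The chief editors of this book are Peng Tao (彭涛) and Xie Honglan (谢宏兰) of Jiangxi Youth Vocational College (江西青年职业学院). The associate editors are Li Wei (李伟), senior engineer at the Houpu Research Institute of Wuhan Houpu Group (武汉厚溥集团厚溥研究院, a partner enterprise); Quan Lei (全蕾) of East China University of Technology (东华理工大学); and Yu Lina (余丽娜) and Fu Bihe (付比鹤) of Jiangxi Youth Vocational College. Peng Tao wrote Projects 3, 6, and 7; Xie Honglan wrote Project 5; Li Wei and Fu Bihe wrote Project 2; Yu Lina wrote Project 4; and Quan Lei wrote Project 1. Fu Bihe also compiled the project materials.

To meet teachers' needs, the book comes with a range of teaching resources, including electronic courseware and source files. Readers can log in to the Huaxin Education Resources website (www.hxedu.com.cn) and, after free registration, download the teaching resources for this book. If you have questions, please leave a message on the website's message board or contact the Publishing House of Electronics Industry (E-mail: hxedu@phei.com.cn).

Given the limits of the editors' expertise, errors and omissions in the book are unavoidable, and readers' criticism and corrections are sincerely welcome.

The Editors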