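When you hover over an HTML block in the Elements panel, the corresponding element in the page on the left is highlighted. Clicking the Inspect button (shortcut Ctrl+Shift+C, which also opens the developer tools panel at any time) does the reverse: hovering over the page highlights the matching HTML block, and a click pins the selection. You can also right-click any HTML block and choose "Scroll into view" to locate that element on the page.
We can expand or collapse the HTML by hand, right-click the content we want to scrape, and choose Copy XPath.
This makes it easy to pin down the location of the element you want to scrape.
We can now open the Scrapy shell to check it: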
scrapy shell "https://quotes.toscrape.com/"
In [1]: response.xpath("/html/body/div/div[2]/div[1]/div[1]/span[1]/text()").getall()
Out[1]: ['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']
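This full XPath expression is not very flexible, though: it matches only this one position in the page. In the developer tools we can see that a single quote on the site is structured like this: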
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
    <span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
    <span>by <small class="author" itemprop="author">Albert Einstein</small>
    <a href="/author/Albert-Einstein">(about)</a>
    </span>
    <div class="tags">
        Tags:
        <meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world">
        <a class="tag" href="/tag/change/page/1/">change</a>
        <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
        <a class="tag" href="/tag/thinking/page/1/">thinking</a>
        <a class="tag" href="/tag/world/page/1/">world</a>
    </div>
</div>
Since every quote's text sits in a span with the class "text", we can select by class instead of by position:
In [2]: response.xpath('//span[has-class("text")]/text()').getall()
Out[2]: ['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
 '“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
 '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”'......
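It may not be obvious why the query uses has-class("text") rather than a plain attribute comparison. The sketch below tries to illustrate the difference with ordinary Python strings. The helper `has_class` and the sample class values are hypothetical, written only for this illustration; the helper is a stand-in for the token matching that has-class() performs, not the code that Scrapy's selector library actually runs.

```python
# Illustrative stand-in for the token matching done by has-class();
# this is NOT the selector library's actual implementation.
def has_class(class_attr, *names):
    """Return True if every name is a whitespace-separated token of class_attr."""
    tokens = (class_attr or "").split()
    return all(name in tokens for name in names)


# Hypothetical class attribute values, for illustration only.
samples = ["text", "text highlighted", "textbox", "tag"]

for value in samples:
    exact = value == "text"           # behaves like [@class="text"]
    substring = "text" in value       # behaves like [contains(@class, "text")]
    token = has_class(value, "text")  # behaves like [has-class("text")]
    print(f"{value!r}: exact={exact} contains={substring} has-class={token}")
```

An exact comparison misses an element that carries a second class, and a substring test wrongly accepts "textbox"; matching whole class tokens avoids both problems. In the Scrapy shell the same pattern should carry over to the other fields in the structure shown above, for example `response.xpath('//small[has-class("author")]/text()').getall()` for the author names.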