2019/06/04

C#.Net Html Agility Pack

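Html Agility Pack is a pretty decent package for scraping web pages, and I have used it in past projects.
Today I'll demo it on Ruten listings for Windows 10 product keys XD
Suppose I want the title and price of every regular (non-featured) listing. To get them, I need to know the XPath of those pieces of text.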

A quick look at the page source:

From that, the XPath expressions should be:
# All titles
//dl[@class='search_form s_grid']//div[@class='prod_info']//h5//a
# All prices
//dl[@class='search_form s_grid']//div[@class='prod_info']//ul//li//span[@class='price'][1]
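
To see what these two expressions select without hitting the live site, here is a minimal sketch that runs them against an inline HTML string. The markup in SampleHtml is hypothetical: it only mimics the nesting the expressions expect and is not copied from the real page. The class name XPathDemo and the helper SelectTexts are also made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical markup (an assumption, not the real page source).
    // The second item has two price spans so the effect of [1] is visible.
    public const string SampleHtml = @"
<dl class='search_form s_grid'>
  <div class='prod_info'>
    <h5><a href='#'>Sample item A</a></h5>
    <ul><li><span class='price'>100</span></li></ul>
  </div>
  <div class='prod_info'>
    <h5><a href='#'>Sample item B</a></h5>
    <ul><li><span class='price'>250</span><span class='price'>300</span></li></ul>
  </div>
</dl>";

    public const string TitleXPath =
        "//dl[@class='search_form s_grid']//div[@class='prod_info']//h5//a";
    public const string PriceXPath =
        "//dl[@class='search_form s_grid']//div[@class='prod_info']//ul//li//span[@class='price'][1]";

    // Parses the HTML string and returns the trimmed text of every node matching the XPath.
    public static List<string> SelectTexts(string html, string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var texts = new List<string>();
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            return texts;
        }
        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        foreach (string title in SelectTexts(SampleHtml, TitleXPath))
        {
            Console.WriteLine("title: " + title);
        }
        foreach (string price in SelectTexts(SampleHtml, PriceXPath))
        {
            Console.WriteLine("price: " + price);
        }
    }
}
```

Two things to keep in mind: @class='search_form s_grid' compares the whole attribute value, so it stops matching if the page adds or reorders class names, and the trailing [1] keeps only the first matching price span under each parent.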

Next, load the page with Html Agility Pack and pass it the XPath expressions to get the title and price nodes.


Code:

using HtmlAgilityPack;
using System;
using System.Windows.Forms;
using HtmlDocument = HtmlAgilityPack.HtmlDocument;

namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
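            // LoadFromBrowser loads the page through a browser control, so content rendered by JavaScript is included.
            HtmlWeb web = new HtmlWeb();
            HtmlDocument document = web.LoadFromBrowser("https://find.ruten.com.tw/s/?cateid=001100060001&q=win10");
            var docNode = document.DocumentNode;
            var titleNodes = docNode.SelectNodes(@"//dl[@class='search_form s_grid']//div[@class='prod_info']//h5//a");
            var priceNodes = docNode.SelectNodes(@"//dl[@class='search_form s_grid']//div[@class='prod_info']//ul//li//span[@class='price'][1]");
            // SelectNodes returns null (not an empty collection) when nothing matches.
            if (titleNodes == null || priceNodes == null)
            {
                Console.WriteLine("No matching nodes found.");
                return;
            }
            // Use the shorter list so a missing price cannot cause an out-of-range index.
            int count = Math.Min(titleNodes.Count, priceNodes.Count);
            for (int index = 0; index < count; index++)
            {
                Console.WriteLine("{0} price={1}", titleNodes[index].InnerText.Trim(), priceNodes[index].InnerText.Trim());
            }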
        }
    }
}
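
The code above pairs titles and prices by index, which assumes the two node lists line up one to one. As an alternative sketch, not what the original code does, you can select each prod_info container first and then read the title and price relative to that container, so a listing with a missing price cannot shift the pairing. The markup in SampleHtml is again hypothetical, and PerItemDemo and Extract are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class PerItemDemo
{
    // Hypothetical markup (an assumption); the first item deliberately has no price.
    public const string SampleHtml = @"
<dl class='search_form s_grid'>
  <div class='prod_info'>
    <h5><a href='#'>Sample item A</a></h5>
  </div>
  <div class='prod_info'>
    <h5><a href='#'>Sample item B &amp; more</a></h5>
    <ul><li><span class='price'>250</span></li></ul>
  </div>
</dl>";

    // Returns one (title, price) pair per prod_info container that has a title.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);
        var results = new List<KeyValuePair<string, string>>();

        HtmlNodeCollection items = document.DocumentNode.SelectNodes(
            "//dl[@class='search_form s_grid']//div[@class='prod_info']");
        if (items == null)
        {
            return results; // nothing matched
        }

        foreach (HtmlNode item in items)
        {
            // The leading "." makes each XPath relative to the current container.
            HtmlNode title = item.SelectSingleNode(".//h5//a");
            HtmlNode price = item.SelectSingleNode(".//ul//li//span[@class='price'][1]");
            if (title == null)
            {
                continue; // skip containers without a title link
            }

            // DeEntitize turns HTML entities such as &amp; back into plain characters.
            string titleText = HtmlEntity.DeEntitize(title.InnerText).Trim();
            string priceText = price == null
                ? "(no price)"
                : HtmlEntity.DeEntitize(price.InnerText).Trim();
            results.Add(new KeyValuePair<string, string>(titleText, priceText));
        }
        return results;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> pair in Extract(SampleHtml))
        {
            Console.WriteLine("{0} price={1}", pair.Key, pair.Value);
        }
    }
}
```

With index-based pairing, the first sample item would have been printed with the second item's price; reading both values from the same container avoids that.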


Output: