Thursday, May 17, 2012

Groovy web scraping

Groovy web page scraping the easy way. I found this in an example, and it still works quite well today on Grails 2. It uses TagSoup 1.2.1 and Groovy's XmlSlurper.

In about 10 lines of code I can scrape the form fields (this version only handles inputs and selects) off a web page:

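        // TagSoup lets XmlSlurper read real-world HTML that is not well-formed XML
        def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
        def slurper = new XmlSlurper(tagsoupParser)
        def htmlParser = slurper.parse(config.clientUrl)
        ArrayList inputs = new ArrayList()

        // '**' walks the whole document depth-first
        htmlParser.'**'.findAll {
            it.name() == 'input' || it.name() == 'select'
        }.each {
            if (it.attributes().get('id')) {
                // Assumption: the original snippet is cut off here; storing the id is one plausible completion
                inputs.add(it.attributes().get('id'))
            }
        }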
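
To try the same idea without pointing at a live page, here is a minimal sketch that parses an inline HTML string instead of a URL. The sample markup, the variable names, and the choice to collect only the id values are my own assumptions for illustration, not part of the original example. It assumes a Groovy version where XmlSlurper is available without an import; on newer Groovy versions you may need to add `import groovy.xml.XmlSlurper`.

```groovy
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser

// Hypothetical sample markup standing in for a fetched page
def html = '''
<html><body>
  <form>
    <input id="firstName" type="text">
    <input type="hidden" name="token">
    <select id="country"><option>US</option></select>
  </form>
</body></html>
'''

// Same setup as above: XmlSlurper reading through the TagSoup parser
def slurper = new XmlSlurper(new Parser())
def page = slurper.parseText(html)

// Depth-first walk, keeping inputs and selects that have an id attribute
def fieldIds = page.'**'.findAll {
    it.name() in ['input', 'select'] && it.attributes().get('id')
}.collect { it.attributes().get('id') }

println fieldIds
```

The hidden input is skipped because it has no id, which mirrors the `if (it.attributes().get('id'))` check in the snippet above.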
