Thursday, May 17, 2012

Groovy web scraping

Groovy web page scraping the easy way. I found this approach in an example at http://www.maclovin.de/2010/02/robust-html-parsing-the-groovy-way/ and it still works quite well today on Grails 2. It uses TagSoup 1.2.1 and Groovy's XmlSlurper.

In about 10 lines of code I can scrape the form fields (this version only handles inputs and selects) off a web page:


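        // TagSoup turns real-world (often malformed) HTML into well-formed SAX events
        def tagsoupParser = new org.ccil.cowan.tagsoup.Parser()
        def slurper = new XmlSlurper(tagsoupParser)
        // config.clientUrl holds the URL of the page to scrape
        def htmlParser = slurper.parse(config.clientUrl)
        def inputs = []

        // '**' walks every node depth-first; keep only input and select elements
        htmlParser.'**'.findAll {
            it.name() == 'input' || it.name() == 'select'
        }.each {
            // keep only the fields that carry an id attribute
            if (it.attributes().get('id')) {
                inputs.add(it)
            }
        }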
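
To try the same idea outside of Grails, here is a minimal standalone sketch. It is not part of the original example: the sample HTML, the `fields` list and the `tag`/`id`/`name` map keys are made up for illustration, and it assumes the script can fetch TagSoup from Maven Central through Grape. It parses an inline string with `parseText` instead of a URL and reduces each matching node to a small map.

```groovy
// Fetch TagSoup via Grape when run as a plain script (assumes network access)
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser
// Needed on Groovy 3+; on older Groovy (such as the one bundled with Grails 2)
// drop this line, because groovy.util.XmlSlurper is imported by default
import groovy.xml.XmlSlurper

// Hypothetical sample page with sloppy markup: unclosed tags, an unquoted attribute
def html = '''
<html><body>
  <form action="/signup">
    <input id="email" type="text" name="email">
    <input type=hidden name="token" value="abc">
    <select id="plan" name="plan"><option>free<option>paid</select>
  </form>
</body></html>
'''

def page = new XmlSlurper(new Parser()).parseText(html)

// Same filter as above: input/select elements that have a non-empty id,
// then keep only the attributes of interest
def fields = page.'**'.findAll {
    it.name() in ['input', 'select'] && it.@id.text()
}.collect {
    [tag: it.name(), id: it.@id.text(), name: it.@name.text()]
}

fields.each { println "${it.tag}: id=${it.id}, name=${it.name}" }
```

The hidden token input is skipped here because it has no id. Reading attributes with `it.@id.text()` gives an empty string when the attribute is missing, which Groovy treats as false, so it can stand in for the `attributes().get('id')` check.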
