Tag: Scraping

Web Scraping Made Simple With Zenscrape

Web scraping has always been taken care of by actual developers, since a lot of coding, proxy management and CAPTCHA-solving is involved. However, the scraped data is very often needed by people that are non-coders: Marketers, Analysts, Business Developers etc.

Zenscrape is an easy-to-use web scraping tool that allows people to scrape websites without having to code.

Let’s run through a quick example together:

Select the data you need

The setup wizard guides you through the process of setting up your data extractor. It allows you to select the information you want to scrape visually. Click on the desired piece of content and specify what type of element you have. Depending on the package you have bought (they also offer a free plan), you can select up to 30 data elements per page.

The scraper is also capable of handling element lists.

Schedule your extractor

Perhaps, you want to scrape the selected data at a specific time interval. Depending on your plan, you can choose any time span between one minute to one hour. Also, decide what is supposed to happen with the scraped data after it has been gathered.

Use your data

In this example, we have chosen the .csv-export method and have selected a 10 minute scraping interval. Our first set of data should be ready by now. Let’s take a look:

Success! Our data is ready for us to be downloaded. We can now access all individual data sets or download all previously gathered data at once, in one file.

Need more flexibility?

Zenscrape also offers a web scraping API that returns the HTML markup of any website. This is especially useful for complicated scraping projects, that require the scraped content to be integrated into a software application for further processing.

Just like the web scraping suite, the API does not forward failed requests and takes care of proxy management, Capotcha-solving and all other maintenance tasks that are usually involved with DIY web scrapers.

Since the API returns the full HTML markup of the related website, you have full flexibility in terms of data selection and further processing.

Try Zenscrape

The post Web Scraping Made Simple With Zenscrape appeared first on CSS-Tricks.

CSS-Tricks

, , ,

scrapestack: An API for Scraping Sites

(This is a sponsored post.)

Not every site has an API to access data from it. Most don’t, in fact. If you need to pull that data, one approach is to “scrape” it. That is, load the page in web browser (that you automate), find what you are looking for in the DOM, and take it.

You can do this yourself if you want to deal with the cost, maintenance, and technical debt. For example, this is one of the big use-cases for “headless” browsers, like how Puppeteer can spin up and control headless Chrome.

Or, you can use a tool like scrapestack that is a ready-to-use API that not only does the scraping for you, but likely does it better, faster, and with more options than trying to do it yourself.

Say my goal is to pull the latest completed meetup from a Meetup.com page. Meetup.com has an API, but it’s pricy and requires OAuth and stuff. All we need is the name and link of a past meetup here, so let’s just yank it off the page.

We can see what we need in the DOM:

To have a play, let’s scrape it with the scrapestack API client-side with jQuery:

$ .get('https://api.scrapestack.com/scrape',   {     access_key: 'MY_API_KEY',     url: 'https://www.meetup.com/BendJS/'   },   function(websiteContent) {      // we have the entire sites HTML here!    } );

Within that callback, I can now also use jQuery to traverse the DOM, snagging the pieces I want, and constructing what I need on our site:

// Get what we want let event = $ (websiteContent)   .find(".groupHome-eventsList-pastEvents .eventCard")   .first(); let eventTitle = event   .find(".eventCard--link")[0].innerText; let eventLink =    `https://www.meetup.com/` +    event.find(".eventCard--link").attr("href");  // Use it on page $ ("#event").append(`   $ {eventTitle} `);

In real usage, if we were doing it client-side like this, we’d make use of some rudimentary storage so we wouldn’t have to hit the API on every page load, like sticking the result in localStorage and invalidating after a few days or something.

It works!

It’s actually much more likely that we do our scraping server-side. For one thing, that’s the way to protect your API keys, which is your responsibility, and not really possible on a public-facing site if you’re using the API directly client-side.

Myself, I’d probably make a cloud function to do it, so I can stay in JavaScript (Node.js), and have the opportunity to tuck the data in storage somewhere.

I’d say go check out the documentation and see if this isn’t the right answer next time you need to do some scraping. You get 10,000 requests on the free plan to try it out anyway, and it jumps up a ton on any of the paid plans with more features.

Direct Link to ArticlePermalink

The post scrapestack: An API for Scraping Sites appeared first on CSS-Tricks.

CSS-Tricks

, ,
[Top]