ZimmWriter ScrapeOwl Integration

by Matt Zimmerman

Updated October 13, 2023

ScrapeOwl is an API that lets ZimmWriter users hide their IP address and retrieve data from websites that would normally block them. It also supports geo-targeting, JavaScript rendering, CSS element extraction, CAPTCHA bypassing, and more.

Woah, that sounds very technical.

Can you explain it like I’m 5?

Sure!

ScrapeOwl lets ZimmWriter users:

  1. Use Scraping Surgeon
  2. Get data from Amazon
  3. Get data from YouTube (videos and video transcripts)
  4. Get data from websites that would normally block you
  5. Get data from Goo… wait, that’s coming soon

Now let’s cover the various options in the ScrapeOwl menu.

Buy a ScrapeOwl API Key

The average price for a scraper is about $49/mo for a limited number of credits. But that number of credits is usually well beyond what a typical ZimmWriter user needs.

So I worked out a sweet deal with the ScrapeOwl team. I make $0 on it and it’s 100% benefiting you.

If you click the “Buy a ScrapeOwl API Key” button (or follow this link), it will activate a secret pricing mode. Follow the link, then click the “pricing” tab at the top. You should see $5 and $10 plans.

Sign up for one of those plans immediately after clicking the button or following the link. You’ll get a free trial with 1,000 credits (no credit card required)!

When your free trial is over, you’ll be asked to sign up for a paid plan. As long as you followed my directions, you should be able to choose the $5 or $10 plan. If you didn’t follow my directions, or there is another issue, reach out to ScrapeOwl support and they can help. Tell them ZimmWriter sent you.

Set New ScrapeOwl API Key

You can click this button to set a new ScrapeOwl API key.

It’s just like setting your OpenAI API key.

You can generate a new key at your leisure and input it using this button.

Temp Disable ScrapeOwl API and Use Your Own IP When Scraping

Not every website needs ScrapeOwl. You definitely need it for Scraping Surgeon, Amazon, and YouTube (and eventually Google). But for other sites, you might prefer to temporarily disable it and save your credits.

So if you want to try scraping without it, just check this box.

Premium Proxy & Domains Using Premium Proxy

You can choose a premium proxy country (I almost always choose the United States) and then enter one or more domains that should use the premium proxy.

A premium proxy uses a residential IP address (instead of the more common datacenter IP address), so websites with advanced bot detection have a harder time spotting it.

Scraping a URL that matches a domain in the premium proxy box will cost 10 credits. It will cost 25 credits total if that domain is also listed in the Render JavaScript box.
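How ZimmWriter decides that a URL “matches” a domain in the box isn’t spelled out here. As a rough illustration, here is a minimal Python sketch assuming a simple rule: ignore a leading “www.” and treat subdomains as matching their parent domain. The function name `matches_domain` is hypothetical and not part of ZimmWriter.

```python
from urllib.parse import urlparse


def matches_domain(url: str, domains: list[str]) -> bool:
    """Return True if the URL's host is a listed domain or a subdomain of one.

    Assumed rule (not ZimmWriter's documented behavior): a leading "www." is
    ignored, and entries may be typed with or without a scheme.
    """
    # .hostname is already lowercased by urlparse
    host = (urlparse(url).hostname or "").removeprefix("www.")
    for entry in domains:
        entry = entry.strip().lower().rstrip("/")
        # Accept "https://example.com" as well as a bare "example.com".
        if "://" in entry:
            entry = urlparse(entry).hostname or ""
        entry = entry.removeprefix("www.")
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False


print(matches_domain("https://www.amazon.com/dp/B09F92S5VB/", ["amazon.com"]))
print(matches_domain("https://notamazon.com/", ["amazon.com"]))
```

Comparing hostnames instead of doing a plain substring check keeps a site like notamazon.com from accidentally picking up the settings meant for amazon.com.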

Even though it’s not visible, I use a premium proxy for Amazon scraping. I find that scraping fails about 80% of the time without it.

When might someone need to use a premium proxy? Well, if you have a domain you scrape a lot and find that the scrape is failing, then try entering the domain in the premium proxy box and see if it works.

Credit Cost: 10 (but 25 if using JavaScript).

Domains Using Render JavaScript

Some webpages need JavaScript to run and load additional content before a scrape is performed. Otherwise, the scrape will return useless data.

Even though it’s not visible, I use this option for YouTube scraping. I find that the scraping of YouTube will not work without it.

Again, it’s hard to tell whether you need this option. But the rule of thumb is: when scrapes from a domain keep failing, first try adding the domain to this Render JavaScript box. That might solve the issue.

If it doesn’t, then try adding the domain to the premium proxy box. With a bazillion domains out there, you’ve got to test things out to see what works.

Credit Cost: 5 (but 25 if using Premium Proxies). ZimmWriter uses this behind the scenes for YouTube by default.
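The credit figures from this section and the premium proxy section can be combined into a small calculator, which also shows how far the 1,000 trial credits go. The cost of a scrape with neither option enabled isn’t given in this article, so `plain_cost` is an assumed placeholder; the function names are hypothetical.

```python
def credit_cost(premium_proxy: bool, render_js: bool, plain_cost: int = 1) -> int:
    """Credits charged per scrape, using the figures quoted in this article.

    plain_cost is an ASSUMPTION: the article doesn't state the cost of a
    scrape that uses neither option.
    """
    if premium_proxy and render_js:
        return 25
    if premium_proxy:
        return 10
    if render_js:
        return 5
    return plain_cost


def scrapes_per_budget(credits: int, premium_proxy: bool, render_js: bool) -> int:
    # Integer division: only whole scrapes can be paid for.
    return credits // credit_cost(premium_proxy, render_js)


TRIAL_CREDITS = 1_000
print(scrapes_per_budget(TRIAL_CREDITS, premium_proxy=False, render_js=True))  # 200
print(scrapes_per_budget(TRIAL_CREDITS, premium_proxy=True, render_js=False))  # 100
print(scrapes_per_budget(TRIAL_CREDITS, premium_proxy=True, render_js=True))   # 40
```

Combining both options (25 credits) costs more than the two separate prices added together (10 + 5 = 15), which is why it pays to put a domain only in the box it actually needs.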

Domains Needing Full URL

Sometimes people enter very long URLs. For example:

https://www.amazon.com/DURATECH-Retractable-Measurement-Construction-Woodworking/dp/B09F92S5VB/?_encoding=UTF8&pd_rd_w=Y5LP9&content-id=amzn1.sym.5f7e0a27-49c0-47d3-80b2-fd9271d863ca%3Aamzn1.symc.e5c80209-769f-4ade-a325-2eaec14b8e0e&pf_rd_p=5f7e0a27-49c0-47d3-80b2-fd9271d863ca&pf_rd_r=5QNJKBVZNFSZSX9Y45V3&pd_rd_wg=Y3hDR&pd_rd_r=a040d524-d647-4595-b207-b97dd10a7540&ref_=pd_gw_ci_mcx_mr_hp_atf_m

But for Amazon, all that is technically needed is:

https://www.amazon.com/dp/B09F92S5VB/

ZimmWriter won’t throw an error if you enter a long URL. Instead, it discards what it thinks are the useless bits and keeps the good stuff.

However, sometimes this discarding goes wrong and you actually need the entire URL. If you find yourself in that situation, add the domain to the Domains Needing Full URL box.
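ZimmWriter’s actual trimming rules aren’t published, but the idea in this section can be sketched in Python under a few assumptions: Amazon URLs are reduced to /dp/ plus the 10-character product ID, other URLs lose their query string and fragment, and any domain listed in the full-URL box is passed through untouched. The function name `shorten_url` and these rules are illustrative only.

```python
import re
from urllib.parse import urlparse, urlunparse

# Amazon product IDs (ASINs) are 10 uppercase letters/digits following "/dp/".
ASIN_RE = re.compile(r"/dp/([A-Z0-9]{10})")


def shorten_url(url: str, full_url_domains: list[str]) -> str:
    """Trim a URL to its useful parts unless its domain needs the full URL.

    full_url_domains holds bare lowercase domains such as "amazon.com".
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.")

    # Domains in the "needs full URL" box are returned exactly as entered.
    if any(host == d or host.endswith("." + d) for d in full_url_domains):
        return url

    # Amazon: keep only /dp/<ASIN>/ (assumed rule).
    if host.startswith("amazon."):
        match = ASIN_RE.search(parsed.path)
        if match:
            return f"{parsed.scheme}://{parsed.netloc}/dp/{match.group(1)}/"

    # Everything else: drop query parameters and fragments (assumed rule).
    return urlunparse(parsed._replace(query="", fragment=""))


long_url = ("https://www.amazon.com/DURATECH-Retractable-Measurement-Construction-"
            "Woodworking/dp/B09F92S5VB/?_encoding=UTF8&pd_rd_w=Y5LP9")
print(shorten_url(long_url, []))              # https://www.amazon.com/dp/B09F92S5VB/
print(shorten_url(long_url, ["amazon.com"]))  # the original URL, unchanged
```

Checking the full-URL list first means that override always wins, which is exactly the escape hatch this section describes.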

A good sign that this is what you need: you get an error message box saying ZimmWriter couldn’t find an H1 on the page, or that the page is a 404, and you’ve already tried adding the domain to the Render JavaScript and Premium Proxy boxes.
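The troubleshooting order this article recommends for a domain that keeps failing (Render JavaScript first, then Premium Proxy, then Full URL) fits in a few lines of Python. The box labels are just strings for illustration, and `next_fix` is a hypothetical helper, not a ZimmWriter feature.

```python
from typing import Optional

# Escalation order suggested in this article for a domain whose scrapes keep failing.
FIX_ORDER = ["Render JavaScript", "Premium Proxy", "Full URL"]


def next_fix(already_tried: set[str]) -> Optional[str]:
    """Return the next box to add the failing domain to, or None if all were tried."""
    for fix in FIX_ORDER:
        if fix not in already_tried:
            return fix
    return None


print(next_fix(set()))                                   # Render JavaScript
print(next_fix({"Render JavaScript", "Premium Proxy"}))  # Full URL
```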

Credit Cost: no additional credit cost because it’s a ZimmWriter feature.

Domains Needing Lazy Loading

Sometimes images and content are lazy loaded, which means a browser has to scroll down the page before the content loads. Putting a domain in the Lazy Load section forces JavaScript mode, which costs 5 credits. I don’t enable lazy loading for the JavaScript section by default because some sites with infinite scroll break with lazy loading, so I separated it out as a standalone option.

Credit Cost: it enables JavaScript rendering and sets the lazy loading flag, so there is no additional credit cost beyond JavaScript rendering.

Domains Block Resources False

Some domains render content using React and, without this setting, don’t return all of their data. In those situations, adding the domain to this box can help solve the issue.

Credit Cost: it enables JavaScript rendering and sets the block_resources=false flag, so there is no additional credit cost beyond JavaScript rendering.
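To see how the domain boxes on this page could combine into a single scrape request, here is a minimal Python sketch that builds a request body without sending it. It assumes ScrapeOwl accepts JSON fields named `api_key`, `url`, `render_js`, `premium_proxies`, and `country`; those names are assumed from ScrapeOwl’s public docs and should be verified before relying on them. `block_resources` comes from this article, `lazy_load` is a made-up placeholder for whatever flag actually enables lazy loading, and the “us” country code is also an assumption.

```python
from urllib.parse import urlparse


def _listed(host: str, domains: list[str]) -> bool:
    # Exact or subdomain match against bare domains such as "example.com".
    return any(host == d or host.endswith("." + d) for d in domains)


def build_payload(url: str, api_key: str, js_domains: list[str],
                  premium_domains: list[str], lazy_domains: list[str],
                  unblock_domains: list[str], country: str = "us") -> dict:
    """Map the ZimmWriter domain boxes onto an (assumed) ScrapeOwl request body."""
    host = (urlparse(url).hostname or "").removeprefix("www.")
    lazy = _listed(host, lazy_domains)
    unblock = _listed(host, unblock_domains)  # "Block Resources False" box

    payload: dict = {"api_key": api_key, "url": url}
    # Lazy loading and block_resources=false both force JavaScript rendering.
    if _listed(host, js_domains) or lazy or unblock:
        payload["render_js"] = True
    if _listed(host, premium_domains):
        payload["premium_proxies"] = True
        payload["country"] = country  # assumed country-code format
    if unblock:
        payload["block_resources"] = False
    if lazy:
        payload["lazy_load"] = True  # HYPOTHETICAL flag name, not a known parameter
    return payload


print(build_payload("https://www.youtube.com/watch?v=abc", "YOUR_KEY",
                    js_domains=["youtube.com"], premium_domains=[],
                    lazy_domains=[], unblock_domains=[]))
```

Because the lazy loading and block_resources=false boxes only switch on JavaScript rendering, a request never ends up with more than the JavaScript (or JavaScript plus premium proxy) price, matching the credit notes for those two boxes.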


The information provided on this website is provided for entertainment purposes only. I make no representations or warranties of any kind, expressed or implied, about the completeness, accuracy, adequacy, legality, usefulness, reliability, suitability, or availability of the information, or about anything else. Any reliance you place on the information is therefore strictly at your own risk. Read more in my terms of use and privacy policy. You can also contact me with questions.

Owned and Operated by Revindir LLC
Copyright 2024