Ruby Mechanize: 5 Common Errors (and How to Fix Them)
Here I am, blabbing on about Ruby’s Mechanize gem again. I decided to compile a short list of the most common problems I’ve seen in my long (not that long), storied career as a renegade data miner.
HTTPS/SSL Errors
This error commonly presents itself with something like: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
Which is incredibly ugly/scary for anyone new to Mechanize. But, there’s a simple fix:
Ok, ok.. the truth is you don’t want to use this in production code that depends on secure communication, but if you just wanna get all that juicy data in a pinch — try it out.
User Agent Blocked
This one is harder to diagnose. Some servers don't like Mechanize's default User-Agent string, which announces itself as Mechanize. If you find yourself getting empty response bodies or weird errors back from the server, do yourself a favor and punch this in:
You can, of course, choose your own UA as well. This is just more fun.
Following Redirects
This one is weird as well. Sometimes you get a 404 error, and sometimes just a chunk of HTML saying something like This page has moved to..
Another obvious indicator is a 301 or 302 response code in the headers. I wrote a previous article talking about using Burp Suite for web scraping, which you should totally check out if you wanna really analyze that browser traffic.
Oh yeah, the fix(es):
Oh no! My Request Timed Out..
Self-explanatory, probably, but this is where Mechanize tries to request a resource but doesn’t get a response within the allotted time. So, we just need to increase that allotted time:
And that’s pretty much it for that one.
Are we speaking the same language?
This one is important, and I guarantee you will run into this issue over and over again. You’re speaking JSON, but the server only wants URL-encoded crap (or the other way around). If you’re sending a request that you just know should be understood, but the server keeps crying or denying your request, try a new content type!
And yes, there are many types to try, but I’ve only seen a few in the wild.
Conclusion
The article is over, that’s the conclusion. I’m sure there are many more errors I could list here, but these are the ones that hurt me so often I memorized them.
Happy coding and whatnot.