Get image from link url?

jynx2234 · December 7, 2018, 12:35pm

I am trying to make a form where a user can submit a link to a product, and display the meta image from that url in a repeating group without asking the user to upload an image. Can this be done?

keith · December 9, 2018, 4:37am

I find this question pretty interesting (even if it is rather basic). I have a video forthcoming where I’ll go into some of this a little deeper (not a full-fledged tutorial, but more of a “here’s how you think about this and here’s what you might explore” kind of thing). With that being said, here’s the text answer:

Sure, you can do this. Let’s start by answering the question in the simplest way. Edit mode: [screenshot]

Run mode: [screenshot]

^^^^ The above is a proof of concept, showing that you can grab a URL (specifically, the URL of an image) and display it dynamically in an image element. If we can’t do that, none of the rest of this will work.

So, while being able to display an arbitrary image URL as an image in a Bubble page is necessary, it is not sufficient for what the OP wants to do.

Of course, what the OP seems to want to do is extract meta image URLs from remote sites.

You might think this is easy-ish. But it’s actually a bit difficult.

Why? Partly because of CORS (Cross-Origin Resource Sharing) restrictions: a browser won’t let your page read the HTML of another site unless that site explicitly allows it. And partly because, even if you could load any old HTML from any arbitrary site, Bubble doesn’t really have extensive parsing functions (on either the server or the client side) at present.
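To make the parsing half of that concrete: once some server-side code has the raw HTML, pulling out the Open Graph image is a small string-processing job. Below is a minimal JavaScript (Node) sketch, assuming the page declares its image in a `<meta property="og:image" content="...">` tag. The function names are hypothetical, and the regexes are a deliberate simplification of what a real HTML parser would do.

```javascript
// Minimal sketch: pull the Open Graph image URL out of raw HTML.
// Regex-based for illustration only; a real service should use an HTML parser.
function parseAttributes(tag) {
  const attrs = {};
  // Matches name="value" or name='value' pairs inside a single tag.
  const attrRe = /([a-zA-Z_:.-]+)\s*=\s*("([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = attrRe.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[3] !== undefined ? m[3] : m[4];
  }
  return attrs;
}

function extractOgImage(html) {
  const metaRe = /<meta\b[^>]*>/gi;
  let m;
  while ((m = metaRe.exec(html)) !== null) {
    const attrs = parseAttributes(m[0]);
    // Sites use either property="og:image" or name="og:image", in any attribute order.
    const key = (attrs.property || attrs.name || "").toLowerCase();
    if (key === "og:image" && attrs.content) {
      return attrs.content;
    }
  }
  return null; // no og:image declared
}
```

Because attribute order varies from site to site, the sketch reads all attributes of each `<meta>` tag rather than assuming `property` comes first.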

So to get this done, we have to explore other options. And that’s why this question is interesting.

The main approach is to use some sort of external service (a remote API) to fetch and parse the remote page and pass back the info that we need. (In this case, the URLs of the images that the pages declare as their meta images.)
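Here’s a minimal sketch of what such a service boils down to, assuming a Node runtime: fetch the page on the server (CORS is enforced by browsers, so it doesn’t apply there), extract the image, and hand back a small JSON object. `getMetaImage`, `fetchWithNode`, and the `{ url, image }` response shape are hypothetical names chosen for illustration, and using `fetch` as a global assumes Node 18 or later.

```javascript
// Hypothetical service core: fetch a page server-side and return its meta image.
// `fetchHtml` is injected so the function can use real fetch or a stub in tests.
async function getMetaImage(pageUrl, fetchHtml) {
  const html = await fetchHtml(pageUrl);
  // Deliberately simple: handles property-before-content and content-before-property.
  const match =
    html.match(/<meta[^>]+property=["']og:image["'][^>]+content=["']([^"']+)["']/i) ||
    html.match(/<meta[^>]+content=["']([^"']+)["'][^>]+property=["']og:image["']/i);
  return { url: pageUrl, image: match ? match[1] : null };
}

// Default fetcher using the global fetch available in Node 18+.
async function fetchWithNode(url) {
  const res = await fetch(url, { redirect: "follow" });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}
```

Keeping the response flat like this makes it easy to map in an API Connector call: one field for the page, one for the image.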

I couldn’t find a particularly robust metadata extraction REST API (besides embed.ly, which is costly), so it looks like you have to build this yourself.

So building a little API makes sense. After about 10 minutes of Googling around, I found a very popular NPM package called “metascraper” that is VERY VERY handy. I never knew about this one.

The documentation isn’t great (because it’s kind of no-nonsense), but once you start playing around with it, everything becomes super-obvious. It’s a fairly straightforward library.

However, you can’t just deploy it in a webpage. It’s for nodejs. Good thing I’ve got a lil’ bit of experience with webtask.io, which is a function-as-a-service provider (that’s free for a certain amount of use).

Services like this let you run nodejs functions (relatively) quickly and easily and create what’s often called a microservice. (Another alternative to webtask.io is AWS Lambda, which is perhaps more flexible, but is harder to set up. So for this proof of concept, we’ll just use webtask.)

Again: The reason we can’t just deploy metascraper in a webpage is partly because of CORS, but also, this library doesn’t provide instructions for a browser-based install. We weren’t going to do that, anyway.

So, I did a couple of different versions of this, playing around with basic applications of metascraper, and then realized that Amazon (and YouTube, and other popular things to scrape) have their own extensions for metascraper.

So here’s a really simple use of metascraper, set up in webtask.io. Fewer than 20 lines of active code, but of course there’s nothing fancy like error handling; this is just a proof of concept: [screenshot]
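The original code isn’t reproduced in this thread, so this is not it. It’s a hedged sketch of the same shape, assuming the webtask.io `(context, cb)` function signature with query-string parameters arriving on `context.query`, plus the basic input checks the proof of concept skipped. `makeWebtask` is a hypothetical name, and `extract` stands in for the actual scraper. In Keith’s setup that’s metascraper, which (per its README) you build with something like `require('metascraper')([require('metascraper-image')()])` and then call as `metascraper({ html, url })` after fetching the page yourself.

```javascript
// Hypothetical factory for a webtask.io-style function with the (context, cb)
// signature. `extract(url)` stands in for the real scraper and must return a
// Promise of a metadata object, which is passed back to the caller as JSON.
function makeWebtask(extract) {
  return function (context, cb) {
    const target = context.query && context.query.url;
    if (!target) {
      return cb(new Error("Missing ?url= query parameter"));
    }
    let parsed;
    try {
      parsed = new URL(target);
    } catch (e) {
      return cb(new Error(`Not a valid URL: ${target}`));
    }
    // Only scrape ordinary web pages.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      return cb(new Error("Only http(s) URLs are supported"));
    }
    // Two-argument then() so cb is never called twice.
    extract(parsed.href).then(
      (metadata) => cb(null, metadata),
      (err) => cb(err)
    );
  };
}
```

To deploy it, you would export the result, e.g. `module.exports = makeWebtask(scrapeWithMetascraper)`, where `scrapeWithMetascraper` is your own (hypothetical) wrapper that fetches the HTML and runs metascraper on it.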

And here’s our Bubble API Connector setup to call it: [screenshot]
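One detail worth getting right in that call: the page URL travels as a query parameter, so it has to be URL-encoded, or its own `?` and `&` get read as part of the scraper’s query string. A minimal sketch using the standard `URL` API, assuming a hypothetical endpoint that takes a parameter named `url`. Whether the API Connector encodes a dynamic value for you is worth checking in your own setup.

```javascript
// Build the GET URL for a (hypothetical) scraper endpoint that expects ?url=<page>.
function buildScraperCall(endpoint, pageUrl) {
  const call = new URL(endpoint);
  // searchParams.set percent-encodes the value, so the page URL's own
  // "?", "&" and "=" can't be mistaken for the endpoint's query parameters.
  call.searchParams.set("url", pageUrl);
  return call.href;
}
```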

And here’s an example of the scraper working in run mode: [screenshot]

And here’s what that looks like in edit mode: [screenshot]

And here’s the business end of that… insanely simple logic (once you have the API Connector configured): [screenshot]

^^^ the astute observer will note that there’s no database interaction going on here. Everything is just happening in the page. This is pretty cool, and one of the reasons I’m doing a video on these experiments.

Additionally, that video isn’t just for noobs. Thinking about this led me to think about using Amazon’s Product Advertising API to do this in a more robust way. Turns out a lot of folks here have gotten stuck on how to implement the Amazon Product Advertising API (for good reason: it’s a bit hard to get one’s head around in Bubble), and I’ll show the steps you need to take to get that done. It’s tricky!

keith · December 9, 2018, 4:44am

BTW, here’s a run mode screenshot of the Amazon Product Advertising API in use. The search here is for “stupid gifts”: [screenshot]

Mission accomplished!!!

luke2 · December 9, 2018, 8:35am

Nice, detailed post there, Keith. I followed the question in hopes of seeing a well-thought-out answer, and as usual you’ve put a lot of thought into it. Appreciate this.

You’re right: building your own API, although more laborious in some respects, is the better way from a price and flexibility point of view. Services like https://www.import.io/ are really quite expensive, even if there is an API plugin available in Bubble’s library.

As an aside, the video would be really helpful for sure - hope you get a chance to produce it.

Cheers

keith · December 9, 2018, 10:44am

Hey Luke! Yeah, I’m going to walk thru a bunch of interesting stuff wrt this question in a video.

Import.io wouldn’t be the right solution for this application at all. That’s more for bulk scraping. To just retrieve meta data from random webpages, there’s no need for stuff like that (esp at $299/mo).

And you know whenever someone says something like “get the image for a product”, you just know they’re talking about AMZN.

And it seems nobody really gets how to access APIs like AMZN’s product search one, which requires every API call to be digitally signed. (Turns out it’s entirely possible in vanilla Bubble, but just kind of a pain in the ass to set up, especially without a few pointers.)
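For the curious, here’s roughly what “digitally signed” means for that API, as a Node sketch. It assumes the request-signing scheme PA-API used around 2018: sort the query parameters by name, RFC 3986-encode them, sign `GET`, the host, the path, and the canonical query string (newline-separated) with HMAC-SHA256 using your secret key, then append the base64 signature as `Signature`. The US host and `/onca/xml` path are assumptions, and the current PA-API 5.0 uses AWS Signature Version 4 instead, so treat this as an illustration of the idea rather than a drop-in client.

```javascript
const crypto = require("crypto");

// RFC 3986 percent-encoding. encodeURIComponent leaves !'()* alone, so
// encode those too (spaces become %20, never "+").
function rfc3986(value) {
  return encodeURIComponent(String(value)).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Assumed 2018-era PA-API signing: HMAC-SHA256 over
// "GET\n<host>\n<path>\n<sorted, encoded query>", base64, appended as Signature.
function signPaapiRequest(params, secretKey, host = "webservices.amazon.com", path = "/onca/xml") {
  // Canonical query string: parameters sorted by name, names and values encoded.
  const canonicalQuery = Object.keys(params)
    .sort()
    .map((name) => `${rfc3986(name)}=${rfc3986(params[name])}`)
    .join("&");
  const stringToSign = ["GET", host, path, canonicalQuery].join("\n");
  const signature = crypto
    .createHmac("sha256", secretKey)
    .update(stringToSign)
    .digest("base64");
  return `https://${host}${path}?${canonicalQuery}&Signature=${rfc3986(signature)}`;
}
```

A real request would also carry your `AssociateTag`, operation-specific parameters such as `SearchIndex`, and a fresh `Timestamp` (for example `new Date().toISOString()`). Keep the secret key on the server side rather than in anything the browser can see.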

Also, I did a BUNCH of really cool stuff in this without even touching the database, which I thought was interesting (all of this pretty much just happens in the page without creating any database things), and there are a bunch of cool tips and tricks along the way. (Finally, a good example of what I’m talking about when I say “construct a list”; it seems people don’t know what I’m getting at when I say that.)

I just didn’t get a quiet moment to do a quick walkthru today. Should have it up tomorrow!

NigelG · December 9, 2018, 11:03am

Or use … to get the URLs and then … to resize the images and check their original size, so you aren’t scraping thumbnails.
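The thumbnail check amounts to measuring each candidate image and keeping a big one. A hypothetical sketch, assuming the pixel dimensions have already been obtained (for instance from a service that downloads and inspects the images, which is the part those linked APIs handle). The 200 by 200 pixel cutoff is an arbitrary example value.

```javascript
// Hypothetical helper: candidates are { url, width, height } with measured pixel sizes.
function pickBestImage(candidates, minWidth = 200, minHeight = 200) {
  // Drop anything below the cutoff; those are likely thumbnails or icons.
  const large = candidates.filter((c) => c.width >= minWidth && c.height >= minHeight);
  if (large.length === 0) return null;
  // Keep the candidate with the largest pixel area.
  return large.reduce((best, c) => (c.width * c.height > best.width * best.height ? c : best));
}
```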

keith · December 9, 2018, 11:36am

Article URLs are one thing, but the implied use case doesn’t seem to be that. Think about Bubble sites for example… they’re not using structured data this way unless the author is really savvy. And even then, just on a subset of pages. (Like my changelog pages are set up like articles, but everything else just has OG meta stuff.)
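For pages that do carry structured data, the image usually sits in a JSON-LD `<script type="application/ld+json">` block. Here’s a hedged sketch of reading it. Per schema.org, `image` can be a URL string, an `ImageObject` with a `url`, or an array of either, so the code normalizes those cases. The function names are hypothetical, and the sketch only looks at a single node, a top-level array, or an `@graph` container.

```javascript
// Hypothetical helpers: read a schema.org "image" out of JSON-LD blocks.
function normalizeImage(image) {
  if (!image) return null;
  if (typeof image === "string") return image; // plain URL
  if (Array.isArray(image)) return normalizeImage(image[0]); // take the first entry
  return typeof image.url === "string" ? image.url : null; // ImageObject
}

function imageFromJsonLd(html) {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let m;
  while ((m = scriptRe.exec(html)) !== null) {
    let data;
    try {
      data = JSON.parse(m[1]);
    } catch {
      continue; // skip malformed blocks rather than failing the whole page
    }
    if (!data || typeof data !== "object") continue;
    // Top-level array, an @graph container, or a single node.
    const nodes = Array.isArray(data) ? data : Array.isArray(data["@graph"]) ? data["@graph"] : [data];
    for (const node of nodes) {
      const url = normalizeImage(node && node.image);
      if (url) return url;
    }
  }
  return null; // no usable JSON-LD image
}
```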

The advantage of metascraper is that it’s highly configurable. (Though it too has its origins in parsing blog/article style content.)
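The “configurable” part is easiest to picture as an ordered list of rules, where each rule tries one way of finding the value and the first non-empty answer wins. This illustrative sketch is a simplification, not metascraper’s code: it falls back from `og:image` to `twitter:image` to a `<link rel="image_src">` tag, and all names are hypothetical.

```javascript
// Illustrative fallback chain (not metascraper's code): each rule inspects the
// page and returns a candidate image URL or null; the first non-null wins.
function metaContent(html, key) {
  const tags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const name = /(?:property|name)\s*=\s*["']([^"']+)["']/i.exec(tag);
    const content = /content\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (name && content && name[1].toLowerCase() === key) return content[1];
  }
  return null;
}

const imageRules = [
  (html) => metaContent(html, "og:image"),
  (html) => metaContent(html, "twitter:image"),
  (html) => {
    const m = /<link[^>]+rel=["']image_src["'][^>]*>/i.exec(html);
    const href = m && /href\s*=\s*["']([^"']+)["']/i.exec(m[0]);
    return href ? href[1] : null;
  },
];

// Try each rule in order; adding a site-specific rule is just another array entry.
function firstMatch(rules, html) {
  for (const rule of rules) {
    const value = rule(html);
    if (value) return value;
  }
  return null;
}
```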

Anyway, the ultimate point is that product-type pages tend to have their own proprietary metadata conventions. What I found wrt AMZN is that metascraper’s Amazon module is rather easier to use than connecting to the Product Advertising API (which also requires you to be a formal affiliate partner to get access). That seems harder to become than in the past; I realized that I actually had affiliate access from long ago, so I’m able to compare the two approaches.

All this is probably way beyond what the OP was asking about, but it’s a super interesting topic!

NigelG · December 9, 2018, 11:41am

Yes, that is what the second Algorithmia API is doing: many sites hide their images behind another URL, so you can’t just use the first link you come to. So it renders the image on the server, rather than using the linked one.
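Part of that problem is simply that image references are often relative (`/img/p.jpg`) or protocol-relative (`//cdn.example.com/p.jpg`). Resolving those against the page URL is easy with the standard `URL` constructor, as in the hypothetical helper below. It does not handle redirects or lazy-loaded and script-inserted images, which is where rendering on the server, as described above, comes in.

```javascript
// Hypothetical helper: turn a scraped image reference into an absolute http(s) URL.
function toAbsoluteImageUrl(src, pageUrl) {
  try {
    // The second argument is the base used for relative and protocol-relative refs.
    const abs = new URL(src, pageUrl);
    // Reject data:, javascript:, etc.; only web-fetchable images are useful here.
    return abs.protocol === "http:" || abs.protocol === "https:" ? abs.href : null;
  } catch {
    return null; // unparseable reference
  }
}
```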

The first one isn’t just articles, and it seems to work on Amazon.

Scraping images from sites is pretty hard when you want decent looking images.

luke2 · December 10, 2018, 10:59am

Great, looking forward to it - I thought about this concept not long ago so would be interesting to understand a little further.

Yeah, I realise Import.io may not be the right tool, and frankly it’s overkill, although it would help scrape URLs outside of Amazon, e.g. any URL a user specifies. For instance, it might be handy to scrape all the images from an e-commerce site’s URL.

Good point: the DB stays untouched, and it’s just the power of the API with dynamic data at work.

keith · December 11, 2018, 2:34am

AS PROMISED: Here’s an hour and 12 minutes or so of me rambling on about interesting things that I did in response to this question: showing an image URL in an image element, funny cat pictures, weird Amazon products, building a simple metadata scraper with nodejs, the metascraper NPM library and webtask.io, tips about repeating groups, doing interesting stuff completely in the page and… wait for it… a very long and talky and possibly COMPLETELY confusing explanation of how to configure the NOTORIOUSLY difficult Amazon Product Advertising Search API (https://webservices.amazon.com/scratchpad/index.html)!

wOOT wOOT!

Here it is:

https://vimeo.com/305624479

jynx2234 · December 11, 2018, 9:22pm

Wow Keith you put a lot more into this than I could have expected! Thanks a ton!

edtyli9 · August 29, 2021, 7:31am

Super helpful tutorial. Thanks Keith!

I used the Link Preview plugin for a 5-minute solution, for those only looking to get the meta image from a link, and created a step-by-step guide. You can also find the Bubble editor there, so you can copy the elements and workflows directly into your Bubble app.

initialsjz · May 15, 2023, 1:40pm

@edtyli9, could you please share a link? I searched Nalfe and there is no template. Thank you in advance, Jenny