andrewducker ([personal profile] andrewducker) wrote 2011-09-02 01:14 pm

Question for the geeks in the crowd

Okay, so if I want something to scrape an RSS feed and turn it into a daily
LJ/DW post, how hard is that going to be? Anyone got something handy that
I can kick off on a daily basis?

Or a service they could recommend?
[personal profile] birguslatro 2011-09-03 07:45 am (UTC)(link)
I wrote something like that in [gulp] 2004, but it was done because the blog in question didn't have an RSS feed. Obviously not of use to you (unless you want to learn REBOL). And I wouldn't have a clue if the method used there to post to LJ would still work. Still, REBOL's parse is excellent for sifting data out of text files.

[identity profile] fub.livejournal.com 2011-09-02 12:53 pm (UTC)(link)
I don't know of any tools that do it -- I'd roll my own based on Python, and add an entry to the crontab on my desktop. But that may not be a plausible solution for you.
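The "roll my own based on Python" approach could look something like this: a minimal sketch, using only the standard library, that pulls (title, link) pairs out of an RSS 2.0 document and formats them as an HTML fragment for a journal post. The function names and the HTML layout here are my own placeholders, not anything from the script discussed later in the thread.

```python
# Minimal sketch: parse RSS 2.0 with the standard library and build
# an HTML list suitable for pasting into a daily journal post.
import xml.etree.ElementTree as ET

def items_from_rss(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

def daily_post_html(items):
    """Format the gathered items as a simple HTML bullet list."""
    lines = ['<li><a href="{0}">{1}</a></li>'.format(link, title)
             for title, link in items]
    return "<ul>\n" + "\n".join(lines) + "\n</ul>"
```

Kicking it off daily is then just the crontab entry mentioned above, e.g. `0 8 * * * /usr/bin/python /home/me/rss_to_post.py` (path hypothetical).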

[identity profile] hawkida.livejournal.com 2011-09-02 04:04 pm (UTC)(link)
I'll invite you to ifttt.com; you might be able to do it via post-by-email using that, though I'm not sure.

[identity profile] johnbobshaun.livejournal.com 2011-09-02 05:13 pm (UTC)(link)
Something with Yahoo Pipes?

[identity profile] johnbobshaun.livejournal.com 2011-09-02 06:14 pm (UTC)(link)
Not hard at all. I've used it for a couple of bits and pieces myself.

[identity profile] johnbobshaun.livejournal.com 2011-09-02 06:15 pm (UTC)(link)
Especially if you don't need to faff around with the BigTable datastore.

[identity profile] drjon.livejournal.com 2011-09-03 01:30 am (UTC)(link)
If you get a good answer, I'd certainly be interested. I researched the question some time ago, but didn't come up with a good answer.
[personal profile] nameandnature 2011-09-03 12:52 pm (UTC)(link)
I run it once a day from a cron job. When it runs, it gathers new items from the feed and generates HTML for them. It posts the accumulated HTML once it's been a while since the last post, or once more than N items have gathered since the last post (I've messed about with the exact figures, but the script should make it obvious what you need to tweak).

So, when there are more than N, it posts all of them at the point where it notices, and it has a chance to notice every time it runs.

If you use it, you might want to take out the hack that means it never mentions religion in the post titles and you'll need to tell it your password a different way (mine comes from an XML file I use to configure the backup tool).
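The posting rule described above can be sketched as a single predicate: post when either enough time has passed since the last post or more than N items have piled up. This is my own illustration, not the actual script; the constant names and the figures are the tweakable knobs the comment mentions.

```python
# Sketch of the posting rule: accumulate items each run, post when
# either threshold is crossed. Both figures are placeholders.
import time

MAX_ITEMS = 10            # post once more than N items have accumulated
MIN_INTERVAL = 24 * 3600  # or once it's been this long since the last post

def should_post(pending_items, last_post_time, now=None):
    """Decide whether this cron run should fire off a post."""
    if now is None:
        now = time.time()
    if not pending_items:
        return False
    return (len(pending_items) > MAX_ITEMS
            or now - last_post_time >= MIN_INTERVAL)
```

Because the check runs on every cron invocation, an over-full queue gets posted at the next run the script "notices" it, exactly as described.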
[personal profile] nameandnature 2011-09-03 12:53 pm (UTC)(link)
Running from GAE should just be a matter of replacing the pickle/unpickle with whatever storage backend GAE provides, as ISTR there's no filesystem on GAE.
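To make the point concrete, here's a guess at the shape of the state involved: on an ordinary host the script can pickle its state (seen item IDs, pending items, last post time) to disk, and these two functions are the seam you'd replace with datastore reads/writes on GAE. The filename and the state dictionary's keys are assumptions of mine, not the actual script's.

```python
# Sketch: filesystem-backed state, the part that would become
# datastore calls on GAE. STATE_FILE is a placeholder path.
import os
import pickle
import tempfile

STATE_FILE = os.path.join(tempfile.gettempdir(), "rss_poster_state.pickle")

def load_state():
    """Load saved state from disk, or start fresh on first run."""
    try:
        with open(STATE_FILE, "rb") as f:
            return pickle.load(f)
    except (IOError, OSError):
        return {"seen": set(), "pending": [], "last_post": 0}

def save_state(state):
    """Write the state back out for the next cron run."""
    with open(STATE_FILE, "wb") as f:
        pickle.dump(state, f)
```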

[identity profile] johnbobshaun.livejournal.com 2011-10-29 10:38 am (UTC)(link)
Good to know. I take it that Objectify is an ORM-style tool? Like Hibernate? Or whatever the .NET equivalent is?

TBH, everything I've used GAE for has had such simple data requirements that I've never needed to dig into that side of things. But I've used MongoDB when dicking around with Rails and am guessing that it's reasonably similar.