andrewducker ([personal profile] andrewducker) wrote 2011-09-02 01:14 pm

Question for the geeks in the crowd

Okay, so if I want something to scrape an RSS feed and turn it into a daily
LJ/DW post, how hard is that going to be? Anyone got something handy that
I can kick off on a daily basis?

Or a service they could recommend?
[personal profile] birguslatro 2011-09-03 07:45 am (UTC)(link)
I wrote something like that in [gulp] 2004, but it was done because the blog in question didn't have an RSS feed. Obviously not of use to you (unless you want to learn REBOL). And I wouldn't have a clue if the method used there to post to LJ would still work. Still, REBOL's parse is excellent for sifting data out of text files.

[identity profile] fub.livejournal.com 2011-09-02 12:53 pm (UTC)(link)
I don't know of any tools that do it -- I'd roll my own based on Python, and add an entry to the crontab on my desktop. But that may not be a plausible solution for you.
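The "roll my own based on Python" approach could look something like this: a minimal sketch, using only the standard library, that pulls (title, link) pairs out of an RSS 2.0 document and formats them as an HTML fragment for a journal post. The function names and the HTML layout here are my own placeholders, not anything from the script discussed later in the thread.

```python
# Minimal sketch: parse RSS 2.0 with the standard library and build
# an HTML list suitable for pasting into a daily journal post.
import xml.etree.ElementTree as ET

def items_from_rss(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

def daily_post_html(items):
    """Format the gathered items as a simple HTML bullet list."""
    lines = ['<li><a href="{0}">{1}</a></li>'.format(link, title)
             for title, link in items]
    return "<ul>\n" + "\n".join(lines) + "\n</ul>"
```

Kicking it off daily is then just the crontab entry mentioned above, e.g. `0 8 * * * /usr/bin/python /home/me/rss_to_post.py` (path hypothetical).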

[identity profile] hawkida.livejournal.com 2011-09-02 04:04 pm (UTC)(link)
I'll invite you to ifttt.com; you might be able to do it via post-by-email using that, though I'm not sure.

[identity profile] johnbobshaun.livejournal.com 2011-09-02 05:13 pm (UTC)(link)
Something with Yahoo Pipes?

[identity profile] johnbobshaun.livejournal.com 2011-09-02 06:14 pm (UTC)(link)
Not hard at all. I've used it for a couple of bits and pieces myself.

[identity profile] johnbobshaun.livejournal.com 2011-09-02 06:15 pm (UTC)(link)
Especially if you don't need to faff around with the BigTable datastore.

[identity profile] drjon.livejournal.com 2011-09-03 01:30 am (UTC)(link)
If you get a good answer, I'd certainly be interested. I researched the question some time ago, but didn't come up with a good answer.
[personal profile] nameandnature 2011-09-03 12:52 pm (UTC)(link)
I run it once a day from a cron job. When it runs, it gathers new items from the feed and generates HTML for them. It posts the accumulated HTML once it's been a while since the last post, or once more than N items have gathered since the last post (I've messed about with the exact figures, but the script should make it obvious what you need to tweak).

So, when there are more than N, it posts all of them at the point where it notices, and it has a chance to notice every time it runs.

If you use it, you might want to take out the hack that means it never mentions religion in the post titles and you'll need to tell it your password a different way (mine comes from an XML file I use to configure the backup tool).
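The posting rule described above can be sketched as a single predicate: post when either enough time has passed since the last post or more than N items have piled up. This is my own illustration, not the actual script; the constant names and the figures are the tweakable knobs the comment mentions.

```python
# Sketch of the posting rule: accumulate items each run, post when
# either threshold is crossed. Both figures are placeholders.
import time

MAX_ITEMS = 10            # post once more than N items have accumulated
MIN_INTERVAL = 24 * 3600  # or once it's been this long since the last post

def should_post(pending_items, last_post_time, now=None):
    """Decide whether this cron run should fire off a post."""
    if now is None:
        now = time.time()
    if not pending_items:
        return False
    return (len(pending_items) > MAX_ITEMS
            or now - last_post_time >= MIN_INTERVAL)
```

Because the check runs on every cron invocation, an over-full queue gets posted at the next run the script "notices" it, exactly as described.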
[personal profile] nameandnature 2011-09-03 12:53 pm (UTC)(link)
Running from GAE should just be a matter of replacing the pickle/unpickle with whatever storage backend GAE provides, as ISTR there's no filesystem on GAE.
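To make the point concrete, here's a guess at the shape of the state involved: on an ordinary host the script can pickle its state (seen item IDs, pending items, last post time) to disk, and these two functions are the seam you'd replace with datastore reads/writes on GAE. The filename and the state dictionary's keys are assumptions of mine, not the actual script's.

```python
# Sketch: filesystem-backed state, the part that would become
# datastore calls on GAE. STATE_FILE is a placeholder path.
import os
import pickle
import tempfile

STATE_FILE = os.path.join(tempfile.gettempdir(), "rss_poster_state.pickle")

def load_state():
    """Load saved state from disk, or start fresh on first run."""
    try:
        with open(STATE_FILE, "rb") as f:
            return pickle.load(f)
    except (IOError, OSError):
        return {"seen": set(), "pending": [], "last_post": 0}

def save_state(state):
    """Write the state back out for the next cron run."""
    with open(STATE_FILE, "wb") as f:
        pickle.dump(state, f)
```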

[identity profile] johnbobshaun.livejournal.com 2011-10-29 10:38 am (UTC)(link)
Good to know. I take it that Objectify is an ORM-style tool? Like Hibernate? Or whatever the .NET equivalent is?

TBH, everything I've used GAE for has had such simple data requirements that I've never needed to dig into that side of things. But I've used MongoDB when dicking around with Rails and am guessing that it's reasonably similar.