andrewducker ([personal profile] andrewducker) wrote 2011-11-24 08:07 am

Anyone here know much about Java?

I have user input for a URL field. I want them to be able to enter anything from "http://andrewducker.wordpress.com/xmlrpc.php" to "andrewducker.wordpress.com" and be able to end up at the same end point.

I've wasted a couple of hours messing around with the various constructors for URL and haven't got anywhere satisfactory. Should I just do string checking and construct it myself?

I should make it clear - I always want the /xmlrpc.php bit to be what's on the end of the URL. That's a WordPress standard, so I don't need to do any complex discovery. I just need to append it if it's not there.

I was hoping that someone would have written a class that could append bits of URLs together, but the basic stuff in the built-in URL class doesn't quite cut it.

[personal profile] pseudomonas 2011-11-24 01:16 pm (UTC)(link)
Can you do all the manipulations as Strings and then just convert it to URL at the end?

[personal profile] pseudomonas 2011-11-24 05:47 pm (UTC)(link)
Ah, right. I only follow you here.

[identity profile] bugshaw.livejournal.com 2011-11-24 08:13 am (UTC)(link)
Would "Enter your Wordpress domain" work?

[personal profile] drplokta 2011-11-24 09:15 am (UTC)(link)
Regardless of programming language, that sounds to me like a job for a single regular expression -- something like (untested): s/(htttp\:)*(\/)*(.*)(/(xmlrpc(\.)*(php)*)*)*/\3/

[personal profile] drplokta 2011-11-24 09:39 am (UTC)(link)
The regular expression should extract the actual domain name with the optional extraneous bits removed, so you can then prepend "http://" and append "/xmlrpc.php" and you're done.
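
For the Java case in the original question, the extract-then-rebuild idea might look roughly like the sketch below. The class name XmlRpcUrlRegex and the pattern are my own illustration rather than a tested translation of the expression above, and as one deliberate difference it keeps an https:// prefix if the user typed one instead of always prepending "http://". Anything after the host is thrown away.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pull the host out with one regex, then rebuild the URL.
final class XmlRpcUrlRegex {
    // Group 1: optional scheme. Group 2: host (everything up to the first slash).
    // Whatever follows the host is matched but ignored.
    private static final Pattern HOST =
            Pattern.compile("^(https?://)?([^/]+)(?:/.*)?$", Pattern.CASE_INSENSITIVE);

    static String normalize(String input) {
        Matcher m = HOST.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("No host found in: " + input);
        }
        // Assumption: default to plain http when no scheme was typed.
        String scheme = (m.group(1) != null) ? m.group(1) : "http://";
        return scheme + m.group(2) + "/xmlrpc.php";
    }
}
```

With this, both "andrewducker.wordpress.com" and "http://andrewducker.wordpress.com/xmlrpc.php" come out as "http://andrewducker.wordpress.com/xmlrpc.php". It does no validation of the host itself.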

[identity profile] skington.livejournal.com 2011-11-24 09:52 am (UTC)(link)
Sounds like a series of regular expressions is your best bet, then. In Perl, that would be e.g.

if ($url !~ m{^ https?:// }x) {
    $url = 'http://' . $url;
}
if ($url !~ m{ /xmlrpc\.php $ }x) {
    $url .= '/xmlrpc.php';
}

And then use your standard libraries to check whether that URL is valid or not. (Validation step might just be to call the resulting "url" and see if it works.)
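
Since the question was about Java, here is roughly the same pair of checks as a minimal sketch. The class name XmlRpcUrlStrings is made up for illustration, and stripping trailing slashes is an extra step of mine that the Perl above doesn't do. It uses plain string checks and only hands the result to java.net.URI at the end.

```java
import java.net.URI;

// Hypothetical helper: string checks first, URI parsing last.
final class XmlRpcUrlStrings {
    private static final String ENDPOINT = "/xmlrpc.php";

    static URI normalize(String input) {
        String url = input.trim();
        // Add a scheme if the user left it off (case-insensitive check).
        boolean hasScheme = url.regionMatches(true, 0, "http://", 0, 7)
                || url.regionMatches(true, 0, "https://", 0, 8);
        if (!hasScheme) {
            url = "http://" + url;
        }
        // Extra step (my assumption): drop trailing slashes so we never
        // end up with "//xmlrpc.php".
        while (url.endsWith("/")) {
            url = url.substring(0, url.length() - 1);
        }
        if (!url.endsWith(ENDPOINT)) {
            url = url + ENDPOINT;
        }
        // URI.create throws IllegalArgumentException if the result is not
        // syntactically valid; it does not check that the site exists.
        return URI.create(url);
    }
}
```

XmlRpcUrlStrings.normalize("andrewducker.wordpress.com") and XmlRpcUrlStrings.normalize("http://andrewducker.wordpress.com/xmlrpc.php") both give http://andrewducker.wordpress.com/xmlrpc.php. Odd input such as an empty string isn't handled specially.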

[identity profile] pozorvlak.livejournal.com 2011-11-24 09:54 am (UTC)(link)
The https issue is easy to fix: add s? to the regex after http. The other issue is handled already - the .* is greedy, and consumes as many characters as it can.

Of course, we're now up to two edge cases with no guarantee we've thought of them all, which is the usual weakness of regexp-based approaches.

[identity profile] pozorvlak.livejournal.com 2011-11-24 01:16 pm (UTC)(link)
Typically character-index based solutions have all of the brittleness and none of the readability of rexexp-based solutions, but in this case I think your algorithm makes sense.

[identity profile] pozorvlak.livejournal.com 2011-11-24 01:17 pm (UTC)(link)
*regexp

[personal profile] drplokta 2011-11-24 09:52 am (UTC)(link)
I think you'll need to add an (s)* to cope with the https possibility.

[identity profile] pozorvlak.livejournal.com 2011-11-24 09:56 am (UTC)(link)
I for one find your use of * in place of ? surprising. A hangover from pre-PCRE regexes?

[personal profile] drplokta 2011-11-24 10:01 am (UTC)(link)
I just simplify matters for myself by not bothering to remember what ? does unless I actually need it. In this case, it doesn't really matter if something like "http:http://" gets stripped off the beginning or "xmlrpc.phphp" gets stripped off the end, so I didn't bother. If it was important to match only a single occurrence, I'd look it up again. Yes, most of the *s can probably be ?s, except in the "(\/)*" bit where * is needed.

[identity profile] pozorvlak.livejournal.com 2011-11-24 09:48 am (UTC)(link)
I spot one bug: "htttp" instead of "http".

[identity profile] strawberryfrog.livejournal.com 2011-11-24 09:39 am (UTC)(link)
If that was C#, I would not use a regex - I'd
1) construct a Uri from the string
2) read off some of the properties of the uri. Something like uri.Scheme + "://" + uri.Host + "/xmlrpc.php" should be what you want (unless port numbers are involved).
Edited 2011-11-24 09:39 (UTC)

[identity profile] strawberryfrog.livejournal.com 2011-11-24 09:50 am (UTC)(link)
Good point. That is in the "AbsolutePath" property but you might have to massage it - remove everything after the last slash and substitute "xmlrpc.php"
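
The nearest Java equivalent I can think of is java.net.URI, which has getScheme(), getHost(), getPort() and getRawPath(). A rough sketch, assuming a made-up class name XmlRpcUrlParsed and assuming that input without "://" should default to http (URI only reports a host when a scheme is present):

```java
import java.net.URI;

// Hypothetical helper: parse first, then rebuild from the parsed parts.
final class XmlRpcUrlParsed {
    static URI normalize(String input) {
        String text = input.trim();
        // Assumption: no "://" means the user typed a bare host, so use http.
        if (!text.contains("://")) {
            text = "http://" + text;
        }
        URI parsed = URI.create(text);
        if (parsed.getHost() == null) {
            throw new IllegalArgumentException("No host found in: " + input);
        }
        String path = (parsed.getRawPath() == null) ? "" : parsed.getRawPath();
        // Remove everything after the last slash; an empty path becomes "/".
        String directory = path.substring(0, path.lastIndexOf('/') + 1);
        if (directory.isEmpty()) {
            directory = "/";
        }
        // getPort() returns -1 when no port was given.
        String port = (parsed.getPort() == -1) ? "" : ":" + parsed.getPort();
        return URI.create(parsed.getScheme() + "://" + parsed.getHost()
                + port + directory + "xmlrpc.php");
    }
}
```

One caveat with the "last slash" rule: "http://example.com/blog" is treated as a file called blog and becomes http://example.com/xmlrpc.php, while "http://example.com/blog/" becomes http://example.com/blog/xmlrpc.php. Which of those is right depends on where the blog is actually installed.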