andrewducker | Thinking about code

Thinking about code

If I have a method:
int DoSomething(string someStuff)
{
    //Do Stuff
    return 42;
}
then effectively I have an unnamed variable that gets set by the return statement, yes?

That being the case, wouldn't it be in some ways clearer to have an explicit, named variable that gets set instead?

int DoSomething(string someStuff)
{
    //Do Stuff
    returnValue = 42;
}

where "returnValue" is a keyword that's used to return the value.

As it is I frequently end up with code that creates (or sets) a variable at various points through the code just so it can be returned at the end. Making this an explicit part of the language just makes sense to me.

I assume there are languages out there that do this.


VB lets you do both in that you can have:

Function DoSomething(someStuff as String) as Integer
  'Do Stuff
  Return 42
End Function

or:

Function DoSomething(someStuff as String) as Integer
  'Do Stuff
  DoSomething = 42
End Function

where the function name acts as a variable of the function's return type, and whatever it holds when the function ends is what gets returned.

Edited 2010-02-11 15:19 (UTC)

Aha! Well done to VB. I assume this includes VB.NET?

Yeah, VS2003 onwards.

I was going to mention VB. I think the choice of using the function name, while I can see why it makes sense, is unfortunately confusing during the body of the function.

It definitely clashes with C#, which allows you to pass methods around.

Yep, you can even do this:

Function DoStuff As Integer
   'do stuff
   DoStuff = 4
   DoStuff = DoStuff + 1
End Function

Msgbox CStr(DoStuff())   'Returns "5"

Beat me to it!

I can't decide whether that's silly or perlish.

Ah. Yes. It's silly because if you set $return halfway down your function, you can't tell at the end that you are returning something. Potentially hard to work out what is going on.

I'm not sure what you mean by an unnamed variable being set though... maybe that's a C# thing?

From a low-level point of view, you're putting a value _somewhere_ for the calling function to retrieve it from. In .NET that's putting it on the stack; I can't remember how C++ does it, presumably a register. In either case, it's effectively an unnamed variable location.

Why does that matter though?

It means that you effectively _have_ a variable, so why not make it explicit?

If you have code like this:
Messages DoStuffAndReturnMessages()
{
    Messages messages = new Messages();
    //Stuff
    messages.Add("Got here");
    //More Stuff
    messages.Add("And here");
    //Yet more stuff
    return messages;
}

then why make me create a "messages" variable? We know we're going to need one (to return), so why not have it exist automatically?
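For comparison, a rough C++ equivalent of the same pattern might look like this. It's a sketch, assuming a plain `std::vector<std::string>` stands in for the `Messages` class (the alias is hypothetical). The explicit local is exactly the variable being complained about; most C++ compilers will build it directly in the caller's storage (named return value optimisation), though the standard permits that rather than requiring it.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the C# Messages class.
using Messages = std::vector<std::string>;

Messages DoStuffAndReturnMessages()
{
    // The explicitly declared variable that exists only to be returned.
    Messages messages;
    // Stuff
    messages.push_back("Got here");
    // More Stuff
    messages.push_back("And here");
    // Yet more stuff
    return messages;  // usually constructed in place (NRVO), but not guaranteed
}
```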

You could be writing in a language which either doesn't need you to declare variables for pretty much anything, or can return more than one value.

e.g. a Fibonacci function in Perl.

{
    my @fib = (0, 1, 1);
    sub fib {
        my ($num) = @_;
        return if $num < 0 || $num != int($num);
        return $fib[$num] if defined $fib[$num];
        return $fib[$num] = fib($num - 2) + fib($num - 1);
    }
}

No need for a return variable there.
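The same memoisation idea in C++, as a sketch: a function-local static table plays the role of the closed-over `@fib`, and `std::optional` marks which entries are known. Testing "has a value" rather than truthiness matters, because a cached `fib(0)` of 0 would otherwise look missing. The 64-bit type overflows past fib(93), which this sketch doesn't check.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Memoised Fibonacci: the static table stands in for the Perl @fib array,
// seeded with the same 0, 1, 1.
std::optional<unsigned long long> fib(int num)
{
    static std::vector<std::optional<unsigned long long>> memo{0ULL, 1ULL, 1ULL};

    // Mirrors the Perl bare "return" for invalid input.
    if (num < 0) return std::nullopt;

    const auto n = static_cast<std::size_t>(num);
    // Check "has a value", not truthiness, so a cached 0 counts as known.
    if (n < memo.size() && memo[n].has_value()) return memo[n];

    if (n >= memo.size()) memo.resize(n + 1);  // new slots start empty
    memo[n] = *fib(num - 2) + *fib(num - 1);   // overflows past fib(93)
    return memo[n];
}
```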

True, but in an environment like my current one, with numerous people all changing the same codebase, static typing is very handy for making sure that someone hasn't broken something obscure.

Oh that's because you're in a silly language that makes you declare variables!

I mean, "Messages messages = new Messages();" -- what the fuck is that meant to say? Gibberish!

I never got on with programming back when I was trying to do VB macros. Then I found Perl....

Yep, because

#:: ::-| ::-| .-. :||-:: 0-| .-| ::||-| .:|-. :||
open(Q,$0);while(<Q>){if(/^#(.*)$/){for(split('-',$1)){$q=0;for(split){s/\|
/:.:/xg;s/:/../g;$Q=$_?length:$_;$q+=$q?$Q:$Q*20;}print chr($q);}}}print"\n";
#.: ::||-| .||-| :|||-| ::||-| ||-:: :|||-| .:|

Makes *SO* much more sense... ;)

(I have to use Perl at work, and just did a Java module on my Master's course. A pox on both their houses - give me a *proper* language any day...)

Firstly -- the lines starting with # are just comments, surely... oh hang on, you're opening your own script. That's just silly. Second, of course it's illegible -- you haven't spaced it out at all! Third, use the idiom. Don't nest an if directly in a loop; use next unless to skip over the things you don't care about first. Use m// in list context to capture into named variables rather than relying on $1 if you want to use it beyond the immediate syntactic vicinity of the m// (or the s///). You can write crap in any language ;)

That's not *my* code - that was the winner of the 2000 Obfuscated Perl Contest ;)

But, sadly, it looks like some of the scripts I have to debug at work...

That'll be why it's crazy then!!! ;)

I agree, some HORRIBLE things can be written in Perl. Probably more so than in other languages. I still like it though :)

What's so complicated about that line?

I can separate it out into:

MessageList myMessageList;

myMessageList = new MessageList();

Where the first line says that there is a variable called myMessageList of type MessageList, and the second line puts a new MessageList into it. But the combination is more compact.

As to not declaring variables - I'm not giving up the massive benefits of statically typed variables: IntelliSense, refactoring support, and the fact that compiling the code tells me whether I've misspelled anything or got my types confused.

Yes, but it's called the same as its type! Arg! And you are having to repeat the type name -- that's a total violation of DRY, surely.

For that matter, why do you need a type as specific as a message list? Why is that not just an array?

Aaaah, because in the case of MessageList (at work that is) MessageList.Warnings returns just those messages that have a MessageType of Warning, while MessageList.Errors returns just the errors, etc. Utility methods are very handy :->

In C# 3 I can actually say:
var myErrorMessages = new MessageList();
and it will use type inference to work out what type it is at compile time.
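A rough C++ analogue of that utility-method design, with `auto` playing the part of C#'s `var`. This is a sketch; `MessageType`, `Message`, `MessageList`, `Warnings()` and `Errors()` are hypothetical names modelled on the description above, not the actual work class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical types modelled on the MessageList described above.
enum class MessageType { Info, Warning, Error };

struct Message {
    MessageType type;
    std::string text;
};

class MessageList {
public:
    void Add(MessageType type, std::string text)
    {
        items_.push_back({type, std::move(text)});
    }

    // Utility filters: return only the messages of one type.
    std::vector<Message> Warnings() const { return OfType(MessageType::Warning); }
    std::vector<Message> Errors() const { return OfType(MessageType::Error); }

private:
    std::vector<Message> OfType(MessageType type) const
    {
        std::vector<Message> result;
        for (const auto& m : items_)
            if (m.type == type) result.push_back(m);
        return result;
    }

    std::vector<Message> items_;
};
```

With that in place, `auto myMessageList = MessageList();` deduces `MessageList` at compile time, much as `var` does in C# 3.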

Ah, so it's an object rather than just a data type? I can see that would be useful.

You don't have an unnamed variable. You have, in this case, a constant that is passed back to the caller, without any memory allocation.

Now, syntactic sugar where assigning to functionname sets the return value might be useful in some cases. But it looks and behaves like a normal variable while having special behaviour just because it shares the function's name, and that would make me uncomfortable: you have to remember to update that variable name if you change the function name, or cut and paste code from one function to another.

Also, if your function doesn't have a name (in the case of anonymous functions and/or closures), you can't do that. So you have to fall back to the far more obvious and readable return statement.

I think if you've got a number of different return values, that you'll decide according to various criteria, scattered through the code of the function, and you then finally return the value after having done all your processing, then it makes sense to declare a return variable before all of your ifs and switches etc. and then return it.

But I don't see why you'd want the overhead of doing that in all cases.

(I'm coming at this from a Perl, and occasionally Javascript, background, FWIW.)

In this case it's a constant - but it's being put _somewhere_ for the calling function to find.

Why do you care, as the called function? It ends up on the stack or heap, your job is done.

As the calling function, you'll assign the result to a variable of your choice (or not assign it at all, if you can do that sort of thing and you only care whether the function returned e.g. a true value, or you'll pass the value to a switch statement).

I don't care - except that it would be nice to have that location named - and I can't really see the overhead of doing so. I wouldn't name it after the function - I'd have a keyword, so anonymous functions aren't a problem.

So you basically want exactly the same as the return keyword, except you can call it at any time without actually returning. So what if you want to return part-way through your function? Do you now have to use two keywords?

It's nothing like the return keyword. It's simply a reference to a variable.

Let's have another example:
void MyMethod(string input, out string output)
{
output = input + " was processed";
}

That's perfectly valid C# syntax - and works fine. No need for a return - all I have to do is make sure that I set "output" sometime before the end of the method.

What I'd like is for _all_ output variables (including the implicit one) to work that way.
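C++ has no `out` keyword, but a non-const reference parameter gives much the same shape. A sketch follows. One difference worth noting: unlike C#, the C++ compiler won't check that `output` is assigned on every path before the function returns.

```cpp
#include <string>

// C++ analogue of the C# out parameter: the caller supplies the storage,
// and the function only has to assign to it before it finishes.
void MyMethod(const std::string& input, std::string& output)
{
    output = input + " was processed";
}
```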

In a decent optimising compiler, a constant value return isn't "put" anywhere except outside the function, where it is used as a precompiled/hardcoded value (which may in itself be used in a context allowing further precompilation reducing runtime overhead). In particular, it may never be put on the stack to be returned, although it may be used to set/initialize a value in the calling context.

As an extreme example, imagine that the function is called in the context

if (DoSomething(x)) {
... stuff
} else {
... other stuff
}

The compiler should never actually store "42": since 42 is non-zero, the test is always true, so it can instead elide the else branch.

Only if the compiler does interprocedural analysis, which not many do. Alternatively, it could be inlined, but most will only do so if it's in the same compilation unit, and you ramp up the optimiser setting sufficiently.

I was thinking of the "same compilation unit" case, in fact. I was also deliberately ignoring the fact that the obvious reason for returning a constant is the implementation of an interface (C++: of a virtual function) for a family of functions which may return different values, in which case obviously there will be no optimisation when the interface / base class is used.

One variant of this optimisation which is more common, though, is returning an object by value and assigning it to an object constructed at that point:

Foo x = somefunction();

For some compilers the object will be constructed only once, at the higher level, and there will be no temporary of that type in the data returned on the stack (only the data required to initialise the object is passed back -- the difference obviously being that the operations in the constructor run only once). This is usually known as return value optimisation, a form of copy elision.
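In C++ this has become more than an optimisation in one case: since C++17, initialising an object from a function that returns a prvalue (an unnamed temporary, as in `return Foo(42);`) is guaranteed to construct the object directly in the caller's storage. A sketch, assuming a C++17 compiler; `Foo` here is a hypothetical class that just counts copy and move constructions. Returning a *named* local (NRVO) is still only permitted, not required.

```cpp
// Hypothetical class that counts how often it is copied or moved.
struct Foo {
    static inline int copies = 0;  // C++17 inline static member
    int value;
    explicit Foo(int v) : value(v) {}
    Foo(const Foo& other) : value(other.value) { ++copies; }
    Foo(Foo&& other) noexcept : value(other.value) { ++copies; }
};

// Returns a prvalue: since C++17 the result object is the caller's variable,
// so neither constructor above runs.
Foo somefunction()
{
    return Foo(42);
}
```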

Lisp and Scheme do something orthogonal, but related; the value returned by a lambda expression is the value of the last expression evaluated in it.

For example (in Scheme), if we have a function:

(define fib
  (let ((ult 0)(pen 1))
    (lambda ()
      (let ((tmp pen))
        (set! pen (+ ult pen))
        (set! ult tmp)
       ult))))

Each time fib is called, the value returned is that of ult (set! returns an unspecified value).

If we wrote this in Common Lisp instead (where setq returns the value used to set the variable) we could write this instead:

(let ((ult 0) (pen 1))
  (defun fib ()
    (let ((tmp pen))
      (setq pen (+ ult pen))
      (setq ult tmp))))

Here, the value returned by fib is the value of the final setq, which is the value of tmp.
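For contrast, the same closure in C++, as a sketch assuming C++11 or later (`make_fib` is a hypothetical name). The captured variables persist between calls just as `ult` and `pen` do in the Scheme version, but a lambda body, unlike a Scheme `lambda`, yields nothing unless it says `return`.

```cpp
#include <functional>

// Builds a generator equivalent to the Scheme fib closure: ult and pen are
// captured by value and kept alive inside the lambda between calls.
std::function<int()> make_fib()
{
    int ult = 0;
    int pen = 1;
    return [ult, pen]() mutable {
        int tmp = pen;
        pen = ult + pen;
        ult = tmp;
        return ult;  // explicit here; Scheme returns the last expression implicitly
    };
}
```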

Edited 2010-02-11 16:05 (UTC)

Cool - nice example of a different way of doing it. Thanks!

Nobody's mentioned Pascal yet, so: assigning to the name of the function is the way to set the return value in Pascal.

I prefer having multiple return statements. Languages that have this sort of a variable also tend to not allow for early function termination, forcing you to wrap your code in layers of conditionals.

Languages with a return variable:

function monkeys() {
    if (foobar) {
        return = 42;
    } else {
        // ...
        if (dagnabit) {
            return = 13;
        } else {
            // ...
            if (consarnit()) {
                return = e^(-i*pi);
            } else {
                // ...
            }
        }
    }
    // fall out of function
}

Language with a return statement:

function monkeys() {
    if (foobar) return 42;
    // ...
    if (dagnabit) return 13;
    // ...
    if (consarnit()) return e^(-i*pi);
    // ...
}

I find the latter preferable.
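Both shapes in C++, as a sketch. The conditions become hypothetical `bool` parameters, `e^(-i*pi)` is written as its value, -1, and both versions return an assumed default of 0 when nothing matches (the pseudocode leaves that case open). The two functions agree for every combination of inputs; only the nesting differs.

```cpp
// Single-exit style: one result variable, nested conditionals, one return.
int monkeys_single_exit(bool foobar, bool dagnabit, bool consarnit)
{
    int result = 0;  // hypothetical default when no branch matches
    if (foobar) {
        result = 42;
    } else {
        if (dagnabit) {
            result = 13;
        } else {
            if (consarnit) {
                result = -1;  // e^(-i*pi) == -1
            }
        }
    }
    return result;
}

// Early-return style: each case leaves as soon as it is decided.
int monkeys_early_return(bool foobar, bool dagnabit, bool consarnit)
{
    if (foobar) return 42;
    if (dagnabit) return 13;
    if (consarnit) return -1;  // e^(-i*pi) == -1
    return 0;                  // same hypothetical default
}
```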

VB has the "Exit" statement for leaving loops early (Exit For, Exit Do), and Exit Function for leaving a function:

For nRow = 1 To nLastRow
    If ws.Cells(nRow, nColumn).Address = "$A$7" Then
        Debug.Print "Found cell. Exiting for loop."
        Exit For
    Else
        Debug.Print ws.Cells(nRow, nColumn).Address
    End If
Next

In some ways I like having the function name as the return variable, since the calling line will have something like

b = f(a) + 4;

and the code will have

int f(int a){
    f = 3*a;
}

so mentally f is a function that returns an int *and* is an int. Of course in some languages you'll then have problems if you ask for the address of f (is it a pointer to the nominal variable, or a function pointer?)

And it gets even more fun with recursive calls!

I quite often have code like (vastly simplified)

string center(string astring) {
    string mystring;
    mystring = "";
    if (astring.Length > 0) mystring = "<center>" + astring + "</center>";
    return mystring;
}

I'm sure I've used a language whereby, if you don't explicitly return something, it automatically returns whatever the value of the last line/expression is. I have no idea which one that is though!

Perl will do that.

In MATLAB you declare the name of your result variable in the function definition:

function outvar=func_name(in1, in2)
outvar = in1 + in2;

Multiple results are returned like this:
function [out1 out2]=func2(in1, in2)
out1 = in1+1;
out2 = in2+2;
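C++ has no direct equivalent of MATLAB's named output list, but since C++17 a small struct plus structured bindings gets close. A sketch; `Func2Result` is a hypothetical name, and it assumes out1 is derived from in1 and out2 from in2.

```cpp
// Hypothetical C++17 analogue of a MATLAB function with two named outputs:
// the fields of the returned struct play the role of out1 and out2.
struct Func2Result {
    double out1;
    double out2;
};

Func2Result func2(double in1, double in2)
{
    Func2Result r{};
    r.out1 = in1 + 1;
    r.out2 = in2 + 2;
    return r;
}
```

The caller can then unpack both results by name, e.g. `auto [a, b] = func2(1.0, 2.0);`.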

In Eiffel, you set the result of a function by assigning to a compiler-created variable called Result.

I read this thread because code [like Logarithms] is an abject mystery to me. I keep on believing that one day it'll magically make sense.

nope. not today. Brain is goo, again.

C style coding is one of those things that it took me years to actually get to. It always looked daunting from a distance, but once I actually sat down and played with it it made perfect sense quite quickly.
