The "Last Line Effect" in Programming
My name is Andrey Karpov, and I have studied hundreds of errors caused by "copy-paste." I am convinced that programmers make mistakes most often in the last fragment of a block of similar code. As far as I know, no programming book discusses this phenomenon, so I decided to write about it myself. I call it the "Last Line Effect."
The Last Line Effect
When programming, programmers often need to write a series of similar structures. Typing each line manually is tedious and inefficient. This is why they use the "copy-paste" method: a piece of code is copied and pasted several times, then modified. Everyone knows the downside of this: it's easy to forget to modify something after pasting, leading to issues. Unfortunately, there often isn't a better method.
So, what pattern did I discover? I found that errors often occur in the last pasted block of code.
Here is a brief example:
inline Vector3int32& operator+=(const Vector3int32& other) {
x += other.x;
y += other.y;
z += other.y;
return *this;
}
Notice this line: "z += other.y;". The programmer forgot to replace 'y' with 'z'.
You might think this is a hypothetical example, but it actually comes from a real application. Next, I will show you that this is a common error. Programmers frequently make this mistake at the end of a series of similar operations.
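For reference, the intended form of the operator shown above can be sketched as follows. The surrounding struct is a minimal stand-in written for this illustration; it is an assumption, not the original class from the application.

```cpp
#include <cstdint>

// Minimal stand-in for the vector type; only the members needed here.
struct Vector3int32 {
    std::int32_t x;
    std::int32_t y;
    std::int32_t z;

    Vector3int32& operator+=(const Vector3int32& other) {
        x += other.x;
        y += other.y;
        z += other.z;  // the buggy version added other.y here
        return *this;
    }
};
```

With this version, adding {10, 20, 30} to {1, 2, 3} gives {11, 22, 33}; the buggy version would give 23 for z.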
I've heard that climbers often fall in the last few meters. It's not because they are tired, but because they are too excited about reaching the end: they become careless and slip. I suspect programmers are similar.
Now, let's look at some data.
After studying our bug database, I identified 84 code fragments with errors caused by "copy-paste." In 41 of these fragments, the error occurred in one of the earlier pasted blocks, not the last one. For example:
strncmp(argv[argidx], "CAT=", 4) &&
strncmp(argv[argidx], "DECOY=", 6) &&
strncmp(argv[argidx], "THREADS=", 6) &&
strncmp(argv[argidx], "MINPROB=", 8)) {
The "THREADS=" string is 8 characters long, not 6.
In the other 43 segments, the error occurred in the last pasted block.
Of course, 43 is not much more than 41. But note that a fragment may contain many similar blocks, so the 41 errors may sit in the first, second, fifth, or even tenth block. Across those earlier blocks the distribution is relatively even, while the last block shows a peak.
On average, a fragment contains 5 similar code blocks.
So the 41 errors are spread over the first 4 blocks, which averages about 10 errors per block.
However, the last block alone has 43 errors!
The following distribution chart highlights this phenomenon:
[Figure: Distribution Chart of Errors in Five Similar Code Blocks]
Thus, we can conclude a rule:
The probability of an error in the last pasted block is about 4 times higher than in any other block.
This rule may not be universally applicable. It's an interesting finding, and its practical value is to remind you to stay vigilant when writing the last block.
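The arithmetic behind the "4 times" figure can be written out as a small sketch. The constants come from the text above; the function names are made up for this illustration.

```cpp
// Figures taken from the text above.
constexpr int kErrorsInEarlierBlocks = 41;  // spread over all blocks except the last
constexpr int kErrorsInLastBlock = 43;
constexpr int kAverageBlocksPerFragment = 5;

// Errors per block position for the blocks before the last one: 41 / 4 = 10.25.
constexpr double errorsPerEarlierBlock() {
    return static_cast<double>(kErrorsInEarlierBlocks) /
           (kAverageBlocksPerFragment - 1);
}

// Ratio of last-block errors to that per-block average: 43 / 10.25, roughly 4.2.
constexpr double lastBlockRatio() {
    return kErrorsInLastBlock / errorsPerEarlierBlock();
}
```

The result depends heavily on the assumed average of 5 blocks per fragment, so the ratio is best read as a rough estimate.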
Examples:
Next, I will demonstrate that this is not just my imagination but a real trend. Here are some examples.
Of course, I won't list all examples, only simple and representative ones.
Source Engine SDK
inline void Init( float ix=0, float iy=0,
float iz=0, float iw = 0 )
{
SetX( ix );
SetY( iy );
SetZ( iz );
SetZ( iw );
}
The last line should be SetW().
Chromium
if (access & FILE_WRITE_ATTRIBUTES)
output.append(ASCIIToUTF16("\tFILE_WRITE_ATTRIBUTES\n"));
if (access & FILE_WRITE_DATA)
output.append(ASCIIToUTF16("\tFILE_WRITE_DATA\n"));
if (access & FILE_WRITE_EA)
output.append(ASCIIToUTF16("\tFILE_WRITE_EA\n"));
if (access & FILE_WRITE_EA)
output.append(ASCIIToUTF16("\tFILE_WRITE_EA\n"));
break;
The last two checks are identical: FILE_WRITE_EA is tested and printed twice.
ReactOS
if (*ScanString == L'\"' ||
*ScanString == L'^' ||
*ScanString == L'\"')
The last comparison repeats the first one: the quote character is checked twice.
Multi Theft Auto
class CWaterPolySAInterface
{
public:
WORD m_wVertexIDs[3];
};
CWaterPoly* CWaterManagerSA::CreateQuad (....)
{
....
pInterface->m_wVertexIDs [ 0 ] = pV1->GetID ();
pInterface->m_wVertexIDs [ 1 ] = pV2->GetID ();
pInterface->m_wVertexIDs [ 2 ] = pV3->GetID ();
pInterface->m_wVertexIDs [ 3 ] = pV4->GetID ();
....
}
The last line was pasted out of habit and writes past the end of the array: m_wVertexIDs has only 3 elements, so index 3 is out of bounds.
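One way to make this kind of out-of-range write visible at run time is bounds-checked access. The sketch below is an illustration only: the struct and function names are hypothetical, WORD is modelled as std::uint16_t, and the source does not say whether the real fix was a larger array or dropping the fourth write.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the interface in the example above.
struct WaterPolyInterface {
    std::array<std::uint16_t, 3> vertexIds{};
};

// Returns true if the write succeeded, false if the index was out of range.
bool setVertexId(WaterPolyInterface& poly, std::size_t index, std::uint16_t id) {
    try {
        poly.vertexIds.at(index) = id;  // at() throws std::out_of_range for index >= 3
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Here a write to index 2 succeeds, while a write to index 3 is rejected instead of silently corrupting adjacent memory.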
Source Engine SDK
intens.x=OrSIMD(AndSIMD(BackgroundColor.x,no_hit_mask),
AndNotSIMD(no_hit_mask,intens.x));
intens.y=OrSIMD(AndSIMD(BackgroundColor.y,no_hit_mask),
AndNotSIMD(no_hit_mask,intens.y));
intens.z=OrSIMD(AndSIMD(BackgroundColor.y,no_hit_mask),
AndNotSIMD(no_hit_mask,intens.z));
The programmer forgot to change "BackgroundColor.y" to "BackgroundColor.z" in the last line.
Trans-Proteomic Pipeline
void setPepMaxProb(....)
{
....
double max4 = 0.0;
double max5 = 0.0;
double max6 = 0.0;
double max7 = 0.0;
....
if ( pep3 ) { ... if ( use_joint_probs && prob > max3 ) ... }
....
if ( pep4 ) { ... if ( use_joint_probs && prob > max4 ) ... }
....
if ( pep5 ) { ... if ( use_joint_probs && prob > max5 ) ... }
....
if ( pep6 ) { ... if ( use_joint_probs && prob > max6 ) ... }
....
if ( pep7 ) { ... if ( use_joint_probs && prob > max6 ) ... }
....
}
The programmer forgot to change "prob > max6" to "prob > max7" in the last condition.
SeqAn
inline typename Value<Pipe>::Type const & operator*() {
tmp.i1 = *in.in1;
tmp.i2 = *in.in2;
tmp.i3 = *in.in2;
return tmp;
}
The last assignment reads in.in2 a second time; presumably in.in3 was intended.
SlimDX
for( int i = 0; i < 2; i++ )
{
sliders[i] = joystate.rglSlider[i];
asliders[i] = joystate.rglASlider[i];
vsliders[i] = joystate.rglVSlider[i];
fsliders[i] = joystate.rglVSlider[i];
}
The last line should use rglFSlider.
Qt
if (repetition == QStringLiteral("repeat") ||
repetition.isEmpty()) {
pattern->patternRepeatX = true;
pattern->patternRepeatY = true;
} else if (repetition == QStringLiteral("repeat-x")) {
pattern->patternRepeatX = true;
} else if (repetition == QStringLiteral("repeat-y")) {
pattern->patternRepeatY = true;
} else if (repetition == QStringLiteral("no-repeat")) {
pattern->patternRepeatY = false;
pattern->patternRepeatY = false;
} else {
//TODO: exception: SYNTAX_ERR
}
The last block is missing 'patternRepeatX'. The correct code should be:
pattern->patternRepeatX = false;
pattern->patternRepeatY = false;
ReactOS
const int istride = sizeof(tmp[0]) / sizeof(tmp[0][0][0]);
const int jstride = sizeof(tmp[0][0]) / sizeof(tmp[0][0][0]);
const int mistride = sizeof(mag[0]) / sizeof(mag[0][0]);
const int mjstride = sizeof(mag[0][0]) / sizeof(mag[0][0]);
'mjstride' is always equal to 1. The last line should be:
const int mjstride = sizeof(mag[0][0]) / sizeof(mag[0][0][0]);
Mozilla Firefox
if (protocol.EqualsIgnoreCase("http") ||
protocol.EqualsIgnoreCase("https") ||
protocol.EqualsIgnoreCase("news") ||
protocol.EqualsIgnoreCase("ftp") || <<<---
protocol.EqualsIgnoreCase("file") ||
protocol.EqualsIgnoreCase("javascript") ||
protocol.EqualsIgnoreCase("ftp")) { <<<---
The final "ftp" is suspicious; it has already been compared earlier.
Quake-III-Arena
if (fabs(dir[0]) > test->radius ||
fabs(dir[1]) > test->radius ||
fabs(dir[1]) > test->radius)
The value of dir[2] is never checked; dir[1] is tested twice instead.
Clang
return (ContainerBegLine <= ContaineeBegLine &&
ContainerEndLine <= ContaineeEndLine &&
(ContainerBegLine != ContaineeBegLine ||
SM.getExpansionColumnNumber(ContainerRBeg) <=
SM.getExpansionColumnNumber(ContaineeRBeg)) &&
(ContainerEndLine != ContaineeEndLine ||
SM.getExpansionColumnNumber(ContainerREnd) >=
SM.getExpansionColumnNumber(ContainerREnd)));
In the last block, the expression "SM.getExpansionColumnNumber(ContainerREnd)" is being compared to itself.
MongoDB
bool operator==(const MemberCfg& r) const {
....
return _id==r._id && votes == r.votes &&
h == r.h && priority == r.priority &&
arbiterOnly == r.arbiterOnly &&
slaveDelay == r.slaveDelay &&
hidden == r.hidden &&
buildIndexes == buildIndexes;
}
The programmer forgot the "r." in the last line, so buildIndexes is compared with itself.
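A common way to reduce the exposure to this kind of slip is to list the compared fields once and compare tuples of references. The sketch below assumes C++14 or later; the field types and the key() helper are guesses made for this illustration, since the excerpt does not show them, and this is not MongoDB's actual fix.

```cpp
#include <tuple>

// Hypothetical stand-in for the configuration struct; field types are assumed.
struct MemberCfg {
    int _id;
    int votes;
    int h;  // the real type is not shown in the excerpt
    double priority;
    bool arbiterOnly;
    int slaveDelay;
    bool hidden;
    bool buildIndexes;

    // The compared fields are listed in one place, so both sides of the
    // comparison always use the same set.
    auto key() const {
        return std::tie(_id, votes, h, priority, arbiterOnly,
                        slaveDelay, hidden, buildIndexes);
    }

    bool operator==(const MemberCfg& r) const { return key() == r.key(); }
};
```

With this shape there is no "r." prefix to forget on an individual field, although a field can still be left out of key() altogether.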
Unreal Engine 4
static bool PositionIsInside(....)
{
return
Position.X >= Control.Center.X - BoxSize.X * 0.5f &&
Position.X <= Control.Center.X + BoxSize.X * 0.5f &&
Position.Y >= Control.Center.Y - BoxSize.Y * 0.5f &&
Position.Y >= Control.Center.Y - BoxSize.Y * 0.5f;
}
In the last line, the programmer made two mistakes. First, ">=" should be changed to "<=", and second, the minus sign should be changed to a plus sign.
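The four hand-edited comparisons can also be folded into one helper that is called once per axis. This is a sketch under assumptions: Vec2 and IsWithin are hypothetical names, the engine's own types are not reproduced, and the absolute-value form matches the two-sided range check only up to floating-point rounding at the boundary.

```cpp
#include <cmath>

// Hypothetical minimal 2D vector used only for this illustration.
struct Vec2 {
    float X;
    float Y;
};

// True when value lies within halfExtent of center (inclusive on both sides).
inline bool IsWithin(float value, float center, float halfExtent) {
    return std::fabs(value - center) <= halfExtent;
}

inline bool PositionIsInside(const Vec2& position, const Vec2& center,
                             const Vec2& boxSize) {
    return IsWithin(position.X, center.X, boxSize.X * 0.5f) &&
           IsWithin(position.Y, center.Y, boxSize.Y * 0.5f);
}
```

For a 2 x 2 box centered at the origin, the point (0, 1.5) is rejected here, whereas the buggy version above would accept it because the upper Y bound is never tested.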
Qt
qreal x = ctx->callData->args[0].toNumber();
qreal y = ctx->callData->args[1].toNumber();
qreal w = ctx->callData->args[2].toNumber();
qreal h = ctx->callData->args[3].toNumber();
if (!qIsFinite(x) || !qIsFinite(y) ||
!qIsFinite(w) || !qIsFinite(w))
In the last qIsFinite, the parameter passed should be 'h'.
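A chain of near-identical calls like this can be shortened to a single list of values, which makes a repeated variable easier to spot. The sketch below is not Qt code: AllFinite is a hypothetical helper, and std::isfinite stands in for qIsFinite.

```cpp
#include <algorithm>
#include <cmath>
#include <initializer_list>

// True only if every value in the list is finite.
inline bool AllFinite(std::initializer_list<double> values) {
    return std::all_of(values.begin(), values.end(),
                       [](double v) { return std::isfinite(v); });
}
```

The check in the example would then read `if (!AllFinite({x, y, w, h}))`. A duplicated name is still possible, but it sits in one short list rather than at the end of a long condition.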
OpenSSL
if (!strncmp(vstart, "ASCII", 5))
arg->format = ASN1_GEN_FORMAT_ASCII;
else if (!strncmp(vstart, "UTF8", 4))
arg->format = ASN1_GEN_FORMAT_UTF8;
else if (!strncmp(vstart, "HEX", 3))
arg->format = ASN1_GEN_FORMAT_HEX;
else if (!strncmp(vstart, "BITLIST", 3))
arg->format = ASN1_GEN_FORMAT_BITLIST;
The string "BITLIST" has a length of 7, not 3.
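Both this example and the earlier "THREADS=" one come from a hand-counted length that was not updated after pasting. One possible safeguard is to let the compiler derive the length from the string literal. StartsWith below is a hypothetical helper written for this illustration, not an OpenSSL function.

```cpp
#include <cstddef>
#include <cstring>

// Compares the start of text against a string literal. N is deduced from the
// literal and includes the terminating '\0', so the compared length is N - 1.
template <std::size_t N>
bool StartsWith(const char* text, const char (&prefix)[N]) {
    return std::strncmp(text, prefix, N - 1) == 0;
}
```

With this helper, `StartsWith(vstart, "BITLIST")` compares all 7 characters, so an input such as "BITMAP" no longer matches the way it would with a length of 3.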
That's enough. The examples I've given should be sufficient to illustrate the issue, right?
Conclusion
This article tells you that the "copy-paste" method is four times more likely to fail in the last pasted code block than in other blocks.
This is related to human psychology, not technical proficiency. The article demonstrates that even top programmers in projects like Clang or Qt can make such mistakes.
I hope this discovery is helpful to programmers and might encourage them to study our bug database. I believe this could help in finding new patterns in these errors and formulating new programming advice.
This article is reprinted from: http://www.vaikan.com/the-last-line-effect/