Running a very long php script…

I think I mentioned before that I do PHP in my spare time, sometimes.
I used to work with it at my day job, but not anymore. A friend asked me to clean up a MySQL table. The table held email text imported from an inbox, to be shown on a forum site. The problem was: the emails had lots of footers with the Original Message, promotions etc., and he wanted to clean up the emails before showing them on the website. The PHP algorithm wasn’t a big deal: just detect the strings, remove the text from that position to the end of the email, and make sure the markup in the email (if it was in HTML format) wouldn’t break. We left this last task to Tidy.
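The truncation step can be sketched roughly as follows. This is a Python illustration rather than the original PHP, and the marker strings and the `strip_footer` name are made up for the example; the real script's marker list is not shown in this post. The HTML repair that Tidy handled is left out.

```python
# Hypothetical footer markers; the real list used by the script is an assumption.
FOOTER_MARKERS = [
    "-----Original Message-----",
    "Do you Yahoo!?",
]


def strip_footer(body, markers=FOOTER_MARKERS):
    """Cut the text at the earliest marker found; return it unchanged if none match."""
    positions = [body.find(marker) for marker in markers]
    positions = [pos for pos in positions if pos != -1]
    if not positions:
        return body
    # Keep everything before the first footer and drop trailing whitespace.
    return body[:min(positions)].rstrip()
```

Cutting at the earliest marker matters when an email carries more than one kind of footer; otherwise a promotion placed above the quoted message would survive.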
Anyways, I had to run the final clean up script on the whole table with about 30MB of information, so I knew the script would be running for quite some time. I had no shell access on the server so I had to run the script from my browser. I had set up some output on the script so it would show what record had been updated and what was the final cleaned up text.
After making a backup of the table, I pointed my Firefox 1.7 at the clean up script. Oh my, what a pain.
First I couldn’t make the script run for more than 60 seconds. I solved that with the following statement:
ini_set('max_execution_time', '0');
Then Firefox started to complain that the script was taking too long and that I should cancel it, because it was making FF run slowly and the computer might become unresponsive.
I commented out most of the echo statements and tried again. The script ran longer without the pop-up window, but again FF complained 🙁 and eventually the FF window would disappear from the task bar. It was a burden because I had no way of knowing where the script had stopped.
IE came to save my…situation 🙂 I must confess I had stopped using IE almost completely because I kept getting infected with spyware and it would download ActiveX controls behind my back from time to time…but now the relationship has improved considerably again :). IE never complained and ran the script a lot faster. The task was done!

Smells to refactoring between classes.

Primitive Obsession

Use small objects to represent data such as money (which combines quantity and currency) or a date range object
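As a minimal sketch of the money example (Python, with a hypothetical `Money` class invented here for illustration), the small object keeps quantity and currency together and can refuse operations that a bare number would silently allow:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def __add__(self, other):
        # Refuse to mix currencies instead of silently adding raw numbers.
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot add different currencies")
        return Money(self.amount + other.amount, self.currency)
```

With plain numbers, adding 5 dollars to 5 euros gives 10 of nothing in particular; here it raises an error at the point of the mistake.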

Data Class

Classes with fields and getters and setters and nothing else (aka, Data Transfer Objects – DTO)

Data Clumps

Clumps of data items that are always found together.

Refused Bequest

Subclasses don’t want or need everything they inherit.

The Liskov Substitution Principle (LSP) says that you should be able to treat any subclass of a class as an instance of that class.

Inappropriate Intimacy

Two classes are overly intertwined.

Lazy Class

Classes that aren’t doing enough should be refactored away.

Feature Envy

Often a method that seems more interested in a class other than the one it’s actually in. In general, try to put a method in the class that contains most of the data the method needs.
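A small sketch of the fix, in Python with hypothetical `Customer` and `Invoice` classes: the label-building logic reads only customer data, so it lives on `Customer`, and `Invoice` just asks for the result.

```python
class Customer:
    def __init__(self, name, street, city):
        self.name = name
        self.street = street
        self.city = city

    def mailing_label(self):
        # Lives here because every field it reads belongs to Customer.
        return f"{self.name}\n{self.street}\n{self.city}"


class Invoice:
    def __init__(self, customer, total):
        self.customer = customer
        self.total = total

    def header(self):
        # Before the move, this method built the label itself from
        # customer.name, customer.street and customer.city (Feature Envy).
        return f"{self.customer.mailing_label()}\nTotal: {self.total}"
```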

Message Chains

This is the case in which a client has to use one object to get another, and then use that one to get to another, etc. Any change to the intermediate relationships causes the client to have to change.
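One common remedy is to hide the chain behind a method on the first object. The sketch below is in Python, and the `Employee` and `Department` classes are invented for the example:

```python
class Department:
    def __init__(self, manager):
        self.manager = manager


class Employee:
    def __init__(self, name, department=None):
        self.name = name
        self.department = department

    def manager_name(self):
        # Hide the chain: callers no longer need to know that the
        # manager is reached through the department.
        return self.department.manager.name


boss = Employee("Dana")
sam = Employee("Sam", Department(boss))

chained = sam.department.manager.name  # client walks the chain itself
hidden = sam.manager_name()            # client asks one object
```

If the way a manager is found changes later, only `manager_name` has to change. Taken too far, though, this kind of delegation ends up as the Middle Man smell described next.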

Middle Man

When a class is delegating almost everything to another class, it may be time to refactor out the middle man.

Divergent Change

Occurs when one class is commonly changed in different ways for different reasons. Any change to handle a variation should change a single class.

Shotgun Surgery

The opposite of Divergent Change. A change results in the need to make a lot of little changes in several classes.

Parallel Inheritance Hierarchies

A special case of Shotgun Surgery. Every time you make a subclass of one class, you also have to make a subclass of another.

Smells to refactoring. Beautiful code continues…

I just found this good resource with a summary of techniques to identify and refactor smelly code.
It’s a good summary of what is explained, with examples and detailed comments, in Fowler’s book.

The original post is at the java.net wiki SmellsToRefactorings

Smells Within Classes


Comments

Should only be used to clarify “why” not “what”. Can quickly become verbose and reduce code clarity.

Long Method

The longer the method the harder it is to see what it’s doing.

Long Parameter List

Don’t pass in everything the method needs; pass in enough so that the method can get to everything it needs.

Duplicated Code

Large Class

A class that is trying to do too much can usually be identified by looking at how many instance variables it has. When a class has too many instance variables, duplicated code cannot be far behind.

Type Embedded in Name

Avoid redundancy in naming. Prefer schedule.add(course) to schedule.addCourse(course)

Uncommunicative Name

Choose names that communicate intent (pick the best name for the time, change it later if necessary).

Inconsistent Names

Use names consistently.

Dead Code

A variable, parameter, method, code fragment, class, etc is not used anywhere (perhaps other than in tests).

Speculative Generality

Don’t over-generalize your code in an attempt to predict future needs.
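To make the Long Parameter List advice from the list above concrete, here is a small Python sketch. The `Order` class and both function names are hypothetical; the point is only the difference in the call signature.

```python
from dataclasses import dataclass


@dataclass
class Order:
    quantity: int
    unit_price: float
    discount_rate: float


def total_long(quantity, unit_price, discount_rate):
    # Long parameter list: the caller has to unpack the order first.
    return quantity * unit_price * (1 - discount_rate)


def total(order):
    # Shorter: pass the object and let the function reach what it needs.
    return order.quantity * order.unit_price * (1 - order.discount_rate)
```

If the calculation later needs a tax rate, `total` keeps its signature, while every caller of `total_long` has to change.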

This post will be moved to TNW Wiki and Resources area… with the full list.

I had mentioned in another post that I found a refactoring tool for VB.NET 2003 at KnowDotNet. I just wanted to thank their developers for saving the VB.NET 2003 community from refactoring manually, with special thanks to Les Smith for correcting some problems and giving fast support.
For those curious about the tool, the url is here

Beautiful, beautiful, beautiful, beautiful code…

A programmer should take great pride in what she does. Coding in a hurry and on a tight deadline often produces unreadable code; it might work, but yikes! who’s gonna be able to maintain it afterwards… Only when you are a zen master at identifying code smells can you create beautiful code from the beginning and in a hurry. It takes practice, it’s an art and you don’t become a sensei in a blink.
I think I mentioned before I was embedding myself into the pages of this book:

My programmer’s life has been more about fixing someone else’s code than creating code from scratch. No programmer likes to do maintenance, but hey! welcome to the real world, code has to be maintained.

I just found this great article on what can be defined as beautiful code. It doesn’t have Fowler’s technical explanations but coming from a mathematician (Matthew Heusser) it’s certainly concise: (taken from the original article at DDJ)

  1. Beautiful Code is readable. Perhaps nothing is worse than trying to maintain software that does everything yet no one understands. This means that our functions shouldn’t be too long and should accept a reasonable number of variables. What’s “reasonable”? Military science (and chess, for that matter) teaches us that most people can only keep 3-7 variables in our head at the same time. When we get beyond that, we lose track of things, and that means we have forgotten things, which means defects. At that point, it’s time to consider breaking our software into smaller chunks.
  2. Beautiful Code is Focused. Code should do one thing, and do it well. All the object oriented theory about model-view-controller is essentially an attempt to separate the user interface from the business logic from the back-end system. Mixing these up limits reuse and makes test automation look more and more like fantasyland. I will just say that we should strive for simplicity and generality–to do one thing and do it well. I would submit, however, that this subject is worth researching.
  3. Beautiful Code is Testable. A well-defined function, given specific input, should have a clear expected output. This makes sense, yet many functions are written with “out of bounds” behaviors that are undefined. For example, consider a function designed to take in a coupon_id and determine if that coupon is valid. You can write the function signature in pseudocode in two different ways:

    • Option 1:
      sub get_eligibility_of_coupon(coupon_id)
      returns boolean;
    • Option 2:
      struct eligibility_type(ok boolean, msg string, good_coupon boolean);
      sub get_eligibility_of_coupon(coupon_id, optional date assume sysdate)
      returns eligibility_type;

      Assume a coupon_id is an integer. What happens in option 1 when you pass in -1? Or, for that matter, any value that isn’t a valid coupon? Or Null? We don’t know. To test option 1, we need to know some coupons that are valid or not valid as of right now. We probably have to write a query.

      For option 2, we can create a list of coupons that are valid as of a certain date, then pass in that date. We can test all kinds of interesting scenarios like “what happens on leap years?” We can provide a different response if the coupon_id was invalid than if the coupon_id was just expired. In short, option 2 is testable. Murphy’s Law applied means that testable code is better code.

  4. Beautiful Code is Elegant. I’ve heard it said that the charlatan makes the simple seem hard to understand, and the genius makes the complex easy to understand. Elegant Code (like recursion) shows up when the complex program has a simple solution. When the specification is three pages and the code implementation is one. It’s hard to grasp “elegance”, but we’ve all seen bloated, bug-ridden code that was not elegant. If you’re like me and can’t clearly define elegant, then my suggestion is simple: When coding, remember buggy, junky code, and create software that is the opposite. Elegant code is powerful code… As Antoine de Saint-Exupéry said so eloquently: “Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
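Option 2 from point 3 above can be sketched in Python to show why it is easier to test. Everything here beyond the article's pseudocode is an assumption: the in-memory `COUPONS` table, the exact messages, and the reading of `ok` as "the lookup succeeded" versus `good_coupon` as "usable on that date".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Eligibility:
    ok: bool           # assumed meaning: the coupon_id was recognised
    msg: str
    good_coupon: bool  # assumed meaning: the coupon is usable on the date


# Hypothetical in-memory coupon table: id -> (valid_from, valid_until)
COUPONS = {
    1: (date(2024, 1, 1), date(2024, 12, 31)),
}


def get_eligibility_of_coupon(coupon_id, as_of: Optional[date] = None,
                              coupons=COUPONS):
    # Default to today, like "assume sysdate" in the pseudocode.
    as_of = as_of or date.today()
    if not isinstance(coupon_id, int) or coupon_id not in coupons:
        return Eligibility(False, "unknown coupon", False)
    start, end = coupons[coupon_id]
    if as_of < start or as_of > end:
        return Eligibility(True, "coupon not valid on this date", False)
    return Eligibility(True, "coupon is valid", True)
```

Because the date is a parameter, a test can ask about a leap day or an expiry boundary without waiting for the calendar, and -1 or a null id gets a defined answer instead of undefined behavior.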

Continues on next post…

a week without bloggin’, tsk tsk

It’s been a busy week indeed. Not busier than before though. I managed to create an installer for the windows service, fine-tuned the algorithm, refactored the code with this new cool tool I found, netRefactor from KnowDotNet, and have kept myself busy in the evenings doing some code for fun…
Kenny and I are trying to start studying for the MCSD cert online, already set up our YMs for that and I hope we don’t procrastinate longer than this weekend, heh.
On the code for fun side :). It’s been a fun project, kind of tricky: reading emails from an IMAP inbox and importing them into a database, handling attachments, the always messy MS Outlook stationery, et cetera. I’ll talk about it as soon as I get home…yup, a gal has to work 😛

PS. And we got a comment at TNW’s Resources about an Ajax debugger. I can’t wait to try it and finally set up TNW’s home page. Google won’t put us on the first page yet, so we need to improve that “under const” space, hopefully with some Ajax stuff 😀

The windows service works, let’s go party!

TGIF!
I’ve spent two weeks in a row working on this part of the project. We have an old database that is internal to our intranet and want to publish part of its data online; the old database is still in use. To publish the data we have to feed an online MSSQL DB. Because the old database is offline and on an intranet, the sync process has to run on a machine inside the intranet. There were three possible paths for this task:

  1. Use a DTS package running on the MSSQL server at a set interval, probably midnight, with SQLAgent handling the scheduling and running of the job. The main drawbacks of this option were that the old database is offline, that we would still have to convert/translate some datatypes between databases, and that the internal staff had no access to run the DTS package.
  2. Use a DTS package and run it from a client machine with the dtsrun utility (worst case scenario, SQL-DMO). The DTS package would still have to translate datatypes, the server would have to have access to the old DB datasource, and the DTS package would still have to be invoked from/wrapped in an application of some sort (windows app/service/console app)…
  3. Use some client application (probably a windows service) that would run the sync process at midnight and also on demand, using ADO.NET.

We chose the third option. Using ADO.NET over DTS is probably not the most efficient way, but the internal staff needed to run the sync process on demand and receive notifications of success or failure, and that, plus the location of the old database, made us think this was the way to go.
There was actually a huge thread on the DOTNET-WINFORMS discussion list at Develop.com regarding architectural issues. I was almost convinced to give up on the windows service implementation and make a DTS package, and I might give it a try later with the help of the sysadmin. Creating the DTS package is not the issue here; giving a client machine/user access to run it on demand is. You can see the thread Changing the DataRow.RowState Property

Anyway, I found a good article on how to create and debug a windows service at OnDotNet.com
Developing Windows Services
and got the chance to code a windows form wrapper for the service. The service is showing a notification Icon on the systray, trying to emulate the MSSQL Server Service Manager’s behavior.

Right now I’m having issues accessing the configuration file from the dll that has the service. The rest of the code works :). I just googled and found a discussion thread about this
Config Files for DLLs
and I’ll give it a try on Monday.
TGIF, let’s go party!

The Ideal Programmer

I forgot the exact source of this paper. I believe it’s used at UCLA, to impress junior students.

What an ideal programmer should be/do/have done:
(1) Read a manual
(2) Written a program from scratch on your own
(3) Written an application used by more than 10 people
(4) Written application code that was used (is being used) for 3 consecutive years
(5) Documented your code
(6) Published your documentation (for posterity)
(7) Written code and shared it with another organization
(8) Use consistent naming conventions within your code
(9) Use a database
(10) Use source control (any form of history of previous versions of code)
(11) Have separate development and production environments
(12) Have separate development, test, and production environments
(13) Have done a security analysis of your application and environment
(14) Designed and tested a disaster recovery plan for your application

The bitter-sweet taste of open source

From an essay by Paul Graham: (Original essay here)

That’s why the business world was so surprised by one lesson from open source: that people working for love often surpass those working for money. Users don’t switch from Explorer to Firefox because they want to hack the source. They switch because it’s a better browser.

It’s not that Microsoft isn’t trying. They know controlling the browser is one of the keys to retaining their monopoly. The problem is the same they face in operating systems: they can’t pay people enough to build something better than a group of inspired hackers will build for free.

I’m a professional programmer. Even though I use open source for occasional jobs and I certainly enjoy working in the open source community, I do tend to agree with Joel Spolsky in a radio interview he gave about the different types of software licenses and the software economy in general. Open Source takes us, professional programmers, to the edge. Here’s the original interview in mp3

Here’s a good thread on Open Source vs Commercial

The point is, Open Source’s value comes from Darwinian selection and thousands of volunteers all over the world who are willing to do things for free, some of them amateurs who want to see how far they can go and who offer their code for free. The only thing Commercial software might have in its defense is Support, and sometimes you get good quality support on open source forums too.

How can software companies survive open source? I frankly don’t know.

Reg exps, they’ve always been my favs

Today I had a busy/kind of chaotic day at work, not much coding, just preparing Excel files out of the SQL Server tables to hand to a translation team. It was so time consuming and all I wanted was to get back to programming.
Anyways, feeling a little bit frustrated today, I arrived home and remembered I had this job a friend wanted me to do to filter some footers from email messages, I had to remove the original messages from the text.

My target phrases would be like the following ones:

On Tue, 2 Aug 2005 16:14:10 -0500 "Eric Marr"
<northland@charterinteret.cam> writes:

--- <fe.sola@infom.cu> wrote:

On 8/1/05, Belayer via sql-l
<sql-l@groups.ittoolbo.cem> wrote:

So it definitely cried out for regular expressions.
I downloaded the best tool for testing reg exps.
The freeware Regex Coach
My thumbs up for this tool,
I'll donate some money to it as soon as I clean up my CC, heh.

I came up with some reg exps that need some extra tuning,
they are here on this text file
(Blogger complains of script injection, hehe)
Text file with RegExps
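For illustration only, here is one way the three target phrases above could be matched. This is a Python sketch, not the expressions from the text file, and the `QUOTE_HEADER` pattern and `strip_quoted_reply` name are assumptions made for the example. The idea: a line starting with "On " or "---", followed (possibly on the next line) by an address in angle brackets and then "wrote:" or "writes:".

```python
import re

# Assumed pattern: attribution line(s) ending in "<address> wrote:" / "writes:".
# DOTALL lets the "On ..." part run over a line break; the 200-character cap
# keeps an ordinary sentence starting with "On" from matching far-away text.
QUOTE_HEADER = re.compile(
    r"^(?:On\s.{1,200}?|---\s*)<[^<>\s]+@[^<>\s]+>\s+(?:wrote|writes):[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def strip_quoted_reply(body):
    """Drop everything from the first attribution line to the end of the text."""
    match = QUOTE_HEADER.search(body)
    if match is None:
        return body
    return body[:match.start()].rstrip()
```

Like the real expressions, this would need extra tuning: attribution lines without an address in angle brackets, or in other languages, would slip through.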

The php function for the final clean up
might come in a following post.
Cheers,
PG