Monday, June 27, 2005

YAPC 2005 Day One

This week I'm attending the North American YAPC (Yet Another Perl Conference) in Toronto. This is my first YAPC, since I didn't make it to Buffalo last year.

It's an interesting display of organizational prowess, so far. Last night the schedule said we could register between 22:00 and 23:00, but when we arrived there were no registration personnel to be found. The conference organizers present had no keys to the room with the nametags and t-shirts. The next morning, when we returned to register, one of the staffers put out a call for a video camera to record the proceedings (or at least the keynote).

We're now into the "opening ceremonies", scheduled for 9:00 but beginning at 9:30. Glad it's not me trying to organize this thing--I'd be frazzled silly.

Keynote: Larry Wall

Larry opened his remarks with a picture of the Golden Gate Bridge, and described his talk as "bridges and other things". His focus was on the idea of building communities, particularly of course the Perl community. He suggested that the right questions for OSS authors to ask are:

        * Who will naturally be interested in my project?
        * Who are we accidentally/purposely excluding from our project?
        * Who should lead/follow/contribute?

        * What is the goal of our community?
        * What can people contribute?
        * What are the community's rules and structure?
        * What's in it for the volunteers?

        * Where will the community meet in cyber/physical space?
        * Where can sub-communities form, either by design or spontaneously?

        * When is it too soon to form a community?
        * When does the community reach a "tipping point"?
        * When is it time to form sub-communities?
        * When is the right time to fork?
        * When are we done?

        * Why do we really want a community?
        * Why do people join and leave the community?
        * Why do people fight and stop fighting?

        * How do we do it?

On the question of when to form a community, Larry remarked that in the first days of Perl, users proposed that Larry start a Perl newsgroup. He resisted this request for about a year, because he wanted Perl users to infest the shell-users' newsgroups. As a result, Perl users promoted Perl by injecting Perl-ish solutions in addition to shell-ish ones.

Concerning the question of our real motivations for starting a community, Larry remarked that the desire for a large pool of free labor isn't a very good motivation for community building.

Waxing philosophical, Larry invoked the idea of "tensegrity", or "tensional integrity". The idea is that a stable structure results from balancing the forces that "pull" and the forces that "push". He accompanied this with several pretty pictures involving rods and rubber bands. By contrast, he suggested that the "geek" community resembles a big pile of rocks (geeks)--disorganized, formless, and without opposing forces (each member autistic to one degree or other). The Linux community might be represented as a few stone pillars (distributions) around a beach, with Linus at the top, and where of course the users are dirt. A militaristic model would be a single tall tower--stratified, in which pecking order is the foremost consideration.

A more dynamic community exhibiting "tensegrity" involves both pushers and pullers, and requires us to grapple with seeming contradictions. The result of such a dynamic is "larger structures that don't fall down". Among the "contradictions" that must be reconciled to build a community are:

        1. People are naturally good / bad.
        2. People love / hate new ideas.
        3. People love / hate outsiders.
        4. People should all be alike / different.
        5. People should / shouldn't be in charge of others.
        6. It should be easy / hard to break into the community.
        7. We should / shouldn't try hard to keep people in.
        8. Specialization is good / bad.
        9. People volunteer for altruistic / selfish reasons.
        10. We do / don't need a benevolent dictator.
        11. Larry Wall is important / unimportant. Or more to the point, he's a genius / idiot.
        12. Modernism is good / bad.

From there he rambled off into a discussion of "natural communities" from a Darwinian perspective, invoking notions like a "large gene pool", "speciation", "range of variation", etc. He suggested that part of what's holding up Perl 6 is the effort to ensure that it's modeled on natural communities that thrive, rather than those that go extinct. For example, Larry sees the Perl 5 community as neither sufficiently unified nor sufficiently diverse.

        A community needs to share a set of core values, but also to allow honest differences on the periphery.

A technological solution to this set of problems both is and isn't possible. On the positive side, Perl 6 will have a finer-grained extension mechanism. Among other things, scoping will be clarified and cleaned up. A CPAN-like repository can provide a gene pool. We can separate combatants, if we can convince them to join separate mailing lists--technologically, we can at least provide enough lists for hostile tribes to coexist peaceably.

On the negative side, people are still basically irrational. One way to mitigate this is to look for cheerleading opportunities. We can try to tolerate differences within the community, but it ain't easy. We want to encourage and discourage cultism. We want to "have fun", but we can't always. Sometimes building a community involves submitting to crucifixion.

Allison Randal: The State of the Carrot

A "carrot" is what you get when you cross a camel with a parrot. Allison read a parody based on "The Hunting of the Snark".

Over the past year Perl 5 has seen a bunch of fixes and optimizations. More interestingly, reverse sort no longer uses an intermediate list, which improves performance. Some setuidperl exploits have been fixed. PLEASE stop using setuidperl. A new -Dusesitecustomize build option permits customization of @INC using a site customization script.

On the Perl 6 front, there are many pieces.

At the bottom is Parrot, the VM for Perl. Parrot 0.1.1 was released last October, including incremental garbage collection and a "make install" target. Parrot moved to Subversion as of version 0.2.0. The current version is 0.2.1.

Next is Ponie, the Perl 5 compatibility layer. Snapshot 4 was released today. Ponie work has benefited Perl 5 as well, because improvements made in Ponie are being back-ported to Perl 5.

Pugs is a Perl 6 prototype. Some 80-90% of the Perl 6 semantics have been implemented already. It's currently written in Haskell, but ultimately it should be written in Perl 6.

Allison also reported some things about the funding of the Perl Foundation, Perl Mongers and Perl Monks. There's a new Perl logo (a pearl onion) that can be used without legal encumbrance, because O'Reilly owns the camel.

Session 1: The Tester's Toolkit

Pete Krawczyk opened with the usual rah-rah in favor of automated regression testing: tests supplement documentation; they facilitate bug reports; they reduce maintenance costs; etc.

A "test" is a Perl program, using some extra modules, that reports actual versus expected results. Tests are usually invoked via a "test" target in the makefile. Another useful command, as of Perl 5.8.3, is "prove", which runs a directory of tests. A script named t/TEST is also sometimes used. But since a test is a Perl script, you can run it by hand (though without the summary features provided by the test harness). Example code for this talk is found in the Acme::PETEK::Testkit module.
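A minimal sketch of such a test script (the file name and the assertions here are invented for illustration):

```perl
#!/usr/bin/perl
# t/basic.t -- a minimal test script (hypothetical example)
use strict;
use warnings;
use Test::More tests => 2;    # the "plan": exactly two tests expected

is( 1 + 1, 2, 'addition still works' );       # actual vs. expected value
like( 'Just another Perl hacker', qr/Perl/,
      'signature mentions Perl' );            # pattern match
```

Saved under t/, it would be picked up by "prove t/" or by the makefile's test target.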

Considerations when writing tests:

1. Make sure the most important code is tested. People don't actually test every branch of code, and ROI diminishes as you jump through hoops trying to achieve complete coverage. Conversely, starting from zero tests, every added test is an improvement.
2. Test scripts should have a "plan".
3. Don't print to STDOUT! Use diag() instead. Testing scripts that print to stdout may involve extra work to capture output.
4. Test for failure as well as success.
5. Give tests a description. If you don't, you'll be stuck figuring out which one was "test 50046".

Moving on to testing specifics, Pete introduced Test::More by showing some examples of the standard tests, use_ok(), is(), is_deeply(), cmp_ok(), can_ok(), etc. Tests can be put in a "SKIP" or a "TODO" block, and Test::More will handle them gracefully.
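As a rough sketch of how those blocks fit together (the skip condition and test names are made up):

```perl
use strict;
use warnings;
use Test::More tests => 3;

is_deeply( [ 1, [ 2, 3 ] ], [ 1, [ 2, 3 ] ], 'nested structures match' );

SKIP: {
    # skipped tests are reported as "ok ... # skip <reason>"
    skip 'no network on this machine', 1 unless $ENV{HAVE_NETWORK};
    ok( 1, 'network-dependent test' );
}

TODO: {
    # failing TODO tests are expected failures; they don't fail the suite
    local $TODO = 'feature not implemented yet';
    ok( 0, 'future feature works' );
}
```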

The prove script handles invocation of tests in a folder, using Test::Harness, simplifying the relevant Perl one-liner. It also has extra features such as "verbose" and "shuffle" modes. It is intended as a development tool, to run tests with some granularity during debug/test cycles.

He went on to talk about Test::Inline, and I didn't pay close attention to that section. Including tests within the script to be tested is a matter of taste, and my taste doesn't run that way. I also skipped the Test::Legacy section. It's intended purely to migrate tests written with the old Test.pm module to the new testing framework. Likewise, I already know about Test::Pod.

For those who shy away from complex tests, you can use Test::Simple. It's a subset of Test::More's functionality, so you can retarget tests written with Test::Simple to Test::More when you need the additional features. Test::Simple provides only one test function: ok().
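A sketch of an entire Test::Simple script (the contents are invented):

```perl
use strict;
use warnings;
use Test::Simple tests => 2;

# ok() is all there is: a boolean and a description
ok( 1 + 1 == 2,           'arithmetic works' );
ok( lc('PERL') eq 'perl', 'lc() lowercases' );
```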

Other test modules exist for things like performing web browsing or accessing a database. Most of these test modules combine nicely. Apache::Test specifically is the topic of another talk at YAPC. It appears to provide a kind of "sandbox" for testing. In that vein, it handles issues of hosts and ports, so the test writer doesn't have to worry about it. It uses Test::Legacy syntax, so it potentially poses interoperability issues with other testing modules. Test::More support is being added, but should be considered experimental today. Pete showed an example using Apache::Test, which I looked at cursorily, since I plan to attend the Apache::Test talks later today.

Test::WWW::Mechanize can be used to perform traversal of sites. It handles cookies and form values, etc. It can be used with Apache::Test, where Apache::TestRequest::module2url() is used to convert relative URLs to something usable against the "sandbox" Apache instance.

Test::DatabaseRow can be used to perform simple tests against the database. You assign it a database handle to run against, and it can generate some SQL for you.

Test::Expect can be used to test console apps, including tests of remote applications using ssh or telnet. It uses a syntax reminiscent of Expect, as you might expect.

Test::Differences puts test diffs in a table for viewing. This can be useful for determining which parts of a test suite did not behave as expected.

Test::SQL::Translator can be used to verify the correctness of a DB schema.

To determine how much of your code is covered by your tests, you can use Devel::Cover from CPAN. It runs transparently with your tests, and compiles statistics on your code coverage. It can generate HTML output for viewing in a web browser, with color codes.

General tips:

        1. Write a test for each bug you fix.
        2. Automate your automated tests.
        3. Consider test-first development.
        4. Help write tests for others' modules that you use.
        5. Encourage others to test their code.

chromatic & Ian Langworth: Solutions To Common Testing Problems

General Enhancements to Test::More

People usually start with Test::More, but soon end up wanting better diagnostics than it provides. For example, people commonly use is_deeply(). The benefit of using it is that it will highlight, on failure, where in the data structure a disagreement is found. You can use diff to get all differences, but is_deeply() gives only the first point of disagreement. Test::Differences provides a similar functionality but shows all differences. It also shows per-line differences between multiline strings.

When using is() to compare strings, both strings are printed in full on failure. This can be useless for comparison, especially if the strings are long. Test::LongString addresses this problem via the functions is_string(), contains_string(), and lacks_string().

Beyond strings, another target of testing is nested data structures. One approach to them is to focus on the composition of the structure, rather than its content. Test::Deep offers cmp_deeply() for this purpose. You can tell cmp_deeply() what the structure should look like in general terms: one argument to it is a template for the data structure to be tested. The template can specify an array, a subhash or superhash. Another supported concept is a "bag", which matches unordered sets possibly containing duplicates.
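A sketch of what such a template might look like (Test::Deep is a CPAN module; the data here is invented):

```perl
use strict;
use warnings;
use Test::More tests => 1;
use Test::Deep;    # CPAN module, not part of the core distribution

my $got = { name => 'yapc', tags => [ 'perl', 'toronto', 'perl' ] };

cmp_deeply(
    $got,
    {
        name => 'yapc',
        # bag(): order doesn't matter, duplicates are allowed
        tags => bag( 'perl', 'perl', 'toronto' ),
    },
    'structure matches the template'
);
```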

Testing with Databases

One tricky part of testing is that more than just Perl code needs to be tested. One approach is to use mock objects to stand in for the DB, but that isn't always the best fit. You can use a different database instead, loaded with test data. Or you can connect to the live system for testing.

To mock the DB connection, you can use the DBD::Mock module. One of the obvious candidates for testing with a mock DB is to test failure modes, such as login failure. In the mock object, you can set flags to simulate login failure, DB connection going away, success, etc.
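A sketch of those flags in action (DBD::Mock is a CPAN module; the details here are illustrative):

```perl
use strict;
use warnings;
use DBI;           # DBD::Mock must be installed from CPAN
use Test::More tests => 2;

my $dbh = DBI->connect( 'dbi:Mock:', '', '', { PrintError => 0 } );
ok( $dbh, 'mock connection created' );

# Simulate the database going away mid-test
$dbh->{mock_can_connect} = 0;
my $sth = $dbh->prepare('SELECT 1');
ok( !defined $sth || !$sth->execute, 'statement fails once the DB is "down"' );
```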

One candidate for a substitute database is DBD::SQLite, which accepts SQL commands but has no network connection, multi-user support, etc. This can be used for inserts and selects without affecting the target DB.
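A sketch using an in-memory SQLite database (assumes DBD::SQLite is installed from CPAN; the schema is invented):

```perl
use strict;
use warnings;
use DBI;           # DBD::SQLite must be installed from CPAN
use Test::More tests => 1;

# dbname=:memory: gives a throwaway database: no server, no files
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
                        { RaiseError => 1 } );

$dbh->do('CREATE TABLE talks (title TEXT)');
$dbh->do( 'INSERT INTO talks (title) VALUES (?)', undef, "Tester's Toolkit" );

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM talks');
is( $count, 1, 'insert visible via select' );
```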

The final approach to DB testing is to use the same DB back-end as the production environment, with a test data set. In the build file, the installer can be prompted for a DB name, user, password, etc., to use in tests. To the end of putting this stuff in your build file, you can use Module::Build. It's much easier than ExtUtils::MakeMaker. Among other things, it provides facilities for prompting users for settings, along with logic to adopt the defaults when performing an automated build.

Testing Web Sites

With Test::WWW::Mechanize you can simulate a web browser to test web sites. It provides methods for submitting forms, clicking links, etc. It also provides test methods for examining titles and other page elements. Another utility, installed along with WWW::Mechanize, is mech-dump, which can be used to examine the structure of a page, for example to learn the name of a form if you don't already know it. Instead of mech-dump, you can use HTTP::Recorder to create a proxy and examine data as it flows back and forth. The proxy can be used to pop up an additional page displaying some information about the exchange. Note that HTTP::Recorder is new and limited: it doesn't handle SSL or JavaScript.

The HTML can be validated as a whole using Test::HTML::Tidy. If you only want to check certain things in the HTML you can use Test::HTML::Lint. In response to a question from the crowd: the speakers don't know if there's a handy module for testing XSS vulnerabilities.

Testing with Mock Objects

Mock objects are handy when testing conditions that are difficult to produce for one reason or other: supplying a missing network connection; pretending to reformat the hard drive; simulating obscure failures; etc.

For example, suppose you want to test a bit of code that makes a system call that may or may not succeed on the testing machine (for example, due to lack of speakers at the time of test). This can be done by overriding "system" as follows. Note that "system" must be overridden before the module to be tested is loaded.

        BEGIN {                           # before TestModule is compiled
            package TestModule;
            use subs 'system';            # declare an overridable system()
        }
        *TestModule::system = sub { return 0 };   # pretend the call succeeded
        use TestModule;

One word of advice: tests involving mocking like this should probably be run in their own files, so weird things like overridden functions don't have side effects that leak into other tests.

Mock objects can be created for testing with the Test::MockObject module. The author has written some articles about Test::MockObject.

Unit Testing with Test::Class

Test scripts discussed so far in this session are procedural. Test::Class treats tests as objects. A new test is created as a subclass of Test::Class. This class implements an analogue of Ruby's fixtures: services such as the setup and tear-down surrounding the execution of test cases. To use a test class based on Test::Class, simply use the module you've created and call the class method runtests().

There are facilities for skipping tests in units of one class (possibly including all its subclasses).

Another advantage of Test::Class is that you can ship the test classes with your package. Then users that subclass your objects can also subclass your tests and leverage your effort.

Test::Class also facilitates creation of test plans by allowing you to specify test counts piecemeal, and then collecting them into a plan for you.
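A sketch of a small test class (Test::Class comes from CPAN; the class and method names are invented):

```perl
package My::Test;
use strict;
use warnings;
use base 'Test::Class';
use Test::More;

# A fixture: runs before every test method
sub setup : Test(setup) {
    my $self = shift;
    $self->{list} = [ 1, 2, 3 ];
}

# The per-method count lets Test::Class assemble the overall plan
sub count_items : Test(1) {
    my $self = shift;
    is( scalar @{ $self->{list} }, 3, 'fixture populated' );
}

package main;
My::Test->runtests;
```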

A Few Cool Things about mod_perl 2.0

The list of mod_perl directives has grown some. The list of classes has grown tremendously.

Writing a PerlTypeHandler didn't work before, because mod_mime has a stranglehold on the request. That has changed. You can now write PerlTypeHandlers easily. You probably won't, because nobody ever needs to, but you can.

What about Apache2::Const::OK? Big changes came right before mod_perl 2.0 because of the "Great Namespace War" of 2004. So now there's something you need to know to migrate to mod_perl 2: all Apache:: modules now live in the Apache2:: namespace. The only exception is Apache::Test. That includes Apache constants, because they are fully qualified by their package:

        - Apache2::Const::OK
        - APR::Const::SUCCESS

But no matter what the docs say, you don't need to use -compile. Just do a:

        use Apache2::Const qw(OK);

It isn't too hard to migrate to mod_perl 2--you can practically do it with a sed script, as long as you're on mod_perl 1.99. Going from 1 to 2, see last year's talk, "Why mod_perl 2.0 Sucks". But back to what's cool about mod_perl 2.0...

Apache 2.0 has over 340 directives, but only 90 are from "core" Apache. The rest are from extension modules. Those extension directives must be wrapped in IfModule directives in the Apache config. Both versions of mod_perl provide an API for defining new Apache directives, but the API in 1.0 was too intimidating. The 2.0 directive handler is in pure Perl.

Total Access is another cool mod_perl 2.0 feature. Whereas version 1.0 was incomplete, 2.0 offers complete access to everything in the Apache API. There's even a method called assbackwards() (whatever that does).

Output Filters are a new feature in Apache 2.0. Output filters are "things" that allow you to post-process content after the content phase has run. One example of an Apache filter is the one that processes SSI tags: the output of CGI scripts is not run through that mechanism, so CGI scripts can't use SSI tags. Although mod_perl has been able to filter content for years, it was previously only able to process mod_perl output itself. Now it's possible to filter output at a later stage--for example, to use mod_perl to filter the output of PHP scripts.

Stacked Perl Handlers are an idea borrowed from Apache that mod_perl 1.0 didn't get right. In Apache, how the module list is traversed depends on the phase. In some phases, the handler list is exhausted. In other phases, the list is traversed until the first handler returns OK (authentication is one example of this). The mod_perl 1.0 version didn't allow for early termination on return of OK, but that's now been fixed. One effect of this fix is that PerlAuthenHandler could be rewritten very nicely, without all the ridiculous workarounds.

To finish with a plug: Apache::Test totally rocks, and you should use it for everything, every time.

Perl Black Magic: Obfuscation and Golfing

José Castro introduced himself by requesting that people stop calling him "Hosé": being Portuguese, the correct pronunciation is Joe-say. From this lighthearted beginning, José proceeded to give a hilarious presentation concerning obfuscated Perl. His pace was much too fast to keep up with, so I won't try to capture his talks in detail. Just remember to check out his slides when they become available. Here are a couple of teasers:

To impress your friends with obfuscation, you have to give them something that they don't understand right away, but do understand eventually. If, when you explain what a script does, they still don't understand it, they won't be impressed with your cleverness. But if they find out that your incredibly convoluted script prints, "Just another Perl hacker," they'll be impressed.

Some clever ways to make your Perl incomprehensible include:

        1. Gratuitous use of the ternary operator
        2. Adding distractors in the untaken branch of the ternary operator
        3. Using lots of semicolons you don't need
        4. Remove whitespace to enhance unreadability
        5. Never use /// in regexes: "ss from s to s" is much more confusing
        6. Use lots of pound signs, until people can't tell what's a comment and what ain't

Definition: "Golfing" is the art of programming with as few characters as possible. You start, of course, by eliminating all whitespace. And you never use a variable name longer than one letter. Of course you leave off semicolons whenever it's allowed. Above all, you shouldn't forget to exploit the power of Perl's command-line options, of which "-n" is only the tip of the iceberg.

Two clever operators:

        The Eskimo operator: }{

This cute little number looks very confusing, but think what it does at the start of a script invoked with "perl -n"! Check the manpage if you can't figure it out directly.

        The shopping-cart operator: @{[ ]}

This operator can be used to perform operations within strings, for example, as long as the innermost brackets yield a (possibly empty) array reference.
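A tiny example (the data is invented):

```perl
use strict;
use warnings;

my @attendees = ( 'Larry', 'Allison', 'Pete' );

# @{[ ... ]} builds an anonymous array reference, then immediately
# dereferences it, so arbitrary expressions can interpolate in strings
print "There are @{[ scalar @attendees ]} speakers today.\n";
```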
