Perl is great, but today you can just as well (or more easily) use Ruby, Python, or even PHP for scripting one-liners and small tasks. However, awk and sed are still a fantastic option when you need just some quick processing on the command line. Pipe it in, do some transformations or replacements, pipe it out. The same goes for the other Unix power tools like sort, uniq, wc, head, tail, split, etc. -- they are all still very useful...
Well, I have close to 30 years of Unix experience, so you don't have to convince me of the power of a pipeline. But I do have to disagree with your affection for awk and sed, even though in the past I have used them both extensively.
The advantage of learning perl is that you can replace both utilities in your piped command line and only have to remember one syntax. And it integrates just as well as sed and awk do in any pipeline. But quite often, it will obviate the need for piping through other utilities.
Perl is much closer to sed and awk than any of the other scripting languages you mention. As noted in other posts, you can even automatically convert awk scripts to perl. And it doesn't take much effort to convert sed scripts to perl where you can leverage more powerful pattern matching and procedural facilities to boot.
If you already know sed and awk, by all means keep using them, they're fine. But if you're new to the Unix command line, you'll get the most bang for your buck by learning perl instead of awk and sed.
As a sysadmin (not a web/app or systems programmer) I spent about 7 years using Perl (wrote my company's HR <-> LDAP synchronization system in it) until Python came along (for me, around 2003). Nowadays, for one- or two-liners, it's sed/awk - the simplicity of awk '{print $1,$2}' or awk '{tot+=$2; c++}END{print tot/c}' has just worked its way into my fingers. I don't even have to think about sed 's/^.client="\(.\)",dev.*/\1/' foo.txt - I just look at the log file and the sed expression appears at my fingertips. Anything more complex than that (unless it's a really obvious fit for awk) and I switch over to Python. Its simple data model for things like HoA or AoH just imposes less cognitive overhead on me, and its rich assembly of built-in libraries makes me productive from the get-go. This sort of thing would be incredibly difficult in awk - not sure if I would have to go to CPAN with Perl - but I just assumed (and was correct) that Python would have it built in:
import json

with open("bus-stops.json") as f:
    j = json.load(f)
for a in j:
    print(a['no'], ",", a['lat'], ",", a['lng'], ",", a['name'])
A simple "import json", "help(json)" at the Python command line, and 2 minutes later I was done.
Also - I'm able to understand my code the next day - something I was never able to do with Perl, but for some reason I can with Python.
I probably spend 90% of my time in sed/awk, and 10% of my time in Python. Haven't touched Perl in 10 years - not because it isn't an awesome language (it really is) - it's just that I have room in my head for one full blown language at a time, and Python replaced Perl for me.
use strict;
use warnings;
use JSON;
use IO::All;
use feature 'say';

my $f = io('bus-stops.json')->all;  # slurp the whole file
my $j = decode_json($f);
$, = ',';                           # output field separator
for (@$j) {
    say @$_{'no','lat','lng','name'};  # hash slice of the fields we want
}
I still think newbies would get much more value out of learning perl basics than spending any time on sed/awk intricacies. But to each his own.
awk tip: in your averaging example, the "c++" and reference to c in the END block can be replaced by the built-in var NR (for number of records). awk's builtins are useful.
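A quick check of that tip, with made-up numbers -- NR holds the count of records read, so the manual counter is redundant:

```shell
# Average of column 2; NR replaces the hand-rolled "c" counter
printf 'a 2\nb 4\nc 6\n' | awk '{tot+=$2} END{print tot/NR}'
# prints: 4
```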
Good luck doing one-liners with Python, as long as it uses whitespace indentation for e.g. loops and doesn't support {}s as an option. :-(
I'd argue, along with most others here, that Perl is the single most useful command-line tool. Not the only one, of course. But AFAIK you can't e.g. load a JSON lib in awk as part of a pipeline. (I deserialize dumped data structures multiple times a week in [ad hoc testing with] pipelined cmds.)
Imho, if you know Ruby or PHP like the back of your hand, don't learn another scripting language for command-line use. Learn some completely different language for some other use instead.
No, you can't work with JSON very easily in awk. Any kind of hierarchical format like JSON and XML will give you a headache in awk, and CSV can be difficult as well.
That regexp fails for fields containing embedded '"' characters, but I guess you can grep for embedded double quotes ("") first.
Are there multiple variants of coding '"' in CSV fields? I don't know -- but some people who do know are those who write the CSV libs I use!
Edit: And as your link notes, it fails for embedded \n's too. Imnsho, awk needs CSV (and JSON, etc.) built in, preferably as a plugin architecture. But then, why not just use the Perl superset?
Arnold Robbins created FPAT to parse CSV, but it doesn't really do that very well. I agree that it would have been better to just hardcode a CSV mode. CSV is common, so you shouldn't have to think hard in order to parse it, and FPAT is hard. PHP makes parsing CSV a breeze. You could write a good CSV parser in gawk and @include it in other scripts as a solution short of hacking gawk itself. But it's generally easier just to find some other way, such as swapping CSV for TSV -- which works better in awk.
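For reference, this is roughly what the FPAT approach looks like -- the field pattern below is the classic one from the gawk manual, and it requires gawk (sample input is made up). FPAT describes what a field looks like rather than what a separator looks like:

```shell
printf 'a,"b, with comma",c\n' |
  gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $2 }'
# prints: "b, with comma"   (quotes included; FPAT does not strip them)
```

Note that even this still breaks on the embedded-quote and embedded-newline cases mentioned above, which is why it doesn't "really do that very well".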
Hierarchical formats like JSON are a little different, because they don't fit the awk model very well. You could add functions to work with JSON, but working with it this way wouldn't be very awk-like. You're better off preprocessing the JSON into records with another tool to make it more awk-friendly, or simply using another language altogether.
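As a sketch of that preprocessing idea (assuming jq is available; the JSON mirrors the bus-stops example upthread, with made-up values): flatten each object into one tab-separated record, then let awk take over.

```shell
printf '[{"no":1,"lat":1.3,"lng":103.85,"name":"Main St"}]' |
  jq -r '.[] | [.no, .lat, .lng, .name] | @tsv' |
  awk -F'\t' '{print $4}'
# prints: Main St
```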
Man, seconded. I'm not sure if a CSV-ized awk is a sensible idea, but I'd love to have it if it were. CSV might be #1 on my list of "things that will cause problems for you because they are slightly harder than you think they are".
Join the dar... cough, Perl side, we have cookies. :-) We have CSV parsers and everything else, all the way up to e.g. good web libraries and the best OO among the scripting languages (Moose, roughly comparable to the Common Lisp OO environment; more or less standard for new Perl projects today).
And there's more! You can reuse almost everything you know from awk! Run: perldoc perlrun
Check for -n, -p, -i, -E flags. And, as many have noted, there is a2p.
But the main reason is that we have fun. An insane programming language that throws all this "minimal mathematical notation" stuff out the window in favor of some linguistic inspiration, but still works wonderfully (do insist on keeping to the coding standards in your group. Seriously. At a minimum -- lie and say that you do, when people interview for a job at your place. :-) )