Biggest -doc packages

After losing a lot of time (because I couldn't remember the awk syntax correctly), I now know which are the biggest -doc packages in sid:

$ egrep "Package:.*-doc|^Size" /var/lib/dpkg/available | grep -A1 "doc" | grep -v "^--$" | awk '{ if ( $1 ~ "Package:" ) { pkg = $2 } else { print $2" "pkg } }' | sort -n | tail -n 3
77647098 vtk-doc
86947420 libxmpp4r-ruby-doc
107700626 sofa-doc
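If you'd rather not chain egrep, grep and awk, a single awk program can probably do the same job: remember the current Package: value and print it when the Size: line of the same stanza shows up. This is only a sketch. It assumes the usual deb822 layout of /var/lib/dpkg/available (Package: before Size:, stanzas separated by blank lines). biggest_docs is a hypothetical helper name, and the stanzas in the here-doc are made-up sample data, not real sid packages.

```shell
#!/bin/sh
# Sketch: one awk pass instead of egrep | grep | grep | awk.
# biggest_docs is a hypothetical helper; it reads deb822 stanzas on stdin
# and prints the N (default 3) largest "-doc" packages as "size name".
biggest_docs() {
  awk '
    /^Package:/ { pkg = $2; next }       # remember the current package name
    /^Size:/ && pkg ~ /-doc$/ {          # only packages whose name ends in -doc
      print $2, pkg
    }
    /^$/ { pkg = "" }                    # a blank line ends the stanza
  ' | sort -n | tail -n "${1:-3}"
}

# Made-up sample stanzas standing in for /var/lib/dpkg/available.
biggest_docs 3 <<'EOF'
Package: foo-doc
Version: 1.0-1
Size: 300

Package: foo
Size: 99999

Package: bar-doc
Size: 100

Package: baz-doc
Size: 200

Package: qux-doc
Size: 50
EOF
```

With the sample data it prints 100 bar-doc, 200 baz-doc and 300 foo-doc. On a real system you would feed it the real file instead (biggest_docs 3 < /var/lib/dpkg/available). Note that it only matches names ending in -doc, which is a bit stricter than the egrep pattern above.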

Don't say it's ugly, I know ;)

What's the purpose of this? I wanted to know which medal the new version (0.98.5) of the python-matplotlib-doc package would win, and it would be silver!!

$ ls -l python-matplotlib-doc_0.98.5-1_all.deb
-rw-r--r-- 1 morph morph 91141234 2008-12-16 10:39 python-matplotlib-doc_0.98.5-1_all.deb
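To double-check the medal, here is a tiny sketch that ranks a candidate size against the current podium. The three sizes are the ones from the list above; medal_rank is just a hypothetical name used for illustration.

```shell
#!/bin/sh
# medal_rank NEW_SIZE EXISTING_SIZE...  (hypothetical helper)
# Prints the 1-based position NEW_SIZE would take among the given sizes,
# largest first.
medal_rank() {
  new_size=$1
  shift
  # One size per line, candidate included, sorted in descending numeric order;
  # grep -n -x then reports the line number of the candidate.
  { printf '%s\n' "$@"; printf '%s\n' "$new_size"; } |
    sort -rn | grep -n -x -- "$new_size" | head -n 1 | cut -d: -f1
}

# Current podium from the list above, plus python-matplotlib-doc 0.98.5.
medal_rank 91141234 77647098 86947420 107700626   # prints 2, i.e. silver
```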

I really hope to reduce it before uploading it...


a1fie said...

dctrl-tools can help you here pretty well:

grep-available -FPackage -sPackage,Size -n -e -- '-doc$' | while read package; do read size; read; printf "%d\t%s\n" $size $package; done | sort -n | tail -n3
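This loop seems to rely on grep-available -n printing only the field values, so each matching package comes out as three lines: name, size and an empty paragraph separator. Here is that idea in isolation, assuming that output order (Package first, then Size). The printf just fakes the grep-available output with made-up packages, and pairs_to_table is a hypothetical name.

```shell
#!/bin/sh
# Sketch of the three-lines-per-record loop. The printf below stands in for
# the grep-available output, assumed to be: package name, size, blank line,
# repeated. Package names and sizes are invented sample data.
pairs_to_table() {
  while read -r package; do
    read -r size      # second line of the record: the Size value
    read -r blank     # third line: the empty paragraph separator
    printf '%d\t%s\n' "$size" "$package"
  done
}

printf 'foo-doc\n300\n\nbar-doc\n100\n\nbaz-doc\n200\n' |
  pairs_to_table | sort -n | tail -n 3
```

Unlike the one-liner, it reads into a named variable (blank) instead of a bare read, so it also works in shells like dash where read needs a variable name.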

jamessan said...

Utilizing more tools from dctrl-tools:

grep-available -FPackage -e -- '-doc$' | tbl-dctrl -cSize -cPackage -H -d' ' - | sort -n | tail -n3

a1fie said...

Sorry I didn't think it through; jamessan is of course right. But even the sorting can be done with that fine package, and then you can display the result in any format that tbl-dctrl offers.

grep-available -FPackage -sSize,Package -e -- '-doc$' | sort-dctrl -kSize:n | tbl-dctrl -cSize -cPackage -H -d' ' | tail -n3

Now we only need to limit the number of entries shown without resorting to anything outside the package. ;)