

I had already located and exported the PDF files in question out to a single directory for parsing, and I was hoping it'd be as quick and easy as pointing pdfinfo at that directory and redirecting the output to a file of my choosing.

Alas, that was not to be. The tool is designed to be run against one file at a time, which gives the result on STDOUT (or it could be redirected to a text file, for instance). I tried it against a single file, and that worked fine. I then tried to use my limited Windows CLI knowledge to get it to feed the PDFs to pdfinfo, with no joy.
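The directory-wide run described above could be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: pdfinfo is installed and reachable on the PATH, its output can be decoded as UTF-8 (undecodable bytes are replaced), and the names `run_pdfinfo`, `dump_metadata` and the `=== name ===` report layout are made up for illustration.

```python
import subprocess
from pathlib import Path


def run_pdfinfo(pdf_path):
    """Run pdfinfo on a single file and return whatever it printed."""
    # An argument list (no shell) sidesteps quoting trouble with spaces
    # in file names. Assumes the pdfinfo executable is on the PATH.
    result = subprocess.run(
        ["pdfinfo", str(pdf_path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        encoding="utf-8",
        errors="replace",
    )
    return result.stdout


def dump_metadata(pdf_dir, report_path, run_tool=run_pdfinfo):
    """Write the tool output for every .pdf file in pdf_dir to report_path.

    run_tool is a hypothetical hook so the loop can be tried out
    without pdfinfo installed. Returns the number of files processed.
    """
    pdf_files = sorted(
        p for p in Path(pdf_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    with open(report_path, "w", encoding="utf-8") as report:
        for pdf in pdf_files:
            # One header per file, so the report shows which PDF
            # each block of metadata came from.
            report.write("=== {} ===\n".format(pdf.name))
            report.write(run_tool(pdf).rstrip("\n") + "\n\n")
    return len(pdf_files)
```

Something like `dump_metadata(r"C:\cases\pdfs", "metadata.txt")` would then produce one text report for the whole directory; both paths here are placeholders.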

It's a command-line utility, which is fine by me.
PDF file metadata (author, title, revision, etc.) is primarily stored in a couple of different places within a PDF - the Info Dictionary and/or the XMP (Extensible Metadata Platform) stream. pdfinfo (which is a free utility, by the way) will extract this metadata from within a PDF file.
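Once pdfinfo has printed the metadata, its text output can be turned into something easier to search. The sketch below assumes the usual "Key: value" line layout; the function name `parse_pdfinfo` and the sample text are illustrative only and were not produced by a real run.

```python
def parse_pdfinfo(text):
    """Turn "Key: value" lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # themselves (timestamps, for example) stay in one piece.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and anything without a key
        fields[key.strip()] = value.strip()
    return fields


# Made-up sample in the assumed layout, not real tool output.
SAMPLE = """\
Title:          Quarterly Report
Author:         J. Doe
CreationDate:   Mon Jan  5 10:15:00 2009
Pages:          12
"""

info = parse_pdfinfo(SAMPLE)
print(info["Author"])
```

All values stay strings here; converting something like the page count to an integer is left to the caller.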
This is going to be just a quick, short post (hey, don't laugh - it *can* happen!) with something I wanted to pass along to all my fearless readers. Here's the scenario: I was stuck in Windows, and had a virtual ton of PDF files from which I needed to extract metadata. No fancy commercial tools such as EnCase were at my disposal to automate the task for me, so I turned to pdfinfo. For those who are not familiar with it, pdfinfo is part of xpdf, an open source PDF viewer utility.

I am a Python beginner learning Python 3. I have written two small functions that parse the lsblk output and return Linux physical and logical disks.

The first function, which begins with from subprocess import run, PIPE, gets all physical drive names on a Linux system and returns a list of strings representing drive names. It runs the command with output = run(command, shell=True, stdout=PIPE).

The second function, which also begins with from subprocess import run, PIPE, gets all partitions present on a given physical disk by parsing the lsblk utility output. It takes a string containing a disk name such as 'sda' and returns a list of strings representing partitions, which it collects with results.extend(output_string.split('\n')).

The functions are in two different files in the same directory. I would appreciate a quick review of anything that is not idiomatic/pythonic code.

I have coded the functions in Ninja-Ide with lint and PEP8 suggestions turned on. In both files the IDE suggests that my print() statements, print('\t' + partition), should be written with doubled parentheses, print(('\t' + partition)), for Python 3 support. I have checked the print() documentation, but have been unable to find a reference for including function calls and string concatenation in double parentheses when calling print().

The -s option to lsblk was introduced to util-linux rather recently, in release 2.22. You may experience compatibility issues on slightly older GNU/Linux installations. But I don't see why you would want the -s option at all - it just gives you an inverted device tree.

For example, on my machine: $ lsblk -o name -n -s -l
In the output, sda appears multiple times. To understand the output, you need to drop the -l flag so that the list appears in tree form: $ lsblk -o name -n -s
Now, it's more apparent that the -s option isn't helpful. If you drop it, then the output makes more sense: $ lsblk -o name -n

To list the devices on sda, it would be better to run lsblk -o name -n -l /dev/sda - that would immediately drop sr0 from consideration, for example. Note that LVM volumes (such as vg-root on my machine) would still appear in the output.

I don't think that doing a substring search (if x != disk and disk in x in your code) is a reliable filter. It could be fooled if there are more than 26 physical disks: the 27th disk would be named sdaa. It might also be fooled by exceptionally tricky naming of LVM volumes.

Whenever practical, I recommend avoiding the shell when executing subprocesses. The shell introduces a set of potential security vulnerabilities - for example, shenanigans with the PATH environment variable. Best practice would be to run the command with a specific executable and pre-parsed command-line options: run('/bin/lsblk -o name -n -s -l'.split(), stdout=PIPE)

I actually wouldn't bother with parsing the output of lsblk at all. After all, lsblk is just a way to report the contents of the sysfs filesystem. You would be better off inspecting /sys directly.
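The suggestion to inspect /sys directly could be sketched roughly as follows. This is a minimal sketch, not the code from the original answer: it assumes the common sysfs layout in which each partition of a disk appears as a subdirectory of /sys/block/<disk> containing a file named partition. The function name `list_partitions` and the `sys_block` parameter are hypothetical, added so the logic can be exercised against a fake directory tree.

```python
from pathlib import Path


def list_partitions(disk, sys_block="/sys/block"):
    """Return the partition names of one disk, e.g. ['sda1', 'sda2']."""
    # Reject names that could escape the sysfs directory.
    if not disk or "/" in disk or disk.startswith("."):
        raise ValueError("Invalid disk name: {!r}".format(disk))

    disk_dir = Path(sys_block) / disk
    if not disk_dir.is_dir():
        raise ValueError("No such disk: {!r}".format(disk))

    # Assumed layout: only partition subdirectories carry a "partition"
    # file, so entries such as queue/ or holders/ are skipped.
    return sorted(
        entry.name
        for entry in disk_dir.iterdir()
        if entry.is_dir() and (entry / "partition").is_file()
    )
```

Because each disk has its own directory, there is no substring matching involved, so a disk named sdaa cannot be mistaken for a partition of sda. Note that the result is sorted as plain strings, which puts sda10 before sda2.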
