Most of the comments here are partially right, but they omit the history and avoid mentioning the fact that the words "Cut" and "Paste" weren't actually standard when the shortcuts were created. The history is that these all date back to a word processor application for DOS called WordPerfect. WordPerfect had a lot of Ctrl+key combinations, and Ctrl+X/C/V were, as others have mentioned, all on the same part of the keyboard. Given there was no standard word meaning "insert a copy of a block" (now called "Paste"), there was no less reason to use V than any other letter. Meanwhile, P was taken, used IIRC for the printing functions. I remember an early reviewer of the Mac being surprised when he, an avid WordPerfect user, used Cmd+V to paste some text in MacWrite out of habit and found it worked.

"Would installing such fonts in Microsoft Word work it out?"

Not necessarily. Very often the information about the font is simply not present inside the PDF. In other words, though a reader can render the text fine from the binary data, the ASCII equivalent (recoverable only if font data is present) is not available. Since the problem lies in the ambiguous PDF standard (which allows removal of font information), one best practice would be OCR. When I ran into similar problems, these are the steps I performed:

1. I converted the whole PDF file into another PDF with each slide as an image. (I found it optimal to convert each slide first.)
2. Then I ran it through the built-in OCR of Adobe Acrobat (the 'Enhance' feature).

... of metadata, including all relevant font information.
You should check your PDF document's fonts first with the help of the pdffonts utility. It is part of the XPDF package for Windows and can be used without installing, just from a DOS box.

In order to successfully extract text (or copy'n'paste it) from a PDF, the font should either use a standard encoding (not a Custom one), or it should have a /ToUnicode table associated with it inside the PDF.

pdffonts returns a few basic items of information about the fonts used by your PDF.

Example output:

    $ pdffonts -f 3 -l 5 sample.pdf
    name                 type          encoding    emb sub uni object ID
    -------------------- ------------- ----------- --- --- --- ---------
    IADKRB+Arial-BoldMT  CID TrueType  Identity-H  yes yes yes     10  0
    SSKFGJ+ArialMT       CID TrueType  Custom      yes yes no      11  0

The command above asked for the fonts used in the page range 3 (first page to check) to 5 (last page to check).

In the above case, both fonts are embedded as subsets (indicated by the XYZABC+ style prefixes to their names, as well as by the yes in the emb and sub columns). The font SSKFGJ+ArialMT uses a custom encoding, but the PDF has no /ToUnicode table for this font, as indicated by the no entry in the column headed uni. Hence it is not easy to extract text that is shown with this font. Extraction would require manual reverse engineering, but then you can also just "read" the PDF pages.

You should first check whether copy'n'pasting of text works when you use a simple text file as the target (not an MS Word document). If it doesn't, you can already forget about MS Word.

"Would installing such fonts in Microsoft Word work it out?" I cannot give a definite answer without having access to the PDF in question myself.

"If so, where can I get or even create those subsets of the fonts I need?" You could extract the subsetted fonts from the PDF itself. (Funnily, my most popular StackOverflow answer deals with exactly that question. I dunno why people seem to be so crazy about extracting fonts from PDF files other than for debugging purposes.)

"If not, how could I solve this problem?" There is no solution other than doing this manually.

Unfortunately, you cannot get exactly the same info about the fonts used by a PDF via Acrobat or Adobe Reader. What you can get via Menu -> File -> Properties includes the subset info (but not the prefixes used for subset font names), but you do not get the info about the presence of a /ToUnicode table.
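The font check described in this answer can be scripted. The following is a minimal Python sketch, not part of the original answer, that parses pdffonts output and flags fonts with a Custom encoding and no /ToUnicode table. The names `FontInfo`, `parse_pdffonts`, and `hard_to_extract` are hypothetical, and the parser assumes font names and encoding names contain no spaces. The flagging rule is only the heuristic stated in the text, not a guarantee about extraction.

```python
import re
from typing import NamedTuple

# Subset fonts carry a six-uppercase-letter tag followed by '+'.
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


class FontInfo(NamedTuple):
    name: str
    font_type: str
    encoding: str
    embedded: bool
    subset: bool
    has_tounicode: bool


def parse_pdffonts(output: str) -> list[FontInfo]:
    """Parse the table printed by pdffonts into FontInfo records.

    Assumption: the font name and the encoding are single tokens, so the
    row can be split from the right (flags and object ID) and the left (name).
    """
    fonts = []
    for line in output.splitlines():
        tokens = line.split()
        # A data row ends with: emb sub uni <object number> <generation>.
        # Header, separator, and command lines fail these checks and are skipped.
        if len(tokens) < 8:
            continue
        if not all(t in ("yes", "no") for t in tokens[-5:-2]):
            continue
        if not (tokens[-2].isdigit() and tokens[-1].isdigit()):
            continue
        emb, sub, uni = (t == "yes" for t in tokens[-5:-2])
        fonts.append(FontInfo(
            name=tokens[0],
            font_type=" ".join(tokens[1:-6]),  # e.g. "CID TrueType"
            encoding=tokens[-6],
            embedded=emb,
            subset=sub,
            has_tounicode=uni,
        ))
    return fonts


def hard_to_extract(font: FontInfo) -> bool:
    """Heuristic from the text: Custom encoding and no /ToUnicode table."""
    return font.encoding == "Custom" and not font.has_tounicode


SAMPLE = """\
name                 type          encoding    emb sub uni object ID
-------------------- ------------- ----------- --- --- --- ---------
IADKRB+Arial-BoldMT  CID TrueType  Identity-H  yes yes yes     10  0
SSKFGJ+ArialMT       CID TrueType  Custom      yes yes no      11  0
"""

for font in parse_pdffonts(SAMPLE):
    status = "hard to extract" if hard_to_extract(font) else "should be extractable"
    print(f"{font.name}: {status}")
```

To run this against a real file, the table could be captured with `subprocess.run(["pdffonts", "-f", "3", "-l", "5", "sample.pdf"], capture_output=True, text=True).stdout`, assuming pdffonts is on the PATH.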