EXTRACTING PDF FILE USING PYTHON AND PYPDF2

  




Extracting text from a PDF file with PyPDF2 in a Jupyter notebook using Python


What is PyPDF2?

PyPDF2 is a pure-Python library, meaning it does not depend on any external libraries and should run on any Python platform. It is a PDF toolkit capable of extracting information from documents, splitting and merging documents, cropping pages, merging pages, and encrypting and decrypting PDF documents. It can also work entirely on in-memory stream objects (StringIO in Python 2, BytesIO in Python 3) rather than files on disk.
Visit PyPDF2 official site here

What is Anaconda?

A data science platform
Anaconda is an open-source distribution of the Python and R programming languages for scientific computing. It aims to simplify package management and deployment. The distribution contains data-science packages compatible with Windows, Linux, and macOS.
Visit official website of Anaconda here

Steps to follow :

Step 1: Open anaconda prompt

Step 2: Install PyPDF2
Optionally move to system32 with cd windows/system32 (this is not required; pip install works from any directory).
To install PyPDF2, run: pip install PyPDF2
And you are done!
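
To confirm that the install worked before moving on, a quick check along these lines can help. It is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("PyPDF2")
if found is None:
    print("PyPDF2 is not installed in this environment; run: pip install PyPDF2")
else:
    print("PyPDF2 version:", found)
```

Run it from the same environment that Jupyter uses, otherwise the check may look at a different Python installation.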

Step 3: Open Jupyter notebook and write your code
Open a new notebook file

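Write your code:
import PyPDF2 as p
# Raw string (r"...") so the backslashes in the Windows path are kept as typed
file = open(r"F:\data extraction with python\demoPDFfile.pdf", "rb")
Replace the path with the location of your own PDF file, then run the cell.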
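
Windows paths need a little care in Python because the backslash starts an escape sequence inside ordinary string literals. The sketch below shows the problem with a made-up path (`F:\new\table.pdf` is hypothetical) and two common ways around it, using only the standard library.

```python
from pathlib import PureWindowsPath

# In an ordinary string, "\n" and "\t" are read as newline and tab,
# so this hypothetical path is silently corrupted.
broken = "F:\new\table.pdf"
print("\n" in broken, "\t" in broken)

# A raw string (r"...") keeps every backslash exactly as typed.
raw_path = r"F:\data extraction with python\demoPDFfile.pdf"

# Forward slashes are also accepted on Windows and need no escaping.
slash_path = "F:/data extraction with python/demoPDFfile.pdf"

# PureWindowsPath treats both spellings as the same location.
same = PureWindowsPath(raw_path) == PureWindowsPath(slash_path)
file_name = PureWindowsPath(raw_path).name
print(same)
print(file_name)
```

Either spelling can be passed to `open(..., "rb")`; the raw string is usually the easiest choice when a path is pasted from Windows Explorer.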

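# PdfReader replaces the older PdfFileReader, which PyPDF2 3.0 removed
reader = p.PdfReader(file)
# Pages are zero-indexed: pages[0] is the first page
i = reader.pages[0]
j = reader.pages[1]
k = reader.pages[2]
# extract_text() replaces the older extractText()
print(i.extract_text())
print(j.extract_text())
print(k.extract_text())

# Split the first page's text into a list of lines
l = i.extract_text()
split = l.splitlines()
print(split)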
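
The code above asks for pages 0, 1 and 2 by hand, which fails on a PDF with fewer than three pages. A loop over every page avoids that. The following is a minimal sketch, not part of the original walkthrough: the helper name `extract_lines` is hypothetical, and it assumes each page object offers an `extract_text()` method, as the page objects in PyPDF2 2.0 and later do. The small `FakePage` class only stands in for real pages so the snippet runs without a PDF file.

```python
def extract_lines(pages):
    """Collect the non-empty, stripped text lines from every page, in order."""
    lines = []
    for page in pages:
        text = page.extract_text() or ""  # treat a page with no text as empty
        for line in text.splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return lines


class FakePage:
    """Stand-in for a PDF page, used only to demonstrate the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo_pages = [FakePage("Title\n\n first line \n"), FakePage(""), FakePage("last line")]
demo_lines = extract_lines(demo_pages)
print(demo_lines)
```

With a real file, the same helper could be called as `extract_lines(reader.pages)`, where `reader` is a `PyPDF2.PdfReader` opened on your PDF.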



Done!
