Code for Converting PDF to Audio

Introduction: Converting PDF to Audio

In today’s fast-paced world, the ability to consume information efficiently is more important than ever. This is particularly true in the realm of reading and processing written documents, such as PDFs, which are a standard format for disseminating information across various fields and industries.

However, reading through lengthy PDF documents can be time-consuming and is not always feasible, especially for individuals with busy schedules or for those who have visual impairments that make reading challenging.

Converting PDF documents to audio presents a solution that caters to a range of needs and preferences, enhancing accessibility and convenience in several ways:

  1. Accessibility for Visually Impaired Users: One of the most significant advantages of converting PDFs to audio is the increased accessibility it provides to visually impaired users. It enables them to access the information in PDFs without the need for Braille or other specialized reading tools.
  2. Multitasking and Time Management: Listening to audio allows for multitasking. People can consume the content of PDFs while engaging in other activities, such as commuting, exercising, or performing household chores, making better use of their time.
  3. Learning and Retention: Some individuals retain information more effectively through listening rather than reading. Converting PDFs to audio can facilitate learning and improve information retention for auditory learners.
  4. Ease of Use: Audio files are easy to handle and can be played on a wide range of devices, including smartphones, tablets, and laptops, providing flexibility in how and where the content is accessed.
  5. Language Learning and Pronunciation: For non-native speakers, listening to content in the target language can be incredibly beneficial. It aids in language learning, especially in terms of understanding pronunciation and natural language flow.
  6. Eye Strain Reduction: Reading large volumes of text, particularly on digital screens, can lead to eye strain. Listening to audio is a comfortable alternative that reduces the strain on the eyes.

In summary, converting PDFs to audio opens up a new dimension of accessibility and convenience. It not only empowers individuals with visual impairments but also caters to the diverse preferences and needs of a broad audience, making information consumption more flexible and efficient.

Using Google Text-to-Speech

You can use gTTS (Google Text-to-Speech) to read text. gTTS is a very convenient tool for converting text to speech and saving it as an audio file, typically in MP3 format.

Unlike pyttsx3, gTTS does not provide real-time speech playback but instead allows you to generate audio files that you can play back using any standard audio player.

Here’s a basic example of how you can use gTTS to convert text to an MP3 file:

from gtts import gTTS

def text_to_mp3(text, filename):
    tts = gTTS(text, lang='en')
    tts.save(filename)

# Example usage
text_to_mp3("Hello, this is a test of text-to-speech conversion.", "output.mp3")

In this example, text_to_mp3 is a function that takes the text and a filename as inputs. It uses gTTS to convert the text to speech and then saves it as an MP3 file. You can play the output.mp3 file with any media player.

Advantages of gTTS:

  1. Ease of Use: gTTS is straightforward and easy to use for generating speech from text.
  2. Quality: It uses the text-to-speech service behind Google Translate, so the quality of the speech is generally quite good.
  3. Language Support: gTTS supports multiple languages, making it a versatile choice for international applications.

Limitations:

  1. Internet Dependency: gTTS requires an internet connection to work, as it sends the text to Google’s servers for processing.
  2. No Real-time Speech: It doesn’t support real-time speech generation. The output is an audio file.

This method is ideal if you’re okay with having the speech output in the form of an audio file and you have a reliable internet connection.

PDF to mp3/wav via gTTS

Initial code:

  • Convert a PDF to text
  • Convert text to mp3 using Google Text-to-Speech
  • Convert mp3 to wav

from gtts import gTTS
from pydub import AudioSegment 
import PyPDF2

# Function to convert MP3 to WAV
def convert_mp3_to_wav(mp3_file, wav_file):
    audio = AudioSegment.from_mp3(mp3_file)
    audio.export(wav_file, format="wav")

# Path of the PDF file (forward slashes avoid backslash escapes such as \t)
path = 'c:/myfolder/test.pdf'

# Creating a PdfReader object
pdfReader = PyPDF2.PdfReader(path)

# The page with which you want to start 
# This will read the first page
from_page = pdfReader.pages[0]

# Extracting the text from the PDF 
text = from_page.extract_text()

# Convert text to speech and save as MP3
tts = gTTS(text, lang='en')
tts.save("output.mp3")

# Convert the saved MP3 to WAV
convert_mp3_to_wav("output.mp3", "output.wav")


Notes on this code, which reads text from a PDF file and passes it to a text-to-speech engine:

  1. Importing PyPDF2: The module is imported with import PyPDF2.
  2. Opening the PDF File: Make sure the path 'c:/myfolder/test.pdf' is valid and accessible. In a regular Python string, prefer forward slashes or a raw string for Windows paths, because a backslash sequence such as \t is read as a tab character.
  3. Creating PdfReader Object: In PyPDF2, you can create a PdfReader object directly from the file path.
  4. Accessing a Page: To access a page, use indexing like pdfReader.pages[0] for the first page (pages are zero-indexed).
  5. Extracting Text: The extract_text() method might not always extract text perfectly, depending on the PDF’s formatting. A regular expression can remove stray line feeds.
  6. Text-to-Speech: The code above uses gTTS, which writes an audio file. If you use pyttsx3 for live playback instead, ensure that it’s installed and working on your system.

Improving Reading Quality

Improving the quality of text extracted from a PDF can be challenging, especially when dealing with formatting issues like line breaks. PDFs are primarily designed for layout rather than text structure, which can make text extraction tricky.

Here are some strategies you can use:

  1. Adjusting PDF Reading Options:
    • Some PDF libraries offer options to adjust the way text is extracted, and different libraries can give different results for the same file. For example, PyPDF2 and PyMuPDF (imported as fitz, and a separate project rather than a PyPDF2 fork) may produce different output. Experimenting with different libraries can sometimes yield better results.
  2. Post-Processing the Extracted Text:
    • After extracting the text, you can apply some post-processing to clean it up. Common tasks include:
      • Removing Unnecessary Line Breaks: You can replace line breaks that occur within a paragraph. This might involve replacing newline characters (\n) with spaces, but only where a newline doesn’t signify a new paragraph.
      • Handling Hyphenation: If a word is hyphenated at the end of a line, you may want to join it back together.
      • Regular Expressions: Python’s re module can be useful for finding patterns in text and making adjustments.
  3. Using Advanced PDF Processing Tools:
    • Tools like Adobe Acrobat Pro have more sophisticated text recognition capabilities and might offer better results, especially for complex layouts or scanned documents.
  4. Optical Character Recognition (OCR):
    • For scanned PDFs, OCR tools like Tesseract can be more effective. They interpret the actual characters in the image rather than relying on embedded text, which can be more accurate for certain types of documents.

Here’s an example of how you might implement some basic post-processing in Python:

import re
import PyPDF2

def clean_text(text):
    # Replace end-of-line hyphens with an empty string
    text = re.sub(r'-\n', '', text)
    
    # Replace line breaks within paragraphs with a space
    text = re.sub(r'(?<!\n)\n(?!\n)', ' ', text)
    
    return text

# Read and process PDF
path = 'your-pdf-file.pdf'
pdfReader = PyPDF2.PdfReader(path)
from_page = pdfReader.pages[0]
text = from_page.extract_text()

# Clean the extracted text
cleaned_text = clean_text(text)

This script will remove hyphenation at the end of lines and replace line breaks that aren’t paragraph breaks with spaces. You may need to adjust the regular expressions based on the specific formatting issues you’re encountering in your PDFs.
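
To make the effect of the two substitutions concrete, here is a small self-contained run on a made-up sample string (the sample text is an assumption, not taken from a real PDF). Keep in mind that the hyphen rule is a heuristic: it also joins genuine compounds that happen to break at a line end.

```python
import re

def clean_text(text):
    # Join words that were hyphenated across a line break
    text = re.sub(r'-\n', '', text)
    # Replace single line breaks (not paragraph breaks) with a space
    text = re.sub(r'(?<!\n)\n(?!\n)', ' ', text)
    return text

# Hypothetical sample resembling raw PDF extraction output
sample = "Converting docu-\nments to audio\nsaves time.\n\nA new paragraph."
cleaned = clean_text(sample)
print(repr(cleaned))
```

The blank line between the two paragraphs survives, while the wrapped lines inside the first paragraph are merged into one sentence.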

Using pyttsx3

pyttsx3 is a text-to-speech (TTS) library for Python that allows the conversion of text into speech. It is a cross-platform library, meaning it works on different operating systems such as Windows, macOS, and Linux.

One of the key advantages of pyttsx3 is that it works offline, as it does not rely on external services or internet connectivity.

Key Features of pyttsx3:

  1. Offline Capability: Unlike some other TTS libraries that require an internet connection to access cloud-based services, pyttsx3 operates entirely offline. This makes it useful for applications where internet access is limited or unavailable.
  2. Cross-Platform: It is compatible with multiple operating systems, allowing the same script to run on Windows, macOS, and Linux without requiring changes.
  3. Control Over Speech Properties: pyttsx3 provides control over various aspects of speech, such as voice properties, speech rate, and volume. This allows customization of the speech output according to user preferences or specific requirements.
  4. Multiple Voice Support: It supports different voices installed on the user’s system. This means you can switch between voices, often including different accents and genders, depending on what’s available on the operating system.
  5. Synchronous and Asynchronous Speech Generation: pyttsx3 can be used for both synchronous and asynchronous speech generation, giving flexibility in how the speech output is integrated into applications.
  6. Event Hooks: The library allows hooking into events like the start and end of speech, providing more control over the speech generation process.

Common Use Cases:

  • Accessibility Features: For applications designed for visually impaired users, pyttsx3 can provide an essential interface for auditory feedback.
  • Desktop Applications: It can be used in desktop applications where text-to-speech functionality is needed, such as reading out instructions, alerts, or notifications.
  • Educational Tools: In educational software, especially language learning tools, it can be used to provide pronunciation guides and reading assistance.
  • Automated Responses: For automated systems like chatbots or virtual assistants, pyttsx3 can give a voice to text-based outputs.

Basic Usage Example:

Here’s a simple example of using pyttsx3 to convert text to speech:

import pyttsx3

engine = pyttsx3.init()
engine.say("Hello, how are you today?")
engine.runAndWait()

In this example, the pyttsx3.init() function is used to get a reference to a speech engine. The say method queues a string of text to be spoken, and runAndWait processes the speech commands.

Overall, pyttsx3 is a versatile and practical library for text-to-speech conversion in Python, suitable for a variety of applications where speech output is required.

Changing Voices

Changing the voice in a text-to-speech (TTS) system can be done differently depending on the TTS engine you’re using. For gTTS (Google Text-to-Speech) and pyttsx3, the methods are distinct:

Changing Voice in gTTS

gTTS doesn’t offer much flexibility in terms of changing voices. It uses the default Google Translate voices, and your options are mostly limited to changing the language or the accent. Older gTTS releases accepted regional codes such as ‘en-us’ or ‘en-uk’; recent releases (2.2 and later, as far as I can tell) treat those codes as deprecated and select the accent through the tld (top-level domain) argument instead, for example ‘com’ for American English and ‘co.uk’ for British English.

Example:

tts = gTTS(text, lang='en', tld='co.uk')  # British English
tts.save("output.mp3")

Changing Voice in pyttsx3

pyttsx3 allows more flexibility in voice selection since it utilizes the voices available on your system (SAPI5 on Windows, NSSpeechSynthesizer on macOS, etc.).

Here’s how to change voices using pyttsx3:

  1. List Available Voices: First, find out what voices are available on your system.

import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')
for voice in voices:
    print(f"ID: {voice.id}, Name: {voice.name}, Language: {voice.languages}")

  2. Set a Specific Voice: Once you know the available voices, you can set the voice you want by its ID.

engine.setProperty('voice', voice_id)  # replace `voice_id` with your chosen voice's ID
engine.say("Your text here")
engine.runAndWait()

Remember, the availability of different voices depends on your system and the TTS engine it uses. Some voices might not be available on all systems, and the quality or characteristics of these voices can vary.
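
Because voice IDs differ between machines, it can be more portable to pick a voice by part of its name than to hard-code an ID. The helper below is a minimal sketch: pick_voice_id is a hypothetical name, and the stand-in voice objects only mimic the id and name attributes that pyttsx3 voices expose.

```python
from types import SimpleNamespace

def pick_voice_id(voices, keyword):
    """Return the id of the first voice whose name contains keyword
    (case-insensitive), or None if no voice matches."""
    keyword = keyword.lower()
    for voice in voices:
        if keyword in (voice.name or "").lower():
            return voice.id
    return None

# Stand-in voice objects for demonstration; with pyttsx3 you would pass
# engine.getProperty('voices') instead.
fake_voices = [
    SimpleNamespace(id="voice-us", name="Microsoft Zira Desktop - English (United States)"),
    SimpleNamespace(id="voice-gb", name="Microsoft Hazel Desktop - English (Great Britain)"),
]
print(pick_voice_id(fake_voices, "hazel"))
```

With a real engine, you would call engine.setProperty('voice', voice_id) only when the helper returns something other than None, and keep the system default voice otherwise.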

Checking dependencies

To check if ffmpeg is installed and accessible for audio format conversion, especially for libraries like pydub that rely on it, you can use Python’s subprocess module to run a command line check. The idea is to execute a simple ffmpeg command and see if it returns an error or not.

Here’s a function that checks if ffmpeg is installed:

import subprocess

def is_ffmpeg_installed():
    try:
        # Try running a simple ffmpeg command and capture its output
        subprocess.run(["ffmpeg", "-version"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError or FileNotFoundError means ffmpeg is not installed or not in PATH
        return False

# Check if ffmpeg is installed
if is_ffmpeg_installed():
    print("ffmpeg is installed.")
else:
    print("ffmpeg is not installed.")

This function attempts to run ffmpeg -version using subprocess.run(). If ffmpeg is installed and properly set in the system’s PATH, this command will execute without error, and the function will return True. If ffmpeg is not installed or not found in the PATH, it will raise either FileNotFoundError or subprocess.CalledProcessError, and the function will return False.

Remember, for this check to work correctly, ffmpeg must be installed and added to the system’s PATH environment variable so that it can be invoked from the command line.
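
If you only need to know whether the executable is on the PATH, the standard library’s shutil.which offers a lighter check that does not start a process. This is a sketch of that alternative; find_executable is a hypothetical helper name, and unlike the subprocess approach it does not prove that ffmpeg actually runs.

```python
import shutil

def find_executable(name):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)

ffmpeg_path = find_executable("ffmpeg")
if ffmpeg_path:
    print(f"ffmpeg found at {ffmpeg_path}")
else:
    print("ffmpeg not found on PATH")
```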

Playing Audio

To play an MP3 file in Python, you can use various libraries, but one of the simplest and most commonly used ones is pygame. Here is an example of how you can use pygame to play an MP3 file:

First, you’ll need to install pygame if you haven’t already. You can install it using pip:

pip install pygame

Then, you can use the following script to play an MP3 file:

import pygame
import time

def play_mp3(file_path):
    # Initialize pygame mixer
    pygame.mixer.init()

    # Load the MP3 file
    pygame.mixer.music.load(file_path)

    # Play the MP3 file
    pygame.mixer.music.play()

    # Wait for the music to play before exiting
    while pygame.mixer.music.get_busy():
        time.sleep(1)

# Example usage
play_mp3("output.mp3")

In this script, play_mp3 is a function that takes the path to the MP3 file as input. It uses pygame to load and play the file. The script waits until the file has finished playing before exiting.

This method should work for basic needs. However, note that pygame’s mixer module is mainly intended for game development, so it might not have all the features of a dedicated audio processing library. For more complex audio playback needs, you might want to explore other libraries like pydub or even external applications controlled via Python.

Audio File Conversion Quality

The pydub.AudioSegment.export method allows you to specify various parameters for the output file, including quality settings. However, when converting to WAV format, the concept of “quality” is a bit different than for lossy formats like MP3.

WAV files are typically uncompressed and lossless, so the primary quality-related parameters are the sample rate and the bit depth. By default, pydub will use the same sample rate and bit depth as the input file.

If you want to specify a different bit depth for the WAV file, you can use the parameters argument of the export method. Here’s how you can modify your function to allow setting a custom bit depth:

def convert_mp3_to_wav(mp3_file, wav_file, bit_depth=16):
    audio = AudioSegment.from_mp3(mp3_file)
    audio.export(wav_file, format="wav", parameters=["-acodec", "pcm_s16le" if bit_depth == 16 else "pcm_s24le"])

In this function:

  • bit_depth is an optional parameter that allows you to choose between 16-bit and 24-bit depth. The default is set to 16-bit.
  • parameters=["-acodec", "pcm_s16le" if bit_depth == 16 else "pcm_s24le"] tells ffmpeg (which pydub uses under the hood) to use either 16-bit linear PCM (pcm_s16le) or 24-bit linear PCM (pcm_s24le), depending on the chosen bit depth.

You can call this function with the desired bit depth:

convert_mp3_to_wav("input.mp3", "output.wav", bit_depth=24)

This would convert the MP3 file to a 24-bit WAV file. If you don’t specify the bit_depth, it will default to 16-bit.

Remember, increasing the bit depth will result in a larger file size and may not always provide a noticeable improvement in quality, especially if the source material (in this case, an MP3 file) is of lower quality.
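
The size difference is easy to estimate, because uncompressed PCM data grows linearly with duration, sample rate, channel count, and bit depth. The sketch below works through that arithmetic; the one-minute, mono, 24 kHz figures are assumed example values, not measurements of gTTS output.

```python
def wav_size_bytes(duration_s, sample_rate, channels, bit_depth):
    """Approximate size of uncompressed PCM audio data (file header excluded)."""
    bytes_per_sample = bit_depth // 8
    return int(duration_s * sample_rate * channels * bytes_per_sample)

# Assumed example values: one minute of mono audio at 24 kHz
for depth in (16, 24):
    size = wav_size_bytes(60, 24000, 1, depth)
    print(f"{depth}-bit: {size / 1_000_000:.2f} MB")
```

Under these assumptions, moving from 16-bit to 24-bit increases the data size by half without adding detail that a lossy MP3 source no longer contains.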

My Initial Code for Converting PDF to Voice (PDF2VF)

# This code converts a .pdf to .mp3

# importing the modules
import os
import re
import sys
import subprocess
import importlib.util
import pyttsx3
from gtts import gTTS
from pydub import AudioSegment 
import PyPDF2
import pygame
import time

# path of the PDF file
 
# path = 'c:/myfolder/Project/mypdf.pdf'
path = 'mypdf.pdf'

required_modules = ['pyttsx3', 'gtts', 'pydub', 'PyPDF2', 're', 'os', 'pygame']

# Define voices for pyttsx3 (raw strings keep the registry backslashes literal)
voices = {
    'UK': r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-GB_HAZEL_11.0',
    'US': r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0'
}

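# Define language codes for gTTS
# Note: recent gTTS releases deprecate regional codes such as 'en-uk' and fall
# back to plain 'en'; there the accent is chosen with tld='co.uk' or tld='com'.
lang_codes = {
    'UK': 'en-uk',
    'US': 'en-us'
}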

# User's choice for region
user_choice = 'UK'  # or 'US'

def check_dependencies(modules):
    missing_modules = []
    for module in modules:
        if not importlib.util.find_spec(module):
            missing_modules.append(module)
    return missing_modules

def exit_if_dependencies_missing(modules):
    missing = check_dependencies(modules)
    if missing:
        print("Missing required modules:", missing)
        sys.exit(1)  # Exits the script with an error status

def is_ffmpeg_installed():
    try:
        # Try running a simple ffmpeg command and capture its output
        subprocess.run(["ffmpeg", "-version"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError or FileNotFoundError means ffmpeg is not installed or not in PATH
        return False        

def clean_text(text):
    # Replace end-of-line hyphens with an empty string
    text = re.sub(r'-\n', '', text)
    # Replace line breaks within paragraphs with a space
    text = re.sub(r'(?<!\n)\n(?!\n)', ' ', text)
    return text

# Function to convert MP3 to WAV
def convert_mp3_to_wav(mp3_file, wav_file, bit_depth):
    audio = AudioSegment.from_mp3(mp3_file)
    # audio.export(wav_file, format="wav")
    audio.export(wav_file, format="wav", parameters=["-acodec", "pcm_s16le" if bit_depth == 16 else "pcm_s24le"])

def read_text(text, region):
    engine = pyttsx3.init()
    engine.setProperty('voice', voices[region])  # select the voice ID configured for this region
    engine.say(text)
    engine.runAndWait()

# Function to save text to speech using gTTS
def save_text(text, region, mp3_file):
    # Convert text to speech and save as MP3
    tts = gTTS(text, lang=lang_codes[region])
    tts.save(mp3_file)
    # Convert the saved MP3 to WAV (wav_filename is a module-level variable defined below)
    convert_mp3_to_wav(mp3_file, wav_filename, 16)  # 16-bit output
    # A larger bit depth means a larger file, not better quality, when the input is a lossy MP3.
    return mp3_file

def readPDF(ffile, fpage):
    # creating a PdfReader object
    pdfReader = PyPDF2.PdfReader(ffile)
    # the page with which you want to start     
    from_page = pdfReader.pages[fpage]
    # extracting the text from the PDF 
    text = from_page.extract_text()
    # Clean the extracted text
    cleaned_text = clean_text(text)
    return cleaned_text

def play_mp3(file_path):
    # Initialize pygame mixer
    pygame.mixer.init()
    # Load the MP3 file
    pygame.mixer.music.load(file_path)
    # Play the MP3 file
    pygame.mixer.music.play()
    # Wait for the music to play before exiting
    while pygame.mixer.music.get_busy():
        time.sleep(1)

exit_if_dependencies_missing(required_modules)

# Check if ffmpeg is installed
if is_ffmpeg_installed():
    print("ffmpeg is installed.")
else:
    print("ffmpeg is not installed.")

# Extract base name for the output file
base_name = os.path.splitext(os.path.basename(path))[0]
mp3_filename = f"{base_name}.mp3"
wav_filename = f"{base_name}.wav"

# Read the PDF to text so it can be converted to voice
cleaned_text = readPDF(path, 0)

# reading the text to voice (option)
#read_text(cleaned_text, user_choice)

# Save the text to voice and get the filename of the saved MP3
mp3_file_path = save_text(cleaned_text, user_choice, mp3_filename)

# play the mp3 output (option)
# play_mp3(mp3_file_path)

Convert PDF to Voice Overview

Designing an architecture for a script that converts PDF content to voice involves several components, each responsible for handling different aspects of the process. Here’s a high-level architecture for such a script:

1. PDF Reader Module

  • Purpose: To read and extract text from a PDF file.
  • Components:
    • PDF Extraction Library: Use a library like PyPDF2 or PyMuPDF.
    • Text Extraction Function: Function to extract text from each page.
    • Error Handling: Manage cases where text extraction is not possible (e.g., scanned PDFs).
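
The error-handling component can be sketched as a small function that joins per-page text and fails loudly when nothing was extracted, which is the typical symptom of a scanned PDF. combine_page_texts is a hypothetical helper; the sample list stands in for the result of calling extract_text() on every page.

```python
def combine_page_texts(page_texts):
    """Join per-page text, skipping empty pages; raise if nothing was extracted."""
    non_empty = [t.strip() for t in page_texts if t and t.strip()]
    if not non_empty:
        raise ValueError("No extractable text found; the PDF may consist of scanned images.")
    return "\n\n".join(non_empty)

# Stand-in for: [page.extract_text() for page in PyPDF2.PdfReader(path).pages]
pages = ["First page text.", "", None, "Third page text."]
print(combine_page_texts(pages))
```

Catching the ValueError in the calling code gives a natural place to suggest OCR to the user.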

2. Text Processing Module

  • Purpose: To clean and format the extracted text for TTS (Text-to-Speech).
  • Components:
    • Text Cleaning Functions: Remove or replace unwanted characters, handle hyphenation, and manage line breaks.
    • Markdown or HTML Parser (Optional): If the PDF contains structured text like Markdown or HTML, parse it to handle elements like headers, lists, etc.
    • Text Segmentation: Break text into manageable chunks for TTS processing, if necessary.
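
Text segmentation can be sketched as follows. This is one possible approach, not a requirement of any TTS library: chunk_text is a hypothetical helper, the 200-character default is an arbitrary assumption, and sentence detection is a simple punctuation heuristic.

```python
import re

def chunk_text(text, max_chars=200):
    """Split text into chunks of at most max_chars characters,
    breaking at sentence ends where possible."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single over-long sentence is hard-split to respect the limit
            while len(sentence) > max_chars:
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            current = sentence
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("One. Two. Three.", max_chars=10))
```

Each chunk can then be passed to the TTS engine separately, which also makes it easier to retry a single failed request.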

3. Text-to-Speech (TTS) Module

  • Purpose: Convert the processed text into speech.
  • Components:
    • TTS Engine: Choose a TTS library like gTTS or pyttsx3.
    • Voice and Language Configuration: Functionality to select different voices or languages.
    • Speech Synthesis Function: Convert text chunks to speech.

4. Audio Output Module

  • Purpose: Handle the output of the TTS module.
  • Components:
    • Audio Format Conversion: If necessary, convert the TTS output to desired formats (e.g., WAV, MP3) using pydub.
    • File Saving: Save the audio output to disk.
    • Playback Functionality (Optional): Include the ability to play back the audio directly from the script.

5. User Interface (UI) or Command-Line Interface (CLI)

  • Purpose: Provide an interface for users to interact with the script.
  • Components:
    • Input Options: Allow users to specify the PDF file, voice options, and output format.
    • Execution Commands: Facilitate the conversion process through a series of commands or buttons.
    • Error Messages and Logs: Display error messages and logs for user awareness.
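
For the command-line variant, the standard library’s argparse covers the input options listed above. The sketch below is illustrative only: the argument names (--page, --accent, --format) are assumptions chosen for this example, not part of the script shown earlier.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Convert a PDF page to speech.")
    parser.add_argument("pdf", help="path to the input PDF file")
    parser.add_argument("--page", type=int, default=0,
                        help="zero-based page index (default: 0)")
    parser.add_argument("--accent", choices=["UK", "US"], default="UK",
                        help="voice region")
    parser.add_argument("--format", choices=["mp3", "wav"], default="mp3",
                        dest="output_format", help="output audio format")
    return parser

# Parse a sample argument list; a real script would call parse_args() with no arguments
args = build_parser().parse_args(["mypdf.pdf", "--page", "2", "--format", "wav"])
print(args.pdf, args.page, args.accent, args.output_format)
```

argparse also generates the --help text automatically, which covers the help command mentioned under documentation.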

6. Dependency Management and System Check

  • Purpose: Ensure that all required dependencies are installed and the system meets the requirements.
  • Components:
    • Dependency Check Function: Check if libraries like PyPDF2, gTTS, pydub, pygame, etc., are installed.
    • System Requirements Check: Verify the presence of necessary tools like ffmpeg.
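
A dependency check is most useful when it runs before the third-party imports, so that a missing package produces a readable message rather than an ImportError traceback. The following is a minimal sketch of that ordering; find_missing_modules is a hypothetical helper name.

```python
import importlib.util

def find_missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Check first, import afterwards
required = ["PyPDF2", "gtts", "pydub", "pygame"]
missing = find_missing_modules(required)
if missing:
    print("Missing required modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Standard-library modules such as re and os do not need to be in the list, since they ship with Python.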

7. Documentation and Help

  • Purpose: Provide users with guidance on how to use the script.
  • Components:
    • User Manual: Detailed documentation on how to use the script.
    • Help Command: A command-line argument or a UI section that displays usage instructions.

Architectural Workflow:

  1. User Input: The user inputs a PDF file and selects desired voice and output settings.
  2. PDF Reading: The script reads text from the PDF using the PDF Reader Module.
  3. Text Processing: The extracted text is cleaned and formatted.
  4. Text-to-Speech Conversion: The processed text is converted into speech.
  5. Audio Output Handling: The speech is saved to a file and/or played back.
  6. User Feedback: The user is informed of the process completion and any errors.

Optional Enhancements:

  • Batch Processing: Ability to process multiple PDFs in a batch.
  • Advanced Text Parsing: Handle complex PDF structures or embedded media.
  • Custom Voice Models: If using advanced TTS services, allow the use of custom voice models.

This architecture provides a structured approach, modular design, and allows for future enhancements or modifications based on specific requirements or new features.

Markdown to mp3 using gTTS

Parsing Markdown and converting it to speech while handling elements like headers and lists is a multi-step process. You’ll need to parse the Markdown to extract and interpret different elements, then convert the interpreted text to speech. Here’s a high-level overview of how you might approach this:

  1. Parse the Markdown: Use a Markdown parser to convert Markdown text into a structured format that you can manipulate in Python. A popular choice for this is the markdown library.
  2. Interpret Markdown Elements: After parsing, you’ll need to handle different Markdown elements (like headers, lists, etc.) to convert them into a format that makes sense when read aloud. For example, you might prepend “Header: ” before headers or “List item: ” before list items.
  3. Convert Text to Speech: Once you’ve got the interpreted text, use a text-to-speech library like gTTS to convert the text to speech.

Here’s an example Python script that demonstrates this process:

Step 1: Install Required Packages

You’ll need to install markdown and gtts if you haven’t already:

pip install markdown gtts

Step 2: Python Script

import markdown
from gtts import gTTS
import os

def markdown_to_speech(md_text, output_filename):
    # Convert Markdown text to HTML
    html = markdown.markdown(md_text)
    
    # Process HTML to create a speech-friendly version
    # This can be as simple or as complex as you need
    # For now, we'll just replace some HTML tags with readable text
    speech_text = html.replace('<h1>', 'Header one: ').replace('</h1>', '. ')
    speech_text = speech_text.replace('<h2>', 'Header two: ').replace('</h2>', '. ')
    speech_text = speech_text.replace('<ul>', '').replace('</ul>', '')
    speech_text = speech_text.replace('<li>', 'List item: ').replace('</li>', '. ')
    speech_text = speech_text.replace('<p>', '').replace('</p>', '. ')

    # Convert processed text to speech
    tts = gTTS(speech_text, lang='en')
    tts.save(output_filename)

# Example Markdown text
md_text = """
# Heading One
## Heading Two
Regular text.
- List item 1
- List item 2
"""

# Convert Markdown to speech
markdown_to_speech(md_text, "output.mp3")

# Play the MP3 file (assuming pygame is still being used)
play_mp3("output.mp3")

In this example, the markdown_to_speech function:

  • Converts Markdown to HTML using the markdown library.
  • Processes the HTML to replace certain tags with speech-friendly text.
  • Uses gTTS to convert the processed text to speech and save it as an MP3 file.

This script is a basic starting point. Depending on the complexity of your Markdown content and how you want different elements to be spoken, you might need to enhance the HTML processing part.

For instance, handling nested lists, code blocks, or links might require more sophisticated text manipulation.
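
One way to make the HTML step sturdier than chained replace calls is to walk the HTML with the standard library’s HTMLParser. The sketch below is an alternative under stated assumptions, not the method used above: SpeechTextBuilder and html_to_speech_text are hypothetical names, and the spoken cues are the same ones used in the earlier example. It takes HTML as input, so the output of markdown.markdown() could be passed to it.

```python
import re
from html.parser import HTMLParser

class SpeechTextBuilder(HTMLParser):
    """Collect text from HTML, adding spoken cues for a few tags."""
    PREFIXES = {"h1": "Header one: ", "h2": "Header two: ", "li": "List item: "}
    SENTENCE_TAGS = {"h1", "h2", "li", "p"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIXES:
            self.parts.append(self.PREFIXES[tag])

    def handle_endtag(self, tag):
        if tag in self.SENTENCE_TAGS:
            # Add a full stop only if the text does not already end with punctuation
            current = "".join(self.parts).rstrip()
            self.parts.append(" " if current.endswith((".", "!", "?", ":")) else ". ")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_speech_text(html):
    builder = SpeechTextBuilder()
    builder.feed(html)
    builder.close()
    # Collapse whitespace left over from the HTML layout
    return re.sub(r"\s+", " ", "".join(builder.parts)).strip()

sample_html = "<h1>Title</h1>\n<ul>\n<li>One</li>\n<li>Two <a href='x'>link</a></li>\n</ul>"
print(html_to_speech_text(sample_html))
```

Unknown tags such as links are simply dropped while their text is kept, and a paragraph that already ends with a full stop no longer gets a second one.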

Adding a User Interface

Creating a simple graphical user interface (GUI) in Python to specify the PDF file, voice options, and output format for a PDF-to-voice conversion script can be done using a library like tkinter, which is included in standard Python installations.

Below is a basic example of how such a UI might look. This script will create a window where users can select a PDF file, choose a voice option, and select an output format.

First, ensure you have tkinter available in your Python environment. It’s typically included with Python, so you shouldn’t need to install anything extra.

Python Script with tkinter UI

import tkinter as tk
from tkinter import filedialog, messagebox, ttk

def convert_pdf():
    pdf_path = file_path_entry.get()
    voice = voice_option.get()
    output_format = format_option.get()
    
    # Placeholder for conversion function
    # You would call your PDF to voice conversion function here
    print(f"Converting {pdf_path} with voice {voice} to {output_format} format.")
    
    messagebox.showinfo("Conversion Started", f"Converting {pdf_path} to {output_format}.")

# Set up the main tkinter window
root = tk.Tk()
root.title("PDF to Voice Converter")

# Create a frame for file selection
file_frame = ttk.Frame(root, padding="10")
file_frame.grid(row=0, column=0, sticky=(tk.W, tk.E))

# File path entry
file_path_entry = ttk.Entry(file_frame, width=50)
file_path_entry.grid(row=0, column=1, sticky=(tk.W, tk.E))

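# File selection button
def select_pdf():
    # Ask for a PDF; keep the current entry if the dialog is cancelled
    selected = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
    if selected:
        file_path_entry.delete(0, tk.END)  # clear any previous path before inserting
        file_path_entry.insert(0, selected)

file_select_button = ttk.Button(file_frame, text="Select PDF", command=select_pdf)
file_select_button.grid(row=0, column=2)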

# Voice selection
voice_option = tk.StringVar()
voice_label = ttk.Label(root, text="Choose Voice:")
voice_label.grid(row=1, column=0, sticky=tk.W, padx=10)
voice_combobox = ttk.Combobox(root, textvariable=voice_option, 
                              values=["UK Male", "UK Female", "US Male", "US Female"])
voice_combobox.grid(row=1, column=1, sticky=(tk.W, tk.E), padx=10)
voice_combobox.current(0)

# Output format selection
format_option = tk.StringVar(value="MP3")
format_label = ttk.Label(root, text="Output Format:")
format_label.grid(row=2, column=0, sticky=tk.W, padx=10)
format_combobox = ttk.Combobox(root, textvariable=format_option, 
                               values=["MP3", "WAV"])
format_combobox.grid(row=2, column=1, sticky=(tk.W, tk.E), padx=10)
format_combobox.current(0)

# Convert button
convert_button = ttk.Button(root, text="Convert", command=convert_pdf)
convert_button.grid(row=3, column=1, sticky=tk.E, padx=10, pady=10)

# Run the application
root.mainloop()

How the UI Works:

  • File Selection: Users can select a PDF file, and its path will be displayed in an entry box.
  • Voice Option: A dropdown to select the desired voice.
  • Output Format: A dropdown to choose between MP3 and WAV formats.
  • Convert Button: When clicked, it triggers the conversion process (currently, it just prints the selections to the console).

Integrating with Your Conversion Script:

Replace the print statement in convert_pdf with a call to your actual PDF-to-voice conversion function, passing pdf_path, voice, and output_format as arguments.

Notes:

  • This script provides a basic UI without actual PDF-to-voice conversion logic. You’ll need to integrate it with your existing conversion code.
  • tkinter is quite flexible, and you can expand this UI with additional features like progress bars, more complex settings, or better file handling as needed.
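
When wiring the dropdown to gTTS, the label from the UI has to be translated into engine settings. The sketch below shows one possible mapping: gtts_options_for is a hypothetical helper, and it assumes a gTTS release that accepts the tld argument for accents. gTTS offers no male or female choice, so the second word of the label is ignored here; selecting by gender would require system voices through pyttsx3.

```python
def gtts_options_for(voice_label):
    """Map a UI label such as 'UK Female' to keyword arguments for gTTS."""
    parts = voice_label.split()
    region = parts[0].upper() if parts else ""
    tld_by_region = {"UK": "co.uk", "US": "com"}
    if region not in tld_by_region:
        raise ValueError(f"Unsupported voice region: {region!r}")
    return {"lang": "en", "tld": tld_by_region[region]}

print(gtts_options_for("UK Female"))
```

Inside convert_pdf, the result could be unpacked as gTTS(text, **gtts_options_for(voice)).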

Code Modules

To build a Python script that takes input from the user for converting a PDF to voice, we can structure the code into several modules. Each module will handle a specific part of the process, such as reading the PDF, processing the text, converting it to speech, and playing or saving the audio. Let’s break it down:

1. PDF Reader Module

This module will handle the extraction of text from a given PDF file.

import PyPDF2

def read_pdf(file_path, page_num=0):
    """
    Read text from a specified page of a PDF file.
    
    :param file_path: Path to the PDF file
    :param page_num: Page number to extract text from (default is the first page)
    :return: Extracted text from the page
    """
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        page = pdf_reader.pages[page_num]
        text = page.extract_text()
    return text

2. Text Processing Module

This module will clean and format the extracted text for better speech synthesis.

import re

def clean_text(text):
    """
    Clean and format extracted text for TTS.
    
    :param text: Raw text extracted from PDF
    :return: Cleaned and formatted text
    """
    # Example: replace end-of-line hyphens with an empty string and remove extra spaces
    text = re.sub(r'-\n', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text
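
To make the effect of the two substitutions concrete, here is a small usage sketch. The sample string is invented for illustration, and clean_text is repeated so that the snippet runs on its own.

```python
import re


def clean_text(text):
    """Same logic as the module above, repeated so this snippet is standalone."""
    text = re.sub(r'-\n', '', text)   # rejoin words split by a line-end hyphen
    text = re.sub(r'\s+', ' ', text)  # collapse newlines and repeated spaces
    return text


sample = "The conver-\nsion   is\ncomplete."
print(clean_text(sample))  # The conversion is complete.
```

Note that the first rule also drops the hyphen from a genuine compound that happens to break across lines: "text-to-\nspeech" becomes "text-tospeech".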

3. Text-to-Speech Module

This module will use gTTS to convert text to speech.

from gtts import gTTS

def text_to_speech(text, lang='en', output_file='output.mp3'):
    """
    Convert text to speech and save as an audio file.
    
    :param text: Text to convert to speech
    :param lang: Language for TTS
    :param output_file: Filename to save the audio
    """
    tts = gTTS(text, lang=lang)
    tts.save(output_file)

4. Main Script

This is where you combine all the modules and create a script that takes user input.

def main():
    print("PDF to Voice Converter")
    file_path = input("Enter the path to the PDF file: ")
    page_num = int(input("Enter the page number to read (starting from 0): "))
    output_file = input("Enter the output audio file name (e.g., output.mp3): ")

    # Read and process PDF
    text = read_pdf(file_path, page_num)
    cleaned_text = clean_text(text)

    # Convert to speech
    text_to_speech(cleaned_text, output_file=output_file)

    print(f"Conversion completed. Audio saved as {output_file}")

if __name__ == "__main__":
    main()

Running the Script

  1. Execute the script, and it will prompt you for the path to a PDF file, the page number you want to read, and the name of the output audio file.
  2. The script reads and processes the specified page from the PDF, cleans up the text, and then uses gTTS to convert it into speech, saving the result as an MP3 file.
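
If answering prompts is inconvenient, for example when calling the converter from another script, the same three inputs could be read from the command line instead. This is a minimal sketch using the standard library's argparse; the option names --page and --output and the function parse_cli_args are assumptions, not part of the original script.

```python
import argparse


def parse_cli_args(argv=None):
    """Parse the PDF path, page number, and output name from the command line."""
    parser = argparse.ArgumentParser(description="PDF to Voice Converter")
    parser.add_argument("pdf_path", help="Path to the PDF file")
    parser.add_argument("--page", type=int, default=0,
                        help="Zero-based page number to read (default: 0)")
    parser.add_argument("--output", default="output.mp3",
                        help="Output audio file name (default: output.mp3)")
    return parser.parse_args(argv)


# Passing a list stands in for what the user would type after the script name.
args = parse_cli_args(["report.pdf", "--page", "2"])
print(args.pdf_path, args.page, args.output)
```

The parsed values could then replace the three input() calls in main().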

Dependencies

Make sure you have PyPDF2 and gTTS installed:

pip install PyPDF2 gtts

Notes

  • This script is a basic implementation. You can expand it to handle multiple pages, different languages, or more sophisticated text processing.
  • Error handling (e.g., for invalid file paths or page numbers) is minimal in this example and should be expanded for a robust application.
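
One way to approach both the multi-page and the invalid-page-number points is to parse a page specification up front and reject anything out of range before extracting text. The sketch below is only one possible approach: parse_page_spec and its "0-2,5" syntax are hypothetical and not part of the script above. The total page count would come from len(pdf_reader.pages), and the resulting indices could be passed to read_pdf one at a time.

```python
def parse_page_spec(spec, total_pages):
    """Turn a string such as "0-2,5" into a sorted list of zero-based page indices.

    Raises ValueError for malformed input or pages outside 0..total_pages-1.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            raise ValueError(f"Invalid range: {part}")
        if start < 0 or end >= total_pages:
            raise ValueError(f"Page out of range (0-{total_pages - 1}): {part}")
        pages.update(range(start, end + 1))
    if not pages:
        raise ValueError("No pages specified.")
    return sorted(pages)


print(parse_page_spec("0-2,5", total_pages=10))  # [0, 1, 2, 5]
```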

Code for PDF 2 VOICE with a UI (PDF2VFU)


import os
import tkinter as tk
from tkinter import filedialog, messagebox, ttk
from gtts import gTTS, gTTSError
import re
import PyPDF2
import pygame
from pydub import AudioSegment

output_mp3_path = ""  # Global variable to store the full path of the output MP3 file

def check_gtts_connectivity():
    try:
        # Attempt a small TTS conversion
        test_tts = gTTS("test", lang='en')
        test_tts.save("test.mp3")
        os.remove("test.mp3")  # Clean up the test file
        return True
    except gTTSError as e:
        print(f"gTTS connectivity check failed: {e}")
        return False

def read_pdf(file_path, page_num=0):
    """Return the text of one page (zero-based index) of a PDF file."""
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        page = pdf_reader.pages[page_num]
        text = page.extract_text()
    return text

def clean_text(text):
    """Rejoin words hyphenated at line ends and collapse runs of whitespace."""
    text = re.sub(r'-\n', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text

def text_to_speech(text, lang='en', output_file='output.mp3'):
    """Convert text to speech with gTTS and save it as an MP3 file."""
    tts = gTTS(text, lang=lang)
    tts.save(output_file)

def play_mp3():
    try:
        # Initialise the mixer inside the try block: it raises pygame.error
        # when no audio device is available
        pygame.mixer.init()
        pygame.mixer.music.load(output_mp3_path.replace('/', os.sep).replace('\\', os.sep))
        pygame.mixer.music.play()
        stop_button.config(state=tk.NORMAL)  # Enable the stop button when playing
    except pygame.error as e:
        status_label.config(text=f"Error playing file: {e}")
    # You may want to handle the end of the playback or looping the playback as needed.

def stop_mp3():
    pygame.mixer.music.stop()
    stop_button.config(state=tk.DISABLED)  # Disable the stop button once stopped

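def convert_mp3_to_wav(mp3_file_path):
    # Swap only the file extension. str.replace would also alter '.mp3' elsewhere
    # in the path, and would overwrite the source if the extension were missing.
    wav_file_path = os.path.splitext(mp3_file_path)[0] + '.wav'
    audio = AudioSegment.from_mp3(mp3_file_path)
    audio.export(wav_file_path, format="wav")
    return wav_file_path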

def select_pdf():
    file_path = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
    file_path_entry.delete(0, tk.END)
    file_path_entry.insert(0, file_path)

def start_conversion():
    global output_mp3_path

    # Validate the inputs before doing any network or file work
    pdf_path = file_path_entry.get().strip()
    if not pdf_path:
        status_label.config(text="Please select a PDF file.")
        return

    try:
        page_num = int(page_num_entry.get())
    except ValueError:
        status_label.config(text="Page number must be a whole number.")
        return

    language = lang_option.get()
    output_file_name = output_file_entry.get().strip()
    if not output_file_name:
        status_label.config(text="Please enter a name for the output file.")
        return

    # Check gTTS connectivity before starting the conversion
    if not check_gtts_connectivity():
        status_label.config(text="gTTS connectivity check failed. Please check your internet connection.")
        return

    status_label.config(text="Converting...")
    root.update_idletasks()  # Show the message before the blocking work starts

    # If no directory is specified in output_file_name, save next to the PDF
    if not os.path.dirname(output_file_name):
        target_path = os.path.join(os.path.dirname(pdf_path), output_file_name)
    else:
        target_path = output_file_name

    try:
        # Call the PDF reading module
        text = read_pdf(pdf_path, page_num)
        cleaned_text = clean_text(text)
        if not cleaned_text.strip():
            status_label.config(text="No extractable text found on that page.")
            return
        # Call the TTS conversion module
        text_to_speech(cleaned_text, lang=language, output_file=target_path)
    except (OSError, IndexError, gTTSError) as e:
        # Covers unreadable files, out-of-range page numbers, and TTS failures
        status_label.config(text=f"Conversion failed: {e}")
        return

    output_mp3_path = target_path  # Record the path only after the file was created

    # Update the status label
    if convert_to_wav_var.get() == 1:
        # Convert the MP3 to WAV
        wav_file_path = convert_mp3_to_wav(output_mp3_path)
        status_label.config(text=f"Conversion completed. MP3 and WAV saved as {output_mp3_path} and {wav_file_path}")
    else:
        status_label.config(text=f"Conversion completed. MP3 saved as {output_mp3_path}")
    play_button.config(state=tk.NORMAL)  # Enable the play button


def show_help():
    help_text = (
        "PDF to Voice Converter Help\n\n"
        "Select PDF: Click to choose a PDF file.\n\n"
        "Page Number: Enter the page number in the PDF you want to convert to voice (starting from 0).\n\n"
        "Language: Select the language for the text-to-speech conversion.\n\n"
        "Output File Name: Enter the name for the output audio file (default extension is .mp3).\n\n"
        "Convert to WAV: Tick to additionally convert the .mp3 to .wav \n\n"
        "Convert: Click to start the conversion process.\n\n"
        "Play MP3: Click to play the converted audio file.\n\n"
        "Stop MP3: Click to stop the play of the converted audio file.\n\n"
        "Note: Ensure you have an active internet connection for the conversion."
    )
    messagebox.showinfo("Help - PDF to Voice Converter", help_text)    

root = tk.Tk()
root.title("PDF to Voice Converter")

# PDF file selection
file_path_entry = ttk.Entry(root, width=40)
file_path_entry.grid(row=0, column=1)
ttk.Button(root, text="Select PDF", command=select_pdf).grid(row=0, column=2)

# Page number
ttk.Label(root, text="Page Number:").grid(row=1, column=0)
page_num_entry = ttk.Entry(root)
page_num_entry.grid(row=1, column=1)
page_num_entry.insert(0, '0')  # Set default value to 0

# Language selection
ttk.Label(root, text="Language:").grid(row=2, column=0)
lang_option = ttk.Combobox(root, values=["en", "es", "fr"])
lang_option.grid(row=2, column=1)
lang_option.current(0)

# Output file name
ttk.Label(root, text="Output File Name:").grid(row=3, column=0)
output_file_entry = ttk.Entry(root)
output_file_entry.grid(row=3, column=1)
output_file_entry.insert(0, 'output.mp3')  # Set default value to 'output.mp3'

# Checkbox for MP3 to WAV conversion
convert_to_wav_var = tk.IntVar()
convert_to_wav_checkbox = ttk.Checkbutton(root, text="Convert to WAV", variable=convert_to_wav_var)
convert_to_wav_checkbox.grid(row=4, column=1, pady=5)

# Start conversion button
ttk.Button(root, text="Convert", command=start_conversion).grid(row=5, column=1)

# Status label for updates
status_label = ttk.Label(root, text="")
status_label.grid(row=6, column=0, columnspan=2)

# Button to play the MP3 file
play_button = ttk.Button(root, text="Play MP3", command=play_mp3, state=tk.DISABLED)
play_button.grid(row=7, column=1, pady=5)

# Stop button for stopping the MP3 playback
stop_button = ttk.Button(root, text="Stop MP3", command=stop_mp3, state=tk.DISABLED)
stop_button.grid(row=8, column=1, pady=5)

# Help button
help_button = ttk.Button(root, text="Help", command=show_help)
help_button.grid(row=9, column=1, pady=5)

root.mainloop()

Summary

This Tkinter-based Python application is designed for converting text from a PDF file to speech and saving the output as an audio file.

Here are the main components and functionalities of the code:

  1. PDF Selection and Validation:
    • A field where the user can input or select the path to a PDF file.
    • Validation to ensure a PDF file is selected before proceeding.
  2. Page Number Input:
    • An input field for specifying the page number in the PDF to be converted to speech. It defaults to ‘0’ (the first page).
  3. Language Selection:
    • A dropdown menu allowing the user to select the language for the text-to-speech conversion.
  4. Output File Specification:
    • An entry field for specifying the name of the output audio file, with a default value of ‘output.mp3’.
    • Validation to ensure an output file name is provided.
  5. MP3 to WAV Conversion Option:
    • A checkbox giving the user the option to convert the MP3 output file to a WAV file.
  6. Conversion and Playback Controls:
    • A “Convert” button that starts the conversion process using gTTS (Google Text-to-Speech).
    • Once the MP3 file is created, a “Play” button becomes active, allowing the user to play the audio.
    • A “Stop” button to stop the audio playback.
    • After conversion, if the user selected the option, the MP3 file is also converted to WAV format using pydub.
  7. Help and Status Information:
    • A “Help” button displays instructions and information about using the application.
    • A status label updates the user about the current process or any errors.
  8. Core Functionalities:
    • read_pdf: Extracts text from the specified page of the selected PDF.
    • clean_text: Cleans and formats the extracted text.
    • text_to_speech: Converts the cleaned text to speech and saves it as an MP3 file.
    • convert_mp3_to_wav (if applicable): Converts the MP3 file to a WAV file.
    • play_mp3: Plays the audio file using pygame.
    • stop_mp3: Stops the audio playback.
  9. Error Handling and Connectivity Check:
    • Reports an empty PDF path, a missing output file name, a failed gTTS connectivity check, and audio playback errors through the status label.
    • Before converting, the application confirms that a PDF path and an output file name were entered and that gTTS is reachable over the internet. It does not test up front whether the file at that path actually exists.
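
The checks described in item 9 could be gathered into a single function that runs before any conversion starts. The following is a sketch only: validate_inputs is a hypothetical helper, not part of the application above, and it returns an error message (or None) so that the caller can show it in the status label.

```python
import os


def validate_inputs(pdf_path, page_text, output_name):
    """Return an error message for the first problem found, or None if all is well."""
    if not pdf_path:
        return "Please select a PDF file."
    if not os.path.isfile(pdf_path):
        return f"File not found: {pdf_path}"
    if not pdf_path.lower().endswith(".pdf"):
        return "The selected file is not a PDF."
    try:
        page_num = int(page_text)
    except ValueError:
        return "Page number must be a whole number."
    if page_num < 0:
        return "Page number must be 0 or greater."
    if not output_name.strip():
        return "Please enter a name for the output file."
    return None
```

In start_conversion, a non-None result would be written to status_label and the function would return early.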

This application provides a simple interface for turning the text of a PDF page into an audio file. It lets the user customize the conversion by selecting the language, optionally producing a WAV copy alongside the MP3, and playing back the converted audio.