added format check for method read_data in rawread #70

Charliechen1 · 2018-07-11T07:58:25Z

Python's audioop.lin2lin raises an error if the length of the data is not divisible by old_width, and it is not convenient to check the length of the audio before using the model, especially when large batches of audio files are used in machine learning tasks. Therefore, I have added a patch that pads the input audio data when its length is not a multiple of old_width. Thank you for taking my suggestion into consideration, and the project is truly intensive for me. 👍

sampsyo

Interesting! Thank you for the contribution—this looks promising.

Is it possible to provide a sample audio file where this fix is necessary? It would be great to have that documented here so we can come back to it if something goes wrong in the future.

I'd also love to hear more detail about why padding with the byte 0xFF is the right thing. I admit I'm not too knowledgeable about the sample format, but I would have guessed that 0x00 (null) would have been the right padding byte.

Finally, I made some nitpicky Python style comments inline.

Thanks again!

sampsyo · 2018-07-11T13:29:08Z

audioread/rawread.py

+
+            remainder = len(data) % old_width 
+            if remainder != 0 :
+                data = data + PATCH_BYTE*(old_width-remainder)


Here are some very low-level style (i.e., PEP8) comments:

Please remove the whitespace on the blank line.

Please remove the space between the 0 and the : in the if statement.

Please add spaces around the binary operators * and -.
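
For reference, a minimal sketch of the padding lines with the three style comments above applied. The standalone function `pad_to_width` is a hypothetical wrapper used only to make the snippet runnable; in the pull request these lines sit inside `read_data` in rawread.py.

```python
# Byte used to pad data whose length is not a multiple of the sample width
# (value taken from the diff under review).
PATCH_BYTE = b'\xff'


def pad_to_width(data, old_width):
    """Hypothetical helper: pad data to a multiple of old_width bytes."""
    remainder = len(data) % old_width
    if remainder != 0:
        data = data + PATCH_BYTE * (old_width - remainder)
    return data
```

Data that already has a whole number of samples passes through unchanged.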

sampsyo · 2018-07-11T13:29:42Z

audioread/rawread.py

@@ -23,6 +23,7 @@

 # Produce two-byte (16-bit) output samples.
 TARGET_WIDTH = 2
+PATCH_BYTE = b'\xff'


The comment above doesn't apply to this constant. So please add a blank line above it and, ideally, write a brief sentence explaining what this is useful for.

Thank you very much for your advice. I am a fresh graduate, so there is still a lot for me to learn, lol. The suggestions are quite helpful to me.

Charliechen1 · 2018-07-12T03:14:47Z

Here for your reference:
I print the data and get:
b'\xff\xff\xff\xff\xfe\xff\xfc\xff\xfc\xff\xfc'
And I figured out that it's due to a broken download.
Therefore, would it be better to raise a warning under this circumstance?
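
To make the printed data concrete, the sketch below decodes it as signed 16-bit little-endian samples. That sample format is an assumption; the thread does not state it. Under that assumption the 11 bytes hold five whole samples plus one stray byte, and the choice of padding byte changes how the stray byte would be read.

```python
import struct

data = b'\xff\xff\xff\xff\xfe\xff\xfc\xff\xfc\xff\xfc'
width = 2  # assumed sample width in bytes (16-bit samples)

# Split the data into whole samples and the trailing partial sample.
whole = len(data) - len(data) % width
samples = struct.unpack('<%dh' % (whole // width), data[:whole])
leftover = data[whole:]

print(samples)   # (-1, -1, -2, -4, -4)
print(leftover)  # b'\xfc'

# How the partial sample would decode with each candidate padding byte.
padded_ff = struct.unpack('<h', leftover + b'\xff')[0]
padded_00 = struct.unpack('<h', leftover + b'\x00')[0]
print(padded_ff, padded_00)  # -4 252
```

For this particular file, padding with 0xFF happens to continue the run of small negative values, while 0x00 would produce a spike of 252. That is specific to this data and the assumed format, not a general argument for either byte.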

sampsyo · 2018-07-12T13:30:17Z

Hmm; perhaps! But on the other hand, another reasonable (silent) fix might be to round down instead of up—that is, to drop the last (partial) sample if it exists. Would that make sense to you?
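
A minimal sketch of the round-down alternative, assuming it is applied to the same `data` and `old_width` values as the padding code. The function name `drop_partial_sample` is hypothetical and not part of audioread.

```python
def drop_partial_sample(data, old_width):
    """Hypothetical helper: discard trailing bytes that do not form a whole sample."""
    remainder = len(data) % old_width
    if remainder != 0:
        # Keep only the bytes that make up complete samples.
        data = data[:-remainder]
    return data
```

With the 11-byte example from this thread and a width of 2, this keeps the first 10 bytes and drops the final one.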

Charliechen1 · 2018-07-13T04:23:09Z

It should work.

sampsyo · 2018-07-13T12:14:49Z

OK, great! Want to give it a try and see if it works on the file you have?

Charliechen1 · 2018-07-16T09:11:38Z

Sure~

added format check for method read_data in rawread

0fb2adf

sampsyo reviewed Jul 11, 2018

some format modification and change patching to warning

39aa139
