tiff header

Started by Blaumeise00, March 29, 2023, 10:28:02 AM


Blaumeise00

Hello,

I am wondering whether anyone has an example JPEG image where the IFD0 offset stored in the TIFF header is not 8.

I am programming a PHP script that reads JPEG images, and I count my offsets differently than Phil does. I think that Phil reads file addresses that all begin at the same place. I begin counting within the E1 (APP1) segment, after the TIFF header. Thus, I would also have to implement a file offset counter in order to find an IFD offset within the file rather than within E1.

If no one has a sample image, is anyone able to confirm that the address is always within the E1 segment?

Best wishes.

Phil Harvey

Various tiff-based files routinely use a different offset for IFD0.  I don't know if I have any baseline tiff examples of this, but I'm sure they exist.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Blaumeise00

Hi Mr. Harvey,

I actually found an image of mine with an address not starting at 08000000! I have an older microscope that makes JPEG photos, and the address begins further away from the TIFF header. My code is reading the address since I have it coded to read only at the matching address. So this is a good learning experience. The address is still within the E1 (APP1) segment. I don't think that the address should be somewhere else in the file. That would be silly, and I think it should be invalid, but whatever. At least I have experience with a different address.

Meanwhile, this image has a red notice next to the orientation tag: IFD0-04 Orientation (seq)
what is this red (seq)? is this a flag for erroneous data or what? the tag appears normal to me.
12 01 03 00 01 00 00 00 01 00 00 00

It is twelve bytes and contains a tag, a type, a component count, and a value. I am not sure what the red (seq) is about.

also, the image contains another red (seq) flag and it is strange:
00 00 04 00 01 00 00 00 00 00 00 00
it appears that this IFD0 entry has no marker (00 00). Is this common? i have no idea why a null marker would exist.
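
For readers following along, the two 12-byte entries quoted above can be split into their fields with a few lines of Python. This is only a minimal sketch, not ExifTool's code and not the PHP script under discussion. The function name `decode_ifd_entry` and the small type table are made up for illustration, and little-endian byte order is assumed because the orientation tag 0x0112 appears in the dump as `12 01`.

```python
import struct

# A few TIFF field type codes and their names (subset of the TIFF 6.0 list).
TYPE_NAMES = {1: "BYTE", 2: "ASCII", 3: "SHORT", 4: "LONG", 5: "RATIONAL"}


def decode_ifd_entry(entry: bytes, little_endian: bool = True) -> dict:
    """Split one 12-byte IFD entry into tag, type, count and value field."""
    if len(entry) != 12:
        raise ValueError("an IFD entry is exactly 12 bytes")
    prefix = "<" if little_endian else ">"
    # 2-byte tag, 2-byte type, 4-byte component count
    tag, type_code, count = struct.unpack(prefix + "HHI", entry[:8])
    return {
        "tag": tag,
        "type": TYPE_NAMES.get(type_code, f"unknown({type_code})"),
        "count": count,
        # Last 4 bytes: the value itself if it fits, otherwise an offset to it.
        "value_field": entry[8:],
    }


# The two entries from the post (byte order assumed little-endian).
orientation = decode_ifd_entry(bytes.fromhex("12 01 03 00 01 00 00 00 01 00 00 00"))
null_tag = decode_ifd_entry(bytes.fromhex("00 00 04 00 01 00 00 00 00 00 00 00"))

print(hex(orientation["tag"]), orientation["type"], orientation["count"])
print(hex(null_tag["tag"]), null_tag["type"], null_tag["count"])
```

Under that assumption, the first entry decodes as tag 0x0112 with type SHORT and count 1, and the second as tag 0x0000 with type LONG and count 1.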

The htmldump command issues a warning about Unrecognized Maker Notes. I guess that I would need to decode these notes for the benefit of others, or maybe someone has already documented these maker notes.

Phil Harvey

Quote from: Blaumeise00 on April 02, 2023, 07:23:08 AM
I actually found an image of mine with an address not starting at 08000000!

This is most likely a byte-ordering problem.

Quote
what is this red (seq)?

See the HtmlDump documentation.

Quote
also, the image contains another red (seq) flag and it is strange:
00 00 04 00 01 00 00 00 00 00 00 00
it appears that this IFD0 entry has no marker (00 00). Is this common? i have no idea why a null marker would exist.

Do you mean the null next-IFD pointer at the end of the IFD?  This isn't used in some maker notes.

- Phil

Blaumeise00

Quote from: Phil Harvey on April 02, 2023, 07:50:31 AM
This is most likely a byte-ordering problem.

you are a programmer, i am not a programmer. I imagine that you mean the BOM and this doesn't make sense to me. The address begins at 0802, not 0800. I wrote code that reads E1 after the TIFF header. Then I read the TIFF specification, which mentions that the IFD can begin anywhere in the image. Thus, I wanted an example image that has an IFD not beginning at 8. The image from my microscope does not begin at 8. I think that this is normal according to the TIFF specification. Why is it a byte order problem to you?

Quote from: Phil Harvey on April 02, 2023, 07:50:31 AM
Quote
also, the image contains another red (seq) flag and it is strange:
00 00 04 00 01 00 00 00 00 00 00 00
it appears that this IFD0 entry has no marker (00 00). Is this common? i have no idea why a null marker would exist.

Do you mean the null next-IFD pointer at the end of the IFD?  This isn't used in some maker notes.

Oh lord, my terminology is a bit different because I am not a programmer. I do not have programming friends, and I am not in any privileged programming circle. I'm more of a hacker than a programmer. I get things to work without understanding them (hacking versus educated application). I call the IFD after IFD0 "ifd0-data", not "next IFD". I know that the address follows, or that it is null if the next IFD is the last IFD. The tag that I mention is within the IFD entries of IFD0. I'll just attach a photo. I have outlined the areas that I mentioned in my post: the IFD0 address, the orientation tag with the red (seq), and the null tag IFD0 entry.

Hopefully, you can make sense of my post. The image should help.

Blaumeise00

HtmlDump documentation.

I see: the orientation tag is out of (seq)uence because it is 12 01 coming after 31 01. That also explains the flag on the null tag, although a null tag itself doesn't make sense to me. I am not sure why the IFD0-12 tag exists (see my attached photo in the last post).
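
The out-of-sequence idea can be illustrated with a short Python sketch. It assumes only that TIFF expects IFD entries sorted in ascending tag order. It is not how ExifTool's HtmlDump decides what to flag, and the function name `out_of_sequence` is made up for the example.

```python
def out_of_sequence(tags):
    """Return the tags that are lower than the highest tag seen before them."""
    flagged = []
    highest = -1
    for tag in tags:
        if tag < highest:
            flagged.append(tag)  # lower than an earlier tag: out of order
        else:
            highest = tag
    return flagged


# Tag order taken from the thread: 31 01, then 12 01, then the null tag 00 00.
flagged = out_of_sequence([0x0131, 0x0112, 0x0000])
print([hex(t) for t in flagged])
```

With that order, both 0x0112 and 0x0000 are flagged, which matches the two red (seq) notices described in the thread.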

Thank you for clarifying the sequence question. Much appreciated.

StarGeek

Quote from: Blaumeise00 on April 02, 2023, 08:16:16 AM
Quote from: Phil Harvey on April 02, 2023, 07:50:31 AM
This is most likely a byte-ordering problem.

you are a programmer, i am not a programmer. I imagine that you mean the BOM and this doesn't make sense to me.

I don't believe it's a BOM, as that's for text files, afaik.

See Endianness (Wikipedia).  Byte order can be big-endian (BE) or little-endian (LE).  That describes the ordering of the bytes within a multi-byte value.  Big-endian is basically human readable, left to right, while little-endian is reversed.  See this section on that Wikipedia page (ignoring the middle-endian stuff, never heard of that before).
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Blaumeise00

4949 and 4d4d mark the byte order, so I believe that this is a BOM. BO = byte order, and I consider it to be a mark. I'm not sure why this would be a byte order but not a byte order mark. Very confusing.

I understand LE and BE, but I do not know what middle-endian is either. I just saw this term last night, but I ignored it :-)

The original question revolves around the following:
The IFD0 offset is supposed to occur after the byte order (mark?) and the 2a00 magic number of a TIFF header.
So, for example, using LE:
49492a00 08000000

Since I am not a programmer, I coded my PHP script to begin processing IFD0 after reading the TIFF header and the offset to IFD0. Later, I read in the TIFF 6.0 specification that IFDs can be found anywhere in a TIFF image. I wondered: could this be true for JPEG, so that the address 08000000 could be something else?

I posted asking if anyone has an example JPEG with an IFD0 offset other than 08000000, so that I can change my code to find IFD0 no matter where it occurs in the image. I found a photo on my hard drive from my old microscope that has an address other than 08000000. I am happy to have found an example. Mr. Harvey (i do not know him, so calling him Phil might be viewed negatively) mentioned a byte order problem. I have no idea what he is referring to. The offset being other than 08000000 is not a problem and it is not a byte order error. So I am asking what he means by the offset being a byte order problem (if that is what he is referring to).

I see a bit of data before the IFD0 offset. I do not yet know what is contained in the data. I have to figure out how to decode it. Anyway, maybe Mr. Harvey is referring to the out-of-sequence questions as byte order problems.
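
The header layout described in this post can be sketched in Python as follows. This is an illustration only, not the poster's PHP code or ExifTool's. The function names `parse_tiff_header` and `ifd0_file_position` are hypothetical. The one fact it relies on is that Exif offsets are counted from the first byte of the TIFF header (the `II` or `MM`), so the position in the file is the TIFF header's own file position plus the stored offset.

```python
import struct


def parse_tiff_header(tiff: bytes):
    """Return (struct byte-order prefix, IFD0 offset) from an 8-byte TIFF header."""
    order = tiff[:2]
    if order == b"II":      # 49 49: little-endian
        prefix = "<"
    elif order == b"MM":    # 4d 4d: big-endian
        prefix = ">"
    else:
        raise ValueError("not a TIFF header")
    # 2-byte magic number (42) followed by the 4-byte offset to IFD0
    magic, ifd0_offset = struct.unpack(prefix + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return prefix, ifd0_offset


def ifd0_file_position(tiff_start: int, ifd0_offset: int) -> int:
    """Convert a TIFF-relative offset into an absolute file position."""
    return tiff_start + ifd0_offset


prefix, offset = parse_tiff_header(bytes.fromhex("49492a0008000000"))
# Assumption for this example: APP1 directly follows SOI, so the TIFF header
# sits at file offset 12 (2 SOI + 2 marker + 2 length + 6 for "Exif\0\0").
position = ifd0_file_position(12, offset)
print(prefix, offset, position)
```

Because the stored offset is only added to the TIFF header position, a value other than 8 needs no special handling in this scheme.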

Phil Harvey

Quote from: Blaumeise00 on April 02, 2023, 11:45:56 AM
Mr. Harvey (i do not know him, so calling him Phil might be viewed negatively) mentioned a byte order problem. I have no idea what he is referring to. The offset being other than 08000000 is not a problem and it is not a byte order error.

Call me Phil.

Sorry.  I misread your post.  I read "an address starting at 08000000" (ie. missed reading the "not").  I thought you meant the integer value was 0x08000000 and not 0x08, which would be a byte ordering problem.  But I see that's not what you meant.

- Phil
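
The distinction between 0x08000000 and 0x08 can be reproduced in a couple of lines. The Python sketch below simply reads the same four bytes in both byte orders. The variable names are illustrative only.

```python
raw = bytes.fromhex("08000000")  # the four offset bytes as they appear in the file

as_little = int.from_bytes(raw, "little")  # correct for an "II" (4949) header
as_big = int.from_bytes(raw, "big")        # what a wrong byte order would give

print(as_little, hex(as_big))
```

Read as little-endian, the bytes give 8. Read as big-endian, they give 0x08000000, which is the kind of misreading a byte-ordering problem would produce.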

Blaumeise00

I am usually mentally tired at the end of each day, so I probably did not express myself clearly. My apologies for the confusing text.

I was searching for some data about the arithmetic coding temp tag 0x01, and I came across another, albeit older, exif tool:
http://maazl.de/project/misc/exiftool.html

I am currently rewriting my code to produce a beta version of my PHP scanner. I hope that I've done enough research to make it usable for its intended purpose. I have spent a lot of time on this project. I did not realize how difficult it can be to read a JPEG and sort out the data.

Something has bothered me about JPEG since I began my JPEG research:
the marker d9 is referred to as EOI, or end of image marker. (You know this, but I am summarizing for my next point.)

A non-programmer, such as myself, interprets this info to mean that an image begins with d8 SOI and ends with d9 EOI. This is false and misleading. So many websites fail to notice that d9 occurs after any image in a JPEG image, such as thumbnails. So checking for d9 is quite a task whenever 50 of them are scattered throughout a JPEG, and it becomes quite clear that EOI is a misnomer and that the data written about d9 is misleading. JPEG file formats are messy. I'd like to see a new JPEG format where only one EOI occurs and it is only at the end of the JPEG. I am beginning to hate this unstructured, unorganized, illogical file format. I think that I'll switch to PNG because of this project. Honestly, I think that the JPEG format was imagined by a juvenile and not some degree-bearing intelligent programmer. I really cannot fathom the appellation EOI at this point. And I thought that I am stupid. Good lord!

Anyway, thanks for all of the help along the way. I have learned a lot.

Phil Harvey

There is only one EOI in a JPEG file.  Other ff d9 sequences may appear in the contained data, but are not EOI's for the file.  You need to properly parse the file to skip over other data -- you can't just brute-force scan for the byte sequence ff d9.  The same is true for just about every data structure in every type of file (except for mp3, in which brute force scans are allowed and other data must be unsynchronized to prevent false matches).

- Phil
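
As a rough illustration of parsing rather than brute-force scanning, the Python sketch below walks the segments at the start of a JPEG by following each segment's length field, so an `ff d9` pair inside a segment's payload (for example, an embedded thumbnail in APP1) is never examined. It is a simplified sketch, not ExifTool's parser. The name `walk_header_segments` and the tiny sample file are made up, and the walk deliberately stops at the first SOS because the bytes after it are entropy-coded data rather than length-prefixed segments.

```python
import struct

# Markers that stand alone and carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def walk_header_segments(data: bytes):
    """Yield (marker, offset, payload_length) for each segment up to the first SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte in front of a marker
            pos += 1
            continue
        if marker in STANDALONE:
            yield marker, pos, 0
            pos += 2
            continue
        # Big-endian length that counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, length - 2
        if marker == 0xDA:          # SOS: entropy-coded data follows
            return
        pos += 2 + length           # jump over the payload without inspecting it


# Made-up sample: SOI, an APP1 segment whose payload contains ff d9, then SOS.
sample = (
    b"\xff\xd8"
    + b"\xff\xe1" + struct.pack(">H", 6) + b"\xff\xd9\x00\x00"
    + b"\xff\xda" + struct.pack(">H", 2)
)
markers = [hex(m) for m, _, _ in walk_header_segments(sample)]
print(markers)
```

In the sample, only the E1 and DA markers are reported. The `ff d9` inside the APP1 payload is skipped because the walker jumps by the declared segment length.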

Blaumeise00

To quote a JPEG decoding team: "this is bad software engineering". I agree. JPEG is horrible.

I have new code that prescans for markers, and I am getting all of the markers correctly, plus trailing data. I say prescan because I am not reading/parsing the segments yet. I am only prescanning the JPEG for markers to give my parser a view of the file before processing (like turning a light on before entering a dark room in order to avoid disasters such as tripping).

So I have this image, and my prescanner reports the following markers:
c4 c4 da c4 da c4 da c4 da da c4 da c4 da c4 da d9

The usual JPEG has four c4 markers and one da:
c4 c4 c4 c4 da d9

Why is the image with multiple c4 and da markers valid? I have only seen this before in a corrupted JPEG photo. So are multiple c4 da markers valid? I ask because calculating the size of da will require extra code to detect this multi c4 da pattern.

The exif tool html dump shows the same data (multiple c4 da markers), and the exif tool also does not calculate a single da size, since the image has multiple da markers.

I'd like to add up the sizes of all of the da markers into a single da size, but if multiple c4 da markers are invalid, then I do not have to deal with this matter per se.

Any tips would be most helpful.

Blaumeise00

I see: progressive JPEG. Well, that complicates the matter a bit, but I will just maintain an array of da encounters and keep the sizes as values in the array.
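
The array-of-sizes idea can be sketched in Python like this. It is a simplified illustration, not the poster's PHP code. The name `scan_sizes` is hypothetical, the input is assumed to be well formed, and "size" here means only the entropy-coded bytes after each SOS header. Inside scan data, `ff 00` is a stuffed data byte and `ff d0` to `ff d7` are restart markers, so neither ends a scan.

```python
import struct


def scan_sizes(data: bytes):
    """Return the size of the entropy-coded data following each SOS (ff da)."""
    sizes = []
    pos = 2  # skip SOI
    while pos + 4 <= len(data):
        marker = data[pos + 1]
        if marker == 0xD9:  # EOI
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length   # jump over the segment (marker + length + payload)
        if marker != 0xDA:
            continue
        start = pos
        while True:
            pos = data.find(b"\xff", pos)
            if pos < 0 or pos + 1 >= len(data):
                pos = len(data)          # scan data runs to the end of the input
                break
            nxt = data[pos + 1]
            if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                pos += 2                 # stuffed byte or restart marker
            elif nxt == 0xFF:
                pos += 1                 # fill byte
            else:
                break                    # a real marker ends this scan
        sizes.append(pos - start)
    return sizes


def seg(marker: int) -> bytes:
    """Build an empty segment (length field only) for the made-up sample."""
    return bytes([0xFF, marker]) + struct.pack(">H", 2)


# Made-up progressive-style layout: c4 da c4 da d9, with tiny scan payloads.
sample = (
    b"\xff\xd8"
    + seg(0xC4) + seg(0xDA) + b"\x11\xff\x00\x22"
    + seg(0xC4) + seg(0xDA) + b"\x33\xff\xd0\x44\x55"
    + b"\xff\xd9"
)
sizes = scan_sizes(sample)
print(sizes, sum(sizes))
```

For the sample, the two scans measure 4 and 5 bytes, and `sum(sizes)` gives the single combined figure asked about earlier in the thread.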

I wanted to make a watermarked image and a non-watermarked image to see what changes in the JPEG data, and then I noticed that my progressive JPEGs have multiple c4 da markers. I didn't know that progressive JPEGs store the data in such a manner. I've learned something new...