Sorry I had to miss it. I had another meeting (which turned out to be boring; I probably should have ditched it for NMGLUG).

Robert Citek writes:
> Here are some of the links that Mark mentioned:
> [ ... ]
> https://github.com/unoconv/unoconv # deprecated, see unoserver
> https://github.com/unoconv/unoserver/
Why is unoconv deprecated? I read the "Comparison with unoconv" section at the bottom of the unoserver README, but I'm still not clear why the switch is needed from the user's side (I do get why a clean rewrite is better for the maintainer).

I use unoconv a lot, and I haven't (knowingly) hit any of those problems. The only problem I've seen is that, because it runs LibreOffice, it's slow, and unoconv's default timeout isn't long enough on some processors. So I tend to run it with -T 10, which makes it wait up to 10 seconds for LO to start up.

For people who need to do a lot of Word-to-HTML conversions, consider mammoth (a Python module that can be run as a command as well as used in a program). It takes a different approach from unoconv: instead of producing horrible, unmaintainable HTML that tries to mimic every style in the Word document, it produces clean, semantic HTML with tags like <em> and <strong>.
https://github.com/mwilliamson/python-mammoth

I use both mammoth and unoconv. For one-time conversions where I want to preserve the formatting as much as possible, including text colors, I use unoconv. But when someone sends me content for a web page that I'm going to have to maintain for years, or when I need to parse the page to use its contents in some other way, mammoth produces much better output.

Mammoth only understands .docx, not .doc, so for .doc files I first use unoconv to convert the .doc to .docx, then run mammoth on the .docx. It's worth the extra step to get the clean mammoth output.

There's also wvHtml, but I haven't used it in a while, and can't remember exactly why I stopped.

...Akkana
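For anyone who wants to script the .doc workflow described above (unoconv to get .docx, then mammoth to get HTML), here is a minimal Python sketch. It assumes the unoconv and mammoth command-line tools are both installed and on PATH. The helper names (unoconv_cmd, mammoth_cmd, plan, convert) are hypothetical, made up for illustration. The unoconv flags used are -T (how long to wait for LibreOffice to start), -f (output format) and -o (output file), and mammoth's command is assumed to take an input .docx and an output .html.

```python
# Sketch: .doc -> (unoconv) -> .docx -> (mammoth) -> .html
# Assumes the unoconv and mammoth command-line tools are on PATH.
# Helper names are hypothetical, for illustration only.
import subprocess
import sys
from pathlib import Path


def unoconv_cmd(src, dest, fmt="docx", timeout=10):
    """Build (but don't run) a unoconv command line.

    -T: seconds to wait for LibreOffice to start (the default can be
        too short on slow machines), -f: output format, -o: output file.
    """
    return ["unoconv", "-T", str(timeout), "-f", fmt, "-o", str(dest), str(src)]


def mammoth_cmd(docx, html):
    """Build a mammoth command line: mammoth input.docx output.html."""
    return ["mammoth", str(docx), str(html)]


def plan(src):
    """Return the list of commands needed to turn src into HTML.

    mammoth only reads .docx, so a .doc file gets an extra unoconv step.
    """
    src = Path(src)
    html = src.with_suffix(".html")
    suffix = src.suffix.lower()
    if suffix == ".docx":
        return [mammoth_cmd(src, html)]
    if suffix == ".doc":
        docx = src.with_suffix(".docx")
        return [unoconv_cmd(src, docx), mammoth_cmd(docx, html)]
    raise ValueError(f"don't know how to convert {src.name}")


def convert(src):
    """Run each step in order; raise CalledProcessError if a step fails."""
    for cmd in plan(src):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical usage: python doc2html.py report.doc notes.docx
    for name in sys.argv[1:]:
        convert(name)
```

Building the command lists separately from running them makes it easy to check exactly what will be executed (e.g. print(plan("report.doc"))) before waiting on LibreOffice.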